LLM Infrastructure Data Scientist

Windsor, ON, Canada

Job Description

About the Role

We are seeking a highly motivated LLM Infrastructure Data Scientist to help us build, scale, and optimize the infrastructure powering our large language model (LLM) development and deployment. You will work at the intersection of data science, systems engineering, and machine learning, with a focus on improving the efficiency, reliability, and performance of large-scale model training and inference.

Key Responsibilities

* Analyze and optimize end-to-end training and inference pipelines for LLMs, focusing on performance, cost, and scalability.
* Monitor and diagnose model infrastructure issues across GPUs/TPUs, distributed training systems, and data pipelines.
* Design and implement robust telemetry, logging, and metrics collection systems for LLM workloads.
* Work closely with ML engineers, research scientists, and infrastructure teams to drive data-informed decisions.
* Develop dashboards and automated reports to track key metrics such as utilization, throughput, model convergence, and error rates.
* Run A/B experiments on infrastructure changes and evaluate their impact using rigorous statistical methods.
* Contribute to the design and testing of new systems for model versioning, reproducibility, and model governance.

Minimum Qualifications

* Bachelor's or Master's degree in Computer Science, Statistics, Applied Mathematics, or a related field.
* 3+ years of experience in data science, ML infrastructure, or related technical roles.
* Strong skills in Python and data analysis libraries (e.g., Pandas, NumPy, Scikit-learn).
* Experience with distributed computing frameworks (e.g., Ray, Spark, or Dask).
* Familiarity with ML model training workflows, especially deep learning and LLMs (e.g., using PyTorch, TensorFlow, or JAX).
* Experience with visualization tools (e.g., Plotly, Dash, Grafana, or Tableau).
* Knowledge of cloud infrastructure (AWS, GCP, Azure) and containerized environments (Docker, Kubernetes).

Job Type: Full-time

Pay: $40,000.00-$60,000.00 per year

Experience:

* Machine learning: 1 year (preferred)


Job Detail

  • Job Id
    JD2525975
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Windsor, ON, Canada
  • Education
    Not mentioned