Arc Compute operates high-performance GPU clusters and is focused on improving efficiency, throughput, and reliability at scale. We're looking for a Senior Embedded Software Engineer to help build the software that makes our GPU infrastructure faster and more efficient.
What You'll Be Doing
Build and improve GPU performance telemetry using CUDA, DCGM and low-level profiling data.
Explore scheduling and optimization strategies to make multi-GPU workloads run more efficiently.
Optimize, analyze, and tune the performance of DL models in domains such as LLMs, multimodal, and generative AI.
Scale performance of DL models across different architectures and types of NVIDIA accelerators.
Collaborate with team members and other partners.
What We're Looking For
7+ years of professional software development experience, with a solid grasp of design patterns and software engineering principles.
3+ years of experience in CUDA development and GPU performance concepts.
Proven experience owning and architecting performance-critical systems or telemetry pipelines.
Mentorship experience or demonstrated ability to guide junior and mid-level engineers.
C/C++ programming and software design skills. Python experience is a plus.
Experience with modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs.
Familiarity with Linux environments and debugging on real hardware.
Comfortable working onsite with GPU servers and real workloads.
Experience with Git.
Nice to Have
Experience deploying or operating systems in Kubernetes, Docker-based environments, or other job orchestration frameworks.
Understanding of AI model serving backends, ML runtimes, or AI compilers (e.g., TensorRT, TVM, XLA).
Experience with performance profiling tools such as Nsight Systems/Compute.
Experience leading performance investigations or driving cross-team initiatives.
Job Type: Full-time
Application question(s):
Are you able to work onsite in Toronto with GPU servers and real workloads?
Which of the following have you used professionally? Please briefly describe your primary language.
C
C++
Python
Have you worked in Linux environments and used Git for version control in production projects?
Have you owned or architected a GPU performance or telemetry system end-to-end?
Have you used CUDA-based profiling or telemetry tools (e.g., DCGM, Nsight Systems, Nsight Compute)?
Have you led investigations to identify and resolve GPU performance bottlenecks?
Have you optimized performance for LLMs, multimodal, or generative models? What techniques did you apply (e.g., kernel fusion, memory optimization, batching)?
Have you used Git in a collaborative, production codebase?
Have you deployed or operated systems using Docker or Kubernetes?
Have you led or driven cross-team performance initiatives?
Experience:
Software engineering: 7 years (required)
CUDA or GPU-focused development: 3 years (required)
Work Location: In person