About Us:
Rivian and Volkswagen Group Technologies is a joint venture between two industry leaders with a clear vision for automotive's next chapter. From operating systems to zonal controllers to cloud and connectivity solutions, we're addressing the challenges of electric vehicles through technology that will set the standards for software-defined vehicles around the world.
The road to the future is uncharted. By combining our expertise across connectivity, AI, security and more, we'll map a new way forward. Working together, we'll create a future that's more connected, more intelligent, more sustainable for everyone.
Role Summary:
Join our dynamic team in the automotive industry as an AI Research Engineer specializing in Video Understanding and Vision Language Models (VLMs). Contribute to developing cutting-edge AI solutions for complex video analysis and multimodal interactions. In this role, you will focus on building and optimizing advanced algorithms for video understanding, VLM development, scene analysis, and real-time processing. Your work will enhance vehicle safety, improve user experiences through rich multimodal interactions, and enable innovative in-vehicle applications that leverage both visual and language data.
Responsibilities:
Develop and implement advanced video understanding algorithms, including action recognition, video captioning, and video question answering, integrating VLM approaches.
Design, train, and fine-tune Vision Language Models (VLMs) using large-scale multimodal datasets, particularly automotive and video-centric data.
Deploy and optimize VLM and video processing models for edge devices and real-time automotive applications, ensuring efficient performance and low latency.
Collaborate with cross-functional teams to integrate video understanding and VLM solutions into in-vehicle systems and infotainment platforms for enhanced user interaction.
Conduct cutting-edge research and stay up to date with the latest advancements in video understanding, VLM technologies, and multimodal learning.
Create scalable, maintainable pipelines for multimodal data processing, VLM training, and deployment.
Document workflows, algorithms, and results for internal and external stakeholders, with a focus on VLM and video analysis techniques.
Qualifications:
MS or PhD in Computer Science, Electrical Engineering, or a related field with a focus on Computer Vision, Natural Language Processing, or Multimodal Learning.
Strong programming skills in Python, with extensive experience in deep learning frameworks such as TensorFlow or PyTorch, including experience with VLM-specific libraries.
Proficiency in computer vision techniques, including CNNs, Transformers, video modeling, and multimodal architectures.
Hands-on experience with large-scale video data and VLM tasks such as video captioning, video question answering, and multimodal retrieval.
Familiarity with image and video processing libraries (e.g., OpenCV, scikit-image) and natural language processing libraries (e.g., Hugging Face Transformers).
Experience with hardware accelerators (e.g., GPUs, NPUs) for training and deploying complex VLM and video models.