Machine Learning Engineer - Visual Agents - Special Projects

Apple Inc

Cupertino, California, USA

Posted 13 days ago

$126,800 - $220,900/year

Role Details

The Special Projects team at Apple is developing novel experiences powered by state-of-the-art agentic vision-language models that incorporate visual context into conversational interaction. We are looking for a Machine Learning Engineer to help us build, fine-tune, and rigorously evaluate these systems. A successful candidate has hands-on experience with vision-language models, knows how to translate ambiguous product requirements into measurable evaluation criteria, and is excited to work at the intersection of multimodal modeling and agentic AI.

- Build and evaluate vision-language agents that perceive real-world scenes and incorporate that context into conversational models
- Curate, annotate, and build multimodal datasets to support model training and evaluation
- Develop automated evaluation pipelines, including LLM-as-judge frameworks, human evaluation protocols, and domain-specific benchmarks
- Fine-tune Large Language Models (LLMs) and Vision-Language Models (VLMs) to improve performance for specific use cases
- Work closely with other ML Researchers to define evaluation criteria and methodology for systematically evaluating foundation models
- Design controlled experiments to measure model capabilities, identify failure modes, and drive iterative model improvements
- Conduct robust statistical analysis to identify model deficiencies, failure modes, and performance gaps

Minimum Qualifications

- BA or Master’s degree in Computer Science or Machine Learning
- 2+ years of hands-on experience building and evaluating generative AI or multimodal models
- Experience working with vision-language models or multimodal systems
- Proficiency in Python and ML frameworks (PyTorch or TensorFlow)

Preferred Qualifications

- PhD in Computer Science, Machine Learning, Statistics, or other STEM field
- Prior industry internship or research experience applying ML to product use cases
- Experience with video understanding, temporal reasoning, or activity recognition
- Familiarity with agentic system design, including tool use, grounding, or perceive-act loops
- Experience building or working with large-scale multimodal data and annotation pipelines
- Proficiency in training, fine-tuning, and evaluation of foundation models and frameworks
- Publications or technical presentations in Machine Learning journals or conferences
- Excellent communication skills and cross-functional collaboration


About Apple Inc

Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.

Industry: Consumer Electronics & Software