Mustafa Shukor
Founding Research Scientist (UMA). CS PhD (Sorbonne).
Prev: FAIR (Meta), LeRobot (Hugging Face), MLR (Apple).
Updates
- 2025-12: Introducing VL-JEPA, a vision-language model trained in the latent space.
- 2025-12: We are launching UMA, a company building general-purpose humanoids.
- 2025-11: VLAb is out! Our codebase for VLA pretraining, including SmolVLA.
- 2025-09: l3m is released: our codebase for large-scale pretraining of AIMv2, CLIP, native VLMs, and LLMs.
- 2025-09: 2 papers accepted at NeurIPS 2025: Scaling laws for optimal data mixtures and Learning to Steer.
- 2025-07: We released our work on Scaling laws for optimal data mixtures.
- 2025-07: Our work on Scaling laws for native multimodal models received an Oral (~top 2%) at ICCV 2025.
- 2025-07: 2 papers accepted at ICCV 2025: Scaling laws for NMMs and Multimodal Steering.
- 2025-06: We released SmolVLA, an efficient foundation model for robotics: paper, code, and blogpost.
- 2025-02: AIMv2 is accepted as a Spotlight paper (~top 5%) at CVPR 2025.
- 2024-11: We released AIMv2, a SoTA large-scale vision encoder.
- 2024-09: Implicit Multimodal Alignment, CoX-LMM, DiffCut, and Skipping Computations are accepted at NeurIPS 2024.
- 2024-04: Beyond task performance is accepted at ICLR 2024.
- 2023-12: UnIVAL: Unified Model for Image, Video, Audio and Language Tasks is accepted at TMLR 2023: paper and code.
- 2023-09: Rewarded soups is accepted at NeurIPS 2023.