Selected ships across two decades — vision platforms, foundation models, generative media, and the products they powered.
| Frontier — MSL | ||
| 2025 | ◆ | Muse Spark (Meta MSL). First public ship from Meta Superintelligence Labs: multimodal generation for everyone, debuting the new lab’s point of view on creative AI. |
| 2025 | ● | meta.ai (Meta). Meta’s consumer AI assistant, multimodal across web, mobile, and the family of apps, reaching billions of users. |
| Llama — GenAI | ||
| 2025 | ◆ | Llama 4 (Meta). Natively multimodal frontier model family: mixture-of-experts at open-weights scale, the strongest Llama yet. |
| 2024 | ▲ | MovieGen (Meta). A cast of foundation models for high-quality video, audio, and editing; one of the most ambitious generative-media programs of the year. |
| 2024 | ◆ | Llama 3.2 / Connect ’24 (Meta). Vision-language Llama plus on-device variants; pushed open-weights multimodal to phones and edge. |
| 2024 | ◆ | Llama 3.1 (Meta). 405B open-weights flagship: the first time an open model competed at the absolute frontier. |
| 2024 | ◆ | Llama 3 (Meta). The 8B / 70B models that re-anchored the open-weights ecosystem and powered Meta AI’s first global rollout. |
| 2023 | ◆ | Llama 2 (Meta). Open-weights LLM that anchored an ecosystem; the model that made “open frontier” a real category. |
| 2023 | ▲ | Emu (Meta). Image foundation model behind Meta’s generative stack; the “photogenic needles in a haystack” recipe for quality at scale. |
| 2023 | ▲ | Emu Video / Emu Edit (Meta). Text-to-video and instruction-based image editing on the Emu foundation; the research powering AI features across the family of apps. |
| 2023 | ● | Connect ’23 (Meta). Meta AI assistant, AI Studio, generative imagery, and Ray-Ban smart-glasses AI: the first public surfaces of the GenAI org’s work. |
| Perception & Language — FAIR Accel | ||
| 2022 | ▲ | Make-A-Video (FAIR). First text-to-video system at Meta; one of the earliest demonstrations of the modality, ahead of the public wave. |
| 2022 | ▲ | Make-A-Scene (FAIR). Text-and-sketch image generation, with controllability as a first-class generative primitive, not an afterthought. |
| 2021 | ▲ | Ego4D (FAIR · consortium). 3,000+ hours of first-person video and a benchmark suite; opened up egocentric perception as a research field. |
| Computer Vision & AR — AML, FAIAR | ||
| 2020 | ▲ | Self-supervised Overhead Imagery (arXiv). Self-supervised pretraining on satellite imagery: learning useful representations without map labels, for downstream geospatial tasks. |
| 2019 | ▲ | Road Connectivity (CVPR ’19). Joint orientation and segmentation for road extraction from satellite imagery; the mapping work behind Daylight. |
| 2019 | ▲ | Billion-scale SSL (FAIR / AML). Semi-supervised learning at billion-image scale, demonstrating data scale as the lever for vision pretraining. |
| 2018 | ▲ | R(2+1)D (CVPR ’18). Spatiotemporal convolutions for action recognition; the standard video-CNN factorization for years after. |
| 2018 | ▲ | Detect-and-Track (CVPR ’18). Efficient pose estimation in videos, with tracking and detection as a single learned operator. |
| 2018 | ▲ | What Makes a Video a Video (CVPR ’18). Probing what temporal information video models actually use; an honest look at when motion really matters. |
| 2018 | ▲ | WSL: Hashtag pretraining (Facebook). Weakly-supervised pretraining on 3.5B hashtagged images; SoTA ImageNet at the time, all from real-world signal. |
| 2018 | ◆ | Fashion / Product Understanding (Facebook). Visual product understanding powering shopping experiences; fine-grained recognition deployed at platform scale. |
| 2017 | ● | Lumos (Facebook). The platform that put visual understanding behind hundreds of products at Facebook scale. |
| 2016 | ● | Caffe2Go: mobile models (Facebook). On-device deep learning for mobile; the runtime that made real-time AI features practical on phones. |
| 2016 | ◆ | Automatic Alt Text (Facebook). First-of-its-kind accessibility feature: AI-generated descriptions of every image, for screen readers, at platform scale. |
| Computer Vision — FAIR | ||
| 2015 | ▲ | Metric Learning with Adaptive Density Discrimination (ICLR ’16). A density-aware approach to deep metric learning; better embeddings without explicit class labels. |
| 2014 | ▲ | C3D: Learning Spatiotemporal Features (ICCV ’15). 3D ConvNets for video; the foundational architecture for early deep video understanding. |
| 2014 | ▲ | PANDA (CVPR ’14). Pose-aligned networks for deep attribute modeling; one of the first deep approaches to fine-grained person attributes. |
| Earlier | ||
| 2012 | ▲ | 3D Reconstruction: Google Maps (Google · intern). Aerial-imagery 3D reconstruction contributing to Google Earth city-scale models, SF City Hall among them. |
| 2011 | ▲ | GTSAM (Georgia Tech). Factor-graph SLAM library; contributions during PhD research with Frank Dellaert’s group. |