Projects.

Selected ships across two decades — vision platforms, foundation models, generative media, and the products they powered.

◆ shipped ▲ research ● platform

Frontier — MSL
2025	◆	Muse Spark	Meta MSL	First public ship from Meta Superintelligence Labs: multimodal generation for everyone, debuting the new lab’s point of view on creative AI.
2025	●	meta.ai	Meta	Meta’s consumer AI assistant, multimodal across web, mobile, and the family of apps, reaching billions of users.
Llama — GenAI
2025	◆	Llama 4	Meta	Natively multimodal frontier model family: mixture-of-experts at open-weights scale, the strongest Llama yet.
2024	▲	MovieGen	Meta	A cast of foundation models for high-quality video, audio, and editing; one of the most ambitious generative-media programs of the year.
2024	◆	Llama 3.2 / Connect ’24	Meta	Vision-language Llama plus on-device variants that pushed open-weights multimodal models to phones and edge devices.
2024	◆	Llama 3.1	Meta	405B open-weights flagship: the first time an open model competed at the absolute frontier.
2024	◆	Llama 3	Meta	The 8B / 70B models that re-anchored the open-weights ecosystem and powered Meta AI’s first global rollout.
2023	◆	Llama 2	Meta	Open-weights LLM that anchored an ecosystem: the model that made “open frontier” a real category.
2023	▲	Emu	Meta	Image foundation model behind Meta’s generative stack: the “photogenic needles in a haystack” recipe for quality at scale.
2023	▲	Emu Video / Emu Edit	Meta	Text-to-video and instruction-based image editing on the Emu foundation: the research powering AI features across the family of apps.
2023	●	Connect ’23	Meta	Meta AI assistant, AI Studio, generative imagery, and Ray-Ban smart-glasses AI: the first public surfaces of the GenAI org’s work.
Perception & Language — FAIR Accel
2022	▲	Make-A-Video	FAIR	First text-to-video system at Meta: one of the earliest demonstrations of the modality, ahead of the public wave.
2022	▲	Make-A-Scene	FAIR	Text-and-sketch image generation: controllability as a first-class generative primitive, not an afterthought.
2021	▲	Ego4D	FAIR · consortium	3,000+ hours of first-person video and a benchmark suite that opened up egocentric perception as a research field.
Computer Vision & AR — AML, FAIAR
2020	▲	Self-supervised Overhead Imagery	arXiv	Self-supervised pretraining on satellite imagery: learning useful representations without map labels, for downstream geospatial tasks.
2019	▲	Road Connectivity	CVPR ’19	Joint orientation and segmentation learning for road extraction from satellite imagery: the mapping work behind Daylight.
2019	▲	Billion-scale SSL	FAIR / AML	Semi-supervised learning at billion-image scale, demonstrating data scale as the lever for vision pretraining.
2018	▲	R(2+1)D	CVPR ’18	Spatiotemporal convolutions for action recognition: the standard video-CNN factorization for years after.
2018	▲	Detect-and-Track	CVPR ’18	Efficient pose estimation in videos: tracking and detection as a single learned operator.
2018	▲	What Makes a Video a Video	CVPR ’18	Probing what temporal information video models actually use: an honest look at when motion really matters.
2018	▲	WSL: Hashtag pretraining	Facebook	Weakly-supervised pretraining on 3.5B hashtagged images: state-of-the-art ImageNet accuracy at the time, all from real-world signal.
2018	◆	Fashion / Product Understanding	Facebook	Visual product understanding powering shopping experiences: fine-grained recognition deployed at platform scale.
2017	●	Lumos	Facebook	The platform that put visual understanding behind hundreds of products at Facebook scale.
2016	●	Caffe2Go: mobile models	Facebook	On-device deep learning for mobile: the runtime that made real-time AI features practical on phones.
2016	◆	Automatic Alt Text	Facebook	First-of-its-kind accessibility feature: AI-generated descriptions of every image, for screen readers, at platform scale.
Computer Vision — FAIR
2015	▲	Metric Learning with Adaptive Density Discrimination	ICLR ’16	A density-aware approach to deep metric learning: better embeddings without explicit class labels.
2014	▲	C3D: Learning Spatiotemporal Features	ICCV ’15	3D ConvNets for video: the foundational architecture for early deep video understanding.
2014	▲	PANDA	CVPR ’14	Pose-aligned networks for deep attribute modeling: one of the first deep approaches to fine-grained person attributes.
Earlier
2012	▲	3D Reconstruction: Google Maps	Google · intern	Aerial-imagery 3D reconstruction contributing to Google Earth city-scale models, SF City Hall among them.
2011	▲	GTSAM	Georgia Tech	Factor-graph SLAM library: contributions during PhD research with Frank Dellaert’s group.
