Publications

Selected work on vision–language–action (VLA) models, multimodal foundation models, video understanding, and efficient learning systems. For the complete list, including co-authored and earlier work, see my Google Scholar profile.


Models & Technical Reports

  • π0.7 (Physical Intelligence, 2026) — Steerable generalist robot foundation model with emergent compositional capabilities across dexterous manipulation tasks and robot platforms.
    π0.7 Blog

  • Nova 2 (Amazon, 2025) — Multimodal reasoning and generation foundation models.
    Technical Report

  • Nova Multimodal Embedding (Amazon, 2025) — Multimodal embeddings for agentic RAG and semantic search across video, image, document, and audio.
    Technical Report

  • Nova 1 / Nova 1 Premier (Amazon, 2024–2025) — First-generation Nova multimodal foundation models.
    Nova 1 · Nova 1 Premier


Vision–Language Models


Multimodal Understanding


Learning Methods, Efficiency & Open Source