Visual Intelligence Workshop on Foundation Models


Welcome to the Visual Intelligence Workshop on Foundation Models!

This workshop brings together leading researchers to examine how the theoretical foundations of foundation models expose a paradox between statistical compression and semantic meaning, how emergent phenomena challenge conventional assumptions, and how evaluation practices continue to shape trustworthy foundation models.

The workshop is open to everyone. We look forward to seeing you there!

Click here to add the event to your calendar

Session 1

Title: The Compression Paradox: Why AI and Humans See the World Differently

Presenter: Ravid Shwartz-Ziv, Assistant Professor and Faculty Fellow at NYU’s Center for Data Science

Abstract

Foundation models achieve superhuman performance, but do they understand like humans? Using information theory, we reveal a fundamental paradox: while AI optimizes for statistical compression, minimizing redundancy, humans maintain "inefficient" representations that preserve meaning.

Through studies on debate, concept formation, and chess, we show this divergence has consequences. Our multi-agent debate system wins 85% of rounds yet misses human reasoning. Chess models reach grandmaster level through memorization but fail when patterns change. Information-theoretic analysis reveals models capture broad categories but lack fine-grained conceptual understanding.

These findings suggest that foundation models optimize for the wrong objective, statistical efficiency over semantic comprehension. Progress toward human-aligned AI requires rethinking architectural principles to balance compression with meaning.
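
The statistical-compression objective the abstract refers to can be illustrated with a toy example (an illustration of the general information-theoretic idea, not the speaker's analysis): an optimal code assigns lengths by frequency alone, roughly -log2(p) bits per symbol, and is therefore blind to which distinctions actually carry meaning.

```python
import math
from collections import Counter

# Toy corpus (hypothetical, for illustration only): frequency, not
# semantic importance, determines each token's ideal code length.
tokens = ["cat"] * 60 + ["dog"] * 30 + ["axolotl"] * 10
counts = Counter(tokens)
total = sum(counts.values())

for word, count in counts.items():
    p = count / total
    ideal_bits = -math.log2(p)  # optimal code length under Shannon coding
    print(f"{word!r}: p={p:.2f}, ideal code length ~ {ideal_bits:.2f} bits")

# Shannon entropy: the minimum average bits per token any code can achieve.
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
print(f"entropy: {entropy:.3f} bits/token")
```

The rare token gets the longest code regardless of how much meaning it carries, which is one way to see the tension between pure compression and semantic preservation that the talk explores.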

About the speaker

Ravid is an Assistant Professor and Faculty Fellow at NYU’s Center for Data Science, where he leads cutting-edge research in artificial intelligence, with a particular focus on Large Language Models (LLMs) and their applications. His research spans theoretical foundations and practical implementations, combining academic rigor with industry impact, with particular focus on 1) pioneering novel approaches for analyzing LLM representations and intermediate layer dynamics, 2) developing efficient model adaptation and personalization techniques, 3) advancing information-theoretic frameworks for understanding neural networks, and 4) creating innovative benchmarking frameworks for the evaluation of AI systems.

Session 2

Title: From Benchmarks to Self-Judgment: Evaluating Foundation and Agentic Models in the Age of Trust

Presenter: Srishti Gautam, Senior Researcher at Microsoft

Abstract

AI systems are growing up. Yesterday’s Foundation Models (FMs) mostly sat still, predicting the next token. Today’s Agentic Models (AMs) get up, explore, plan, and act, even in messy real-world tasks like spreadsheet automation or multi-step reasoning. This evolution has fundamentally changed how we evaluate them. Traditional static benchmarks, once sufficient for single-shot reasoning, no longer capture the complexity and adaptiveness of autonomous agents.

In response, evaluation methods have evolved too: from static leaderboards to LLM-as-a-Judge systems, and now to Agent-as-a-Judge frameworks, where autonomous systems execute and critique each other’s outputs. These agentic evaluators promise scale and adaptability, but also inherit the biases, blind spots, and trust issues of the models that power them. This talk explores how fairness, reliability, and trustworthiness must be redefined in this new evaluation landscape. We will discuss the risks and opportunities when our evaluators are as complex as the systems they assess. By the end, we will ask the critical question: can systems truly judge themselves fairly, and what does “trustworthy evaluation” mean in an age of increasingly autonomous AI?

About the speaker

Srishti is a Senior Researcher at Microsoft. She earned her PhD from UiT The Arctic University of Norway, where her research focused on explainable artificial intelligence and related interdisciplinary applications. During her doctoral studies, she also served as a Visiting Researcher at Harvard University, contributing to advancements in the fairness of large language models. In recognition of her work, she was named runner-up for the Best Nordic PhD Thesis Award at SCIA 2025.

Session 3

Title: In Search of Hidden Talents: Emergence in Foundation Models

Presenter: Oscar Skean, PhD Candidate in Computer Science at the University of Kentucky

Abstract

Foundation models have transformed modern machine learning by delivering strong performance across a wide range of tasks with minimal task-specific tuning. Trained on vast and diverse datasets, they learn general-purpose representations adaptable to domains from natural language processing to computer vision. These models often exhibit emergent phenomena: capabilities they were not explicitly trained for that can nevertheless be harnessed to improve performance and extend applicability without costly retraining.

In this talk, we explore strategies for uncovering and leveraging such hidden capabilities to enhance off-the-shelf foundation models with little computational overhead. We highlight prominent examples from the literature, including segmentation and depth estimation emerging in vision models trained without explicit supervision. We also present our recent work investigating intermediate layers in language models, which often provide richer and more robust representations than final-layer outputs, a finding that generalizes to vision models as well. Finally, we offer practical guidelines for discovering and exploiting these capabilities to efficiently unlock new levels of performance.
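
The intermediate-layer probing idea can be sketched with a self-contained toy: a fixed random MLP stands in for a pretrained model, and a simple linear probe scores the representation at every layer (everything here is an illustrative assumption, not data from the speaker's experiments).

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_with_hidden_states(x, weights):
    """Return the activations from every layer, not just the last one."""
    hidden_states = [x]
    h = x
    for W in weights:
        h = np.tanh(h @ W)
        hidden_states.append(h)
    return hidden_states

# Toy stand-in for a pretrained model: a fixed random MLP.
d, n_layers = 32, 4
weights = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]

# Synthetic inputs and a simple downstream target for probing.
X = rng.standard_normal((200, d))
y = X[:, 0] > 0  # hypothetical binary task

states = forward_with_hidden_states(X, weights)

def probe_accuracy(h, y):
    """Fit a least-squares linear probe on layer activations h."""
    w, *_ = np.linalg.lstsq(h, y.astype(float), rcond=None)
    return float((((h @ w) > 0.5) == y).mean())

accs = [probe_accuracy(h, y) for h in states]
for i, acc in enumerate(accs):
    print(f"layer {i}: probe accuracy {acc:.2f}")
```

With a real model the same recipe applies: keep the activations from every layer, attach a cheap probe to each, and compare, rather than assuming the final layer is the best representation.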

About the speaker

Oscar is a PhD candidate in Computer Science at the University of Kentucky, where his research lies at the intersection of representation learning and information theory. He develops information-theoretic methods to advance the understanding of neural network behavior across various modalities, including computer vision and large language models. He recently finished a Machine Learning Engineering internship at Stripe in San Francisco.

Join the workshop here