EgoIntrospect

A multimodal egocentric dataset and benchmark for understanding what users feel, want, and need to remember.

EgoIntrospect Dataset

EgoIntrospect at a Glance

180 hours of multimodal egocentric recordings from 60 participants, linking daily-life observations with user-provided internal-state annotations.

180h recordings
6 scenarios
3 internal-state dimensions
60 participants

Labels Only the Wearer Can Provide

Ground truth comes from participants themselves.

Synchronized Multimodal Wearable Signals

Video, audio, gaze, motion, and physiology are aligned across devices.
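
As an illustration of what cross-device alignment can involve, the sketch below matches each sample of a reference stream to the nearest-in-time sample of a second stream. It is a minimal example under stated assumptions: the function names, the 50 ms tolerance, and the sample values are hypothetical, and the page does not say which synchronization method the dataset actually uses.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the sample in sorted `timestamps` closest to `t`."""
    if not timestamps:
        raise ValueError("timestamps must not be empty")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick the closer of the two neighbours; ties go to the earlier sample.
    return i - 1 if t - timestamps[i - 1] <= timestamps[i] - t else i


def align(reference_ts, stream_ts, stream_values, max_gap_s=0.05):
    """For each reference timestamp, take the nearest stream value.

    Returns None where no stream sample lies within `max_gap_s` seconds
    (the tolerance is an assumed value, not one taken from the dataset).
    """
    aligned = []
    for t in reference_ts:
        j = nearest_index(stream_ts, t)
        within = abs(stream_ts[j] - t) <= max_gap_s
        aligned.append(stream_values[j] if within else None)
    return aligned


# Hypothetical video frame times and gaze samples, in seconds.
video_ts = [0.00, 0.10, 0.20, 0.30]
gaze_ts = [0.01, 0.09, 0.21, 0.50]
gaze_xy = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6), (0.7, 0.8)]

aligned = align(video_ts, gaze_ts, gaze_xy)
print(aligned)
```

The last video frame has no gaze sample within the tolerance, so it is paired with `None` rather than with a stale reading.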

Captured in User-Chosen Daily Scenarios

Long-form recordings follow natural routines rather than scripted tasks.

External Observer

What happened?

Objective facts anyone can see

Objects: tray, phone, mug
Actions: open, pour, walk
Scenes: kitchen, sidewalk, café
Speech / Audio: "can you grab…", music

One egocentric record pairs first-person observation with wearer-provided meaning across six daily-life scenarios.

Wearer-Provided Meaning

Subjective state only I can report

What did it mean to me?

Affective Experience

What did I feel? What moment mattered?

“I felt surprised because it was so cheap.”

Moment of Recording · Emotion Analysis

Interactive Intent

What did I want the assistant to do? When should it help?

“I need help finding local food nearby.”

Contextual Request · Proactive Recommendation

Cognitive Memory

What do I remember? What should be preserved?

“I want to remember this map for later.”

Memory Recall Prediction · Memory Intent Modeling
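
The pairing described above, observer-visible facts alongside wearer-provided meaning, can be pictured as a single record. The sketch below is only an illustration: every class and field name is hypothetical, and the released dataset's actual schema is not stated on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    """Objective facts that any external observer could annotate."""
    objects: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    scene: Optional[str] = None
    speech_audio: list[str] = field(default_factory=list)


@dataclass
class WearerMeaning:
    """Subjective state that only the wearer can report."""
    affective: Optional[str] = None    # What did I feel?
    interactive: Optional[str] = None  # What did I want the assistant to do?
    cognitive: Optional[str] = None    # What do I want to remember?

    def reported_dimensions(self) -> list[str]:
        """Names of the dimensions the wearer actually filled in."""
        names = ("affective", "interactive", "cognitive")
        return [n for n in names if getattr(self, n) is not None]


@dataclass
class EgoRecord:
    """One moment: what happened, plus what it meant to the wearer."""
    observation: Observation
    meaning: WearerMeaning


record = EgoRecord(
    observation=Observation(
        objects=["tray", "phone", "mug"],
        actions=["walk"],
        scene="café",
    ),
    meaning=WearerMeaning(affective="I felt surprised because it was so cheap."),
)
print(record.meaning.reported_dimensions())
```

Keeping the two halves in separate types mirrors the split between what an observer can see and what only the wearer can supply.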

1. In-Situ Marker

Participants mark meaningful moments during recording (voice or text).

"Hey Assistant, mark this."

2. Retrospective Review

Wearers review each marked moment alongside its video, audio, gaze, and other signals.

3. Structured Label

Annotators turn responses into structured affective, interactive, and cognitive tags.

Felt · Wanted · Remembered

These labels are the ground truth for benchmarks on user internal states.
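
The three steps above can be sketched as a small data flow from marker to review to structured label. This is a hedged illustration only: the type names, fields, and the `to_labels` helper are hypothetical, and in the actual pipeline the structuring step is carried out by annotators rather than by a function.

```python
from dataclasses import dataclass

# The three internal-state dimensions named on this page.
DIMENSIONS = ("affective", "interactive", "cognitive")


@dataclass
class Marker:
    """Step 1: an in-situ mark made during recording."""
    timestamp_s: float
    source: str  # "voice" or "text"


@dataclass
class Review:
    """Step 2: the wearer's retrospective responses for one marker."""
    marker: Marker
    responses: dict[str, str]  # dimension name -> free-text response


@dataclass
class StructuredLabel:
    """Step 3: one structured tag tied to a moment and a dimension."""
    timestamp_s: float
    dimension: str
    text: str


def to_labels(review: Review) -> list[StructuredLabel]:
    """Turn a review into one label per dimension the wearer answered."""
    unknown = set(review.responses) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    labels = []
    for dim in DIMENSIONS:
        text = review.responses.get(dim, "").strip()
        if text:  # skip dimensions left blank
            labels.append(StructuredLabel(review.marker.timestamp_s, dim, text))
    return labels


marker = Marker(timestamp_s=12.5, source="voice")
review = Review(
    marker=marker,
    responses={
        "cognitive": "I want to remember this map for later.",
        "affective": "I felt surprised because it was so cheap.",
    },
)
labels = to_labels(review)
for label in labels:
    print(label.dimension, "->", label.text)
```

Each label keeps the marker's timestamp, so it can be traced back to the recorded moment it describes.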

Contributors & Citation

Zeyu Wang¹*, Chang Liu¹,⁸*, Eduardus Tjitrahardja¹, Yuntao Wang¹‡, Borislav Pavlov¹, Fangfei Gou¹, Jose Manuel Davila¹, Dai Shi², Ran Xu¹, Yue Pan³, Jiayi Tan³, Shuting Chang², Qi Wang², Jinzhao Li¹, Jiacheng Hua¹, Yifei Huang⁴, Jingwei Sun⁵, Yu Zhang⁵, Liuxin Zhang⁵, Guocai Yao⁶, Jia Jia¹, Yin Li⁷, Qianying Wang⁵, Yuanchun Shi¹, Miao Liu¹‡

¹ Tsinghua University
² Tongji University
³ Renmin University of China
⁴ The University of Tokyo
⁵ Lenovo Group
⁶ Peking University
⁷ University of Wisconsin–Madison
⁸ Shanghai Qi Zhi Institute

* Equal contribution
‡ Corresponding authors

If our work inspired you, please cite us:

@article{wang2026egointrospect,
  title={EgoIntrospect: An Egocentric Dataset and Benchmark for User-Centric Internal State Reasoning},
  author={Wang, Zeyu and Liu, Chang and Tjitrahardja, Eduardus and Wang, Yuntao and Pavlov, Borislav and Gou, Fangfei and Davila, Jose Manuel and Shi, Dai and Xu, Ran and Pan, Yue and others},
  journal={arXiv preprint arXiv:2605.17262},
  year={2026}
}