EgoIntrospect

A multimodal egocentric dataset and benchmark for understanding what users feel, want, and need to remember.

Ego-Introspect Dataset

EgoIntrospect at a Glance

180 hours of multimodal egocentric recordings from 60 participants, linking daily-life observations with user-provided internal-state annotations.

01

Labels Only the Wearer Can Provide

Ground truth comes from participants themselves.

02

Synchronized Multimodal Wearable Signals

Video, audio, gaze, motion, and physiology are aligned across devices.

03

Captured in User-Chosen Daily Scenarios

Long-form recordings follow natural routines rather than scripted tasks.
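
To make "aligned across devices" in item 02 concrete, here is a minimal sketch of one common way to line up streams that are sampled at different rates: for each video frame, pick the sample from another stream whose timestamp is closest. This is an illustration only, not the dataset's actual synchronization method. The function names, the millisecond timestamps, and the `g0`…`g5` gaze values are all hypothetical.

```python
from bisect import bisect_left

def nearest_index(timestamps, t):
    """Return the index of the sample in sorted `timestamps` closest to t.

    Ties are resolved in favour of the earlier sample.
    """
    if not timestamps:
        raise ValueError("empty stream")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Compare the neighbours on either side of the insertion point.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

def align_to_frames(frame_times, stream_times, stream_values):
    """For each frame time, take the nearest sample from another stream."""
    return [stream_values[nearest_index(stream_times, t)] for t in frame_times]

# Hypothetical timestamps in milliseconds on a shared clock.
frame_times = [0, 500, 1000]
gaze_times = [0, 210, 420, 590, 800, 1010]
gaze_values = ["g0", "g1", "g2", "g3", "g4", "g5"]

aligned = align_to_frames(frame_times, gaze_times, gaze_values)
print(aligned)  # ['g0', 'g2', 'g5']
```

Nearest-sample matching assumes all devices already share a common clock; correcting clock offset or drift between devices is a separate step that this sketch does not cover.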

Labels Only the Wearer Can Provide

Egocentric video captures what happened. EgoIntrospect captures what the moment meant to the wearer.

External Observer

What happened?

Objective facts anyone can see
  • Objects
    tray, phone, mug
  • Actions
    open, pour, walk
  • Scenes
    kitchen, sidewalk, café
  • Speech / Audio
    "can you grab…", music

One egocentric record pairs first-person observation with wearer-provided meaning across six daily-life scenarios.

Wearer-Provided Meaning
Subjective state only I can report

What did it mean to me?

Affective Experience

What did I feel? What moment mattered?

“I felt surprised because it was so cheap.”

Moment of Recording · Emotion Analysis

Interactive Intent

What did I want the assistant to do? When should it help?

“I need help finding local food nearby.”

Contextual Request · Proactive Recommendation

Cognitive Memory

What do I remember? What should be preserved?

“I want to remember this map for later.”

Memory Recall Prediction · Memory Intent Modeling

1. In-Situ Marker

Participants mark meaningful moments during recording, by voice or by text.

"Hey Assistant, mark this."

2. Retrospective Review

Wearers later review each marked moment alongside its video, audio, gaze, and other signals.

3. Structured Label

Annotators turn responses into structured affective, interactive, and cognitive tags.

Felt · Wanted · Remembered

These labels are the ground truth for benchmarks on user internal states.
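
The three steps above can be sketched as a small data model: a marker made during recording, a retrospective review that attaches the wearer's own words, and a structured label under one of the three dimensions. This is a hedged illustration, not the released annotation schema. All class names, field names, timestamps, and tag strings are assumptions; only the quoted responses come from the examples on this page.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Dimension(Enum):
    AFFECTIVE = "felt"
    INTERACTIVE = "wanted"
    COGNITIVE = "remembered"

@dataclass
class Marker:
    """Step 1: a moment flagged by the wearer during recording."""
    time_s: float   # hypothetical offset into the recording, in seconds
    modality: str   # "voice" or "text"

@dataclass
class Review:
    """Step 2: the wearer's retrospective response for a marked moment."""
    marker: Marker
    response: str

@dataclass
class StructuredLabel:
    """Step 3: an annotator's structured tags for one review."""
    review: Review
    dimension: Dimension
    tags: List[str] = field(default_factory=list)  # hypothetical tag strings

def group_by_dimension(labels):
    """Collect labels under each of the three dimensions."""
    grouped = {d: [] for d in Dimension}
    for label in labels:
        grouped[label.dimension].append(label)
    return grouped

labels = [
    StructuredLabel(
        Review(Marker(12.0, "voice"), "I felt surprised because it was so cheap."),
        Dimension.AFFECTIVE, ["surprise"]),
    StructuredLabel(
        Review(Marker(95.5, "voice"), "I need help finding local food nearby."),
        Dimension.INTERACTIVE, ["recommendation"]),
    StructuredLabel(
        Review(Marker(140.0, "text"), "I want to remember this map for later."),
        Dimension.COGNITIVE, ["remember"]),
]

grouped = group_by_dimension(labels)
for dimension, items in grouped.items():
    print(dimension.value, len(items))
```

Keeping the wearer's free-text response next to the annotator's tags preserves the link between each structured label and the first-person report it was derived from.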

Section 03

The data.

Placeholder for dataset stats, sample visualizations, and collection methodology.

Section 04

The benchmark.

Placeholder for tasks, metrics, and leaderboard preview.

Contributors & Citation

Zeyu Wang1*, Chang Liu1,8*, Eduardus Tjitrahardja1, Yuntao Wang1‡, Borislav Pavlov1, Fangfei Gou1, Jose Manuel Davila1, Dai Shi2, Ran Xu1, Yue Pan3, Jiayi Tan3, Shuting Chang2, Qi Wang2, Jinzhao Li1, Jiacheng Hua1, Yifei Huang4, Jingwei Sun5, Yu Zhang5, Liuxin Zhang5, Guocai Yao6, Jia Jia1, Yin Li7, Qianying Wang5, Yuanchun Shi1, Miao Liu1‡