Visualize Thread by @OfficialNathanY

Nathan Yan

@OfficialNathanY

What if your robot could understand any object you describe, just from a phone camera?

RADIO-ViPE builds a 3D map from raw monocular video that you can query with natural language.
(1/4)
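
A rough sketch of what "query with natural language" could look like in practice, assuming the map stores one embedding per 3D point and the text prompt is embedded into the same space. The names `query_map` and `toy_map` and the 3-dimensional vectors are made up for illustration; the paper's actual query pipeline may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def query_map(points, text_embedding, top_k=1):
    """Rank map points by similarity to a text embedding.

    points: list of (xyz, embedding) pairs (hypothetical map layout).
    Returns the top_k (score, xyz) pairs, best first.
    """
    scored = [(cosine(emb, text_embedding), xyz) for xyz, emb in points]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]

# Toy map: three 3D points with made-up 3-dimensional embeddings.
toy_map = [
    ((0.0, 0.0, 1.0), [1.0, 0.0, 0.0]),
    ((2.0, 0.5, 1.5), [0.0, 1.0, 0.0]),
    ((1.0, 1.0, 0.5), [0.7, 0.7, 0.0]),
]
mug_query = [0.0, 1.0, 0.1]  # stand-in for an embedded text prompt
best_score, best_xyz = query_map(toy_map, mug_query)[0]
print(best_xyz)
```

In a real system the text embedding would come from a language-aligned encoder rather than a hand-written list.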

How it works:
A foundation model (RADIO) extracts a dense "meaning vector" for every pixel. The authors then reuse those same vectors in three ways: improving optical flow on textureless surfaces, adding a semantic loss to the geometry optimizer, and connecting semantically similar keyframes in the factor graph.
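
To make the third use concrete, here is a minimal sketch of linking keyframes by semantic similarity. It assumes each keyframe is summarized by mean-pooling its per-pixel vectors and that an edge is added when two non-adjacent keyframes exceed a cosine threshold. The pooling choice, `threshold`, and `min_gap` are illustrative assumptions, not details from the paper.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean_pool(feature_map):
    """Average a list of per-pixel vectors into one keyframe descriptor."""
    n, dim = len(feature_map), len(feature_map[0])
    return [sum(v[d] for v in feature_map) / n for d in range(dim)]

def semantic_edges(keyframes, threshold=0.9, min_gap=2):
    """Propose factor-graph edges between semantically similar keyframes.

    keyframes: list of feature maps, each a list of per-pixel vectors.
    min_gap skips temporal neighbors, which are usually connected already.
    """
    descriptors = [mean_pool(kf) for kf in keyframes]
    edges = []
    for i, j in combinations(range(len(descriptors)), 2):
        if j - i >= min_gap and cosine(descriptors[i], descriptors[j]) >= threshold:
            edges.append((i, j))
    return edges

# Toy sequence: frames 0 and 3 look at the same thing, 1 and 2 at something else.
toy_keyframes = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
print(semantic_edges(toy_keyframes))
```

The extra edge between frames 0 and 3 acts like a loop-closure hint: the optimizer gets a constraint between views that are far apart in time but show similar content.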

Instead of bolting semantics on after SLAM (like ConceptGraphs/HOV-SG), the semantic similarity error lives inside the bundle adjustment loss function. When the optimizer adjusts a camera pose, it’s satisfying geometric AND semantic consistency in the same gradient step.
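
A toy illustration of the "same gradient step" idea, assuming a 1D camera offset `t`, a squared reprojection error as the geometric term, and one minus cosine similarity as the semantic term. The feature field `feature_at`, the weight `lam`, and the finite-difference gradient are all simplifications chosen for this sketch, not the paper's formulation.

```python
import math

def feature_at(u):
    """Made-up unit-norm 'semantic feature' at 1D image coordinate u."""
    return (math.cos(u), math.sin(u))

def joint_cost(t, points, observations, ref_features, lam=0.5):
    """Geometric plus semantic cost for a 1D camera offset t."""
    total = 0.0
    for x, u_obs, f_ref in zip(points, observations, ref_features):
        u_pred = x - t                        # toy projection model
        geometric = (u_pred - u_obs) ** 2     # reprojection error
        f_pred = feature_at(u_pred)
        semantic = 1.0 - (f_pred[0] * f_ref[0] + f_pred[1] * f_ref[1])
        total += geometric + lam * semantic   # both terms in one objective
    return total

def gradient_step(t, args, lr=0.1, eps=1e-6):
    """One descent step on the joint cost using a central difference."""
    grad = (joint_cost(t + eps, *args) - joint_cost(t - eps, *args)) / (2 * eps)
    return t - lr * grad

# Synthetic data generated with a true offset of 0.5.
true_t = 0.5
points = [1.0, 2.0, 3.0]
observations = [x - true_t for x in points]
ref_features = [feature_at(u) for u in observations]
args = (points, observations, ref_features)

t = 0.0
for _ in range(50):
    t = gradient_step(t, args)
print(round(t, 3))
```

Because both terms sit in one objective, each update to `t` has to reduce geometric and semantic error together rather than fixing geometry first and attaching semantics afterwards.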

For dynamic scenes (people walking, furniture getting moved), they track how stable each pixel’s semantic embedding is over time. If the embedding stays consistently similar across views, it is trusted. Otherwise, it is suppressed.
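
One simple way to picture the stability check, assuming each tracked pixel has a list of embeddings from successive views and that stability is scored as the mean cosine similarity between consecutive views. The hard `threshold` and the function name `stability_weight` are assumptions for this sketch; the paper may use a softer or differently defined weighting.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def stability_weight(track, threshold=0.8):
    """Weight for one tracked pixel based on how stable its embedding is.

    track: embeddings of the same pixel across consecutive views.
    Returns the mean consecutive similarity if it clears the threshold,
    otherwise 0.0 so the pixel is suppressed.
    """
    if len(track) < 2:
        return 0.0  # not enough views to judge stability
    sims = [cosine(a, b) for a, b in zip(track, track[1:])]
    mean_sim = sum(sims) / len(sims)
    return mean_sim if mean_sim >= threshold else 0.0

# A static wall keeps nearly the same embedding; a pixel that a person
# walks through flips between very different embeddings.
wall_track = [[1.0, 0.0], [0.98, 0.05], [0.99, 0.02]]
person_track = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(stability_weight(wall_track) > 0.9, stability_weight(person_track))
```

A weight like this could multiply that pixel's residual in the optimizer, so unstable pixels contribute little or nothing.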

(Day 8 of reading interesting VLA/world-model papers)

arxiv.org/pdf/2604.26067

Nathan Yan

@OfficialNathanY

Could world models actually help self-driving cars?

WorldVLM bridges a VLM (the brain) and a physics-aware world model (the body) to tackle this problem.
