
@OfficialNathanY
May 03, 2026
1
What if your robot could understand any object you describe, just from a phone camera?

RADIO-ViPE builds a 3D map from raw monocular video that you can query with natural language.
(1/4)
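A minimal sketch of what "query with natural language" could look like. It assumes each map point stores a RADIO-style feature vector and that some text encoder (not shown, hypothetical here) embeds the query into the same space. The function names, the 0.5 threshold and the toy 3-D vectors are illustrative assumptions, not the paper's API.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_map(
    point_features: Dict[int, List[float]],
    text_embedding: Sequence[float],
    threshold: float = 0.5,
) -> List[int]:
    """Return ids of map points matching the text query, most similar first.

    Hypothetical helper: assumes text embeddings and per-point features share
    one embedding space (an assumption, not a detail from the thread).
    """
    scored = [(pid, cosine(feat, text_embedding)) for pid, feat in point_features.items()]
    hits = [(pid, score) for pid, score in scored if score >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [pid for pid, _ in hits]


# Toy map: points 0 and 1 share roughly one "semantic direction", point 2 does not.
toy_map = {0: [1.0, 0.0, 0.0], 1: [0.9, 0.1, 0.0], 2: [0.0, 1.0, 0.0]}
print(query_map(toy_map, [1.0, 0.0, 0.0]))  # [0, 1]
```

Ranking by cosine similarity keeps the query step independent of how the map was built; any mapper that attaches a feature vector to each point could be queried this way.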
2
How it works:
A foundation model (RADIO) extracts dense per-pixel "meaning vectors" (semantic embeddings), and the authors reuse those same vectors in three ways: improving optical flow on blank, textureless surfaces; adding a semantic loss to the geometry optimizer; and connecting semantically similar keyframes in the factor graph.
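The third use, connecting similar keyframes, can be pictured as a place-recognition step. The sketch below is an illustrative assumption, not the paper's method: it mean-pools each keyframe's per-pixel embeddings into one descriptor and proposes an extra factor-graph edge between non-adjacent keyframes whose descriptors have cosine similarity above a hypothetical `threshold`.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pool(pixel_features: Sequence[Vector]) -> Vector:
    """Average one keyframe's per-pixel embeddings into a single descriptor."""
    count = len(pixel_features)
    dim = len(pixel_features[0])
    return [sum(feat[d] for feat in pixel_features) / count for d in range(dim)]


def semantic_edges(
    keyframes: Sequence[Sequence[Vector]],
    threshold: float = 0.9,
    min_gap: int = 2,
) -> Set[Tuple[int, int]]:
    """Propose (i, j) keyframe pairs to link in the factor graph.

    Pairs closer than `min_gap` in time are skipped, assuming neighbouring
    keyframes are already linked by ordinary sequential edges.
    """
    descriptors = [mean_pool(kf) for kf in keyframes]
    edges: Set[Tuple[int, int]] = set()
    for i, j in combinations(range(len(descriptors)), 2):
        if j - i < min_gap:
            continue
        if cosine(descriptors[i], descriptors[j]) >= threshold:
            edges.add((i, j))
    return edges


# Keyframes 0 and 3 revisit the same kind of place; 1 and 2 are adjacent, so no extra edge.
keyframes = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
]
print(semantic_edges(keyframes))  # {(0, 3)}
```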
3
Instead of bolting semantics on after SLAM (as ConceptGraphs and HOV-SG do), RADIO-ViPE puts the semantic similarity error inside the bundle adjustment loss function. When the optimizer adjusts a camera pose, it satisfies geometric AND semantic consistency in the same gradient step.
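A toy illustration of putting both terms in one objective, under heavy simplifying assumptions: the "pose" is a single 1-D shift `t`, `embedding_at` is a made-up unit-vector feature field, the weight `lam` is hypothetical, and plain finite-difference gradient descent stands in for the real bundle-adjustment solver. The point is only that every update to `t` sees the geometric and the semantic residuals at once.

```python
import math
from typing import List, Sequence, Tuple


def embedding_at(x: float) -> List[float]:
    """Hypothetical unit-norm 'semantic' feature at target-image coordinate x."""
    return [math.cos(x), math.sin(x)]


def geometric_cost(t: float, matches: Sequence[Tuple[float, float]]) -> float:
    """Squared reprojection error: source pixel u_src, shifted by t, should land on u_tgt."""
    return sum((u_src + t - u_tgt) ** 2 for u_src, u_tgt in matches)


def semantic_cost(t: float, src_pixels: Sequence[float], src_features: Sequence[Sequence[float]]) -> float:
    """Sum of (1 - cosine similarity) between each source feature and the target feature where it lands."""
    total = 0.0
    for u, feat in zip(src_pixels, src_features):
        target = embedding_at(u + t)
        # Both vectors are unit length here, so the dot product equals the cosine similarity.
        total += 1.0 - sum(a * b for a, b in zip(feat, target))
    return total


def optimize_shift(matches, src_pixels, src_features, lam, t0=0.0, lr=0.05, steps=500, eps=1e-5) -> float:
    """Gradient descent on geometric_cost + lam * semantic_cost."""
    def joint(t: float) -> float:
        return geometric_cost(t, matches) + lam * semantic_cost(t, src_pixels, src_features)

    t = t0
    for _ in range(steps):
        # Central finite difference: a single derivative that includes both terms.
        grad = (joint(t + eps) - joint(t - eps)) / (2 * eps)
        t -= lr * grad
    return t


pixels = [0.0, 1.0, 2.0]
features = [embedding_at(u + 0.5) for u in pixels]  # semantics agree with a shift of 0.5
matches = [(u, u + 0.3) for u in pixels]            # biased matches suggest a shift of 0.3
print(round(optimize_shift(matches, pixels, features, lam=0.0), 3))  # 0.3
print(round(optimize_shift(matches, pixels, features, lam=2.0), 3))  # 0.4
```

With `lam=0` the estimate follows the matches alone; with the semantic term switched on, the same update rule is pulled toward the shift the features agree with.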
4
For dynamic scenes (people walking, furniture being moved), they track how stable each pixel's semantic embedding is over time. If an embedding stays consistently similar across views, it is trusted; otherwise, it is suppressed.
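One simple way to turn "consistently similar across views" into numbers, offered as an assumption rather than the paper's exact rule: follow a pixel's embedding through the frames where it is observed, score it by its worst cosine similarity to the first observation, and gate it with a hypothetical `threshold`. The pixel tracks (correspondences) are assumed to be given, e.g. by the optical flow step.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def stability_score(track: Sequence[Vector]) -> float:
    """Worst-case cosine similarity between a pixel's first observation and every later one.

    A single observation has nothing to disagree with, so it scores 1.0.
    """
    reference = track[0]
    later = track[1:]
    if not later:
        return 1.0
    return min(cosine(reference, feat) for feat in later)


def stability_weights(tracks: Sequence[Sequence[Vector]], threshold: float = 0.8) -> List[float]:
    """1.0 = trust the pixel in the optimizer, 0.0 = suppress it (hard gating is an assumption)."""
    return [1.0 if stability_score(track) >= threshold else 0.0 for track in tracks]


static_wall = [[1.0, 0.0], [1.0, 0.0], [0.99, 0.01]]
passing_person = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # embedding flips when someone walks through
print(stability_weights([static_wall, passing_person]))  # [1.0, 0.0]
```

A soft weight (for example, the score itself clipped to [0, 1]) would be an equally plausible design; the hard gate simply mirrors the trust/suppress wording above.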
5
(Day 8 of reading interesting VLA/world-model papers)

arxiv.org/pdf/2604.26067
6
Could world models actually help self-driving cars?

WorldVLM tries to solve this by bridging a VLM (the brain) and a physics-aware world model (the body).