What if your robot could understand any object you describe, using only a phone camera? RADIO-ViPE builds a 3D map from raw monocular video, and you can query that map in natural language. (1/4) ...