ICRA 2023 - Scalable Autonomous Driving Workshop (Day 5)

(Work in progress)

Toyota

Foundation models == Zero-shot generalization

Pre-trained models are great out of the box, but we should leverage geometry more. Mathematical formulations in geometry (e.g. epipolar geometry) are essentially the best pre-trained models!
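
To make the "geometry as a pre-trained model" point concrete, here is a minimal sketch of the epipolar constraint x2^T E x1 = 0, which holds for any scene with no training at all. The relative pose (R, t) and the 3D point are made-up values for illustration and do not come from the talk.

```python
import math

def cross_matrix(t):
    """Skew-symmetric matrix [t]_x, so that [t]_x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    """3x3 matrix times 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Hypothetical relative pose: a point X1 in camera 1 maps to X2 = R X1 + t in camera 2.
angle = math.radians(10.0)
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]  # rotation about the y axis
t = [0.5, 0.0, 0.1]

# Essential matrix E = [t]_x R.
E = matmul(cross_matrix(t), R)

# An arbitrary 3D point in front of both cameras (assumed value).
X1 = [0.3, -0.2, 4.0]
X2 = [a + b for a, b in zip(matvec(R, X1), t)]

# Normalized image coordinates (divide by depth).
x1 = [c / X1[2] for c in X1]
x2 = [c / X2[2] for c in X2]

# The epipolar constraint: this residual is zero up to floating-point error.
residual = dot(x2, matvec(E, x1))
print(f"epipolar residual: {residual:.2e}")
```

The constraint depends only on the camera geometry, not on what the scene contains, which is one way to read the claim that such formulations generalize zero-shot.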

Geometric foundation models? (== Data + Principles)

 

Fisher Yu - Scalable 4D Scene Understanding with Less Hassle

  • Scaling with Less hassle
    • Learn and benefit from large-scale data
    • Avoid manual designs / manual labeling
  • Representation Learning in 4D world

    • Learning robust 2D tracking
    • Learning high-fidelity instance-level representation
    • Learning 3D motion
  • Current tracking algorithms

    • Object detection -> Motion estimation -> Initial Association (Location, Appearance) -> Association Optimization
    • The pipeline is very complicated and also fragile.
    • Do humans go through all these steps?
      • Humans only need to do ‘Object detection -> Appearance association’
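
As a rough illustration of the simplified "Object detection -> Appearance association" idea, the sketch below matches detections across two frames using only appearance embeddings. It assumes each detection already comes with a non-zero embedding vector from some detector. The greedy cosine-similarity matching, the `associate` function, and the `min_sim` threshold are hypothetical stand-ins, not the method presented in the talk.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)

def associate(prev_embs, curr_embs, min_sim=0.5):
    """Greedily match current detections to previous ones by appearance only.

    Returns a dict mapping current-detection index -> previous-detection index.
    Current detections left out of the dict would start new tracks.
    """
    # Score every (previous, current) pair, best similarity first.
    pairs = sorted(
        ((cosine(p, c), i, j)
         for i, p in enumerate(prev_embs)
         for j, c in enumerate(curr_embs)),
        reverse=True,
    )
    matches = {}
    used_prev = set()
    for sim, i, j in pairs:
        if sim < min_sim:
            break  # all remaining pairs are even less similar
        if i in used_prev or j in matches:
            continue  # each detection is matched at most once
        matches[j] = i
        used_prev.add(i)
    return matches

# Toy embeddings (made-up values): two tracked objects, three new detections.
prev_embs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0]]
curr_embs = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.1], [0.0, 0.0, -1.0]]

matches = associate(prev_embs, curr_embs)
print(matches)
```

Here the third current detection resembles neither previous object, so it stays unmatched. Note that there is no motion model and no location-based gating, which is the step the notes suggest humans skip.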