By decomposing the scene into 3D and individual objects, the method learns better depth and ego-motion, especially in very dynamic scenes.
We tested this method on both the KITTI and Cityscapes urban driving datasets and found that it outperforms state-of-the-art approaches and comes close in quality to methods that used stereo pair videos as training supervision. Importantly, we are able to correctly recover the depth of a car moving at the same speed as the ego-motion vehicle. This has been challenging previously: in this case, the moving vehicle appears (in a monocular input) as static, exhibiting the same behavior as the static horizon, resulting in an inferred infinite depth. While stereo inputs can resolve that ambiguity, our approach is the first that is able to infer the depth correctly from a monocular input.
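A toy pinhole-camera sketch may help show why a car matching the camera's speed ends up at infinite depth when the scene is assumed static. This is not the paper's method, only an illustration of the ambiguity. The function names (`project`, `depth_from_static_assumption`), the focal length of 1, and the pure forward motion are all assumptions made for this example.

```python
import math

def project(x, z, focal=1.0):
    """Pinhole projection of a point at lateral offset x and depth z."""
    return focal * x / z

def depth_from_static_assumption(u_before, u_after, camera_forward):
    """Depth (in the first frame) implied by an image shift, assuming the
    point is static and the camera moved forward by camera_forward.

    From u_before = f*x/z and u_after = f*x/(z - t) it follows that
    z = t * u_after / (u_after - u_before).
    """
    shift = u_after - u_before
    if shift == 0.0:
        # No apparent motion: the only static explanation is a point
        # infinitely far away, like the horizon.
        return math.inf
    return camera_forward * u_after / shift

camera_forward = 1.0   # hypothetical forward motion between two frames
x, z = 2.0, 10.0       # hypothetical car position relative to the camera

u0 = project(x, z)

# Parked car: the camera closes the distance, so the projection shifts.
u1_static = project(x, z - camera_forward)

# Car driving at the same speed: its relative position does not change.
u1_moving = project(x, z)

static_depth = depth_from_static_assumption(u0, u1_static, camera_forward)
moving_depth = depth_from_static_assumption(u0, u1_moving, camera_forward)

print("parked car depth:", static_depth)
print("same-speed car depth:", moving_depth)
```

The parked car is recovered at roughly its true depth, while the same-speed car produces no image shift and is pushed to infinity. Modeling the car's own motion separately, as the quoted description says the method does, is what removes this ambiguity.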
I saw this earlier, but you beat me to posting it. I would love to do some work with it. Here's a link to a YouTube video using TensorFlow and mono depth. Check out his channel; I think he has a couple of others, plus links to the code. Great stuff!
That’s awesome! Are you an engineer?
I went through the Udacity program and I am very passionate about the field, but I have not made the career change yet. I feel I still have a lot to learn and a lot I want to explore.
That’s great! I wish I had your technical ability/inclination. I tried taking a computer science class at McGill and I just couldn’t pay attention. I feel like I am much more of a theoretician than an engineer.