I know that it can be done :). It’s my direct field of research (localization and mapping for autonomous robots, with a focus on building 3D models from camera images, e.g. NeRF-related methods). What I was trying to say is that you cannot have high safety using just cameras. But I think we agree there :)
I’d be curious to know how they handle environments with a clear lack of depth information (highway roads), how they optimized the processing power (estimating depth is one thing, but building a continuous 3D model is different), and how they deal with image blur when moving at high speed :). Sensor fusion between visual SLAM and LiDAR is not complex (the LiDAR directly measures what you estimate with your neural occupancy grid anyway, so what you get is a more accurate measurement). So on the technological side they don’t really gain much by dropping it; the gain is mainly in cost.
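To make the fusion point concrete, here is a minimal sketch of combining a camera-based depth estimate with a LiDAR range for the same point, assuming both are independent Gaussian measurements. It uses plain inverse-variance weighting, which is a textbook technique and not anything Tesla has described. The function name and all numbers are hypothetical, chosen only for illustration.

```python
def fuse_depth(cam_depth, cam_var, lidar_depth, lidar_var):
    """Inverse-variance weighted fusion of two independent depth estimates.

    Assumption: both inputs measure the same point and have Gaussian noise
    with the given variances (in metres squared).
    """
    w_cam = 1.0 / cam_var
    w_lidar = 1.0 / lidar_var
    fused_depth = (w_cam * cam_depth + w_lidar * lidar_depth) / (w_cam + w_lidar)
    fused_var = 1.0 / (w_cam + w_lidar)
    return fused_depth, fused_var


# Hypothetical values: a noisy camera estimate and a precise LiDAR return.
depth, var = fuse_depth(cam_depth=52.0, cam_var=4.0,
                        lidar_depth=50.0, lidar_var=0.01)
print(f"fused depth = {depth:.3f} m, variance = {var:.5f} m^2")
```

With numbers like these the fused value sits almost on top of the LiDAR reading, which is the sense in which the LiDAR "provides what you estimate anyway", just more accurately.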
My guess is that they probably still do a lot of feature detection (lines and such) in the background, and that a lot of what you experience when you drive is improvement in depth estimation and feature detection on RGB images? But maybe not. I’d be really interested to read more about it :). Do you have the research paper that the Tesla algorithm relies on?
Just to be clear, I have no doubt it works :). I have used similar systems for mobile robots and I don’t see why it would not. But I’m also worried that it will lull people into a false sense of safety when the driver should stay alert.
I don’t have the paper; my info comes mainly from various interviews with people involved in the project. Elon of course, and Andrej Karpathy (he was in charge of their AI program for some time).
They apparently used to do feature detection and object recognition on RGB images, then gave up on that (generating coherent RGB images just adds latency, and object recognition was too inflexible). They’re now going by raw photon count data from the sensor, fed directly into the neural nets that generate the 3D model. Once trained, this apparently can do some insane stuff like pull edge data out from below the noise floor.
This may be of interest– This is also from 2 years ago, before Tesla switched to occupancy networks everywhere. I’d say that’s a pretty good equivalent of a LiDAR scan…
I Googled it because I thought they might be using event cameras, but no: they use 10-bit instead of the classic 8-bit, but they are not literally counting photons (which would not be useful). It’s interesting that it improved the precision and recall of their "object detection model". I guess the image is of better quality then.
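As a rough illustration of why 10 bits per pixel can help compared with 8, here is a small sketch that quantizes a very low-contrast edge at both bit depths. This only shows the quantization effect; it says nothing about how Tesla actually processes frames, and the intensity values are made up for the example.

```python
def quantize(signal, bits):
    """Map normalized intensities in [0, 1] to integer codes at the given bit depth."""
    levels = (1 << bits) - 1  # 255 for 8-bit, 1023 for 10-bit
    return [round(x * levels) for x in signal]


# Hypothetical low-contrast edge: two regions differing by 0.2% of full scale.
dark, bright = 0.1010, 0.1030
row = [dark] * 4 + [bright] * 4

q8 = quantize(row, 8)
q10 = quantize(row, 10)

# The edge survives only if the two regions land on different integer codes.
edge_visible_8bit = len(set(q8)) > 1
edge_visible_10bit = len(set(q10)) > 1
print("8-bit codes: ", q8, "edge visible:", edge_visible_8bit)
print("10-bit codes:", q10, "edge visible:", edge_visible_10bit)
```

In this example both regions collapse onto the same 8-bit code, while the 10-bit codes still differ, so a detector working on the 10-bit data has an edge to find.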
The link from 2 years ago is not particularly impressive: https://arxiv.org/abs/1406.2283 is an equivalent paper, I think, from 2014.
Not sure of the exact details. I heard they were sampling 10 bits per pixel, but a bunch of their release notes talked about photon count detection back when they switched to that system.
Given that the HW3 cameras started out being used just to generate RGB images, I suspect the current iteration works by pulling RAW format frames and interpreting them as a photon count grid, then detecting edges and geometry from there with the occupancy network.
I’ve not seen much of anything published by Tesla on the subject. I suspect they are keeping most of their research hush-hush to get a leg up on the competition. They share everything regarding EV tech because they want to push the industry in that direction, but I think they see FSD as their secret sauce: they might sell hardware kits, but not let others too far under the hood.
It’s an interesting discussion, thanks!
I think you are absolutely correct about the interpretation of the photon count :)