When I was a kid, I used to play a game with my dad on road trips. I would point out the most far-off vehicle on the freeway that I could make out and my dad would try to identify it as far away as he could. My father has a nearly encyclopedic knowledge of cars, particularly older ones, and as a kid I was always amazed by his ability to identify the make, model, and year from surprising distances and be rattling off engine displacement options by the time we passed it (he also liked to drive fast back then).
I've been thinking about this every time someone repeats the trope that Elon Musk and Andrej Karpathy made into a slide at Tesla's "Autonomy Day" presentation earlier this week: "PEOPLE DRIVE VISION ONLY." The argument, which Tesla uses to defend its vision-only approach to autonomous driving, is that humans can drive using input from just two optical sensors, so why do autonomous cars need expensive sensors like lidar?
This is a surprising argument to hear from an autonomous vehicle developer, given that the human eye is not only technically more capable than any camera (at least any costing less than $35 million), it's also miraculously embedded into a human brain that is the holy grail of bioengineering. Of course Tesla has to say that it can provide full autonomy almost entirely with vision (with a little help from a forward radar unit designed for adaptive cruise control), since it's already taken customer deposits for precisely that. What's odd is how many people are willing to believe that neural net technology and a processor that performs roughly 30% better than a top-end computer graphics card (Tesla claims 144 trillion deep learning operations per second (TOPS) compared to 110 TOPS from a $3,000 NVIDIA Titan V) can match the miracle of evolution that is the human vision and cognition system.
Watching videos of Tesla's image recognition system posted to YouTube by Autopilot hacker whiz greentheonly, I immediately noticed that it would be miserable at the game my dad and I played when I was a kid. The system only has a few categories with which to label vehicles—car, minivan, light truck, truck, bus—but it still struggles to be certain about what it's seeing. Its labels jump from category to category and come and go as vehicles move from one camera's field of vision to another. Sometimes it even categorizes the same vehicle two different ways depending on which camera it's seeing it from, as in the image above [Ed: in many of the images in this post, the system also accurately classified the misidentified vehicles].
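Tesla hasn't published how Autopilot's classifier actually works, but here's a toy sketch of why independent per-frame classification flickers like this: when two classes score within a hair of each other, taking the argmax fresh every frame lets the label flip back and forth, while even a crude running average over recent frames settles on one answer. The class list matches the one greentheonly's videos show; the scores are invented for illustration.

```python
# Toy illustration (NOT Tesla's actual system): per-frame argmax vs.
# temporal smoothing for a borderline "car"/"minivan" detection.
CLASSES = ["car", "minivan", "light truck", "truck", "bus"]

# Hypothetical softmax scores for one crossover-ish vehicle over four
# frames: "car" and "minivan" trade the lead by tiny margins.
frames = [
    [0.48, 0.46, 0.04, 0.01, 0.01],
    [0.45, 0.49, 0.04, 0.01, 0.01],
    [0.47, 0.47, 0.04, 0.01, 0.01],  # dead heat; argmax breaks the tie by order
    [0.44, 0.50, 0.04, 0.01, 0.01],
]

def per_frame_labels(frames):
    # Classify each frame independently: the label flickers.
    return [CLASSES[max(range(len(CLASSES)), key=scores.__getitem__)]
            for scores in frames]

def smoothed_label(frames):
    # Average the scores across frames first, then take one argmax.
    avg = [sum(f[i] for f in frames) / len(frames) for i in range(len(CLASSES))]
    return CLASSES[max(range(len(CLASSES)), key=avg.__getitem__)]

print(per_frame_labels(frames))  # ['car', 'minivan', 'car', 'minivan']
print(smoothed_label(frames))    # 'minivan'
```

This is also a plausible reading of why a vehicle's label changes as it crosses from one camera's field of view to another: each camera's detection is effectively a fresh, independent guess.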
To be perfectly clear, categorizing vehicles is not necessarily a core safety function of the system or even an important one. As such, we are not commenting on the safety of Autopilot or the quality of its image recognition system per se. But there's also no denying one obvious fact about Autopilot: it wants to think that almost everything on the road is a minivan.
One of the issues appears to be that Autopilot is trained with a dataset that classifies minivans but not crossovers. Given the large volume of crossovers on the road, and their visual similarity to minivans, it's not surprising that the yellow minivan label pops up a lot in Tesla's image recognition system. This is pretty hilarious, given that people tend to buy crossovers specifically to avoid the minivan stigma that Autopilot then slaps their BMW X3 or whatever with. Still, it would be a lot less funny if Tesla simply changed its "minivan" category label to "crossover," considering the latter vehicle class doesn't yet have the same potential for humor.
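The mechanism is simple enough to sketch. If the annotation scheme used to build the training set has no "crossover" class, every crossover in the data must be filed under whatever existing class looks closest, and the model can only ever answer with labels it was taught. This mapping is assumed for illustration, not taken from Tesla's actual pipeline.

```python
# Toy sketch (assumed, not Tesla's training pipeline): a coarse label
# set with no "crossover" class forces crossovers into "minivan".
COARSE_LABELS = {"car", "minivan", "light truck", "truck", "bus"}

# Hypothetical mapping from fine-grained body styles to coarse classes.
FINE_TO_COARSE = {
    "sedan": "car",
    "hatchback": "car",
    "crossover": "minivan",   # no "crossover" class exists, so it lands here
    "minivan": "minivan",
    "pickup": "light truck",
}

# Sanity check: every fine-grained style maps into the coarse label set.
assert set(FINE_TO_COARSE.values()) <= COARSE_LABELS

def training_label(body_style):
    # The label the model would be trained on for this body style.
    return FINE_TO_COARSE[body_style]

print(training_label("crossover"))  # 'minivan'
```

A model trained this way would, quite reasonably from its own point of view, report a lot of "minivans" it never really saw.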
Besides, don't a lot of new cars kind of look like minivans anyway? It's certainly hard to blame Autopilot for labeling this Hyundai Elantra wagon as a minivan, since its shape is extremely crossover- or MPV-like. But wait... what is the alleged "minivan" at the far right-hand side of this frame?
Yes, that is a Jeep Wrangler that Autopilot classified as both a minivan and a car. I mean, sure, the owner probably uses it more like a car or a minivan than anything else, but couldn't you at least throw it a bone and classify it as a light truck? The owner did not shell out good money and endure FCA build quality to have it referred to as a car, let alone a minivan.
It seemed for a while like Autopilot was differentiating between cars and "minivans" based on size, but when I saw it classify a Mini Cooper as a minivan I knew that couldn't be right. Maybe it's proportions? Possibly anything with a "two box" design instead of a "three box" sedan shape?
Well, that theory was fun while it lasted. Here's a sweet single-cab pickup truck that Autopilot thinks is a minivan. It's about as far from a two-box shape as it gets, and this is a pretty clear shot of it. OK, so it's not the size or the shape... maybe it is proportions after all?
OK, that is just obviously a sedan and not a minivan. It's right next to the camera, the trunk is right there, the proportions aren't CUV-like at all. What the heck?
OK, that's it... we give up. If Autopilot thinks a Ford Mustang is a "minivan" we're not even going to keep trying to figure out what kind of logic it's using. One thing is clear: as far as Tesla's image recognition is concerned, literally every car on the road could possibly be a minivan. Or maybe there's a minivan inside every car on the road, and only Autopilot can see it?
We will puzzle over the troubling results of this extremely important investigation over the weekend, but in the meantime we hope this starts to illustrate how incredible humans are at what automated driving systems are trying to do. The kind of classification that Autopilot is attempting here is so trivially easy for humans that we don't even consciously acknowledge that this is a minivan and that is a crossover, let alone that this is a Mustang and that is a BMW X3. Unless, of course, you're an enormous car nerd like my dad and me, and that's what you do for fun while you're driving.