You Should Be Worried About Tesla’s Trove of Private Vehicle Data
Experts believe that anonymized data could still be pieced back together like a high-tech puzzle.
Connected vehicles aren't coming, they're already here. Many new vehicles on the road have some form of cellular connectivity, though the capabilities are often limited to basic telematics or remote start functions. That's starting to change as automakers are adding more advanced functionality in the name of convenience—and data harvesting.
Tesla is one of the many automakers that rely on fleet-sourced data to improve their products. This means collecting a trove of information from every vehicle on the road, including location data and video clips, to help build out its autonomous vehicle software that CEO Elon Musk continues to promise will flourish into a full-fledged self-driving solution soon (-ish). Behind the scenes, the cogs are churning through petabytes of data that contain waypoints and other information about drivers' everyday lives, something that privacy experts say should be a major concern.
Connected vehicles generate a significant amount of information useful for manufacturers. Some, like Tesla, use this information to train future fleet vehicles by comparing a vehicle's expected behavior to a human's actual actions. The vehicle then compiles this data (which may include sensor readings, camera footage, and potentially location information) into a package to send back to Tesla's servers. It's unclear how much data Tesla vehicles ultimately generate, though it likely isn't a small amount. Waymo vehicles, for example, generated more than 750 megabytes of data every second back in 2013. While compression and data efficiency have improved in the past decade, the number of onboard sensors has increased, meaning that more data points may be taken into account.
Now, Tesla says that the data it collects is anonymized. Analysis of the data by researchers shows that trip logs, for example, are stripped of the sending vehicle's VIN and are instead assigned a random ID number. That random ID can still persist for weeks, meaning that all uploads from one particular vehicle could, in theory, be linked together with enough time and effort. In fact, some, like John Verdi of the Future of Privacy Forum, believe that it is "extraordinarily difficult to de-identify" massive amounts of historical data once it is individualized with chronicled location information like a person's home, workplace, and habits.
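To see why a persistent random ID matters, consider a minimal sketch of how trips that share one ID could be grouped to guess where a driver lives. Everything here is hypothetical: the record layout, the sample coordinates, and the `likely_home` helper are assumptions for illustration, not Tesla's actual log format or any researcher's method.

```python
from collections import Counter, defaultdict

# Hypothetical trip logs: (pseudonymous_id, hour_of_arrival, end_lat, end_lon).
# No VIN or name appears; the field layout is an assumption for illustration.
trips = [
    ("a91f", 18, 40.7131, -74.0062),
    ("a91f", 9, 40.7580, -73.9855),
    ("a91f", 19, 40.7129, -74.0058),
    ("a91f", 8, 40.7582, -73.9851),
    ("a91f", 22, 40.7130, -74.0061),
    ("c07b", 17, 34.0522, -118.2437),
]


def likely_home(trips, precision=3):
    """Guess a 'home' grid cell per ID: its most common evening or night destination."""
    evening_stops = defaultdict(Counter)
    for pid, hour, lat, lon in trips:
        # Treat arrivals from 5 p.m. to 5 a.m. as candidate trips home.
        if hour >= 17 or hour < 5:
            # Rounding to 3 decimals buckets stops into cells roughly 100 m across.
            cell = (round(lat, precision), round(lon, precision))
            evening_stops[pid][cell] += 1
    return {pid: cells.most_common(1)[0][0] for pid, cells in evening_stops.items()}


homes = likely_home(trips)
print(homes)
```

Even this crude heuristic narrows a nameless ID down to a single block after a handful of evening trips, and the same grouping applied to morning arrivals would point to a workplace.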
Even when Tesla vehicles aren't operating in the partially automated Autopilot mode, they run something called "Shadow Mode" that simulates the driving process as if Autopilot were engaged. When a deviation occurs between the car's expected behavior and the driver's actions, the vehicle creates a snapshot of the data (including location, camera data, vehicle speed, and various other sensor data) and pumps it back to the mothership for review. It's one of the many ways that Tesla is able to gain abundant data to train its neural network.
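The deviation check described above can be sketched in a few lines. This is only an illustration of the general idea: the `Frame` fields, the steering-angle comparison, and the 10-degree threshold are all assumptions, since Tesla has not published how Shadow Mode decides when to capture a snapshot.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One hypothetical sample of planned vs. actual driving input."""
    t: float                  # seconds since the start of the trip
    planned_steer_deg: float  # what the software would have done
    driver_steer_deg: float   # what the human actually did
    speed_mph: float


# Assumed trigger threshold; the real criteria are not public.
STEER_THRESHOLD_DEG = 10.0


def find_snapshots(frames, threshold=STEER_THRESHOLD_DEG):
    """Return a snapshot record for each frame where driver and software disagree."""
    snapshots = []
    for f in frames:
        deviation = abs(f.planned_steer_deg - f.driver_steer_deg)
        if deviation > threshold:
            snapshots.append(
                {"t": f.t, "deviation_deg": deviation, "speed_mph": f.speed_mph}
            )
    return snapshots


frames = [
    Frame(0.0, 1.0, 0.5, 42.0),
    Frame(0.1, 1.5, 14.0, 41.0),  # driver swerves; software planned to go straight
    Frame(0.2, 2.0, 2.5, 40.0),
]
snaps = find_snapshots(frames)
print(snaps)
```

In this toy run only the middle frame crosses the threshold, so only that moment would be packaged for upload. A real snapshot would presumably also carry the location and camera data the article mentions.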
Tesla vehicles also capture a snapshot when a vehicle crashes. In fact, security researchers have been able to extract this captured data, including video footage of collisions, from vehicles sold at auction or parted out online after being purchased by a salvage company. The automaker has also historically released specific vehicle data to news outlets when it benefited the company to do so, like when a protester claimed her brakes didn't work in 2021 or after a deadly crash in which a Model S hit a tree in Houston.
Turning off Tesla's access to your data is as easy as flipping a virtual switch. However, it comes at a price. By revoking your consent to upload your trip data, you also lose other connected convenience features that rely on internet connectivity like web browsing, internet radio, voice commands, and over-the-air updates. Drivers must decide: is it worth the trade?
Today, this problem is uniquely Tesla's, though that could change at any moment. Musk said last week that he expects that all cars will eventually have some form of self-driving capability. Given that vehicles are more connected now than ever, it may be fair to assume that all vehicles will also eventually have the ability to share data with their respective parent companies over the air. What might data privacy look like five or 10 years from now?
The short answer is: nobody knows. Innovation by automakers is outpacing legislation, meaning that consumers may not have protection for the data collected by automakers. The data could be sold, leased, or brokered for any number of reasons. After all, in-car software alone is worth billions of dollars, so why not personally identifiable data? Regulating the scope of the data that automakers can collect, the retention period of the data, and precisely how (if at all) that data can be monetized is key to defining a consumer's expectation of privacy, but the law must catch up before that can begin to happen.