The race to build fully autonomous cars has gone into hyper-drive, with major car-makers such as GM, Daimler, BMW and Audi promising SAE Level 5 autonomous driving at some point in 2021.

Goldman Sachs predicts that robo-taxis will grow the ride-hailing and sharing business from $5 billion in revenue today to $285 billion by 2030. Autonomous driving will redefine mobility, and long-established revenue streams are sure to be upended.

Even with all the road testing they are doing, the only way the car-makers can meet their ambitious goals is by leveraging the power of analytics and artificial intelligence (AI) to learn from real-world roads and to accelerate development using simulations. To make this happen, they are using simulation techniques such as hardware-in-the-loop (HIL) and software-in-the-loop (SIL).

This reliance on AI and analytics means that data engineering, management and storage will become more important than ever. Here’s a checklist of what the car-makers have been learning during road tests and the steps required to manage the data that will make autonomous vehicles a reality:

  1. Meet the data challenge. Running the tests for autonomous cars generates a great deal of data, something like 6 to 8 gigabytes every second for each car. In 2017 alone, the industry created some 250 exabytes of data. Leading-edge concepts are needed to handle such volumes and gain value from them. The most effective approach prioritises analysis according to the value of the data, which requires deep knowledge of both the automotive industry and the data itself. Because the data is inherently a time series, it lends itself to this data-value approach, which is essentially a way for car-makers to analyse the data and then derive business value from it.
  2. Bridge the worlds of automotive research and development (R&D) and computer and data science. Auto manufacturers build cars and are less familiar with data science. Bridging the disciplines would expose them to innovative technologies from the data science world that could help them contend with the data generated by R&D cars, such as globally distributed data lakes that store raw data until it is needed. While R&D departments have data science skills in focused teams, the car-makers often need assistance to leverage the capabilities of data science and AI throughout the lifetime of their data so they can meet their goals.
  3. Make the data usable and connect it. Autonomous driving technologies such as light detection and ranging (LiDAR), surround cameras and radar generate a lot of specialised data in formats such as the Automotive Data and Time-Triggered Framework (ADTF), ROSbag, and Measurement Data Format version 4 (MDF4). Tools are now available that natively access these automotive formats in a distributed and parallel way at petabyte scale. By using big data analytics, it’s possible to take a ROSbag file of several hundred gigabytes that, in the past, would have taken several days to analyse, and get the results into the hands of engineers in a matter of minutes or seconds. Fast access and analysis at the sensor level is desirable, but only the connection to the autonomous driving model, based on the metadata of the topics/streams, brings complete value. Once that connection is made, the data can be understood, shared and used.
  4. Reduce the data volume by being selective. Engineers can use AI to determine which elements of the data are valuable and which must be placed in cold or frozen storage. For example, a typical test generates 30 frames of video per second, much of it taken in an open road situation where nothing much happens. The engineers don’t want hours of open road video in which the car was basically on cruise control. The more valuable data gives them insights into how the autonomous vehicle behaved when it came to a crossing or had some interaction with the environment. The superfluous data then gets moved to cold storage supported by a volume topology architecture.
  5. Optimise autonomous data. If, during a human-controlled drive, the autonomous shadow driver makes decisions different from those of the human, these events should be recorded. Similarly, when semi-autonomous cars are corrected by human drivers, this must be noted so engineers can make a correction. The goal is to create a culture of continuous improvement in which autonomous cars reach the point where the general public will accept them. And once the industry reaches that point, the culture of continuous improvement carries on indefinitely.
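
To put the figures in step 1 into perspective, here is a rough back-of-the-envelope calculation in Python. The 6 to 8 gigabytes per second comes from the checklist above; the eight-hour test day, the decimal unit conversion and the function name are assumptions made purely for illustration.

```python
# Back-of-the-envelope data volume for a single test car.
# The 6-8 GB/s rate comes from the article; everything else is an assumption.

GB_PER_TB = 1000  # decimal units, assumed for simplicity

ASSUMED_TEST_DAY_HOURS = 8  # hypothetical length of one day of road testing


def terabytes_recorded(rate_gb_per_s: float, hours: float) -> float:
    """Return terabytes recorded at a constant rate over the given number of hours."""
    seconds = hours * 3600
    return rate_gb_per_s * seconds / GB_PER_TB


for rate in (6, 8):
    per_hour = terabytes_recorded(rate, 1)
    per_day = terabytes_recorded(rate, ASSUMED_TEST_DAY_HOURS)
    print(f"{rate} GB/s: {per_hour:.1f} TB per hour, "
          f"{per_day:.1f} TB per {ASSUMED_TEST_DAY_HOURS}-hour day")
```

Under these assumptions a single car records roughly 21.6 to 28.8 terabytes per hour, which is why being selective about what stays in fast storage matters so much.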

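Steps 4 and 5 of the checklist can be sketched in a few lines of Python. This is a minimal illustration, not how any car-maker actually does it: a simple hand-written rule stands in for the AI the checklist mentions, and the `Segment` fields, the 5-degree steering threshold and the "hot"/"cold" tier names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold: how far apart the two steering decisions must be
# before the segment counts as a shadow-driver disagreement.
STEERING_DISAGREEMENT_DEG = 5.0


@dataclass
class Segment:
    """One slice of a recorded test drive (all fields are illustrative)."""
    start_s: float
    end_s: float
    near_crossing: bool          # the car approached a crossing
    human_override: bool         # a human corrected the semi-autonomous system
    human_steering_deg: float    # what the human driver did
    shadow_steering_deg: float   # what the shadow driver would have done


def shadow_disagrees(seg: Segment) -> bool:
    """True when the shadow driver's decision differs noticeably from the human's."""
    gap = abs(seg.human_steering_deg - seg.shadow_steering_deg)
    return gap > STEERING_DISAGREEMENT_DEG


def storage_tier(seg: Segment) -> str:
    """Return "hot" for segments worth an engineer's attention, otherwise "cold"."""
    if seg.near_crossing or seg.human_override or shadow_disagrees(seg):
        return "hot"
    return "cold"


drive = [
    Segment(0, 600, False, False, 0.0, 0.5),     # open road, cruise control
    Segment(600, 630, True, False, 12.0, 11.0),  # crossing
    Segment(630, 900, False, False, 2.0, 9.5),   # shadow driver disagreed
    Segment(900, 960, False, True, -4.0, -3.0),  # human correction
]

for seg in drive:
    print(f"{seg.start_s:>5}s to {seg.end_s:>5}s: {storage_tier(seg)}")

hot_seconds = sum(s.end_s - s.start_s for s in drive if storage_tier(s) == "hot")
total_seconds = drive[-1].end_s - drive[0].start_s
print(f"kept hot: {hot_seconds} of {total_seconds} seconds")
```

In this made-up drive, only the ten minutes of uneventful open road go to cold storage; the crossing, the disagreement and the human correction (360 of 960 seconds) stay readily available to engineers. A production system would replace the hand-written rule with a learned model and far richer signals.
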
Given the painstaking detail the data tracks, it’s understandable why developing a fully autonomous car will take at least 3 to 5 years. The car-makers still need to run several years of road tests and simulations before there’s a high degree of confidence that autonomous vehicles can replace traditional cars driven by humans.

As it has evolved, autonomous driving has become more of a computer and data science problem than a car manufacturing issue. This explains why tech companies such as Google’s Waymo are currently leading the way. However, if the car-makers act correctly and decisively, the tech groups may not maintain that leadership. Car-makers must team with leading computer and data science companies as close partners to catch up.

After all, the race for autonomous driving is on because the future of the car industry depends on it.