Artificial intelligence (AI). It’s the hot tech trend everyone has an opinion on: the doom or the hope of humanity, the technology with “superpowers” that every company wants in its products and solution offerings. Many products are labelled as AI for marketing reasons, which creates a lot of confusion about the term itself. To clarify matters, let’s ask a simple question: What is AI?

The question is simple, but the answer isn’t, because there is no formal definition of intelligence. We might not be able to define it completely, but most would agree that humans are intelligent. In my opinion, it is our ability to find solutions to complex problems that makes us think we are intelligent. An average human with enough experience can develop strategies for challenging situations. Take a foosball game as an example. Even if you’ve never played it, once you understand what it’s all about, you start playing and try to score a goal. As you gain experience, you try out different strategies. While you play, complex processes take place in your nervous system: learning from experience, anticipating the dynamics of the environment and optimising your behaviour to reach the final goal. If we agree that these are critical aspects of intelligence, whatever is labelled as AI should have the same characteristics.

Am I saying machines will need to think and act on their own to be considered AI? Well, yes, I actually am. And this is where deep reinforcement learning (DRL), a subfield of machine learning (ML), might lead us. DRL combines two ML techniques: deep learning, well known for applications such as object recognition, and reinforcement learning, which formalises the idea of machines interacting with environments and learning from experience instead of from provided data.

Through DRL, machines interact with their environments by processing observations and taking actions without human guidance. In doing so, they generate data, which is then combined with a feedback signal (the reward) telling the machine whether it’s performing well or not. Sticking to the foosball example: if the machine manages to score a goal, it receives positive feedback. The deep learning part helps the machine identify patterns in this self-generated data and adjust its behaviour so that it receives more positive feedback in the future.
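The observe-act-feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in any project mentioned in this article: the toy one-dimensional "pitch", the names `step`, `best_action`, `train` and `greedy_policy`, and all parameter values are hypothetical. The sketch also swaps the deep neural network for a plain lookup table (tabular Q-learning), so it shows only the reinforcement learning half of DRL; in DRL proper, a network would approximate the table `q`.

```python
import random

# Hypothetical toy "pitch": the ball sits on a 1-D track of cells 0..GOAL and the
# agent can push it left (-1) or right (+1). The ONLY feedback is a reward of 1.0
# when the ball reaches the goal cell; nobody tells the agent which way to push.
GOAL = 5
ACTIONS = (-1, +1)


def step(position, action):
    """Apply an action to the environment; return (next_position, reward, done)."""
    next_position = min(max(position + action, 0), GOAL)
    if next_position == GOAL:
        return next_position, 1.0, True  # goal scored: positive feedback
    return next_position, 0.0, False     # no feedback otherwise


def best_action(q, state, rng):
    """Highest-valued action in `state`, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn action values purely from self-generated experience."""
    rng = random.Random(seed)
    # q[(state, action)] estimates the discounted future reward of that action.
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        position, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = best_action(q, position, rng)
            next_position, reward, done = step(position, action)
            # Target: immediate reward plus discounted value of the best next action.
            future = 0.0 if done else max(q[(next_position, a)] for a in ACTIONS)
            q[(position, action)] += alpha * (reward + gamma * future - q[(position, action)])
            position = next_position
    return q


def greedy_policy(q):
    """The learned behaviour: the best action in every non-terminal position."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}


policy = greedy_policy(train())
print(policy)
```

After training, the greedy policy should push the ball towards the goal (+1) from every position, even though the code never states which direction is correct; the only signal the agent ever receives is the reward for scoring.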

DRL enables machines to find their own solutions to complex tasks. The machine just needs access to the environment and simple feedback on its performance. That’s all. No hard-coded rules or expert data are required. Simply put: With DRL you don’t need to know how a task can be done, as long as you know the final goal.

The business benefits can be tremendous. Imagine any complex challenge, such as the full supply chain of an international company or a manufacturer’s production line. In such cases you want on-time delivery, low costs, no delays, high quality and low energy consumption, but it’s not easy to find the right way to achieve and balance all those goals. With DRL, machines can reach superhuman capabilities and support you in a way you’ve never imagined.

Then why don’t we see more DRL business applications? Partly, it’s because DRL training takes time. Machines need experience before they can learn on their own, and during the training phase their behaviour is far from optimal. No supplier or producer can afford that kind of chaos in a live system.

Simulations are a good way to overcome this problem. The machine can learn what it needs to do in a simulated environment much faster and without harming the real business. The simulations need to be very precise and close to the real world. For automotive applications, they must reproduce real traffic conditions; for a supply chain, you would need to simulate how orders and deliveries work. The quality of the result depends heavily on the quality of the simulations. The blueprint for a successful DRL application is therefore to take a real system, create a precise simulation of it, let the machine learn in this environment and transfer the learned behaviour back to the real system.

One example of how DRL can be used in real systems is a set of tests that Bosch Rexroth and DXC Technology are performing with a half-automated foosball machine. The foosball machine’s goal is simple: It must learn to score goals to win against its opponent, and this is the only thing it gets rewarded for. In the process, the machine develops its strategy entirely on its own. In simulations, it learned to score goals after only 2 hours, and after another 6 hours it played much better than any of the developers could.

The machines controlling the game run on standard automation components such as servo motors, drives and controls, which are used in many industrial applications. Ultimately, the idea is to gain knowledge and bring greater flexibility into production. Ideally, manufacturers will one day no longer need to ask programmers to add functions. They will tell the machine what they want, and the system will do the rest.

The technology and the applications of DRL are exciting, but researchers are realistic: it will take time to transfer what machines learn in simulation to reality. There are plenty of challenges, and creating a realistic simulation environment is just one of them. But if all goes well, DRL will be a huge leap forward on the path towards real AI.