AI in reinforcement learning is normally trained to maximize its chances of winning, but that doesn’t necessarily make for robust training. The software may only learn to excel in a narrow set of conditions and leave itself open to exploits. DeepMind’s new approach has one of the AIs focus on exploiting the other’s weaknesses, much as human experts do. As a result, AlphaStar gradually learned to try a wider variety of strategies that could do more to counter unconventional, highly exploitative tactics (aka cheese).

The technology still has its limits. It needs much more training than a human to reach a comparable level of skill, for a start. Even so, the result is no small feat given the complexity of StarCraft, and it bodes well for DeepMind’s long-term plans. As with the company’s earlier game research, the ultimate goal is to translate AlphaStar’s progress into real-world applications. A more robustly trained AI could help self-driving cars and robots cope with unusual situations they wouldn’t otherwise be prepared to handle.