The next step for AI is generating its own examples, from which it builds the models that extract rules. That’s what AlphaGo Zero did when it generated a million examples of Go games to improve its own play. It achieved that through reinforcement learning, which relies on “feedback: positive reinforcement for what’s right and penalties for what’s gone wrong,” Rogers said.
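To make that feedback loop concrete, here is a minimal Python sketch of learning from rewards and penalties. It is not AlphaGo Zero's method, which combines self-play with deep neural networks and tree search. It is a toy stand-in in which a program tries three moves, is told only whether each try was right (+1) or wrong (-1), and gradually settles on the move that pays off. The names `train`, `feedback`, `epsilon` and `step` are illustrative assumptions, not anything from the article.

```python
import random


def train(rewards, episodes=2000, epsilon=0.1, step=0.1, seed=0):
    """Learn a value estimate for each action from reward feedback alone.

    `rewards` is a hypothetical stand-in for the environment:
    rewards[a] is the feedback the learner receives after taking action a.
    """
    rng = random.Random(seed)
    values = [0.0] * len(rewards)  # the learner starts with no knowledge
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: otherwise take the action that currently looks best.
            action = max(range(len(rewards)), key=lambda a: values[a])
        reward = rewards[action]  # positive for "right", negative for "wrong"
        # Nudge the estimate for that action toward the feedback received.
        values[action] += step * (reward - values[action])
    return values


# Assumed toy setup: action 1 is "right", actions 0 and 2 are "wrong".
feedback = [-1.0, 1.0, -1.0]
values = train(feedback)
best_action = max(range(len(values)), key=lambda a: values[a])
print(best_action, [round(v, 2) for v in values])
```

The learner is never told which move is correct. It generates its own trials and keeps a running score for each move, which is the same basic idea, at a vastly smaller scale, as improving play by generating and scoring one's own games.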
While that ability opens up great possibilities for systems to learn to answer the questions we want answered, it is worth remembering that these systems “are very much unitaskers,” Rogers said. AlphaGo Zero may be an unparalleled Go player, but playing Go is the only thing “the program is designed to do.” Through transfer learning, AI systems can apply the same kind of deep learning to another domain. Still, they would not do so on their own; someone would have to set them up for that.