How smart are deep learning computer networks, a form of artificial intelligence, and how closely do these machines mimic the human brain? They have improved greatly in recent years, but still have a long way to go, a team of UCLA cognitive psychologists reports in the journal PLOS Computational Biology.
Supporters have expressed enthusiasm for the use of these networks to do many individual tasks, and even jobs, traditionally performed by people. However, results of the five experiments in this study showed that it’s easy to fool the networks, and the networks’ method of identifying objects using computer vision differs substantially from human vision.
“The machines have severe limitations that we need to understand,” said Philip Kellman, a UCLA distinguished professor of psychology and a senior author of the study. “We’re saying, ‘Wait, not so fast.’”
Machine vision, he said, has drawbacks. In the first experiment, the psychologists showed one of the best deep learning networks, called VGG-19, color images of animals and objects. The images had been altered. For example, the surface of a golf ball was displayed on a teapot; zebra stripes were placed on a camel; and the pattern of a blue and red argyle sock was shown on an elephant. VGG-19 ranked its top choices and chose the correct item as its first choice for only five of 40 objects.
“We can fool these artificial systems pretty easily,” said co-author Hongjing Lu, a UCLA professor of psychology. “Their learning mechanisms are much less sophisticated than the human mind.”
VGG-19 thought there was a 0 percent chance that the elephant was an elephant and only a 0.41 percent chance the teapot was a teapot. Its first choice for the teapot was a golf ball, which shows that the artificial intelligence network looks at the texture of an object more so than its shape, said lead author Nicholas Baker, a UCLA psychology graduate student.
“It’s absolutely reasonable for the golf ball to come up, but alarming that the teapot doesn’t come up anywhere among the choices,” Kellman said. “It’s not picking up shape.”
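To make the idea of ranked choices concrete, here is a minimal sketch of how a classifier's raw scores can be turned into the percentages and ordered list described above. Networks such as VGG-19 end in a softmax layer over 1,000 ImageNet categories; the four labels and the scores below are hypothetical values chosen for illustration and are not taken from the study.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(labels, scores, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical scores for a teapot-shaped object with a golf ball texture.
labels = ["golf ball", "teapot", "elephant", "sock"]
scores = [6.0, 0.5, -2.0, 1.0]

choices = top_k(labels, scores, k=3)
for label, prob in choices:
    print(f"{label}: {prob:.2%}")
```

In this toy case the texture-driven label comes first and the shape-based label receives well under 1 percent, which is the kind of pattern the researchers describe.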
Humans identify objects primarily from their shape, Kellman said. The researchers suspected the computer networks were using a different method.
In the second experiment, the psychologists showed images of glass figurines to VGG-19 and to a second deep learning network, called AlexNet. VGG-19 performed better on all the experiments in which both networks were tested. Both networks were trained to recognize objects using an image database called ImageNet.
Both networks performed poorly, however: neither VGG-19 nor AlexNet correctly identified the figurines with its first choice. Both networks gave an elephant figurine almost a 0 percent chance of being an elephant. Most of the top responses were puzzling to the researchers, such as VGG-19’s choice of “website” for “goose” and “can opener” for “polar bear.” On average, AlexNet ranked the correct answer 328th out of 1,000 choices.
“The machines make very different errors from humans,” Lu said.
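The two summary figures reported in these experiments, how often the correct answer was the first choice and where it landed on average among all the choices, can be computed as in the sketch below. The function names, labels and probabilities are hypothetical and serve only as an illustration; the study's own analysis code is not shown here.

```python
def rank_of(correct_label, labels, probs):
    """1-based rank of the correct label when labels are ordered by probability."""
    target = probs[labels.index(correct_label)]
    # Rank is 1 plus the number of labels with a strictly higher probability.
    return 1 + sum(1 for p in probs if p > target)

def summarize(trials, labels):
    """Return (first-choice accuracy, mean rank of the correct label)."""
    ranks = [rank_of(correct, labels, probs) for correct, probs in trials]
    first_choice_rate = sum(1 for r in ranks if r == 1) / len(ranks)
    mean_rank = sum(ranks) / len(ranks)
    return first_choice_rate, mean_rank

# Hypothetical outputs for three images over a four-label vocabulary.
labels = ["elephant", "goose", "website", "can opener"]
trials = [
    ("elephant", [0.01, 0.09, 0.60, 0.30]),  # correct label ranked last
    ("goose",    [0.10, 0.20, 0.50, 0.20]),  # correct label ranked second
    ("website",  [0.10, 0.10, 0.70, 0.10]),  # correct label ranked first
]

first_choice_rate, mean_rank = summarize(trials, labels)
print(f"first-choice accuracy: {first_choice_rate:.1%}, mean rank: {mean_rank:.2f}")
```

By the same arithmetic, a first-choice accuracy of five out of 40 objects is 12.5 percent, and a mean rank of 328 out of 1,000 means the correct label was, on average, far from the top of the list.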
In the third experiment, the researchers showed 40 drawings outlined in black, with images in white, to both VGG-19 and AlexNet. These first three experiments were meant to discover whether the networks identified objects by their shape.