Researchers have found that GPT-3 possesses reasoning capabilities comparable to those of college undergraduates. In a study conducted by researchers at the University of California, Los Angeles (UCLA), the artificial intelligence large language model (LLM) was put to the test on complex reasoning problems similar to those that colleges and universities worldwide often use in admission decisions.
The UCLA researchers presented GPT-3 with challenging shape prediction tasks and asked it to answer SAT analogy questions, all the while ensuring that the AI had never encountered these specific problems before. To establish a fair comparison, 40 UCLA undergraduate students were also asked to solve the same problems.
GPT-3 achieved a strong success rate, correctly solving 80% of the shape prediction problems. That was well above the human participants' average of just under 60%, and within the range of the highest individual human scores. The results surprised the research team and highlighted the AI's ability to tackle complex reasoning tasks.
GPT-3 also performed well on the SAT analogy questions, which typically measure a person's capacity for logical thinking and problem-solving. The researchers were struck by the AI's ability to adapt to new scenarios and to display reasoning abilities on par with those of college students.
The findings have significant implications for artificial intelligence and education. As GPT-3 continues to prove its mettle in solving complex problems, its potential applications across industries and academic settings are likely to expand.
“Surprisingly, not only did GPT-3 do about as well as humans but it made similar mistakes as well,” said UCLA psychology professor Hongjing Lu, senior author of the study published in the journal Nature Human Behaviour.
On the SAT analogies, the AI tool scored higher than the students' average. Analogical reasoning is the ability to solve never-before-encountered problems by comparing them to familiar ones and extending those solutions to the new ones.
The questions asked test-takers to select pairs of words that share the same type of relationship. For example, in the problem "'Love' is to 'hate' as 'rich' is to which word?", the solution would be "poor".
However, in solving analogies based on short stories, the AI did less well than students. These problems involved reading one passage and then identifying a different story that conveyed the same meaning.
“Language learning models are just trying to do word prediction so we’re surprised they can do reasoning,” Lu said. “Over the past two years, the technology has taken a big jump from its previous incarnations.”
Because GPT-3's inner workings are guarded by its creator, OpenAI, the researchers said they were not sure how its reasoning abilities work, or whether LLMs are actually beginning to "think" like humans or are doing something entirely different that merely mimics human thought.
They said they hope to explore this question.
“GPT-3 might be kind of thinking like a human. But on the other hand, people did not learn by ingesting the entire internet, so the training method is completely different.
“We’d like to know if it’s really doing it the way people do, or if it’s something brand new – a real artificial intelligence – which would be amazing in its own right,” said UCLA psychology professor Keith Holyoak, a co-author of the study.
(With inputs from PTI)