In 1950, British mathematician and logician Alan Turing proposed a test to determine whether a computer was as intelligent as a human. He called the test the “Imitation Game.” You may recall a 2014 movie of the same name, which dramatized Turing’s role during World War II in breaking Enigma, the Nazis’ ultra-complex cipher machine.
For decades, the Imitation Game, now known as the Turing test, has stood as an elusive benchmark for the field of artificial intelligence (AI). A computer that passes it, by conversing so convincingly that an evaluator cannot tell it from a person, will have reached a major AI milestone.
Until recently, AI-written text carried clear giveaways that a human did not produce the words. Early-generation large language models tended to repeat awkward phrases and struggled to convey emotional depth, flaws that made the machine origin of the text easy for an evaluator to spot.
Recent AI models, however, have improved dramatically with the advent of OpenAI’s GPT-4. The advances are largely due to more sophisticated machine-learning techniques combined with better training data sets, which allow the models to simulate intricate writing patterns, manage complex syntax, and mimic certain emotional undertones. The result is written output that is difficult to distinguish from human-generated text.
In May, Cameron Jones and Benjamin Bergen, cognitive scientists at the University of California San Diego, posted a pre-publication paper on Cornell University’s moderated arXiv site titled “People cannot distinguish GPT-4 from a human in a Turing test.” Their study had 500 participants acting as evaluators. Each evaluator held a simultaneous five-minute interview with another human and with GPT-4, then had to decide which was human and which was AI, along with a confidence level and the reasoning behind the choice.
The study found that the evaluators could not reliably tell the difference between the machine and the human, and the researchers concluded that the AI had passed the Turing test. While this finding is encouraging, other researchers must replicate it before the field can claim that the Turing milestone has been achieved.
In any case, a new era of AI and human interaction has begun, raising questions about how we deal with computers that can author reports, articles, prose, and even poetry as well as, or better than, most humans. One such question looming before society is “How much do we value the ‘human touch’ in writing, as opposed to text generated by AI?” Many people view human-produced material as inherently more valuable because of its perceived authenticity and the unique insight and emotional weight it can bring. This is especially true for creative works, like poetry, where there remains a significant cultural and emotional attachment to human authorship; people want to feel connected with another person’s mind, not an algorithm.
In studies of AI-written poetry, however, when no authorship was disclosed, participants rated the AI writing as appealing as human-created works. This raises questions about the role of intention and perception in our valuation of art and creativity: Does it matter whether a piece of writing is AI-generated if it evokes a genuine emotional response?
I wanted to evaluate this question first-hand, so I asked my 11-year-old grandson Joe to draft an original story that we could use for a children’s book. He wrote a remarkably interesting story, “Daniel and the Temple of Time,” in three chapters. I fed each chapter of Joe’s story into ChatGPT-4o and asked it to flesh out the text into a full children’s book, using prose appropriate for an 11-year-old reader.
In less than a minute, GPT-4o followed my guidelines and produced three chapters totaling about 5,000 words. I was amazed at how well written the text was. I had my grandson edit it to ensure it was at a level another 11-year-old would understand. He did not have to change much, so the machine had clearly followed my directions.
I then added the illustrations for the book, using AI to generate them; that technology is similarly amazing. I took a selection of text from each page and fed it into Microsoft’s Image Creator. Within seconds, it produced four different high-quality images, each designed to match the text selection. I chose one and used it in the manuscript.
The entire process of writing and illustrating Joe’s book took about one-quarter of the time I have devoted to other, similar-sized children’s books created the old-fashioned way. The emotional impact and quality of the writing are as good as, or better than, any human’s. These tools allowed Joe and me to explore new creative possibilities and push the boundaries of traditional storytelling. The result is a hybrid creation, blending human imagination with AI’s vast capabilities.
How authorship is credited is an important consideration when humans write with AI assistance. While Joe is listed on the title page as the author of the original story, it is also noted that the text was generated by ChatGPT-4o and edited by Joe. Similarly, Microsoft’s Image Creator is credited with generating the illustrations under human direction.
If AI-written material can reliably pass the Turing test, the impact on our understanding of writing, creativity, and value will be profound. The human touch in writing will likely remain valued, though perhaps in new ways. Society may find itself increasingly grappling with questions of authenticity, ethics, and the role of technology in human expression. The promise is that rather than overshadowing human creativity, AI will amplify and expand how we express ourselves and connect through the written word. Hopefully, Joe’s book exemplifies how that goal can be achieved.
Anthony J. Marolda has degrees in physics and is a writer and painter residing in Annisquam. He used ChatGPT-4o to provide some insights for this article.