The major problem with most tellings of the test is that we don’t actually do it. The game is to be played with three participants: two competitors and a questioner. Of course today the assumption is it’ll be a human and a machine, no questioner. The goal was not for the machine to trick a human but for the machine to appear more human to a questioner than a human being questioned at the same time.
Does any of that matter? I have no idea. I suspect Turing would say no, as flippantly as when he wrote in the paper: “The original question, ‘Can machines think?’ I believe to be too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.”
I’d strongly recommend anyone interested in having genuine discussions about LLMs read the paper. It’s genuinely a quick and easy read that’s still relevant. It reads as though it could have been a blog post linked here yesterday.
It's been decades (?) since I read the paper, but I think the questioner is key for multiple reasons, especially if you consider a generalized iterated version of the Turing test as it develops into the future.
I think the general idea is about being able to detect a difference between a machine and human, not whether the human alone can guess, as you're pointing to. In a general case, you can think of the questioner as some kind of detector, a classification system, an algorithm or method.
Let's say the classification system, the questioner, is able to be improved, and in this sense, there develops a kind of adversarial or challenge relationship between the AI developer and the questioner developer. Both improve, such that the AI becomes more humanlike, the questioner is improved and then can tell the difference again, and so forth and so on. Whether or not the AI "passes" the test isn't a static outcome; it likely passes, then fails, then passes again and so forth as the AI and questioner improve.
What's key is that you could argue that the AI becomes more humanlike, but at the same time the questioner also develops a more detailed model or representation of what it means to be humanlike. In this case, you could argue that the questioner must develop some descriptive representation of "human-likeness" that's just as sophisticated as the one the AI instantiates, and what likely would occur is that the AI would become more humanlike in response to the improved representations and classification of the questioner. The questioner in some sense is a kind of mirror-image instantiation of humanness as represented by the AI, and vice versa.
It's the questioner in this iterated Turing test that ensures the AI becomes more humanlike, maybe to an extent the humans themselves aren't able to understand or recognize during the test. The AI wouldn't necessarily be imitating the human, it would be imitating what the questioner thinks is human.
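To make the "passes, then fails, then passes again" dynamic concrete, here's a minimal toy sketch. Everything in it is my own assumption for illustration, not anything from Turing's paper: the function name, the idea of collapsing each side into a single "skill" number, and the rule that whichever side lost the last round gets improved by a fixed step.

```python
def iterate_turing_test(rounds, ai_skill=0.0, questioner_skill=1.0, step=1.5):
    """Toy model of an iterated Turing test (hypothetical, for illustration).

    The AI "passes" a round when its skill is at least the questioner's.
    After each round, the side that lost is improved by `step`.
    """
    history = []
    for _ in range(rounds):
        passed = ai_skill >= questioner_skill
        history.append(passed)
        if passed:
            # The questioner's developers respond by improving the detector.
            questioner_skill += step
        else:
            # The AI's developers respond by making it more humanlike.
            ai_skill += step
    return history, ai_skill, questioner_skill


history, ai_skill, questioner_skill = iterate_turing_test(6)
print(history)                      # pass/fail outcome per round
print(ai_skill, questioner_skill)   # both skills have ratcheted upward
```

The pass/fail outcome keeps flipping while both numbers climb, which is the point: "passing" is not a static result, and the questioner's model of humanness has to grow alongside the AI's.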
It tests imitation skills. What makes the test interesting is the point of view that for some kinds of skills, as the imitation gets good enough, it becomes indistinguishable from the thing it seeks to imitate. The simplest example of this is a purely abstract thing like a song. Any imitation of a song that gets ever closer to the imitated song will eventually become indistinguishable from it. People like Hofstadter touched on this in the timeless G.E.B.
That's what makes the imitation game so interesting. Any ontological debates about what imitation means, implementation details, or limitations are orthogonal to this, yet this is what most people everywhere, even in here, obsess about. Missing the forest for the trees. The point is not to ask whether x is intelligence for any x under consideration but to use this as a reference when thinking about what intelligence is.
Super imitators or super predictors, the name of the game is helping each other get a sense of what intelligence (the one we have) is. In humans, other mammals, insects, etc.
In philosophy of mind, there is the concept of a “zombie”. This is a person who acts just like a real person would in all circumstances, except that they do not have an internal experience of their senses. No “qualia”.
My little engineering brain has always recoiled at any use of these zombies in an argument. In my reckoning the only way a machine could act human in all circumstances would be if it had a rich internal representation of the world, including sensory data, goals, opinions, fears, weaknesses…
The LLMs are getting better at the Turing test, and as they get better I wonder how correct my intuition about zombies is.
If you pretend they have the intelligence of an infant, they can pass the test. For some reason, people always try to use adult human intelligence as a point of reference. Infants are intelligent too.
My take is that we are still making too many assumptions about "intelligence" and conflating human intelligence with adult human intelligence with non-human animal intelligence, etc.
Rather than some fundamental law of AI, one might better view the Turing Test as a then-obvious product of its social backstory and circumstances. Turing was a son and grandson of civil servants, engineers, army officers, and gentry. He grew up in the inter-war (WWI-WWII) British Empire. He originally called his test "the imitation game". And introduced it in a paper he published in a philosophy journal.
In that context - being able to present yourself as an intelligent human, in strictly written communication with other humans, is a "d'oh, table stakes" human skill. The Empire was based on ink-on-paper communications. If you couldn't keep the people who read your correspondence, paperwork, and reports convinced that you were an intelligent (and dutiful, honorable, etc.) person - yeah.
(Yes, that was only the ideal, and the British Empire frequently fell rather short. But what is an "imitation game", described in a philosophy journal? An ideal.)
I would say LLMs do pass the Turing test, at least in meaningful and useful contexts - hence all the hype.
But has a rigorous experiment, with proper statistics, been conducted to test if a frontier LLM can consistently pass as a human interlocutor?
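For what "proper statistics" could look like at a minimum, here's a rough sketch, assuming the simplest possible design: each trial is one questioner verdict, and the null hypothesis is that the questioner is guessing at chance (50%). The function name and the numbers in the usage lines are hypothetical, not results from any real study.

```python
from math import comb


def binomial_p_value(correct, trials, chance=0.5):
    """Exact one-sided p-value: P(X >= correct) if the questioner is guessing.

    A small value suggests the questioner really can tell human from machine;
    a large value means the data are consistent with guessing.
    """
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical numbers, purely for illustration.
print(binomial_p_value(5, 10))    # about half right: consistent with guessing
print(binomial_p_value(10, 10))   # all right: very unlikely under guessing
```

Note that failing to reject "the questioner is guessing" is not the same as showing the machine is indistinguishable; a real experiment would also need enough trials to have the power to detect a difference.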
How many tests could the Turing Test test if the Turing Test could test tests?