Alan Turing, the British mathematician famed for cracking the German Enigma code in the Second World War, devised what is now known as the Turing test: a way of determining whether a machine can convincingly imitate human intelligence. Recently, an Australian Senate committee employed a version of Turing's procedure to see whether a generative artificial intelligence large language model (GenAI LLM) could match or exceed the quality of work produced by its own staff.
For this endeavour, the committee engaged a consultancy team from Amazon to run a five-week pilot study in which public submissions to a parliamentary inquiry were summarised both by a selected LLM and by humans. The summaries were then blind-tested and assessed independently by five business executives, none of whom was told at the outset that GenAI was involved.
Only after the exercise were the evaluators told the trial's true purpose and asked to explain how they had rated each summary. At that point they learned that some of the summaries were machine-generated, although three of the five admitted they had suspected a GenAI experiment.
The results showed that the GenAI summaries fell short of the human-written ones on every criterion, scoring 47 per cent overall against 81 per cent – thus failing the Turing test.
The evaluators found that the AI-produced summaries frequently missed emphasis, nuance and context; often included incorrect details or omitted relevant information; and occasionally inserted irrelevant comments. Their verdict was that GenAI had in effect been counterproductive, creating more work rather than less, because facts had to be cross-checked against the original public submissions.
Excerpts from the committee's deliberations and the complete report are publicly available (https://www.aph.gov.au/DocumentStore.ashx?id=b4fd6043-6626-4cbe-b8ee-a5c7319e94a0).
Wall Street, meanwhile, is growing sceptical that GenAI will deliver significant returns. From an investor's point of view, the much-hyped “groundbreaking” technology has so far proved expensive relative to its actual business impact, and it has yet to produce a breakthrough application for the general public.
Microsoft's capital expenditure has risen sharply, up 75 per cent year on year, with most of its second-quarter profit of $22 billion funnelled back into cloud and GenAI initiatives. Alphabet, Google's parent company, has been less transparent about its GenAI outlay but has conceded that this year's capital expenditure will be “significantly larger” than last year's. Amazon has been similarly vague, though it has so far committed $30 billion to capital expenditure this year, against $48 billion for the whole of 2023.
Sam Altman, chief executive of OpenAI, the company behind ChatGPT, is lobbying the US government to back investors in a nationwide GenAI infrastructure programme. The project – covering data centres, power generation and upgrades to the national grid – is expected to run to “tens of billions of dollars”.
In June, Goldman Sachs published a provocative report – GenAI: Too much spend, too little benefit? – in which several analysts debated the likely economic uplift from GenAI over the next decade. The bank concluded that investors could still profit, either because GenAI finally delivers or because the investment bubble takes longer than expected to burst.
Even without financial justification, the technology continues to attract and captivate interest. The Perplexity.ai search engine improves markedly on incumbents such as Google, yet may cost six to ten times as much to operate. Assistants such as GitHub Copilot are genuinely useful in routine software development, but can frustrate when they generate incorrect code. Photorealistic image generators such as Flux 1 from Black Forest Labs hold commercial promise for online shopping – letting customers virtually try on clothing and accessories before buying – but the commercial viability of these applications remains unproven.
Supporters of GenAI counter that the technology is still in its infancy, and much current discussion centres on extending it with autonomous agency. Such AI agents could take the initiative, plan and execute tasks, and adapt their behaviour in light of previous experience. An automated holiday planner, for instance, could not only book flights and accommodation but tailor trips and leisure activities to what it knows about the user and fellow travellers.
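The plan-act-adapt loop described above can be sketched in a few lines of Python. This is purely illustrative – no vendor's agent framework or LLM API is used; the `plan` and `act` methods are hypothetical stand-ins for a real model's task decomposition and tool calls:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Minimal plan-act-adapt loop; the LLM and tools are stubbed out."""
    memory: list = field(default_factory=list)  # past outcomes the agent can adapt to

    def plan(self, goal: str) -> list:
        # A real agent would ask an LLM to decompose the goal into steps;
        # here we return a fixed, illustrative task list.
        return [f"research {goal}", f"book {goal}", f"confirm {goal}"]

    def act(self, task: str) -> str:
        # Stand-in for tool use (search, a booking API, etc.).
        result = f"done: {task}"
        self.memory.append(result)  # record the outcome for later steps
        return result

    def run(self, goal: str) -> list:
        return [self.act(task) for task in self.plan(goal)]

agent = Agent()
results = agent.run("flights to Rome")
```

The loop is trivial, but it captures the shape of the idea: goals are decomposed into tasks, each task triggers an action in the world, and outcomes accumulate in memory that can inform future planning.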
Altera, a San Francisco start-up, is researching how groups of GenAI agents might operate independently. In one experiment it spawned 1,000 autonomous agents in the open-world game Minecraft, and the researchers observed them organically develop their own culture, trading economy, religion and government (see the YouTube summary at https://www.youtube.com/watch?v=2tbaCn0Kl90). The virtual villagers established a marketplace for trade, its wealthiest member a priest who amassed riches through conversion bribes. The community governed itself democratically, amending and adopting proposed laws by majority vote, and when villagers went astray, others would light torches to guide them home. Remarkably, none of this behaviour was pre-programmed – the AI community devised its own plans, collaborations and methods autonomously.
For now, GenAI may not pass the Turing test outright – a test that, critics argue, measures nothing more than resemblance to human intelligence, a somewhat narcissistic criterion. What we are witnessing is a different form of intelligence, one we cannot yet fully comprehend or mathematically predict. Granting such an intelligence the ability to act autonomously raises philosophical, ethical and practical questions.