It's a rather dreary day here in London, with light rain falling as we delve into another week of artificial intelligence discourse. This week I'm reminded of Alan Turing's assertion: "I'm only a mathematician not a prophet." It serves as a pertinent backdrop as we consider the challenge of separating signal from noise in a landscape increasingly awash with claims of breakthroughs and advancements. It seems every week brings forth a new development that some herald as revolutionary, yet the true value often gets lost amidst the fervour. Today we'll sift through the latest in artificial intelligence, aiming to discern what's genuinely significant and what might simply be the latest iteration of a well-worn narrative.

This is Turing's Torch: Artificial Intelligence Weekly – the bits that matter, minus the hype.

The Royal Statistical Society recently put out a paper reminding everyone that, at its heart, artificial intelligence is, well, statistics. All the clever algorithms we're so excited about are fundamentally reliant on statistical principles. That might seem obvious, yet it's a point worth underlining, because what it really means is that artificial intelligence, particularly machine learning, thrives on data. It identifies patterns, makes predictions, and learns from experience. Yet all of that hinges on understanding the underlying distribution of the data, spotting biases, and evaluating the significance of the results. Without a firm grasp of statistical methods, you can easily build artificial intelligence systems that are unreliable, unfair, or just plain wrong. It's a bit like those celebrity chefs who insist on using only the finest ingredients even though most of us can't tell the difference. Statistical rigour might not be glamorous, yet it's the bedrock of reliable artificial intelligence. You can slap fancy algorithms on top, yet without that solid foundation, the whole thing is likely to crumble.
And ultimately, understanding your data is more important than the code you write. This matters because artificial intelligence is increasingly being used to make decisions that affect our lives, from medical diagnoses to loan applications. If these systems are built on shaky statistical foundations, the consequences could be serious. Think about algorithms that perpetuate existing social inequalities or misdiagnose diseases because they haven't been properly validated. The paper rightly emphasizes the need for artificial intelligence developers to be statistically literate, to understand the limitations of their models, and to be able to interpret the results responsibly. This connects to the broader theme of transparency and accountability in artificial intelligence. There's a growing call for artificial intelligence systems to be more explainable, so that we can understand how they arrive at their decisions. Statistical methods play a crucial role here, helping us to unpack the inner workings of complex artificial intelligence models and identify potential sources of bias.

Related to that, one of the ongoing challenges in data science is determining the true effect of a particular action or treatment when you can't run a proper experiment. A technique called Propensity Score Matching, or PSM, is designed to help with this. Essentially, PSM tries to mimic a controlled experiment by pairing individuals who have similar characteristics but differ in whether they received the treatment. It's a way of levelling the playing field when you're analysing data that wasn't collected under controlled conditions. Imagine you're trying to assess the impact of a new training programme on employee performance. You can't randomly assign employees to receive the training, so you instead use PSM to match employees who did receive the training with similar employees who didn't, based on factors like experience, education, and previous performance.
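For anyone following along in text, the matching idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production method: the employee data is synthetic, the true training effect of 1.0 is invented for the example, and the function names (`fit_propensity`, `match_pairs`) and the caliper of 0.05 are hypothetical choices. It estimates each employee's probability of being trained from two covariates, then greedily pairs each trained employee with the untrained employee whose probability is closest.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity(X, treated, lr=0.5, epochs=300):
    """Logistic regression P(treated | covariates), fitted by gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(X, treated):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - t
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def match_pairs(scores, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement."""
    controls = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i, t in enumerate(treated):
        if t == 1 and controls:
            j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
            if abs(scores[j] - scores[i]) <= caliper:
                pairs.append((i, j))
                controls.remove(j)
    return pairs

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Synthetic employees: (experience, prior performance), roughly standardised.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
# Assumption: experienced, high-performing staff are more likely to be trained.
treated = [1 if rng.random() < sigmoid(0.8 * e + 0.8 * p) else 0 for e, p in X]
# Assumption: outcome depends on the covariates plus a true training effect of 1.0.
outcome = [e + p + 1.0 * t + rng.gauss(0, 0.5) for (e, p), t in zip(X, treated)]

w, b = fit_propensity(X, treated)
scores = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for x in X]
pairs = match_pairs(scores, treated)

# Naive comparison ignores who got trained; matched comparison uses the pairs.
naive = (mean(o for o, t in zip(outcome, treated) if t)
         - mean(o for o, t in zip(outcome, treated) if not t))
matched = mean(outcome[i] - outcome[j] for i, j in pairs)
print(f"naive: {naive:.2f}  matched: {matched:.2f}  pairs: {len(pairs)}")
```

Because the trained group here is more experienced to begin with, the naive difference mixes the training effect with that head start, whereas the matched difference compares like with like. Greedy matching with a caliper is only one of several matching schemes, and it discards trained employees who have no close counterpart.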
This allows you to compare their subsequent performance more fairly, hopefully isolating the effect of the training programme itself. Now, this is important because so much of the data we analyse in the real world comes from observational studies rather than controlled experiments. Think about market research, public health studies, or even internal company data. Being able to draw more reliable conclusions from this type of data can inform decisions about resource allocation, policy changes, and strategic planning. If you're relying on flawed data, you're potentially making costly mistakes, or worse, implementing policies that have unintended negative consequences. Of course, like any statistical method, PSM isn't a magic bullet. It relies heavily on the quality and completeness of the data. If you're missing key variables that influence both the treatment and the outcome, you can still end up with biased results. It's a bit like trying to assemble a jigsaw puzzle with missing pieces: you might get a picture, yet it won't be the whole story. The bigger picture here is that as data analysis becomes more pervasive, we need to be increasingly aware of the limitations of the tools we use. There's a tendency to treat data as objective truth, yet it's crucial to remember that the conclusions we draw are only as good as the methods we employ. I'm always reminded of the old saying: garbage in, garbage out.

And with that in mind, it's worth remembering that, despite considerable excitement, the vast majority of artificial intelligence models designed for healthcare purposes are failing to pass regulatory scrutiny. To unpack that a bit, regulatory bodies are demanding rigorous evidence of clinical safety and effectiveness before they will approve these artificial intelligence systems for use.
This isn't simply a matter of ticking boxes; it requires demonstrating that the models perform reliably across diverse patient populations and that they don't introduce unacceptable risks. The challenges are considerable. Gathering sufficiently diverse data sets can be difficult. Overfitting, where a model performs well on training data yet poorly in real-world settings, is a common problem. And, perhaps most critically, demonstrating transparency and interpretability in complex artificial intelligence systems remains a major hurdle. The implications of this are far-reaching, particularly in the context of accountability. If an artificial intelligence model makes an incorrect diagnosis, who is responsible? Is it the developer of the model, the healthcare provider using it, or some combination of both? These are not trivial questions, and regulators are understandably cautious about approving systems where such fundamental issues remain unresolved. This caution, while understandable, also means that the potential benefits of artificial intelligence in medicine – improved diagnostics, more personalized treatments, and increased efficiency – are being delayed. We are seeing, in effect, a tension between innovation and regulation, a familiar pattern in technology and one that requires careful navigation if we are to unlock the true potential of artificial intelligence in healthcare. It seems to me that the regulators are wise to be demanding proof of effectiveness, given that artificial intelligence models are only as good as the data used to train them, and healthcare data can be notoriously messy and biased.

And related to that, one of the ongoing challenges with large language models is their tendency to confidently assert things that are, to put it politely, not entirely accurate. A recent experiment attempts to address this by building a system where the model not only generates an answer but also provides a confidence score and a justification for that score.
The process involves several steps. First, the model produces an initial answer, along with its self-assessed confidence level. Then, it evaluates its own output to determine if that confidence is actually warranted. Finally, the system automatically checks the answer against current information available online, a bit like a fact-checking assistant. The intention is to make these models more transparent and reliable. If a model can accurately assess its own certainty, it could flag potentially unreliable answers, which is important as these systems are increasingly used in situations where accuracy is critical. Think of medical diagnoses, legal advice, or financial forecasting. If a model is going to be wrong, it would be useful to know that it knows it might be wrong. This ties into the broader push for more accountable artificial intelligence. We're seeing increasing pressure for artificial intelligence systems to be less of a black box, especially as they are deployed in sensitive areas of life. The ability for a model to explain its reasoning and quantify its uncertainty could be a step towards greater trust. Of course, just because a machine sounds confident doesn't necessarily mean it's correct, even if it thinks it is. We've all met humans like that. The real test will be whether these self-evaluations actually improve the accuracy and reliability of the models in real-world scenarios. Still, a little bit of self-doubt might be a healthy thing, even for an artificial intelligence.

We're also seeing a subtle yet important shift occurring in how businesses are using artificial intelligence, and it's not quite the wholesale takeover some might have predicted. Rather than ripping out entire enterprise systems, artificial intelligence is being deployed more strategically, targeting the somewhat cumbersome customer relationship management layers that have accumulated over time.
Think of it this way: businesses have traditionally used CRM systems as a sort of bolt-on to manage interactions with customers. These systems often involve multiple layers of software and processes, leading to inefficiencies and delays. What's happening now is that artificial intelligence is stepping in to streamline these processes, creating a more unified and real-time approach to customer experience. Instead of just automating tasks, artificial intelligence is essentially orchestrating the entire customer journey, from initial contact to ongoing support. This matters because it allows companies to respond more quickly and effectively to customer needs. By removing the bottlenecks caused by outdated CRM systems, businesses can provide a more seamless and personalized experience. This could translate to increased customer satisfaction, loyalty, and ultimately, a stronger bottom line. The competitive pressure to deliver that experience will be intense. Of course, there's also a potential downside. As artificial intelligence takes on a more central role in customer interactions, it raises questions about data privacy and security. We need to ensure that these systems are designed and implemented in a way that protects customer information and prevents misuse. And if a system goes rogue, the potential damage will be magnified. It strikes me that the narrative has subtly shifted. We're no longer asking if artificial intelligence will replace human workers, but rather how it will reshape the existing infrastructure. And in this case, it seems the first casualties will be the bloated and inefficient CRM systems that have plagued businesses for far too long.

This also connects to how methods are evolving for carefully introducing machine learning models into real-world applications. It's not as simple as just switching them on and hoping for the best. What we're really talking about is risk management.
A model might perform flawlessly in a controlled testing environment, yet the moment it encounters the messy, unpredictable nature of actual data, things can go awry. User behaviour changes, data patterns shift, and suddenly your whiz-bang algorithm is making questionable decisions. The answer is to phase things in slowly, using techniques with names like A/B testing, where you compare the old and the new systems on a small subset of users. Or canary testing, which involves releasing the new model to a very restricted group and monitoring its performance. Interleaved testing alternates between models for different users to see which works best. There's even shadow testing, where the new model runs in the background, collecting data yet not actually influencing the outcome. The reason this matters is that deploying these models impacts everything from financial trading to medical diagnoses. A poorly implemented algorithm could lead to significant financial losses, misdiagnoses, or any number of unintended consequences. And as machine learning becomes more deeply embedded in critical infrastructure, the potential for disruption increases exponentially. This is, of course, part of the wider conversation about accountability and oversight. We're all increasingly reliant on systems that are essentially black boxes. One might be forgiven for thinking that calling these techniques 'testing' is stretching the definition a bit. Surely testing is what one does before deployment, not during? Yet perhaps that's the point. We're entering an era where the testing never truly ends, and constant vigilance is the price of algorithmic progress.

Switching gears a bit, the actor Val Kilmer, whose career was curtailed by throat cancer, is now appearing on screen again thanks to artificial intelligence. Digital techniques have been used to recreate his likeness and voice.
In practice, this means producers can insert a convincing digital representation of Kilmer into new productions even though the real person can no longer perform in the traditional sense. It is more than a deepfake; it is a full resurrection of a performer. The implications are considerable. For one, it raises ethical questions about the use of someone's likeness without their full, ongoing consent. How much control does an actor, or their estate, retain over these digital recreations? And what are the implications for working actors? Will studios opt to revive deceased stars rather than hire new talent? The financial incentives are obvious. This also touches on the broader unease around artificial intelligence and creative industries. We have seen similar debates around artificial intelligence-generated art and music, where the lines between homage, exploitation, and outright theft become increasingly blurred. As artificial intelligence gets better at mimicking human creativity, the value of original human work could diminish, or at least be harder to protect. One wonders, though, if audiences will truly embrace these digital resurrections. There is, after all, something inherently unsettling about seeing a simulated version of someone who is no longer with us performing as if nothing has changed. Perhaps viewers will find it more ghostly than glorious. It certainly raises questions about authenticity that the industry must answer.

That brings us to a more fundamental issue in artificial intelligence development. Yann LeCun, a name familiar to anyone following developments in machine learning, has recently unveiled something called the LeWorldModel. This is intended to tackle a persistent problem in how artificial intelligence systems predict future events. The issue, in essence, is that when artificial intelligence models are trained using raw visual data – pixels, to be precise – they can fall into a trap.
Instead of truly understanding and predicting what's happening, they sometimes resort to simply repeating patterns. It's a bit like a student who, when faced with a difficult exam question, just rewrites the question itself, hoping to score some points. This is known as 'representation collapse', where the model generates redundant embeddings, just repeating itself to meet the prediction goals rather than providing genuinely informative outputs. The goal in avoiding it is to allow artificial intelligence to reason and plan in complex environments. If LeCun's approach proves successful, it could lead to artificial intelligence systems that are not only better at predicting what will happen next but also possess a deeper understanding of their surroundings. This has implications for areas like robotics, where a robot needs to anticipate the consequences of its actions, and video games, where artificial intelligence characters could become more realistic and adaptable. This is not some abstract intellectual exercise. The ability of an artificial intelligence to accurately predict the future has very real consequences for control systems, autonomous vehicles, even financial modelling. If these systems are making decisions based on flawed predictions, the outcomes could be, shall we say, sub-optimal. It's perhaps a little disheartening that one of the core challenges in artificial intelligence remains getting the systems to simply not cheat, to properly understand rather than just mimic. But then again, perhaps we shouldn't be surprised. Isn't that a problem we've been grappling with in human education for centuries?

There's been some interesting tinkering with language models as well, with researchers unveiling a fine-tuning method called TinyLoRA. The claim is that it can achieve decent results using a remarkably small number of parameters – we're talking potentially as few as one.
Now, for the uninitiated, fine-tuning is like giving a language model a specific set of instructions after it's already learned the basics. Parameters, in this context, are the variables the model adjusts during training to improve its performance. The general assumption has been that more parameters equal better reasoning, yet TinyLoRA throws a wrench in that idea. By employing what they call 'extreme sharing settings', the model can apparently function with minimal parameters while still maintaining a reasonable level of performance. Think of it as a highly efficient engine, squeezing every last drop of power from a tiny fuel tank. The implications are fairly significant, if this holds up under further scrutiny. It suggests that current model training practices might be unnecessarily resource-intensive. If TinyLoRA or something like it proves viable, it could lead to a shift in how models are developed, potentially saving considerable time and energy. There's also a financial incentive here, since training large models is becoming increasingly expensive and consumes vast amounts of electricity. This also ties into the broader discussion around transparency and efficiency. If we can achieve comparable results with smaller, more easily understood models, it could make artificial intelligence more accessible and less of a black box. Of course, it also raises questions about what we're actually measuring with these benchmarks. Are we truly capturing genuine reasoning ability, or are we simply rewarding models that are good at pattern recognition? It's tempting to get carried away with the possibilities, yet I'm old enough to remember when cold fusion was just around the corner. The artificial intelligence world, like any other, has a habit of getting excited about things that later turn out to be less revolutionary than initially advertised. Still, the idea of achieving high accuracy with fewer parameters is intriguing, if only because it challenges the prevailing wisdom.
Perhaps we're on the verge of a more elegant approach to artificial intelligence.

In a similar vein, the latest refinement in artificial intelligence development involves a technique called Paged Attention, designed to optimise the use of GPU memory when running large language models. Now, these models require substantial memory to track data at the token level for each request they process. The conventional approach allocates a large, fixed amount of memory for each request, anticipating the longest possible sequence it might handle. This often leads to significant wastage, as memory sits idle when it's not fully utilised. Paged Attention, in contrast, dynamically allocates memory as needed. Think of it like virtual memory on your computer, yet applied to artificial intelligence processing. The goal is to minimise unused slots and maximise the concurrent tasks that can run without hitting memory constraints. The implications of this are fairly straightforward. By using GPU memory more efficiently, developers can potentially scale artificial intelligence models further and improve performance without necessarily requiring more expensive hardware. This is particularly relevant in contexts where resources are limited or where maximising throughput is critical. It's less about adding more data, and more about squeezing more work out of the silicon you already have. We've seen similar efforts in other areas of computing, where optimising memory management has led to significant gains in efficiency and performance. This is part of a wider trend of refining existing artificial intelligence models rather than simply making them bigger. There's a growing recognition that brute force scaling isn't always the most effective or sustainable approach. Of course, the proof will be in the pudding. Whether Paged Attention truly delivers on its promises remains to be seen.
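To make the virtual-memory comparison concrete, here is a toy sketch of the bookkeeping involved. It is an illustration only and not any real serving library's implementation: the class name `PagedKVCache`, the block size of 16 tokens, the pool of 64 blocks, the three request lengths, and the 2,048-token maximum used for the fixed scheme are all assumptions picked for the example. Each request gets a small table of memory blocks, and a new block is claimed only when the previous one is full.

```python
class PagedKVCache:
    """Toy block allocator: memory is handed out one block at a time."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # request id -> list of block ids it owns
        self.lengths = {}       # request id -> number of tokens stored

    def append_token(self, request_id):
        length = self.lengths.get(request_id, 0)
        table = self.block_tables.setdefault(request_id, [])
        # A new block is needed only when the request has none yet,
        # or when its last block is exactly full.
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free blocks left")
            table.append(self.free_blocks.pop())
        self.lengths[request_id] = length + 1

    def release(self, request_id):
        # Finished requests return their blocks to the shared pool.
        self.free_blocks.extend(self.block_tables.pop(request_id, []))
        self.lengths.pop(request_id, None)

    def slots_reserved(self):
        blocks_in_use = sum(len(t) for t in self.block_tables.values())
        return blocks_in_use * self.block_size


# Hypothetical workload: three requests of very different lengths.
request_lengths = {"a": 5, "b": 40, "c": 100}
cache = PagedKVCache(num_blocks=64, block_size=16)
for request_id, n_tokens in request_lengths.items():
    for _ in range(n_tokens):
        cache.append_token(request_id)

paged_slots = cache.slots_reserved()
# Fixed scheme (assumed): every request reserves room for 2,048 tokens up front.
fixed_slots = len(request_lengths) * 2048
print(f"paged: {paged_slots} slots, fixed: {fixed_slots} slots")
```

In this made-up workload the paged scheme reserves 176 token slots for 145 tokens actually stored, against 6,144 under the fixed scheme. The only waste is the partly filled last block of each request, which is why the block size is a trade-off between bookkeeping overhead and unused slots.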
Yet it does highlight the ongoing effort to make these systems more efficient and less resource-intensive, which is a welcome development in a field often characterised by its insatiable appetite for computing power.

And speaking of efficiency, Google's researchers have announced a new compression technique for large language models, claiming to significantly reduce memory requirements and boost processing speed without losing accuracy. This technique, called TurboQuant, addresses the growing problem of memory bottlenecks in these increasingly large and complex models. In essence, TurboQuant makes the models more compact by quantizing, or compressing, the data they use. Think of it like zipping a large file on your computer. By reducing the size of the Key-Value cache – a crucial part of how these models remember and retrieve information – TurboQuant allows for faster data access and processing. The headline figure is a sixfold reduction in memory use and up to an eightfold increase in speed. The implications of this are considerable. If these claims hold up, it could lead to more efficient and accessible artificial intelligence applications. Imagine running complex artificial intelligence models on devices with limited resources, or processing vast amounts of data much faster than currently possible. This could benefit a wide range of industries, from healthcare to finance, where rapid data analysis is crucial. The drive for greater efficiency is a recurring theme in the field. As models grow ever larger, the computational demands become increasingly unsustainable. Optimisation techniques like TurboQuant are essential if we want to avoid a scenario where only the largest corporations can afford to train and deploy these advanced artificial intelligence systems. Of course, one should always treat claims of zero accuracy loss with a degree of scepticism. There's usually a trade-off somewhere, and the real-world performance may not always match the lab results.
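For a feel of where that trade-off comes from, here is a minimal sketch of plain uniform quantization. To be clear, this is not TurboQuant's method, whose details aren't covered here; it is the generic textbook idea of storing each number as an 8-bit code plus a shared offset and step size. The function names, the 1,000 random values, and the byte counts (assuming the originals are 32-bit floats) are all choices made for this example.

```python
import random

def quantize(values, bits=8):
    """Map floats onto evenly spaced integer codes between min and max."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1                      # 255 steps for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]

# Stand-in for cached values: 1,000 random numbers (purely illustrative).
rng = random.Random(0)
values = [rng.gauss(0, 1) for _ in range(1000)]

codes, lo, scale = quantize(values)
restored = dequantize(codes, lo, scale)
max_error = max(abs(v - r) for v, r in zip(values, restored))

float32_bytes = 4 * len(values)        # 4 bytes per value as 32-bit floats
quantized_bytes = 1 * len(values) + 8  # 1 byte per code, plus lo and scale
print(f"{float32_bytes} bytes -> {quantized_bytes} bytes, "
      f"max round-trip error {max_error:.4f} (step size {scale:.4f})")
```

Storage drops roughly fourfold in this sketch, and each restored value can be off by up to half a step. Whether that error matters depends entirely on what the numbers are used for downstream, which is why a claim of no accuracy loss is a statement about end results on particular tests rather than about the stored values themselves.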
That said, if TurboQuant delivers on its promises, it could be a significant step towards more efficient and sustainable artificial intelligence.

Nvidia has also released an updated version of its Nemotron language model, called Cascade 2. The claim is that it offers enhanced reasoning capabilities while using fewer computing resources than similar models. The important detail here is the notion of 'intelligence density'. Typically, larger language models require vast amounts of computing power to operate effectively. Nemotron-Cascade 2 aims to achieve comparable performance by activating only a portion of its

Another week gone, another deluge of claims about what artificial intelligence will achieve by Tuesday. Sorting the signal from the noise becomes ever more crucial. For a daily distillation of developments, you can sign up for the artificial intelligence briefing at jonathan-harris dot online. And if you're looking for a slightly longer perspective, my book "Beyond Earth: How artificial intelligence Is Transforming Space Exploration" might be of interest. It's available at books dot jonathan-harris dot online slash ai-space. It's a look at the facts, rather than the aspirations. That's your lot for this week's Turing's Torch. If you want the daily brief, head to jonathan-harris dot online. Same time next week – try not to believe the press releases.