Oliver Eberle and his colleagues from TU Berlin, UCLA, UCSD, and Microsoft Research make a compelling case for a fundamental shift in AI research in "Position: We Need An Algorithmic Understanding of Generative AI":
What algorithms do LLMs actually learn and use to solve problems? Studies addressing this question are sparse, as research priorities are focused on improving performance through scale, leaving a theoretical and empirical gap in understanding emergent algorithms. This position paper proposes AlgEval: a framework for systematic research into the algorithms that LLMs learn and use.
The field of AI has been in a gold rush, driven by the belief that scaling models is the primary path to greater intelligence. It’s a compelling idea, and the results have been remarkable. That said, this paper argues that we are building ever-more powerful engines without truly understanding the principles of their internal combustion. We can observe their behaviour, but we lack a deep, principled understanding of the actual algorithms these models invent and execute to solve problems. What does an LLM do, step by step, when it reasons through a task? I suspect that for most systems, we genuinely don't know.
This paper argues for a re-centring of our efforts, moving from a focus on scale to a focus on "algorithmic understanding." The proposal is to treat these models not as magical black boxes to be coaxed with clever prompts, but as computational systems that can be scientifically dissected. This means forming hypotheses about the kinds of algorithms a model might be using—is it performing a classic tree search, or has it learned some other heuristic?—and then using interpretability tools to verify them. It's a shift from being mystics to being scientists, creating a vocabulary of the "algorithmic primitives" that models learn and analysing the grammar they use to combine them.
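To make that hypothesis-testing loop concrete, here is a minimal sketch of the idea, not the paper's actual methodology: propose candidate algorithms (here, BFS and DFS on a toy graph), predict the node-visit order each would produce, and score how well the model's observed reasoning trace agrees with each prediction. The `model_trace` list is purely illustrative; in a real study it would be parsed from a model's chain of thought or decoded from its internal representations, not hard-coded.

```python
from collections import deque

def bfs_order(graph, start):
    """Node-visit order of breadth-first search from `start`."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order

def dfs_order(graph, start):
    """Node-visit order of iterative depth-first search from `start`."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(reversed(graph[node]))  # preserve left-to-right neighbour order
    return order

def trace_agreement(model_trace, algo_trace):
    """Fraction of positions where the model's trace matches the algorithm's prediction."""
    pairs = list(zip(model_trace, algo_trace))
    return sum(m == a for m, a in pairs) / len(pairs) if pairs else 0.0

# Toy navigation problem (hypothetical): which search does the model's trace resemble?
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": [],
    "E": ["F"],
    "F": [],
}

# Placeholder for nodes extracted from an actual model's reasoning.
model_trace = ["A", "B", "C", "D", "E", "F"]

for name, hypothesis in [("BFS", bfs_order), ("DFS", dfs_order)]:
    score = trace_agreement(model_trace, hypothesis(graph, "A"))
    print(f"{name} hypothesis: {score:.0%} positional agreement")
```

A surface-level string match like this is only a stand-in; the framework the paper argues for would draw its evidence from interpretability tools probing the model's internal computation, with candidate algorithmic primitives as the hypotheses being tested.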
The implications of this approach are massive. Without understanding the underlying algorithms, ensuring safety and alignment feels more like hope than engineering. As the paper's own case study on graph navigation shows, LLMs don't necessarily default to the neat, human-designed algorithms we might expect; they develop their own emergent, and sometimes messy, strategies. Understanding these learned processes is the only sustainable path toward building more robust, efficient, and trustworthy AI. It points to a future where the craft is less about prompt engineering and more about a rigorous, almost cognitive, science of dissecting and directing the reasoning of our artificial counterparts.