As we know, the DeepSeek R1 foundation model has been making headlines, with some even calling it the Sputnik moment for AI. I beg to differ, and here is why. Before discussing the pros and cons of R1, we need to take a step back and zoom out to understand where we are in the AI market. I presented a keynote at EF in 2019 on the future of AI and blockchain, and my thesis remains true. As part of that thesis, I demonstrated that AI will eventually converge and that brute force and excessive computing do not necessarily yield better models; in fact, the benefits taper off dramatically.
I firmly believe that a state-of-the-art R&D lab can build a foundation model or large language model (LLM) for less than $10 million with a team of fewer than five ML researchers/engineers, and even build a $1 billion company on top of it, by using optimization and agents arranged in a hierarchical chain of command to impose a high degree of order on the model. I explore that idea by first explaining why brute-force scaling runs into diminishing returns, then illustrating the argument with concrete examples and breakthrough research, and finally reinforcing the point that lean, efficient systems represent the future of AI innovation. Ultimately, this means that anyone, regardless of scale, can challenge the incumbents in the AI space.
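To make the “hierarchical chain of command” idea more tangible, here is a deliberately simplified C++ sketch of a planner agent delegating sub-tasks to specialised workers and composing their results. The class names and the naive round-robin decomposition are purely illustrative assumptions of mine, not a description of any particular agent framework or of DeepSeek’s internals.

```cpp
// Hypothetical sketch of a hierarchical chain of command: a planner agent
// decomposes a task, delegates sub-tasks to specialised workers, and
// aggregates their answers. Illustrative only.
#include <iostream>
#include <string>
#include <vector>

// A worker agent handles one narrow capability (e.g. retrieval, reasoning).
struct WorkerAgent {
    std::string role;
    explicit WorkerAgent(std::string r) : role(std::move(r)) {}
    // In a real system this would call a small, specialised model.
    std::string run(const std::string& subtask) const {
        return "[" + role + "] handled: " + subtask;
    }
};

// The planner sits at the top of the chain of command: it splits the work,
// dispatches it, and composes the final answer.
struct PlannerAgent {
    std::vector<WorkerAgent> workers;
    std::string solve(const std::string& task) const {
        std::string report = "Plan for: " + task + "\n";
        // Naive round-robin decomposition; a real planner would reason about
        // which worker is best suited for each sub-task.
        for (size_t i = 0; i < workers.size(); ++i) {
            std::string subtask = task + " (part " + std::to_string(i + 1) + ")";
            report += workers[i].run(subtask) + "\n";
        }
        return report;
    }
};

int main() {
    PlannerAgent planner{{WorkerAgent("retrieval"), WorkerAgent("reasoning"),
                          WorkerAgent("verification")}};
    std::cout << planner.solve("summarise the scaling-law literature");
}
```

The design point is the hierarchy itself: a small top-level model only has to plan and verify, while cheap specialised workers do the bulk of the token-level work.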
The principle behind diminishing returns is elegantly captured by the “law of J, S, and D curves.” Initially, models may experience exponential growth (a rapid “J curve”), but as more resources are poured into the system, improvements level off into an “S curve” and may eventually decline, entering a “D curve.” This behavior, seen in phenomena ranging from the self-similarity of fractals to Pareto’s law in economics, indicates that additional compute delivers only marginal benefits beyond a certain threshold. Scaling laws for neural language models further support this observation, showing that as models grow larger, their performance improvements adhere to predictable power-law relations. In essence, it becomes more effective to focus on optimization and structured design rather than on expanding hardware capacity indiscriminately.
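For readers who want the formula behind that claim, the scaling-law literature (e.g., Kaplan et al., 2020, “Scaling Laws for Neural Language Models”) fits test loss as a power law in parameter count N, dataset size D, and compute C. The exponents below are the ones reported in that paper; other studies and setups find different values, so treat them as indicative rather than universal:

```latex
% Power-law fits from Kaplan et al. (2020); exponents are setup-dependent.
L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076
\qquad
L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095
\qquad
L(C_{\min}) \approx \left(\tfrac{C_c}{C_{\min}}\right)^{\alpha_C^{\min}}, \quad \alpha_C^{\min} \approx 0.050
```

Because these exponents are small, each order of magnitude of extra parameters, data, or compute shaves off only a modest fraction of the remaining loss, which is exactly the tapering that the J/S/D framing describes.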
A groundbreaking insight emerging from recent research is that we are witnessing a convergence in intelligence driven by both compute cost and model quality. Beyond a critical point, additional compute becomes more expensive relative to the incremental gains it delivers, forcing innovators to develop smarter, more efficient algorithms. Many now hypothesize that this convergence reflects a fundamental property of complex systems, where power-law dynamics and inherent stochastic processes drive models toward an optimal architecture that intrinsically links quality with cost-efficiency. This revelation is truly mind-blowing because it suggests that the future of AI will be defined not by raw scale but by elegant design.
Concrete examples from the industry vividly illustrate these points. Consider DeepSeek-R1, a breakthrough reasoning model built on the DeepSeek-V3 base model, which was trained on 14.8 trillion tokens using 2,048 Nvidia H800 GPUs. Despite consuming around 2.788 million GPU hours at a reported cost of approximately $5.58 million, its success was not due solely to vast compute power. The model uses a sparse Mixture-of-Experts design that activates only a fraction of its parameters for each token (roughly 37 billion of 671 billion), reducing redundant processing and maximizing efficiency. This smart allocation of resources demonstrates that targeted optimization enables even modest budgets to achieve state-of-the-art performance.
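Here is a minimal C++ sketch of the general top-k Mixture-of-Experts routing idea behind that kind of sparse activation. It is a generic illustration under my own simplifying assumptions (a toy Expert type and a plain softmax over the selected router logits), not DeepSeek’s actual router, expert layout, or load-balancing scheme.

```cpp
// Generic top-k Mixture-of-Experts routing sketch: the router scores every
// expert, but only the top-k experts actually run for a given token.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>

using Vec = std::vector<float>;

// Tiny stand-in for an expert network: here just a scaled identity.
struct Expert {
    float scale;
    Vec forward(const Vec& x) const {
        Vec out(x.size());
        for (size_t i = 0; i < x.size(); ++i) out[i] = scale * x[i];
        return out;
    }
};

// Route a token through only the top-k experts, weighted by a softmax over
// the selected router logits.
Vec moeForward(const Vec& token, const Vec& routerLogits,
               const std::vector<Expert>& experts, size_t k) {
    // Rank experts by router score and keep the top-k indices.
    std::vector<size_t> idx(experts.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](size_t a, size_t b) { return routerLogits[a] > routerLogits[b]; });

    // Softmax over the selected logits only (shifted by the max for stability).
    float maxLogit = routerLogits[idx[0]];
    Vec weights(k);
    float sum = 0.0f;
    for (size_t j = 0; j < k; ++j) {
        weights[j] = std::exp(routerLogits[idx[j]] - maxLogit);
        sum += weights[j];
    }

    // Only the chosen experts run; the rest contribute nothing and cost nothing.
    Vec out(token.size(), 0.0f);
    for (size_t j = 0; j < k; ++j) {
        Vec y = experts[idx[j]].forward(token);
        for (size_t i = 0; i < out.size(); ++i) out[i] += (weights[j] / sum) * y[i];
    }
    return out;
}

int main() {
    std::vector<Expert> experts{{0.5f}, {1.0f}, {2.0f}, {4.0f}};
    Vec token{1.0f, 2.0f, 3.0f};
    Vec logits{0.1f, 2.0f, 1.5f, -0.3f};  // router prefers experts 1 and 2
    for (float v : moeForward(token, logits, experts, 2)) std::cout << v << ' ';
    std::cout << '\n';
}
```

The point of the sketch is the cost structure: the router scores every expert, but only k of them execute, so per-token compute grows with k rather than with the total parameter count.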
Every Sunday afternoon, I play tournaments at the London Chess Club. Recently, I stumbled upon a YouTube video titled “Wall St Gambit.” Intrigued by a gambit I had never heard of, I clicked on it. It turned out to be about a competition organized by Kaggle and FIDE, challenging participants to create agents that play chess under strict resource constraints. In this simulation, competitors must develop an agent that operates effectively within severe CPU and memory limits. The challenge illustrates that novel, optimized techniques can tame enormous complexity, not only in chess but also in modeling and inference more broadly, well beyond traditional heuristic-based algorithms.
Additional evidence comes from the realm of chess AI. In competitions with stringent resource limits, such as agents restricted to 5 MiB of RAM, a single 2.20 GHz CPU core, and a 64 KiB submission size cap, developers have crafted algorithms that excel despite severe constraints. A plain Minimax search is computationally intensive because it explores every branch depth-first; alpha-beta pruning and Monte Carlo Tree Search (MCTS) cut that work down, but under limits this tight they still need aggressive optimization. To test my approach against the engine, I built a small but robust agent around efficient, well-optimized search heuristics: null move pruning, internal iterative reductions, late move pruning, reverse futility pruning, quiescence search, and razoring. I saw effective results while running at about 4 MiB, under the 5 MiB RAM limit; the agent was written in C++ for performance.
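As a rough illustration of how a few of those heuristics fit into a search loop, here is a compact C++ sketch of a negamax-style alpha-beta search with null move pruning, reverse futility pruning, and quiescence search. The Board interface, the pruning margins, and the reduction depths are placeholder assumptions for the sketch, not my competition code; a real engine under a 64 KiB submission cap would tune all of them carefully and add the remaining techniques (late move pruning, razoring, internal iterative reductions).

```cpp
// Sketch of a negamax alpha-beta search with three of the pruning techniques
// mentioned above. The Board interface is a hypothetical placeholder;
// evaluate() is assumed to score the position from the side to move's view.
#include <algorithm>
#include <vector>

struct Move { int from = 0, to = 0; };

struct Board {
    virtual std::vector<Move> legalMoves() const = 0;
    virtual std::vector<Move> captureMoves() const = 0;  // for quiescence
    virtual void make(const Move& m) = 0;
    virtual void unmake(const Move& m) = 0;
    virtual void makeNullMove() = 0;                     // pass the turn
    virtual void unmakeNullMove() = 0;
    virtual bool inCheck() const = 0;
    virtual int  evaluate() const = 0;
    virtual ~Board() = default;
};

// Quiescence search: keep exploring captures so the static evaluation is
// never taken in the middle of a tactical exchange.
int quiescence(Board& b, int alpha, int beta) {
    int standPat = b.evaluate();
    if (standPat >= beta) return beta;
    alpha = std::max(alpha, standPat);
    for (const Move& m : b.captureMoves()) {
        b.make(m);
        int score = -quiescence(b, -beta, -alpha);
        b.unmake(m);
        if (score >= beta) return beta;
        alpha = std::max(alpha, score);
    }
    return alpha;
}

int alphaBeta(Board& b, int depth, int alpha, int beta) {
    if (depth <= 0) return quiescence(b, alpha, beta);

    int staticEval = b.evaluate();

    // Reverse futility pruning: at shallow depth, if the static evaluation
    // already beats beta by a depth-scaled margin, assume a fail-high.
    if (depth <= 3 && !b.inCheck() && staticEval - 120 * depth >= beta)
        return staticEval;

    // Null move pruning: hand the opponent a free move; if a reduced-depth
    // search still beats beta, the position is strong enough to cut off.
    if (depth >= 3 && !b.inCheck()) {
        b.makeNullMove();
        int score = -alphaBeta(b, depth - 3, -beta, -beta + 1);
        b.unmakeNullMove();
        if (score >= beta) return beta;
    }

    for (const Move& m : b.legalMoves()) {
        b.make(m);
        int score = -alphaBeta(b, depth - 1, -beta, -alpha);
        b.unmake(m);
        if (score >= beta) return beta;  // beta cutoff
        alpha = std::max(alpha, score);
    }
    return alpha;
}
```

Every one of these tricks exists to skip work the full tree would otherwise force on you, which is precisely why such an agent can fit into a few mebibytes of RAM and a single core.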
The conversation about efficiency is further enriched by recent discussions of scaling laws, which indicate that as models scale, their performance improvements follow predictable power-law relations, a pattern observed in many fields, from physics to biology. This perspective helps explain why DeepSeek-V3 is generating buzz in the LLM community: its architecture and optimization techniques set new standards for efficiency, with the team reportedly reaching below the standard CUDA toolchain to hand-tuned, lower-level GPU code for some performance-critical kernels. These results reinforce that dramatic performance improvements come from lean design and innovative thinking rather than from dependence on ever-larger hardware budgets.
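To see how small those power-law exponents make the returns on extra compute, here is a quick back-of-the-envelope calculation using the compute exponent quoted earlier; the exact exponent varies by study and training setup, so the numbers are illustrative only:

```latex
% With L(C) \propto C^{-0.050}, multiplying compute by 10x or 100x gives:
\frac{L(10C)}{L(C)} = 10^{-0.050} \approx 0.89, \qquad
\frac{L(100C)}{L(C)} = 100^{-0.050} \approx 0.79
```

In other words, under this fit an order of magnitude more compute buys roughly an 11% reduction in loss, and two orders of magnitude buy roughly 21%, which is exactly why smarter architectures and training recipes matter more than raw scale past a certain point.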
Despite skepticism from those who insist that only massive budgets and large teams can yield groundbreaking AI models, the evidence increasingly supports a leaner approach. Critics argue that high performance necessitates extensive hardware investments, yet the success of systems like DeepSeek-R1 and DeepSeek-V3, along with the ingenious design of efficient chess engines, tells a different story. While industry giants like NVIDIA continue to dominate the hardware, infrastructure, and kernel layers, having built entire ecosystems that resemble cities, the application layer remains nascent and ripe for innovation. In other words, anyone with a smart, efficient strategy can compete with, and potentially outmaneuver, the established players in foundation models and LLMs.
The state of play in AI suggests that the coming years will be defined by the ability to do more with less. As research continues to validate the power-law scaling in neural language models, we can expect breakthroughs that favor intelligent, lean designs over brute-force scaling. This convergence on efficiency not only promises to revolutionize AI but also carries significant market and geopolitical implications. Varying approaches to data privacy, governmental oversight, and the rapid evolution of hardware and software infrastructures are reshaping the competitive landscape. In this emerging environment, the future belongs to those who can maximize performance while minimizing resource expenditure—a future where convergence on intelligence becomes the norm.
In conclusion, the future of AI is not about amassing endless compute power or expanding teams to unsustainable sizes. Instead, it is about strategically deploying resources: leveraging optimization, employing hierarchical structures, and using intelligent design to create models that deliver exceptional performance with minimal inputs. In short, a state-of-the-art R&D lab can build a foundation model or LLM for under $10 million with a few dedicated, experienced researchers, and build a unicorn on top of it. With mounting evidence from scaling laws, breakthrough models like DeepSeek, and ongoing discussions in the AI community, it is clear that efficiency is the new frontier. The convergence on intelligence, doing more with less, is not just an ideal; it is the roadmap for shifting the focus from brute-force computation to elegant and efficient design, proving that anyone can compete with the incumbents in foundation models and LLMs.