Opinion
Artificial Intelligence and Machine Learning

The Case for Compact AI

A response to the recent largesse of large language modeling material.


Reading the March 2025 issue of Communications, I was struck by how many articles assume that large language models (LLMs) are the inevitable and best future path for artificial intelligence (AI). Here, I encourage readers to question that assumption.

To be clear: I use LLMs, a lot, for solo and tactical tasks such as condensing my arguments into this editorial response. But for strategic tasks that might be critiqued externally, I need other tools that are faster and simpler, and whose reasoning can be explained and audited. So while I do not want to replace LLMs, I do want to ensure we are also supporting and exploring alternatives.

In software engineering (SE), very few researchers explore alternatives to LLMs. A recent systematic review found that only 5% of hundreds of SE LLM papers considered alternatives.1 This is a major methodological mistake that ignores simpler and faster methods. For instance, UCL researchers found that SVM+TF-IDF methods vastly outperformed standard “Big AI” for effort estimation (100 times faster, with greater accuracy).7
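
For readers unfamiliar with that baseline, a minimal sketch of an SVM+TF-IDF effort estimator looks like the following. This is illustrative only: the data, hyperparameters, and use of scikit-learn are my assumptions here, not the exact pipeline of the UCL study.7

```python
# Minimal sketch of an SVM+TF-IDF effort estimator (illustrative only; not the
# exact pipeline of reference 7). Issue texts and story points are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

texts  = ["add login page", "fix crash on upload", "refactor billing module"]
points = [3, 5, 8]   # known effort labels (story points)

model = make_pipeline(TfidfVectorizer(), SVR(kernel="linear"))
model.fit(texts, points)
print(model.predict(["add logout page"]))   # predicted story points
```

The whole "model" here is a term-weighting matrix plus a support-vector regressor: it trains in milliseconds on a laptop, with no pre-training and no special hardware.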

In SE, one reason for asking “if not LLMs, then what?” is that software often exhibits “funneling”: that is, despite internal complexity, software behavior converges to a few outcomes, enabling simpler reasoning.2,5 Funneling explains how my “BareLogic”3 active learner can build models from very little data for, for example, 63 SE multi-objective optimization tasks from the MOOT repository.4 These tasks are quite diverse and include software process decisions, optimizing configuration parameters, and tuning learners for better analytics. Successful MOOT modeling results in better advice for project managers, better control of software options, and enhanced analytics from learners that are better tuned to the local data.

MOOT includes hundreds of thousands of examples, each with up to 1,000 settings and labeled with up to five effects. In practice, obtaining labels is slow, expensive, and error-prone. Hence, the task of active learners such as BareLogic is to find the best example(s) while requesting the fewest labels.6 To do this, BareLogic labels N = 4 random examples, then (see the sketch after this list):

  1. Scores and sorts labeled examples by “distance to heaven” (where “heaven” is the ideal target for optimization, for example, weight=0, mpg=max).

  2. Splits the sort into √N best and N − √N rest examples.

  3. Trains a two-class Bayes classifier on the best and rest sets.

  4. Finds the unlabeled example X that is most likely best via arg max_X ( log(like(best | X)) − log(like(rest | X)) ).

  5. Labels X, then increments N.

  6. If N < Stop, go to step 1. Else return the top-ranked labeled example and a regression tree built from the N labeled examples.
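
To make this loop concrete, here is a minimal Python sketch of the idea. It is not BareLogic itself (see the source code3); the toy two-setting, two-goal task, the value ranges, and the Gaussian naive Bayes below are purely illustrative assumptions.

```python
# A BareLogic-style active learner, sketched. Toy task: pick settings (x1, x2)
# that minimize weight and maximize mpg, asking an "expensive" oracle for as
# few labels as possible. Everything here is illustrative, not the real tool.
import math, random

random.seed(1)

def label(ex):
    """Expensive oracle: returns (weight, mpg) for an example. Toy formulas."""
    x1, x2 = ex
    return (2000 + 500 * x1 + 100 * x2,    # weight: lower is better
            40 - 5 * x1 + 2 * x2)          # mpg: higher is better

def distance_to_heaven(goals, lo=(2000, 20), hi=(5000, 60), heaven=(0, 1)):
    """Normalize each goal to 0..1, then distance to the ideal point."""
    d = 0.0
    for g, l, h, best in zip(goals, lo, hi, heaven):
        d += ((g - l) / (h - l) - best) ** 2
    return math.sqrt(d / len(goals))

def gaussian_loglike(rows, x):
    """Sum of per-column Gaussian log-likelihoods of example x under `rows`."""
    total = 0.0
    for col in range(len(x)):
        vals = [r[col] for r in rows]
        mu = sum(vals) / len(vals)
        sd = (sum((v - mu) ** 2 for v in vals) / len(vals)) ** 0.5 or 1e-9
        total += -((x[col] - mu) ** 2) / (2 * sd * sd) - math.log(sd * math.sqrt(2 * math.pi))
    return total

pool = [(random.random() * 4, random.random() * 10) for _ in range(1000)]
labeled, stop = {}, 32

for ex in random.sample(pool, 4):                  # label N = 4 random examples
    labeled[ex] = distance_to_heaven(label(ex))

while len(labeled) < stop:
    ranked = sorted(labeled, key=labeled.get)      # 1. sort by distance to heaven
    n_best = max(2, int(math.sqrt(len(ranked))))
    best, rest = ranked[:n_best], ranked[n_best:]  # 2. split into best and rest
    unlabeled = [ex for ex in pool if ex not in labeled]
    x = max(unlabeled, key=lambda ex:              # 3-4. most-likely-best candidate
            gaussian_loglike(best, ex) - gaussian_loglike(rest, ex))
    labeled[x] = distance_to_heaven(label(x))      # 5-6. label it; repeat until Stop

print("best example found:", min(labeled, key=labeled.get))
```

Note what the loop does not need: it asks the oracle for only 32 of the 1,000 candidate labels, and everything else is inferred from the two-class Bayes classifier.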

BareLogic was written for teaching purposes, as a simple demonstrator of active learning. But in a result consistent with “funneling,” this quick-and-dirty tool achieves near-optimal results using only a handful of labels, as shown across 63 tasks by the histogram on the right side of the accompanying figure. Eight labels yielded 62% of the optimal result; 16 labels reached nearly 80%; 32 labels approached 90%; 64 labels barely improved on 32; and so forth.

The lesson here is that state-of-the-art results can be achieved with smarter questioning, not planetary-scale computation. Active learning addresses many common LLM concerns, such as slow training times, excessive energy needs, esoteric hardware requirements, testability, reproducibility, and explainability. The accompanying figure was created without billions of parameters. Active learners need no vast pre-existing knowledge or massive datasets, avoiding the colossal energy and specialized hardware demands of large-scale AI. Further, unlike LLMs, where testing is slow and often irreproducible, BareLogic’s Bayesian active learning is fast (for example, the figure here, covering 63 tasks and 20 repeated trials, was generated in three minutes on a standard laptop). Most importantly, active learning fosters human-AI partnership.

Figure. Twenty runs of BareLogic on 63 multi-objective tasks. Histogram shows mean (1 − (most − b4.min)/(b4.mu − b4.min)), where “most” is the best example returned by BareLogic, “b4” are the untreated examples, and “min” is the optimal example closest to heaven.
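
Stated as code, the figure’s statistic measures how much of the possible improvement was recovered. The numbers below are invented; “most,” “b4.min,” and “b4.mu” are distances to heaven as defined in the caption.

```python
# The figure's statistic: the fraction of possible improvement recovered.
# 1 means BareLogic found the optimal example; 0 means it did no better than
# the mean untreated example. All numbers below are made up for illustration.
def percent_of_optimal(most, b4_min, b4_mu):
    return 1 - (most - b4_min) / (b4_mu - b4_min)

print(percent_of_optimal(most=0.18, b4_min=0.10, b4_mu=0.50))  # ≈ 0.8
```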

Unlike opaque LLMs, BareLogic’s results are explainable via small labeled sets (for example, N = 32). Whenever a label is required, humans can understand and guide the reasoning. The resulting tiny regression-tree models offer concise, effective, and generalizable insights.
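
As a rough illustration of that explainability (a sketch only, using scikit-learn’s tree learner as a stand-in for BareLogic’s own code, with invented data): a depth-limited regression tree fit to 32 labeled examples prints as a handful of human-readable rules.

```python
# Stand-in illustration (not BareLogic's own tree code): a tiny regression tree
# fit to N = 32 labeled examples stays small enough to read, audit, and discuss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.random((32, 3))                  # 32 labeled examples, three settings
y = 2 * X[:, 0] + X[:, 1]                # invented "distance to heaven" scores

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["setting1", "setting2", "setting3"]))
```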

Active learning provides a compelling alternative to sheer scale in AI. Its ability to deliver rapid, efficient, and transparent results fundamentally questions the “bigger is better” assumption dominating current thinking about AI. It tells us that intelligence requires more than just size.

I am not the only one proposing weight loss for AI. The success of LLM distillation (shrinking huge models for specific purposes8) shows that giant models are not always necessary. Active learning pushes this idea even further, showing that leaner, smarter modeling can achieve great results. So why not, before we build the behemoth, try something smaller and faster?

    References

    • 1. Hou, X. et al. Large language models for SE: A systematic literature review. TOSEM 33, 8 (Sept. 2024).
    • 2. Lustosa, A. et al. More Signal: DRR for Better Optimizations of SE Tasks. (2025); arXiv:2503.21086 [cs.SE] https://bit.ly/4ljC8oc.
    • 3. Menzies, T. BareLogic Python Source Code (2024); https://bit.ly/3FPgoBz.
    • 4. Menzies, T. MOOT = Many multi-objective optimization tests (2024); https://bit.ly/4ewuRPP.
    • 5. Menzies, T., Owen, D., and Richardson, J. The strangest thing about software. Computer 40, 1 (2007).
    • 6. Settles, B. Active learning literature survey. Technical Report 1648, University of Wisconsin–Madison Dept. of CS (2009).
    • 7. Tawosi, V., Moussa, R., and Sarro, F. Agile effort estimation: Have we solved the problem yet? IEEE Trans. SE 49, 4 (2023), 2677–2697.
    • 8. Zeming, L. et al. Survey on knowledge distillation for large language models. ACM Trans. on Intelligent Systems and Technology (2024).
