The challenge with a deep learning model is often understanding why it does what it does: whether it’s xAI’s repeated battles to tweak Grok’s strange politics, ChatGPT’s struggles with sycophancy, or garden-variety hallucinations, steering a neural network with trillions of parameters isn’t easy.
Today, Guide Labs, a San Francisco startup founded by CEO Julius Adebayo and Chief Scientist Aya Abdelsalam Ismail, says it has an answer to this problem. On Monday, the company released Steerling-8B, an 8-billion-parameter LLM trained with a new architecture designed to make its behavior easy to interpret: every token the model produces can be traced back to its origins in the training data.
This can be as simple as surfacing the reference material behind a fact the model cites, or as complex as understanding how the model represents humor or gender.
“If I have a trillion ways to code gender, and I code it into 1 billion of the 1 trillion things I have, you have to make sure you find all the 1 billion things I coded, and then you have to be able to reliably turn it on and off,” Adebayo told TechCrunch. “You can do it with current models, but it’s very fragile… It’s kind of one of the holy grail questions.”
Adebayo began this work while earning his PhD at MIT, co-authoring a widely cited 2020 paper showing that existing methods for interpreting deep learning models are unreliable. That work ultimately led to a new way of building LLMs: developers inject a concept layer into the model that groups data into observable categories. This requires more data annotation up front, but by using other AI models to do the labeling, the team was able to train this model as the biggest proof of concept to date.
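Guide Labs has not published the details of its concept layer, but the idea resembles a concept-bottleneck design: route the model’s hidden state through a small set of named concept activations before producing output, so each concept lives in a known slot that can be inspected or switched off. The sketch below is a toy NumPy illustration of that general pattern, not Steerling-8B’s actual architecture; all sizes, names, and the `forward`/`ablate` interface are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN, CONCEPTS, VOCAB = 16, 4, 8  # toy sizes; real models are vastly larger

# Hypothetical bottleneck: hidden state -> named concept activations -> logits.
W_concept = rng.normal(size=(HIDDEN, CONCEPTS))
W_out = rng.normal(size=(CONCEPTS, VOCAB))
CONCEPT_NAMES = ["finance", "violence", "humor", "gender"]  # illustrative labels

def forward(hidden, ablate=None):
    """Map a hidden state through the concept bottleneck to output logits.

    `ablate` is a set of concept names to switch off. Because each concept
    occupies a known slot, turning it off is a single, reliable operation
    rather than a hunt through billions of entangled weights.
    """
    concepts = np.tanh(hidden @ W_concept)  # interpretable activations
    if ablate:
        concepts = concepts.copy()
        for name in ablate:
            concepts[CONCEPT_NAMES.index(name)] = 0.0
    return concepts @ W_out  # logits over a toy vocabulary

hidden = rng.normal(size=HIDDEN)
base_logits = forward(hidden)
no_gender_logits = forward(hidden, ablate={"gender"})
# Only the contribution routed through the "gender" slot changes.
```

In this toy setup, the extra annotation cost Adebayo describes corresponds to the supervision needed to make each bottleneck slot line up with a human-meaningful category.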
“The kind of interpretability that people are doing is … neuroscience on a model, and we’re going to flip it,” Adebayo said. “What we’re doing is actually designing the model from the ground up, so you don’t have to do the neuroscience.”
One concern with this approach is that it could suppress some of the emergent behaviors that make LLMs so interesting: their ability to generalize to things they weren’t explicitly trained to do. Adebayo says this still happens in his company’s model: his team tracks what it calls “discovery concepts,” categories the model identified on its own, such as quantum computing.
Adebayo argues that this interpretable architecture is something everyone will need. For consumer-facing LLMs, these techniques should let model builders block the use of copyrighted material or better control output around topics like violence or drug abuse. Regulated industries will require more auditable LLMs, such as in finance, where a model evaluating loan applicants must consider things like financial records, but not race. Interpretability is also needed in scientific work, another area where Guide Labs has developed technology. Protein folding has been a major success for deep learning models, but scientists want to know why their software arrives at successful structures.
“This model shows that training interpretable models is no longer a kind of science; it is now an engineering problem,” Adebayo said. “We’ve got the science down and we can scale them, and there’s no reason why this kind of thing can’t match the performance of frontier-level models” that have many more parameters.
Guide Labs says Steerling-8B achieves 90% of the capabilities of comparable existing models while using less training data, thanks to its new architecture. The next step for the company, which emerged from Y Combinator and raised a $9 million seed round from Initialized Capital in November 2024, is to build a larger model and start offering users API and agent access.
“The way we currently train models is super primitive, so democratizing the inherent interpretability is going to be a good thing for us as a human race in the long run,” Adebayo told TechCrunch. “When we’re going after these models that are going to be super intelligent, you don’t want something that’s mysterious to you to make the decisions for you.”