Different AI labs have different priorities. For example, OpenAI has traditionally focused on consumer users, while its competitor Anthropic tends to target enterprises. We recently discovered that Elon Musk’s xAI puts a special emphasis on video game walkthroughs.
On Friday, Business Insider’s Grace Kay published a detailed and wide-ranging report on xAI, the AI startup recently acquired by SpaceX, with a focus on how Musk is making life difficult for employees. But one anecdote stood out:
In one instance last year, the model’s release was delayed for several days because Musk was unhappy with how the chatbot answered detailed questions about the video game “Baldur’s Gate,” according to people familiar with the matter. High-level engineers were pulled from other projects to improve pre-launch responses, they said.
Of course, you can imagine the frustration of any respected and experienced engineer who shows up to work thinking they’re going to solve fundamental problems of knowledge and machine intelligence only to be drafted to help a 54-year-old beat his video game. But the anecdote raises an even more pressing question: Did Musk get the gaming skills he wanted?
To find out, our resident RPG enthusiast Ram Iyer put together a set of five general questions about Baldur’s Gate, which we put to Grok and the three other major chatbots in a sort of quasi-benchmark that I’ve decided to call BaldurBench.
In the interest of journalistic transparency, I’ve published all chat transcripts so you can see them here: Grok, ChatGPT, Claude and Gemini.
First, the good news: Grok actually provides some pretty good information. Its answers were a bit dense with gamer jargon (“save-scumming” rather than reloading saves, “DPS” rather than damage), but they were useful and well-informed if you knew what it was talking about. Grok also really loves spreadsheets and theory, which is about what you’d expect.
There are a lot of Baldur’s Gate guides, and the models generally drew from the same ones, so the biggest differences were stylistic. ChatGPT prefers bulleted lists and sentence fragments, while Gemini loves to put important words in bold.
The biggest surprise was Claude, which was the most careful about not giving me information that would spoil my experience of the game. When I asked it about good party compositions, it closed its guide with “don’t stress too much and play what you enjoy.” Thanks, Claude!
It’s important to note that, thanks to Business Insider’s reporting, we know this is an area where xAI specifically focused on achieving parity. So we shouldn’t read too much into the fact that, after the reported sprint, Grok’s advice turned out roughly on par with the other models’. Still, it’s nice to know that xAI can keep up when it tries.