**LLM Architecture Gallery: A Visual Guide to How Modern AI Models Are Built**
If you’ve ever tried to compare large language models, you know how quickly things get confusing. One model tweaks attention. Another swaps normalization. A third goes full Mixture of Experts, and suddenly you’re knee-deep in diagrams and half-remembered blog posts.
That’s exactly why Sebastian Raschka’s LLM Architecture Gallery feels so refreshing.
It’s not hype. It’s not speculation. It’s a carefully organized **visual collection of model architectures**, pulled from his broader LLM comparisons, complete with fact sheets and links back to detailed explanations. Think of it as a well-labeled map of a landscape that’s been expanding at breakneck speed.
What I love is how it shows patterns.
You can see how **DeepSeek V3’s template** influenced a whole wave of open MoE models. How Meta’s large MoE follows that playbook but sticks to a more conventional attention stack. How Qwen variants either stay close to the DeepSeek recipe or subtly tweak things like shared experts. And then there are dense models like OLMo 3 or Qwen3 that barely change the core decoder recipe at all, which is fascinating in its own way.
Some models lean harder into **local attention** like Gemma. Others experiment with removing positional encodings in certain layers. There’s even a trillion parameter Moonshot model that essentially scales the DeepSeek idea upward. Same skeleton, just… bigger.
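To make the local-attention idea concrete, here is a minimal sketch (my own illustration, not Gemma’s actual implementation) of what a sliding-window mask does: each query position can only attend to a fixed band of the most recent keys, whereas a full causal mask lets it see every earlier position.

```python
def causal_mask(n):
    # Full causal attention: query q may attend to every key k <= q.
    return [[q >= k for k in range(n)] for q in range(n)]

def sliding_window_mask(n, window):
    # Local (sliding-window) attention: query q may attend only to the
    # `window` most recent keys, i.e. k in (q - window, q].
    return [[q >= k and q - k < window for k in range(n)] for q in range(n)]
```

Alternating such local layers with occasional full-attention layers is the kind of hybrid the gallery makes easy to spot at a glance.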
If you’re building, researching, or just trying to understand where all this is heading, this gallery helps you see the evolution. Dense versus sparse. Classic multi-head attention versus hybrids like DeltaNet. Full attention versus alternating local and global setups.
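The dense-versus-sparse distinction comes down to routing. As a toy sketch (scalar “experts” for illustration only, not any model’s real code): a dense block sends every token through the same feed-forward transform, while a top-k MoE layer scores experts with a router, keeps only the top k, and mixes their outputs by renormalized probabilities.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of router scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dense_ffn(x, w):
    # Dense: every token goes through the one shared transform (toy: scale by w).
    return [xi * w for xi in x]

def moe_ffn(x, expert_weights, router_scores, k=2):
    # Sparse MoE: pick the top-k experts per token and mix their outputs
    # by the renormalized router probabilities.
    probs = softmax(router_scores)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in topk)
    out = [0.0] * len(x)
    for i in topk:
        gate = probs[i] / norm
        for j, xi in enumerate(x):
            out[j] += gate * xi * expert_weights[i]
    return out
```

The point of the sparse design is that only k of the experts run per token, so parameter count can grow far faster than per-token compute, which is exactly the trade-off the DeepSeek-style models in the gallery exploit.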
And once you see these design choices side by side, something clicks.
You realize progress isn’t random. It’s iterative. Layer by layer. Choice by choice.
Honestly, having everything in one place makes it easier to spot where the next architectural shift might come from. And if you care about where AI models are headed, that perspective is incredibly valuable.