Kimi K2 is INSANE... (Open-Source is BACK!)
- This might be the next DeepSeek moment.
- A Chinese company just released an open-source model called Kimi K2.
- The model is taking the industry by storm with its remarkably smooth training loss curve.
- It has 1 trillion total parameters and boasts incredible performance.
- Kimi K2 is designed for coding, reasoning, and autonomous problem solving.
This might be the next DeepSeek moment. A Chinese company just released another open-source model, called Kimi K2, and it is taking the industry by storm. The reason is this graph right here: the training loss curve. People are surprised by how smooth it is. Typically you get spikes that cause issues you need to correct, but for Kimi it was almost flawless.
And here's the especially cool thing: it has a trillion total parameters. That is a massive model! They came up with a new approach, implemented it, and it worked, very similar to how DeepSeek was insanely efficient, more efficient than anything we had really seen before. It trained really well.
What does that actually mean? Well, first of all, it is a massive open-source model that performs incredibly well. Kimi K2 is a state-of-the-art mixture-of-experts language model with 32 billion activated parameters and 1 trillion total parameters.
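To make the "32 billion activated out of 1 trillion total" distinction concrete: in a mixture-of-experts layer, a gate picks only a few experts per token, so most parameters sit idle on any given forward pass. Here is a minimal top-k gating sketch in pure Python; the function name and numbers are illustrative assumptions, not Moonshot's actual router.

```python
import math

def top_k_gate(logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    logits: one gating score per expert for the current token.
    Returns [(expert_index, weight), ...] with weights summing to 1.
    Only these k experts run, which is why activated parameters
    are a small fraction of total parameters.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                      # subtract max for stability
    exps = [math.exp(logits[i] - m) for i in top]
    s = sum(exps)
    return [(i, e / s) for i, e in zip(top, exps)]

# Example: 4 experts, route each token to the top 2.
routing = top_k_gate([0.1, 2.0, -1.0, 3.0], k=2)
```

With hundreds of experts and a small k, the activated-to-total ratio lands in the same ballpark as K2's 32B/1T.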
And here's the key: it's trained with the Muon optimizer, and it achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities. So it is incredibly good at coding, at multi-agent workflows, and at tool calling. It was pre-trained on 15.5 trillion tokens with zero training instability.
They used the MuonClip optimizer at an unprecedented scale, developing novel optimization techniques to resolve instabilities while scaling up. This model is specifically designed for tool use, reasoning, and autonomous problem solving.
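As a rough illustration of what makes Muon-style optimizers different: instead of applying raw momentum, Muon approximately orthogonalizes the momentum matrix with a Newton-Schulz iteration before updating the weights. The sketch below shows just that orthogonalization step in pure Python, with the quintic coefficients used in public Muon implementations; MuonClip's additional QK-clip stabilization is not shown, and none of this is Moonshot's actual code.

```python
import math

def matmul(A, B):
    """Naive matrix multiply, fine for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def newton_schulz(G, steps=5):
    """Approximately orthogonalize the rows of G (assumes rows <= cols).

    Iterates X <- a*X + (b*A + c*A^2) X with A = X X^T, which drives
    the singular values of X toward 1 without computing an SVD.
    Coefficients are the quintic ones from public Muon implementations.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    norm = math.sqrt(sum(x * x for row in G for x in row)) + 1e-7
    X = [[x / norm for x in row] for row in G]          # normalize spectral scale
    for _ in range(steps):
        Xt = [list(r) for r in zip(*X)]
        A = matmul(X, Xt)                               # A = X X^T (small square)
        AA = matmul(A, A)
        B = [[b * A[i][j] + c * AA[i][j] for j in range(len(A))] for i in range(len(A))]
        BX = matmul(B, X)
        X = [[a * X[i][j] + BX[i][j] for j in range(len(X[0]))] for i in range(len(X))]
    return X

X = newton_schulz([[1.0, 2.0, 0.5], [0.3, -1.0, 1.5]])
```

After a few iterations, X @ X^T is close to the identity, i.e. the update direction has roughly orthonormal rows regardless of how skewed the raw gradient momentum was.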
According to Crystal, who is on the Kimi team at Moonshot AI, Kimi supports up to 2 million tokens in the context window. She also said the entire AI lab is only about 200 people. The Kimi website doesn't support that context length directly yet, but they have tested it.
There was a little bit of quality loss, but it is very possible. So we have two versions, the base and the instruct, but you know what we don't have? A reasoning version. Now that it's open source, everybody has their hands on it, and lots of reasoning versions of Kimi K2 are surely coming soon.
Let's look at the benchmarks; they are stunning. This is a frontier-level model. Here is SWE-bench Verified: Kimi K2 Instruct beats DeepSeek, beats Qwen, beats GPT-4.1, and comes in right behind Claude 4 Opus, widely regarded as the best coding model on the planet.
SWE-bench Multilingual: once again it beats all of those other models, coming in right behind Claude 4 Sonnet. LiveCodeBench: it actually beats Claude 4 Opus and Gemini 2.5 Flash, coming in at 53.7. OJBench: it beats all the other models on the list.
Here's AIME 2025 for math: coming in number one, above Claude 4 Opus and Gemini 2.5 Flash, and again without a reasoning version. GPQA Diamond: coming in at 75.1, above Claude 4 Opus and Gemini 2.5 Flash. I am so excited to test this model, and it is completely open source.
Open weights. The training process was opened up, and they are going to release a research paper on it soon. It is amazing. If you want the full set of benchmarks, go to their Hugging Face model card. They have everything there: Aider Polyglot, AceBench, AIME 2024 and 2025, MATH-500, PolyMath, GPQA Diamond, Humanity's Last Exam, MMLU-Pro, and so much more.
And there are plenty of inference providers already loading this up and serving it. If you want to get the most out of Kimi K2 and other models, you need to optimize your prompt engineering. You can do that with Humanity's Last Prompt Engineering Guide, created by myself and my team.
It is completely free, and it teaches you all the best prompt engineering tips and tricks. Link in the description below. You can also get inference through Kimi directly: $0.15 per million input tokens with a cache hit, $0.60 per million input tokens without, and $2.50 per million output tokens.
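Those per-million-token rates are easy to sanity-check with a quick cost estimator. This is a throwaway helper with the figures above hard-coded, not an official SDK, and actual billing (caching rules, rounding) is whatever the provider specifies.

```python
def kimi_api_cost(input_tokens, output_tokens, cache_hit=False):
    """Rough USD cost of one request at the quoted Kimi K2 rates."""
    input_rate = 0.15 if cache_hit else 0.60   # USD per 1M input tokens
    output_rate = 2.50                         # USD per 1M output tokens
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate

# Example: a 1M-token prompt (no cache hit) with a 100K-token response.
cost = kimi_api_cost(1_000_000, 100_000)
```

At these prices a full million-token prompt plus a long answer still comes in under a dollar, which is the "as cheap as Gemini Flash" point made below.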
The weights, the technical blog, and the GitHub page are all open and available right now, and if you just want to try it without the API, go to Kimi.ai immediately. Now, a few words from industry experts and AI leaders.
Here's Sebastian Raschka: Kimi K2 is basically DeepSeek V3, but with fewer heads and more experts. And again, I just cannot wait until they actually give it chain-of-thought reasoning ability.
Yuchen Jin says: Holy s***, Kimi K2 was pre-trained on 15.5 trillion tokens using MuonClip with zero training spikes. They have officially scaled to the 1-trillion-parameter LLM level. Many doubted it could scale, but here we are.
Deedy says: China just dropped the best open-source model for coding and agentic tool use. Kimi K2 scores an insane 65.8 on SWE-bench Verified, and it is as cheap as Gemini Flash at $0.60 per million input tokens and $2.50 per million output tokens.
And he gives an example: it one-shots this data analysis task in Python and creates a website for a few cents. Look at that. Unbelievable. And here's hardmaru: every ML engineer's dream loss curve right there. It just goes down. No spikes, no interruptions.
Here's another example prompt: the xAI headquarters right before releasing their biggest model, Grok 4; busy area, many people working in the large office. So this is Grok 3, this is Grok 4, and this is Kimi K2. It looks amazing.
And Kimi K2 is now available on OpenRouter. So if you want to give it a try through that API, go ahead. It's ready.
Ethan Mollick, professor at Wharton: Kimi K2 seems to be a very good, giant, and odd open-weights model that may be the new leader in open LLMs. It is not beating the frontier closed models on my weird tests, but it doesn't have a reasoner yet.
And look at this: Awni Hannun got the Kimi K2 1-trillion-parameter model, in 4-bit quantization, to run on two 512GB M3 Ultras with MLX LM. So there we go, running, and pretty quickly I must say.
Here's Cedric saying Kimi K2 one-shotted Minecraft for the web; that took me four days and six attempts with Gemini 2.5 Pro. Wow.
And of course, Pliny the Liberator once again jailbroke it. Nothing is safe from Pliny.
So that's it for today. If you want to see me test this model thoroughly, let me know in the comments. If you enjoyed this video, please consider liking and subscribing, and I'll see you in the next one.