DeepMind co-founder says his latest LLM, Inflection-2, is the second best in the world


Inflection claims that its new language model, Inflection-2, outperforms direct competitors such as Google’s PaLM 2 and Claude 2, and is second only to GPT-4.

According to the startup, the new model is considerably more powerful than its predecessor, Inflection-1, with improved factual knowledge, better style control, and significantly improved reasoning.

Inflection-1 was released in July. It was roughly on par with GPT-3.5 and PaLM-540B. With Inflection-2, the company claims to be closing in on GPT-4.

Inflection-2 Outperforms Claude 2 and PaLM 2 Large on Benchmarks

Inflection-2 was trained on 5,000 NVIDIA H100 GPUs in fp8 mixed precision for about 10²⁵ FLOPs. According to Inflection, this puts it in the same training compute class as Google’s flagship PaLM 2 Large, which will soon be replaced by Gemini.
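To put the compute figure in perspective, here is a rough back-of-the-envelope sketch in Python. Only the 10²⁵ FLOPs and the 5,000 GPUs come from Inflection. The peak throughput of roughly 2 PFLOP/s per H100 in fp8 and the utilization rates are assumptions made for illustration. Inflection has not published its actual training time or hardware utilization.

```python
# Rough estimate of wall-clock training time from total compute.
# From the article: total compute (~1e25 FLOPs) and GPU count (5,000).
# Assumed, NOT from Inflection: per-GPU peak throughput and utilization.

SECONDS_PER_DAY = 86_400


def training_days(total_flops, num_gpus, peak_flops_per_gpu, utilization):
    """Days needed to spend total_flops at the given sustained throughput."""
    sustained_flops_per_second = num_gpus * peak_flops_per_gpu * utilization
    return total_flops / sustained_flops_per_second / SECONDS_PER_DAY


TOTAL_FLOPS = 1e25             # figure reported by Inflection
NUM_GPUS = 5_000               # figure reported by Inflection
ASSUMED_PEAK_FP8 = 2e15        # assumption: ~2 PFLOP/s dense fp8 per H100

if __name__ == "__main__":
    # Hypothetical utilization rates; real-world values vary widely.
    for utilization in (0.2, 0.4, 0.6):
        days = training_days(TOTAL_FLOPS, NUM_GPUS, ASSUMED_PEAK_FP8, utilization)
        print(f"utilization {utilization:.0%}: about {days:.0f} days")
```

Under these assumptions, the training run would take somewhere between a few weeks and about two months. The estimate is very sensitive to the assumed utilization, so it should be read as an order-of-magnitude check rather than a reconstruction of the actual run.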



However, Inflection-2 outperforms PaLM 2 Large on most standard AI benchmarks, including the widely used MMLU benchmark, which covers a broad range of tasks from high school to professional level, as well as TriviaQA, HellaSwag, and the math benchmark GSM8k.

Comparison of Inflection-1, Google’s PaLM 2-Large and Inflection-2 for a number of commonly used academic benchmarks. (N-values in parentheses) | Image: Inflection

On 10-shot HellaSwag, Inflection-2 scored 89.0, approaching GPT-4’s score of 95.3. In addition, Inflection says its latest LLM outperforms Claude 2 with chain-of-thought reasoning, i.e., an already optimized prompting process.

Final results of the MMLU language comprehension test. As always, benchmark results and real-world use may differ. | Image: Inflection

Inflection-2 falls well short of GPT-4 on coding and math tasks, but performs better than Meta’s Llama 2, for example. Inflection-2 is not optimized for coding, Inflection writes, so there is room for improvement in future models.

Inflection-2 in coding and math benchmarks compared to the competition. | Image: Inflection

Pi chatbot will soon run on Inflection-2

Inflection-2 will soon power the company’s Pi chatbot. The infrastructure is being upgraded from NVIDIA A100 to H100 GPUs, which should speed up inference, i.e., the processing of input by the AI model. Despite being several times larger (175 billion parameters), Inflection-2 should be cheaper and faster to run than Inflection-1.

Inflection is already planning to train even larger models on the full capacity of the 22,000-GPU cluster. The next AI model will be about ten times larger and will be released in about six months, the company says. You can test Pi at


