Are Google Gemini and OpenAI’s GPT-4 peak LLM?


With Gemini, Google is the first company to offer a more powerful LLM than OpenAI’s GPT-4, if current benchmarks are to be believed.

However, the fact that the Ultra version of Gemini beats GPT-4 in 30 out of 32 benchmarks is not the big news about Google’s LLM release. The big news is that Gemini barely beats GPT-4.

Even the more compact Gemini Pro variant is only on par with OpenAI’s year-old GPT-3.5 model. This raises the question: Could Google not do better, or have LLMs already reached their limits? Bill Gates, who’s still talking to Microsoft and OpenAI, thinks the latter.

Google CEO Pichai still believes in LLM scaling and sees “a lot of headroom”

In an interview with MIT Technology Review, Google CEO Sundar Pichai says, “The scaling laws are still going to work.” Pichai expects AI models to become more powerful and efficient as they grow in size and complexity, and says Google still sees “a lot of headroom” for scaling language models.



To make this progress measurable, new benchmarks are needed. Pichai points to the widely used MMLU (Massive Multitask Language Understanding) benchmark, in which Gemini breaks the 90 percent barrier for the first time, outperforming human experts (89.8 percent). Just two years ago, the state of the art on MMLU was 30 to 40 percent, Pichai says.

However, if you look at the MMLU numbers in Gemini’s technical report, which Google does not showcase as prominently, you can see that Gemini Ultra performs better than GPT-4 with only one of the two prompting methods, the more complex one (CoT@32). With the prompting method reported by OpenAI (5-shot, meaning five worked examples in the prompt), Gemini Ultra performs worse than GPT-4.

Image: Google DeepMind

This shows how close GPT-4 and Gemini are in many areas. Even older language models like PaLM 2 are not completely left behind in the MMLU benchmark.
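The gap between the two prompting methods is easier to see in code. The following is a rough Python sketch, not Google’s or OpenAI’s actual evaluation code: a 5-shot prompt simply prepends five solved examples to the question, while a CoT@32-style setup samples many chain-of-thought answers and takes the majority, falling back to a single greedy answer when the samples disagree too much. The function names, the prompt layout, and the 0.5 consensus threshold are assumptions made for illustration.

```python
from collections import Counter


def build_five_shot_prompt(examples, question):
    """Prefix the question with five solved examples (5-shot prompting).

    The "Question:/Answer:" layout is a hypothetical format for illustration.
    """
    if len(examples) != 5:
        raise ValueError("5-shot prompting expects exactly five examples")
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


def cot_at_k(sampled_answers, greedy_answer, threshold=0.5):
    """Choose the majority answer among k sampled chain-of-thought runs.

    If the share of votes for the top answer is below `threshold`
    (an assumed value), fall back to the single greedy answer.
    """
    if not sampled_answers:
        return greedy_answer
    answer, votes = Counter(sampled_answers).most_common(1)[0]
    if votes / len(sampled_answers) >= threshold:
        return answer
    return greedy_answer
```

With 32 samples, `cot_at_k` costs roughly 32 times as many model calls per question as a single 5-shot query, which is one reason scores from the two methods are not directly comparable.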

According to Pichai, many current benchmarks have already reached their limits. This also affects the perception of progress. “So it may not seem like much, but it’s making progress,” Pichai said.

There is more room for improvement in other areas, such as multimodal processing and task handling. This includes, for example, the ability to respond to an image with appropriate text. This is where Gemini makes bigger leaps, but it’s still not groundbreaking.

