Is Google Gemini really powerful and capable enough to take on GPT-4?


Mumbai: Google has introduced Gemini, a generative AI model that the search giant claims is its largest and most capable to date. The large language model (LLM) comes in three sizes: Ultra, Pro and Nano.

Gemini Ultra is the largest and most capable model for highly complex tasks. Pro is the best model for scaling across a wide range of tasks and Nano is the most efficient model for on-device tasks.

Gemini’s announcement comes nearly eight years into Google’s journey as an AI-first company. Google acquired the UK-based artificial intelligence research laboratory DeepMind Technologies in 2014; the lab now operates as Google DeepMind.

“Now, we’re taking the next step on our journey with Gemini, our most capable and general model yet, with state-of-the-art performance across many leading benchmarks. Our first version, Gemini 1.0, is optimized for different sizes: Ultra, Pro and Nano. These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year,” wrote Google and Alphabet CEO Sundar Pichai in a blog post.

“This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company. I’m genuinely excited for what’s ahead, and for the opportunities Gemini will unlock for people everywhere,” added Pichai.

“For a long time, we’ve wanted to build a new generation of AI models, inspired by the way people understand and interact with the world. AI that feels less like a smart piece of software and more like something useful and intuitive — an expert helper or assistant. Today, we’re a step closer to this vision as we introduce Gemini, the most capable and general model we’ve ever built,” wrote Demis Hassabis, CEO and Co-Founder of Google DeepMind in a blog post.

“Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. It was built from the ground up to be multimodal, which means it can generalise and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video,” added Hassabis.

Since Google announced Gemini last week, it has been compared with ChatGPT from Microsoft-backed OpenAI, and its claimed performance is being scrutinised. Google is leveraging the chain-of-thought (CoT) methodology with its Gemini large language model (LLM).

Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development, according to Hassabis.

“With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities,” claimed Hassabis in his blog post.

“Our new benchmark approach to MMLU enables Gemini to use its reasoning capabilities to think more carefully before answering difficult questions, leading to significant improvements over just using its first impression,” he added.

(source – Google blog)

However, users of the social media platform X have been critical, voicing doubts about Google Gemini’s capabilities and its performance claims against GPT-4.

“For me, using uncertainty routed chain of thought guided evaluation to claim better MMLU score was kinda incomplete. It was clear as shown in the paper that on greedy and on CoT32 analysis, GPT4 beats Gemini. Uncertainty routed CoT, a new and probably could be a better way to judge the performance but there was very little explanation in the paper as to ‘why this technique doesn’t benefit GPT4’,” wrote Saurabh Kumar, Co-Founder of Adora.

“It is only when Gemini Ultra uses ‘CoT@32’ – which is likely something like running 32 parallel Chain-of-Thought chains, selecting the best answers among them – that Gemini Ultra surpasses GPT-4. This is disappointing, as Gemini Ultra, a newer model, should win on 5 shot itself,” wrote Harry Surden, Professor of Law, University of Colorado Law School.
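For readers unfamiliar with the terms in these posts, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's evaluation code: it assumes "CoT@32" means sampling 32 chain-of-thought answers and taking a majority vote, and that "uncertainty routed" means falling back to a single greedy answer when the vote is not decisive. The function name, the two callables and the 0.6 threshold are all hypothetical.

```python
from collections import Counter
from typing import Callable


def uncertainty_routed_cot(
    sample_answer: Callable[[], str],   # hypothetical: one sampled chain-of-thought answer
    greedy_answer: Callable[[], str],   # hypothetical: the model's single greedy answer
    k: int = 32,                        # number of sampled chains (the "32" in CoT@32)
    threshold: float = 0.6,             # assumed consensus cut-off, not a published value
) -> str:
    """Majority-vote over k sampled answers; fall back to greedy if consensus is weak."""
    answers = [sample_answer() for _ in range(k)]
    top_answer, votes = Counter(answers).most_common(1)[0]
    if votes / k >= threshold:
        return top_answer       # the sampled chains largely agree
    return greedy_answer()      # low agreement: defer to the greedy answer
```

Under these assumptions, every question costs up to 33 model calls instead of one, which is part of why critics question how well such scores reflect everyday use.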

“Gemini does win against GPT-4 with CoT@32 but not on 5-shot. This likely indicates that Gemini is inherently more powerful but somehow without proper prompting that capability doesn’t get exposed. Maybe GPT-4 still has better IFT? Still, this is a super exciting milestone!” wrote Shital Shah, Microsoft’s Principal Research Engineer.

“Digging deeper into the MMLU Gemini Beat – Gemini doesn’t really Beat GPT-4 On This Key Benchmark. The Gemini MMLU beat is specifically at CoT@32. GPT-4 still beats Gemini for the standard 5-shot – 86.4% vs. 83.7%…,” wrote Bindu Reddy, CEO of Abacus AI, who shared a screenshot questioning Google Gemini’s performance.
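The "5-shot" setting mentioned in these posts refers to showing the model five worked examples before the real question and taking its single answer. A rough sketch of how such a prompt might be assembled is given below; the function name and the "Question:/Answer:" layout are illustrative assumptions, not the format used by Google or OpenAI.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],  # (question, answer) pairs used as demonstrations
    question: str,                    # the question the model must actually answer
    shots: int = 5,                   # number of demonstrations ("5-shot")
) -> str:
    """Concatenate `shots` solved examples followed by the unanswered question."""
    if len(examples) < shots:
        raise ValueError(f"need at least {shots} examples, got {len(examples)}")
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples[:shots]]
    blocks.append(f"Question: {question}\nAnswer:")  # left open for the model to complete
    return "\n\n".join(blocks)
```

Unlike the 32-sample approach, this setting uses one model call per question, which is why several commentators treat it as the more direct comparison.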

“The influencers express concerns about the practicality of CoT@32 in real-world scenarios and emphasize GPT-4’s continued superiority. They have also highlighted the importance of the MMLU benchmark and advocated for more transparent evaluations through API endpoints or model weights rather than relying on blog posts,” said Smitarani Tripathy, GlobalData’s Social Media Analyst.

“Influencers are sceptical regarding Gemini’s capabilities and are urging for practical assessments before forming definitive opinions. They have emphasized a preference for direct 5-shot vs. 5-shot comparisons to ensure a more straightforward evaluation,” added Tripathy.