TechyMag.com is an online magazine covering news and updates on modern technologies



Claude 3 artificial intelligence model outperforms GPT-4 for the first time at Chatbot Arena


The large language model (LLM) Claude 3 Opus from Anthropic has surpassed GPT-4 from OpenAI for the first time on Chatbot Arena.

"The king is dead," wrote software developer Nick Dobos in a post on X (Twitter), comparing GPT-4 Turbo and Claude 3 Opus.

Chatbot Arena is an open, crowdsourced platform for evaluating large language models. Its ranking is compiled from a large number of human judgments of model output, which are scored using the Elo rating system. The test works as follows: a user enters a query, compares the answers returned by different models, and picks the best one. The leaderboard is built from thousands of such user votes.
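To illustrate how a single vote can move a rating, here is a minimal sketch of the textbook Elo update for one head-to-head comparison. It assumes the standard formula with a 400-point scale; the K-factor of 32, the starting ratings, and the function names are illustrative assumptions, and Chatbot Arena's actual computation may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if users preferred model A, 0.0 if they preferred
    model B, and 0.5 for a tie. k (an assumed value here) controls how
    far a single vote can move a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


# Hypothetical example: two models start level and a user prefers model A.
new_a, new_b = elo_update(1000.0, 1000.0, score_a=1.0)
```

With equal starting ratings the expected score is 0.5, so one win moves each model by half of K in opposite directions.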

The Chatbot Arena leaderboard was launched on May 3, 2023, and GPT-4 was added to the ranking on May 10. Since then, one version of GPT-4 or another had consistently topped the ranking, which is why the arrival of a new leader is attracting attention. One of Anthropic's smaller models, Haiku, has also drawn attention with its performance on the leaderboard.

"For the first time, the best available models - Opus for complex tasks, Haiku for efficiency and cost-effectiveness - are available from a provider other than OpenAI," said independent AI researcher Simon Willison. "It's reassuring - we all benefit from diversity in leading providers in this field. But GPT-4 has been around for over a year, and it took that year for someone to catch up to it."

Behind Claude 3 Opus and two versions of GPT-4, the next model in the ranking is Google's Bard (Gemini Pro). While the top three positions are separated by an insignificant 2-3 Elo points, Bard trails third place by 45 points. All other competitors scored below 1,200 points.
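To put these gaps in perspective, the sketch below converts an Elo difference into an expected head-to-head preference rate. It assumes the standard Elo formula with a 400-point scale, which may not match the leaderboard's exact method; the function name is hypothetical.

```python
def win_probability(elo_gap: float) -> float:
    """Expected share of votes won by the higher-rated model,
    given its rating lead in Elo points (standard Elo formula)."""
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))


# Gaps quoted in the article: about 3 points within the top three,
# and 45 points between third place and Bard.
top_three_gap = win_probability(3)
bard_gap = win_probability(45)
print(f"3-point lead:  {top_three_gap:.1%}")
print(f"45-point lead: {bard_gap:.1%}")
```

Under this assumption, a 3-point lead corresponds to roughly a 50.4% expected preference rate, close to a coin flip, while a 45-point lead corresponds to roughly 56.4%.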

Source: Ars Technica
