Inflection AI launches new model for Pi chatbot, nearly matches GPT-4 

Today, Inflection AI, the Palo Alto-based startup founded by DeepMind co-founder Mustafa Suleyman and LinkedIn co-founder Reid Hoffman, announced a new foundation model called Inflection-2.5.

Built on the work done so far, Inflection-2.5 outperforms the company’s original Inflection-1 significantly and nearly matches OpenAI’s GPT-4 model, especially across STEM subjects. It now powers the company’s Pi assistant, designed to take on ChatGPT and Gemini, and can be tested via mobile and web.

The move marks the latest effort in the rapidly evolving AI space to take on the dominance of OpenAI, which continues to clarify its approach to developing AI for humanity. Just recently, Anthropic released Claude 3 Opus, which the company says outperforms GPT-4 on several common benchmarks.

Performs better but still lags behind GPT-4

Since its inception, Inflection AI has been building an “empathetic, useful and safe” AI that acts more personally and colloquially than other models, including the GPT series. The company used unique empathetic fine-tuning to give the model behind Pi a signature personality and an exceptional EQ (emotional quotient).

With the introduction of the upgraded Inflection-2.5, the startup, which raised a $1.3 billion round in June 2023, is building up the IQ aspect, covering areas like physics and mathematics. In a blog post published today, the company said users talking with Pi, now underpinned by Inflection-2.5, can cover a wide range of topics, from discussing a hobby to coding, checking answers to a biology paper or drafting a business plan.

On benchmarks, the upgraded model shows substantial improvements over Inflection-1 across the board and closes in on GPT-4, although it still lags.

For instance, on the MMLU benchmark, which measures performance across tasks ranging from high school to professional-level difficulty, Inflection-2.5 scored 85.5, sitting just behind GPT-4’s 87.3. Similarly, in STEM exams, the model performed nearly as well as the OpenAI model, scoring 63 on the Hungarian Math exam (versus GPT-4’s 68) and reaching the 85th percentile on the Physics GRE, against GPT-4’s 97th percentile.

On the GSM8K benchmark, which consists of 8.5K high-quality grade school math problems, the Inflection model scored 86.3, against GPT-4’s 92. On 0-shot HumanEval, designed to evaluate code generation capabilities, it scored 73.8 versus GPT-4’s 79.3.

While the performance is not better than GPT-4’s, Inflection AI did point out that this “94% GPT-4 level performance” has been achieved with much more efficient training than that done for the OpenAI large language model (LLM).

According to the company, Inflection-2.5 took only 40% of the training FLOPs (compute) of GPT-4 to get these results.
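As a rough sanity check on those figures, the short Python sketch below divides each Inflection-2.5 score quoted in this article by the corresponding GPT-4 score and averages the ratios. It is an illustration only: the choice of benchmarks, the simple unweighted mean, and the names `scores` and `relative_scores` are assumptions made here, and the company's own 94% figure may well be computed over a different set of tasks. The Physics GRE result is left out because it is a percentile rather than a score. The last lines divide the company-stated 94% by the company-stated 40% to express relative performance per unit of training compute.

```python
# Scores as quoted in this article: (Inflection-2.5, GPT-4).
# The selection of benchmarks is an assumption for illustration.
scores = {
    "MMLU": (85.5, 87.3),
    "Hungarian Math exam": (63.0, 68.0),
    "GSM8K": (86.3, 92.0),
    "HumanEval (0-shot)": (73.8, 79.3),
}


def relative_scores(table):
    """Return each Inflection-2.5 score as a fraction of the GPT-4 score."""
    return {name: inflection / gpt4 for name, (inflection, gpt4) in table.items()}


ratios = relative_scores(scores)
mean_ratio = sum(ratios.values()) / len(ratios)  # unweighted mean of the ratios

# Company-stated figures, taken at face value.
claimed_performance = 0.94      # "94% GPT-4 level performance"
claimed_flops_fraction = 0.40   # 40% of GPT-4's training FLOPs
performance_per_flop = claimed_performance / claimed_flops_fraction

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%} of GPT-4")
print(f"Mean across these benchmarks: {mean_ratio:.1%}")
print(f"Relative performance per training FLOP: {performance_per_flop:.2f}x")
```

On these four benchmarks the mean ratio lands close to the company's stated 94%, and the stated compute figure works out to a little over twice GPT-4's benchmark performance per training FLOP, though both numbers depend entirely on self-reported inputs.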

In addition, just like GPT-4, the model also incorporates real-time web search capabilities, giving users up-to-date information on current events. This will be a major upgrade, given the company has positioned the Pi assistant as an AI for everyone. However, it is worth noting that answer quality with web retrieval may differ from the benchmark results, because none of the benchmarks cited above measures retrieval-assisted responses.

How to access Inflection-2.5?

Inflection AI has already rolled out the new model for its Pi chatbot. This means anyone using the assistant can start testing its capabilities. 

The company has not shared how users are benefitting from the upgraded model but did say that the change has made a significant impact on user sentiment, engagement, and retention, accelerating the chatbot’s organic user growth.

Currently, the Pi chatbot, which is available on Android, iOS, web and as a desktop application, sees one million daily and six million monthly active users. More than four billion messages have been exchanged with the AI, with an average conversation lasting 33 minutes.
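For a sense of scale, the usage figures above can be turned into two back-of-the-envelope ratios. The sketch below is illustrative only: the variable names are made up for this example, the message total is a cumulative lower bound ("more than four billion"), and dividing it by the current monthly user count is a crude approximation rather than a metric the company reports.

```python
# Figures as quoted in the article; names are hypothetical.
daily_active = 1_000_000          # daily active users
monthly_active = 6_000_000        # monthly active users
total_messages = 4_000_000_000    # cumulative messages (lower bound)

# Share of monthly users who show up on a given day.
stickiness = daily_active / monthly_active

# Cumulative messages spread over the current monthly user base.
messages_per_monthly_user = total_messages / monthly_active

print(f"DAU/MAU: {stickiness:.1%}")
print(f"Messages per monthly user (rough): {messages_per_monthly_user:.0f}")
```

In other words, roughly one in six monthly users is active on a given day.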

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.
