Human Benchmark Testing

AI model achieves human level performance on general intelligence test

Dec. 24 (UPI) -- A new artificial intelligence (AI) model has just achieved human-level results on a test designed to measure "general intelligence". On December 20, OpenAI's o3 system scored 85% on ...

20d on MSN

A new AI benchmark tests whether chatbots protect human wellbeing

Most AI benchmarks measure intelligence and instruction-following rather than psychological safety. Humane Bench evaluates models based on core principles of human flourishing, prioritizing wellbeing, ...

10d

Gemini 3 Pro scores 69% trust in blinded testing up from 16% for Gemini 2.5: The case for evaluating AI on real-world trust, not academic benchmarks

A new vendor-neutral evaluation from Prolific, however, puts Gemini 3 at the top of the leaderboard. This isn't on a set of ...

Transformer on MSN

When high scores don’t mean high intelligence: how to build better benchmarks

OpenAI Claims Its New Model Reached Human Level on a Test for ‘General Intelligence.’ What Does That Mean?

OpenAI's simulated reasoning AI models matched human levels on ARC-AGI benchmark — Here's what that means for you

An AI system has reached human level on a test for ‘general intelligence’. Here’s what that means

Can AI really compete with human data scientists? OpenAI’s new benchmark puts it to the test