AI & Personhood · Jun 18, 2026 · 8 min read

OpenAI's AI Science Reality Check: One Benchmark Humbles the Models, One Lab Loop Proves the Promise

On 17 June 2026 OpenAI shipped LifeSciBench — a 750-task benchmark where the best AI model passes only one research task in three — and, the same day, a near-autonomous AI chemist that drove a real, human-verified wet-lab discovery. The two results are one story about capability and honest measurement.

By Humphrey Theodore K. Ng'ambi

LifeSciBench: a benchmark built to be hard

Why the format is the point

The Molecule.one AI chemist: capability, measured

Checked by human hands

Capability and humility, read together

Why a dignity-first reading prefers the rubric

The EI test

The reality check, in one line

Frequently Asked Questions

What is the OpenAI LifeSciBench AI benchmark?

How well did AI models score on LifeSciBench?

What did the Molecule.one AI chemist actually discover?

Were the AI chemist results verified by humans?

Can AI do scientific research on its own now?

Sources and Further Reading
