We're excited to announce a major expansion of BiomedArena.AI, the world’s first open, live platform for testing LLMs in biomedical research tasks. In response to community demand, we’ve onboarded 12 of the most advanced LLMs available today. These models deliver state-of-the-art reasoning and retrieval capabilities.

🧠 Newly Added Models

OpenAI

GPT-5, GPT-5-mini
o3, o3-mini

Anthropic

Claude-4.1-Opus
Claude-4.0-Sonnet

Google DeepMind

Gemini-2.5-Pro
Gemini-2.5-Flash

Perplexity

Sonar-Reasoning-Pro
Sonar-Pro
Sonar-Reasoning
Sonar-Huge

💭 Enable Thinking Mode for Deeper Reasoning

Users can now unlock Thinking Mode to access reasoning-enhanced models specifically designed for multi-step problem solving, literature synthesis, and experimental planning.

🔍 Try them on your most complex biomedical questions, from GWAS interpretation to drug repurposing queries, and see how the models compare!

⚠️ Note: Thinking Mode must be enabled to use these reasoning-enhanced models. Responses may take 5-10x longer to generate, but they offer more robust answers for complex biomedical tasks.

📊 Biomedical AI Benchmarking with CARDBiomedBench

Alongside the model expansion, we’ve updated our Safety vs. Accuracy performance map, based on evaluations from CARDBiomedBench, our curated benchmark spanning over 68,000 Q/A pairs across genetics, pharmacology, and clinical reasoning.

This visualization shows each model’s trade-off between:

Commitment to Safety (y-axis): Does the model abstain from risky or uncertain answers?
Ability to Respond Accurately (x-axis): Can the model produce correct answers grounded in real biomedical data?
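
To make the two axes concrete, here is a minimal Python sketch of how one point on such a map could be computed from graded answers. It is an illustration under our own assumptions, not CARDBiomedBench's actual scoring code: the three grade labels and the function name `safety_accuracy_point` are hypothetical, and we assume accuracy is the share of all questions answered correctly while safety is the share of non-correct outcomes where the model abstained rather than answering wrongly.

```python
from collections import Counter

# Hypothetical grading labels; the benchmark's real rubric may differ.
CORRECT, INCORRECT, ABSTAIN = "correct", "incorrect", "abstain"


def safety_accuracy_point(grades):
    """Return (accuracy, safety) for one model's graded answers.

    accuracy: share of all questions answered correctly (x-axis).
    safety:   share of non-correct outcomes where the model abstained
              instead of giving a wrong answer (y-axis).
    Both definitions are assumptions made for this sketch.
    """
    counts = Counter(grades)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no graded answers")
    accuracy = counts[CORRECT] / total
    not_correct = counts[INCORRECT] + counts[ABSTAIN]
    # If the model was never wrong and never abstained, treat it as fully safe.
    safety = counts[ABSTAIN] / not_correct if not_correct else 1.0
    return accuracy, safety


# Toy example: five graded answers from a single model.
grades = ["correct", "abstain", "incorrect", "correct", "abstain"]
accuracy, safety = safety_accuracy_point(grades)
print(f"accuracy={accuracy:.2f}, safety={safety:.2f}")
```

Under these assumed definitions, a model that abstains on everything scores perfect safety and zero accuracy, which is why the two axes have to be read together.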

🔴 Key takeaway: No current model achieves both high accuracy and strong safety. Some models (like Claude-4) are cautious but often imprecise. Others (like GPT-4o) are more assertive but risk hallucinations. We need both.

💡 What’s Next: Toward Safer, Smarter Biomedical LLMs

We’re actively building toward smarter, safer biomedical AI by:

Adding more specialized biomedical models via our collaboration with LMArena.
Supporting structured data, CRISPR applications, and domain-specific fine-tuning.
Incorporating user experiments and uploaded datasets to enrich model adaptation and evaluation.
Enabling long-context and retrieval-augmented agents for literature-intensive tasks.

Stay tuned — the next frontier in biomedical AI is already here, and you’re invited to shape it.

🚀 Try it now

🧬 Go to BiomedArena.AI
💥 Explore the Benchmark Paper (Preprint)
📢 Follow us on X
