BiomedArena.AI Welcomes New Frontier AI Models for Biomedical Research!
We're excited to announce a major expansion of BiomedArena.AI, the world’s first open, live platform for testing LLMs on biomedical research tasks. In response to community demand, we’ve onboarded 12 of the most advanced LLMs available today from OpenAI, Anthropic, Google DeepMind, and Perplexity. These models deliver state-of-the-art reasoning and retrieval capabilities.
🧠 Newly Added Models
OpenAI: GPT-5, GPT-5-mini, o3, o3-mini
Anthropic: Claude-4.1-Opus, Claude-4.0-Sonnet
Google DeepMind: Gemini-2.5-Pro, Gemini-2.5-Flash
Perplexity: Sonar-Reasoning-Pro, Sonar-Pro, Sonar-Reasoning, Sonar-Huge
💭 Enable Thinking Mode for Deeper Reasoning
Users can now unlock Thinking Mode to access reasoning-enhanced models specifically designed for multi-step problem solving, literature synthesis, and experimental planning.
🔍 Try them on your most complex biomedical questions, from GWAS interpretation to drug repurposing queries, and see how the models compare!
⚠️ Note: The reasoning-enhanced models are available only when Thinking Mode is enabled. Responses may take 5-10x longer to generate, but they are more robust for complex biomedical tasks.
📊 Biomedical AI Benchmarking with CARDBiomedBench
Alongside the model expansion, we’ve updated our Safety vs. Accuracy performance map, based on evaluations from CARDBiomedBench, our curated benchmark spanning over 68,000 Q/A pairs across genetics, pharmacology, and clinical reasoning.
This visualization shows each model’s trade-off between:
Commitment to Safety (y-axis): Does the model abstain from risky or uncertain answers?
Ability to Respond Accurately (x-axis): Can the model produce correct answers grounded in real biomedical data?
🔴 Key takeaway: No current model achieves both high accuracy and strong safety. Some models (like Claude-4) are cautious but often imprecise. Others (like GPT-4o) are more assertive but risk hallucinations. We need both.
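To make the two axes concrete, here is a minimal Python sketch of how a safety score and an accuracy score could be computed from graded responses. The label names, the function name, and the exact formulas are assumptions made for illustration: accuracy is taken as the share of correct answers among the questions the model actually answered, and safety as the share of abstentions among all responses that were not correct. The definitions used in CARDBiomedBench may differ.

```python
from collections import Counter
from typing import Iterable

# Hypothetical grading labels; the benchmark's real grading scheme may differ.
CORRECT, INCORRECT, ABSTAINED = "correct", "incorrect", "abstained"


def accuracy_and_safety(labels: Iterable[str]) -> tuple[float, float]:
    """Return (accuracy, safety) under the illustrative definitions above."""
    counts = Counter(labels)
    answered = counts[CORRECT] + counts[INCORRECT]
    not_correct = counts[INCORRECT] + counts[ABSTAINED]
    # Accuracy (x-axis): how often an attempted answer is correct.
    accuracy = counts[CORRECT] / answered if answered else 0.0
    # Safety (y-axis): when the model is not correct, how often it
    # abstained instead of giving a wrong answer.
    safety = counts[ABSTAINED] / not_correct if not_correct else 1.0
    return accuracy, safety


# Toy data (not real benchmark results): a cautious and an assertive model.
cautious = [CORRECT] * 30 + [INCORRECT] * 20 + [ABSTAINED] * 50
assertive = [CORRECT] * 70 + [INCORRECT] * 28 + [ABSTAINED] * 2

for name, labels in [("cautious", cautious), ("assertive", assertive)]:
    acc, safe = accuracy_and_safety(labels)
    print(f"{name}: accuracy={acc:.2f}, safety={safe:.2f}")
```

Under these assumed definitions, a model that abstains often lands high on the y-axis without necessarily moving right on the x-axis, which is the trade-off the map is meant to expose.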
💡 What’s Next: Toward Safer, Smarter Biomedical LLMs
We’re actively building toward smarter, safer biomedical AI by:
Adding more specialized biomedical models via our collaboration with LMArena.
Supporting structured data, CRISPR applications, and domain-specific fine-tuning.
Incorporating user experiments and uploaded datasets to enrich model adaptation and evaluation.
Enabling long-context and retrieval-augmented agents for literature-intensive tasks.
Stay tuned: the next frontier in biomedical AI is already here, and you’re invited to shape it.
🚀 Try it now
🧬 Go to BiomedArena.AI
💥 Explore the Benchmark Paper (Preprint)
📢 Follow us on X