Deep Dives
We Trained mRNA Language Models Across 25 Species for $165—Here’s How
OpenMed built an end-to-end protein-to-mRNA pipeline covering structure prediction, sequence design, and codon optimization. CodonRoBERTa-large-v2...
QIMMA: The Arabic LLM Leaderboard That Actually Checks Its Homework
Most Arabic LLM benchmarks have quality issues nobody talks about. QIMMA fixes that with a...
VAKRA: A Brutally Honest Look at Where AI Agents Actually Fail
IBM's VAKRA benchmark exposes how poorly current AI agents handle real-world tool use across 8,000+...
Google’s AMIE Diagnostic AI Took Its First Real-World Clinical Test. Here’s What Happened.
Google Research and Beth Israel Deaconess tested AMIE, a conversational diagnostic AI, with real patients...
TurboQuant: Google’s New Trick for Squeezing AI Models Without Breaking Them
Google Research's TurboQuant, QJL, and PolarQuant algorithms promise extreme vector compression for LLMs with zero...
Google’s AI takes on the NHS breast screening bottleneck — two new studies, real results
Google Research just dropped two companion studies in Nature Cancer on using AI in NHS...
Testing LLMs on Superconductivity Research Questions
Google researchers tested six LLMs on expert-level high-temperature superconductivity questions. NotebookLM and a custom system...
ConvApparel: Why Your AI User Simulator Is Probably Lying to You
Google's ConvApparel dataset exposes how LLM-based user simulators fail to mimic real humans—they're too patient...
Google’s New Framework Puts LLM Personality Tests on the Couch
Google Research introduces a framework that adapts psychological questionnaires into situational judgment tests to measure...
How many raters do you actually need for AI benchmarks? Google has answers
Google Research challenges the standard 1-5 rater approach in AI benchmarks, showing that depth over...
ReasoningBank: Giving AI Agents a Memory That Actually Learns from Failure
Google's ReasoningBank framework lets agents distill generalizable reasoning strategies from both successes and failures, moving...
Simula: A Smarter Way to Generate Synthetic Data by Designing Datasets, Not Just Samples
Google Research's Simula framework treats synthetic data generation as mechanism design, using reasoning to build...