Polaris Safety
Constellation Architecture

We use a novel 4.1T+ parameter constellation architecture where specialized support models increase medical accuracy and safety.

Read the Original Seminal Paper About Polaris

We developed Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM work in healthcare, which focuses on tasks like question answering, our work specifically focuses on long multi-turn voice conversations.

Other Research Papers

In a production-first regime, healthcare conversational AI agents shouldn’t be optimized only for clean benchmark accuracy; they must be optimized for the lived reality of patient conversations, where audio is imperfect, intent is indirect, language shifts mid-call, and compliance hinges on how guidance is delivered.

Perfecting Human–AI Interaction at Clinical Scale

Supportive conversation depends on skills that go beyond language fluency: reading emotions, adjusting tone, and navigating moments of resistance, frustration, or distress. Despite rapid progress in language models, we still lack a clear way to understand how their abilities in these interpersonal domains compare to those of humans.

HEART: A Unified Benchmark for Assessing Humans and LLMs in Emotional Support Dialogue

The deployment of artificial intelligence (AI) in healthcare necessitates robust safety validation frameworks, particularly for systems directly interacting with patients. While theoretical frameworks exist, there remains a critical gap between abstract principles and practical implementation. Traditional LLM benchmarking approaches provide very limited output coverage and are insufficient for healthcare applications requiring high safety standards.

Real-World Evaluation of Large Language Models in Healthcare (RWE-LLM): A New Realm of AI Safety & Validation

Colorectal cancer (CRC) screening rates remain disproportionately low among Hispanic and Latino populations compared to non-Hispanic White populations. While artificial intelligence (AI) shows promise in health care delivery, concerns exist that AI-based interventions may disadvantage non–English-speaking populations due to biases in development and deployment.

Using a Multilingual AI Care Agent to Reduce Disparities in Colorectal Cancer Screening

Today’s nursing practice faces challenges related to workforce shortages, increasing patient complexity, and administrative burdens, leading to one of the highest burnout rates among all professions. The integration of artificial intelligence (AI) in healthcare can support a transformative development in modern nursing practice, addressing many of these challenges.

Advancing nursing practice through artificial intelligence: Unlocking its transformative impact

Healthcare professionals face an urgent need for AI literacy as artificial intelligence technologies rapidly transform clinical practice, yet nursing-specific educational resources remain scarce. The objective of this study was to evaluate the effectiveness of an innovative micro-learning AI education program developed through an academic-industry partnership.

Transforming healthcare AI education through micro-learning: A novel partnership model for nursing workforce development

Retrieval-augmented generation (RAG) is a powerful technique that enhances downstream task execution by retrieving additional information, such as knowledge, skills, and tools, from external sources. Graphs, by their intrinsic “nodes connected by edges” nature, encode massive heterogeneous and relational information, making them a golden resource for RAG in a wide range of real-world applications.

Retrieval-Augmented Generation with Graphs (GraphRAG)

Large language model (LLM)-based agents increasingly rely on tool use to complete real-world tasks. While existing works evaluate the LLMs’ tool use capability, they largely focus on the final answers yet overlook the detailed tool usage trajectory, i.e., whether tools are selected, parameterized, and ordered correctly. We introduce TRAJECT-Bench, a trajectory-aware benchmark to comprehensively evaluate LLMs’ tool use capability through diverse tasks with fine-grained evaluation metrics.

TRAJECT-Bench: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use

Podcasts

With 1.8 million patient calls completed* and an 8.95/10 satisfaction rating, Hippocratic’s safety-focused LLM frees up more time for clinicians to provide care. Learn about our constellation architecture, which uses 22 models for safety validation, and how the AI agent app store enables clinicians to scale key aspects of their expertise.

*At the time of the podcast. We have now completed over 7 million clinical calls.

Exploring the abundance that AI can bring to the healthcare sector, Munjal discusses Hippocratic AI’s ability to deliver high-EQ agents with infinite time and memory, resulting in improved patient outcomes and experiences.

Videos

Watch the interview where Munjal talks with Vedant Agrawal of Premji Invest about how our team used this constellation of models to create a modular, redundant, and yet flexible architecture to cater to patient safety at the highest level.

“As part of Jensen’s keynote speech for GTC, Hippocratic AI was a featured company. Watch this short clip of Jensen speaking about our company and the work that we are doing. This was a proud milestone for us!”

If you ever wanted to know how we partnered with our investors to create Hippocratic AI, watch this long-form video with Hemant Taneja of General Catalyst. It captures how a single conversation can propel an idea into something truly transformative in how we provide care. It’s also a nod to how we need to co-design and collaborate with others in the ecosystem to make this work, and how we need great partners like General Catalyst and a16z to help us lay the foundation.

Hippocratic AI cofounder and CEO Munjal Shah joins a16z Bio + Health general partner Julie Yoo to explore the incredible potential of generative AI in transforming healthcare delivery. Together, they discuss the emerging “voice renaissance” in healthcare, the nuances of designing safe and empathetic AI interactions, and how Hippocratic AI is addressing safety concerns with innovations like multi-LLM architecture and rigorous training with thousands of nurses.