
The Anatomy of a Personal Health Agent

Published 27 Aug 2025 in cs.AI, cs.HC, and cs.MA | (2508.20148v1)

Abstract: Health is a fundamental pillar of human wellness, and the rapid advancements in LLMs have driven the development of a new generation of health agents. However, the application of health agents to fulfill the diverse needs of individuals in daily non-clinical settings is underexplored. In this work, we aim to build a comprehensive personal health agent that is able to reason about multimodal data from everyday consumer wellness devices and common personal health records, and provide personalized health recommendations. To understand end-users' needs when interacting with such an assistant, we conducted an in-depth analysis of web search and health forum queries, alongside qualitative insights from users and health experts gathered through a user-centered design process. Based on these findings, we identified three major categories of consumer health needs, each of which is supported by a specialist sub-agent: (1) a data science agent that analyzes personal time-series wearable and health record data, (2) a health domain expert agent that integrates users' health and contextual data to generate accurate, personalized insights, and (3) a health coach agent that synthesizes data insights, guiding users using a specified psychological strategy and tracking users' progress. Furthermore, we propose and develop the Personal Health Agent (PHA), a multi-agent framework that enables dynamic, personalized interactions to address individual health needs. To evaluate each sub-agent and the multi-agent system, we conducted automated and human evaluations across 10 benchmark tasks, involving more than 7,000 annotations and 1,100 hours of effort from health experts and end-users. Our work represents the most comprehensive evaluation of a health agent to date and establishes a strong foundation towards the futuristic vision of a personal health agent accessible to everyone.

Summary

  • The paper introduces a modular multi-agent framework that integrates data science, domain expertise, and health coaching to deliver personalized health recommendations.
  • The system is evaluated with both automated metrics and expert and user assessments, showing marked improvements over the base model in statistical analysis and differential diagnosis.
  • The modular and orchestrated approach enhances task decomposition and iterative refinement, setting a new standard for personalized, trustworthy health AI.

The Anatomy of a Personal Health Agent: A Modular Multi-Agent Framework for Personalized Health AI

Introduction

This paper presents a comprehensive framework for a Personal Health Agent (PHA) that leverages LLMs to provide personalized health recommendations by reasoning over multimodal data from consumer wearables and health records. The work addresses the underexplored challenge of supporting diverse, non-clinical health needs in daily life, moving beyond prior LLM-based health assistants that are limited in scope, reasoning, and personalization. The authors propose a modular multi-agent system in which each sub-agent specializes in one core competency: data science, health domain expertise, or health coaching. The system is evaluated through a rigorous, multi-dimensional framework that combines automated metrics with extensive assessments by health experts and end-users.

User-Centered Design and Health Needs Taxonomy

The design of PHA is grounded in a user-centered methodology that synthesizes over 1,300 real-world health queries from web search, health forums, and user surveys, together with insights from expert workshops. This analysis identifies four categories of critical user journeys (CUJs):

  1. General Health Knowledge: Factual, open-ended health questions.
  2. Personal Data Insights: Interpretation and contextualization of personal health data.
  3. Wellness Advice: Actionable, personalized recommendations for behavior change.
  4. Personal Medical Symptoms: Symptom assessment and triage.

These categories inform the modular decomposition of the agent, ensuring coverage of the full spectrum of consumer health needs.

Modular Multi-Agent Architecture

Data Science Agent (DS Agent)

The DS Agent is responsible for robust statistical analysis of personal and population-level time-series health data. Its architecture is a two-stage pipeline:

  • Analysis Plan Generation: Translates ambiguous, open-ended queries into structured, reproducible statistical analysis plans, explicitly operationalizing variables, data transformations, sufficiency checks, and statistical tests.
  • Code Generation and Execution: Converts the plan into executable Python code, with iterative self-correction for error handling.
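The two stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The `AnalysisPlan` fields simply mirror the plan elements listed above, and the hand-written plan stands in for the output of stage one. `toy_generate_code` is a hypothetical stand-in for the LLM code generator. The five-attempt budget and the convention that generated code stores its answer in a variable named `result` are also assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AnalysisPlan:
    """Hypothetical plan schema mirroring the plan elements listed above."""
    question: str
    variables: list[str]
    transformations: list[str]
    sufficiency_checks: list[str]
    statistical_test: str


@dataclass
class ExecutionResult:
    code: str
    output: Optional[dict]
    errors: list[str] = field(default_factory=list)


# Hypothetical LLM code generator: (plan, previous error or None) -> Python source.
CodeGenerator = Callable[[AnalysisPlan, Optional[str]], str]


def run_with_self_correction(
    plan: AnalysisPlan,
    generate_code: CodeGenerator,
    data: dict,
    max_attempts: int = 5,  # assumed retry budget
) -> ExecutionResult:
    """Stage 2: generate code for the plan, run it, and retry with the error fed back."""
    errors: list[str] = []
    last_error: Optional[str] = None
    code = ""
    for _ in range(max_attempts):
        code = generate_code(plan, last_error)
        namespace = {"data": data}
        try:
            exec(code, namespace)  # no sandboxing here; real systems must isolate generated code
        except Exception as exc:
            last_error = f"{type(exc).__name__}: {exc}"
            errors.append(last_error)
            continue
        # Convention assumed by this sketch: generated code stores its answer in `result`.
        return ExecutionResult(code, namespace.get("result"), errors)
    return ExecutionResult(code, None, errors)


def toy_generate_code(plan: AnalysisPlan, previous_error: Optional[str]) -> str:
    """Stand-in for the LLM: the first draft uses a wrong column name, the retry fixes it."""
    column = plan.variables[0] if previous_error else "sleep"
    return f"result = {{'mean': sum(data['{column}']) / len(data['{column}'])}}"


# Hand-written plan standing in for the output of stage 1 (plan generation).
plan = AnalysisPlan(
    question="How much do I sleep on average?",
    variables=["sleep_minutes"],
    transformations=["none"],
    sufficiency_checks=["at least 3 nights of data"],
    statistical_test="descriptive mean",
)
outcome = run_with_self_correction(plan, toy_generate_code, {"sleep_minutes": [420, 390, 450]})
print(outcome.output, outcome.errors)
```

The retry can fix a data-handling mistake, such as the wrong column name above, because the exact error message is fed back into the next generation call. Executing model-written code with `exec` is acceptable only in a sketch. A deployed agent would need an isolated sandbox.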

Evaluation: On a benchmark of 141 query-plan pairs, the DS Agent achieves a mean plan quality score of 75.6% (vs. 53.7% for the base Gemini model, p<0.001), with substantial improvements in data availability and timeframe selection. Code generation pass rates reach 79.0% after five trials, with a significant reduction in data handling errors (11.0% vs. 25.4%).
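One plausible reading of a pass rate "after five trials" is the share of tasks whose generated code runs correctly within the first k attempts. A minimal sketch of that cumulative metric follows. It assumes each task records the 1-indexed attempt of its first passing run (None if it never passes). The ten-task data is made up for illustration and is not taken from the paper.

```python
from typing import Optional, Sequence


def cumulative_pass_rate(first_pass_attempt: Sequence[Optional[int]], k: int) -> float:
    """Fraction of tasks whose first passing attempt (1-indexed) is at or before attempt k."""
    if not first_pass_attempt:
        raise ValueError("need at least one task")
    passed = sum(1 for a in first_pass_attempt if a is not None and a <= k)
    return passed / len(first_pass_attempt)


# Made-up data for ten tasks; None means the task never passed within the budget.
toy_first_pass = [1, 1, 2, None, 1, 3, 5, None, 1, 2]
for k in range(1, 6):
    print(f"pass rate within {k} attempt(s): {cumulative_pass_rate(toy_first_pass, k):.0%}")
```

A task that passes by attempt k also passes by every later attempt. The curve can therefore only rise or stay flat as k grows, so the rate after five trials is always at least the first-attempt rate.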

Domain Expert Agent (DE Agent)

The DE Agent provides authoritative, contextualized medical knowledge and reasoning. It employs a multi-step Reason-Investigate-Examine cycle, integrating tools for web search, biomedical literature, and population statistics, and synthesizes evidence-based, personalized responses.
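One way to read the Reason-Investigate-Examine cycle is as a bounded loop with three steps:

- reason about which evidence is still missing,
- investigate by calling a tool,
- examine whether the result is usable before synthesizing an answer.

The sketch below follows that reading, with toy rules in place of LLM calls. The tool names (`literature_search`, `population_stats`), the empty-result relevance filter, and the step budget are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Tool = Callable[[str], str]  # query -> retrieved text


@dataclass
class Evidence:
    tool: str
    query: str
    content: str
    relevant: bool


@dataclass
class DEState:
    question: str
    user_context: dict
    evidence: list[Evidence] = field(default_factory=list)


def reason(state: DEState, tool_order: tuple[str, ...]) -> Optional[tuple[str, str]]:
    """Reason: pick the next (tool, query), or None once every tool has yielded
    relevant evidence. A toy rule standing in for an LLM planning step."""
    covered = {e.tool for e in state.evidence if e.relevant}
    for tool in tool_order:
        if tool not in covered:
            return tool, state.question
    return None


def examine(content: str) -> bool:
    """Examine: toy relevance filter that only rejects empty results."""
    return bool(content.strip())


def synthesize(state: DEState) -> str:
    """Combine the relevant evidence with the user's context into one cited answer."""
    cited = "; ".join(f"[{e.tool}] {e.content}" for e in state.evidence if e.relevant)
    return f"For context {state.user_context}: {cited}"


def answer(state: DEState, tools: dict[str, Tool], max_steps: int = 4) -> str:
    order = tuple(tools)
    for _ in range(max_steps):
        step = reason(state, order)
        if step is None:
            break
        tool, query = step
        content = tools[tool](query)  # Investigate: call the chosen tool
        state.evidence.append(Evidence(tool, query, content, examine(content)))
    return synthesize(state)


tools = {
    "literature_search": lambda q: "stub literature result",
    "population_stats": lambda q: "stub population percentile",
}
state = DEState("Is my resting heart rate normal?", {"age": 45})
print(answer(state, tools))
```

The step budget matters. If a tool keeps returning unusable results, the loop stops after `max_steps` and answers from whatever evidence passed examination, rather than retrying forever.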

Evaluation: The DE Agent outperforms the base model on four medical multiple-choice question (MCQ) benchmarks (overall accuracy 83.6% vs. 81.8%, p=0.002). It also achieves higher top-1/5/10 accuracy than the base model in differential diagnosis tasks (46.1%/75.6%/84.5%). In contextualized Q&A, it is rated as significantly more trustworthy (96.9% vs. 38.7%) and is preferred for personalization (71.9% win rate). Clinician evaluation of multimodal health summaries shows strong gains in clinical significance, cross-modal association, and comprehensiveness.
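Top-k accuracy in differential diagnosis counts a case as correct when the reference diagnosis appears anywhere in the first k entries of the ranked list. This is why the reported figures rise from top-1 to top-10. Below is a minimal sketch on made-up cases. It assumes case-insensitive exact string matching, which is a simplification: matching real diagnoses usually needs clinical judgment, and the paper's matching procedure is not described here.

```python
from typing import Sequence


def top_k_accuracy(ranked_ddx: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Share of cases whose reference diagnosis appears among the first k ranked candidates."""
    if not truths or len(ranked_ddx) != len(truths):
        raise ValueError("need one ranked list per case")
    # Case-insensitive exact matching is a simplification made for this sketch.
    hits = sum(
        1
        for ranked, truth in zip(ranked_ddx, truths)
        if truth.lower() in [c.lower() for c in ranked[:k]]
    )
    return hits / len(truths)


cases = [
    ["migraine", "tension headache", "sinusitis"],
    ["GERD", "angina", "costochondritis"],
    ["anemia", "hypothyroidism", "depression"],
]
reference = ["Migraine", "angina", "sleep apnea"]
for k in (1, 2, 3):
    print(k, round(top_k_accuracy(cases, reference, k), 3))
```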

Health Coach Agent (HC Agent)

The HC Agent is designed for multi-turn, mixed-initiative health coaching, incorporating motivational interviewing and goal-setting best practices. Its modular architecture separates personalized coaching, recommendation timing, and conversation conclusion modules.
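The separation of modules can be pictured as a per-turn router. The sketch below is a hypothetical rule-based version; the following are all assumptions:

- the module names,
- the 12-turn budget,
- the keyword check for closure,
- the rule that a recommendation waits until a goal and at least one barrier are known.

An upstream extractor (not shown) is assumed to fill in `goal` and `barriers`. The paper's modules are LLM-driven and may use different criteria.

```python
from dataclasses import dataclass, field
from typing import Optional

DONE_PHRASES = ("thanks", "that's all", "bye")  # assumed closure cues


@dataclass
class CoachingState:
    goal: Optional[str] = None
    barriers: list[str] = field(default_factory=list)
    recommendation_given: bool = False
    turns: int = 0


def user_signals_done(message: str) -> bool:
    """Toy keyword check standing in for an LLM judgment of conversation closure."""
    text = message.lower()
    return any(phrase in text for phrase in DONE_PHRASES)


def next_module(state: CoachingState, message: str, max_turns: int = 12) -> str:
    """Route one user turn to 'conclude', 'recommend', or 'coach' (hypothetical module names)."""
    state.turns += 1
    # Conversation-conclusion module: wrap up once advice has been given and the
    # user signals they are done, or when the turn budget is exhausted.
    if state.turns >= max_turns or (state.recommendation_given and user_signals_done(message)):
        return "conclude"
    # Recommendation-timing module: recommend only after a goal and a barrier are known.
    if state.goal and state.barriers and not state.recommendation_given:
        state.recommendation_given = True
        return "recommend"
    # Personalized-coaching module: keep asking open, motivational-interviewing-style questions.
    return "coach"


state = CoachingState()
print(next_module(state, "I want to sleep better"))  # coach
state.goal, state.barriers = "7 hours of sleep", ["phone in bed"]  # set by a hypothetical extractor
print(next_module(state, "I scroll until midnight"))  # recommend
print(next_module(state, "Thanks, that's all"))  # conclude
```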

Evaluation: In user studies, the HC Agent is rated higher for conversation flow, motivational interviewing, and feedback incorporation. Expert raters confirm superior performance in goal identification, active listening, and personalized intervention. Notably, the agent is less optimized for progress tracking, suggesting an area for further refinement.

Orchestrated Multi-Agent Collaboration

The PHA system employs an orchestrator that dynamically assigns main and supporting agents based on query classification, decomposes tasks, and iteratively synthesizes responses with memory updates for conversational coherence. This design is informed by principles of modular cognition, adaptive support, low user burden, and architectural simplicity.
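A minimal sketch of that orchestration loop follows. It assumes the query has already been classified into one of the four CUJ categories listed earlier. The `ASSIGNMENT` table of main and supporting agents, the `critic` used for iterative refinement, and the five-turn history window are hypothetical choices for illustration, not the paper's configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Agent = Callable[[str, dict], str]            # (query, context) -> response
Critic = Callable[[str, str], Optional[str]]  # (query, draft) -> feedback, or None if acceptable

# Hypothetical (main agent, supporting agents) assignment per CUJ category.
ASSIGNMENT: dict[str, tuple[str, list[str]]] = {
    "general_knowledge": ("DE", []),
    "personal_data_insights": ("DS", ["DE"]),
    "wellness_advice": ("HC", ["DS", "DE"]),
    "personal_symptoms": ("DE", ["DS"]),
}


@dataclass
class Memory:
    turns: list[tuple[str, str]] = field(default_factory=list)


def orchestrate(query: str, category: str, agents: dict[str, Agent], critic: Critic,
                memory: Memory, max_rounds: int = 2) -> str:
    main, supporting = ASSIGNMENT[category]
    # Recent history keeps follow-up answers coherent with earlier turns.
    context: dict = {"history": memory.turns[-5:]}
    # Supporting agents run first; each one sees what earlier agents produced.
    for name in supporting:
        context[name] = agents[name](query, context)
    response = agents[main](query, context)
    # Iterative refinement: revise the main agent's draft until the critic accepts it.
    for _ in range(max_rounds):
        feedback = critic(query, response)
        if feedback is None:
            break
        context["feedback"] = feedback
        response = agents[main](query, context)
    memory.turns.append((query, response))  # memory update
    return response


def stub_agent(name: str) -> Agent:
    """Stand-in agent that reports which other agents' outputs it could see."""
    def run(query: str, context: dict) -> str:
        prefix = "revised " if "feedback" in context else ""
        used = ",".join(k for k in ("DS", "DE", "HC") if k in context)
        return f"{prefix}{name} answer using [{used}]"
    return run


def stub_critic(query: str, draft: str) -> Optional[str]:
    return None if draft.startswith("revised") else "tie the advice to the user's data"


agents = {n: stub_agent(n) for n in ("DS", "DE", "HC")}
memory = Memory()
print(orchestrate("How can I sleep better?", "wellness_advice", agents, stub_critic, memory))
```

Running supporting agents before the main agent lets the coach, for example, build on the data analysis and the expert's interpretation. Capping refinement at `max_rounds` bounds the number of extra LLM calls, which relates to the cost and latency trade-off discussed below.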

Comprehensive Evaluation

The PHA framework is evaluated on 10 benchmark tasks using the WEAR-ME dataset (N~1500), with over 7,000 human annotations and 1,100 hours of expert/user effort. Both end-users and health experts assess multi-turn conversations across 50 representative personas.

  • End-User Perspective: PHA is ranked as the best system in 48.7% of cases, outperforming both single-agent and parallel multi-agent baselines in overall quality, data analysis, and data interpretation. Users highlight the system's ability to synthesize quantitative and qualitative insights into actionable, personalized advice.
  • Expert Perspective: Experts show an even stronger preference for PHA (80% top ranking), citing superior technical depth, clinical accuracy, and effective integration of data science, domain knowledge, and coaching. The orchestrated, iterative collaboration is critical for producing coherent, contextually relevant, and safe recommendations.
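A figure such as "ranked as the best system in 48.7% of cases" is the share of evaluations in which a system was placed first. Below is a minimal sketch of that tally on made-up rankings. Ties are not handled, and the system labels are placeholders.

```python
from collections import Counter
from typing import Sequence


def top_rank_share(rankings: Sequence[Sequence[str]]) -> dict[str, float]:
    """Fraction of evaluations in which each system was ranked first (no ties in this sketch)."""
    if not rankings:
        raise ValueError("need at least one ranking")
    firsts = Counter(ranking[0] for ranking in rankings)
    systems = sorted({s for ranking in rankings for s in ranking})
    return {s: firsts[s] / len(rankings) for s in systems}


toy_rankings = [
    ["PHA", "single-agent", "parallel"],
    ["single-agent", "PHA", "parallel"],
    ["PHA", "parallel", "single-agent"],
    ["parallel", "PHA", "single-agent"],
]
print(top_rank_share(toy_rankings))
```

This gives a rough reference point, assuming only PHA and the two named baselines are compared. A system ranked at random would come first about one time in three (33.3%).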

PHA achieves these gains with lower computational cost and latency than parallel multi-agent baselines, though it remains more resource-intensive than single-agent systems.

Limitations and Future Directions

  • Statistical Reasoning: The DS Agent's handling of data distributions and advanced statistical modeling remains limited.
  • Tool Selection and Factuality: The DE Agent's reliance on web search can introduce noise; improved source selection and domain-restricted retrieval are needed.
  • Coaching Progress Tracking: The HC Agent underperforms in progress measurement, indicating a need for enhanced longitudinal tracking modules.
  • Scalability: The multi-agent architecture increases LLM call volume and latency, presenting challenges for real-time deployment.
  • Ethical and Regulatory Considerations: Algorithmic bias, privacy, security, and user over-reliance are critical risks. The system is explicitly not designed to replace clinical expertise, and any real-world deployment would require rigorous regulatory review.

The authors suggest future research into dynamic, competitive/cooperative agent pools, longitudinal impact studies, and fairness-aware evaluation.

Implications

This work demonstrates that modular, orchestrated multi-agent systems can substantially improve the personalization, accuracy, and actionability of AI-driven health recommendations. The explicit separation of data analysis, domain reasoning, and coaching enables both independent evaluation and targeted improvement of each competency. The comprehensive evaluation framework sets a new standard for benchmarking health AI agents, emphasizing both user and expert perspectives.

The PHA framework provides a validated blueprint for next-generation personal health AI, supporting the vision of accessible, trustworthy, and context-aware health agents. The modular approach is model-agnostic and extensible to future LLMs and health data modalities.

Conclusion

The Anatomy of a Personal Health Agent establishes a robust, modular multi-agent framework for personalized health AI, validated through extensive, multi-level evaluation. The system's architecture and evaluation methodology offer a foundation for future research and development of safe, effective, and user-centered health agents. The work highlights the necessity of specialization, orchestration, and rigorous assessment in advancing the practical utility of LLM-based health assistants.

Explain it Like I'm 14

Explaining “The Anatomy of a Personal Health Agent” in Simple Terms

Overview

This paper imagines a smart helper for your health—a Personal Health Agent (PHA). It’s like a team of friendly experts in your phone that can look at your wearable data (like steps, sleep, heart rate), understand health facts, and coach you toward healthier habits. The goal is to show how such a helper could be built, what it should do, and how well it might work in everyday life. It’s a research idea, not a product you can download.

Key Questions the Paper Tries to Answer

To make this health helper truly useful, the researchers asked:

  • What do people actually want help with when it comes to their health?
  • What skills should a smart health helper have to answer those needs?
  • How can we design it so different parts work together smoothly?
  • Does the system give good, safe, and personalized advice?
  • How does it perform when tested with real health data from wearables and lab tests?

How They Built and Tested It

The researchers used a “user-centered” approach, which means they started by learning what people need and then designed the system around those needs.

They looked at real questions people ask online (Google Search, Gemini, Fitbit forums), ran surveys with Fitbit users, and held a workshop with 14 experts. From this, they found four main types of requests:

  1. General health knowledge (facts and explanations),
  2. Personal data insights (what your own numbers mean),
  3. Wellness advice (how to improve),
  4. Personal symptoms (what might be going on with your body).

To handle these requests, they designed a team of three “sub-agents” that work together, like a sports team with different positions:

  • Data Science Agent (DS): Think of this like a data detective. It analyzes your time-based numbers from wearables (steps, sleep, heart rate) and compares them to the general population. It answers questions such as “Has my running gotten faster?” or “Do I sleep more on active days?” It plans the right calculations and uses code to get accurate results.
    • Time-series data = a diary of your body’s signals over time.
    • Statistical tests = checks to see if a pattern is real or just random.
  • Domain Expert Agent (DE): This is the health knowledge expert. It explains medical terms and puts your data in context. It can interpret lab results and daily signals (like HRV or blood pressure) and personalize answers based on your age, health history, and environment.
    • Differential diagnosis = a careful list of possible reasons for symptoms, not a final medical answer.
  • Health Coach Agent (HC): This is your motivator and guide. It helps you set goals, find barriers, and build action plans. It uses techniques like motivational interviewing, which means it asks supportive questions so you discover your own reasons to change, making plans that fit your life.
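Here is a small sketch of one way to answer "Do I sleep more on active days?", to make "checks to see if a pattern is real or just random" concrete. It is only an illustration. The 10,000-step cut-off for an active day, the seven-days-per-group sufficiency rule, and the choice of a permutation test are assumptions. They are not necessarily what the Data Science Agent would pick.

```python
import random
from typing import Optional, Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def sleep_vs_activity(
    steps: Sequence[int],
    sleep_minutes: Sequence[float],
    active_threshold: int = 10_000,  # assumed cut-off for an "active day"
    min_days_per_group: int = 7,     # assumed data-sufficiency rule
    n_permutations: int = 10_000,
    seed: int = 0,
) -> Optional[tuple[float, float]]:
    """Return (extra sleep on active days in minutes, two-sided permutation p-value),
    or None when there are too few days in either group to compare."""
    if len(steps) != len(sleep_minutes):
        raise ValueError("steps and sleep must cover the same days")
    active = [s for st, s in zip(steps, sleep_minutes) if st >= active_threshold]
    other = [s for st, s in zip(steps, sleep_minutes) if st < active_threshold]
    if len(active) < min_days_per_group or len(other) < min_days_per_group:
        return None  # data sufficiency check failed
    observed = _mean(active) - _mean(other)
    # Shuffle the "active" labels many times: if activity did not matter,
    # gaps this large should show up often by chance alone.
    pooled = active + other
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = _mean(pooled[: len(active)]) - _mean(pooled[len(active):])
        if abs(diff) >= abs(observed):
            as_extreme += 1
    p_value = (as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


steps = [12_000] * 8 + [5_000] * 8
sleep = [480] * 8 + [400] * 8
print(sleep_vs_activity(steps, sleep))
```

The idea behind the permutation test is simple. If activity made no difference, shuffling which days count as "active" would often produce a gap as large as the real one. A tiny p-value means that almost never happens, so the pattern is unlikely to be pure chance. Returning None when there are too few days mirrors the data sufficiency checks that the Data Science Agent includes in its plans.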

These three are coordinated by an “orchestrator”—like a team coach—that decides which agent should do what, combines their answers, and helps the conversation flow.

To test the system, the team used:

  • Mixed evaluations: both automated checks and human reviews.
  • Real-world data from a study called WEAR-ME, where about 1,165 participants (Fitbit/Pixel Watch users) consented to share wearable data, answered health surveys, and got lab tests (like cholesterol and metabolic panels). Everything followed research ethics and privacy rules.

They ran 10 different benchmark tasks and collected more than 7,000 ratings, with experts and end-users spending about 1,120 hours judging the system’s performance.

Main Findings and Why They Matter

Here’s what they found in simple terms:

  • The Data Science Agent improved the quality of analysis plans and made fewer coding mistakes than a standard LLM. In other words, it was better at turning a vague question like “Am I sleeping well?” into a clear plan, correctly running the analysis, and giving accurate numbers.
  • The Domain Expert Agent showed strong medical knowledge and could better personalize explanations by using the person’s data (wearables + lab results + background).
  • The Health Coach Agent provided more useful, motivating conversations, according to both everyday users and professional health coaches.
  • The whole multi-agent system (PHA) worked well when everything was combined: data insights + expert knowledge + coaching. End-users and experts judged it as more helpful and consistent in multi-turn, open-ended conversations about health goals.

Why this matters: Most health apps either show raw numbers, offer general advice, or give basic facts. This system tries to connect all three—analyzing your data, explaining what it means, and turning it into practical steps—so guidance is personalized and more likely to help you change habits.

What This Could Mean for the Future

If developed safely and responsibly, a personal health agent could:

  • Help people understand their wearable data in plain language and spot meaningful patterns.
  • Offer advice tailored to their goals and lifestyle, not one-size-fits-all tips.
  • Encourage long-term healthy habits through supportive coaching, not just information dumps.
  • Support, not replace, healthcare professionals—pointing people to real medical care when needed.

Important notes:

  • This is a research framework, not a commercial product, and it’s meant for everyday wellness support rather than medical diagnosis.
  • The system is designed to complement human experts and guide people to clinical resources for serious issues.
  • Privacy and consent were central in the data used for testing.

In short, the paper shows how a “team” of smart agents could make health guidance more personal, accurate, and motivating—bringing us closer to a helpful, everyday health companion that’s accessible to everyone.
