Day-to-Day Functioning Screening

Updated 22 February 2026
  • Day-to-day functioning screening is the systematic evaluation of individuals' abilities to perform daily self-care and complex tasks for independent living.
  • Screening tools use questionnaire-based assessments, digital sensor methods, and AI-driven conversations to objectively measure functional decline.
  • Validated instruments demonstrate high reliability and robust psychometrics, enabling early detection and targeted intervention in aging and cognitive disorders.

Day-to-day functioning screening refers to the systematic evaluation of an individual's capacity to perform the complex, multifaceted activities required for independent living. These activities typically encompass Instrumental Activities of Daily Living (IADLs) and Activities of Daily Living (ADLs), spanning domains such as self-care, household management, financial transactions, social engagement, mobility, and cognitive self-regulation. Screening approaches aim to detect and quantify subtle and overt declines in function across aging, neurodegenerative, musculoskeletal, and psychiatric disorders, and to provide actionable metrics for diagnosis, prognosis, intervention targeting, and longitudinal tracking.

1. Conceptual Framework and Rationale

IADLs comprise multi-step, cognitively mediated tasks (e.g., shopping, cooking, financial management, electronic device operation), while ADLs target basic self-care processes (e.g., bathing, dressing, toileting, feeding). Decline in IADL performance is frequently the earliest marker in the continuum from normal aging through Mild Cognitive Impairment (MCI) to dementia, and is also observed in numerous non-neurological conditions. ADLs serve as a more stringent threshold of impairment, often prompting disability support or care escalation (Jutten et al., 2016, Sheng et al., 2023, Yang et al., 2016).

Screening tools must (a) efficiently distinguish normal variability from clinically significant deterioration, (b) provide interval-level or at least ordinal-level scores with demonstrated psychometric rigor, and (c) enable repeated measurements to capture transitions and monitor intervention effects. Modern approaches increasingly leverage technology for objectivity, scalability, and ecological validity.

2. Principal Screening Modalities and Tools

2.1 Questionnaire-Based Approaches

The Amsterdam IADL Questionnaire (A-IADL-Q) exemplifies proxy-based, IRT-modeled assessment, offering a 70-item and a 30-item Short Version (A-IADL-Q-SV) (Jutten et al., 2016). The A-IADL-Q-SV encompasses all major IADL domains via a fixed set of 30 items, with each item rated for difficulty attributable to cognitive decline. The tool is unidimensional and scored via a graded response model, mapping responses onto a latent impairment trait (θ̂), which can be linearly transformed into a T-score (T = 50 + 10θ̂) for clinical interpretation. Suggested impairment bands for θ̂ are: ≤ –1.0 (normal), –1.0 to 0.0 (subtle decline), 0.0 to 1.0 (mild impairment), > 1.0 (moderate-severe impairment).
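The T-score transformation and the suggested bands can be sketched as follows (a minimal illustration; the function names are our own, and the handling of values falling exactly on a cut-point is an assumption):

```python
def theta_to_t(theta_hat):
    """Linear transform of the latent impairment trait onto a T-score."""
    return 50 + 10 * theta_hat

def impairment_band(theta_hat):
    """Map theta-hat onto the suggested interpretation bands
    (boundary handling at the cut-points is an illustrative choice)."""
    if theta_hat <= -1.0:
        return "normal"
    elif theta_hat <= 0.0:
        return "subtle decline"
    elif theta_hat <= 1.0:
        return "mild impairment"
    return "moderate-severe impairment"

# Example: theta-hat of 0.5 gives T = 55, in the "mild impairment" band
t_score = theta_to_t(0.5)
band = impairment_band(0.5)
```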

Functional self-report using visual stimuli is implemented in YADL, an image-based web survey presenting 47 photographs of ADLs (Yang et al., 2016). Patients rate the practical difficulty of pictured activities on a three-level ordinal scale. Personalization algorithms retain images rated as “hard” for longitudinal tracking, and scoring is computed as the normalized sum or maximum mapped to canonical ADL indices.
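A normalized-sum score over three-level ordinal ratings can be sketched as below (the numeric encoding of the scale is an illustrative assumption, not the published YADL scoring key):

```python
# Three-level ordinal scale, illustratively encoded
# 0 = easy, 1 = somewhat hard, 2 = hard
def normalized_sum(ratings):
    """Normalize the summed difficulty ratings to [0, 1]."""
    max_total = 2 * len(ratings)
    return sum(ratings) / max_total

def hard_items(ratings):
    """Indices of items rated 'hard', retained for longitudinal tracking."""
    return [i for i, r in enumerate(ratings) if r == 2]

ratings = [0, 2, 1, 0, 2]            # responses to five pictured activities
score = normalized_sum(ratings)       # 5 / 10 = 0.5
retained = hard_items(ratings)        # items 1 and 4 kept for follow-up
```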

2.2 Digital and Sensor-Based Screening

Physical functioning can be screened using remote, minute-level accelerometry. The daily activity profile approach transforms ActiGraph GT1M data into J-dimensional vectors of average minutes per day in data-driven activity classes, allowing for classification into performance quartiles on standardized physical function tests (e.g., 400m walk, 20m pace, 5× sit-to-stand) via penalized ordinal regression (Agarwal et al., 2018). Models achieve held-out AUCs up to 0.79 (400m walk, lowest quartile).
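Building the J-dimensional daily activity profile from minute-level counts can be sketched as follows (the class boundaries here are hypothetical placeholders; the published approach derives the activity classes from the data rather than fixing them a priori):

```python
import bisect

# Hypothetical activity-class boundaries on minute-level counts; the
# data-driven method learns these classes instead of hard-coding them.
BOUNDS = [50, 500, 2000]  # 4 classes: sedentary, light, moderate, vigorous

def daily_profile(minute_counts, n_days):
    """Average minutes per day spent in each activity class (a J-vector)."""
    profile = [0] * (len(BOUNDS) + 1)
    for c in minute_counts:
        profile[bisect.bisect_right(BOUNDS, c)] += 1
    return [m / n_days for m in profile]

# Two days of toy minute-level data: mostly sedentary, some activity
counts = [10] * 2600 + [300] * 200 + [1500] * 60 + [3000] * 20
profile = daily_profile(counts, n_days=2)  # sums to 1440 min per day
```

The resulting vector is what the penalized ordinal regression would consume as features when predicting physical-function quartiles.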

Unsupervised cognitive impairment screening is enabled by mini-SPACE, an iPad-based serious game targeting spatial navigation and perspective-taking skills (Tian et al., 15 Nov 2025). Performance metrics (rotation time, movement time, perspective taking error) are z-standardized and averaged into a composite “SPACE Error,” which significantly predicts MoCA scores (ΔR² up to 0.13), with ICC = 0.86 for 3-session average.
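The composite "SPACE Error" construction (z-standardize each metric, then average) can be sketched as follows; the normative means and SDs below are hypothetical placeholders, whereas the published composite standardizes against the study sample:

```python
def z(x, mean, sd):
    """Z-standardize a raw metric against a reference mean and SD."""
    return (x - mean) / sd

# Hypothetical normative statistics for the three metrics (not from the paper)
NORMS = {
    "rotation_time":   (12.0, 4.0),
    "movement_time":   (30.0, 10.0),
    "perspective_err": (15.0, 5.0),
}

def space_error(metrics):
    """Average of z-standardized metrics (higher = worse performance)."""
    zs = [z(metrics[k], *NORMS[k]) for k in NORMS]
    return sum(zs) / len(zs)

session = {"rotation_time": 16.0, "movement_time": 40.0, "perspective_err": 20.0}
composite = space_error(session)  # each metric sits +1 SD above the norm
```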

2.3 Speech and Conversational AI Systems

Smartwatch-derived acoustic markers enable functional deficit detection via passive speech collection. Variances of MFCC derivatives and of the second formant (F₂) account for >80% of the classifiers' feature importance on cognitive tasks, with LightGBM models achieving up to 77.8% accuracy versus 68.5% for standard tests (Yamada et al., 2023).
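The derivative-variance marker has a simple form: differentiate a per-frame feature track, then take its variance. A minimal sketch on a toy series (not a real MFCC or F₂ track extracted from audio):

```python
from statistics import pvariance

def delta(series):
    """First-order frame-to-frame derivative of a per-frame feature series."""
    return [b - a for a, b in zip(series, series[1:])]

def delta_variance(series):
    """Variance of the derivative -- the kind of acoustic marker described
    above, computed here on toy values rather than real speech features."""
    return pvariance(delta(series))

f2_track = [1500, 1520, 1490, 1530, 1510]  # toy per-frame F2 values (Hz)
marker = delta_variance(f2_track)
```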

Conversational AI, such as CaiTI, applies LLM-based dialogue to probe 37 daily-functioning dimensions (e.g., sleep, mood, social, productivity, hygiene) using open-ended, therapist-authored prompts (Nie et al., 2024). User responses are segmented and classified into (Dimension, Score∈{0,1,2}) tuples, guiding motivational interviewing or CBT mini-exercises as required. In-study live data yields dimension and score accuracies >97%, and statistically significant improvements in functioning over 14 days and 24 weeks.
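The (Dimension, Score) output and the routing of low-scoring dimensions to follow-up can be sketched as below; the score semantics and the `needs_followup` threshold are illustrative assumptions, not CaiTI's published logic:

```python
from typing import NamedTuple

class DimensionScore(NamedTuple):
    dimension: str   # one of the 37 daily-functioning dimensions
    score: int       # in {0, 1, 2}; encoding here (0 = concern) is illustrative

def needs_followup(results, threshold=1):
    """Dimensions scored at or below the threshold, flagged for
    motivational-interviewing or CBT mini-exercises."""
    return [r.dimension for r in results if r.score <= threshold]

results = [
    DimensionScore("sleep", 0),
    DimensionScore("mood", 2),
    DimensionScore("productivity", 1),
]
flags = needs_followup(results)  # sleep and productivity get follow-up
```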

3. Scoring, Psychometrics, and Validation

Modern screeners increasingly employ IRT or related ordinal models to provide interval-level scaling, reliability, and trait coverage across the entire impairment spectrum. The A-IADL-Q-SV demonstrates Cronbach’s α = 0.98, ICC with the original version = 0.97, and strong construct validity (MMSE r = 0.72, DAD r = 0.87) (Jutten et al., 2016). In multi-country validation, item-level DIF (ΔR² range 0–0.034) was negligible, supporting cross-national comparability (Dubbelman et al., 2019). Similar psychometric rigor is reported for game- and activity-profile-based screeners, with test–retest ICC >0.85, robust to unsupervised at-home use (Tian et al., 15 Nov 2025).
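For reference, the internal-consistency statistic cited above follows the standard formula α = k/(k−1) · (1 − Σᵢ σ²ᵢ / σ²ₜ), which can be computed from raw item scores as follows (toy data, not the published sample):

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of per-item score lists
    (each inner list holds one item's scores across respondents)."""
    k = len(item_scores)
    sum_item_vars = sum(variance(item) for item in item_scores)
    totals = [sum(resp) for resp in zip(*item_scores)]  # per-respondent totals
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))

# Toy data: three items answered by four respondents
items = [
    [2, 3, 3, 4],
    [2, 3, 4, 4],
    [1, 3, 3, 4],
]
alpha = cronbach_alpha(items)  # high, since the items covary strongly
```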

Conversational LLM-based screeners achieve score/dimension classification accuracies exceeding 95% in microbenchmarks and demonstrate longitudinal sensitivity to functional change (Nie et al., 2024).

4. Workflows, Implementation, and Use Cases

The administration paradigm ranges from proxy or self-report (A-IADL-Q-SV, YADL), to fully digital (accelerometry, games), to AI-mediated conversational interaction (CaiTI). The table below summarizes selected workflows:

| Modality | Data Source | Workflow Duration | Scoring Output |
|---|---|---|---|
| A-IADL-Q-SV | Proxy questionnaire | 10–15 min | θ̂ (IRT), T-score |
| Accelerometer profile | Wearable sensor (7 d) | 0 (passive) | Activity profile, quartile |
| Smartwatch speech | Voice recording | <5 min | ML-based risk score |
| Mini-SPACE | Serious game (iPad) | 5–15 min | Composite spatial error |
| CaiTI | Conversational AI | 5–20 min (variable) | 37-dimension score vector |

Screening tools are selected based on target population, burden tolerance, setting (home, clinic, remote), and need for objectivity or ecological validity.

5. Domain-Specific and Cross-Cultural Considerations

Item bias and cultural applicability are addressed via DIF analysis (A-IADL-Q series), iterative item elimination, and translation/adaptation protocols (Jutten et al., 2016, Dubbelman et al., 2019). For technology-dependent or context-specific tools, ongoing calibration and validation in new demographic settings are requisite. Activity-profile and game-based screeners may require age-gender stratification and localized tutorial adaptation (Tian et al., 15 Nov 2025). Speech-based markers must control for language, dialect, and acoustic environment (Yamada et al., 2023).

6. Comparative Performance and Integration with Clinical Practice

The A-IADL-Q-SV detects early IADL decline more sensitively than the MMSE, which primarily indexes orientation and memory (the two instruments correlate at r = 0.72). The DAD lacks fine gradation and is less sensitive to complex tasks. Wearable-based and digital game-based assessments provide unobtrusive, objective alternatives with robust psychometric properties, enabling continuous or high-frequency monitoring that is impossible in traditional clinic-based settings (Agarwal et al., 2018, Tian et al., 15 Nov 2025). LLM-driven conversational agents match therapist-level classification accuracy and deliver immediate intervention guidance (Nie et al., 2024).

Integration recommendations include the use of automated scoring platforms, modular pipelines for digital systems, cloud-based dashboards for remote tracking, and validation against established clinical thresholds. Interpretation bands and alert thresholds should be tailored by context and validated prospectively.

7. Limitations and Future Directions

Limitations include potential bias in proxy or self-report, need for periodic model recalibration in digital tools, demographic representativeness of normative datasets, and technical constraints in AI-driven dialogue and speech analysis. Further work is required to establish diagnostic cut-points and ROC characteristics in diverse cohorts, to integrate multimodal sources (e.g., wearables and speech), and to deploy adaptive, personalized screening tailored to individual trajectories (Jutten et al., 2016, Agarwal et al., 2018, Nie et al., 2024).

A plausible implication is that hybrid, technology-enabled day-to-day functioning screeners, incorporating objective markers and adaptive AI interfaces, will continue to supplement and ultimately transform traditional assessment paradigms in neurological, psychiatric, and aging research and care.
