Smartphone Missingness as a Depression Biomarker: A Baseline-Controlled Re-analysis of StudentLife

Abstract

Background Whether gaps in smartphone passive-sensing data carry psychological signal — beyond what a baseline self-report already provides — is a recurring question in digital phenotyping that has rarely been tested with multiplicity control and cross-validation on the same cohort.

Objective To test whether participant-level missingness in passive sensing and ecological momentary assessment (EMA) is an incremental predictor of depression beyond a single baseline self-report, in the canonical StudentLife cohort.

Methods Exploratory re-analysis of the publicly available StudentLife dataset. The original 2014 publication reports n = 48 enrolled undergraduates; the public archive contains 59 distinct sensor-instrumented UIDs. The additional 11 are sparse-data or PHQ-9-incomplete records and contribute nothing to the between-person analyses, which use only the 38 participants with paired pre-term and post-term Patient Health Questionnaire-9 (PHQ-9) scores. We computed 89 participant-level missingness features from nine continuous sensor streams, five phone-activity logs, and 27 EMA prompts, and evaluated them under leave-one-out cross-validation with nested-CV-tuned hyperparameters, cluster-bootstrap confidence intervals, an omnibus joint F-test, and Benjamini–Hochberg multiplicity control.
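As an illustration of the multiplicity-control step named in the Methods, the following is a minimal sketch of the Benjamini–Hochberg step-up procedure. It is not the author's code: the function name `benjamini_hochberg`, the FDR level of 0.05, and the example p-values are assumptions made for this sketch.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    ranked = p[order]
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m,
        # then reject every hypothesis ranked at or below k.
        k = np.max(np.nonzero(below)[0])
        rejected[order[: k + 1]] = True
    return rejected

# Made-up p-values for illustration only; these are not results from the study.
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
print(benjamini_hochberg(example_p))
```

With these made-up values only the two smallest p-values are rejected: 0.039 and 0.041 would pass an uncorrected 0.05 threshold but exceed their rank-scaled thresholds of 0.03 and 0.04.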

Results A pre-term PHQ-9 baseline alone explained 59% of out-of-sample variance in post-term scores (n = 32; 95% cluster-bootstrap CI [0.22, 0.81]). Tuned regularized linear models trained on missingness alone did no better than the cohort-mean baseline, and adding missingness features to pre-term PHQ-9 did not improve performance. The omnibus joint F-test of all nine continuous-stream missingness rates against post-term PHQ-9, adjusted for pre-term PHQ-9, was non-significant (F(9, 27) = 0.43, P = 0.91). No individual feature survived multiplicity correction. A separate within-person day-level analysis (2,186 person-days) yielded a small valence-specific prospective effect (r = +0.082, 95% CI [+0.011, +0.162]) opposite in direction to the withdrawal hypothesis.
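The degrees of freedom of the omnibus test can be checked with a short sketch of a nested-model F-test. It assumes, which the abstract does not state explicitly, that the full model is an ordinary least-squares regression on an intercept, pre-term PHQ-9, and the nine missingness rates, fitted to all 38 paired participants; that gives 9 numerator and 38 − 11 = 27 denominator degrees of freedom. The function `nested_f_test` and the synthetic data are hypothetical and are not the author's code or the StudentLife values.

```python
import numpy as np
from scipy import stats

def nested_f_test(y, X_reduced, X_full):
    """F-test of a reduced OLS model against a full model that nests it.

    Both design matrices must already contain an intercept column.
    """
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    rss_r, rss_f = rss(X_reduced), rss(X_full)
    df_num = X_full.shape[1] - X_reduced.shape[1]   # number of added predictors
    df_den = len(y) - X_full.shape[1]               # residual df of the full model
    f_stat = ((rss_r - rss_f) / df_num) / (rss_f / df_den)
    p_value = stats.f.sf(f_stat, df_num, df_den)    # upper-tail probability
    return f_stat, df_num, df_den, p_value

# Synthetic data, for illustration only (assumed n = 38, nine missingness rates).
rng = np.random.default_rng(0)
n = 38
pre = rng.normal(size=n)
miss = rng.uniform(size=(n, 9))
post = 0.8 * pre + rng.normal(scale=0.5, size=n)

X_reduced = np.column_stack([np.ones(n), pre])
X_full = np.column_stack([X_reduced, miss])
f_stat, df_num, df_den, p_value = nested_f_test(post, X_reduced, X_full)
print(df_num, df_den)  # 9 27
```

The F statistic and p-value produced here depend on the random synthetic data and carry no meaning for the study; only the degrees of freedom are comparable to the reported test.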

Conclusion In this cohort, smartphone-data missingness did not add incremental predictive value beyond a single baseline PHQ-9. The analysis is exploratory and StudentLife-specific; it should not be read as evidence that missingness is never informative.

Plain-language summary Many studies use the gaps in someone’s smartphone data — missing GPS readings, missed survey prompts, fewer phone interactions — as a possible warning sign of depression. This re-analysis tested that idea on a widely used public dataset from a class of 48 college students. After accounting for each student’s depression score at the start of the term, the gaps in their phone data added no useful information about their depression score at the end of the term. The result is specific to this dataset and does not mean that smartphone gaps are never informative, but it shows that such claims need careful baseline comparisons.

Question Is smartphone-data missingness an incremental depression biomarker beyond a single baseline self-report?

Findings In an exploratory re-analysis of the StudentLife cohort (38 participants with paired PHQ-9 scores), missingness features did not improve prediction of post-term PHQ-9 beyond pre-term PHQ-9 under leave-one-out cross-validation, nested-CV-tuned models, an omnibus joint F-test, or multiplicity-controlled univariate screens.

Meaning Smartphone-data missingness should not be interpreted as a psychological signal in the absence of baseline-controlled, cross-validated, multiplicity-aware evidence. The result is specific to a high-functioning undergraduate cohort over a ten-week term and requires replication.

Competing Interest Statement

The author has declared no competing interest.

Funding Statement

This re-analysis received no external funding. The author received no payment, services, or in-kind support from any third party for any aspect of this work, including study design, data analysis, manuscript preparation, or decision to submit for publication.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study used only openly available human data that were originally located at https://studentlife.cs.dartmouth.edu/dataset.html. This is the StudentLife dataset (Wang et al., UbiComp 2014), released by Dartmouth College's Computer Science Department for public use. The dataset is de-identified and was made openly available for secondary analysis prior to the initiation of this re-analysis. No new data were collected, and no identifiable private information was generated, accessed, or held by the author. The re-analysis qualifies as exempt human-subjects research under U.S. federal regulations governing secondary research use of publicly available, de-identified data (45 CFR Section 46.104(d)(4)).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data availability

All data analyzed in this study are from the publicly available StudentLife dataset (https://studentlife.cs.dartmouth.edu/dataset.html). No new data were collected. The pre-processed participant-day-stream presence tensor and the 89-feature participant-level table are included as supplementary files.
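For readers working with a participant-day-stream presence tensor, the sketch below shows one way such a tensor could be reduced to per-participant missingness rates. The layout of the supplementary file is not described here, so the axis order (participants, days, streams), the boolean encoding (True = data present), and the function name `missingness_rates` are all assumptions.

```python
import numpy as np

def missingness_rates(presence):
    """Reduce a presence tensor to per-participant, per-stream missingness rates.

    presence: array of shape (participants, days, streams), True where the
    stream produced any data on that day (assumed encoding).
    """
    presence = np.asarray(presence, dtype=bool)
    # Fraction of days with no data = 1 - fraction of days with data.
    return 1.0 - presence.mean(axis=1)   # shape (participants, streams)

# Toy tensor: 2 participants, 4 days, 2 streams. Illustrative values only.
toy = np.array([
    [[1, 1], [0, 1], [0, 1], [1, 0]],   # participant 0
    [[1, 1], [1, 1], [1, 1], [1, 1]],   # participant 1
], dtype=bool)
rates = missingness_rates(toy)
print(rates)
```

In the toy tensor, participant 0 is missing stream 0 on two of four days and stream 1 on one of four days, so the first row of `rates` is 0.5 and 0.25; participant 1 has no gaps.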
