Multimodal Sleep Measurement and Alignment Analysis in Outpatients With Major Depressive Episode: Observational Study


Introduction

Sleep is a fundamental aspect of mental and physical well-being, playing a crucial role in emotional regulation, cognitive function, and overall health [-]. Insufficient and excessive sleep are both linked to cognitive impairment, heightened stress, and increased risk of chronic diseases [-]. In psychiatric populations, sleep abnormalities are especially pronounced, with insomnia and hypersomnia being core symptoms of major depressive disorder (MDD) [], and hypersomnia is frequently observed in bipolar disorder (BD) []. Moreover, deviations from typical sleep are associated with elevated suicide risk, mood instability, and early relapse indicators in many psychiatric disorders, including mood disorders [-]. Given these profound implications, accurately capturing sleep patterns is essential for psychiatric assessments. However, sleep is typically measured in laboratory settings, which limits its applicability for real-world monitoring. This limitation makes obtaining reliable long-term measurements challenging.

Polysomnography (PSG) is the gold standard for sleep assessment but is typically limited to controlled environments [-]. Alternative methods, such as sleep diaries and questionnaires, offer greater flexibility but are prone to recall bias [], often leading to discrepancies between subjective reports and objective measurements [,]. Wearable technology, such as actigraphy, provides a more practical solution with reasonable accuracy in tracking some sleep parameters, such as onset and offset []. However, actigraphy has limitations, including potential discomfort and the inability to measure physiological markers like heart rate and respiration []. More recent advancements in sleep monitoring have introduced contactless methods, such as ballistocardiography-based bed sensors and smartphone-based sleep tracking []. While these newer modalities show promise, their validity and comparability, especially in assessing patients with depressive disorders in real-world settings, remain underexplored. To address this gap, we use a multimodal approach, combining actigraphy, bed sensors, smartphone-based tracking, and ecological momentary assessment (EMA) [] to assess sleep.

A crucial issue in sleep research is the discrepancy between different measurement modalities. Studies comparing PSG with actigraphy and self-reports have consistently shown variation in onset and offset measurements [,]. The divergence is particularly pronounced in psychiatric populations, where sleep misperception—the tendency to perceive one’s sleep inaccurately—is common [,]. While actigraphy has been frequently validated against PSG, the role of newer modalities like bed sensors and smartphone data requires further investigation [].

Beyond measurement discrepancies, demographic, clinical, and environmental factors introduce further variance. For example, chronotype, gender, and age, as well as external variables such as seasons, are associated with sleep patterns [,], with women generally reporting longer sleep [,]. Using smartphones at bedtime, a behavior more common in younger participants, can decrease agreement between smartphone and actigraphy estimates because actigraphy may register sleep while the smartphone detects ongoing wakefulness []. Psychological conditions such as depression further complicate measurement accuracy due to circadian rhythm disruptions []. Seasonal fluctuations in daylight and temperature are consistently linked to variation in sleep timing and duration [], consequently influencing cross-modal alignment. Evidence for these seasonal influences appears across modalities: nearable sensors [], wearables [,,-], and smartphones [,].

Given these complexities, this study investigates how sleep onset, offset, and total sleep time (TST) measurements compare across actigraphy, smartphone data, bed sensors, and EMA. Additionally, it examines whether these measurements differ between healthy controls and patients with depression, providing insights into the extent of alignment in sleep assessments. Finally, this research explores the demographic, psychological, and environmental factors contributing to discrepancies between measurement modalities, highlighting the sources of error and bias. By addressing these questions, this study contributes to the area of naturalistic sleep research, offering a deeper understanding of sleep monitoring technologies and their implications for psychiatric evaluation, monitoring, and intervention.


Methods

Ethical Considerations

The study was approved by the Helsinki and Uusimaa Hospital District Ethics Committee and by Helsinki and Uusimaa Hospital District Psychiatry (approval number §125/2018). Data were anonymized, and participants received four movie tickets as compensation. Participation was voluntary, and individuals were informed that they could withdraw without consequence. All participants received clear information about the study and provided written informed consent.

Data Collection

The MoMo-Mood study [] explored the effectiveness of wearable technology in monitoring sleep and mood in individuals with psychiatric disorders and sought behavioral markers of depression in passively collected data [,]. The study included 164 participants, of whom 133 were diagnosed with a current major depressive episode (85 patients with MDD, 27 with borderline personality disorder [BPD], and 21 with BD), alongside 31 healthy controls. Additionally, a pilot study was conducted with 37 participants, including 14 patients with MDD and 23 healthy controls. The main and pilot studies had a similar design, and data from both were included in the analysis.

Participants were recruited voluntarily from mood disorder outpatient services at Helsinki University Hospital Mood Disorder Division, Turku University Central Hospital Department of Psychiatry, and City of Espoo Mental Health Services, and healthy controls were recruited from the general community. Eligibility required a confirmed diagnosis of MDD, BD, or BPD, and individuals with psychotic symptoms, current substance use disorder, or acute suicide risk were excluded. Healthy controls were screened to ensure the absence of clinical depression. In the pilot study, patients were asked to participate for 6 months and healthy controls for 1 year, while in the MoMo-Mood study participants were invited to take part for up to 1 year. Participation length varied, resulting in differences in the amount of data contributed. Enrollment occurred on an ongoing basis, with participants joining and leaving at different times. Further details on study design, participant recruitment, and data collection procedures can be found in previous studies [,].

The study period was divided into two phases: an initial 2-week active phase and a subsequent passive phase lasting up to 1 year. In this work, we focused exclusively on the active phase of data collection. During this phase, participants wore wrist-worn actigraphy devices and used bed sensors to monitor sleep patterns. Concurrently, smartphone and EMA data were gathered through the AWARE app [], which tracked various behavioral metrics []. Participants were also prompted 5 times daily to report on their mood, energy levels, and other psychological states using a 7-point Likert scale. A more detailed description of the questions can be found elsewhere []. Participants returned the actigraphy devices and bed sensors after the active phase, but smartphone data collection continued during the passive phase. The duration of the active phase was determined by how long the actigraphs could collect data without being recharged. Charging the actigraphs would have significantly increased the burden on participants and was not deemed feasible.

Sleep and Activity Monitoring Protocol

Participants were provided with actigraphy devices (Philips Actiwatch 2, Philips Respironics) to measure activity levels and sleep for 2 weeks. The devices could operate throughout this period without recharging, so participants did not have to remove them during the monitoring session. Participants were asked to wear the devices on the wrist for as long as they could and were instructed to remove them only during sauna use, which is a common activity in Finland.

In addition, nearable devices, including bed sensors and smartphones, were used for sleep monitoring. The Murata SCA11H (Murata Electronics) nodes, based on ballistocardiography technology, were used for bed sensing. Each participant received a preconfigured Wi-Fi router to enable automatic data transfer to the study server. The participants were asked to place the Murata SCA11H node under their mattress, positioned close to but not directly beneath them. Alternatively, the node could be attached to the bed frame near the participant. The sensors processed information on site and produced metrics such as pulse rate, heart rhythm variability, breathing rate, cardiac output, and signal intensity at a 1 Hz rate [].

Smartphones were used for sleep tracking due to their noninvasive nature and ability to monitor behaviors []. We collected communication timestamps (calls and texts), anonymous contact identifiers, smartphone screen activity events (screen on, off, lock, and unlock), location data, app usage patterns, and battery state. In this study, we only analyzed screen events to identify sleep periods.

Furthermore, smartphones facilitated the use of EMA, which was used to gather real-time data on sleep and participants’ emotional states throughout the study []. During the active phase, participants received multiple EMA prompts daily on their smartphones. These included a morning questionnaire to assess sleep from the previous night and an evening questionnaire to track the activities of the day. Additionally, 3 randomly timed prompts were sent throughout the afternoon within specific time ranges. The morning questionnaire included a single item asking participants to report how long they had slept during the previous night. The item was answered using categorical response options rather than an open text field. The response options were: under 5 hours, 5-6 hours, 6-7 hours, 7-8 hours, 8-9 hours, 9-10 hours, and over 10 hours. Participants were prompted to complete the morning questionnaire at a fixed time in the morning rather than immediately upon awakening, and they were given a specific time window to respond. Similarly, the evening questionnaire was completed within a designated evening window. This schedule allowed for collecting subjective data on sleep, mood, energy levels, physical activity, and other psychological factors []. Data from all modalities were gathered through the Niima data collection platform [] and were preprocessed with the Niimpy behavior analysis toolbox [].

Device Data Export Procedures and Analysis

Sleep data were derived using proprietary algorithms from both the actigraph and bed sensor and were then transformed into sleep parameters for analysis. For the actigraph, raw data were aggregated into 30-second intervals, with each interval assigned 1 of 3 possible labels based on activity levels. The “ACTIVE” label denoted high activity, indicating that the user was awake and moving. The “REST” label represented low activity, suggesting the user was resting but not necessarily asleep. The “REST-S” label denoted sustained low activity, indicating that the user was likely asleep. To identify sleep episodes, the data were sorted chronologically, and continuous REST-S intervals were grouped together and treated as a single sleep episode.
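The grouping of consecutive REST-S intervals can be sketched as follows. This is a minimal illustration, not the study's code: the function name `rest_s_episodes` and the list-of-tuples input layout are hypothetical, and the sketch assumes there are no missing 30-second epochs inside a run.

```python
from datetime import datetime, timedelta
from itertools import groupby

EPOCH = timedelta(seconds=30)  # epoch length stated in the text


def rest_s_episodes(epochs):
    """Group consecutive REST-S epochs into (start, end) sleep episodes.

    `epochs` is a list of (timestamp, label) tuples; this layout is a
    hypothetical stand-in for the exported actigraph data.
    """
    ordered = sorted(epochs, key=lambda e: e[0])  # chronological order
    episodes = []
    # groupby yields runs of identical consecutive labels
    for label, run in groupby(ordered, key=lambda e: e[1]):
        if label != "REST-S":
            continue
        run = list(run)
        # the episode ends when the last REST-S epoch ends
        episodes.append((run[0][0], run[-1][0] + EPOCH))
    return episodes


t0 = datetime(2020, 1, 1, 23, 0)
labels = ["ACTIVE", "REST", "REST-S", "REST-S", "REST-S", "ACTIVE", "REST-S"]
epochs = [(t0 + i * EPOCH, lab) for i, lab in enumerate(labels)]
episodes = rest_s_episodes(epochs)
```

Here the three adjacent REST-S epochs form one episode and the isolated final epoch forms a second one.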

The bed sensor collected raw data on heart rate and respiratory rate at 1-second intervals. This dataset was processed using the manufacturer’s algorithm to generate labels indicating bed occupancy and physiological states []. Sleep periods were identified based on the status variable, where a status of “1” indicated sleep, and “0,” “2,” and “3” represented nonsleep states (eg, not in bed, awake in bed, or signal overload). Continuous status “1” intervals were treated as sleep episodes and grouped together as part of the same sleep period.

For the smartphone data, sleep was inferred based on the lock and unlock status as indicators of inactivity, following the procedure described by Aledavood et al [] and used in previous studies [,]. Periods of inactivity, identified when the device was locked, were assessed to determine sleep periods. The longest stretch of inactivity was classified as nocturnal sleep. The lock and unlock status was converted into a binary format, and the longest inactivity periods were tracked to calculate sleep onset, sleep offset, and TST.
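As a rough sketch of the idea, the longest lock-to-unlock gap can be found as shown here. The function name `longest_locked_period` and the event format are hypothetical, and the published procedure may differ in details such as how repeated lock events are handled.

```python
from datetime import datetime


def longest_locked_period(events):
    """Return (onset, offset, tst_hours) for the longest lock-to-unlock gap.

    `events` is a list of (timestamp, state) tuples with state "lock" or
    "unlock"; this format is an assumption made for illustration.
    """
    best = None
    locked_at = None
    for ts, state in sorted(events, key=lambda e: e[0]):
        if state == "lock":
            if locked_at is None:  # keep the first lock of a run
                locked_at = ts
        elif state == "unlock" and locked_at is not None:
            if best is None or ts - locked_at > best[1] - best[0]:
                best = (locked_at, ts)
            locked_at = None
    if best is None:
        return None
    onset, offset = best
    return onset, offset, (offset - onset).total_seconds() / 3600


events = [
    (datetime(2020, 1, 1, 22, 10), "lock"),
    (datetime(2020, 1, 1, 22, 40), "unlock"),
    (datetime(2020, 1, 1, 23, 30), "lock"),
    (datetime(2020, 1, 2, 6, 45), "unlock"),
    (datetime(2020, 1, 2, 6, 50), "lock"),
    (datetime(2020, 1, 2, 7, 20), "unlock"),
]
result = longest_locked_period(events)
```

In this toy example the phone is locked from 23:30 to 06:45, so that interval is taken as nocturnal sleep.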

In addition, the EMA questions and responses, initially in Finnish, were translated into English. Data were selected based on responses to the question, “How many hours did you sleep last night?” to examine the relationship between subjective and sensor-based sleep data. Categorical TST ranges were transformed into numerical values (eg, “6-7 hours” was converted to 6.5 hours).
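The midpoint conversion could look like the sketch below. The text only gives the example for a bounded range, so the values used for the open-ended categories ("under 5 hours", "over 10 hours") are assumptions exposed as parameters, and the function name is hypothetical.

```python
import re


def ema_category_to_hours(category, open_low=4.5, open_high=10.5):
    """Map a categorical sleep-duration answer to a numeric value in hours.

    Bounded ranges use the midpoint, as described in the text. The values
    for the open-ended categories (open_low, open_high) are assumptions.
    """
    text = category.strip().lower()
    match = re.fullmatch(r"(\d+)\s*-\s*(\d+) hours", text)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        return (low + high) / 2
    if text.startswith("under"):
        return open_low
    if text.startswith("over"):
        return open_high
    raise ValueError(f"Unrecognized category: {category!r}")


hours = [ema_category_to_hours(c) for c in ("6-7 hours", "9-10 hours")]
```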

Transforming Raw Data to Sleep Measures

Data preprocessing was performed on the smartphone, actigraph, and bed sensor data. First, missing data were removed to ensure completeness. Second, timestamps were standardized to the Europe/Helsinki time zone to maintain consistency across all data sources. Third, sleep data were aligned using a 3 PM to 3 PM time frame, meaning each 24-hour period started and ended at 3 PM each day. This approach was chosen to account for variations in sleep schedules and to standardize sleep cycle measurements across participants.
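One simple way to implement a 3 PM to 3 PM window is to shift each timestamp back by 15 hours and take the calendar date. The sketch below assumes timestamps have already been converted to Europe/Helsinki local time; the function name `sleep_day` and the convention of labeling a window by the date on which it starts are illustrative choices, not taken from the study.

```python
from datetime import date, datetime, timedelta

DAY_START_HOUR = 15  # 3 PM boundary described in the text


def sleep_day(local_ts: datetime) -> date:
    """Return the date on which the 3 PM to 3 PM window containing
    `local_ts` begins. `local_ts` is assumed to be local Helsinki time."""
    return (local_ts - timedelta(hours=DAY_START_HOUR)).date()


late_evening = sleep_day(datetime(2020, 1, 1, 23, 0))
after_midnight = sleep_day(datetime(2020, 1, 2, 1, 30))
before_boundary = sleep_day(datetime(2020, 1, 1, 14, 59))
```

With this convention, an onset at 23:00 and an awakening at 01:30 the next morning fall in the same window, so a single night is not split across two days.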

A 5-minute threshold was applied to identify sleep periods, allowing gaps of up to 5 minutes between consecutive sleep intervals to be treated as part of the same sleep episode. TST outside the 3-13 hour range was excluded, following established guidelines []. This threshold was set to exclude implausible TST values, as unusually short or long sleep episodes could indicate naps, data errors, or atypical patterns.
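The gap merging and plausibility filter can be sketched as follows. The function names are hypothetical, and treating the 3-hour and 13-hour bounds as inclusive is an assumption.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=5)  # gap threshold stated in the text


def merge_episodes(episodes, max_gap=MAX_GAP):
    """Merge (start, end) episodes separated by gaps of at most `max_gap`."""
    merged = []
    for start, end in sorted(episodes):
        if merged and start - merged[-1][1] <= max_gap:
            # extend the previous episode across the short gap
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def plausible_tst(hours, low=3.0, high=13.0):
    """Keep TST values within the 3-13 hour range (bounds assumed inclusive)."""
    return low <= hours <= high


episodes = [
    (datetime(2020, 1, 1, 23, 0), datetime(2020, 1, 2, 1, 0)),
    (datetime(2020, 1, 2, 1, 3), datetime(2020, 1, 2, 5, 0)),   # 3-minute gap
    (datetime(2020, 1, 2, 5, 20), datetime(2020, 1, 2, 6, 0)),  # 20-minute gap
]
merged = merge_episodes(episodes)
durations = [(end - start).total_seconds() / 3600 for start, end in merged]
kept = [d for d in durations if plausible_tst(d)]
```

The first two episodes merge into a single 6-hour episode, while the 40-minute episode stays separate and falls outside the plausible range.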

Finally, sleep onset and offset were determined by identifying the start and end times of the longest detected sleep episode within each 24-hour period. Participants with missing values for these parameters were excluded from the analysis. Figure 1 illustrates how sleep periods were identified using the sleep labels provided by the manufacturer for both the actigraph and bed sensor.

Figure 1. Status changes identified by proprietary algorithms of the actigraph and bed sensor. For the actigraph, the sleep period began at 23:55 and ended at 06:51, marked by “Rest-S,” while the bed sensor indicated “Status 1,” showing the bed was occupied and the user was asleep from 21:40 to 05:30. This displays differences in sleep period detection and wakefulness between the two modalities.

Missing Data and Outlier Handling Procedures

Although 164 participants were initially recruited for the main study, 13 (7.9%) participants did not submit passive data and were removed from all evaluations. Additionally, missing data resulted from various factors, including device noncompliance, such as failing to wear or charge the device, user dropout, inconsistent usage, and data cleaning procedures. Technical issues, such as battery depletion, sensor malfunctions, or connectivity failures, may have also contributed to data loss. After filtering out missing data, the sample included 173 unique participants from both the main and pilot studies. The datasets included 102 actigraph users, 108 bed sensor users, and 148 smartphone users.

Additionally, the 2 largest outliers from both the onset and offset columns were identified and removed to limit the impact of extreme values. Because some outliers appeared in both columns, 3 outliers were removed from each of the bed sensor and actigraph datasets and 2 from the smartphone dataset, resulting in the exclusion of 8 observations in total. Manual checks revealed that these outliers were due to device-related issues, including improper device wear for the actigraph, placement or configuration errors for the bed sensor, and technical problems with the smartphone data collection. Following data cleaning, the final sample consisted of 172 participants, distributed as follows: actigraph (29 healthy controls, 70 patients; n=99), bed sensor (34 healthy controls, 71 patients; n=105), smartphone (46 healthy controls, 100 patients; n=146), and EMA (48 healthy controls, 109 patients; n=157). Furthermore, 3 participants from the pilot study did not provide information on age or gender, and their data were therefore excluded from further analyses, resulting in a final sample of 169 participants.

Statistical Analysis

To assess the agreement between different sleep measurement modalities, Bland-Altman plots [] were created by plotting the differences between paired measurements against their mean values. This approach provides a visual assessment of the differences between two modalities, revealing any systematic bias as well as the limits of agreement (LoA). Sleep onset, offset, and TST were compared across modalities using this approach, with 95% LoA calculated. To account for day-to-day variability within each participant, data were aggregated by averaging each individual’s data across multiple days. Pearson correlation analysis was conducted to quantify the relationships between sleep parameters from different modalities. Normality was first assessed using the Shapiro-Wilk test []. If deviations from normality were observed, quantile-quantile plots were used for cross-checking []. The size of the correlation coefficients was interpreted using the thresholds suggested by Mukaka []: 0.70 to 1.00 was considered high, 0.50 to 0.70 moderate, 0.30 to 0.50 low, and 0.00 to 0.30 negligible. Finally, 95% CIs were calculated for the regression slopes.
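For illustration, the Bland-Altman quantities and the correlation labels can be computed as in the sketch below. It assumes the conventional 95% LoA of bias ± 1.96 SD and a paired t statistic on the differences; the function names and the toy data are hypothetical, and boundary values are assigned to the higher category, which matches how r=0.70 and r=0.50 are labeled in the results.

```python
import math
import statistics


def bland_altman(a, b):
    """Bias, SD of differences, 95% limits of agreement, and paired t statistic
    for two equally long lists of per-participant means (eg, in minutes)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    return {
        "means": [(x + y) / 2 for x, y in zip(a, b)],  # x-axis of the plot
        "diffs": diffs,                                # y-axis of the plot
        "bias": bias,
        "sd": sd,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
        "t": bias / (sd / math.sqrt(n)),
        "df": n - 1,
    }


def interpret_r(r):
    """Label a correlation using the thresholds listed in the text."""
    size = abs(r)
    if size >= 0.70:
        return "high"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "low"
    return "negligible"


# Toy values, not study data
device_a = [10, 20, 30, 40]
device_b = [12, 18, 33, 37]
stats = bland_altman(device_a, device_b)
```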

Linear mixed models [,] were used to investigate factors associated with the alignment of sleep parameters between modalities and to account for the repeated measurements within each participant. The alignment of sleep parameters between measurement modality pairs was used as the dependent variable, represented as the absolute difference between the two modalities. The models adjusted for demographic factors, including age and gender. Chronotype was included as a fixed effect, measured by the Morningness-Eveningness Questionnaire (MEQ), with higher scores indicating morningness preferences and lower scores indicating eveningness preferences. Given the unique geographical position of Finland, the seasonal factor was controlled for using day length. Between-group differences were evaluated by comparing each of the 3 patient groups with the healthy control group, which was used as the reference category. Participants were included as the random effect. The models were fitted in R (version 4.3.1; R Core Team) [] using the lme4 package []. All other analyses were performed in Python (version 3.12; Python Software Foundation) [].
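One way to write such a model is shown below. This is a sketch under assumptions: the notation is ours, the random effect is taken to be a participant-level random intercept, and the exact coding of the covariates is not specified in the text. Here i indexes participants, j indexes days, and A and B are the two modalities being compared.

```latex
\[
\left| y^{(A)}_{ij} - y^{(B)}_{ij} \right|
  = \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{gender}_i
  + \beta_3\,\mathrm{MEQ}_i + \beta_4\,\mathrm{daylength}_{ij}
  + \sum_{g \in \{\mathrm{MDD},\,\mathrm{BD},\,\mathrm{BPD}\}}
      \gamma_g\,\mathbb{1}[\mathrm{group}_i = g]
  + u_i + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

Under this formulation, each coefficient \(\gamma_g\) is the expected change in absolute discrepancy for patient group g relative to healthy controls, holding the other covariates fixed.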


Results

Descriptive Statistics

The demographic characteristics of the participants are outlined in Table 1. Additionally, descriptive statistics for sleep onset, offset, and TST across modalities are presented in Table 2, based on daily averages to account for daily fluctuations. In the control group, sleep onset varied between modalities. The earliest average sleep onset was recorded by the actigraph (11:48 PM, SD 70 minutes), while the smartphone reported the latest (12:01 AM, SD 106 minutes). The bed sensor recorded a mean onset time of 11:58 PM (SD 67 minutes). In the patient group, the mean onset times were generally later, with the bed sensor recording the earliest average onset at 12:19 AM (SD 100 minutes) and the actigraph reporting the latest at 12:35 AM (SD 92 minutes). Lastly, the smartphone recorded a mean onset time of 12:30 AM (SD 101 minutes) for patients.

Table 1. Demographic characteristics of participants (N=169).

| Characteristic | Value |
|---|---|
| Age (years), mean (SD) | 35.1 (12.8) |
| Female, n (%) | 125 (73.9) |
| MEQ^a score, mean (SD) | 39.2 (5.2) |
| Group, n (%) | |
| Major depressive disorder | 78 (46.2) |
| Control | 50 (29.6) |
| Bipolar disorder | 21 (12.4) |
| Borderline personality disorder | 20 (11.8) |

^a MEQ: Morningness-Eveningness Questionnaire.

Table 2. Mean sleep parameters (onset, offset, and total sleep time) and SDs across different assessment methods for control and patient groups.

| Parameter | Group | Bed sensor, mean (SD) | Actigraph, mean (SD) | Smartphone, mean (SD) | EMA^a, mean (SD) |
|---|---|---|---|---|---|
| Onset | Control | 23:58 (67 min) | 23:48 (70 min) | 00:01 (106 min) | N/A^b |
| | Patients | 00:19 (100 min) | 00:35 (92 min) | 00:30 (101 min) | N/A |
| Offset | Control | 07:33 (87 min) | 07:44 (74 min) | 07:06 (72 min) | N/A |
| | Patients | 08:24 (119 min) | 08:58 (91 min) | 07:32 (92 min) | N/A |
| Total sleep time (duration) | Control | 7 h 35 min (78 min) | 7 h 55 min (62 min) | 7 h 10 min (87 min) | 7 h 14 min (35 min) |
| | Patients | 8 h 5 min (107 min) | 8 h 22 min (60 min) | 7 h 7 min (78 min) | 7 h 26 min (68 min) |

^a EMA: ecological momentary assessment.

^b N/A: not applicable.

Sleep offset showed a similar pattern, with differences observed between groups and measurement modalities. In the control group, the smartphone reported the earliest mean offset time (7:06 AM, SD 72 minutes), while the actigraph recorded the latest (7:44 AM, SD 74 minutes). The bed sensor estimated a mean offset time of 7:33 AM (SD 87 minutes). In the patient group, offsets occurred later, with the smartphone recording the earliest mean offset time at 7:32 AM (SD 92 minutes) and the actigraph reporting the latest at 8:58 AM (SD 91 minutes). Finally, the bed sensor reported a mean offset of 8:24 AM (SD 119 minutes). All modalities indicated a trend of delayed offset times among patients compared to controls. Furthermore, patients showed greater variability in sleep offset times, suggesting more inconsistency in their wake-up patterns.

Finally, TST was evaluated using both objective measurements and subjective reports from the EMA questionnaires. In the control group, the smartphone recorded the shortest mean TST (7 hours 10 minutes, SD 87 minutes), while the actigraph recorded the longest (7 hours 55 minutes, SD 62 minutes). The bed sensor estimated a mean TST of 7 hours 35 minutes (SD 78 minutes), and subjective reports from the EMA questionnaires indicated a mean of 7 hours 14 minutes (SD 35 minutes). For the patient group, the mean TST was generally longer, with the smartphone recording the shortest mean TST at 7 hours 7 minutes (SD 78 minutes) and the actigraph recording the longest at 8 hours 22 minutes (SD 60 minutes). The bed sensor estimated a mean TST of 8 hours 5 minutes (SD 107 minutes), and EMA responses showed a mean TST of 7 hours 26 minutes (SD 68 minutes). Overall, the findings showed relatively consistent trends across the modalities in measuring TST.

Alignment of Sleep Parameters Across Assessment Modalities

Sleep Onset

Bland-Altman plots comparing sleep onset times across modalities are presented in Figure 2. The mean difference in sleep onset between modalities was not statistically significant (Table 3). However, sleep onset showed significant positive correlations across modalities, as displayed in Figure 3. Actigraph and bed sensor onset times had the strongest correlation (r=0.70, 95% CI 0.58-0.96; P<.001), indicating a high level of agreement between these modalities. The correlation between actigraph and smartphone onset times was moderate (r=0.69, 95% CI 0.60-0.95; P<.001), indicating that these two modalities capture similar sleep onset patterns. The correlation between smartphone and bed sensor onset times was also moderate (r=0.50, 95% CI 0.32-0.69; P<.001), suggesting greater variability in smartphone-based measurements.

Figure 2. Bland-Altman plot comparing sleep onset times across modalities, with data points color-coded by group. The red dashed line indicates the mean bias, and the green dashed lines represent the 95% limits of agreement (LoA). The smallest mean bias is observed between the actigraph and bed sensor at 1.80 minutes.

Figure 3. Scatter plots comparing sleep onset times across different modalities, with each point representing a paired dataset. Data points are color-coded to distinguish control and patient groups. All comparisons show positive correlations, with the strongest correlation observed between the actigraph and bed sensor.

Table 3. Mean bias and statistical comparison of sleep metrics across assessment methods.

| Sleep metric | Comparison | Mean bias (SD), minutes | t test (df) | P value |
|---|---|---|---|---|
| Onset | Actigraph vs bed sensor | 1.80 (62.23) | 0.24 (70) | .81 |
| | Smartphone vs bed sensor | 6.08 (86.42) | 0.67 (89) | .51 |
| | Actigraph vs smartphone | –4.10 (74.78) | –0.50 (83) | .62 |
| Offset | Actigraph vs bed sensor | 30.39 (85.26) | 3.00 (70) | <.001 |
| | Smartphone vs bed sensor | –37.86 (107.19) | –3.35 (89) | <.001 |
| | Actigraph vs smartphone | 64.85 (86.78) | 6.85 (83) | <.001 |
| Total sleep time | Actigraph vs bed sensor | 28.57 (103.03) | 1.51 (98) | .13 |
| | Actigraph vs smartphone | 59.21 (87.56) | 6.29 (98) | <.001 |
| | Actigraph vs EMA^a | 58.51 (66.42) | 5.43 (98) | <.001 |
| | EMA vs bed sensor | –24.9 (93.15) | –2.49 (104) | .01 |
| | EMA vs smartphone | 22.61 (79.74) | 1.81 (120) | .07 |
| | Smartphone vs bed sensor | –34.55 (111.37) | –3.57 (104) | <.001 |

^a EMA: ecological momentary assessment.

Sleep Offset

Figure 4 presents Bland-Altman plots comparing sleep offset times across the different measurement modalities. The mean bias for sleep offset was statistically significant for all modality pairs (P<.001). In particular, the mean bias between the actigraph and bed sensor offsets was 30.39 (SD 85.26; 95% CI 10.36-50.43) minutes, indicating a positive bias, where the actigraph tends to report later offset times compared to the bed sensor. The smartphone offset differed from the bed sensor by a mean of –37.86 (SD 107.19; 95% CI –60.18 to –15.53) minutes, showing a negative bias, with the smartphone tending to report earlier offset times compared to the bed sensor. Lastly, the mean difference between the actigraph and smartphone offsets was 64.85 (SD 86.78; 95% CI 46.13-83.57) minutes, reflecting a larger positive bias, with the actigraph reporting significantly later offset times compared to the smartphone.
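As a quick arithmetic check, the paired t statistic for the actigraph versus bed sensor offset comparison can be recovered from the reported mean bias and SD. This assumes a standard paired t test, with n=71 inferred from the 70 degrees of freedom reported in Table 3.

```latex
\[
t = \frac{\bar{d}}{s_d / \sqrt{n}}
  = \frac{30.39}{85.26 / \sqrt{71}}
  \approx \frac{30.39}{10.12}
  \approx 3.00
\]
```

This agrees with the t value listed for that comparison in Table 3.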

Figure 4. Bland-Altman plot comparing sleep offset times across modalities, with data points color-coded by group. The red dashed line indicates the mean bias, and the green dashed lines represent the 95% limit of agreement (LoA). The actigraph records later offsets, while the smartphone records earlier offsets relative to other modalities. The largest mean bias is observed between the actigraph and smartphone at 64.85 minutes.

Moreover, all modalities showed moderate to low correlations in measuring sleep offset (Figure 5). The highest correlation was observed between the actigraph and bed sensor (r=0.65, 95% CI 0.57-1.01; P<.001), followed by the correlation between the actigraph and smartphone (r=0.51, 95% CI 0.27-0.59; P<.001). The smartphone and bed sensor showed the lowest correlation (r=0.43, 95% CI 0.34-0.88; P<.001). Results indicated moderate to low alignment in sleep offset measurements across modalities, with greater variability observed in smartphone-based measurements.

Figure 5. Scatter plots comparing sleep offset times across different modalities. All offsets show a mild positive correlation, with the strongest correlation observed between the actigraph and bed sensor.

Total Sleep Time

The Bland-Altman plot in Figure 6 showed that the actigraph consistently reported higher TST compared to the other modalities. The mean difference between actigraph and bed sensor in TST was 28.57 (SD 103.03; 95% CI 10.83-46.31) minutes, reflecting a small positive bias, which was not statistically significant (P=.13). The actigraph also overestimated TST compared to the smartphone by 59.21 (SD 87.56; 95% CI 44.41-74.00) minutes and by 58.51 (SD 66.42; 95% CI 47.68-69.34) minutes when compared to the EMA data, both of which were statistically significant (P<.001).

Figure 6. Bland-Altman plot comparing total sleep time (TST) across modalities, with data points color-coded by group. The red dashed line indicates the mean bias, and the green dashed lines represent the 95% limit of agreement (LoA). The actigraph reported higher TST compared to other modalities, while the smartphone reported lower values.

EMA slightly underestimated TST relative to the bed sensor and actigraph but overestimated it compared to the smartphone. The mean difference between EMA and the bed sensor was –24.95 (SD 93.15; 95% CI –40.08 to –9.82) minutes, reflecting a small negative bias, which was statistically significant (P=.01). In comparison to the smartphone, EMA had a mean difference of 22.61 (SD 79.74; 95% CI 9.52-35.69) minutes, indicating a small positive bias, though this difference was not statistically significant (P=.07). Lastly, the smartphone underestimated TST compared to the bed sensor, with a mean difference of –34.55 (SD 111.37; 95% CI –52.82 to –16.26) minutes, which was statistically significant (P<.001). These results suggest that there are significant biases between the actigraph and both EMA and the smartphone. The actigraph overestimated TST, likely because it can misclassify wakefulness as sleep. EMA, based on subjective recall, slightly underestimated TST. The smartphone showed the greatest underestimation, which may be due to screen activity shortly before sleep onset and after waking.

Pearson correlations revealed varying levels of agreement between modalities for TST (Figure 7). The strongest correlation was observed between actigraph and EMA (r=0.45, 95% CI 0.20-0.63; P<.001). Actigraph and bed sensor sleep measurements demonstrated a low positive correlation (r=0.30, 95% CI 0.12-0.91; P=.01), while actigraph and smartphone measurements showed a similar low correlation (r=0.30, 95% CI 0.10-0.60; P=.006). The correlation between EMA and bed sensor data was less pronounced and only marginally significant (r=0.21, 95% CI –0.05 to 0.79; P=.08). EMA and smartphone measurements showed a negligible but significant correlation (r=0.26, 95% CI 0.10-0.56; P=.005). Smartphone and bed sensor data did not exhibit a significant correlation (r=0.17, 95% CI –0.04 to 0.47; P=.11), suggesting greater variability in smartphone-based sleep estimates.

Figure 7. Scatter plots comparing total sleep time (TST) across different modalities. All comparisons show weak positive correlations, with ecological momentary assessment (EMA) versus bed sensor and smartphone versus bed sensor not exhibiting statistically significant correlations.

Sleep Measurement Discrepancies in Patient and Control Groups

Paired t tests (2-tailed) were used to compare sleep onset times between devices and methods for controls and patients separately, with differences reported in minutes. Linear regression with Pearson correlation was also performed to assess the relationship between device pairs.

Sleep Onset

Analysis of sleep onset times using paired t tests (2-tailed) revealed minimal mean biases (Table 4) between measurement modalities in both control and patient groups. For instance, the mean bias between actigraph and bed sensor was –4.57 minutes in controls (t22=–0.37; P=.71) and 4.85 minutes in patients (t47=0.52; P=.60), indicating no systematic difference in onset detection timing. Similarly, the smartphone versus bed sensor comparison showed a mean bias of 18.2 minutes in controls (P=.20) but was near zero in patients (0.02 minutes; P=.99). Despite these small biases, correlations (Table 5) revealed stronger agreement in patients than controls for some modality pairs. Notably, the actigraph versus bed sensor correlation was 0.74 (P<.001) in patients, compared to 0.49 (P=.02) in controls, suggesting more consistent relative timing across these modalities in patients. The smartphone versus bed sensor and actigraph versus smartphone onset times also showed significant positive correlations in both groups. These findings suggest that although absolute onset time differences between devices are small and nonsignificant, the consistency between some modality pairs is greater in the patient group.

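Table 4. Mean bias and statistical comparison of sleep metrics across methods and groups.

| Sleep metric | Comparison | Group | Mean bias (SD), min | t test (df) | P value |
|---|---|---|---|---|---|
| Onset | Actigraph vs bed sensor | Control | –4.57 (58.9) | –0.37 (22) | .71 |
| Onset | Actigraph vs bed sensor | Patients | 4.85 (64.2) | 0.52 (47) | .60 |
| Onset | Smartphone vs bed sensor | Control | 18.2 (75.2) | 1.32 (29) | .20 |
| Onset | Smartphone vs bed sensor | Patients | 0.02 (91.5) | 0.0014 (59) | .99 |
| Onset | Actigraph vs smartphone | Control | –8.00 (68.2) | –0.59 (24) | .56 |
| Onset | Actigraph vs smartphone | Patients | –2.44 (77.9) | –0.24 (58) | .81 |
| Offset | Actigraph vs bed sensor | Control | 20.9 (65.8) | 1.52 (22) | .14 |
| Offset | Actigraph vs bed sensor | Patients | 34.9 (93.5) | 2.59 (47) | .01 |
| Offset | Smartphone vs bed sensor | Control | –22.9 (86.0) | –1.46 (29) | .15 |
| Offset | Smartphone vs bed sensor | Patients | –45.3 (116.3) | –3.02 (59) | .004 |
| Offset | Actigraph vs smartphone | Control | 32.2 (70.4) | 2.29 (24) | .03 |
| Offset | Actigraph vs smartphone | Patients | 78.7 (89.9) | 6.72 (58) | <.001 |
| Total sleep time | Actigraph vs bed sensor | Control | 33.9 (105.44) | 1.74 (28) | .09 |
| Total sleep time | Actigraph vs bed sensor | Patients | 28.6 (104.83) | 1.10 (69) | .27 |
| Total sleep time | Actigraph vs EMA | Control | 53.3 (61.65) | 3.11 (27) | .004 |
| Total sleep time | Actigraph vs EMA | Patients | 61.6 (67.28) | 5.10 (69) | <.001 |
| Total sleep time | Actigraph vs smartphone | Control | 8.26 (98.54) | 1.69 (28) | .10 |
| Total sleep time | Actigraph vs smartphone | Patients | 80.9 (78.04) | 6.65 (69) | <.001 |
| Total sleep time | EMA vs bed sensor | Control | –42.5 (75.975) | –0.70 (27) | .49 |
| Total sleep time | EMA vs bed sensor | Patients | –20.9 (97.54) | –2.51 (70) | .01 |
| Total sleep time | EMA vs smartphone | Control | 3.65 (80.44) | 0.05 (27) | .96 |
| Total sleep time | EMA vs smartphone | Patients | 18.9 (90.22) | 1.65 (92) | .10 |
| Total sleep time | Smartphone vs bed sensor | Control | –14.5 (100.62) | –0.92 (33) | .36 |
| Total sleep time | Smartphone vs bed sensor | Patients | –42.5 (116.72) | –3.57 (70) | <.001 |

Table 5. Correlations across assessment methods between control group and patients.

| Sleep metric | Comparison | Group | Correlation, r (95% CI) | P value |
|---|---|---|---|---|
| Onset | Actigraph vs bed sensor | Control | 0.49 (0.10 to 0.95) | .02 |
| Onset | Actigraph vs bed sensor | Patients | 0.74 (0.61 to 1.05) | <.001 |
| Onset | Smartphone vs bed sensor | Control | 0.58 (0.21 to 0.68) | <.001 |
| Onset | Smartphone vs bed sensor | Patients | 0.48 (0.27 to 0.77) | <.001 |
| Onset | Actigraph vs smartphone | Control | 0.73 (0.59 to 1.38) | <.001 |
| Onset | Actigraph vs smartphone | Patients | 0.66 (0.50 to 0.92) | <.001 |
| Offset | Actigraph vs bed sensor | Control | 0.73 (0.54 to 1.32) | <.001 |
| Offset | Actigraph vs bed sensor | Patients | 0.59 (0.44 to 1.04) | <.001 |
| Offset | Smartphone vs bed sensor | Control | 0.44 (0.13 to 1.13) | .02 |
| Offset | Smartphone vs bed sensor | Patients | 0.42 (0.25 to 0.92) | <.001 |
| Offset | Actigraph vs smartphone | Control | 0.46 (0.05 to 0.59) | .02 |
| Offset | Actigraph vs smartphone | Patients | 0.49 (0.25 to 0.68) | <.001 |
| Total sleep time | Actigraph vs bed sensor | Control | 0.12 (–0.30 to 0.55) | .54 |
| Total sleep time | Actigraph vs bed sensor | Patients | 0.33 (0.06 to 0.61) | .02 |
| Total sleep time | Actigraph vs EMA | Control | 0.55 (–0.02 to 1.13) | .097 |
| Total sleep time | Actigraph vs EMA | Patients | 0.43 (0.18 to 0.68) | .002 |
| Total sleep time | Actigraph vs smartphone | Control | 0.36 (–0.02 to 0.74) | .08 |
| Total sleep time | Actigraph vs smartphone | Patients | 0.30 (0.05 to 0.54) | .02 |
| Total sleep time | EMA vs bed sensor | Control | 0.31 (–0.21 to 0.83) | .26 |
| Total sleep time | EMA vs bed sensor | Patients | 0.21 (–0.05 to 0.47) | .12 |
| Total sleep time | EMA vs smartphone | Control | 0.34 (–0.02 to 0.71) | .07 |
| Total sleep time | EMA vs smartphone | Patients | 0.25 (0.05 to 0.45) | .02 |
| Total sleep time | Smartphone vs bed sensor | Control | 0.17 (–0.20 to 0.53) | .38 |
| Total sleep time | Smartphone vs bed sensor | Patients | 0.17 (–0.08 to 0.43) | .19 |

Sleep Offset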

Significant mean biases were observed across modalities for sleep offset, particularly in the patient group (Table 4). In controls, the mean difference between actigraph and bed sensor was 20.91 minutes but was not significant (P=.14). In contrast, this difference increased to 34.94 minutes in patients and was statistically significant (t47=2.59; P=.01), which may indicate that patient characteristics such as movement patterns or restlessness affect actigraph readings more strongly. Comparisons involving the smartphone also showed larger and significant discrepancies in patients. The smartphone underestimated sleep offset by 45.3 minutes relative to the bed sensor (P=.004), whereas in controls the difference was nonsignificant (P=.16). The largest offset bias was observed between actigraph and smartphone in patients, with a mean difference of 78.7 minutes (P<.001), compared to 32.2 minutes in controls (P=.03).

Despite these differences, correlation analyses showed that the relative agreement across modalities remained comparable between groups (Table 5). Actigraph and bed sensor offset times correlated strongly in both controls (r=0.73; P<.001) and patients (r=0.59; P<.001). Smartphone and bed sensor correlations were moderate and similar across groups (controls: r=0.44; P=.02 and patients: r=0.42; P<.001). Actigraph and smartphone also showed comparable correlations (controls: r=0.46; P=.02 and patients: r=0.49; P<.001). These results indicate that absolute disagreement in sleep offset timing was greater among patients, especially involving smartphone data. However, the consistency of measurements across individuals, as reflected by correlation coefficients, was largely preserved.
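The distinction between absolute disagreement (mean bias) and relative agreement (correlation) can be made concrete with a small synthetic example. Adding a constant delay to one device's offset times increases the mean bias by exactly that delay, while the Pearson correlation is unchanged. All values and function names below are hypothetical and are not study data.

```python
import math

def mean_bias(a, b):
    """Mean of the paired differences a - b."""
    return sum(x - y for x, y in zip(a, b)) / len(a)

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic wake times in minutes after midnight
bed_sensor = [415, 432, 398, 450, 421, 405]
device = [420, 425, 405, 455, 415, 412]
device_delayed = [v + 40 for v in device]   # same nights, constant 40-minute delay

for label, series in (("original", device), ("delayed", device_delayed)):
    print(label,
          f"bias={mean_bias(series, bed_sensor):.1f} min",
          f"r={pearson_r(series, bed_sensor):.3f}")
```

A systematic shift of this kind is one way for a group to show larger offset biases with similar correlation coefficients. Night-to-night scatter, by contrast, would lower the correlation as well.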

Total Sleep Time

TST comparisons demonstrated the largest mean biases, with notable differences between controls and patients (Table 4). Controls exhibited nonsignificant biases for several modality pairs, such as 33.9 minutes for actigraph versus bed sensor (P=.09) and 8.26 minutes for actigraph versus smartphone (P=.10). Conversely, patients showed larger, significant biases for actigraph versus smartphone (80.9 minutes; P<.001) and actigraph versus EMA (61.6 minutes; P<.001), although the actigraph versus bed sensor bias (28.6 minutes; P=.27) was not significant. In Table 5, correlations for TST were generally weak and nonsignificant in controls (eg, actigraph vs bed sensor r=0.12; P=.54), but patients exhibited moderate significant correlations, including actigraph versus bed sensor (r=0.33; P
