Sleep Accuracy: Garmin loses, Fitbit wins in Scientific Study
Nine commercial devices went head-to-head in this scientific sleep study, which is stated to be internally funded by the Rockefeller Neuroscience Institute at West Virginia University.
The study followed 5 people over an extended period, giving 98 data points (nights), with various scientific controls. It looked at basic metrics: total sleep time (TST), total wake time (TWT), and sleep efficiency (SE). In a nutshell, the Fitbit Ionic showed the least variation, while the Garmin Vivosmart 4 band and the Apple Watch 3 (SleepWatch app) showed the highest, or worst, variation. Even the Whoop Strap 2.0 and Polar A370 fared better than the Garmin.
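For anyone unfamiliar with the three metrics, here is a minimal sketch of how they relate and how a device's error against a reference measurement could be summarised. It assumes the common definitions (SE as TST divided by time in bed, TWT as the remainder), which may differ from the exact definitions used in the paper. The function names and the nightly numbers are made up for illustration and are not taken from the study.

```python
from statistics import mean, stdev


def sleep_efficiency(tst_min, time_in_bed_min):
    """Sleep efficiency (SE) as a percentage: TST / time in bed.

    Assumed definition; the paper may define SE differently.
    """
    if time_in_bed_min <= 0:
        raise ValueError("time in bed must be positive")
    return 100.0 * tst_min / time_in_bed_min


def total_wake_time(tst_min, time_in_bed_min):
    """Total wake time (TWT), assumed here to be time in bed minus TST."""
    return time_in_bed_min - tst_min


def error_summary(device_tst, reference_tst):
    """Mean error (bias) and spread (sample standard deviation) of the
    nightly differences between a device and a reference, in minutes."""
    errors = [d - r for d, r in zip(device_tst, reference_tst)]
    return mean(errors), stdev(errors)


# Hypothetical numbers for three nights (minutes), NOT data from the study.
reference = [420, 390, 450]
device = [435, 400, 470]

bias, spread = error_summary(device, reference)
print(f"SE for 420 min asleep in 480 min in bed: {sleep_efficiency(420, 480):.1f}%")
print(f"TWT for the same night: {total_wake_time(420, 480)} min")
print(f"Device bias: {bias:.1f} min, spread: {spread:.1f} min")
```

A device with a small spread is consistently wrong by about the same amount, which is arguably more useful than one whose error swings wildly from night to night, even if the average error looks similar.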
Here are some charts from the report’s findings, followed by some of my thoughts and key takeaways from the results.
Here are my thoughts
I don’t believe all the results
At least, I don’t believe they are representative results for the brands on test. Even if I did accept them, I would like to have seen the researchers use significantly more people, collect significantly more data points and test newer devices, which would give the manufacturers a fairer shot. Omitting the EMFIT QS was also somewhat strange as, in my opinion, it is the best consumer-grade sleep-tracking device.
Interestingly, the researchers found that “Consistent trends across all devices were observed in their failure to determine the amount of time its user was awake rather than sleeping”. At least that seems to tie in with what many of you often say.
Surprisingly, when it came to sleep stage analysis, the limited number of data points showed that:
- “Measurements of light sleep obtained from WHOOP were much less biased and illustrated strong consistency between trials”
- the “Ability to estimate deep sleep was remarkably poor for all devices” and that
- “they all overestimated REM sleep time”.
As we’ve heard many times before, the sleep devices we use simply do not accurately measure sleep stages.
It just goes to show that you can’t take any single view or any single technology as being correct. Clearly, technologies work differently on different people, as I am at pains to say whenever I show my workout-related oHR stats. That said, the researchers swapped devices between subjects and do state that “None of the remaining subjects had significantly different error distributions from each other.” Luckily, readers here tend to be an intelligent bunch who read around and get a variety of different views before making up their own minds. I would be interested to know, below, how much weight you would place on this study compared to your own personal experiences.