Garmin beaten by WHOOP: Study shows 99% accuracy

WHOOP 4.0 band vs whoop 3.0 strap
Don’t do this at home.

Is WHOOP Accurate?

Let’s see what science says…

Study Concludes WHOOP has the best HR & HRV accuracy

Hey, don’t bite me. I didn’t do the study; I’m just reporting it.

Study Results: mdpi.com

The study attempts to answer the oft-mooted question, “Is WHOOP Accurate?”

This research by Central Queensland University is published in the peer-reviewed journal Sensors and looked at the accuracy of 6 wearables that estimate HR, HRV and sleep (sleep stages and duration). Importantly, the study was independently funded by the Australian Institute of Sport and supported by WHOOP. It compared the accuracy of 5 well-known wearables, from WHOOP, Garmin, Oura, Apple and Polar, plus a head-worn wearable called Somfit, against a II-lead ECG / polysomnography.

Findings

The study found that WHOOP 3.0 had a higher degree of accuracy than all the other wearables when it came to measuring HR and HRV. WHOOP proved to be over 99% accurate when measuring both HR & HRV. However, readers should note that the study only covered the low heart rates seen during sleep.

Must Read: Detailed WHOOP 4.0 Review

Problems with the study

  • This is a sleep study. Don’t confuse it with a study showing WHOOP’s accuracy in sports compared to these other devices. (Though actually, WHOOP is more than sufficiently accurate for sports when worn on the bicep as I and other reviewers have found)
  • All 6 devices were worn simultaneously by each participant. Not forgetting the polysomnography / ECG, that’s the WHOOP 3.0 band plus 3 watches, a ring and a forehead-based wearable. I don’t know about you but I’ve only got two wrists, so only two of the devices can have been worn properly.
    • Furthermore, was there any favouritism in allowing WHOOP to be worn on the bicep, where better results WILL be obtained? Or were other devices worn on the inner side of the forearm?
  • Old devices were used.
    • Oura 3 has improved hardware so Oura 2 is a bad choice. I also understand Oura 3.0 has a new, verified sleep stage algorithm but it is not yet live.
    • Vantage V lacks the 2nd generation Precision Prime sensor in the Vantage V2. So that is a bad choice.
    • Apple Watch 6 is a good choice and does have the latest generation optical HR sensor. However, the study uses the SleepWatch app and the authors note “the exact period of sampling used by Sleep Watch was not specified”. That’s probably because the authors don’t understand how Apple Watch works and did not enable AFib History in order for the Apple Watch to regularly record HRV. Most likely, the SleepWatch app only had 3 or 4 HRV readings per participant per night! Oh dear. [The study was submitted on 29 July 2022 and hence could not have used watchOS 9, the latest version and the only one that produces frequent HRV measurements via AFib History]
      • Even if the SleepWatch app somehow got more frequent HRV estimates it would be using a proprietary algorithm and not Apple’s
    • Garmin 245 has the previous generation of optical HR sensors.
    • WHOOP 3.0 is the older generation sensor. More detailed data was provided by WHOOP to the researchers (this is not unusual; they have done that for me too. However, it’s data that you won’t necessarily see in normal use).
  • The study used a sleep lab which includes polysomnography. That is about as state-of-the-art as you need to get. However, only one technician was used to score the results. This might sound pedantic, yet one of the several drawbacks of polysomnography is that different technicians WILL score (identify) sleep stages with differences of up to 20%. Still, using one technician is reasonable.
  • Conflicts of interest are reported thus “GD Roach., C. Sargent, and DJ Miller are members of a research group at Central Queensland University (i.e., The Sleep Lab) that receives support for research (i.e., funding, equipment) from WHOOP Inc. However, WHOOP was not involved in the design, conduct or reporting of this study.”

Note these release dates of current/later-gen models: WHOOP 4.0 was released in September 2021, Oura 3.0 in November 2021, Forerunner 255 in June 2022, Polar Vantage V2 in October 2020 and the Apple Watch 7 in September 2021 (Watch 8 in September 2022). So, realistically, at the time when the study was conducted perhaps only the Vantage V2 would have been available to use.

Thoughts

This is a sleep study that only compares previous-generation sensors & algorithms.

We can’t say anything about WHOOP 4.0’s accuracy based on this study as an older model was used. However, in one sense this study is indicative of WHOOP’s accuracy given the right circumstances. Sleep is one of those circumstances.

I’m reasonably confident that the newest models of Polar, Garmin, WHOOP and Oura would all yield improved results. And I’m positive that enabling AFib History on watchOS 9 would give massively different HRV/rMSSD results.
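
For anyone wondering what rMSSD actually is: it is the root mean square of the successive differences between beat-to-beat (RR) intervals. Here is a minimal Python sketch of that standard formula. The function name `rmssd` and the RR values are mine and purely illustrative; this is not code or data from the study or from any of the vendors.

```python
import math

def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Difference between each interval and the one before it
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical RR intervals in milliseconds (made up for illustration)
rr = [1000, 1040, 990, 1030, 1010]
print(f"rMSSD = {rmssd(rr):.1f} ms")
```

The point for this article is that the figure depends entirely on how many clean RR intervals go into it, so a device that only samples a few short windows per night is working from far less data than one that samples frequently.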

Furthermore, my own findings are that WHOOP is more than sufficiently accurate for athletes when worn on the biceps. Even so, my own sleep studies with the latest tech (N=1) show that WHOOP still consistently beats Garmin. But I know other researchers/reviewers who get different results with similar tech hence the need for larger sample sizes.

 

Correlation of HRV Readings to Polar H10/HRV4Training – day-to-day (Column 1) and baseline (Column 2) correlations. Garmin 955, Oura 2 and WHOOP 4 used.
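
If you want to try this kind of comparison on your own exports, here is a rough Python sketch of the two correlations. It assumes "day-to-day" means the Pearson correlation of nightly values and "baseline" means the Pearson correlation of a trailing 7-night average; that window is my assumption for illustration, and the numbers below are made up, not my data or the study's.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either series is constant
    return cov / math.sqrt(var_x * var_y)

def rolling_mean(values, window=7):
    """Trailing mean over `window` nights (the 7-night default is an assumption)."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

# Made-up nightly rMSSD values in ms: chest strap reference vs a wearable
reference = [62, 58, 65, 70, 55, 60, 68, 64, 59, 66]
wearable = [60, 59, 63, 72, 57, 58, 69, 62, 61, 65]

day_to_day = pearson(reference, wearable)
baseline = pearson(rolling_mean(reference), rolling_mean(wearable))
print(f"day-to-day r = {day_to_day:.2f}, baseline r = {baseline:.2f}")
```

With only a handful of nights the baseline figure is built from very few points, which is one more reason to want larger sample sizes.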

 

As well as reminding readers not to extrapolate the accuracy of resting readings to the accuracy of workout readings, remember also that no wearable can use heart rate data to correctly measure the strain of a strength-type workout. Heart rate just isn’t a good measure of that even if it is captured accurately. However, this study DOES validate the accuracy of WHOOP’s measurement of your response to such strength workouts, albeit with the previous generation of sensor.

That said, people at WHOOP will be super-happy with the results. The comparisons of WHOOP to a II-lead ECG/polysomnograph give as good a comparison to ‘correctness’ as you will easily get.

 


12 thoughts on “Garmin beaten by WHOOP: Study shows 99% accuracy”

  1. It was a Whoop 3.0 and Garmin Forerunner 245, rather than the Whoop 4.0 and forerunner 255. That’s significant because both systems have a totally reworked optical HR sensor cluster.

    Like you, I don’t understand how they managed to get 4 wrist-worn sensors simultaneously. They don’t explain it in their methodology and that seems like a red flag.

    If I am reading the figure 3 graphs correctly the whoop 3.0 had nearly perfect correspondence with ECG for HRV data that just blows the rest of the devices out of the water. I don’t really believe that Whoop has magical oHR sensor technology that is leaps and bounds better than Apple, Polar, and Garmin, so I find this surprising. Your surmise that they might have located the whoop higher up the forearm could explain this result, though.

    The authors disclosed that they are funded by whoop. This can lead to implicit (or overt) bias. In general I’m not sure I buy this stuff and especially that any quantified self device on the market can reliably categorize sleep stages.

    I think there is something there with HRV but there are a lot of borderline BS “metrics” too. “Strain”, “body battery”, “stress”, respiration rate, sleep stages… These things defined by a company with proprietary implementations are basically just “for fun” in my opinion and not to be taken too seriously.

    1. Exactly. I only want to add that MDPI journals have an incredibly poor reception in the scientific community. The time from submission to acceptance was only two weeks! A proper peer review usually takes months.

    2. Whoop has cloud-based algorithms. And the algorithms are very important.

      I would imagine that a third wearable was either worn with a spacer or worn on the underside of the arm. That’s not too unreasonable a thing to do but it just doesn’t represent how things are normally worn. Actually, the device that is worn away from the wrist might get superior results to one worn correctly on the wrist.
      I think you are being a bit harsh with your BS claim on Whoop’s metrics. Most are pretty good. OK, strain is, IIRC, based on a scale of 0-21 whereas Garmin’s is 0-5, but so what? The underlying TRIMP will be very similar and soundly grounded.
      Any company’s ‘readiness to train’ metric is essentially nonsense as it is not a measurable thing per se but rather draws on indicators of physiological states like HRV.

      HRV: Yes some reviewers have found whoop inaccurate as a hr sensor and then assumed it would be inaccurate for resting hrv…which is an incorrect assumption.

  2. I guess you didn’t read the bit at the bottom where it says the study was supported by Whoop?

    1. No! I’ll add that. Ty.

      I also seem to remember that the AIS partnered with Whoop a few months back. I couldn’t find that info again when I looked (I only have so much time).

  3. Thanks for the very useful meta-analysis. At the bottom it says:

    Conflicts of Interest: G.D.R., C.S., and D.J.M. are members of a research group at Central Queensland University (i.e., The Sleep Lab) that receives support for research (i.e., funding, equipment) from WHOOP Inc. However, WHOOP was not involved in the design, conduct or reporting of this study.

    I’m not going to claim that this conflict of interests _definitely_ caused bias, but given this combined with all the issues with the report you helpfully identified, it seems somewhat misleading to word your main article headline in a way which presents the findings as definitively in Whoop’s favour.

    Indeed, you provided a more precise subtitle (“3rd Party Study Concludes Whoop has the best HR & HRV accuracy”), plus the “Don’t bite me” caveat, so I wonder why you didn’t go with something like that as the main headline, e.g. “Study claims Whoop has best HR & HRV accuracy”. This precision in the headline is important because a lot of people don’t read beyond that (e.g. if they see the article via a tweet).

    1. agreed

      I’m not a scientist! I don’t want to write scientific papers nor read like one. I do want people to read my content tho, in the same way that mainstream news organisations want people to read theirs.

  4. One thing worth noting as well is how the data was received: “All manufacturers of the six wearable devices were invited to provide data directly to the researchers rather than the researchers having to obtain the data via the associated apps. The advantage of data direct from a manufacturer is that it may have a greater level of precision than data available in an app. For example, sleep data are typically scored in 30-s epochs, but some apps output/display the data in 5-min epochs.” Whoop and Somfit were the only companies to provide their own data.

    1. good point on the granularity. of course it’s also important as to how frequently the data is captured per second
      when i reviewed whoop v1 about a million years ago the only way to get HRV data then was from whoop directly, which they kindly provided for me.
      cloud-based algorithms can potentially utilise processing-intensive AI routines and complex noise reduction methods that are not practical on a wearable (many of which are under-powered or just-powered)
      whoop also processes data in the cloud for users now.
