Garmin Vivosmart can’t measure stress: Study Shows Little Correlation to Self-Reported Levels


A recent scientific paper that used Garmin Vivosmart technology with a large sample of people has caught the attention of the UK’s national media.

Smartwatches offer little insight into stress levels, researchers find, [The Guardian]

Wearables favourite Pulse then criticised that article, and how it was widely taken up by “engagement farmers” and “social media bots”.

I’m glad that HRV, stress and wearables are now topics for the mainstream media. This shows that wearable technologies are an accepted part of everyday life, and that consumers are interested both in the devices and in their own bodies.

However, I suspect some tech brands don’t understand HRV, and I’m pretty sure the same can be said about many commenters in this space.

A good place to start your understanding of HRV is this article, hosted on this site, written by Marco Altini and adapted and edited by me. It summarises pretty much everything an interested lay person needs to know, with links to Altini’s more detailed content for those who want it.

HRV – everything you need to know | uses, science and limitations | Garmin, WHOOP, and Oura – HRV4Training

The following is my take on the research and the aftermath of its coverage, which contains a few unpleasant truths.

Summary of the Scientific Paper

Siepe et al.’s scientific paper is called “Associations between ecological momentary assessment and passive sensor data in a large student sample”. It compares data from a Garmin Vivosmart 4 (2018) with what people report about their stress, tiredness and sleep.

The sample size and number of data points are very good, so I would expect statistically significant results to be obtainable.

  • They followed 781 students tracking their stress, momentary tiredness, and sleep.
  • Each student provided 3 months of data.
  • Students wore a Garmin Vivosmart 4 smartwatch to passively and frequently collect data.
  • Students also self-reported up to four times daily via short surveys (Ecological Momentary Assessments, or EMAs), i.e. up to 352 data points per student.
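
As a rough check, 352 prompts at four a day implies about 88 study days, which fits “3 months”. The short Python sketch below works through that arithmetic and the maximum possible number of self-reports across all 781 students. It assumes every prompt was answered, which real EMA studies rarely achieve, so these are upper bounds and not the paper’s actual response counts.

```python
# Upper-bound EMA arithmetic, assuming (unrealistically) that every prompt was answered.
STUDENTS = 781
PROMPTS_PER_DAY = 4
MAX_PER_STUDENT = 352  # per-student maximum quoted in the article

# Study days implied by 352 prompts at four prompts per day.
implied_days = MAX_PER_STUDENT // PROMPTS_PER_DAY

# Maximum possible self-reports across the whole sample.
max_total_reports = STUDENTS * MAX_PER_STUDENT

print(implied_days)       # 88
print(max_total_reports)  # 274912
```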

Tech Issues

Vivosmart 4 was released in 2018 and uses a Garmin Elevate Gen 2 PPG sensor. It has been used in scientific research before, e.g. Hehlmann et al. (2021), but as far as I know, there has never been any assessment, outside of Garmin/Firstbeat, of that Elevate sensor for rMSSD/HRV data. The accuracy of the HRV data cannot be inferred from HR accuracy tests, such as those a tech reviewer like me might run during sport.

Some reports have said it’s ‘old tech’ and that things have moved on since then, which is true. But timekeeping has also moved on from analogue watches, and some of the old ones are still pretty accurate!

Basically, no one outside of Garmin knows how accurate the HRV data is from Vivosmart 4.

The researchers, or someone, should have found a way to gauge the accuracy of the HRV data with a smaller test. As far as I know, no third-party tool can capture raw HRV data from Vivosmart 4, so this would have to use a proxy: for example, comparing waking HRV taken from a Polar H10 chest strap with a Stress or Body Battery reading taken at a similar time.
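
For anyone wanting to try this, rMSSD is the root mean square of the successive differences between RR (beat-to-beat) intervals, and a chest strap such as the Polar H10 can supply those intervals. The Python sketch below is a minimal, hypothetical illustration using made-up RR values; it does no artefact or ectopic-beat correction, which any real validation would need, and how the RR data is exported from the strap is left out as an assumption.

```python
import math


def rmssd(rr_ms: list[float]) -> float:
    """Root mean square of successive differences of RR intervals, in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Difference between each beat-to-beat interval and the one before it.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals (ms) from a short waking recording.
rr = [1000, 1020, 990, 1010, 1000]
print(round(rmssd(rr), 2))  # 21.21
```

The reference rMSSD values from such recordings could then be set alongside the watch’s readings taken at a similar time, bearing in mind that Stress and Body Battery are derived scores, not rMSSD itself.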

Furthermore, as the paper notes, Garmin does not publish its algorithms, which add new factors to the raw data to create a somewhat invented score.

So, this large scientific paper looks at the correlations between ‘somewhat invented scores’ and ‘the ability of students to perceive various physiological states’. Hmmmmm

You can guess where I’m heading with this. Let’s look at the results.

Main Results

  • Sleep: There was a “robust and positive association” between the number of hours of recorded sleep and what the students thought their sleep quality was.
  • Tiredness: The match was much weaker. Garmin’s Body Battery score uses HRV, HR and activity data, and it had only a weak correlation (-0.082) with the self-reported tiredness assessment.
  • Stress: There was almost no match for most people. The Garmin stress score also uses HRV, HR and activity data.
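
To put the -0.082 figure in perspective: squaring a correlation coefficient gives the share of variance two measures have in common. The Python sketch below is a plain Pearson correlation plus that r² step. It is a simplification; a study with many repeated measures per student may well use within-person or multilevel models, so the paper’s own analysis could differ from a single pooled correlation like this.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


def shared_variance(r: float) -> float:
    """Proportion of variance two measures share (r squared)."""
    return r * r


# The Body Battery vs self-reported tiredness figure quoted above.
print(f"{shared_variance(-0.082):.2%}")  # 0.67%
```

In other words, Body Battery and self-reported tiredness share well under 1% of their variance.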

Why the Stress Data Didn’t Match

I suspect the researchers weren’t using the best methods and tools to measure something they didn’t understand at multiple levels. Other than that, it was fine 🙂 There was an impressive number of data points, though, a bit like a car with a go-faster stripe.

Here are a few thoughts:

  • What actually is sleep quality? How do you measure it? What are its units of measurement? What is the gold standard comparator? Clark and Landolt (2017) and others state that sleep quality is probably tied to the proportion of time spent in restorative stages like slow-wave sleep (deep sleep) and REM sleep. I would probably agree with that. However, Garmin Vivosmart 4 does not assess sleep stages, and, as I have pointed out on this site many times, no wearable tech can properly assess sleep stages. So, the research compared a subjective feeling from the student to something the researchers couldn’t measure. They just happened to find a correlation.
  • What actually is stress? You might think HRV is the body’s response to stress, which it is to a degree. But it’s more a sign of how your body copes with stress – after strenuous exercise, an athlete’s HRV might rise, whereas for a couch potato, it would undoubtedly fall. Plus, there are numerous stressors throughout the day. How do we know which ones are affecting the student? Perhaps they are ‘stressed’ at the end of a lecture they didn’t understand, but Garmin should also report a rise in stress after a meal as the body digests the food.
  • What actually is tiredness? Is it sleepiness or a physical lack of energy? Again, there is the question of what the student thinks they are assessing about their body, and the authors then compare this with Garmin’s made-up composite metric (made up in the sense that there will be a somewhat arbitrary weighting of inputs).
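
To illustrate what “a somewhat arbitrary weighting of inputs” means in practice, the Python sketch below combines the same three normalised inputs (HRV, resting HR, activity load) under two different sets of weights. The function, the input names, the 0 to 100 scale and both weightings are entirely hypothetical; Garmin does not publish how Body Battery or Stress are calculated. The only point is that identical physiology can yield noticeably different scores depending on choices the user never sees.

```python
def composite_score(inputs: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of inputs normalised to 0..1, scaled to 0..100.

    Hypothetical illustration only; this is NOT Garmin's algorithm.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(inputs[name] * w for name, w in weights.items())
    return 100 * weighted / total_weight


# One hypothetical day's inputs (0 = worst, 1 = best) and two invented weightings.
day = {"hrv": 0.4, "resting_hr": 0.7, "activity_load": 0.2}
weights_a = {"hrv": 0.6, "resting_hr": 0.3, "activity_load": 0.1}
weights_b = {"hrv": 0.2, "resting_hr": 0.3, "activity_load": 0.5}

print(round(composite_score(day, weights_a)))  # 47
print(round(composite_score(day, weights_b)))  # 39
```

Same body, same day, and the “score” moves by eight points purely because of the weighting.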

Let’s look at the media’s take.

The Guardian Article’s View

Smartwatches offer little insight into stress levels, researchers find….devices cannot differentiate between someone being overworked and being excited [The Guardian]

The Guardian article is kinda true in many respects, but perhaps leaves the wrong impression.

The article seems to condemn all smartwatch tech when, in fact, only one 2018 model was used. It generalises the conclusions. That said, I suspect that even Garmin’s latest Gen 5 sensor tech and the latest versions of the Body Battery and Stress algorithms would have made little difference.

The article also seems to treat stress as ‘emotional stress’, when there are many other components to consider.

Finally, the Guardian article focuses more on blaming the device when there is also bias or misunderstanding among the students, affecting the self-perception scores.


The Pulse Article’s View

Pulse is a fitness tech site whose article defends the tech, pushing back to some degree on the methods employed by the researchers.

Smartwatches aren’t confused about stress—but headlines and studies are [Pulse]

Pulse does recognise that stress goes beyond emotional stress. However, Pulse does not appear to understand the nuances of stressors and the stress reactions to the exercise stimuli it cites.

Exercise is a form of acute physiological stress; that is precisely how training adaptation works. It’s the process of stressing your cardiovascular and musculoskeletal systems that forces them to adapt and get stronger.
A device that correctly identifies a workout as a high-stress event is not confused; it is working perfectly [Pulse]

To assert that a Vivosmart 4 will perfectly record a high-stress event is just wrong. Then again, if a device CORRECTLY identifies something, it has by definition identified it perfectly (I guess that’s a tautology).

Overall, Pulse’s response is a good counter-balance to The Guardian. Pulse defends the tech and blames the students, whereas the Guardian grabs the headlines by slating the tech.

It’s a fair fight! The reality is worse.

My Take

The study is a well-intentioned waste of taxpayer money. [My Take]

Using old Garmin Vivosmart 4 technology is flawed: the hardware is unvalidated for the intended purpose, and that is compounded by relying on unpublished algorithms that would need to be published before they could be trusted.

The researchers compare the wrong measurements, like Body Battery, with the students’ perceptions at the wrong times of day, when too many other factors come into play.

Furthermore, the researchers do not say how they educated the students on what they were trying to record – “I feel stressed right now – how much do you agree” is too vague.

That said, the study basically says that what we feel doesn’t seem to tie up with what tech says. You then have to ask, “Should it?” Is how you feel the same as physiological reality? A: It might be…or…it might not.

Generally, I rely on tech to measure something that can be validated, like HR and power, or supported by a body of science (HRV, TSB, VO2max). When tech pretends to measure and assess other aspects of physiology (Readiness) or performance, I find it interesting and hope everyone else knows to draw the same line between ‘interesting’ and ‘science’. Which they probably don’t.


the5krunner.com © 2010-2025

tfk, the5krunner
Sports Technology Reviewer and International Age Group Triathlete

With 20 years of testing Garmin wearables and competing in triathlons at an international age group level, I provide expert insights into fitness tech, helping athletes and casual users make informed choices.

Reader-Powered Content

This content is not sponsored. It’s mostly me behind the labour of love, which is this site, and I appreciate everyone who follows, subscribes or Buys Me A Coffee ❤️ Alternatively, please buy the reviewed product from my partners. Thank you! FTC: Affiliate Disclosure: Links pay commission. As an Amazon Associate, I earn from qualifying purchases.

5 thoughts on “Garmin Vivosmart can’t measure stress: Study Shows Little Correlation to Self-Reported Levels”

  1. I think it also depends on the target audience for a metric to work and how well it works.
    If the technology is designed for high-performance athletes who train seven days a week and have a completely different daily routine, then the technology is useless when worn by an IT student who spends 18 hours a day programming in his chair.

    I’m not here to defend Garmin, that’s not my job – it’s just a thought.

    1. I think you have to separate out what it’s measuring, how it presents it and for what purpose, and also the target audience (as you say).

      Sleep quality is important to all.
      Stress in the context of HRV is valid for most people, but there are quite tight limits on how and when the data is meaningful. Again, HRV is suitably useful to most people, but different types of people (athletes) can use the data for different purposes, and there are also some differences in the interpretation of HRV among different populations, e.g. people can still be very fit with low HRV (genetic factors), and HRV responds differently after exercise (see above).

      1. Hey, thanks for your reply. I used the wrong word. Not “designed,” but “calibrated.” I wasn’t referring to the metric itself, but how it was calibrated—in other words, the data basis.

  2. I don’t think calling this a waste of taxpayers’ money is fair.
    First, a study is not valuable for its findings alone, but also for what it does to the research field: the discussion it spawns and the new studies it influences.
    Second, a study might not be seeking the answers you care about ;-D

    I did not bother to gain access to the full text, so I can’t comment on the general methodology, but the abstract and setup also give some indication of what the study tries to achieve, and it does not seem to be trying to validate the tech, nor is it primarily about tracking “exercise stress”.

    Assuming they chose the Vivosmart consciously, that already somewhat sets the context, and that is not an athlete but an individual casually interested in additional information about personal health (i.e. the target audience for a fitness tracker, not a sports watch). I would assume that, given the target audience, Garmin would need to match the established meaning of “stress” for that audience. So no, the participants should not be educated on what data they should provide; they should record what they naturally perceive as stress. That then leads to the conclusion that “wearable data and their corresponding self-report measures may not necessarily measure similar constructs”, which I find very likely true. It can also mean that the general population is not very self-aware, which is also very likely true 😀

    Should the self-observation and the measured data correlate? I would say, to some extent. If something is really stressful (in the common sense of the word!), it should be reflected in the data, and in my experience it is. But it should also hint at stressful things we don’t observe as easily (like the effect of two beers or the recovery after an infection), which in my observation it also does.

  3. Ask anyone wearing a smartwatch: how many hours did you sleep last night? Or even worse: what time did you fall asleep, and what time did you wake up? You’ll get an answer that is pretty much “copy & pasted” from their watch…
