Garmin VO2max – scientific study roasts it and loves it. Who for? A: It depends
A recent study on the Garmin Forerunner 245 (Engel et al. 2025) urged caution in high-level athletes when looking at the VO2max values the watch produced.
The Garmin Forerunner 245 smartwatch consistently underestimated VO2max in highly trained athletes by roughly 6.3 ml/min/kg, showing an average percentage error of 9.4-10.4% and generally poor reliability.
Smartwatches have become convenient and popular tools for athletes looking for insights into their health and fitness metrics, such as VO2max. As one of the key determinants of performance and health span, VO2max is high on the list of popular metrics, and Garmin has estimated it on its watches and bike computers for well over five years.
How does Garmin calculate VO2max?
Garmin estimates VO2max for runners and cyclists using algorithms it acquired from Firstbeat in 2020. This analyses heart rate data alongside pace (for running) or power (for cycling) during certain qualifying activities. For runners, this requires a relatively hard outdoor GPS run of over 10 minutes, whereas for cyclists, it requires a hard but steady ride of at least 20 minutes; the latter’s accuracy is improved with a power meter.
Notable updates occurred in late 2023, when the company changed how VO2max was synced and stored, which affected consistency between devices and was widely reported on forums at the time. In late 2024, Garmin refined its algorithms with firmware version 21.xx, resulting in VO2max drops or adjustments that many users also reported publicly.
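To make the "heart rate alongside pace" idea concrete, here is a minimal sketch. It is NOT Garmin's or Firstbeat's proprietary algorithm, which is not public. It assumes flat running, uses the textbook ACSM running equation for oxygen cost, and assumes heart rate rises in a straight line with oxygen cost so the line can be extrapolated to HRmax. The function names and the runner's numbers are my own inventions for illustration.

```python
# Illustrative sketch only: NOT Garmin's or Firstbeat's proprietary algorithm.
# Assumptions: flat running, the ACSM running equation for oxygen cost, and a
# straight-line heart rate vs. oxygen cost relationship extrapolated to HRmax.


def oxygen_cost(speed_m_per_min: float) -> float:
    """ACSM running equation on level ground, in ml/kg/min."""
    return 3.5 + 0.2 * speed_m_per_min


def estimate_vo2max(samples: list[tuple[float, float]], hr_max: float) -> float:
    """Fit heart rate -> oxygen cost by least squares, then extrapolate to hr_max.

    samples: (heart_rate_bpm, speed_m_per_min) pairs from steady running.
    """
    if len(samples) < 2:
        raise ValueError("need at least two steady-state samples")
    hrs = [hr for hr, _ in samples]
    vo2s = [oxygen_cost(speed) for _, speed in samples]
    mean_hr = sum(hrs) / len(hrs)
    mean_vo2 = sum(vo2s) / len(vo2s)
    sxx = sum((hr - mean_hr) ** 2 for hr in hrs)
    if sxx == 0:
        raise ValueError("heart rates must vary to fit a line")
    sxy = sum((hr - mean_hr) * (v - mean_vo2) for hr, v in zip(hrs, vo2s))
    slope = sxy / sxx
    intercept = mean_vo2 - slope * mean_hr
    return slope * hr_max + intercept


# Hypothetical runner: 200 m/min at 150 bpm and 240 m/min at 170 bpm.
samples = [(150.0, 200.0), (170.0, 240.0)]
print(round(estimate_vo2max(samples, hr_max=190.0), 1))  # 59.5
print(round(estimate_vo2max(samples, hr_max=185.0), 1))  # 57.5
```

In this toy model, entering an HRmax that is 5 bpm too low drops the estimate by 2 ml/kg/min, and any error in pace feeds straight into the oxygen cost. A real implementation would presumably filter the data far more carefully, but the sensitivity to both inputs is the point.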
A note on accuracy
Before discussing the study (which is interesting), here are some comments on the accuracy of the tech stack.
- The authors used a Polar H10 chest strap. This will give good data, but the authors did not state whether ANT+ or BLE was used when paired to the watch, which could affect accuracy (it likely did not).
- The authors used GPS from a Forerunner 245 on a running track. This would likely be in open skies, so GPS reception should be good. However, the Forerunner 245 uses the CXD5603GF GPS chipset, which, from my extensive tests over the years, is not as accurate as the current generation of dual-frequency GNSS chipsets. Inaccurate distance, and hence pace, was therefore likely used. I can’t see how the authors could have replaced the distance used in Garmin’s calculation with the precise distance recorded on the track.
- The authors did not use a footpod
- The authors did not use Garmin’s Track Mode
- The authors did not describe any specific validation or adjustment for the distance the Garmin Forerunner 245 smartwatch recorded against the actual distance run.
- The authors used Garmin’s older, outdated VO2max algorithm
- The authors inputted HRmax into the Garmin as determined by a ramp test.
- The HRmax used by Garmin is likely both a qualifying workout validation criterion AND a key component in the subsequent calculation of VO2max.
- The authors used a Cosmed Q-NRG MAX in the lab. Gold standard.
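The distance points above are easy to put numbers on. The sketch below compares the pace a watch would record with the pace implied by counting laps on a standard 400 m track. All figures are hypothetical and none come from the study; the helper names are mine.

```python
# Hypothetical example: none of these numbers come from the study.
TRACK_LAP_M = 400.0  # standard outdoor track, lane 1


def pace_min_per_km(distance_m: float, duration_s: float) -> float:
    """Average pace in minutes per kilometre."""
    return (duration_s / 60.0) / (distance_m / 1000.0)


def distance_error_pct(gps_distance_m: float, laps: float) -> float:
    """Signed GPS distance error relative to the lap-counted track distance."""
    true_distance_m = laps * TRACK_LAP_M
    return 100.0 * (gps_distance_m - true_distance_m) / true_distance_m


# A 15-minute run covering 9 laps (3600 m) that the watch logs as 3528 m.
duration_s = 15 * 60
gps_m = 3528.0
laps = 9

print(distance_error_pct(gps_m, laps))                   # -2.0 (watch reads short)
print(pace_min_per_km(gps_m, duration_s))                # pace the watch sees
print(pace_min_per_km(laps * TRACK_LAP_M, duration_s))   # lap-counted pace
```

A watch that reads 2% short reports a slower pace for the same heart rate, which would presumably nudge any pace-based VO2max estimate downwards. The error can go the other way too, so treat this as a sensitivity check rather than an explanation of the study's result.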
Honestly folks. Some random blogger (me) on the internet just shouldn’t be able to make valid criticisms of scientific research like this. Science is these people’s job; they should have devised a more reliable method to avoid wasting months of their lives. They simply had to buy a slightly more expensive watch or a Stryd footpod to negate most of my criticisms.
A meta-analysis by Molina-Garcia et al. (2022) found that exercise-based algorithms in wearables offer higher accuracy at a population level but with large individual variation.
The Study: Garmin Forerunner 245 Under the Microscope
First up. Sample size. Thirty-five moderately-to-highly trained endurance athletes (24 males, 11 females) were recruited for the study. Using a gold standard lab method, they were assessed to have a mean VO2max of 60.1 ± 8.2 ml/min/kg.
- The Gold Standard method involves a ramp test on a treadmill, measuring RER to determine actual VO2max and HRmax.
- A track method was employed to test Garmin’s algorithm. It needed two submaximal 15-minute outdoor runs at an intensity >70% of HRmax, wearing the Forerunner 245 with activated GPS. An external chest strap enhanced HR accuracy. Age/sex data and lab-determined HRmax were manually entered into the device.
To aid later analysis, the athletes were classified as moderately trained (VO2max ≤ 59.8 ml·min⁻¹·kg⁻¹) and highly trained (VO2max > 59.8 ml·min⁻¹·kg⁻¹).
Garmin: The ROASTING
The Garmin Forerunner 245 consistently underestimated VO2max compared to the lab measurements. The mean differences were -4.73 ml·min⁻¹·kg⁻¹ (after run 1) and -4.05 ml·min⁻¹·kg⁻¹ (after run 2) across all athletes. Other studies show that wearables underestimate VO2max by around 4.36 ml/kg/min.
The smartwatch performed significantly worse for athletes classified as highly trained. The Intraclass Correlation Coefficients (ICC) for this group were low (0.34-0.41), and the error margins were higher, with a Mean Absolute Percentage Error (MAPE) of around 10%. For comparison, Firstbeat claimed in 2017 a MAPE of 5% for its running VO2max estimate.
For highly trained athletes, the Forerunner 245 systematically underestimated VO2max by 6.25-6.88 ml·min⁻¹·kg⁻¹.
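If the statistics are unfamiliar, here is a rough sketch of how a mean difference and a MAPE are calculated from paired lab and watch values. The five pairs below are invented for illustration and are not the study's data. I have also added Bland-Altman limits of agreement as a common companion measure; that is my addition, and ICC is left out because it needs a fuller model.

```python
from statistics import mean, stdev

# Invented paired values in ml/kg/min, for illustration only (not study data).
lab = [62.0, 58.0, 66.0, 55.0, 70.0]
watch = [57.0, 56.0, 59.0, 54.0, 62.0]


def mean_bias(estimates, references):
    """Mean of (estimate - reference); negative means underestimation."""
    return mean(e - r for e, r in zip(estimates, references))


def mape(estimates, references):
    """Mean absolute percentage error, in percent of the reference value."""
    return 100.0 * mean(abs(e - r) / r for e, r in zip(estimates, references))


def limits_of_agreement(estimates, references):
    """Bland-Altman 95% limits: bias +/- 1.96 * SD of the differences."""
    diffs = [e - r for e, r in zip(estimates, references)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias - spread, bias + spread


print(round(mean_bias(watch, lab), 2))   # -4.6
print(round(mape(watch, lab), 1))        # about 7.1
print(limits_of_agreement(watch, lab))
```

Note that the bias tells you the average direction of the error, whereas MAPE tells you its typical size relative to the true value. A watch can have a small bias and still be a long way off for any one athlete.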
Anecdotal evidence: when I was in the highly trained category, Garmin underestimated my VO2max by at least 5. The personal downside, as we shall see in the next section, is that its current estimate of my athletic demise is more likely to be correct 🙁
Why so inaccurate? I’ve already speculated on some reasons why the tech might incorrectly estimate pace, but surely that would also apply to the slower group?
Garmin: Why you can love your VO2max
For ‘moderately trained’ athletes, the error margins are much lower: MAPE is 4.1% (Run 1) and improves further to 2.8% (Run 2). The ICCs were also higher (0.63-0.66), indicating a moderate level of agreement.
Another point is that the study might show a positive learning effect in the Garmin algorithm, i.e. the MAPE was reduced with a second qualifying run. However, don’t interpret that to mean that adding 100 qualifying runs will make the VO2max accurate; it won’t. It might become more accurate, but Garmin has probably not fully trained its algorithms on every ability-based subgroup; that said, it likely has much more training data from moderately trained athletes. That is perhaps why its algorithm appears to be more accurate at that level.
Also of note: the well-known Molina-Garcia meta-analysis found that sports wearables, like Garmin’s, generally offer higher VO2max accuracy than calculations based on resting conditions. So, it could be worse!
Pervasive Caution
If you want an accurate VO2max, you can only rely on a lab test.
Proprietary algorithms are not publicly disclosed, making it hard for independent researchers to validate them. IMHO, it would be helpful if the likes of Garmin clearly stated the inputs (as I have done here); this might help researchers design better studies.
Individual variability is also a factor, making it hard to get accuracy, but there is also a degree of personal responsibility. It’s up to you to understand how to get the best accuracy from your data. You need to get accurate distance/pace, heart rate, power, HR Zone, HRmax inputs and use the correct kit (chest strap, running pods) with the proper settings (GNSS = dual frequency). Simply entering an incorrect HRmax or wrongly setting LTHR/AnT can open up a whole tin of worms on the accuracy stakes across all of Garmin’s advanced physiological metrics. I think most of you reading this know that.
Take Out
If this study is correct, moderately trained athletes might validly monitor their VO2max with a Garmin. However, highly trained athletes should not.
It might be enough for us to observe our personal trends rather than compare VO2max with friends.
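Here is a small sketch of why a trend can survive a biased number. It fits a least-squares slope to weekly readings and shows that a constant offset leaves the slope untouched. The weekly values are hypothetical, and the big assumption is that the watch's error stays constant, which the study does not establish.

```python
def slope_per_week(readings: list[float]) -> float:
    """Least-squares slope of readings taken one week apart."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    weeks = range(n)
    mean_x = sum(weeks) / n
    mean_y = sum(readings) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, readings))
    sxx = sum((x - mean_x) ** 2 for x in weeks)
    return sxy / sxx


# Hypothetical weekly values: a "true" series and the same series read 5 low.
true_vo2max = [60.0, 60.5, 61.0, 61.5, 62.0, 62.5]
watch_vo2max = [v - 5.0 for v in true_vo2max]

print(slope_per_week(true_vo2max))   # 0.5 per week
print(slope_per_week(watch_vo2max))  # 0.5 per week, despite the offset
```

If the error grows as fitness improves, as the gap between the moderately and highly trained groups hints it might, the watch's trend would flatten relative to reality.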
But countering that, Garmin’s VO2max is an input for all the following metrics, and you really do have to wonder how correct any of these are:
- Training Status
- Training Load
- Race Time Predictions
- Performance Condition
- Training Effect (Aerobic)
Maybe highly trained athletes should revert to basic sports watches?
Should Garmin and its competitors be more transparent about calculating some of their core metrics? Would that ruin Garmin’s competitive advantage or help us all to get basic metrics that we agree on and which can be independently validated?
Sources
Main source: https://pubmed.ncbi.nlm.nih.gov/40770433/
Other related studies, including those using smartwatches:
- Molina-Garcia P, Notbohm HL, Schumann M, Argent R, Hetherington-Rauth M, Stang J et al (2022). Validity of estimating the maximal oxygen consumption by consumer wearables: a systematic review with meta-analysis and expert statement of the INTERLIVE network. Sports Med 52(7):1577–1597. https://doi.org/10.1007/s40279-021-01639-y
- Apple Watch 7: Caserman P, Yum S, Göbel S, Reif A, Matura S (2024). Assessing the accuracy of smartwatch-based estimation of maximum oxygen uptake using the Apple Watch Series 7: validation study. Biomed Eng. https://doi.org/10.2196/59459
- Düking P, van Hooren B, Sperlich B (2022). Assessment of peak oxygen uptake with a smartwatch and its usefulness for training of runners. Intern J Sports Med 43(7):642–647. https://doi.org/10.1055/a-1686-9068
- https://doi.org/10.21315/eimj2018.10.3.8
- https://doi.org/10.1097/00005768-200001000-00012
- https://doi.org/10.1123/jmpb.2019-0066
- https://doi.org/10.3390/technologies11030071
- https://doi.org/10.1007/s00421-019-04142-5
- https://doi.org/10.1113/jphysiol.2007.147629
- https://doi.org/10.1152/japplphysiol.01063.2016
- https://doi.org/10.1055/a-1925-7468
- https://doi.org/10.1136/bjsports-2016-097295
Do you think that highly trained athletes (I see this as pro/semi-pro athletes) rely on watch metrics to see their improvements? Especially on things like VO2max, which does not seem really useful for their daily work?
Actually, I still think that all those bells and whistles are made for consumers, not elite/highly trained athletes. Am I totally off?
i agree with where you are coming from in general.
specific cases of tech usage will probably link to personal preferences with sponsorship playing a part.
Do I think Jakob Ingebrigtsen and Alex Yee rely on wearable tech?
A: No
I thought lactate threshold was determined by inferring the inflection point by correlating HR with pace, not using VO2 max as input…. Garmin’s own support page says:
“Your device measures your lactate threshold level using heart rate and pace. When exceeding the threshold, fatigue starts to increase at an accelerating rate.”
This implies Garmin looks for where effort (pace) starts pushing heart rate disproportionately higher—i.e. a non-linear “breakpoint.”
yes. good spot, thank you. i’ve clarified the sentence
I wish they would do a study that didn’t worry about the absolute number, and just told us how well whatever number it shows you tracks with improvement
indeed
that’s one good thing that is happening (or will happen) with all the AI stuff.
eg Whoop coach does what you say. it can even auto note an improvement and auto find what caused it. but you have to tag your life (assign KEYWORDS to days/events/sleep which are then treated as if numeric)
I use the Vo2 max and other metrics to compare with my own data. So long as it’s trending in the right direction I don’t pay much attention to the actual figure.
Garmin running VO2 max estimate is flawed also because – shoes. Their Cycling estimate must be more accurate because it really doesn’t rely on any speed affecting equipment – or indeed speed at all. Not even GPS and its errors. Thoughts?
oh i see what you mean with super shoes. yep must be a valid point
cycling: i guess there is a power meter error. 1% could be similar to the 1-2% gain from shoes maybe???
Regarding the other metrics such as Training Status: if the error stays constant (for example, always a 20% underestimate or a 10% overestimate), this type of algorithm is not really affected, because these are relative values rather than absolute ones.
As for race predictions, I find they are often quite optimistic, though of course it depends on the individual.
Thanks for a well reasoned article, way above the standard of so much social content.
“Running twice for 15 minutes at over 70% HRmax”: Okay, so the guy who ran a 30-minute light jog (75% HRmax) contributed to the study… How can we expect a plausible estimate of VO2max from two running sessions?
You have to let people run for a few weeks, accumulate runs, sessions at and above threshold, and add a 15-minute maximal test. In short, a real-world example…
only a small percentage of people will do that.
there has (needs?) to be a way to model ‘normal’ activity parameters to get reasonable approximations. Perhaps a better-tweaked model for true athletes.
Sidenote:
If power is needed for something, then power cannot improve that same thing.
See “This analyses heart rate data alongside pace (for running) or power (for cycling) during certain qualifying activities. For runners, this requires a relatively hard outdoor GPS run of over 10 minutes, whereas for cyclists, it requires a hard but steady ride of at least 20 minutes; the latter’s accuracy is improved with a power meter.”