The female VO2max problem: why your Garmin number may not reflect your fitness

You know how to run well. A week of difficult training and the Garmin VO2max number climbs. Then it drops in the cold, or after a hard block, or for no obvious reason at all. If a lab test produces an entirely different figure, the question is why.
VO2max estimates from wrist devices are based on algorithms built on exercise physiology research that, for many years, predominantly used male participants. That imbalance matters because the accuracy differences are real and can affect how female runners interpret their fitness data.
The algorithm’s blind spot
Wearables do not measure VO2max directly. Instead, the Firstbeat algorithm that powers Garmin’s estimate analyses the relationship between heart rate and running pace during qualifying sessions. It selects 20- to 30-second windows where heart rate is above approximately 70 per cent of maximum and movement is steady, then extrapolates from those submaximal data points to estimate oxygen uptake at full effort.
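As a rough illustration of that extrapolation idea, the Python sketch below fits a straight line through steady windows above 70 per cent of maximum heart rate and reads off the value at maximum heart rate. It is not Firstbeat’s implementation, which is proprietary. The function names, the sample windows, and the use of the ACSM running equation to convert speed into oxygen cost are all assumptions made for this example.

```python
def acsm_running_vo2(speed_m_per_min, grade=0.0):
    """ACSM running equation: approximate oxygen cost in ml/kg/min."""
    return 3.5 + 0.2 * speed_m_per_min + 0.9 * speed_m_per_min * grade


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("heart rates must vary between windows")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_vo2max(windows, hr_max, min_fraction=0.70):
    """Extrapolate submaximal windows to maximum heart rate.

    windows: list of (mean_hr_bpm, speed_m_per_min) for steady segments.
    Hypothetical simplification: one straight line, flat ground only.
    """
    usable = [(hr, speed) for hr, speed in windows
              if hr >= min_fraction * hr_max]
    if len(usable) < 2:
        raise ValueError("need at least two qualifying windows")
    hrs = [hr for hr, _ in usable]
    vo2s = [acsm_running_vo2(speed) for _, speed in usable]
    slope, intercept = fit_line(hrs, vo2s)
    return slope * hr_max + intercept


# Invented sample data: the 120 bpm window falls below 70% of 190 bpm
# and is ignored by the filter.
sample_windows = [(120, 150.0), (140, 180.0), (150, 200.0), (160, 220.0)]
print(round(estimate_vo2max(sample_windows, hr_max=190), 1))
```

Because the line is extended well beyond the measured points, small errors in the heart rate input or the maximum heart rate setting shift the final figure noticeably.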
The problem starts with the training data behind the algorithm. In a validation study by Wiecha et al. (2023) involving 5,260 endurance athletes, 84.76 per cent of participants were men, reflecting a well-documented historical pattern of underrepresentation of women in exercise physiology research.
Why female physiology can confuse the algorithm
Women tend to have lower average VO2max values than men due to differences in haemoglobin concentration, body composition, and cardiovascular physiology. Beyond that, factors such as stroke volume, heart rate response, and hormonal fluctuations across the menstrual cycle may affect the physiological signals the algorithm uses to estimate aerobic capacity. The relationship between pace, heart rate, and fitness may therefore be more variable than the prediction model assumes. However, direct evidence linking menstrual cycle phase to Garmin VO2max error is limited, so this remains a plausible contributor rather than an established cause.
The directional bias
Research has identified sex differences in the direction of VO2max miscalculation. A 2017 study using the Garmin Forerunner found that the predicted VO2max underestimated values for women and overestimated them for men. Polar’s V800 showed the opposite pattern: overestimation for women, underestimation for men. Different algorithms, samples, and testing conditions can all produce directional differences, and the magnitude and consistency of any sex-specific bias vary across devices and generations.
What does appear likely is that certain assumptions built into these algorithms do not hold equally across sexes: how heart rate relates to pace, how body composition affects aerobic capacity, and what a normal pace-to-HR curve looks like. This is a possible explanation rather than a confirmed mechanism.
The heart rate measurement problem
Behind the algorithm lies the optical heart rate sensor, and that is where physics comes in.
Optical sensors measure changes in blood flow under the skin using light. The algorithm assumes the heart rate input is accurate. It frequently is not, particularly when comparing wrist devices to a chest strap. An American College of Cardiology press release summarising research presented at ACC.17 reported wrist device error margins ranging from approximately ±15 to ±34 beats per minute for the specific devices and conditions tested. That is not a universal figure for all wrist sensors, but it illustrates how much input error the VO2max calculation may have to absorb. Firstbeat’s own validation reports approximately 5 per cent mean absolute error when paired with a chest strap; with wrist-only heart rate, the reported error rises to 5-8 ml·kg⁻¹·min⁻¹, which is a larger error in percentage terms for any realistic VO2max.
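The two error figures above use different units, so a quick conversion helps when comparing them. The short Python sketch below does that arithmetic; the helper names and the example VO2max of 45 ml·kg⁻¹·min⁻¹ are hypothetical and chosen only for illustration.

```python
def percent_to_absolute(vo2max, percent_error):
    """Convert a percentage error into ml/kg/min for a given VO2max."""
    return vo2max * percent_error / 100.0


def absolute_to_percent(vo2max, absolute_error):
    """Convert an error in ml/kg/min into a percentage of VO2max."""
    return 100.0 * absolute_error / vo2max


example_vo2max = 45.0  # hypothetical runner, ml/kg/min

# Chest-strap figure: about 5 per cent expressed in ml/kg/min.
print(percent_to_absolute(example_vo2max, 5.0))

# Wrist-only figure: 5 to 8 ml/kg/min expressed as a percentage.
print(round(absolute_to_percent(example_vo2max, 5.0), 1))
print(round(absolute_to_percent(example_vo2max, 8.0), 1))
```

For this example runner, a 5 per cent error is a little over 2 ml·kg⁻¹·min⁻¹, while an error of 5-8 ml·kg⁻¹·min⁻¹ works out to roughly 11-18 per cent.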
Skin tone is a further variable. Some studies have found higher margins of error in optical heart rate readings for individuals with darker skin tones. However, findings are mixed, and newer multi-wavelength sensors have reduced but not eliminated the gap. See the heart rate guide for a full breakdown of research on optical sensor accuracy.
What the validation studies actually show
When Garmin’s VO2max algorithm is tested against laboratory spirometry, the picture is mixed. In a study of the Garmin Forerunner 245 by Engel et al. (2025), the smartwatch underestimated VO2max on average across athletes, with mean differences of -4.73 ml·kg⁻¹·min⁻¹ and -4.05 ml·kg⁻¹·min⁻¹ for the first and second qualifying runs. Subgroup analysis revealed substantially better accuracy in moderately trained athletes (mean absolute percentage error between 2.8 and 4.1 per cent) than in highly trained athletes, where the watch underestimated by 6.3 ml·kg⁻¹·min⁻¹ with a mean absolute percentage error between 9.4 and 10.4 per cent.
The error size varies with athletes’ fitness levels, device models, and generations. The Forerunner 245 figures cannot be applied as a universal correction to other devices.
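Validation studies of this kind typically report two statistics: the mean difference (bias) between watch and lab, and the mean absolute percentage error. The Python sketch below shows how both are calculated from paired readings. The data are invented for illustration and are not taken from Engel et al. or any other study.

```python
def mean_difference(watch, lab):
    """Average of (watch - lab); negative means the watch underestimates."""
    return sum(w - l for w, l in zip(watch, lab)) / len(lab)


def mean_absolute_percentage_error(watch, lab):
    """Average of |watch - lab| / lab, expressed as a percentage."""
    return 100.0 * sum(abs(w - l) / l for w, l in zip(watch, lab)) / len(lab)


# Invented paired readings in ml/kg/min, one pair per athlete.
lab_values = [50.0, 60.0, 40.0]
watch_values = [47.0, 54.0, 41.0]

print(round(mean_difference(watch_values, lab_values), 2))
print(round(mean_absolute_percentage_error(watch_values, lab_values), 2))
```

The two numbers answer different questions. A bias near zero can hide large individual errors that cancel out, which is why the percentage error is usually reported alongside it.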
For female runners tracking their VO2max trend to monitor endurance progress, the practical implication is to weigh training pace, race performances, and perceived effort alongside the watch figure rather than relying on the absolute number alone.
Practical guidance: how to use your number
- Treat it as a trend, not a fact. The absolute value on any given day carries limited information. Consistent increases over weeks and months indicate aerobic improvement. A sustained drop signals something has changed: training, recovery, health, or the quality of heart rate data the algorithm is receiving.
- Use a chest strap for important qualifying sessions. Pairing a chest strap removes the largest single source of input error. Firstbeat’s validation places accuracy within approximately 5 per cent of laboratory spirometry under controlled conditions when using chest-strap heart rate. Wrist-only data raises that error substantially.
- Set your actual maximum heart rate. Garmin defaults to the 220 minus age formula, which is typically wrong by 10-15 beats for trained athletes. An inaccurate maximum heart rate setting affects calibration: Firstbeat’s own data show that a 15 bpm error in maximum heart rate produces a 7-9 per cent error in the VO2max output. Getting it right improves the estimate materially. Use a ramp test, a verified hard track effort with progressive pace increases, or a lab measurement. Enter the result manually in Garmin Connect under User Settings.
- Understand device-specific patterns. If your device consistently underestimates your race performance or perceived fitness, note the gap. It is more useful to know your device’s tendency than to take each reading at face value.
- Get lab tested for consequential decisions. If VO2max will inform a significant training or racing decision, whether that is selecting marathon pace, evaluating a training block, or understanding a plateau, a laboratory test with respiratory gas analysis is the only reliable figure. The sports science section covers VO2max testing methodology and what the number actually predicts for race performance.
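To see why the maximum heart rate setting matters, the Python sketch below compares an estimate made with the 220 minus age default against one made with a measured maximum. It assumes a simple straight-line relationship between heart rate and oxygen uptake, which is a simplification and not Firstbeat’s model. The slope, intercept, age, and measured maximum are hypothetical values chosen for the example.

```python
def default_hr_max(age):
    """The 220 minus age rule of thumb."""
    return 220 - age


def extrapolated_vo2max(slope, intercept, hr_max):
    """Read a straight HR-to-VO2 line at maximum heart rate."""
    return slope * hr_max + intercept


def percent_shift(slope, intercept, true_hr_max, assumed_hr_max):
    """Percentage change in the estimate caused by a wrong HR max."""
    true_est = extrapolated_vo2max(slope, intercept, true_hr_max)
    assumed_est = extrapolated_vo2max(slope, intercept, assumed_hr_max)
    return 100.0 * (assumed_est - true_est) / true_est


# Hypothetical runner: aged 40, measured maximum of 190 bpm.
age = 40
measured_hr_max = 190

# Hypothetical line: VO2 (ml/kg/min) = 0.3 * HR - 5.
slope, intercept = 0.3, -5.0

shift = percent_shift(slope, intercept, measured_hr_max, default_hr_max(age))
print(round(shift, 1))
```

With these made-up numbers, a default that sits 10 bpm below the measured maximum lowers the estimate by several per cent. The exact size depends on the assumed line, so treat the output as an illustration of the direction of the effect rather than a correction factor.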
The bigger picture
Women’s participation in endurance sport has grown substantially over recent decades, increasing the need for reliable fitness measurement. Wearable VO2max algorithms were developed in an era when women were underrepresented in exercise physiology research. More recent validation studies suggest the sex discrepancy in algorithm accuracy may be narrowing, though independent evidence across current device generations remains thin.
Until that evidence base matures, VO2max estimates are best treated as one measure among several, alongside race performances, training pace at given heart rates, and perceived effort. They are estimates derived from a model, not direct measurements of aerobic capacity. This article is part of the site’s female athlete tech coverage, which addresses wearables, physiology, and performance for female endurance athletes.
FAQ
Why does my Garmin VO2max go up and down week to week?
The Firstbeat algorithm reads the relationship between heart rate and running pace during qualifying sessions. Heat, fatigue, and poor sleep all elevate heart rate relative to pace, while a loose strap or wrist sensor dropout can distort the recorded heart rate. The algorithm can read either as reduced fitness. The number reflects input data quality as much as actual fitness change. A single qualifying run with a chest strap in consistent conditions is more informative than several wrist-only readings taken across varying weather.
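One simple way to look past that noise is to smooth the readings before judging the trend. The Python sketch below applies a rolling median to a short series of watch values. The function name, the window length of three, and the readings themselves are assumptions made for the example, not a Garmin feature.

```python
from statistics import median


def rolling_median(readings, window=3):
    """Median of each consecutive group of `window` readings."""
    if window < 1 or window > len(readings):
        raise ValueError("window must be between 1 and len(readings)")
    return [median(readings[i:i + window])
            for i in range(len(readings) - window + 1)]


# Invented weekly readings; the dip to 45 might be a hot run
# or a poor wrist signal rather than lost fitness.
weekly_readings = [48, 49, 45, 49, 50, 50, 51]
print(rolling_median(weekly_readings))
```

A median is used here rather than a mean because a single outlying reading barely moves it, so the smoothed series shows the gradual rise without the one-off dip.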
Is Garmin VO2max less accurate for women?
Validation studies show directional biases that differ between device brands. Some studies find Garmin underestimates VO2max in women while overestimating in men; Polar has shown the opposite pattern on certain devices. The algorithms were developed on datasets that were heavily male-skewed, which may contribute to sex-specific error patterns. However, the size and consistency of the effect vary by device generation and athlete fitness level.
Does using a chest strap improve Garmin VO2max accuracy?
Yes, meaningfully. Firstbeat’s own validation reports a mean absolute error of approximately 5 per cent when paired with a chest strap. With wrist-only heart rate, the reported error rises to 5-8 ml·kg⁻¹·min⁻¹, which is a larger error in percentage terms for any realistic VO2max. For any session intended to update your VO2max estimate, a chest strap such as the Garmin HRM 600 removes the largest single source of input error.
How important is setting the correct maximum heart rate in Garmin?
Very. Firstbeat’s data show that a 15 bpm error in maximum heart rate results in a 7-9 per cent error in VO2max. The default 220 minus age formula is typically off by 10-15 beats per minute for trained athletes. Setting a measured maximum heart rate from a ramp test or a verified hard effort materially improves the estimate. Enter it manually in Garmin Connect under User Settings rather than relying on the default.
What is the most accurate way to find my true maximum heart rate?
A progressive ramp test works reliably: run 400m repeats, increasing pace each rep until you can no longer maintain form, or perform a series of short hill sprints with full recovery between efforts. A 5K race effort rarely produces true maximum heart rate because pacing strategy limits the final push. A laboratory cardiopulmonary exercise test is the most accurate method and also produces lactate threshold data useful for zone setting.
When should a female runner get a lab VO2max test?
When the number will drive a significant decision: selecting marathon pace, assessing whether a training block has produced real aerobic gains, or understanding a performance plateau that wearable data cannot explain. The watch estimate is adequate for week-to-week trend tracking. For decisions where the margin matters, laboratory respiratory gas analysis is the only reliable figure.
Why is Garmin VO2max less accurate for highly trained athletes?
Engel et al. (2025) found a mean underestimation of 6.3 ml·kg⁻¹·min⁻¹ in highly trained athletes on the Forerunner 245, with a mean absolute percentage error of roughly 9.4-10.4 per cent, compared with 2.8-4.1 per cent in moderately trained athletes. The likely reason is that the algorithm’s submaximal pace-to-HR model was validated predominantly on recreational and moderately trained populations. Highly trained athletes have unusually low heart rates at given paces relative to the model’s expectations, and the extrapolation appears not to credit that fully, so the estimate comes out below the measured value.
Does the menstrual cycle affect Garmin VO2max readings?
Plausibly, though direct evidence is limited. Progesterone in the luteal phase can elevate heart rate at a given pace, which the algorithm could interpret as reduced fitness. HRV changes and elevated resting heart rate during the luteal phase may also affect recovery scores that feed into the broader training load picture. Tracking VO2max alongside cycle phase over several months is the best way to identify whether a personal pattern exists. See the sports science section for more on hormonal effects on wearable data.
Last Updated on 19 June 2026 by the5krunner

Shradha Puri is a tech writer covering fem tech, wearables, consumer technology and AI-powered gadgets. With a background in marketing and editorial strategy, her work focuses on how emerging technology is influencing health, fitness and everyday consumer experiences. She closely follows the tech space, with a particular interest in sleep, recovery and health tracking wearables.
