Fitbit AIR vs watchOS 27: HR and HRV Accuracy Tested

After the complete lack of excitement in yesterday’s watchOS 27 announcement, I decided to test it to see if anything had changed regarding the HR recording frequency or accuracy. It’s possible that some new methods or algorithms were sneaked in to be tested by an unsuspecting public ahead of a possible new optical sensor in September on the new Watches. I am that unsuspecting member of the public. Let’s go.

The Tests

Firstly, for the last 5 or so years, whenever I’ve got a new Apple Watch or a new version of watchOS, I’ve checked whether the HRV recording frequency has changed. So that was one test.

Secondly, I did quite a hard yoga and legs session at the gym last night, so obviously, when you are quite tired, you do a few running intervals the next day. 5x 3 minutes seemed enough.

The Gear

If you’ve read this blog from the early days, you’ll know that I’ve always had a slightly unhealthy interest in HR and HRV. That’s still true. Recently, sports gear has made things more exciting for me because it means there are more reasonably reliable HR straps I can use at the same time. So, even though I know I look like a complete “$%”£$%£$%, I went out festooned with Amazfit Helio Strap 1, Fitbit AIR, Whoop MG (biceps), Apple Watch Ultra 3, Polar SENSE (biceps), Fourth Frontier ZONE (linked to FR970). I know, greedy, right?

Heads up: you can’t take these wrist results as the gospel truth, since the devices were worn quite close together. The results for them aren’t going in any review, so I deem that to be OK.

Results overview

The easy one is the notes on HRV. I upgraded AWU3 to watchOS 27 (full explainer: just the facts) and looked at the recorded HRV values in Health. Each sample lasts exactly 1 minute, and one is taken every 15 minutes. So that’s 14 minutes between samples, during which nothing is recorded. Looked at another way, that is 4 minutes of recording per hour. And that’s the maximum. If Apple Watch detects that lots of movement has caused problems with the HRV reading, it will skip it and try again later. I’m not sure of the precise recovery mechanism, but it’s something along those lines. FYI, Apple uses the SDNN formula, not the RMSSD that everyone else uses. This HRV explainer tells you why.

watchOS 27 HRV sampling: Apple Watch records HRV for exactly 1 minute, once every 15 minutes overnight. That is 4 minutes of data per hour at best. If movement corrupts a reading, the watch skips and retries later. Apple uses SDNN, not RMSSD. Every other major recovery tracker uses RMSSD.
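To make the SDNN versus RMSSD distinction a bit more concrete, here is a minimal Python sketch. The RR intervals are made-up illustration values, not data from this test, and the function names are my own. I am assuming SDNN is taken as the sample standard deviation of the RR intervals and RMSSD as the root mean square of successive differences; I don't know the exact details of Apple's implementation.

```python
import math
import statistics

def sdnn(rr_ms):
    """SDNN: standard deviation of the RR intervals, in ms."""
    return statistics.stdev(rr_ms)

def rmssd(rr_ms):
    """RMSSD: root mean square of successive RR differences, in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical RR intervals in milliseconds (illustration only, not test data).
rr = [800, 810, 790, 805, 795]
print(f"SDNN  = {sdnn(rr):.1f} ms")
print(f"RMSSD = {rmssd(rr):.1f} ms")

# Coverage implied by a 1-minute sample taken every 15 minutes.
samples_per_hour = 60 // 15
minutes_recorded_per_hour = samples_per_hour * 1
print(f"Minutes recorded per hour: {minutes_recorded_per_hour}")
```

SDNN reflects the overall spread of beats within the window, while RMSSD weights beat-to-beat changes, so the two numbers are not interchangeable even when they come from the same minute of data.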

Contrast that with several other devices that take constant readings. Or with Whoop, which selectively samples certain periods it deems appropriate during the night. Whatever the merits of these alternate methods, it’s abundantly clear that the non-Apple brands record more data. Put another way, there is a decent chance that the infrequent polling by Apple will miss things. I know the guys at Bevel would love Apple to provide more frequent data to put their source on par with what Whoop owners enjoy every night, but there is little Bevel can do about it.

Note: There is chatter, along with supposed leaks, suggesting that Apple will launch a new optical HR sensor in September. Based on the past cadence of its OHRMs, a new one was due last year, so there is a good chance one will come this year. Even better, there is a chance that it will be lower-powered and better able to read HR and HRV continuously.

I also discovered what had been happening with Fourth Frontier ZONE. Even though it’s an ECG, saving multiple readings per second, it only saves one reading per 20 seconds to a FIT/TCX when data is manually exported from its ecosystem. That explains why I was only seeing the occasional matched pair of data points with Fitbit AIR, which saves once every two seconds (2×20 = 40). Further, it doesn’t always save HR data to Strava unless there is a GPS point to send as well (this is now a lapsed legacy restriction, IIRC). So if I want per-second HR data from Fourth Frontier ZONE, I have to record it on a watch. Luckily, I have a few of those.

Full test overview

As you can see, this is a bit of a mess. Fitbit AIR took about 10 minutes to work out what data was my heart rate. From other tests, it seems to detect multiple frequency (cadence) streams and then latch onto the one it thinks is correct. Sometimes it gets confused and switches between the two. Here, it was just confused until about 19:00, after which it performed very well. For me, on my 1-hour run, you can clearly see that the net effect on Fitbit AIR is that I performed the equivalent of about 7 intervals, not the 5 I actually performed. This will make a material difference to anything done with this data later. Put simply, it’s wrong. I was doing intervals at the start. Maybe it just takes a while to warm up in some circumstances. Let’s say it’s you doing your 30-minute run; one-third of it could be recorded as a Zone 4 or Zone 5 effort. That isn’t wrong, it’s very wrong. $$$$ Buy it if you want to. Just remember that most of the time you will only wear one band, unlike me, so you will probably have no idea if it is right or not. Most of the time, when you look at it, it will be right (see below). But…I hope I’ve made my point.
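Here is a rough Python sketch of why that startup error matters for zone data. Everything in it is hypothetical: the Zone 4 floor of 160 bpm, the flat 140 bpm easy run and the 170 bpm misread are illustration values I have picked, not numbers from this session.

```python
def fraction_at_or_above(hr_samples, threshold_bpm):
    """Share of samples at or above a heart-rate threshold."""
    if not hr_samples:
        return 0.0
    hits = sum(1 for hr in hr_samples if hr >= threshold_bpm)
    return hits / len(hr_samples)

ZONE4_FLOOR_BPM = 160  # hypothetical zone boundary, not from the article

# A 30-minute easy run, one sample per minute to keep the example short.
true_hr = [140] * 30

# The same run where the sensor misreads the first 10 minutes.
misread_hr = [170] * 10 + [140] * 20

print(fraction_at_or_above(true_hr, ZONE4_FLOOR_BPM))
print(fraction_at_or_above(misread_hr, ZONE4_FLOOR_BPM))
```

In this toy case the true run has no time at or above the Zone 4 floor, while the misread version puts a third of it there, which is the kind of error that flows straight into any load calculation built on the data.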

Then Whoop has a slight wobble at the start, doing something similar to AIR, but the discrepancy with reality is relatively trivial. It will have virtually no impact on the load or impact calculation for its version of my session.

Then we can see the Helio Strap (latest firmware) has quite a few dropouts, especially at the start. This looks quite bad, but I guess that these will have little impact on Amazfit’s calculations of the workout data.

Finally, Apple with watchOS 27. The dropouts are relatively normal. Often, reviewers will apply smoothing to the data, so short, infrequent dropouts don’t show, and there’s only a very slight blip in the moving average. Like the Helio Strap, a few short dropouts are no massive issue. However, the underreporting at 32:00 is relatively unusual. This might be a new behaviour or a new algorithm, and either way, it’s something worth keeping an eye on.
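The smoothing point can be sketched in a few lines of Python. This is a simple trailing moving average of my own choosing, not the method any particular reviewer or tool uses, and the trace is invented: a steady 150 bpm with a single one-sample dropout to 100 bpm.

```python
def moving_average(values, window):
    """Trailing moving average; early samples use a shorter window."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical trace: steady 150 bpm with one short dropout.
trace = [150] * 20
trace[10] = 100

smoothed = moving_average(trace, window=10)
print(min(trace))     # lowest raw value
print(min(smoothed))  # lowest value after smoothing
```

The raw trace dips by 50 bpm, but the smoothed one only dips by 5 bpm, which is why a short dropout can all but vanish from a smoothed chart.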

On to the detailed bit after the chart.

HR trace comparison chart: Fitbit AIR, Whoop MG, Amazfit Helio Strap, Apple Watch Ultra 3 and Polar SENSE during 5x3-minute running intervals

Statistical analysis

I wanted to give the devices a fair shot, so I used DC Rainmaker’s Analyzer tool (beta), which handily lets me select a specific time period within the overall exercise and show the heart rate stats for that period. So I chose the five 3-minute intervals as shown in the next chart.
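The idea of restricting the stats to chosen time periods can be sketched like this. It is only an illustration of the concept, not how the Analyzer tool works internally; the function name, the sample data and the window times are all hypothetical.

```python
import statistics

def select_windows(samples, windows):
    """Keep (time_s, hr) samples that fall inside any [start, end) window."""
    return [
        (t, hr)
        for t, hr in samples
        if any(start <= t < end for start, end in windows)
    ]

# Hypothetical data: one sample per minute for 10 minutes.
samples = [(t, 120 + t // 60) for t in range(0, 600, 60)]

# Hypothetical interval windows in seconds from the start of the workout.
windows = [(120, 300), (420, 600)]

selected = select_windows(samples, windows)
mean_hr = statistics.mean(hr for _, hr in selected)
print(len(selected), mean_hr)
```

Only the samples inside the chosen windows feed the statistics, so warm-up, recoveries and cool-down don't dilute the comparison.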

A quick look at the greater detail now visible suggests the stats should only show issues with Helio’s dropouts, Apple’s longer dropouts and underreporting of the first interval, and Fitbit AIR’s mess-up for the first 30 seconds. You would hope for lots of nice general statistical agreement here, with Helio and AWU3 lagging a bit behind.

DCR Analyzer detailed HR comparison for 5x3-minute intervals: Fitbit AIR, Whoop MG, Amazfit Helio Strap, Apple Watch Ultra 3, Polar SENSE and Fourth Frontier ZONE

But that’s why we have stats. To quantify what we might see. Or not see.

Is Fourth Frontier ZONE the correct reference?

For this interval-running test, the Fourth Frontier ZONE chest strap should probably be treated as the primary reference device, because chest-strap ECG measurements remain the gold standard for exercise heart-rate monitoring. However, the Polar Verity Sense on the biceps is also generally regarded as a highly accurate reference-grade optical sensor, and in this dataset, the two agree remarkably closely.

The comparison between Polar Sense and Fourth Frontier ZONE is particularly informative:

  • Bias: +1.4 bpm
  • Limits of Agreement (LoA): -6.8 to +9.7 bpm
  • n = 1,546

These are very tight limits for a session with running intervals, suggesting that both devices tracked HR changes very similarly. Since the Polar is worn on the biceps and uses optical sensing, while the ZONE uses chest-based ECG, I would still use the ZONE as the formal reference. Still, the data indicate that either device could credibly serve as a benchmark in this specific test.
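For anyone who wants to see where bias and LoA numbers come from, here is a minimal Bland-Altman sketch in Python. The heart-rate values are invented, the function name is mine, and I am assuming the usual convention of device minus reference with limits at the bias plus or minus 1.96 standard deviations of the differences. The analysis tool may differ in the details.

```python
import statistics

def bland_altman(device, reference):
    """Return (bias, lower LoA, upper LoA) for paired readings in bpm."""
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need at least two matched pairs of readings")
    diffs = [d - r for d, r in zip(device, reference)]
    bias = statistics.mean(diffs)   # systematic offset
    sd = statistics.stdev(diffs)    # spread of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical matched samples (illustration only, not test data).
device = [151, 149, 153, 150, 152]
reference = [150, 150, 150, 150, 150]

bias, lower, upper = bland_altman(device, reference)
print(f"Bias {bias:+.1f} bpm, LoA {lower:+.1f} to {upper:+.1f} bpm")
```

A small bias with wide limits means the average is right but individual readings wander, which is the pattern to look for in the device-by-device numbers.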

Device-by-device: versus Fourth Frontier ZONE

Polar Sense (biceps optical)

  • Bias vs ZONE: +1.4 bpm
  • LoA: -6.8 to +9.7 bpm
  • n = 1,546

This was the closest match to the chest strap. The low bias and narrow LoA indicate excellent tracking during interval changes. It effectively validates the quality of the reference dataset.

WHOOP MG (biceps optical)

  • Bias vs ZONE: +1.1 bpm
  • LoA: -6.2 to +8.3 bpm
  • n = 1,546

WHOOP was surprisingly strong in this test, showing almost identical agreement to the Polar Sense. The very narrow LoA suggest it tracked rapid HR changes unusually well for a wrist/arm optical device. Based on this session alone, WHOOP delivered near-reference performance.

Fitbit AIR (wrist optical)

  • Bias vs ZONE: +0.5 bpm
  • LoA: -6.5 to +7.5 bpm
  • n = 689

Fitbit AIR produced the smallest bias against ZONE and similarly tight LoA. However, the sample size was much smaller (689 samples versus 1,546 for most other comparisons), so confidence is slightly lower. Within the available data, performance was excellent. For more on why the sample count is lower, see Fitbit AIR’s 2-second sampling rate explained.

Apple Watch Ultra 3 (wrist optical)

  • Bias vs ZONE: -1.0 bpm
  • LoA: -29.3 to +27.2 bpm
  • n = 1,546

Average HR was very close to the reference, but the LoA are much wider than those of Polar, WHOOP or Fitbit. This suggests occasional larger deviations during interval transitions despite a low overall bias. The AWU3 appears accurate on average but is less consistent during rapid changes in intensity. For a full picture of the Apple Watch Ultra 3 for endurance sport, see the guide.

Amazfit Helio Strap (wrist optical)

  • Bias vs ZONE: -0.3 bpm
  • LoA: -28.5 to +28.0 bpm
  • n = 1,208

The Helio Strap’s average HR was almost identical to the chest strap, but the wide LoA indicate substantially greater variability than Polar, WHOOP or Fitbit. Like the Apple Watch, it appears capable of producing accurate average HR values while occasionally missing or lagging interval-related HR fluctuations.

Overall ranking — Fourth Frontier ZONE as reference

Based primarily on agreement with the Fourth Frontier ZONE chest strap:

  1. Polar Sense: closest established reference-grade match (LoA -6.8 to +9.7 bpm, n=1,546)
  2. WHOOP MG: virtually identical performance to Polar (LoA -6.2 to +8.3 bpm, n=1,546)
  3. Fitbit AIR: excellent agreement, though based on fewer samples (LoA -6.5 to +7.5 bpm, n=689)
  4. Apple Watch Ultra 3: low bias but wider variability (LoA -29.3 to +27.2 bpm, n=1,546)
  5. Amazfit Helio Strap: similarly low bias but wide variability (LoA -28.5 to +28.0 bpm, n=1,208)

Five-device summary vs Fourth Frontier ZONE (ECG reference):

Device | Wear position | Bias | LoA | n
Polar SENSE | Biceps optical | +1.4 bpm | -6.8 to +9.7 bpm | 1,546
WHOOP MG | Biceps optical | +1.1 bpm | -6.2 to +8.3 bpm | 1,546
Fitbit AIR | Wrist optical | +0.5 bpm | -6.5 to +7.5 bpm | 689
Apple Watch Ultra 3 | Wrist optical | -1.0 bpm | -29.3 to +27.2 bpm | 1,546
Amazfit Helio Strap | Wrist optical | -0.3 bpm | -28.5 to +28.0 bpm | 1,208

Key takeaway

If the chest-strap ZONE is used as the reference, Polar Sense, WHOOP, and Fitbit AIR all stayed within roughly ±7–10 bpm in 95% of observations, which is excellent for interval running. The Apple Watch Ultra 3 and Amazfit Helio Strap achieved similarly low average errors. Still, they showed much wider limits of agreement (around ±28–29 bpm), indicating less reliable tracking of rapid HR changes despite accurate overall averages.

What if Polar Sense is the reference?

I’ve still not finished. What if we were to say that Fourth Frontier ZONE isn’t a recognised gold standard? Even though it’s an ECG, it hasn’t been as widely used as the Polar H10. From my experience, Polar SENSE feels reliable. So let’s treat that as the reference instead. The overall story remains almost identical, with some twists and turns.

Polar Sense and Fourth Frontier ZONE are already very close to each other, so not much changes.

  • Bias: +1.4 bpm
  • LoA: -6.8 to +9.7 bpm
  • n = 1,546

That’s a small enough difference that switching reference devices doesn’t materially alter the conclusions.

Device-by-device: versus Polar Sense

Device | Bias vs Sense | LoA | n
WHOOP MG | +0.3 bpm | -2.2 to +2.8 bpm | 1,546
Fourth Frontier ZONE | +1.4 bpm | -6.8 to +9.7 bpm | 1,546
Fitbit AIR | +0.9 bpm | -8.2 to +10.1 bpm | 689
Apple Watch Ultra 3 | +2.5 bpm | -24.5 to +29.4 bpm | 1,546
Amazfit Helio Strap | -1.7 bpm | -28.3 to +25.0 bpm | 1,208

What changes?

WHOOP looks even better

The biggest change is WHOOP.

Against Polar Sense, it has:

  • Bias: +0.3 bpm
  • LoA: -2.2 to +2.8 bpm
  • n = 1,546

Those are extraordinarily tight limits for an interval session. They imply that WHOOP and Polar Sense were effectively producing the same HR trace most of the time.

You keep reading comments from everyone that Whoop isn’t accurate. Sorry folks. As I’ve been saying since the day Whoop 5/MG was launched, it’s really accurate for the biceps. And as anyone reading this already knows, wrist-based HR is not reliably accurate. For a full comparison of recovery trackers including WHOOP, Fitbit AIR and Polar, see the guide.

WHOOP MG vs Polar SENSE (biceps to biceps): Bias 0.3 bpm. LoA -2.2 to +2.8 bpm across 1,546 samples. That is a tighter agreement than most chest-strap-to-chest-strap comparisons in the published literature. The WHOOP MG is worn on the biceps, not the wrist.

Fourth Frontier ZONE and Fitbit AIR become almost interchangeable

ZONE and Fitbit AIR both sit in the next tier:

  • ZONE: ±8 bpm-ish agreement
  • Fitbit AIR: ±9 bpm-ish agreement

The difference between them is small enough that I’d hesitate to claim that one clearly outperformed the other from this dataset alone, especially given Fitbit’s smaller sample size. This is especially interesting, as AIR is performing as well as an ECG strap. Well, at least it is when it bothers to record data!

Apple Watch Ultra 3 and Amazfit Helio Strap

Apple and Helio remain clearly behind, as expected from the visual inspection. Neither benefits materially from changing the reference:

  • Apple: roughly ±27 bpm LoA
  • Helio: roughly ±26–28 bpm LoA

Both still show much greater variability than the leading group.

Revised ranking — Polar Sense as reference

  1. WHOOP MG (by a noticeable margin)
  2. Fourth Frontier ZONE
  3. Fitbit AIR (very close to ZONE, but lower n)
  4. Apple Watch Ultra 3
  5. Amazfit Helio Strap

The interesting statistical point

The fact that WHOOP is closer to Polar Sense (LoA ≈ ±2.5 bpm) than Polar Sense is to ZONE (LoA ≈ ±8 bpm) suggests one of two things:

  1. WHOOP genuinely tracked the Polar Sense exceptionally well throughout the interval session
  2. WHOOP and Polar Sense may have shared similar smoothing or response characteristics, causing them to agree slightly better with each other than either does with the chest strap during rapid HR transitions.

Without seeing the time-series traces, I’d be cautious about declaring WHOOP “more accurate than the chest strap.” The safer conclusion is that WHOOP’s output was almost indistinguishable from Polar Sense’s output in this test, while both remained very close to the chest-strap reference.

Composite benchmark (Polar Sense + Fourth Frontier ZONE averaged)

Here’s another thought. Because Sense and ZONE already agree closely, averaging them to create a new reference should reduce random disagreement between them and yield a more stable benchmark. Let’s see what happens.

Justification

Using an average of the two best-performing reference devices would not materially change the conclusions from this interval test.

The two reference devices already agree closely:

  • Polar Sense vs Fourth Frontier ZONE
    • Bias: +1.4 bpm
    • LoA: -6.8 to +9.7 bpm
    • n = 1,546

This level of agreement suggests either could serve as a benchmark. Averaging them reduces the impact of any small errors from either device and creates a more robust reference.
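A composite reference of this kind is just a sample-by-sample average of the two devices. Here is a minimal sketch, assuming the two series are already matched by timestamp; the function name and the heart-rate values are hypothetical and not taken from this session.

```python
import statistics

def composite_reference(ref_a, ref_b):
    """Average two time-aligned reference series sample by sample."""
    if len(ref_a) != len(ref_b):
        raise ValueError("reference series must be time-aligned and equal length")
    return [(a + b) / 2 for a, b in zip(ref_a, ref_b)]

# Hypothetical matched readings in bpm (illustration only).
sense = [151, 160, 172]
zone = [149, 158, 171]
device = [150, 161, 171]

composite = composite_reference(sense, zone)
bias_vs_composite = statistics.mean(d - c for d, c in zip(device, composite))
print(composite, bias_vs_composite)
```

One thing to keep in mind: with a 1.4 bpm bias between Sense and ZONE, a composite like this sits about 0.7 bpm away from each device on average, so every other device's bias shifts by roughly that amount depending on which reference you compare against.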

Rankings

Rank | Device | Assessment
1 | WHOOP MG | Closest overall to the combined reference.
2 | Fitbit AIR | Very close to the combined reference, although based on a smaller sample (n=689).
3 | Apple Watch Ultra 3 | Good average accuracy, but larger errors during interval transitions.
4 | Amazfit Helio Strap | Similar average accuracy to Apple Watch, but with substantial variability during intensity changes.

Conclusion

Whether the benchmark is:

  1. Fourth Frontier ZONE alone,
  2. Polar Sense alone, or
  3. An average of Polar Sense and Fourth Frontier ZONE,

the practical outcome is the same: WHOOP MG (biceps) and Fitbit AIR (wrist) form the leading group for interval heart-rate tracking. In contrast, Apple Watch Ultra 3 (wrist) and Amazfit Helio Strap (wrist) show substantially greater variability despite having low average error.

Aggregate conclusion by wear position

I looked at some more stats, but you’ve seen enough already. So let’s cut straight to the chase and aggregate the results into BICEPS vs CHEST vs WRIST. Based on this single interval-running test:

  1. The chest and biceps positions form the top tier, producing very similar heart rate data.
  2. Biceps optical sensing appears much closer to chest-strap performance than wrist optical sensing.
  3. Wrist-mounted sensors can accurately measure average heart rate but are more prone to transient errors during rapid changes in intensity.
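  4. The biggest determinant of accuracy in this dataset appears to be sensor location, not sensor technology. The two biceps optical devices were substantially closer to the chest strap than the Apple Watch Ultra 3 or the Amazfit Helio Strap on the wrist. Fitbit AIR, also wrist-worn, was the exception, with LoA of -6.5 to +7.5 bpm on its smaller sample.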

Now who would have thought that 😉

A concise summary would be:

During interval running, upper-arm/biceps optical sensors produced heart-rate measurements that were broadly comparable to a chest strap, while wrist-based optical sensors showed approximately three to four times wider limits of agreement, indicating less reliable tracking of rapid heart-rate changes.

Biceps vs wrist during intervals: Biceps optical devices (WHOOP MG, Polar SENSE) produced LoA of roughly ±7–10 bpm versus the ECG reference. Wrist optical devices (Apple Watch Ultra 3, Amazfit Helio Strap) produced LoA of roughly ±27–29 bpm. The wrist LoA are three to four times wider, which is not a marginal gap.
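The "three to four times" figure can be checked directly from the LoA reported against the ZONE reference. This short Python sketch uses only those published numbers; treating half the LoA span as the comparison measure is my own choice.

```python
# LoA versus Fourth Frontier ZONE, as reported in the summary table (bpm).
loa_biceps = {
    "Polar SENSE": (-6.8, 9.7),
    "WHOOP MG": (-6.2, 8.3),
}
loa_wrist = {
    "Apple Watch Ultra 3": (-29.3, 27.2),
    "Amazfit Helio Strap": (-28.5, 28.0),
}

def half_width(lower, upper):
    """Half the span between the lower and upper limits of agreement."""
    return (upper - lower) / 2

biceps = {name: half_width(*limits) for name, limits in loa_biceps.items()}
wrist = {name: half_width(*limits) for name, limits in loa_wrist.items()}

# Smallest and largest wrist-to-biceps ratios across the four devices.
ratio_low = min(wrist.values()) / max(biceps.values())
ratio_high = max(wrist.values()) / min(biceps.values())
print(f"Wrist LoA are {ratio_low:.1f}x to {ratio_high:.1f}x wider")
```

The half-widths come out at about 8.3 and 7.3 bpm for the biceps devices and about 28.3 bpm for both wrist devices, giving ratios of roughly 3.4 to 3.9.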

Take Out

Depending on how you slice, dice and choose the data, you could probably argue in favour of most of the devices on test today.

Personally, I would reject Fitbit AIR for sports use due to its performance at the start of a workout and its inability to provide per-second data. I have it on my wrist as I type this, and I’m perfectly happy with it as a non-sports tracker. For a broader assessment, the Fitbit AIR and WHOOP MG HYROX accuracy test covers a different effort profile.

Frequently asked questions

These questions draw on the sports science methodology used throughout this test.

What is the HRV sampling frequency on watchOS 27?

watchOS 27 records HRV for exactly 1 minute once every 15 minutes overnight, producing a maximum of 4 minutes of data per hour. If movement corrupts a reading, the watch skips the sample and retries later. Apple uses SDNN rather than RMSSD, the formula used by every other major recovery tracker including WHOOP, Oura, and Garmin.

Is Fitbit AIR accurate for interval running?

In this test, Fitbit AIR produced a bias of just +0.5 bpm against the Fourth Frontier ZONE ECG chest strap across 689 samples, with limits of agreement of -6.5 to +7.5 bpm. That is excellent when it is recording correctly. The problem is the startup period: AIR took approximately 10 minutes to lock onto the correct signal in this session, generating false interval counts in the first third of the run. On a 30-minute run, that matters. On a 60-minute run with intervals concentrated at the start, it produces materially wrong zone data.

Is WHOOP MG accurate for heart rate during intervals?

Yes, at least when worn on the biceps. Against the Fourth Frontier ZONE ECG reference, WHOOP MG produced a bias of +1.1 bpm and LoA of -6.2 to +8.3 bpm across 1,546 samples. Against Polar SENSE as the reference, the agreement tightened further to a bias of +0.3 bpm and LoA of -2.2 to +2.8 bpm. The WHOOP MG is designed for biceps wear, not wrist wear. Wrist-based heart rate is a separate and less reliable proposition.

Does Apple Watch Ultra 3 accurately track heart rate during running intervals?

On average, yes. The bias against the Fourth Frontier ZONE ECG reference was just -1.0 bpm across 1,546 samples, which is very good. The problem is variability: the limits of agreement were -29.3 to +27.2 bpm, indicating that during rapid intensity changes, individual readings can be substantially off even when the average is close. For steady-state running, AWU3 HR data is generally reliable. For interval tracking where zone accuracy during transitions matters, the wider LoA are a genuine limitation.

Is biceps optical HR more accurate than wrist optical HR?

In this test, yes, substantially. Both biceps devices (WHOOP MG, Polar SENSE) produced limits of agreement of roughly ±7–10 bpm against the ECG reference. Both wrist devices (Apple Watch Ultra 3, Amazfit Helio Strap) produced limits of agreement of roughly ±27–29 bpm. The gap is not marginal: wrist LoA were three to four times wider. The decisive factor appears to be sensor location, not sensor technology. Both the biceps and wrist devices use optical PPG sensing.

What does Limits of Agreement (LoA) mean in heart rate accuracy testing?

Limits of Agreement, calculated using the Bland-Altman method, describe the range within which 95 per cent of individual differences between two devices fall. A LoA of -6.8 to +9.7 bpm means that in 95 per cent of individual readings, the device was within roughly 7–10 bpm of the reference. A LoA of -29.3 to +27.2 bpm means individual readings could be nearly 30 bpm off in either direction while still falling within the normal range for that device. Bias (mean difference) tells you about systematic offset; LoA tells you about consistency.

Why does Fitbit AIR have fewer data samples than the other devices?

Fitbit AIR records heart rate once every two seconds, giving it a maximum of 30 samples per minute. Most competing optical devices sample more frequently. The Fourth Frontier ZONE ECG exports one data point per 20 seconds to FIT/TCX files, so each matched pair requires both devices to have recorded at the same timestamp, further reducing the usable sample count. The result in this test was 689 matched samples for Fitbit AIR versus 1,546 for devices with higher sampling rates. See the full explainer on Fitbit AIR’s sampling rate for more detail.

Last Updated on 9 June 2026 by the5krunner


