Apple Watch Ultra 2 Accuracy – Beats Garmin again when using watchOS 26
@TheQuantifiedScientist (Rob) produces some great content on the accuracy of wearables and related tech. He repeats the same tests on multiple devices over time and has some great charts that pit them all against each other.
Like other reviewers, he’s not perfect and is limited to an N=1 sample size. Nevertheless, getting some alternative perspectives and different ways to look at accuracy is great.
Rob typically looks at:
- simple heart rate covering indoor cycling, weights, outdoor cycling and running
- sleep state assessment versus an electric brain wave sensor (deemed to be the best at-home device)
- GPS tracking
Here’s what he found about Apple Watch Ultra 2 on watchOS 26; I believe you should listen to some of what he says.
Sleep Stage Tracking
Before looking at Rob’s results, let’s look at the accuracy of the reference device he compares against (I sometimes compare against it as well).
- Gold-standard polysomnography (PSG) reliability/accuracy: in large-scale studies, overall inter-rater agreement among human scorers is typically around 83%.
- ZMax has 72% reliability/accuracy (Esfahani, 2023). That is an absolute 72%, not 72% of 83%.
| Sleep Stage | Polysomnography (PSG) Accuracy | Hypnodyne ZMax Accuracy |
|---|---|---|
| Wake | ~85% | 77% |
| N1 (Light Sleep) | ~60% | 40% |
| N2 (Light Sleep) | ~80% | 81% |
| N3 (Deep Sleep) | ~85% | 82% |
| REM | ~85% | 83% |
To be absolutely clear: there is no way that your wearable will be as intrinsically accurate as the ZMax, let alone polysomnography. Your watch is definitely less than 72% correct. But Rob finds the Apple Watch Ultra 2 to be closer to that 72% than most.
He compares the Apple Watch Ultra 2 to the ZMax EEG headband to assess sleep stage tracking. Wearables use HR, HRV, movement, and other metrics to estimate sleep stages, whereas the ZMax measures electrical brain activity. Sleep stages are brain events, so the ZMax measures them closer to the source rather than inferring them from other body physiology. The metrics considered are:
- Percentage Agreement: This measures how well the Apple Watch’s predictions for specific sleep stages align with the reference device’s.
- Agreement for Specific Sleep Stages: Rob reports agreement separately for deep sleep (e.g., ~74%), light sleep (e.g., ~87%), and REM sleep (e.g., ~72%).
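To make those two metrics concrete, here is a minimal Python sketch of how epoch-by-epoch agreement could be worked out. It is an illustration only and not necessarily how Rob calculates his figures: the `stage_agreement` function, the stage labels, and the ten-epoch sample night are all made up for this example, and per-stage agreement is assumed to mean the share of reference epochs for a stage that the watch also gave that stage.

```python
from collections import Counter


def stage_agreement(reference, device):
    """Return (overall, per_stage) agreement between two hypnograms.

    Both inputs are equal-length sequences of stage labels, one per epoch.
    per_stage[s] is the share of reference epochs labelled s that the
    device also labelled s (an assumed definition for this sketch).
    """
    if len(reference) != len(device):
        raise ValueError("hypnograms must cover the same epochs")
    if not reference:
        raise ValueError("hypnograms must not be empty")

    totals = Counter(reference)  # epochs per stage in the reference
    hits = Counter(r for r, d in zip(reference, device) if r == d)

    overall = sum(hits.values()) / len(reference)
    per_stage = {stage: hits[stage] / n for stage, n in totals.items()}
    return overall, per_stage


# Hypothetical night of ten epochs; not real data from any device.
reference = ["wake", "light", "light", "deep", "deep",
             "deep", "light", "rem", "rem", "wake"]
device = ["wake", "light", "deep", "deep", "deep",
          "light", "light", "rem", "light", "wake"]

overall, per_stage = stage_agreement(reference, device)
print(f"overall agreement: {overall:.0%}")
for stage, share in per_stage.items():
    print(f"{stage}: {share:.0%}")
```

Note that the reference is treated as the truth here, yet the reference itself is only around 72% accurate, so a high agreement figure means "agrees with the headband", not "is correct".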
Take Out
The frequency of Rob updating his sleep test results and historic sleep stage results database is second to none. He’s my go-to source for sleep stage info. Against that, you must understand that sleep stage tracking is ‘a bit of fun’ in wearables use. As an athlete, you might pay some attention to DEEP SLEEP (N3) as that stage is more likely to have a closer agreement to reality, and this is the stage where your body physically adapts to your workouts. Furthermore, the differences in sleep stages between people are HUGE, and Rob’s N=1 sample size may very well bear zero relationship to the accuracy you receive.
Garmin users will generally not want to switch to Apple Watch, so your best options for improved sleep stage data are to go with Oura Ring or Eight Sleep – I use both, and Rob recommends both. Eight Sleep will improve your sleep stages and general sleep quality (check out this Eight Sleep review).
Check out his full video


My recommendation for sleep stage tracking, honestly, is to ignore sleep stage tracking. I think DC Rainmaker says it best: we would not accept < 80% accuracy for pretty much any other metric, so why would we for sleep stages? Time asleep vs awake, resting heart rate, and heart rate variability are all pretty good for wearables (Apple, Garmin, Whoop, etc.) while at rest, and give a pretty decent picture of recovery. No need to muddle it up with unreliable light/deep/REM data.
FWIW: I’ve been saying that about polysomnography for WAY longer than Ray has, probably a few years longer.
again, as i’ve said above, the situation regarding the different stages is more nuanced. each can be detected with differing degrees of reliability; @tqs also says this.
accuracy: i think the situation is more nuanced than what you say for the vast majority of people. most people simply accept a device is correct because someone once told them it was. OHR in sport can quite easily be <80% accurate. most of you reading this kind of blog are not ‘normal’ (like me) in the sense that we probably deeply care about our sports data and that it is correct.
The way that I look at it is that if the sleep results are way off, what other metrics are off? AW heart rate numbers are extremely accurate, whereas Garmin’s are not so good. Even Rob says that Garmin’s GPS is second to none, and then there’s the battery life consideration. I’ve got both an AWU2 and a Fenix 8. Unfortunately, since I had a pacemaker installed a little while ago (I’m still pretty active), Garmin records no sleep data. Tech Support and articles say pacemakers interfere. I actually love the Fenix but it has to go. AWU3 here I come. 5Krunner, many thanks for your website, I love it.
ty for your kind words.
I don’t think Rob’s GPS tests are varied enough yet. Look at what DC Rainmaker and I (and others) do in that area.
oura/eight sleep are the best way to go for the more detailed sleep stuff, if that’s what people want.
if you are an athlete then hrv is best handled by waking readings (hrv4training) or something like polar’s orthostatic test. i don’t know why garmin does not do one of these, very odd.
AW takes HRV readings every 15 minutes or so.
pacemaker: the articles you mention would seem sensible to me but i’ve never looked into it.
Does it mean that this new ‘intelligent alarm clock’ feature is basically just a gimmick?
in theory it’s a great idea.
if it identifies the correct stage in your sleep cycles all would be good i imagine.