It might seem superficially obvious that “the most accurate power meter” is the best power meter. Or maybe the most repeatably accurate power meter is the best?
But then maybe the most repeatably accurate power meter involves: a huge sum of money; an inconvenient placement on your bike; and a usage case totally different to the one you need. I covered that a while ago whilst looking at the fabled ‘Best Power Meter’ for differing types of rider use.
But even if you want/need an accurate power meter then how do you know how accurate it really is? I mean REALLY know? Will the manufacturer test the accuracy of every power meter they ship? Will they test it meaningfully? Might that accuracy change because of flaws in your installation of it? Or might accuracy just change over time despite your best calibration efforts?
I was going to post a quick table of Power Meter Accuracy. I was just trying to be helpful to someone, somewhere. Just to give them a steer. But as you’ve already guessed I opened a can of worms 😉
This is the list of manufacturer-stated accuracy levels. We start by seeing the alphabetic brilliance of 4iiii’s naming, which puts them first:
- +/- 1%
- 4iiii Precision
- Favero ASSIOMA (V2 firmware update for IAV)
- Rotor
- SRM
- Verve InfoCrank
- Garmin Vector 3
- +/- 1.5%
- PowerTap Hub/Pedals/Crank
- Quarq Riken
- WatTeam PowerBeat
- Garmin Vector 1 & 2
- +/- 2%
- Easton Cinch
- FSA Powerbox
- Pioneer Crank
- Polar/Look
- Stages
- Favero bePRO
- +/- 3%
- Powerpod
I may have missed some, but that will do for now; please let me know if the manufacturer-claimed figures are incorrect.
BUT
I don’t believe the figures…at least not all of them.
Cycling Weekly noted that in a test of 54 power meters (23 models), scientists “found that individual power meters deviated a lot, even when units came from the same manufacturer. The scientists are concerned that six units deviated by more than five percent, including products from Stages, Quarq and power2max.” The same study found SRM and PowerTap to be the best, also noting that “Power meters used by elite and recreational cyclists vary considerably in their trueness.”
Let’s say I compare 4 power meters simultaneously: how do I know which, if any, are right? I guess you could compare: dual pedals; dual cranks; and a spider of some sort. But they should all naturally read differently to a fourth PowerTap G4 wheel because of, for example, drivetrain loss. If 3 show the same readings, does that mean they are all right? Or does it mean that, as a reviewer, I was given new and specially calibrated units by the manufacturer, units receiving a level of attention that a normal production model bought by you would not? I suppose that is possible. I have no idea either way. The study referenced by Cycling Weekly, above, does kind of hint at something like that potentially happening.
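To make that concrete, here’s a rough sketch (not anything I actually run, and the 2.5% drivetrain loss is purely an assumed figure for illustration) of how you might bring a hub-based reading up to crank-equivalent watts before looking for the odd one out:

```python
# Rough sketch: compare simultaneous readings from several meters.
# The 2.5% drivetrain loss is an assumed, illustrative figure; real
# losses vary with gearing, load and chain condition.
from statistics import median

DRIVETRAIN_LOSS_PCT = 2.5  # assumption, purely for illustration

def to_crank_equivalent(watts, measured_at_hub):
    """Scale a hub reading up so all meters are compared at the crank."""
    if measured_at_hub:
        return watts / (1 - DRIVETRAIN_LOSS_PCT / 100.0)
    return watts

# One moment in time: (name, reported watts, measured at hub?)
readings = [
    ("pedals", 251.0, False),
    ("crank",  249.0, False),
    ("spider", 252.0, False),
    ("hub",    244.0, True),
]

adjusted = {name: to_crank_equivalent(w, hub) for name, w, hub in readings}
mid = median(adjusted.values())

for name, watts in adjusted.items():
    print(f"{name:>6}: {watts:6.1f} W ({100 * (watts - mid) / mid:+.1f}% vs median)")
```

Even then, agreement only tells you the units agree with each other, not that any one of them is right.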
Cycling Weekly again note “The scientists don’t know whether the worst meters are not well calibrated before they leave the factory or if their set up deteriorates during use.”
So maybe the best review would be of a genuinely retail bought unit, and that would have to have been bought ‘blind’ without the manufacturer’s knowledge. Maybe that unit should then get 1000 miles of use before tests commence? Just a thought.
The units I get my hands on (temporarily from friends as well as manufacturers) mostly seem either ‘about right’ or patently wrong. I don’t seem to find a middle ground.
Is it important?
Let’s say I’m doing 250 watts for ‘quite a while’. I could certainly tell if I was doing 260w (+4%), but I’m not sure that I could spot a 2% variation at 255w even after half an hour. I might explain it away as having a good/bad day.
So: is the +/- 3% of the PowerPod so bad? On that basis, maybe not.
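For the sake of the maths, here’s a tiny sketch of what those claimed tolerance bands mean in watts at a steady 250w; the numbers are purely illustrative:

```python
# Back-of-envelope: what a claimed accuracy tolerance means in watts
# at a steady effort. Purely illustrative numbers.

def watt_band(true_watts, tolerance_pct):
    """Return the (low, high) range a meter could legitimately report."""
    delta = true_watts * tolerance_pct / 100.0
    return true_watts - delta, true_watts + delta

for pct in (1.0, 1.5, 2.0, 3.0):
    low, high = watt_band(250.0, pct)
    print(f"+/- {pct}% at 250w -> {low:.1f}w to {high:.1f}w")

# +/- 1.0% at 250w -> 247.5w to 252.5w
# +/- 3.0% at 250w -> 242.5w to 257.5w
```

A +/- 3% unit could legitimately sit 7.5w either side of the true figure, which is right at the edge of what I reckon I’d actually notice.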
Summary for Jo-Average Cyclist: Accuracy and trueness seem to be really, really, really important. But on a day-to-day basis, I’m not so sure if most of us really, really, really know if we are getting that accuracy. So is it important?
😉 I might have to start to like optical heart rate monitors now 😉
The challenge with the study (while well intentioned, and in general I liked their methodology) is that they skipped over the most important part: model numbers.
They assumed that an SRM from this year is the same as an SRM from 10 years ago. Just like they assumed a Quarq from now is the same as one from 5 years ago or 8 years ago. I asked the study authors for more specifics, and they admitted (honestly) that they didn’t know all those details. The problem was that most of the equipment was loaned.
If there’s any bike component I’d never trust second-hand it’s a power meter. At least not unless it went back to the manuf and got checked out. So for example the Power2Max units could have been from 2011-2012, in the days of early accuracy challenges. Those are totally different beasts to today’s units (entirely different designs, in fact).
The challenge with the GCN video’s software site, while again a good attempt, is that it doesn’t realistically account for wind or rolling resistance (tire inflation), aeroness, etc… – which in the confines of what we’re talking about (a few percent) actually do matter significantly. Far more significantly than they realize.
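To put rough numbers on that, here’s a simplified flat-road, no-wind power model (all the rider and environment figures below are made up purely for illustration) showing how small errors in the assumed Crr or CdA shift a modelled power estimate by several percent:

```python
# Simplified flat-road, no-wind power model to show how sensitive a
# 'modelled' power figure is to the assumed Crr and CdA. All rider and
# environment numbers below are made up for illustration.

G = 9.81     # gravity, m/s^2
RHO = 1.225  # air density, kg/m^3

def power(speed_ms, mass_kg, crr, cda):
    """Power needed to hold a given speed on a flat road with no wind."""
    rolling = crr * mass_kg * G * speed_ms
    aero = 0.5 * RHO * cda * speed_ms ** 3
    return rolling + aero

speed = 10.0  # m/s, roughly 36 km/h
mass = 85.0   # rider + bike, kg

baseline     = power(speed, mass, crr=0.0040, cda=0.32)
softer_tyres = power(speed, mass, crr=0.0050, cda=0.32)  # Crr assumption off by 25%
worse_aero   = power(speed, mass, crr=0.0040, cda=0.34)  # CdA assumption off by ~6%

for label, p in (("baseline", baseline), ("higher Crr", softer_tyres), ("higher CdA", worse_aero)):
    print(f"{label:>10}: {p:5.1f} W ({100 * (p - baseline) / baseline:+.1f}%)")
```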
Which isn’t to say there’s any perfect way to do testing (only that the above aren’t great). For example, my comparing 3-4 units at once has ways it can go wrong too, and sometimes you have to ‘know by gut’ when a power meter is wrong in that case (if say 3 disagree but the 4th is actually the right one). But too many tests are done indoors these days, and that’s simply not where power meters fail today*. Most fail outdoors, specifically with vibrations like cobbles/rough roads, or temperature swings. Ironically enough, SRM of all units is the most susceptible to temperature swings…but nobody ever shows that because it goes against the whole ‘gold standard’ thing.
Anyway…
*Except ROTOR 2INPower, LIMITS, and early WatTeam units, somehow.
**Fwiw, I haven’t seen much of a difference between manuf and retail provided units for review (I’ve also even bought some units second-hand). With automated testing of every power meter coming off the line at every major vendor, it’s basically either damage during install/shipping, or some odd burn-in issue not seen in shorter manuf QA tests, that tends to cause problems. Most PM issues these days are symptomatic of design issues and easily spotted within 2-3 rides.
Thank you for taking the time to write a comment that’s probably longer than the post!
I’d certainly sometimes agree with your ‘gut feel’ comment and it’s further interesting that you say that there isn’t ‘much’ difference between retail and manuf provided units – but that implies still SOME difference. But then how could, say, a 0.25% difference be certainly and accurately quantified?
I’m 99% sure I was once sent a PM by the manufacturer where extra testing/calibration had been performed by them prior to despatch; I would imagine that, for you, EVERY manufacturer does that.
Yeah, I say ‘much’ in the general sense that I can’t remember seeing any case where it mattered favorably in their direction. For example, WatTeam sent an entire team of three guys for three days up to both watch me install a unit and then install a second unit too in case I had issues. And then…it failed. Both did.
Power2Max PR sent me a unit…but then forgot to load the firmware unlock for BLE.
Quarq PR sent a unit that had spikes on it.
FSA a unit that doesn’t seem to temp cal properly.
4iiii sent three units, all tested on bikes at their place first, then had me do an actual test at their HQ…also all failed.
Stages this summer spent an extra morning (beyond the first day) with me trying to get LR to work, after swapping to another bike too. No luck – still weird data.
It’s actually kinda cool because power meter data is so incredibly black and white compared to other aspects. So the second they all leave, I just keep doing what I do and gather boatloads of comparative data. Spikes and drops are easy to spot and easy to point out.
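As a very rough sketch of what that looks like (the thresholds here are arbitrary and purely for illustration), flagging spikes and dropouts in a second-by-second stream is pretty trivial:

```python
# Minimal sketch: flag implausible one-second spikes and dropouts in a
# power stream. Thresholds are arbitrary and only for illustration.

def flag_anomalies(watts, jump_w=300.0, dropout_floor=150.0):
    """Return indices where power jumps or collapses implausibly fast."""
    flagged = []
    for i in range(1, len(watts)):
        sudden_jump = abs(watts[i] - watts[i - 1]) > jump_w
        dropout = watts[i] == 0 and watts[i - 1] > dropout_floor
        if sudden_jump or dropout:
            flagged.append(i)
    return flagged

stream = [248, 252, 251, 0, 249, 253, 1190, 250, 247]  # toy data
print(flag_anomalies(stream))  # -> [3, 6, 7]
```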
Heck, I’d even wager at this point I almost have better luck with retail units than not!
Yes, I guess accuracy is black and white.
I’m still trying to get my head around the multi-device tests. I totally understand where you’re coming from.
Personally I find that more than 2 recording devices (excl. a backup) just means more screens that I simply can’t and don’t look at. I could have 3 devices actively recording for 10 hours each, but in reality they just become dumb and blind data gatherers. The headline ‘I’ve spent 10 hours with device X’ then becomes misleading, although true.
So one often only really spots an anomaly ‘after the fact’ when looking at the data and trying to tie it back to a bump in the road via a GPS point at the same time.
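Something like this tiny sketch is roughly what I mean, assuming both files carry per-second timestamps (the field names and numbers are made up):

```python
# Tiny sketch: look up where a flagged power anomaly happened by joining
# per-second power samples to a per-second GPS track on timestamp.
# Field names and data are made up for illustration.
from datetime import datetime, timedelta

start = datetime(2018, 5, 1, 9, 0, 0)

power_log = [(start + timedelta(seconds=i), w)
             for i, w in enumerate([250, 252, 0, 249, 251])]
gps_log = {start + timedelta(seconds=i): (51.50000 + i * 1e-5, -0.12000)
           for i in range(5)}

for ts, watts in power_log:
    if watts == 0:  # the dropout flagged after the ride
        lat, lon = gps_log[ts]
        print(f"Dropout at {ts:%H:%M:%S} near {lat:.5f}, {lon:.5f}")
```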
Is that repeating system testing that could/should have already been done by the manuf? (Even LIMITS had their P1 and V2 for simultaneous testing 😉 … Garmin prob have an Assioma as well 😉 )
Is the whole thing just glorified (unpaid) UAT?
Does a focus on accuracy detract from looking at the overall experience? (I know you will disagree with that statement in your case.) I’m coming back full circle to the point where I just need to have used device X as my main device for a couple of months to meaningfully comment on all the nuances in the real world (hence my lack of WAHOO migration content…I’m actually using it a lot).
Yeah, I do roughly watch power meters during multi-device tests, mostly for issues I can address during the ride (like a left or right side totally dropping off, or an obvious calibration/zero offset issue). But mostly, the analysis happens after the fact.
I don’t think you can do much during the ride to really see the important pieces anyway for power meter data (i.e. catching 1-second drops or such).
Inversely, for running, I think it’s more important for stuff like stabilized pace during the run. Certainly, some of my tests are just gathering data to capture later (like GPS track data), but I’m also glancing at pace stability a bit.
HR is easy to analyze during or after.
In any of those cases, I try to upload the files to the Analyzer right after the workout, merely so I can do a quick glance and catch anything obvious that correlates to something notable on my run.
Still, there’s no doubt that in the grand scheme of things I test, power meter testing is the least enjoyable. Not so much for the actual testing, but rather the prep. Right after I wrote the comment previous to this, I then spent about 75 minutes troubleshooting power meter/ANT+ dropouts in the Cave, then spent another hour figuring out why one file got all time-slid funky afterwards. You kinda get to the point where when you’re finally ready to ride, you’ve lost all interest in doing so. Or worse, if you can’t get certain units to cooperate, you decide it’s not worth riding at all – since it’s a wasted workout.