This is part two of our technical series on bicycle power meters. If you missed part one, it is located HERE.
Our first article was a primer, of sorts. Well, a very big primer. In essence, we covered the complicated topic of how power meters and data transmission actually work. If that was Powermeter 101, this is the sophomore-level Powermeter 201.
After all of the phone calls, interviews, emails, and questions, I wanted to see how it all played out in the real world. Does it really make a difference whether a power meter is time or event-based? When looking at a power file, can you actually see data differences based on whether your head unit records at half-second or one-second intervals? Perhaps most importantly – can you train effectively with any power meter? Does a $4,000 power meter work better than a $700 one? There are many questions indeed.
Let me be clear: This article does not declare a ‘winner’. That’s not our purpose or place. As with the previous article, I’ll keep the discussion centered around the four power meters with which I have experience – Powertap, Quarq, SRM, and Stages.
We will look at a handful of test runs that I did, explaining the intent and protocol for each. We will also analyze the results and break down what you need to know.
First-round Testing: Head Units
Not wanting to dive in too deep right off the bat, I kept the first round of testing relatively simple. We wanted to look at various head units, to see if they affect the data you get. I tested three power meters – Stages, Powertap SL+, and Dura-Ace 7900 SRM – each individually. I had four head units to work with: a Garmin 500, Garmin 310XT, CycleOps Joule GPS, and SRM PC7. I tested each power meter by itself, but transmitting to multiple head units.
That may seem silly. Shouldn’t all of the power files be identical if all of the receiving devices are reading off of the same power meter? Not necessarily. As we learned in part one of this article, most ANT+ head units randomly pick a single data packet each second (out of four that get sent every second). Seasoned users of power meters also know that you have to deal with random data drops and interference.
Also, on a very basic level, the settings of a head unit can hugely affect the data you get. HUGELY? Yes, actually. Do you have your Garmin set on 1-second recording or Smart Recording? Are you including zeros? When was the last time you set your zero offset? What is dictating the start and stop of your head unit – GPS movement, wheel speed, or a manual timer? Oh yeah… and there’s that little detail of updating the firmware in both your power meter and head unit. If you can’t remember the last time you did that (or don’t know what ‘firmware’ is), chances are your data is not as accurate as it could be. Your 200-watt average power from that last ride could be completely bogus. Maybe it is really 191… or 213. A power meter is only useful to the extent that it is also accurate and repeatable… I simply wanted to learn how the data transmission half of the equation affects this.
For data analysis, I had two wonderful resources. First, Berkeley professor and Slowtwitch forum member, Dr. Robert Chung. If you don’t know who he is… well, you should. Among other feats, he created the Chung Method of Estimating CdA with a Power Meter. At heart, he’s a stats guy, with a very big cycling habit. That is good news for us, because he is a master of slicing-and-dicing data. Dr. Chung analyzed my files using uber-geek power software, Golden Cheetah. I hoped to learn the very nitty-gritty from him; the data-packet-by-data-packet run down of exactly what was happening during my rides.
Next, I had the help of Gloria, Marcus, Dirk, and Gear at TrainingPeaks. I visited their Boulder, CO office for an afternoon, in hopes of convincing them to crunch some numbers (lucky for me, they kindly agreed). They analyzed my files using the multi-file analysis capability in their popular WKO+ software. From them, I hoped to learn the practical side of things. I proposed – let’s look at these files as if they belonged to one of your customers. Are the differences large enough to significantly affect one’s training? Is this something that the average user – the fat of the bell curve – will care about?
Before beginning any of the tests, I updated the firmware on all of my devices (an exercise of patience in itself). I also made certain that all of the critical settings were the same:
-1 second recording
-Include zeros in data averaging
-Tire diameter set at 2096 (700x23mm)
I set the zero offset on each power meter before each test run. Tire pressure was standardized to 100psi. Just in case you’re wondering, the fun level of these rides was dialed right down to zero.
Run #1 – Powertap SL+ indoors on CycleOps Fluid 2 trainer
The first run we’ll show is my Powertap SL+ hub sending to three head units – Garmin 500, Garmin 310XT, and Joule GPS. It was a short ride, and included varying speed and power, but relatively consistent cadence.
The first graphic, provided by Dr. Chung, shows speed on the Y axis, and distance on the X axis. The top graph shows the full distance (approx. 4.25km), and the graph below it zooms in on a smaller portion of that.
Here is Dr. Chung’s analysis:
“The first [graph] shows the comparison of speed by distance. What this shows is that the speed signal seems pretty well-captured except (and this is important) for an occasional missed record. What this means is that the speed signal is pretty reliable – and that we can zoom in and often find the exact spot where a signal was dropped. The top panel shows the overall pattern, and you can see that by the time you get to the right side of the plot the distance signals are slightly offset. The bottom panel zooms in on a little spot where it appears that a speed signal got dropped so we get mis-registration on the distance. Just for the record, if you calculate the distances based on the speed signals, the Joule recorded that the ride covered 4.281 km, the 500 recorded it at 4.288 km, and the 310 recorded at 4.296 km (i.e., the 500 was 7 meters [about 3 wheel revolutions] farther than the Joule, and the 310 was about 8 meters farther than the 500). Thus, the Joule and 310 were about 15 meters (~7.5 wheel revolutions) apart in distance. Your average speed over this test was a bit more than 6 m/sec, so we're talking either 1 or 2 dropped signals.”
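If you want to check the arithmetic in that quote yourself, here is a minimal sketch. The three distances come straight from Dr. Chung's analysis, and the wheel circumference is the 2096 mm tire setting listed earlier. The 6 m/s average speed is my rounding of his "a bit more than 6 m/sec", and the function name is just something I made up for illustration.

```python
# Distances (km) as reported in Dr. Chung's analysis of Run #1.
distances_km = {"Joule GPS": 4.281, "Garmin 500": 4.288, "Garmin 310XT": 4.296}

WHEEL_CIRCUMFERENCE_M = 2.096  # tire setting used for these tests (2096 mm)
AVG_SPEED_M_PER_S = 6.0        # assumption: "a bit more than 6 m/sec", rounded


def gap_summary(name_a, name_b):
    """Gap between two head units in meters, wheel revolutions, and seconds."""
    gap_m = abs(distances_km[name_b] - distances_km[name_a]) * 1000.0
    return {
        "meters": round(gap_m, 1),
        "wheel_revs": round(gap_m / WHEEL_CIRCUMFERENCE_M, 1),
        # With 1-second recording, seconds of riding ~ number of missed records.
        "seconds_at_avg_speed": round(gap_m / AVG_SPEED_M_PER_S, 1),
    }


pairs = [("Joule GPS", "Garmin 500"),
         ("Garmin 500", "Garmin 310XT"),
         ("Joule GPS", "Garmin 310XT")]
for a, b in pairs:
    print(a, "vs", b, gap_summary(a, b))
```

Each neighboring pair of head units ends up a little over one second of riding apart, which lines up with the "1 or 2 dropped signals" estimate.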
Next, we looked at the same ride, but for wattage by distance:
Dr. Chung noted:
“The second plot shows the comparison of watts by distance after correcting for a dropped signal. The bottom panel shows that when we correct for the dropped signal, there may still be a momentary ‘jiggle’ but pretty soon they resume synchronization. (This is a completely off-topic, but for me it's fascinating because it appears to be an example of something called weak ergodicity, which is related to something I wrote about in my PhD dissertation 20 years ago).
The bottom line of these two plots is there does appear to be a difference between head units, but the difference is relatively small. These comparisons are done at the smallest interval (single record by single record) so in a deep sense, unless you want to correct for single record drops in the data, you don't want to do second-by-second comparisons, even though the differences are tiny. But, importantly, that's why I usually smooth over a few seconds to a few minutes rather than look at single records.”
When Dr. Chung talks about single records, we’re actually looking at the individual data packets that get sent and received by your head unit. In the case of, say, a Quarq sending to a Garmin 500, the head unit picks up one of these each second. That piece of data is a little snapshot of our lovely power equation (Power = Force x Distance / Time). When he mentions a data drop or missed record, that’s when something gets lost in space – the head unit misses a signal. Why does that happen? Well… it just does. If your power analysis software is doing its job properly, it should treat these as a null in terms of average wattage (i.e. it does not count as a zero, which would bring your average down).
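To make the null-versus-zero point concrete, here is a small sketch with made-up numbers. The ten-second snippet and the function name are hypothetical, and I am using Python's `None` to stand in for a dropped record. This is not how any particular analysis package does it internally.

```python
def average_power(records, drops_as_zero=False):
    """Average watts over 1-second records; None marks a dropped record."""
    if drops_as_zero:
        # The wrong way: a missed signal is treated as if I produced 0 W.
        values = [0 if r is None else r for r in records]
    else:
        # The right way: a missed signal is simply left out of the average.
        values = [r for r in records if r is not None]
    return sum(values) / len(values) if values else 0.0


# Hypothetical snippet: a steady 200 W with two dropped records.
records = [200, 200, None, 200, 200, 200, None, 200, 200, 200]

print(average_power(records))                      # drops skipped -> 200.0
print(average_power(records, drops_as_zero=True))  # drops as 0 W  -> 160.0
```

Note that this is a different thing from the "include zeros" setting. A genuine zero (you stopped pedaling) is real data and stays in the average. A dropped record is missing data and should not count either way.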
It generally works best to synch data based on speed or distance (which is what you populate your x-axis with). Both of these track predictably within a power file, and it is rare that you’d get data drops at the exact same time when testing multiple power meters or head units. If you see that two files are consistent, but the third sharply diverges or goes to null, we can assume that it is incorrect.
If we tried to synch based on power, it would be much more difficult and subjective. It doesn’t tend to track as cleanly as distance and speed, and its calculation is at the mercy of which data packet the head unit decides to pick up. As our second pair of graphs illustrates, the reported wattage from the same power meter looks different on different head units. It is also worth noting that we cannot synch based on time. One would think that clock time would be the perfectly-objective way to look at data, but there is a real-world problem. If you stop pedaling, different head units stop recording at different times. With the Garmin products, you can manually start and stop the timer (so you could leave it running), but you cannot do that with others, such as the Joule GPS (when you stop moving, it stops).
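For the curious, here is a rough sketch of what synching by distance can look like in practice: each file's wattage is interpolated onto a shared distance grid, so a dropped record in one file no longer shifts everything after it. This is my own illustration, not what Golden Cheetah or WKO+ actually does. The function name and the sample numbers are hypothetical, and it assumes distance is strictly increasing (i.e. the wheel never stops).

```python
from bisect import bisect_left


def resample_by_distance(distance_m, watts, grid_m):
    """Linearly interpolate watts onto grid_m.

    Assumes distance_m is strictly increasing (the rider keeps moving).
    """
    out = []
    for d in grid_m:
        if d <= distance_m[0]:
            out.append(float(watts[0]))
            continue
        if d >= distance_m[-1]:
            out.append(float(watts[-1]))
            continue
        i = bisect_left(distance_m, d)          # first record at or past d
        x0, x1 = distance_m[i - 1], distance_m[i]
        y0, y1 = watts[i - 1], watts[i]
        out.append(y0 + (y1 - y0) * (d - x0) / (x1 - x0))
    return out


# Hypothetical: the same effort seen by two head units at 6 m/s.
# Head unit B missed the record at 12 m.
dist_a, watts_a = [0, 6, 12, 18, 24], [100, 150, 200, 250, 300]
dist_b, watts_b = [0, 6, 18, 24], [100, 150, 250, 300]

grid = [0, 5, 10, 15, 20]
print(resample_by_distance(dist_a, watts_a, grid))
print(resample_by_distance(dist_b, watts_b, grid))
```

If you lined these two files up by record number instead, everything in file B after the drop would sit one slot too early. Lined up by distance, the two traces fall back on top of each other.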
Note on Stages
This also brought an important issue to my attention with regard to the Stages power meter. Part of its beauty is that it is simple. It ONLY requires a crank arm – no external sensors, magnets, or anything. That also means that it does not produce any sort of a speed or distance signal (which are the two things we need in order to properly synch data between files). It also means that I could not use my Joule GPS with the Stages power meter, at least for indoor riding. The Joule needs either external speed measurement or GPS movement to tell the timer to start running… I have neither available to me on an indoor trainer. What to do?
For my first test runs, I opted to use only the Garmin 500 and 310XT, use the manual start/stop function, and never stop pedaling throughout the time interval. For later tests, we inquired with Stages whether we could use an external ANT+ speed sensor to feed speed data to the head unit. As it turns out, you can do this, but it does kill some of the would-be simplicity of the Stages product. Of course, if you’re not doing a bunch of multi-file analysis like me and own a Garmin head unit, this problem is a complete non-issue (but we’re geeks, and we wanted to understand what the heck was going on).
Run #2 – SRM indoors on CycleOps Fluid 2 trainer
This run was done in a similar fashion to the previous. I used my SRM, sending data to a PC7, Garmin 500, and Garmin 310XT. Note that the PC7 was set to one-second recording, to match the Garmin head units – but it does receive four torque data packets per second (which can slightly affect the data at higher cadences).
The following graph was provided by TrainingPeaks. The top graph shows wattage, and the bottom shows cadence:
Yellow = Garmin 310
Red = Garmin 500
Green = SRM PC7
Let’s zoom in on some of the more notable information; this is the Power (top) graph:
Towards the left of the above graph, we can see that the red line (Garmin 500) has a data drop. The rest of the graph tracks similarly, but not identically.
Here is the official bird’s-eye-view analysis from TrainingPeaks:
-The 3 head units have slightly different sample counts (see right hand column on screenshot). This could be from pressing the button at different times
-kJs: 500 shows 62 kJ vs. 63 kJ for the others
-Intensity Factor (IF) is very slightly different - down to thousandths of a point
-TSS is off by 0.1 - to give the reader a sense of context, 100 TSS = 1 hour of all-out riding. 0.1 is insignificant.
-NP (normalized power) is an exact match
-Max power was close between the SRM PC7 and the Garmin 500. The 310 returned a max power of slightly less (226W vs 232W). This could have something to do with how the devices pick up the samples, but to us this is no functional difference
-Average power very similar. The 310 is 1W higher. The PC7 and the 500 are identical.
They also noted,
“From a practical training standpoint, we don't see any issue with using any of the various head units tested, and the data recorded was similar enough that an athlete could switch between these devices and not lose fidelity in their training data.”
There you have it. While I did several more tests in this fashion, none of the data showed what anyone felt was a significant problem for training. The data are not identical, but unless you’re analyzing second-by-second information (without any smoothing applied in your analysis software), you’re fine. Well, you may disagree with me, but I think you’ll be fine.
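If the summary numbers in the TrainingPeaks list (kJ, NP, IF, TSS) are a black box to you, here is a minimal sketch of how they are commonly defined. This follows the widely published formulas, not WKO+'s actual code, so treat small details (such as how the first 30 seconds are handled) as my assumptions. The threshold power (FTP) of 200 W is a placeholder, not my real number.

```python
def normalized_power(watts, window=30):
    """4th root of the mean of the (30-second rolling average) ** 4."""
    if len(watts) < window:
        raise ValueError("need at least one full window of 1-second records")
    running = sum(watts[:window])
    rolling = [running / window]
    for i in range(window, len(watts)):
        running += watts[i] - watts[i - window]
        rolling.append(running / window)
    return (sum(r ** 4 for r in rolling) / len(rolling)) ** 0.25


def ride_summary(watts, ftp_watts):
    """Summary metrics from 1-second power records (zeros included)."""
    seconds = len(watts)
    np_watts = normalized_power(watts)
    intensity = np_watts / ftp_watts
    return {
        "avg_watts": sum(watts) / seconds,
        "kj": sum(watts) / 1000.0,  # 1 W for 1 s = 1 J
        "np": np_watts,
        "if": intensity,
        "tss": seconds * np_watts * intensity / (ftp_watts * 3600.0) * 100.0,
    }


# Hypothetical: one hour held exactly at a 200 W threshold.
print(ride_summary([200] * 3600, ftp_watts=200))
```

One hour at threshold works out to 100 TSS, which is the yardstick mentioned in the list. Against that, a 0.1 difference between head units is lost in the noise.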
It appears that most of the variability in data between head units was due to random data drops. Beyond that, we see some variability that is likely due to the head units’ use of the ANT+ protocol (only picking up one out of four available data packets). Taking a large step back, however, the probability is quite small that you’d have very big outliers among any grouping of data packets sent at 4 Hz. It is simply not all that practical – especially with triathlon’s steady efforts – that you’d have a huge spike or drop in the power of your pedal stroke within a one second time range.*
*There was a reported bug with previous firmware versions of some Garmin products. If you pedaled very hard and then stopped suddenly, the head unit would reportedly hold an artificial power number on the screen and in the power file. If you were displaying a 3-second average and stopped pedaling at just the right time, it might keep displaying and recording your 500-watt average for an extra three seconds (while you were, in reality, coasting). That is my best understanding of the problem, and I’ve also heard that newer versions of the firmware corrected this (yet another reason to update your firmware).
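To get a feel for the one-packet-out-of-four effect, here is a toy simulation. Everything in it is made up: the 4 Hz power stream is a hypothetical steady effort wobbling around 200 W, and the two "head units" are just two random number generators with different seeds. It only illustrates the idea of random packet selection as I described it; it is not a model of real ANT+ radio behavior.

```python
import random


def sample_one_per_second(packets_4hz, rng):
    """Pick one of the four packets in each second, at random."""
    picks = []
    for start in range(0, len(packets_4hz) - 3, 4):
        picks.append(rng.choice(packets_4hz[start:start + 4]))
    return picks


# Hypothetical 10-minute stream: within each second the four packets read
# 192.5, 197.5, 202.5 and 207.5 W, so the true average is 200 W.
packets = [200 + 5 * ((i % 4) - 1.5) for i in range(4 * 600)]

head_unit_a = sample_one_per_second(packets, random.Random(1))
head_unit_b = sample_one_per_second(packets, random.Random(2))

mismatches = sum(a != b for a, b in zip(head_unit_a, head_unit_b))
avg_a = sum(head_unit_a) / len(head_unit_a)
avg_b = sum(head_unit_b) / len(head_unit_b)
print("seconds where the two units disagree:", mismatches)
print("average watts:", round(avg_a, 1), "vs", round(avg_b, 1))
```

The two simulated head units disagree on plenty of individual seconds, yet their averages over the ten minutes land close together. That is the same pattern we saw in the real files.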
Second-round Testing: Multiple Power Meters
With the first round of testing done, it was time to move on to the heavy artillery:
Three power meters, three head units, one bike. At this point, we were confident that we knew enough about data transmission to make a fair evaluation of the actual power meters. The idea was simple: perform a series of tests, running all three power meters simultaneously, each sending to a single head unit. All head units would be set up identically, and we’d do multi-file analysis, similar to the first run of tests. I wasn’t sure what I was looking for, which is why I had such wonderful helpers to act as second and third pairs of eyes. Do the power meters live up to their quoted accuracy ratings? Could we see a notable drivetrain power loss in the Powertap data? Would something else unique jump out in the data?
1. The Powertap SL+ always sent data to the Joule GPS. It just worked easiest that way, due to the Joule’s start-stop function being dictated by wheel speed. Because of the first round of testing, we were confident that the Joule captured data as well as the other head units. The Stages and SRM sent data to the Garmin head units. I opted to not use the PC7 because of its 4 Hz torque frequency reception (a good thing for your data, but it is the outlier of the head units). The only hitch that this presented was that I had to attempt to manually start both Garmin heads at the same time the Joule started recording due to wheel speed. It wouldn’t be perfect, but with Dr. Chung’s ability to re-synch data, we were not concerned.
2. By this point in the testing, we had established that the Stages would work with an external speed sensor. We used a single speed sensor (an SRM ANT+ unit) to send wheel speed data to both the Stages head unit and the SRM head unit. Although I manually started and stopped the timer on the Stages head unit, we wanted to have the ability to accurately speed-synch our files. GPS was turned off on all of the head units. The cadence sensor (below) sent data to the Joule GPS, on behalf of the Powertap.
Run #3 – Multiple PMs, indoors on CycleOps Fluid 2 trainer
Setup for this test:
-Powertap SL+ sending to Joule GPS
-Stages sending to Garmin 500
-SRM sending to Garmin 310XT
-X-Axis is distance in kilometers
The goal with this ride was to do varying cadence and effort, and include a VERY low cadence segment. How low? I was shooting for sub-30 RPM. Why bother? With time being such a critical part of power measurement, we wanted to find out if there was a practical lower limit. At 30 rpm, it takes two seconds to complete a pedal stroke. Are the SRM or Powertap programmed to only accept cadence over a certain level? Could the Stages accelerometer have trouble detecting such slow movement – would we be too far out of its range? Clearly this has little practical application in triathlon racing, but we wanted to see what we could find. It could have practical application to cyclists doing standing start efforts or other high torque/low cadence instances.
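The cadence arithmetic is simple, but it helps to see it laid out. This sketch just converts cadence into seconds per crank revolution, to show how many one-second records can pass before a full pedal stroke is completed. The function name and the list of cadences are my own picks for illustration.

```python
def seconds_per_revolution(cadence_rpm):
    """Time for one full crank revolution at a given cadence."""
    if cadence_rpm <= 0:
        raise ValueError("cadence must be positive")
    return 60.0 / cadence_rpm


for rpm in [90, 60, 40, 30, 20]:
    print(rpm, "rpm ->", round(seconds_per_revolution(rpm), 2), "s per revolution")
```

At 90 rpm a revolution finishes well inside a single one-second record. At 30 rpm it spans two records, and at 20 rpm it spans three, so anything that needs a completed revolution to update has long stretches with nothing new to report.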
You can clearly see on the graph where I shift into a (much) higher gear, and the cadence drops. Once the values get below about 30 rpm, the Stages data dropped out. At all ‘normal’ cadences, the data are all very close.
Dr. Chung commented:
“The Stages had a cadence and power drop-out for about 45 seconds (around the 3.5 km or 9 minute mark). I'm wondering whether this was a 'true' drop-out or if you just went below the cadence threshold and the Stages then timed-out. If so, that was very clever of you, and you've uncovered something very, very interesting. One of the odd little coincidences is that if you didn't take into account the drop out and just looked at the averages, the Stages was quite close to the PT and SRM. However, if you exclude that 45 seconds of zero power, the average Stages power was about 5 watts higher than the PT and SRM.”
Without cadence – the time term in our equation – what would happen to the resulting power?
Where the Stages data gets zero cadence, it also gets zero power. It is also interesting to note that the Powertap data has what appears to be two single records of zero power within the low cadence segment. My only guess at this time is that the wheel speed was so low that ‘events’ were not happening fast enough for a power measurement – but we still get a cadence number because of the way we’re measuring cadence for that device (externally with a separate ANT+ sensor – i.e. NOT using cadence to calculate power).
Finally, we’ll look at a VE. What does it mean? According to Dr. Chung:
“VE is virtual elevation. It's my way to get around synching up individual data points. VE is a cumulative function, so you can think of it as kind of like a running total. That means if any single point is off due to poor synchronization it won't throw off the entire series very much, but it lets me see how things differ over time.”
In simple terms, it gives us an objective way to compare the relative effort of different ride files. This ride was indoors on the trainer, so you’d think that there would be no virtual elevation gain, right? Wrong. On my fluid trainer, it takes about ten or fifteen minutes to reach operating temperature. As this happens, the resistance increases, giving us effective headwind or elevation gain.
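Here is a bare-bones sketch of the running-total idea behind virtual elevation. It uses one common textbook formulation (no wind, perfect drivetrain, small slopes) and is not Dr. Chung's own code or what Golden Cheetah runs. Rider mass, Crr, CdA, and air density are placeholder values I picked for illustration, as are the two hypothetical power meters.

```python
G = 9.81  # gravitational acceleration, m/s^2


def virtual_elevation(speed_mps, watts, mass_kg=80.0, crr=0.005,
                      cda=0.30, rho=1.2, dt=1.0):
    """Cumulative virtual elevation (m) from 1-second speed and power records.

    All default parameter values are placeholders, not measurements.
    """
    ve = [0.0]
    for i in range(1, len(speed_mps)):
        v = 0.5 * (speed_mps[i] + speed_mps[i - 1])  # mean speed over the step
        if v <= 0:
            ve.append(ve[-1])  # not moving: no change
            continue
        accel = (speed_mps[i] - speed_mps[i - 1]) / dt
        slope = (watts[i] / (mass_kg * G * v)              # slope the power could support
                 - crr                                     # minus rolling resistance
                 - rho * cda * v ** 2 / (2 * mass_kg * G)  # minus aerodynamic drag
                 - accel / G)                              # minus acceleration
        ve.append(ve[-1] + slope * v * dt)                 # running total
    return ve


speed = [8.0] * 600        # ten minutes at a steady 8 m/s
meter_a = [150.0] * 600    # hypothetical meter A
meter_b = [155.0] * 600    # hypothetical meter B, reading 5 W higher
gap = virtual_elevation(speed, meter_b)[-1] - virtual_elevation(speed, meter_a)[-1]
print("meter B ends", round(gap, 2), "virtual meters above meter A")
```

Because it is a running total, one badly synched record barely moves the line, but a meter that reads consistently high climbs steadily away from the others. On a trainer the physical parameters do not mean much, but the comparison between meters still works.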
SRM = Red
Powertap = Black
Stages = Blue
Dr. Chung commented:
“The VE plot shows that the PT and SRM were almost on top of one another while the Stages drifted ‘high’ within the first 1.5 km, then when you slowed down the Stages dropped down then kind of paralleled the PT and SRM for the rest of the ride. So you can see why I sort of like the VE plot for this purpose - it shows where the PMs start to diverge and you can see how they drifted apart.”
Run #4 – Multiple PMs, indoors on CycleOps Fluid 2 trainer
Setup for this test:
-Powertap SL+ sending to Joule GPS (Red line on graph)
-Stages sending to Garmin 310XT (Yellow line on graph)
-SRM sending to Garmin 500 (Green line on graph)
This was a mixed effort and speed ride, with a couple unique segments. Around the 10:30 mark, we included another low-cadence segment. Towards the end, I did six standing-start efforts, alternating starting on my left and right legs (L, R, L, R, L, R).
With the standing starts, we wanted to look at a few things:
1) Head unit behavior. I had both Garmin units set to continue running throughout the test (auto pause OFF). The Joule GPS, however, automatically has auto pause on. I completely stopped my wheel and cranks between each standing start. Would there be a lag in the Powertap data upon restarting, because the head unit had to wake up each time?
2) The elephant in the room: Would the Stages data appear much lower when starting on my right leg?
Because of these two things, we were treating the SRM as our control. It measures both legs and was sending to a continuously-running head unit.
That’s the full ride; let’s take a closer look at our two key segments.
NOTE: the SRM Data is slightly to the left of the others. I wasn’t perfect in starting and stopping my head units, and WKO+ does not have the ability to re-synch by single records. This does not affect the data analysis.
In the low cadence segment, I began at 40 rpm, and dropped 5 rpm every twenty seconds. Once I hit 20 rpm, I climbed back up until I hit 40.
As we can see, the SRM (green line) is the only power meter that maintained consistent data throughout our cadence experiment.
What about the standing starts? Interestingly, the SRM data had the highest average and maximum power. TrainingPeaks put together this spreadsheet to detail the differences for us:
Here’s what jumps out at me:
-The average power for the Stages does NOT show a clear divergence from SRM based on what leg we started with. The Stages wattage – relative to itself – is always higher on the left leg (as would be expected), but the difference to SRM shows no clear pattern. Maximum power does, however, show a clearer pattern of left and right leg measurement.
-The SRM shows consistently higher average and maximum power than either the Powertap or Stages. My guess is that there are two things at play: 1) the Powertap has a slight lag in ‘wake-up time’, and 2) the Stages accelerometer has a slight lag in its own ‘wake-up time’ to detect movement. What we don’t know is how the PC7 would react in this situation (it auto-pauses similar to the Joule GPS).
Here is TrainingPeaks’ take-home analysis of the entire ride:
“The SRM data is shifted slightly left so take that into account. It looks like there are some dropouts of power readings with both the Stages and PowerTap at very low cadences. The Stages drops out around 24, not sure where the Powertap is dropping out but there are definitely drops at very low cadences. This is sort of a red herring, however, as nobody I know rides or trains below 30 rpm; even force reps are faster than one pedal turn every 2 seconds.
Also on the indoor ride the SRM seems to have a bigger initial spike on the standing starts but neither the Powertap nor the stages are detecting them so I'm not sure if the SRM is grabbing a significant spike or detecting something transient from the nature of the start.”
This concludes the indoor-ride-testing segment of our article series. What conclusions can we draw?
For all practical purposes and ‘normal’ riding conditions, the data appear to be very similar between power meters. The average power numbers for large pieces of time (i.e. minutes or longer) were very close. Most of the differences we saw would disappear with any sort of smoothing applied – which is how most of us analyze our ride files for training purposes.
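To show what smoothing does to head-unit disagreements, here is a small sketch using a plain rolling average. The two ten-second traces are made up to represent the same effort recorded by two head units, and the five-second window is an arbitrary choice. Your analysis software may well smooth differently.

```python
def rolling_mean(values, window):
    """Rolling average over full windows only (output is shorter than input)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


# Hypothetical: the same 10 seconds as seen by two head units.
unit_a = [190, 210, 195, 205, 200, 200, 210, 190, 205, 195]
unit_b = [205, 195, 200, 200, 195, 205, 195, 205, 200, 200]

raw_gap = max(abs(a - b) for a, b in zip(unit_a, unit_b))
smooth_gap = max(abs(a - b) for a, b in
                 zip(rolling_mean(unit_a, 5), rolling_mean(unit_b, 5)))
print("largest single-record gap:", raw_gap, "W")
print("largest gap after 5 s smoothing:", smooth_gap, "W")
```

Record by record, these two traces disagree by as much as 15 W. After five seconds of smoothing, the worst disagreement shrinks to a few watts.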
Our intent with the low cadence and standing start work was to test each system at its extremes (it’s our job, ain’t it?!). It is also worth noting that these tests took place before the most recent Stages firmware update (v2.0.12), which was released on 05/30/2013. According to Stages, this update included:
-Apple Bluetooth Bug fixes
-Improved Data Handling
-Improved Cadence and Power Filtering
Could this update eliminate our issues with oddball-cadence-riding? Quite possibly.
In Part 3, we’ll take the riding outdoors, to see how different ambient conditions affect our data. After all, indoor riding does not include rough pavement, temperature change, or anything of the sort. We will also offer some final conclusions and take-home advice that apply to the setup and use of all power meters and head units.