Reliability of the sensor: Part 1

I was very worried that the sensors wouldn’t be very good. After all, they’re cheap, poorly documented, and come from a virtually unknown manufacturer.

Happily, there are statistical tests to tell how good the sensors are—and the TL;DR? They’re not bad at all!

I used a statistical test called the Intra-Class Correlation to measure whether the four different Pis I built agree with each other. And they do! When one Pi reports bad air quality, the other Pis tend to do so, too.
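If you want to compute this yourself, here is a minimal sketch of one common variant of the intra-class correlation, ICC(3,1) (two-way, consistency), which treats a constant per-sensor offset as agreement rather than disagreement. The post doesn’t say which ICC variant was used, so the choice of variant, the synthetic data, and all names here are assumptions for illustration only.

```python
import numpy as np

def icc_consistency(data):
    """ICC(3,1): two-way model, consistency definition.
    data: array of shape (n_timepoints, n_sensors)."""
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)   # mean across sensors at each time point
    col_means = data.mean(axis=0)   # each sensor's overall mean
    # Mean squares from the standard two-way ANOVA decomposition
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)
    resid = data - row_means[:, None] - col_means[None, :] + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse)

# Synthetic example: four "Pis" reading the same signal,
# each with its own constant bias plus a little noise.
rng = np.random.default_rng(0)
truth = rng.uniform(5, 50, size=288)           # one reading every 5 min for 24 h
offsets = np.array([0.0, 2.0, -1.5, 0.7])      # each Pi's constant bias
readings = truth[:, None] + offsets + rng.normal(0, 0.5, size=(288, 4))
print(round(icc_consistency(readings), 3))     # close to 1: strong agreement
```

Because the consistency form ignores constant offsets, sensors that disagree by a fixed amount (the Figure 4 situation) still score near 1.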

Of course, that’s the short version. Herewith, the longish version:

In a perfect world, the air quality monitors would all report exactly the same number, like in Figure 1. They would give us results that are perfectly correlated.

Figure 1: Perfectly correlated AQMs

Of course, it’s not a perfect world, and these are cheap sensors. They vary. The issues, then, are how much they vary, and whether that is an acceptable amount.

Figure 2 shows another imaginary example. The AQMs make wild swings, and, worse, the swings are uncorrelated. When one AQM reports an increase, another reports a decrease. When one goes up a lot, another goes up a little. It’s a mess.

Figure 2: Uncorrelated results

These are uncorrelated results, and if we received them, we would know our sensors are random number generators.

There are middle grounds between perfect correlation and no correlation at all.

In Figure 3, the results are loosely correlated. Generally, when one sensor reports a change, the others do, too. However, in each case, the sensors measure different sizes of change.

Figure 3: Loosely correlated results

And in Figure 4, we have an excellent result, and one that somewhat approximates the results I found with the four Pis I used: each sensor reports a consistent change. If Pi1 reports, say, a PPM of x, then Pi2 reports a PPM of x+2.

Figure 4: Excellent results
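In correlation terms, Figure 4’s pattern is just as good as identical readings: adding a constant to one series doesn’t change the correlation coefficient at all. A quick illustration (the readings here are made up):

```python
import numpy as np

pi1 = np.array([12.0, 35.0, 8.0, 22.0, 15.0])  # hypothetical PPM readings
pi2 = pi1 + 2                                   # always exactly 2 PPM higher

r = np.corrcoef(pi1, pi2)[0, 1]
print(r)  # effectively 1.0: a constant offset leaves correlation perfect
```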


Of course, it would be best if there were no variation between the sensors, and they all reported exactly the same results. But if they’re going to vary, this is just the kind of variance we want, because it’s easily corrected for. And, more or less, that’s what we got.

The actual correlation for a 24-hour period

Our Pis varied, but they varied by a reasonably consistent amount. Pi1 was always a little higher than Pi2, which was usually a little higher than Pi3 and Pi0.

This is great, because it means we will be able to make meaningful comparisons between different parts of the neighbourhood. We can take the results from one AQM, adjust them by its constant, and compare them to the adjusted results from the other AQMs. Thus, we will be able to see whether local pollution conditions are better (or worse) than other locations in the area.


I can hear you in the back: “Correlation does not equal causation.” Not quite right, but I catch your drift.

Correlated results aren’t necessarily good results. For instance, our air quality monitors could, unbeknownst to us, be measuring humidity, not pollution. As long as they are all measuring humidity consistently, we would never know, because they are all correlated.

Quite so. Still, I can’t think of any way to check every possible, non-particulate, cause. It could be that they are all measuring humidity, temperature, sunshine, radio waves, sunspots, or the Blue Jays’ score. At some point, we just have to trust that the sensors are doing what they say they are doing, and what they appear to be doing.