Avoid spikes in temperature data #9

Closed
tobyweston opened this issue Apr 12, 2017 · 20 comments
Comments

@tobyweston
Owner

I'm not sure why it started happening but I'm seeing extreme spikes in temperature data.

screen shot 2017-04-12 at 17 02 30

Potentially ignore volatile readings or check the CRC when reading (as suggested in the Arduino forum).

@tobyweston
Owner Author

tobyweston commented Apr 13, 2017

CRC is already checked when parsing the file. A failed CRC check should result in a -\/ which will end up as an error in the log.

// Test: a failed CRC check ("crc=57 NO") must parse to a left (-\/) holding a CrcFailure
class ParserTest extends Specification {
  "Fails to extract temperature with failed CRC check" >> {
    val output =
      """|72 01 4b 46 7f ff 0e 10 57 : crc=57 NO
         |72 01 4b 46 7f ff 0e 10 57 t=23125
      """.stripMargin
    Parser.parse(output) must be_-\/.like {
      case error: CrcFailure => ok
    }
  }
}

// A failed read (such as a CrcFailure) is printed to the error stream rather than recorded
case class RecordTemperature(host: Host, input: TemperatureReader, output: TemperatureWriter, error: PrintStream = System.err) extends Runnable {
  def run(): Unit = {
    input.read.fold(error.println, temperatures => {
      output.write(Measurement(host, now(), temperatures)).leftMap(error.println)
    })
  }
}

// The error produced when the CRC check fails
case class CrcFailure() extends Error("CRC failure, this could be caused by a physical interruption of signal due to shorts, a newly arriving 1-Wire device issuing a 'presence pulse' or gremlins.")

tobyweston added a commit that referenced this issue Apr 16, 2017
@tobyweston
Owner Author

Not quite as high as the above but the datasheet says 85 is a starting temp. Probably worth filtering out 85 degrees.

screen shot 2017-06-12 at 18 41 53
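
A minimal sketch of what filtering out the 85 degree value could look like. `PowerOnFilter` and `withoutPowerOnValue` are hypothetical names for illustration, not code from this project, and the sketch assumes readings are plain `Double` values in Celsius. Note that it would also drop a genuine reading of exactly 85.0.

```scala
// Hypothetical sketch: names are illustrative and not taken from temperature-machine.
object PowerOnFilter {
  // The value the datasheet is said (in the comment above) to give as the starting temperature.
  val PowerOnValue = 85.0

  // Drop readings that exactly match the starting value; keep everything else in order.
  def withoutPowerOnValue(readings: List[Double]): List[Double] =
    readings.filterNot(_ == PowerOnValue)
}
```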

tobyweston added a commit that referenced this issue Oct 8, 2017
tobyweston added a commit that referenced this issue Oct 8, 2017
tobyweston added a commit that referenced this issue Oct 8, 2017
tobyweston added a commit that referenced this issue Oct 8, 2017
tobyweston added a commit that referenced this issue Oct 8, 2017
tobyweston added a commit that referenced this issue Oct 8, 2017
tobyweston added a commit that referenced this issue Oct 8, 2017
tobyweston added a commit that referenced this issue Oct 10, 2017
tobyweston added a commit that referenced this issue Oct 11, 2017
tobyweston added a commit that referenced this issue Oct 12, 2017
@tobyweston
Owner Author

tobyweston commented Jan 15, 2018

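If enabled, the server ignores temperatures that fluctuate by more than a configured percentage. Enable and configure it by adding, for example, -Davoid.spikes=30 (which ignores fluctuations greater than +/-30%) to your server startup command.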
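
For reference, a minimal sketch of how a JVM option like this could be read in Scala. `SpikeConfig` and `threshold` are hypothetical names, not the project's code; only the `avoid.spikes` property name comes from the comment above. The sketch treats a missing, non-numeric or non-positive value as "disabled", which is an assumption.

```scala
import scala.util.Try

// Hypothetical helper: the project's real option handling may differ.
object SpikeConfig {
  // Reads -Davoid.spikes=<percent>. None means the spike check is disabled.
  // The properties map is a parameter so the function can be exercised without touching JVM state.
  def threshold(props: scala.collection.Map[String, String] = sys.props): Option[Double] =
    props.get("avoid.spikes")
      .flatMap(value => Try(value.trim.toDouble).toOption) // ignore non-numeric values
      .filter(_ > 0)                                        // ignore zero or negative thresholds
}
```

Started with `-Davoid.spikes=30`, `SpikeConfig.threshold()` would yield `Some(30.0)`.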

@Quaxo76

Quaxo76 commented Jan 18, 2018

I'm adding, as requested, an image of a spike with the corresponding csv file. The spike was at 23:09, but due to the time code difference (I suppose) on the csv file it shows at 22:09.
By the way, how can I fix this time code difference?
temperature2

temperatures.csv.zip

@Quaxo76

Quaxo76 commented Jan 19, 2018

One more, and this is weirder.

I have two machines, a server named PiZero, and a client named PiOld. Until now, only PiOld had been subject to the spikes, so I installed the "temporary fix" as described (both are running the latest version). The server had never shown any spike, so I did not do the fix there.

Now I got this. A spike on BOTH machines at exactly the same time. But on the server (which doesn't have the fix) the reading went down to zero. On the client, at EXACTLY the same frame, it went down only about 15% (so it's not wrong that the fix didn't catch it).

Now I'll apply the fix to the server too, but how can it be that 4 different sensors all had the problem on two separate machines at exactly the same moment? Must be something server-side...
I'm attaching the csv and a screenshot.

temperature3

temperatures.csv.zip

Cristian

@tobyweston
Owner Author

tobyweston commented Jan 19, 2018

Sorry, I should have made it clearer: the (possible) fix only runs on the server.

The server stores the values, so the fix works by comparing each value it receives with the previous one to work out the % difference, and discarding the new value if the difference is too big. I hadn't considered having the client skip sending a value when the same is true there.
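
Roughly, the check described above could be sketched like this. It is an illustration under assumptions, not the actual implementation: `SpikeCheck`, `percentageChange` and `accept` are hypothetical names, and the change is taken relative to the previous value.

```scala
// Hypothetical sketch of a "% difference from the previous value" check.
object SpikeCheck {
  // Percentage change from previous to current, relative to the previous value.
  // A previous value of exactly 0 with a different current value yields Infinity.
  def percentageChange(previous: Double, current: Double): Double =
    if (current == previous) 0.0
    else math.abs(current - previous) / math.abs(previous) * 100

  // Accept the reading unless it moved more than thresholdPercent from the last stored value.
  // With no previous value there is nothing to compare against, so the reading is accepted.
  def accept(previous: Option[Double], current: Double, thresholdPercent: Double): Boolean =
    previous.forall(p => percentageChange(p, current) <= thresholdPercent)
}
```

With a 15% threshold, a move from 20.0 to 21.0 (about 5%) is accepted and a move from 20.0 to 400.0 is discarded.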

Not sure if it's meaningful, but I see similar spikes across a spread of machines at about the same time. I have like 5 running and anywhere between 1 and all 5 may spike at the same time. That's why I kind of started thinking about environmental problems (like maybe my microwave freaking the wifi signal out).

@Quaxo76

Quaxo76 commented Jan 19, 2018

Ah ok, I thought the value would just be discarded without being sent.
I'll install the fix on the server, though of course this won't catch minor spikes.
Since the spike also happened on the server, how can wifi disruptions influence the reading? It should be local...
And anyway, with this morning's spike, no one was using a microwave or other powerful appliances here, and I also live in a pretty isolated area.

@tobyweston
Owner Author

You can always lower the threshold to say 15% to try and catch the minor spikes. You might need to experiment a little.

Good points. I'm struggling to debug it atm so reaching a little 😉

@tobyweston
Owner Author

I'm assuming that in your most recent example, there's nothing in the log at around 19/01/2018 07:56 ?

@Quaxo76

Quaxo76 commented Jan 19, 2018

Just thinking... To debug it, maybe you could set a low threshold (like 5%) and when a spike occurs, the machine could also log somewhere the actual scratchpad from the sensor, and any intermediate processing that was done on that data? I don't believe the sensors actually read a spike, so something must happen along the path that the raw data follows, so logging all intermediate points might show where the problem is...

@tobyweston
Owner Author

👍

@Quaxo76

Quaxo76 commented Jan 19, 2018

Nothing on the log around that time. I have many errors at other times, mainly "CRC error" and "error in RequestLoop()"...
But I've been trying several times to recompile this morning, so that may have slowed down the systems to the point of unresponsiveness...

By the way, I just had another spike. Do you want me to keep sending screenshots and CSVs?

@tobyweston
Owner Author

Curious about the CRC error, can you share?

The RequestLoop error is some oddity with the underlying HTTP library I use. I've raised a request with the library and am experimenting with an upgrade. Unfortunately, that's taking time because it's a big, bumpy upgrade path to get to their latest library.

Maybe hold on the CSVs for a bit.

@Quaxo76

Quaxo76 commented Jan 19, 2018

Here's the current log from the server. There have been spikes at 7:55, 10:03 and 11:41 (though the recorded times might be off by an hour due to the time zone difference).
ServerLogs.txt

@Quaxo76

Quaxo76 commented Jan 20, 2018

Not sure if this is meaningful, but spikes here tend to happen when I'm doing something else with the pi (mainly compiling). I've left it alone for about 10 hours, and had no spike whatsoever; after coming home I updated the source and recompiled, and got like 10 spikes in an hour...

Cristian

@Quaxo76

Quaxo76 commented Jan 21, 2018

OK, with the help of the new document, I installed the "temporary fix". Installation was successful, as indicated by the log:

Sun 21-Jan-2018 11:35:26.080 [main] INFO Starting temperature-machine (server mode)...
Sun 21-Jan-2018 11:35:26.270 [main] INFO RRD initialising for 'PiZero', 'PiOld' (with up to 5 sensors each)...
Sun 21-Jan-2018 11:35:27.404 [main] INFO create "/home/pi/.temperature/temperature.rrd" --version 2 --start 1516530926 --step 30 DS:PiZero-sensor-1:GAUGE:35:U:U DS:PiZero-sensor-2:GAUGE:35:U:U DS:PiZero-sensor-3:GAUGE:35:U:U DS:PiZero-sensor-4:GAUGE:35:U:U DS:PiZero-sensor-5:GAUGE:35:U:U DS:PiOld-sensor-1:GAUGE:35:U:U DS:PiOld-sensor-2:GAUGE:35:U:U DS:PiOld-sensor-3:GAUGE:35:U:U DS:PiOld-sensor-4:GAUGE:35:U:U DS:PiOld-sensor-5:GAUGE:35:U:U RRA:AVERAGE:0.5:1:2880 RRA:AVERAGE:0.5:120:168 RRA:AVERAGE:0.5:240:360
Sun 21-Jan-2018 11:35:32.664 [main] INFO Starting Discovery Server, listening for 'PiZero', 'PiOld'...
Sun 21-Jan-2018 11:35:32.838 [temperature-machine-discovery-server-1] INFO Listening for broadcast messages...
Sun 21-Jan-2018 11:35:33.214 [main] INFO Monitoring sensor file(s) on 'PiZero'
	/sys/bus/w1/devices/28-041500a0f0ff/w1_slave
	/sys/bus/w1/devices/28-031500e9feff/w1_slave

Sun 21-Jan-2018 11:35:35.373 [main] INFO Temperature spikes greater than +/-15% will not be recorded
Sun 21-Jan-2018 11:35:48.482 [main] INFO HTTP Server started on http://127.0.1.1:11900

But I still see spikes. Even major ones, up to above 400 degrees.
Attaching screenshot, csv and log in case it helps. This one is unusual in that only one sensor got it wrong, the readings from the other 3 seem ok.

temperature4

ServerLogs_2.txt

temperatures (1).csv.zip

@Quaxo76

Quaxo76 commented Jan 25, 2018

I just realized that every spike I've ever seen only affects a single frame, i.e. one reading is wrong and is surrounded on both sides by good values. So, since the "temporary fix" does not appear to be working, at least for me, have you ever thought of checking the readings and discarding any reading that differs from the surrounding ones, if the ones before and after show the same value (or almost the same value)?
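
To make the suggestion concrete, here is a rough sketch of such a neighbour-based check. Everything in it is hypothetical (`NeighbourFilter`, `withoutIsolatedSpikes` and the `tolerance` parameter are made up for illustration), and it works on an in-memory sequence of readings for a single sensor rather than on the RRD.

```scala
// Hypothetical sketch: drop a reading only when its two neighbours agree with each
// other (within `tolerance` degrees) and the reading differs from both of them.
object NeighbourFilter {
  def withoutIsolatedSpikes(readings: Vector[Double], tolerance: Double): Vector[Double] =
    readings.indices.collect {
      case i if !isIsolatedSpike(readings, i, tolerance) => readings(i)
    }.toVector

  // The first and last readings have only one neighbour, so they are never dropped.
  private def isIsolatedSpike(rs: Vector[Double], i: Int, tolerance: Double): Boolean =
    i > 0 && i < rs.length - 1 && {
      val (before, after) = (rs(i - 1), rs(i + 1))
      math.abs(before - after) <= tolerance &&
        math.abs(rs(i) - before) > tolerance &&
        math.abs(rs(i) - after) > tolerance
    }
}
```

One trade-off of this approach is that a reading can only be judged once the following reading has arrived, so recording would lag by one sample.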

@Quaxo76

Quaxo76 commented Feb 7, 2018

I was experimenting with the fix (since I still get spikes occasionally) and I thought of something.
If a change is classified as a spike when it's more than X% change, is that a change in degrees Celsius or Kelvin? If it's Celsius, what happens when the temperature is around 0C? I.e. if the temp goes from 0.1 to 0.2 C, it's obviously not a spike, but it would be read as a 100% increase, and then tagged as a spike, no?
And by the way, without the fix installed, I get very large spikes (up to hundreds of degrees); with the fix, I still get spikes but of a lower value (but still over the set threshold, i.e. if the temp is 18C I get a spike to 40C...)

@tobyweston
Owner Author

Thanks for the thoughts.

It's very difficult to gauge cause and effect here. I'm planning to tidy a few other things up just in case they are causing problems (like making sure no exceptions could be thrown causing weird behaviour), then I think I'm going to log every measurement and cross reference that to what's in the RRD database. That way I can see if the cause is with RRD. I'm also thinking of changing the way it's logged to RRD as there's a slim chance more clients could increase the odds of spikes... not sure yet.

To answer your question though, everything is in centigrade and the comparison is between the last value and current. You can see it in the CSV and the % difference.
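
As a worked illustration of the concern about values near 0C, the sketch below compares the same change expressed in Celsius and in Kelvin. The names (`RelativeChange`, `percent`, `percentInKelvin`) are hypothetical, and the Kelvin variant is shown only for comparison; nothing in this thread says the project uses it.

```scala
// Hypothetical sketch: the same temperature change as a percentage in Celsius and in Kelvin.
object RelativeChange {
  val KelvinOffset = 273.15

  // Percentage change relative to the previous value.
  def percent(previous: Double, current: Double): Double =
    math.abs(current - previous) / math.abs(previous) * 100

  // The same comparison after converting both Celsius readings to Kelvin.
  def percentInKelvin(previousCelsius: Double, currentCelsius: Double): Double =
    percent(previousCelsius + KelvinOffset, currentCelsius + KelvinOffset)
}
```

In Celsius, going from 0.1 to 0.2 is roughly a 100% change, so any percentage threshold would flag it; in Kelvin the same step is well under 0.1%. Going from 18 to 40 is roughly 122% in Celsius but only about 7.6% in Kelvin, so a threshold tuned for one scale does not carry over to the other.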

@tobyweston
Owner Author

Closing for now as recent work seems to have fixed this... at least, I haven't seen it for a while. If I see it again or it's reported, I'll reopen.
