The ABAP Detective Gets a Timeout

In every logical detective’s case files there are what you might call “unsolved mysteries.” In other words, work in progress. Depending on the severity and visibility of the case, your humble working stiff might be pushed to commit incomplete conclusions, or suffer the agile wrath of a waterfall.

Previously, in pulling data from remote environment sensors, I discovered evidence of “bad data” being recorded. Whether from mechanical flaws in the sensors, poor code, or what, is yet to be determined. The spikes, when processed by a human brain, are clearly obvious. Or obviously clear.

Pi temperature 26-Jun-2022

My first approach was to determine any invalid values or ranges to discard, first manually with SQL, and then programmatically with scheduled job scripts. That worked in theory, until the valid values crept into the previously invalid range.
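To make that failure mode concrete, here is a minimal sketch of a fixed-range filter in Python. The function name `discard_out_of_range`, the bounds, and the sample values are all made up for illustration; they are not the SQL or the job scripts actually used.

```python
def discard_out_of_range(readings, low, high):
    """Keep only readings inside the closed interval [low, high]."""
    return [r for r in readings if low <= r <= high]


# Hypothetical bounds for a cool room; not taken from the real sensors.
LOW_C, HIGH_C = 10.0, 30.0

samples = [21.4, 21.5, 85.0, 21.6]   # 85.0 is a made-up spike
print(discard_out_of_range(samples, LOW_C, HIGH_C))    # the spike is dropped

heatwave = [31.2, 31.4, 31.3]        # genuine readings above the old ceiling
print(discard_out_of_range(heatwave, LOW_C, HIGH_C))   # everything is dropped
```

The second call is the trouble: once real readings drift past a hard-coded ceiling, a fixed range discards good evidence along with the bad.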

I let the trail go cold for a while, peeking in the memory stack at random intervals to get a sense of the sensors. To drill down further, I set up loops to poll more frequently to identify the fingerprints of data gone wrong, gauging how many successive errors would appear. I thought if I took enough samples, the chance of “all bad” or “mostly bad” would diminish.

At first, my cleaning logic was inside the measurement code, which was sustainable if only one metric had errors. Since at least 2 parameters on 2 different sensor “hubs” were suspect, I moved the cleansing portion into a library file. If you know Python beyond my novice-level experience, I would welcome suggestions for improvements in the actual code, if not in the necessary algorithm behind the language statements.

First change

Below is a partial “diff” from the embedded code to a library reference. The function name “list sweep” is as BASIC as it gets. Collect a list of numbers, and toss out the ones we don’t like.

 $ diff BME280-pressure-millibars.py-SAVE BME280-pressure-millibars.py
4a5
> from sensors_base_functions import *
> while counters < 20:
< exit(-1)
< time.sleep(1)
---
> readings.append(pressure)
> time.sleep(0.1)
> counters = counters + 1
>
> print(round(statistics.fmean((list_sweep(readings))),2))

Why 20 loops? (or 19?)

The next iteration will probably turn the hard-coded loop count, as well as the sleep duration, into meaningful named variables. Are 20 samples enough? As it turns out, that is as many as could be taken at that frequency without upsetting the poll driver apple cart. Otherwise, you would get the timeout error (as implied in this post title).
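One way that next iteration might look is sketched below, assuming a caller-supplied `read_sensor` function. The constant names, the `collect_readings` helper, and the 2.5-second budget are hypothetical choices for illustration, not the code running on the Pi.

```python
import time

SAMPLE_COUNT = 20      # how many readings to collect per poll
SAMPLE_DELAY_S = 0.1   # pause between readings, in seconds
TIME_BUDGET_S = 2.5    # assumed safety margin under a 3-second timeout


def collect_readings(read_sensor, count=SAMPLE_COUNT, delay=SAMPLE_DELAY_S,
                     budget=TIME_BUDGET_S):
    """Call read_sensor() up to `count` times, stopping early if time runs out."""
    readings = []
    deadline = time.monotonic() + budget
    while len(readings) < count and time.monotonic() < deadline:
        readings.append(read_sensor())
        time.sleep(delay)
    return readings


if __name__ == "__main__":
    # Stand-in reader returning a constant; the real script would query the BME280.
    fake = collect_readings(lambda: 1013.25)
    print(len(fake), "readings collected")
```

The deadline check means a slow sensor read shortens the sample set instead of tripping the poller's timeout, at the cost of sometimes sweeping fewer than 20 values.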

646:20220803:095604.976 Failed to execute command "/usr/local/share/zabbix/externalscripts/pimoroni.sh BME280-humidity": Timeout while executing a shell script.

So, what, only 3 seconds to view the lineup? What are the chances that would gather enough data to allow successful cleanup, no data escaping the dragnet?

As an aside, the published documentation for the monitoring suite I am using strongly suggests that the normal timeout (waiting period) is 30 seconds, not 3!

The timeout that applies to shell scripts is hardcoded in Zabbix

Text/Link:

To save some time for any future Googlers who make their way to this topic.. According to: https://www.zabbix.com/forum/zabbix-...186#post147186 The timeout that applies to shell scripts is hardcoded in Zabbix and the only way to change it is to alter the source code and recompile. In other words, you better optimise your PowerShell script to provide output and bail if it's taking too long.

I can’t explain why 30 would shrink to 3 (yet), other than knowing that throwing and catching a script should carry only minuscule overhead, but once I saw “hardcoded” I took that as a clue to fit my cleanup into the shortest time I could. And not get a timeout.

What is in my “base functions” library, you might ask? Well, let’s treat that as a “locked room mystery” where we are stuck in the room too. I use the Python statistics module for the sample set’s mode and standard deviation, enumerate the list passed in, build a new list of defined-as-good items, and return the latter.

Sub routine

# -*- mode:python -*-
# Base functions style from A Roman
# Compatible with Python 3.10 or later.
# Tue Aug 2 01:46:26 UTC 2022

# Load modules.
import statistics

# sweep list of errant values, return swept list
def list_sweep (bunch_in):
    bunch_out = []
    mode = statistics.mode(bunch_in)
    dev = statistics.stdev(bunch_in)
    # keep only items closer to the mode than one standard deviation
    for index, item in enumerate(bunch_in):
        if (abs(item-mode) < dev):
            bunch_out.append(item)
    return(bunch_out)
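Since suggestions were invited, here is one possible variant, offered as a sketch and not as a replacement. Two edge cases seem worth guarding. If every reading is identical, the standard deviation is zero and the strict `<` test rejects everything, so `fmean` would then be handed an empty list. And when no value repeats, `statistics.mode` (Python 3.8 or later) returns the first item, which could be the spike itself. The version below swaps in the median as the centre; the name `list_sweep_guarded` and the sample values are made up.

```python
import statistics


def list_sweep_guarded(bunch_in):
    """Variant of list_sweep: median as the centre, with edge cases handled."""
    if len(bunch_in) < 2:
        return list(bunch_in)      # stdev needs at least two data points
    centre = statistics.median(bunch_in)
    dev = statistics.stdev(bunch_in)
    if dev == 0:
        return list(bunch_in)      # all readings identical, nothing to sweep
    # Keep items no further than one standard deviation from the median.
    return [item for item in bunch_in if abs(item - centre) <= dev]


readings = [1013.2, 1013.3, 1013.1, 1500.0, 1013.2, 1013.4]   # made-up values
print(list_sweep_guarded(readings))         # the 1500.0 spike is removed
print(list_sweep_guarded([1013.2] * 5))     # identical readings are all kept
```

Whether the median is a better suspect than the mode depends on the data; with heavily repeated integer readings the mode may serve just as well.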

Change?

Results? Better than before, yet not 100% reliable.

Aug 16 19:38 BME280-pressure-millibars.py
Aug 16 19:38 BME280-pressure-inches.py

Pressure values over one month

No obvious spikes up or down since this change went into use on August 16th. Case solved?

Findings

Given how many decades it has been since I (barely) passed a university-level advanced statistics class, what looks like a mess is getting (mostly) sorted logic-wise. For my purposes on indoor air quality, the records are looking more reliable, if not accurate. For higher accuracy and precision, I would consult an expert.

Does it make sense to toss the sensor with glitches? That depends on your budget, your specs, and how much quality control effort you can do. I figured this is an intellectual challenge as much as a practical exercise. Keeps the little grey cells from stagnating, I suppose.

The above declaration of “Python 3.10 or later” is an aside I may expound on later: the switch/case language grammar muddle, added to the Python language spec as per “PEP 634: Structural Pattern Matching”.

match channel:
    case 'temperature_celsius':
        print("Temperature C:\t%.4f" % mySensor.temperature_celsius)
        readings.append(mySensor.temperature_celsius)
    case 'temperature_fahrenheit':
        print("Temperature F:\t%.4f" % mySensor.temperature_fahrenheit)
        readings.append(mySensor.temperature_fahrenheit)
