In September 2009 I bought a Fine Offset WH-1081 weather station. I spent some time getting it to work with wview, for which I received a patch for an old version. I had lots of trouble with wview, and when the power company damaged the unit in November, I started writing my own code. This page is an extract from my diary. One day I'll write a proper description of the software.
The WH-1081 is a variant of the WH-1080 with some additional features that aren't relevant here. The software should work on both, so in the following I'll refer only to the WH-1080.
More work on porting wview to NetBSD, and finally finished. I never realised what a pain it is that NetBSD stores packages in /usr/pkg and not the more usual /usr/local. Many packages, including parts of radlib, hard-code the path name in various configuration files, and I had to go out and fix them. Now I have a complete standard installation of wview; the next part is to incorporate Steve Woodford's patches for my Fine Offset WH-1081 weather station.
Spent some time today looking at the software for the Fine Offset WH-1081 weather station that Steve Woodford sent me a while back. It was a patch for Wview release 4.0.1, and the current release is 5.5.3. Spent a fair amount of time adapting that, not helped by the fact that I don't know (and really don't want to know) about autoconf. Something went wrong generating the Makefiles, but by that time I was too frustrated and decided to put it off. Maybe I should first install release 4.0.1 to ensure that things work at all.
I installed NetBSD on kimchi from a downloaded ISO image, and the installer didn't give me the option to install X. Went looking for a package, but found none. Went over the web site looking for documentation. Plenty on how to use X, but nothing on how to get it onto the system. Finally found a document and followed that, but ended up with only a base installation: not even xterm was there. And then it occurred to me that this isn't even the standard NetBSD version of X (should be XFree86). This really could do with much better documentation.
My weather station has been working nicely, and I set up a cron job to sync it to the external web site every 15 minutes. And then I saw some results that didn't look right:
That's a highest pressure of about 6.5 atmospheres and (only shown in the graph) a lowest of about -1 atmosphere. Then it occurred to me that Steve Woodford had warned of the unit returning ridiculous values, and he had sent me a patch for working around it, which I had clearly forgotten to include. Put that in, rebuilt the executables, and the HTML generator crashed. Further investigation showed that it had had some kind of overflow generating the graph for the barometric pressure (the one on the right). No idea how it worked before, but everything seemed to fail from then on, so took the thing offline until I can get it up and running on FreeBSD, which hopefully won't be too difficult.
Into the office this morning to find dereel's /home file system full. I know it's getting full (I've added about 60 GB of photos and 10 GB of MP3s in the last few months), but it shouldn't have been quite full yet. Further investigation showed that it was a ktrace.out file, which I promptly removed, and it made no difference. Clearly the trace was still running and the file was still open, so removing its name didn't free the space. Running kdump against the file would have told me what was being traced, but I had just deleted it! Fortunately Peter Jeremy explained to me what the -C option does. I thought it stopped tracing for a specific process, but in fact it disables tracing on all processes owned by the user (and, when run by root, on all processes in the system).
My suspicion was that it was the weather software, of course, and checking /var/log/messages confirmed it: it had stopped functioning in the middle of the night, and attempts to restart it were unsuccessful:
That continued despite restarts until I disconnected and reconnected the USB cable. Is this a problem in the FreeBSD USB stack? To be monitored.
Spent some time trying to sign up for various weather reporting systems, notably Wunderground and CWOP, both of which have instructions that are very difficult to understand. In particular, Wunderground mentions a password, but doesn't give the opportunity to set one. Signed up anyway, got no confirmation, and read instructions telling me that it would take at least a day, that various things could go wrong, and in each case the result would be that nothing happened. Wonderful. And I can't even check until tomorrow.
Wunderground is very specific about the location of the weather stations, though. When setting the location of the station, it specifies the latitude to 13 places of decimals, and the longitude to 14 places:
That corresponds to a resolution of about 10⁻⁸ m in latitude and about 0.8 × 10⁻⁹ m in longitude. I wonder what they're thinking.
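The arithmetic, taking one degree of latitude as about 111 km. The longitude figure also depends on the station's latitude φ; the factor 0.8 corresponds to cos φ for φ ≈ 37°, which is an assumption about the station's location:

```latex
\begin{aligned}
\Delta_{\text{lat}}  &\approx 10^{-13}\,^{\circ} \times 1.11\times10^{5}\ \text{m}/^{\circ} \approx 1.1\times10^{-8}\ \text{m} \\
\Delta_{\text{long}} &\approx 10^{-14}\,^{\circ} \times 1.11\times10^{5}\ \text{m}/^{\circ} \times \cos\varphi \approx 0.89\times10^{-9}\ \text{m} \qquad (\varphi \approx 37^{\circ})
\end{aligned}
```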
Into the office this morning to discover that the weather station software had hung again in the middle of the night, and that the /home file system was full again.
The full file system was for the same reason as before: a ktrace.out file had filled it up. And again I removed it without checking what was generating it. But it seemed to be related to the weather software, and sure enough, found that I had included a ktrace of the wviewd process in the startup file. So hopefully that's over and done with now.
The hang was different, and from the log messages it was clear that it had happened long after the file system filled up—at about the same time as yesterday, but the flood of log messages had already flushed the previous day's messages. This time I had:
That's really helpful, of course. But yesterday it seemed to have happened a little after 02:00 as well. Is there something in the nightly cron jobs that trips over the USB stack at this time of the morning?
Getting things started again wasn't easy. Various components wouldn't stop, and starting things manually is greatly hampered by the presence of PID files that don't get ignored if the process has died.
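Stale PID files are easy enough to detect. Here's a minimal C sketch, not taken from wview or radlib, of how a start script's helper could decide whether the PID recorded in a file still belongs to a running process. It assumes the file contains just a decimal PID; the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Return 1 if the PID recorded in pidfile belongs to a running process,
 * 0 if the file is missing, unreadable or stale, so that a start script
 * could safely ignore or remove it. */
int pidfile_is_live(const char *pidfile)
{
    FILE *f = fopen(pidfile, "r");
    long pid;
    int ok;

    if (f == NULL)
        return 0;                       /* no PID file at all */
    ok = fscanf(f, "%ld", &pid) == 1;
    fclose(f);
    if (!ok || pid <= 0)
        return 0;                       /* garbage in the file: treat as stale */

    /* Signal 0 performs the existence and permission checks without
     * sending anything. EPERM means the process exists but belongs to
     * another user. */
    if (kill((pid_t) pid, 0) == 0 || errno == EPERM)
        return 1;
    return 0;                           /* ESRCH: the process has died */
}
```

A PID can be reused by an unrelated process, so this check can occasionally report a dead daemon as live. It never treats a live daemon's file as stale, which is the safer mistake.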
More investigation into why I didn't show up on Wunderground. It turned out that my guess was right: it wanted my own password. But that contained a character that wview didn't accept, so wview just truncated the password at that point. Tried another one, all letters, with a verylongcommentaboutthiskindofstupidity, only to discover that Wunderground won't accept more than 10 characters in a password. Sigh.
After adapting to these quirks, things worked, and I appeared in the map. Spent some more time looking at other reporting systems, and set up reporting for CWOP. That seemed even easier, but when I tried to restart wview, things went to hell:
What's that? There's plenty of memory available. Built a debug version of wviewd and tried it out, and established that radCfOpen (did I get the studly caps right?) is part of radlib, and it's designed to read in the configuration file. It got the config file name right (something that it didn't bother to report), and somewhere inside it ran into trouble with the “memory allocation”. It then stopped without any further message and with a 0 completion code. ktrace showed that it read in the configuration file, then:
There were lots of these semops, but all with return value 0 (successful). What's all this about? Is it really a semaphore issue, or is it really trying to allocate ridiculous quantities of memory? And if so, why not report how much? About the only thing I can conclude is that a library that reports errors like this isn't worth having. Reading a configuration file shouldn't require lots of semaphore operations.
The obvious conclusion was that the problem was due to a configuration change. But reverting the changes (RCS is your friend) made no difference. Spent about an hour trying to work out what went wrong, and in the end reverted to the NetBSD installation, which happily accepted the same configuration files once I fixed the path names.
So, what's the problem? One is clearly a badly documented and rickety framework (the only documentation I can find for radlib is an API reference). The other is the Tower of Babel attitude to software design. It's probably not worth trying to debug it; I need to migrate to wview release 5.5.3, which doubtless is waiting with other pain, such as configuration files stored in a database. But maybe some of the problems I've seen so far will go away.
For a change, no full file system this morning, and the wview software (on NetBSD) hadn't hung either. But the problems running wviewd on FreeBSD continued.
One thing that the power failure “fixed” was the “memory allocation failure” that I was having with wviewd. I strongly suspected that it was something to do with left-over System V semaphores—how I hate the three ugly sisters! This tends to confirm the suspicion. On IRC, Peter Jeremy pointed me to ipcrm, where, apart from a way to remove dead semaphores, I read:
Callum and Edwin are both on the IRC channel as well. And that was at a time when I was mentor for Edwin, so I should have known all about it. Checked the commit logs and found:
My mind must be failing me.
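The reason left-over semaphores hang around at all is that System V IPC objects belong to the kernel, not to the process that created them. A minimal C sketch of both halves: creating a private semaphore set, and removing it by ID with semctl and IPC_RMID, which as far as I can tell is essentially what ipcrm -s does. The helper names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Create a private set containing one semaphore. Unlike a file
 * descriptor, it is not released when this process exits: it stays in
 * the kernel (visible with ipcs -s) until something removes it. */
int make_semaphore(void)
{
    return semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
}

/* Remove a semaphore set by ID. Returns 0 on success, -1 with errno set
 * on failure (for example if the set has already been removed). */
int remove_semaphore(int semid)
{
    return semctl(semid, 0, IPC_RMID);
}
```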
More work on porting wview to FreeBSD, and now have a clean build of release 5.5.3. Now I need to test it without disrupting the reporting too much. It looks as if it was a good choice to migrate to the latest version rather than search for the bugs in the old one: the area where the bug occurred (reading the configuration) has now changed completely, though not obviously for the better: instead of storing it in (multiple) text files, which I can maintain with RCS, it's now in a database. We'll see.
Part of my weather software is a script that copies the web pages to the external server every 15 minutes. It uses an ssh tunnel to do so, and I found it littering the system with old ssh-agent processes. With a bit of advice, found a couple of environment variables that allowed me to trap the process on exit:
The trick is knowing about the environment variables; there's also an SSH_AUTH_SOCK, which can be of use under some circumstances. I should probably use it to avoid starting additional ssh-agent processes, but this works for now.
More work on the weather station software today, and found out why the build was so clean: I had included all the code, but I had omitted most of it from the configuration information, so it hadn't been compiled. Normal enough problems once I reattached it, with the exception of the dependencies. Any normal build system has a depend target in the Makefile, but this thing uses GNU autoconf, something about which I have never heard much good. Even 15 years ago, in Porting UNIX software, I pointed out weaknesses; nowadays I'm reminded of a Dijkstra quotation:
If Fortran has been called an infantile disorder, PL/I must be classified as a fatal disease.
Finally found the problem: it seems that the dependencies are built by the configure script, and they're based on the variable AC_CONFIG_FILES in configure.in, at least in this case.
More work on wview today. Made some progress, but it's painful. I've had the idea of storing configuration information in a database before, with the Black Box project a couple of years ago. But that was in conjunction with web pages to update it, and of course it used MySQL. This software uses sqlite3, which I don't know, and which is different enough from MySQL that I can't just jump in; instead I need to learn Yet Another Dialect of SQL. And the configuration scripts are still just that, scripts, and not very clever at that. Maybe the intention is to create a web-based configuration system, but the current status seems to have the worst of both methods, and it can also easily lead to the system using two different database systems: there's a provision for storing weather data in a database (MySQL or PostgreSQL, but not sqlite3), but the configuration must be stored in an sqlite3 database. I'm left wondering how much work I want to do on this software.
If I were to believe my weather station today, we're having pretty extreme weather: it reported a low temperature of -1840.3 °C, enough to cause the HTML generator to crash. Clearly more work needed. Took a look at the code, but without better documentation it's really not clear what the best solution is. There are clearly two issues: one is ridiculous temperatures like this one, and the other is sudden changes. How quickly can temperature change? On 7 February 2009 the temperature dropped 15° in 30 minutes; that's presumably about as fast as you'd ever see it.
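A minimal C sketch of the two checks. This is not wview's code, and the limits are purely hypothetical: a fixed plausible range, and a maximum rate of change of 1 °C per minute, roughly twice the 7 February 2009 drop. They would need tuning against real data.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical plausibility limits, not from wview or the station manual. */
#define TEMP_MIN_C          (-50.0)
#define TEMP_MAX_C          70.0
#define MAX_RATE_C_PER_MIN  1.0

/* Accept a new outdoor temperature only if it is within range and, when a
 * previous good reading exists, doesn't change faster than
 * MAX_RATE_C_PER_MIN. minutes is the time since that previous reading. */
bool temperature_plausible(double temp_c, bool have_prev,
                           double prev_c, double minutes)
{
    double change;

    if (isnan(temp_c) || temp_c < TEMP_MIN_C || temp_c > TEMP_MAX_C)
        return false;                   /* e.g. -1840.3 degrees */
    if (!have_prev || minutes <= 0.0)
        return true;                    /* nothing to compare against */
    change = temp_c - prev_c;
    if (change < 0.0)
        change = -change;
    return change / minutes <= MAX_RATE_C_PER_MIN;  /* reject sudden jumps */
}
```

A rejected reading shouldn't replace the previous good value; otherwise a single spike would also cause the next, correct reading to be rejected as a jump.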
I joined the wview mailing list a couple of days ago, and, after finding a way to post that Google Groups didn't reject, replied to a thread about access to the repository (there is no access). As I've already observed, lack of access to the revision history has made things complicated, and I said so. The response (to a message sent with texts completely out of order)?
And that was all. No mention of my concern about access to the revision history. This confirms my opinion about people who can't read beyond the first couple of lines, and it's one of the reasons I hate reverse chronological documents.
Mark Teel is the principal author of wview, and also the first person I've ever seen to ask me to write messages upside-down. Clearly this is not a list in which I will participate. Still, maybe this is not such a bad thing; there are so many details I don't like about wview that the lack of requirement to feed back my data might turn out to be a relief.
More fun with the wview weather station software today, at least partially because I forgot to apply some of Steve Woodford's patches. The result is that I have archive records with ridiculous values in them, and no way to clean them out. Spent some time investigating that, discovering in the process that the archive file format is not very conducive to such procedures. To be fair to the author, he has since changed it, but it's now stored in a sqlite3 database (even if the rest uses, say, MySQL). I don't intend to follow that method, but I had hoped for an easier way of sanitizing the data.
As it was, added a subdirectory cleanarchive to the utilities directory, and spent most of the time trying to get GNU automake and its horrible friends to accept it. Solution:
By the time I had that finished, I couldn't be bothered doing anything else.
More problems with the WH-1080 weather station today: something went wrong with the communication between station and computer, and it took me a few hours to notice. During this time I got a continual stream of:
Restarting wview got rid of it, so it looks like there's room for improvement. In the meantime I have 1 spurious mm of rain and flat values for all the other parameters, presumably because the entire data record was discarded:
The morning after a power failure is always a problem while I pick up the pieces, but this time things looked pretty good. I had powered dereel on (need to do it manually) after the power came back at 2:16, and Yvonne confirmed that she could work normally. brewer and kimchi were up as well, and all seemed OK.
But then brewer went away, and when I went out to look at it the display was blank (yes, the new one does have a display), and pressing the reset button just gave me repeated beeps. Brought it inside and reseated the memory, after which it came back normally. Why did that happen?
kimchi was another matter: I got a continual stream of
Wunderground and my own weather records showed complete nonsense. How do you debug that? I don't even know where this information is coming from. I can't debug the wviewd daemon, because it's tied up in this horrible startup script with lots of dependencies and System V semaphores. I tried stopping and restarting wview a few times, but it didn't help. Tried renaming the archive files—they've proven to cause problems in the past—but that didn't help either.
This is ridiculous. What this daemon should be doing is simply reading the data from the device at specified intervals and storing the data in a database, and that can be debugged easily. Clearly a bit of error checking is a good idea (that's what these messages are about, after all), but the complexity of the code is mind-boggling. Decided to write my own.
Problem: documentation, of course. Where do I find what the device does? Read through the code, which shows some of the worst programming style I've ever seen. Got somewhere, but I couldn't find a man page for libusb. It turned out that it has HTML documentation, but by the time I found that, I no longer had time. In addition, there's the very real question of whether the power outage yesterday didn't damage the weather station; I haven't seen problems like this before, and they cropped up immediately after the power failure. I should probably try it with the software they supplied.
One thing, though, that seems to justify the decision to write my own transfer daemon: the information returned by the station is in metric units, sort of. The wind speed, for example, is in units of 0.1 m/s. wview translates everything into American units first, and if you select metric units it translates them back again. In the process it loses a lot of accuracy. 0.1 m/s is 0.36 km/h; wview stores in units of 1 mph (1.6 km/h), and that's what my records show.
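The loss is easy to demonstrate. This C sketch (hypothetical function names) converts a raw value in units of 0.1 m/s directly to km/h, and also by way of whole miles per hour as described above:

```c
/* 1 m/s = 3.6 km/h exactly; 1 mph = 1.609344 km/h exactly. */
#define KMH_PER_MS  3.6
#define KMH_PER_MPH 1.609344

/* Direct conversion of the station's raw wind value (units of 0.1 m/s). */
double wind_kmh_direct(int raw)
{
    return raw * 0.1 * KMH_PER_MS;
}

/* The round trip: convert to mph, keep only whole mph, convert back.
 * Wind speeds are non-negative, so adding 0.5 and truncating rounds to
 * the nearest mph. This throws away up to 0.5 mph (about 0.8 km/h). */
double wind_kmh_via_whole_mph(int raw)
{
    double mph = wind_kmh_direct(raw) / KMH_PER_MPH;
    return (double) (long) (mph + 0.5) * KMH_PER_MPH;
}
```

For a raw value of 32 (3.2 m/s), the direct conversion gives 11.52 km/h, while the round trip gives 7 mph, or about 11.27 km/h.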
Turned my attention to my own code, which had stalled yesterday because of lack of documentation. Looked for the HTML documentation, and it wasn't there either. Discovered the code in /usr/ports/devel/libusb, including a complete list of HTML pages in pkg-plist, but it didn't get installed. Looked for the documentation on the libusb project home page, and with great difficulty found the API documentation, which wasn't brilliant: it didn't even say which header files to include. Found a file called libusb.h, which appears to be the only one. But it wasn't on my system!
Looked again at the code in wview and found references to a file called usb.h, which appears to have been written by the same person, Johannes Erdfelt. But the contents were only marginally related. There's nothing in the comments in usb.h to tell me where it came from, but it contains things like:
By contrast, libusb.h contains:
So the two files are completely incompatible, but they both (I think) claim to be libusb. Further investigation shows that the first file (usb.h) is from libusb release 0.1.12, and the latter is the libusb 1.0 API. Gratuitous changes, it would seem, requiring much recoding.
But then I was told that libusb (a “standard” interface) is included in FreeBSD 8.0-RELEASE. Went to look at my 3-month-old 8.0-CURRENT system (swamp again), which involved a disk swap in my test machine (previously kimchi). That didn't work: the system didn't come out of self test, not even to the point of displaying anything on the screen. It wasn't the disk: the system didn't get that far, and it didn't respond even with no disk. My best bet is a defective power supply, which would also explain the strange issue with overwriting the wrong file. And that's a typical consequence of one of these power failures.
Unfortunately, I didn't have another power supply—I get through quite a few of them—so went looking for Yvonne's old machine and used that instead. Found a man page for libusb, the first ever. But the contents!
A third interface! And the names include the version number! What kind of nonsense is that? Libraries are for making things portable, not incompatible. I'm told that the latest version of 8.0 RC has a different interface, so started installing that on swamp. But that takes hours, so left it at that.
In the meantime, set to installing the “Easyweather” software supplied with the weather station on pain. That turned out to be easier than I expected: I had already done it. And it confirmed my suspicions: the station is defective. The pressure readings are all over the place, typically on the order of 1700 hPa. It also showed bugs in Easyweather: it was unable to ignore the incorrect readings, making it only marginally functional.
I've drawn a number of conclusions from this experience. Firstly, of course, I need a new weather station, and I ordered one; hopefully I can get Powercor to pay for that and the repairs to kimchi.
But more importantly: the whole approach to USB access seems broken to me. It reminds me of the Bad Old Days 40 years ago, when access to every kind of device was different. And there were lots of people who explained why that had to be, and why you couldn't access a card reader with the same subroutines as a printer or a magnetic drum. Then came UNIX with a unified approach and showed that yes, indeed, you can. But it didn't last long: the rot started with interprocess communication, and networking used a different interface again. And now it seems it's a free-for-all. USB is here to stay; why aren't there better interfaces in the kernel to make things like libusb unnecessary? I've decided to use the kernel interfaces now, though I fear that it will mean incompatibility with Linux.
More playing around with the weather station program today, and in the end decided to go with the current libusb implementation, with guidance from the code in wview, after discovering that it's still supported in the more recent versions of libusb, and that there's documentation of a kind:
Name
usb_open -- Opens a USB device
Description
usb_dev_handle *usb_open (struct *usb_device dev);
usb_open is to be used to open up a device for use. usb_open must be called before attempting to perform any operations to the device. Returns a handle used in future communication with the device.
And that's the entire documentation. Apart from the interesting use of the term “open up”, it says nothing at all about what happens if things go wrong. What does it return? Where is the error information stored? I still don't know, but I found an undocumented function usb_strerror that suggests that it uses something akin to errno (maybe even errno itself). And the HTML is so ugly!
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML
><HEAD
><TITLE
>usb_open</TITLE
><META
...
Worked my way through the code, which simplified considerably as I understood it. It seems that the sequence is:
Call usb_init to start the whole thing off.
Call usb_get_busses to get a list of the USB busses on the system. Each bus in the list contains a list, devices, describing the devices attached to it. Search each of these lists for an entry whose studly members idVendor and idProduct match the device we're looking for.
Open the device with usb_open.
Claim specific interfaces of the device with usb_claim_interface.
After this, the wview code goes off and issues multiple calls to usb_get_descriptor, for reasons that aren't documented and which don't make much sense to me. One of the calls was incorrect: it read in the wrong descriptor, but then, it ignored it anyway. I suspect a lot of this code has migrated from one package to another, and since it doesn't do any harm, and removing it might, it stays. I'll have to experiment, but currently I'm doing the same thing.
There is other code that seems more obviously redundant:
Before looking for a device, the code calls usb_find_busses to find out how many USB busses are connected to the system. The documentation claims: “find all of the busses on the system. Returns the number of changes since previous call to this function (total of new busses and busses removed)”. This seems self-contradictory to me, or it's using the word “find” in a non-obvious way.
It also calls usb_find_devices to see how many devices are connected. It's not clear what use this is, and in my case it returned 0 for no obvious reason. Possibly it's a permission issue, but without documentation there's not much you can do.
That's as far as I got. In principle I just need to read the data now, but I decided to run ktrace against wview to see what low-level requests are sent, and by coincidence the device “worked”: it returned correct temperature and wind values, and the pressure was within the thresholds, though obviously wrong. So I left it running.
Continued with my work on the USB interface of the Fine Offset WH-1081 weather station, and got the code completed relatively quickly. I'm becoming more and more dubious about some of the calls, and there are weirdnesses like having the data stored in two different places, but at least I now understand what's going on.
So: ran it. Nothing happened, not even a SIGSEGV. Put the thing in gdb—that's easy, in contrast to wviewd—and discovered it was hanging in the very first read. It seems that to read from the device, you first need to send a control message. The reference code I have is full of manifest literals which make it completely unclear what's really going on:
With gdb I was able to confirm that I had set the values correctly, but it didn't read anything. Have I missed something? Why does this call just hang instead of timing out? I can see what's going out to the device with ktrace, but that has very little to do with the calls I've made. The library is the obstacle. It doesn't provide a significantly greater level of functionality than the raw ioctl calls, but it does obfuscate the matter. I need to work out how to continue.
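The reference code isn't reproduced here, but other open-source drivers for these stations (pywws, for example) describe a read as an 8-byte HID SET_REPORT control transfer naming the address, followed by 32 bytes of interrupt data. The following C sketch gives those manifest literals names. The command byte 0xA1, the request parameters and the 32-byte block size are assumptions taken from those drivers, not from any Fine Offset documentation; the code only builds the buffer and doesn't talk to libusb.

```c
#include <stdint.h>

/* Assumed protocol constants (from other open-source WH-1080 drivers). */
#define WH1080_REQTYPE   0x21   /* class request, recipient interface */
#define WH1080_REQUEST   0x09   /* HID SET_REPORT */
#define WH1080_VALUE     0x0200 /* report type "output", report ID 0 */
#define WH1080_CMD_READ  0xA1   /* "read memory" command byte */
#define WH1080_BLOCK_LEN 32     /* bytes returned per read command */

/* Fill cmd with the 8-byte read request for the 32-byte block at addr.
 * The address goes out most significant byte first, and the same
 * four-byte request appears twice in the packet. */
void wh1080_read_command(uint16_t addr, uint8_t cmd[8])
{
    for (int i = 0; i < 8; i += 4) {
        cmd[i]     = WH1080_CMD_READ;
        cmd[i + 1] = (uint8_t) (addr >> 8);     /* big-endian address */
        cmd[i + 2] = (uint8_t) (addr & 0xff);
        cmd[i + 3] = WH1080_BLOCK_LEN;
    }
}
```

With the libusb 0.1 API the buffer would then be passed to usb_control_msg(handle, WH1080_REQTYPE, WH1080_REQUEST, WH1080_VALUE, 0, (char *) cmd, 8, timeout), and the reply collected with usb_interrupt_read; the exact count of bytes to read back is the part to get right.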
Back to looking at my weather station code today. I was hanging in a call to usb_interrupt_read, and the obvious assumption was that I had not copied the sample code correctly. But how do I compare that? Discovered that, with a couple of tricks, I could use gdb on wviewd: I just needed to stop it becoming a daemon, which you do with a specific parameter to the undocumented function radProcessInit—why use the standard tools when you can write your own?
Then I just needed to start the wview processes, stop wviewd, remove its pid file, and I could run a new copy. That helped, but it only confirmed that I hadn't forgotten any calls.
So, what next? Checked with ktrace and gdb to see what calls were issued by each library function. usb_interrupt_read surprised me: I had already established that it had set errno to 6 (ENXIO: device not configured), and then I saw that one of the things it did was:
Why was it doing that? I had already opened /dev/ugen0.00 with the less surprising usb_open. And how do I find out why? Found the sources to libusb, and found, in usb_interrupt_read:
There's an implicit call to open in there! Talk about lack of POLA! Looked through the code of ensure_ep_open, but it delved into cookies in internal structures and ultimately turned me off with some preprocessor macro whose definition I couldn't find. To add to the problem, it seems that the i386 function entry sequence has changed in gcc version 4, so reading the assembler code was particularly painful.
Looking at the kdump output again, I discovered I had only read half of it:
What on earth is that? The second parameter to open is located in a different place the second time. Different parameters? Further checking confirmed that yes, wviewd did exactly the same thing. So I had been looking in the wrong place for a couple of hours. The reality proved to be far less interesting: I had incorrectly counted the length of the data returned, and a call to usb_interrupt_read will hang if there's no data, even if I set a timeout.
Spent a lot more time trying to access the data, not helped by not knowing whether the device is big-endian or little-endian. It turns out that it's little-endian, but the USB bus parameters are big-endian. Got some kind of (obviously incorrect) data out of the thing and then gave up for the day.
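Decoding little-endian data is simple once you know the byte order. A C sketch, assuming (as other open-source drivers do) that temperatures are stored in units of 0.1 °C in sign-magnitude form rather than two's complement. The function names are hypothetical, and no particular field offsets are implied.

```c
#include <stdint.h>

/* Unsigned 16-bit value stored least significant byte first. */
uint16_t le16(const uint8_t *p)
{
    return (uint16_t) (p[0] | (p[1] << 8));
}

/* Signed value in assumed sign-magnitude form: bit 15 is the sign,
 * bits 0 to 14 the magnitude. Note that this is not two's complement. */
int wh1080_signed16(const uint8_t *p)
{
    uint16_t v = le16(p);
    int magnitude = v & 0x7fff;

    return (v & 0x8000) ? -magnitude : magnitude;
}
```

For example, the bytes 0xd7 0x00 decode to 215 (21.5 °C), and 0x19 0x80 to -25 (-2.5 °C).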
Continued with my weather station software today, gradually cleaning up the strangenesses. It now reads the station every 30 seconds and prints the results:
The pressures are completely wrong, of course, but the device is defective. My new one arrived today (and I'm back on the net), and I was able to confirm with it that the pressures are read correctly, modulo the fact that the device wants you to set the relative pressure manually.
What next? Store the data in a database, maybe get other data—Steve Woodford sent me a program that read all sorts of maxima and minima, though I'm not sure it wouldn't be more robust just to calculate them. And, of course, write web pages to access the data.
The new station required setup, of course, and the documentation, though copious, is difficult to read. Here's what I do to set it up for my purposes. I find the device itself particularly unpleasant to use: I hate touch screens, and this one has particularly poor display contrast. In addition, many of the functions (reading maxima and minima, for example) are destructive: you can only read them once. So all I do is set it up for computer access.
Basics: to set parameters, first press the area of the screen that you want to set. The value will blink, and roughly in the bottom middle a + and a - sign will appear, also blinking. You can now do one of three things:
The parameters are not easy to recognize: there's no text to tell you what you're setting, though some of them cause hints to blink. Others are completely confusing, and the documentation doesn't help. I set the following fields:
Press the date field (roughly bottom centre). The entire date lights up. The parameters are:
Press the time field (bottom left). The entire time changes to something like 1cd5. The parameters are:
The pressure field is centre right. The parameters are:
So now I have a little program that can talk to the weather station. What other secrets does it hide? Steve Woodford sent me one program, but it seems that this machine has about 64 kB of memory accessible via the USB bus, and the manual states that it can store 4080 readings. At 16 bytes per reading, that suggests that the archive records take up all memory except for the first 256 bytes. The previous incarnation of the program shows that it updates a specific memory location at frequent intervals, and that there's a field stating how old the entry is. Every 30 minutes it moves on to a new slot, so the archive entries show 48 records per day and should last for 85 days. There's no direct date storage in there, but you can use the “age” field in the archive records to determine the age of the previous one.
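The address arithmetic implied by these numbers, as a C sketch. The layout constants come from the figures above (4080 records of 16 bytes after a 256-byte fixed area); that the station wraps from the last slot back to the first is an assumption, and the function names are hypothetical.

```c
#include <stdint.h>

/* Layout inferred above: a 256-byte fixed area followed by 4080 archive
 * records of 16 bytes each, filling the 64 kB address space. */
#define FIXED_AREA   0x0100
#define RECORD_SIZE  16
#define NUM_RECORDS  4080
#define MEMORY_SIZE  (FIXED_AREA + NUM_RECORDS * RECORD_SIZE)   /* 0x10000 */

/* Address of the record written before the one at addr, assuming the
 * buffer wraps from the first record slot back to the last. */
uint32_t previous_record(uint32_t addr)
{
    if (addr <= FIXED_AREA)
        return MEMORY_SIZE - RECORD_SIZE;   /* 0xfff0 */
    return addr - RECORD_SIZE;
}

/* Whole days of history for a given logging interval in minutes. */
unsigned history_days(unsigned minutes_per_record)
{
    return NUM_RECORDS * minutes_per_record / (24 * 60);
}
```

history_days(30) gives the 85 days mentioned above.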
Wrote an option to dump the entire memory, and had a surprising amount of trouble. In the end, it turned out that the memory above 0xda00 was invalid. That might make sense, since I haven't had the station for 85 days, but if I understand the algorithm correctly, the current entry should have been at the top of the address space, and in fact it's around 0x1900. More head-scratching required. It probably makes more sense to improve the program to the point where it can replace wview and then look at the other things later.
The weather has changed completely in the last couple of days. After 10 days of hot, dry weather—we had no rain and temperatures up to 38.6°—we collected all the rain we've been missing for the month, a total of 72.9 mm today, on top of another 15.7 mm yesterday. The temperatures were similarly different, only 18.9°—nearly 20° lower. Spent most of the day inside as a result, and apart from the weather station software, also produced a Google map of the gardens we visited on Friday and Saturday. Google Maps is an interesting idea, but it's a real pain to use, and there are many things I either still don't understand, or which are just plain bugs.
On with the weather station today, and once again achieved my goals. Interfacing with Wunderground was more complicated than I had expected: they have relatively detailed instructions about the protocol (it's encoded as a web URL), but unfortunately they appear to be incorrect. They state that you can omit just about anything, so I omitted dew point data, since the station doesn't provide it, and you can calculate it from temperature and humidity. But it continually reported a dew point temperature of -73.3 °C, which is almost exactly -100 °F, so I suspect this is some default value that is inserted if you don't supply it. That's in contrast to other fields, such as pressure, which are just left blank if you don't supply them.
Went looking for algorithms to calculate the dew point. The Wikipedia page supplies both a simple and inaccurate formula and a more accurate one that requires wet bulb temperatures. Finally came across one that I'm going to have to check for accuracy, but which at least returns plausible results.
Then worked out a simple web page to show immediate data; I can work on that at my leisure, since the info is also available on Wunderground, and I want to think about how to represent the data. Getting the info to PHP is interesting: I can't just use mmap like I did between the programs. So what format do I use? I can get historical data out of the database, but that seems a bit overkill for the current readings. In the end decided to write (programmatically) a PHP header file that contains the information. That worked, but it highlights the difficulties I have with PHP.
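A minimal sketch of generating such a PHP file from C. The structure and the PHP variable names are hypothetical; writing to a temporary file and renaming it means the web server never reads a half-written file (rename is atomic within one POSIX file system).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical current-reading structure; field names are illustrative. */
struct reading {
    double outside_temp;   /* °C */
    double humidity;       /* % relative humidity */
    double pressure;       /* hPa, sea level */
};

/* Write the reading as a PHP file that the web page can include().
 * Writes "<path>.tmp" first, then renames it over path.
 * Returns 0 on success, -1 on failure. */
static int write_php_header(const char *path, const struct reading *r)
{
    char tmp[1024];
    if (strlen(path) + sizeof ".tmp" > sizeof tmp)
        return -1;
    sprintf(tmp, "%s.tmp", path);
    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    fprintf(f, "<?php\n");
    fprintf(f, "$outside_temp = %.1f;\n", r->outside_temp);
    fprintf(f, "$humidity = %.0f;\n", r->humidity);
    fprintf(f, "$pressure = %.1f;\n", r->pressure);
    fprintf(f, "?>\n");
    if (fclose(f) != 0)
        return -1;
    return rename(tmp, path) == 0 ? 0 : -1;
}
```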
That meant I could swap weather stations, put the new one on dereel, and shut down kimchi. Ah, the peace! But it became clear that the barometric pressures were incorrect: Wunderground wants pressures relative to sea level, while I was sending absolute pressures. More searching for conversion formulae, and I have the feeling that somewhere the conversions to the old measures aren't correct: I'm getting about 1 hPa less reported on the web site than I thought I sent.
Other minor things need looking at. I get this about once an hour:
That looks like a driver issue to me, and it currently causes the program to stop. It should be able to recover (at least by closing and opening the station), but in the meantime I've just put the program start in a shell loop, which works, though it's not very elegant.
So my weather station is working, and all I need to do is to refine things. I now have a date in the weather observations page, and things are working well with Wunderground—or are they? I still have this issue with the barometric pressure. Various sources, including Wikipedia and the wview source code, have a conversion factor of 0.0295299801 inches of mercury per hPa (well, Wikipedia has the reciprocal, 3.386389 kPa per inch of mercury), but that doesn't seem to match Wunderground's view of the world. Empirically, I discovered that it only registers to a resolution of 0.01 inch of mercury, which corresponds to 34 Pa, and comparing a number of dual-unit reports gives an average conversion factor of 0.02953553, from which I could probably drop a couple of digits:
That now works within the constraints of resolution. And then I noted that my station had dropped off the WunderMap, possibly because of the number of strange readings I had sent during the day. Decided to wait until tomorrow before doing anything about that.
I had really intended to do other things today, but my bare web page got on my nerves, so I set to producing some graphs. I'm using gnuplot, which has the advantage that I have used it before and the disadvantage that it causes me to curse and scream every time I use it. Much of that is definitely the documentation—who ever thought that GNU info was a good idea?—and I couldn't find what I was looking for. Asked on IRC and found I wasn't alone. Rusty Russell uses gnumeric, and Edwin Groothuis uses jpgraph2. They might be worth examining, though the descriptions suggest they don't quite fit into a shell script environment. Callum Gibson still uses gnuplot, and sent me some examples which I haven't looked at yet.
Finally got some semblance of sanity, and the graphs don't look too bad. I can tidy them up as time goes on; it's time I did some other things instead.
So now my weather station software is working and displaying useful information—80% finished, say? Today I started on the other 80%. For the first time since I've had it running, we had rain:
How much did I report to Wunderground? 0.
The problem is the way the station records rainfall: units of 0.3 mm. One program reads the station once a minute, stores the readings in the database and then resets the rainfall count, while another program accesses the shared memory every 5 minutes and sends the data to Wunderground—so at best there's only a chance of reporting 20% of the rain. Reworked that to maintain a cumulative count of the rain, while the second program maintains its own counter and only reports the difference. That seems to work (now—it took a couple of mistakes), but of course we didn't get any more rain.
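The cumulative-counter scheme can be sketched as follows: each consumer keeps the last raw count it saw and reports only the difference, so nobody needs to reset anything. This is a sketch with hypothetical names; the assumption that the raw counter is 16 bits wide and wraps at 65536 is mine, not from the station documentation. Masking the difference also means a wrap can never produce negative rain.

```c
#define RAIN_MM_PER_TIP 0.3   /* the station counts rain in 0.3 mm units */

/* One of these per consumer: database, Wunderground, web page... */
struct rain_consumer {
    unsigned last_count;   /* raw counter value at this consumer's last call */
    int      primed;       /* 0 until the first reading has been seen */
};

/* Rain in mm since this consumer's previous call. The first call only
 * records the counter and reports 0. Assumes a 16-bit wrapping counter. */
static double rain_since_last(struct rain_consumer *c, unsigned count)
{
    double mm = 0.0;
    if (c->primed) {
        unsigned tips = (count - c->last_count) & 0xffffu;  /* handles wrap */
        mm = tips * RAIN_MM_PER_TIP;
    }
    c->last_count = count;
    c->primed = 1;
    return mm;
}
```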
More work with gnuplot, which still annoys me. After trying a minor change, ran my plot script again, and instead of writing to the specified file, it vomited raw PNG data all over my screen. There was nothing obviously wrong with the gnuplot script, and it took me some time to realise that it was sensitive to the file name. The file in question existed, and gnuplot didn't have the necessary permissions, so it printed an error message and then wiped it away by writing a flood of binary data to stdout. What a pain!
Tidied up the graphs with help from Callum Gibson and Peter Jeremy, though finding alternate fonts is beyond me. Still, most of the graphs now look good. The exception is, again, the rainfall:
Clearly I'm going to have to write something to smooth things. I'm beginning to understand why wview shows a bar graph; but even that requires some form of summation. I wonder if gnuplot can do that.
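The summation itself is simple enough in C: sum the readings into fixed-width buckets (an hour, say) and plot one bar per bucket. This is a sketch with a hypothetical reading structure. Within gnuplot, the smooth frequency option, which sums the y values of points that share an x value, might do a similar job if the timestamps are first rounded down to the start of each bucket.

```c
#include <time.h>

/* Hypothetical (timestamp, rainfall) pair. */
struct rain_reading { time_t when; double mm; };

/* Sum rainfall into nbuckets buckets of width seconds each, covering
 * [start, start + nbuckets * width). Readings outside are ignored. */
static void bucket_rain(const struct rain_reading *r, int n, time_t start,
                        int width, double *buckets, int nbuckets)
{
    for (int i = 0; i < nbuckets; i++)
        buckets[i] = 0.0;
    for (int i = 0; i < n; i++) {
        if (r[i].when < start)
            continue;
        long slot = (long) ((r[i].when - start) / width);
        if (slot < nbuckets)
            buckets[slot] += r[i].mm;
    }
}
```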
Another issue is the barometric pressure. How can you decide whether it's accurate? Looking at the weather observations in my area, I discovered that I was reporting a relative pressure about 15 hPa lower than the official weather stations, another station was reporting pressures another 9 hPa higher, and yet another was reporting a pressure 33 hPa lower—a total difference of 56.6 hPa! There will be differences, of course—that's why we have the stations—but this seems impossible. It's probably due to an incorrectly calibrated pressure sensor. But how do you calibrate them? Guessed that the offset is constant, so now my “configuration file” (still really a C source file) has an entry:
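Such an entry might look something like this. It's a sketch only: the names pressure_offset_hpa and corrected_pressure and the value 15.0 are placeholders, and the real correction has to come from comparison with nearby official stations.

```c
/* Hypothetical configuration entry: constant correction in hPa added to
 * the station's relative pressure. The value is a placeholder. */
static const double pressure_offset_hpa = 15.0;

/* Apply the constant correction to a raw relative pressure reading. */
static double corrected_pressure(double raw_hpa)
{
    return raw_hpa + pressure_offset_hpa;
}
```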
Message from Jason Morgan suggesting that I use the R Project for Statistical Computing for that purpose, though he concedes that it's overkill. Took a look, and it certainly looks interesting. It has its own language that looks relatively legible, though I'll wait until I've looked more carefully before confirming that. Another interesting thing is that it was originally written at the University of Auckland. But for the moment I've found how to bend gnuplot to do what I want, so I'll leave that for another time.
The usual cleanup after a power failure today, but this time there was also a gap in my weather station readings:
There are readings, but they're stored in the station's memory without a timestamp, just a “record age”: the number of minutes, between 1 and 30, since the record was created. But we have the shared segment file, which contains the last reading, including its age and its timestamp, so it should be relatively trivial to write a function recover_powercor_breakage (named in acknowledgement of Powercor's role in making it necessary) which would catch up with the current conditions and report a reading every 30 minutes.
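The timestamp reconstruction is a few lines of arithmetic. This is a sketch under assumptions that aren't confirmed by the station documentation: the age is in minutes, and every older record covers exactly 30 minutes. The helper names are hypothetical; they are not the functions in my program.

```c
#include <time.h>

#define ARCHIVE_INTERVAL (30 * 60)   /* seconds per archive record (assumed) */

/* Start time of the current record, from when it was read and its age. */
static time_t current_record_start(time_t read_time, int age_minutes)
{
    return read_time - (time_t) age_minutes * 60;
}

/* Start time of the record k slots before the current one. */
static time_t older_record_start(time_t current_start, int k)
{
    return current_start - (time_t) k * ARCHIVE_INTERVAL;
}

/* Number of whole archive records that elapsed between two times. */
static int missed_records(time_t last_seen, time_t now)
{
    return now > last_seen ? (int) ((now - last_seen) / ARCHIVE_INTERVAL) : 0;
}
```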
And it was. But for the first time I got a hard ENOTTY at a particular place, setting a timeout no less:
Gave up on that—I still have current readings to record, and the information won't wrap around for 3 months, so I have plenty of time to investigate. Instead fired up kimchi again and tried debugging with the other station—and for the first time that I have noticed, I got an ENOTTY on NetBSD as well. That's really strange: there's no hardware and little software in common between the two setups. Is this some kind of bug in the weather station hardware? I suppose I should try it on Linux and Mac OS as well.
This problem reading from my weather station is on the back burner, but it's burning. I've already established that the ENOTTY is a hard error: retries don't help. So the next thing to try was to reset the connection.
How do you do that? The obvious thing would be to simulate what happens when you restart the program, which normally works. So: calls to usb_release_interface and usb_close, followed by the correct initialization code.
Result: SIGSEGV out of a function called by usb_close. Briefly considered looking through the libusb code, but gave that up and just removed the call. That didn't work either: I continued to get ENOTTYs. So another avenue has stalled for the moment. Somehow this stuff should all be in the kernel on a par with other devices.
More work on the weather stations today. Working on the hypothesis that the USB problems could be related to the USB stack (the FreeBSD stack in 7.x is derived from the NetBSD stack), continued my Linux porting and got it finished. Sort of:
=== root@cvr2 (/dev/pts/5) /home/grog/dereel-weather 125 -> ./wh1080
After some discussion, it seems that Linux requires you to first detach the kernel driver which (apparently) claims any new USB device. Tried that, again with little success:
What does that message mean? More head-scratching needed, I suppose. Maybe I need to look at the meaning of the interface parameter.
Also gave up on writing scripts to import data from Wunderground, and wrote a quick and dirty program in C, which did the job, and which is flexible enough to handle changes in format. I suppose I could have done the same in Perl or Python, but I've never found a compelling reason to learn them.
Woke up to the sound of pouring rain—7.2 mm, quite a bit round here. But my weather software reported much drier conditions, with rainfall as low as -0.6 mm. Found the bug pretty quickly, but somehow the rain reporting is still a significant issue. The real problem is that it has to be an interval, and my solution (keeping “last value” and “current value” counters) doesn't really cater for reporting to multiple locations (database, screen, Wunderground, web site). Put in separate “last values” counters, but I still don't like the situation, and I'm pretty sure I'll change it again.
Also looking at comparisons between different weather stations, something that, as far as I know, no other weather software does. Importing the data is quite complicated; I had always thought that CSV was pretty straightforward, but only if you have the same formats as the supplier. I've already established that Wunderground has two different formats, but the Australian Bureau of Meteorology has yet another. Wrote another program, based on yesterday's Wunderground import program. Fortunately that went pretty smoothly. I can see myself writing quite a few of these; maybe I should combine them into one. Also reorganized the database daily observations table so that it has the same format for local and remote observations.
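One reason generic tools stumble on these files is empty fields: strtok() treats consecutive commas as a single separator, so a blank column silently shifts all the following ones. Here is a minimal splitter that keeps empty fields, as a sketch with a hypothetical name; it doesn't handle quoted fields with embedded commas.

```c
#include <string.h>

/* Split one CSV line in place into at most max fields, keeping empty
 * fields ("a,,c" gives three fields, the second empty). Fields beyond
 * max are ignored. Returns the number of fields found. */
static int split_csv(char *line, char **fields, int max)
{
    int n = 0;
    char *p = line;
    line[strcspn(line, "\r\n")] = '\0';   /* drop any trailing newline */
    while (n < max) {
        fields[n++] = p;
        char *comma = strchr(p, ',');
        if (comma == NULL)
            break;
        *comma = '\0';
        p = comma + 1;
    }
    return n;
}
```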
One of the things that worries me about my weather station is the accuracy. I've already established that the pressure gauge returns a value that, after correction to mean sea level, is about 13.5 hPa lower than the official weather stations, and I've added a correction factor. But is the error linear? And what do other stations report? Spent some time working out a comparative display page for a number of the weather stations in the area, in the process discovering a number of deficiencies in yesterday's import programs, notably that this stupid 12-hour time causes more problems than I expected: MySQL happily imports 9:00 AM and 9:00 PM as the same time, 09:00.
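The fix is to convert the 12-hour times before they reach MySQL. A minimal sketch with a hypothetical function name, returning minutes after midnight; formatting the result with "%02d:%02d" gives a 24-hour value that MySQL interprets unambiguously.

```c
#include <stdio.h>
#include <string.h>

/* Convert a 12-hour time such as "9:00 PM" to minutes after midnight.
 * 12:xx AM is just after midnight, 12:xx PM just after noon.
 * Returns -1 if the string doesn't parse. */
static int minutes_from_12h(const char *s)
{
    int h, m;
    char ampm[3];
    if (sscanf(s, "%d:%d %2s", &h, &m, ampm) != 3
        || h < 1 || h > 12 || m < 0 || m > 59)
        return -1;
    h %= 12;                          /* 12 AM -> 0; 12 PM -> 0, then +12 */
    if (strcmp(ampm, "PM") == 0 || strcmp(ampm, "pm") == 0)
        h += 12;
    else if (strcmp(ampm, "AM") != 0 && strcmp(ampm, "am") != 0)
        return -1;
    return h * 60 + m;
}
```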
The comparison page shows a number of things:
My gnuplot scripts are still pretty terrible. The legend in the small graphs overlays the right of the graphs, and the colours are difficult to recognize.
My pressure error does indeed seem to be linear, at least for the current pressure range.
As I've noticed before, other stations are wildly inaccurate about air pressure. Bacchus Marsh airport shows values in the order of 35 hPa lower than the official weather stations, and Delacombe is nearly 10 hPa higher:
The temperature here in Dereel seems much higher than elsewhere. I've noticed this before too, but in this case I think it's correct: I have three different outside thermometers, and they all agree. It is warmer here than in Ballarat, and it's interesting to note that the night temperatures are similar:
The wind speeds here seem lower. I'm not sure that this is correct, but it's difficult to know how to determine whether they're accurate or not.
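Whether the pressure error is a constant offset or depends on the pressure can be checked with an ordinary least-squares fit of the error against the official pressure: a slope near zero means a constant correction suffices. This is a sketch, assuming the readings from the two stations have already been paired up by time; the function name is hypothetical.

```c
/* Least-squares fit of y = a + b*x, using centred sums for numerical
 * stability (pressures near 1000 hPa would otherwise lose precision).
 * Here x would be the official pressure and y the error, both in hPa.
 * Returns 0 on success, -1 if fewer than 2 points or all x are equal. */
static int fit_line(const double *x, const double *y, int n, double *a, double *b)
{
    if (n < 2)
        return -1;
    double mx = 0.0, my = 0.0;
    for (int i = 0; i < n; i++) {
        mx += x[i];
        my += y[i];
    }
    mx /= n;
    my /= n;
    double sxx = 0.0, sxy = 0.0;     /* sums of centred products */
    for (int i = 0; i < n; i++) {
        sxx += (x[i] - mx) * (x[i] - mx);
        sxy += (x[i] - mx) * (y[i] - my);
    }
    if (sxx == 0.0)
        return -1;                   /* slope undefined */
    *b = sxy / sxx;
    *a = my - *b * mx;
    return 0;
}
```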
Rain again today, and noted with satisfaction that I'm recording it correctly. Or am I? Looking at the Wunderground history, I saw no rain at all. Checking what I sent confirmed that the bug was at my end (not a foregone conclusion). What a mess this code is! Did a bit of investigation, but basically decided that I need to solve this problem differently. It should be possible to start the “report” program as many times as I want, and that breaks the current implementation.
Instead looked at the database approach: have a program pull the data out of the database and send it to Wunderground. That way I can not only sum the rainfall, but also average temperatures and pressures, which should potentially give more accurate results.
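The aggregation step might look like this in C, once the rows for the reporting interval have been fetched: average the instantaneous values and sum the rain. The row structure and function name are hypothetical, and the database access is left out.

```c
/* Hypothetical row layout for one stored observation. */
struct obs_row { double temp_c; double pressure_hpa; double rain_mm; };
struct obs_summary { double temp_c; double pressure_hpa; double rain_mm; int n; };

/* Average temperature and pressure, sum rainfall over n rows. */
static struct obs_summary summarize(const struct obs_row *rows, int n)
{
    struct obs_summary s = {0.0, 0.0, 0.0, n};
    for (int i = 0; i < n; i++) {
        s.temp_c += rows[i].temp_c;
        s.pressure_hpa += rows[i].pressure_hpa;
        s.rain_mm += rows[i].rain_mm;
    }
    if (n > 0) {
        s.temp_c /= n;
        s.pressure_hpa /= n;
    }
    return s;
}
```

Because each report is computed from the stored rows rather than from a shared counter, starting the report program any number of times no longer disturbs the rain totals.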
And accurate results are still an issue. According to the weather station, we had 12.9 mm rain today; but the manual rain gauge that I still maintain tells me that it was only 8.2 mm. Which should I believe? It's difficult to believe that the manual rain gauge is that inaccurate, though I have my concerns about evaporation on warmer days, but maybe the different location makes that much difference. To be observed.
Spent much of the day playing around with a revised version of the report program for Wunderground, not in itself a big problem. But it's so ugly! Maybe I need to find other ways of doing things, but the canonical way with the MySQL C API delivers everything in an array of char *, and any column can be NULL. In my case, nearly all the information is floating point, so I end up with lots of stuff like:
One change to the query and I have 100 lines of this filth to check through. There must be an easier way.
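One way to shorten it: since MYSQL_ROW is just char **, with NULL for SQL NULL, a small helper can turn each column into a double, mapping NULL (or unparsable text) to NAN. This is a sketch; the helper name is hypothetical, and naming the column indices with an enum would make query changes less painful.

```c
#include <math.h>
#include <stdlib.h>

/* Column col of a row (MYSQL_ROW is char **) as a double.
 * SQL NULL (a NULL pointer) or non-numeric text gives NAN. */
static double column_double(char **row, int col)
{
    char *end;
    if (row[col] == NULL)
        return NAN;
    double v = strtod(row[col], &end);
    return end == row[col] ? NAN : v;
}
```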
What an amazing change in the weather! Yesterday we had a maximum of 40.4°; today it was 22.1°, and that at 2:42, only because it was still cooling down. By the time I got up it had dropped under 20°.
Ideal weather, then, to play around with my weather software, and did a lot of work on the graphical representations, which really needed it. Included a graph of the temperature differences between here and Ballarat. That's not as simple as it seems: the readings take place at different times, so I have to interpolate. Here are the results for the last three days:
My observations that it tends to be warmer here than in Ballarat are somewhat confirmed, but today was an exception. I'll have to watch for a longer period of time.
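The interpolation can be ordinary linear interpolation: to compare the two stations, evaluate one station's series at the other's timestamps. This is a sketch with a hypothetical function name; it assumes the times are sorted in ascending order and doesn't extrapolate outside the series.

```c
#include <time.h>

/* Linearly interpolate (times[i], vals[i]) at time t.
 * Returns 0 and sets *out on success, -1 if t is outside the series. */
static int interpolate(const time_t *times, const double *vals, int n,
                       time_t t, double *out)
{
    for (int i = 0; i + 1 < n; i++) {
        if (t >= times[i] && t <= times[i + 1]) {
            double span = (double) (times[i + 1] - times[i]);
            double frac = span > 0 ? (double) (t - times[i]) / span : 0.0;
            *out = vals[i] + frac * (vals[i + 1] - vals[i]);
            return 0;
        }
    }
    if (n == 1 && t == times[0]) {
        *out = vals[0];
        return 0;
    }
    return -1;
}
```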
That was the good news; the bad news is that gnuplot once again drove me to distraction. The following patch completely broke things:
It seems that the range is dependent on the format; but the error messages were completely unrelated, and it took me over an hour to find the cause. What a pain this stuff is!
That wasn't the only problem: we had 36 mm rain today, but looking at my Weather station history on Wunderground, it seemed that we had none. Further investigation showed that I had been reporting the rain incorrectly: instead of reporting the actual rain that has fallen, you need to report the total rainfall for the previous hour. That means that the totals you report depend on the frequency of reporting. Why would anybody want to do that? Fixed it up, so the Wunderground page shows the greatest rainfall between 18:30 and 20:30, when in fact the rainfall was like this:
It also means that I have to do some head-scratching about how to report the comparative rainfall for other stations; currently I have some that appear to have had over 100 mm rain:
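Computing the value Wunderground seems to want means summing the rain over a sliding one-hour window ending at the report time. This is a sketch; the window definition (excluding the start, including the end) and the names are my assumptions, and the result is in mm, so it still needs dividing by 25.4 to give inches.

```c
#include <time.h>

/* Hypothetical rain increment with the time it was recorded. */
struct rain_event { time_t when; double mm; };

/* Total rain recorded in the hour (now - 3600, now]. */
static double rain_last_hour(const struct rain_event *ev, int n, time_t now)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        if (ev[i].when > now - 3600 && ev[i].when <= now)
            total += ev[i].mm;
    return total;
}
```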
I've really got other things to do, but spent most of the day playing around yet again with the web pages for the weather station. Currently I have the problem that the database is only on my system at home, not on the external web site, so I've been generating static PHP files with the data. Clearly that doesn't scale, so I've only had data for yesterday and today. Today I addressed the issue, at least locally, and things went pretty quickly and smoothly. About the only issue is—how could it not be?—gnuplot. So now I can generate both data and plots for any day, and select them. But I'm still limited in the static choice of plots. It would be nice to have a language to say something like “show me comparative temperatures between Ballarat, Sheoaks and Dereel for the period 3 to 8 December”. That would also make working with these horrible gnuplot scripts easier. But I think it'll be a while before I manage that. The next step will probably be multi-day views of the same information I'm currently displaying only for one day.
Still more playing around with the weather station software. I'm relatively happy with the representation of the daily data, but it would be nice to have information over longer periods of time. That's straightforward enough: I could just modify the daily graphs for weeks, months and years, like wview does, but I wanted something more flexible. Played around with my horrible plot script and got it to more or less work, but now I have issues with the number of markers on the y axis.
Message in the mail from David Peters, who has bought the same model of weather station as I have, and wanted to know about Steve Woodford's patches to wview. He had started writing his own software for Linux, but didn't want to replicate work already done. Sent him a tarball of my current work—hopefully it won't cause him too many headaches. More interesting, though, is that he managed to talk to the station at all under Linux. He's promised me a copy of his code once he's tidied it up.
Apart from that, more work on the display stuff. I'm making some progress towards multi-day displays. Part of the issue is that I'm not sure what I really want, so this is quite experimental. But it's becoming clear that making all the possible plots for a specific interval uses up a lot of time and disk space. I'll need to translate my plot script into PHP and do it on demand and per plot.
Still thinking about the weather station graphics today. It's (still) clear that my decision to rewrite the plotting functions in PHP and run them only when they're needed was right. I'll also need a cron job to clean out old plots—I had something like 30 MB of them in there before I cleaned them out, about 20% of all non-photo files on my personal web site. And this is after only a month.
The real problem, though, is: how do I do it right? I always seem to end up asking that with PHP; different tools do impose different ways of doing things, and in this case I need to run a MySQL query and then run gnuplot against the saved results. Couldn't think of a clean way to do that, so took the coward's way out; with time I'll think of a not completely unacceptable way of doing it.