Better than the Defence.
One question I’m asked again and again by people encountering OldWeather for the first time is ‘How accurate are the transcriptions?’ We’ve known for a while that the answer is ‘very accurate’, but it’s always nice to be precise about such things, so just how accurate are we?
To find out, let’s look at HMS Defence, which we followed through much of 1914 and 1915, on a voyage from the Dardanelles, to Montevideo, to South Africa, and then back to the UK and patrol in the North Sea. The figure shows the air temperature and pressure recorded during this voyage.
We can see clearly in this image the date when they stopped cruising in tropical and sub-tropical oceans and returned to the colder and stormier seas around Great Britain: at about the beginning of 1915 the air temperature fell by around 30°F and the pressure became much more variable. But looking closely at the image, we can also see some errors, both ours and those of the mariners writing the logs in the first place.
We can spot our own errors because each log page is transcribed by at least three people, and when those people disagree, someone has made a mistake. The logs of the Defence yielded 1119 pressure observations (six a day for about six months). For 997 of those observations (89%) everyone who transcribed the observation agreed what it was; for 107 of the observations (10%) two or more of the transcribers agreed on a value, but one person disagreed; and for the remaining 15 observations (1.3%) the transcribers did not agree, so there was no value with a clear majority of the inputs. (The values entered by individuals that did not agree with the majority are shown in the figure as small red points.)
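The three-way split described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual processing code: the function name `classify` and the rule that a ‘clear majority’ means strictly more than half of the entries are assumptions, and the example values are made up.

```python
from collections import Counter


def classify(transcriptions):
    """Return (category, agreed_value) for one observation.

    Assumption: a 'clear majority' means strictly more than half of
    the transcribers entered the same value.
    """
    counts = Counter(transcriptions)
    value, top = counts.most_common(1)[0]
    if top == len(transcriptions):
        return "unanimous", value
    if 2 * top > len(transcriptions):
        return "majority", value
    return "no consensus", None


# Made-up entries, for illustration only.
print(classify(["29.95", "29.95", "29.95"]))
print(classify(["29.95", "29.95", "29.05"]))
print(classify(["29.95", "29.05", "29.85"]))
```

In the third case no value is returned, because there is nothing the transcribers agree on.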
From the first two categories we can estimate the transcription error rate. Taking three transcriptions per observation, in 997*3+107*2=3205 cases the value entered is correct, and in 107 cases it is incorrect, so the error rate is 107/(3205+107)=107/3312, or about 3%. So transcriptions are about 97% accurate. In other words, about 97% of the time the value entered by an individual transcriber is the value that most people would agree is written in the logs, which is an excellent individual accuracy rate.
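For anyone who wants to check the arithmetic, here it is as a short Python sketch. The counts come from the Defence logs as given above; the variable names are my own, and the calculation assumes exactly three transcriptions per observation (some pages may have had more).

```python
# Counts from the Defence pressure observations.
unanimous = 997   # all transcribers agreed
majority = 107    # two agreed, one disagreed

# Assumption: exactly three transcriptions per observation.
correct = unanimous * 3 + majority * 2
incorrect = majority
total = correct + incorrect

error_rate = incorrect / total
print(f"correct={correct}, total={total}, error rate={error_rate:.1%}")
```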
If you are familiar with statistics, you may have spotted an inconsistency here: if one person makes a mistake 3% of the time, at least two out of three people should make a mistake on the same observation only about 0.3% of the time (3%*3%*3), while actually this happens much more often than that (1.3% of the time). The reason for this excess of cases where all the transcribers disagree is that some of the entries are illegible. For example, consider the barometer height at 4am in the log for Thursday 10th September 1914; this was variously transcribed as ‘30.18’, ‘30.10’, and ‘30.12’, all of which are plausible readings. In this case there is no one answer we can agree on, and the disagreement is not a transcription error but a success: we have flagged an entry which cannot be transcribed with confidence. (This is why we encourage you to guess when entering hard-to-read values: when everybody guesses a different answer, we know the entry is illegible.)
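The expected rate can be checked with a small Python sketch. It assumes three transcribers who make mistakes independently of one another, each with a 3% error rate, and it treats any observation with two or more mistakes as lost; the variable names are illustrative.

```python
p = 0.03  # individual error rate estimated above

# Quick approximation used in the text: three possible pairs, each
# wrong together with probability p * p.
approx = 3 * p**2

# Exact binomial probability that at least two of three are wrong:
# exactly two wrong, plus all three wrong.
exact = 3 * p**2 * (1 - p) + p**3

# What actually happened in the Defence logs.
observed = 15 / 1119

print(f"approx={approx:.2%} exact={exact:.2%} observed={observed:.2%}")
```

Both the approximation and the exact figure round to 0.3%, several times smaller than the observed rate, so independent slips alone cannot explain the disagreements.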
Even when we have transcribed a value with certainty it may not be correct, because sometimes the log-keepers wrote the wrong value in the log. There is no doubt that the barometer height entered for midnight on Wednesday 7th October 1914 is ‘28.80’ inches, but there is also no doubt that the actual pressure was much higher than this (possibly ‘29.80’), and this error can be seen as the first of the three spikes in the figure above. So there are three errors in the log big enough to be obvious in the plot, and probably others with a smaller effect.
This post has turned out much longer and more complicated than I planned – mostly because the definition of ‘transcription error’ from a logbook containing erroneous and illegible entries is not simple – so, in summary:
- Individual transcriptions are about 97% accurate
- Of 1000 transcribed logbook entries:
  - 3 will be lost because of transcription errors
  - 10 will be illegible
  - At least 3 will be errors in the logs
So for every 16 errors in the transcribed data (which we pass to the science team), only 3 are the responsibility of those of us reading the logs; the other 13 are problems in the logs themselves. We can say with some confidence that we are better at reading the logs than the original log-keepers were at writing them.
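The per-1000 figures in the summary can be reproduced from the Defence counts with the sketch below. The way the numbers are combined and rounded here is my reconstruction of the reasoning, not a record of how they were originally worked out, and the variable names are illustrative.

```python
TOTAL = 1119        # pressure observations in the Defence logs
NO_CONSENSUS = 15   # observations with no clear majority
LOG_ERRORS = 3      # obvious spikes in the plot (a lower bound)
P_ERR = 0.03        # individual transcription error rate

# Expected losses from independent slips by two of three transcribers.
lost = round(1000 * 3 * P_ERR**2)

# Remaining no-consensus cases are attributed to illegible entries.
illegible = round(1000 * NO_CONSENSUS / TOTAL) - lost

# Errors written into the log by the original log-keepers.
log_errors = round(1000 * LOG_ERRORS / TOTAL)

total_errors = lost + illegible + log_errors
print(lost, illegible, log_errors, total_errors)
print(f"share due to transcribers: {lost / total_errors:.0%}")
```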
Congratulations to captain ebaldwin and the crew for an excellent job on HMS Defence; and to all the OldWeather participants, as the accuracy of transcription is similarly high on all the ships I’ve looked at.