Getting oldWeather data ship-shape for science
Author of Post: Larry Spencer
Date of Post: Tuesday, December 04, 2018
The oldWeather science team is using the ship logbook observations to help build a three-dimensional global reconstruction of the Earth’s weather, in order to understand the Earth’s past climate and better predict its future. The reconstruction uses the ship observations in exciting and fascinating ways, but it needs them to be processed into a precise format, as accurately as possible. We know that the logbook transcriptions made by our volunteers are very accurate, but we still face plenty of challenges in using the data. Some of the entries were incorrect when originally recorded in the logs, and some of the information we need is not stated explicitly in the logbooks and has to be inferred. For example, we have to calculate exact positions and standardized dates from the local dates and location names provided in the logs. We need to get the observations ready to sail!
You might be wondering how all of that transcribed historical weather data is processed, and how it ends up being used by the scientists, after the volunteers have completed their work. As one member of the science team, I work on this extensively, and I can tell you that it is a very enjoyable process. For each of the oldWeather ships available, I meticulously analyze and quality-control the transcribed data, both the latitude and longitude positions and the weather observations, checking for and removing errors in order to ensure the highest-quality datasets possible. To illustrate just how important this process is, two maps are displayed below. The first is a “Before Map” that shows the Thetis’s positions and tracks (represented by the gold dots and lines) as automatically generated by Philip’s software from the place names transcribed from the logbooks. The second is an “After Map” that shows the ship’s positions and tracks after I carefully analyzed and quality-controlled the ship’s geographical position data. There is a significant difference between the two maps: the “After Map” shows a much more “polished-up” and realistic version of the ship’s voyage positions and tracks (for instance, the ship no longer travels over any land masses).
Figure #1. – “Before Map” of the oldWeather Thetis ship’s voyage positions/tracks throughout the time period of May 01, 1884 – December 31, 1908 before quality-control on the geographical positions data from the ship.
Figure #2. – “After Map” of the oldWeather Thetis ship’s voyage positions/tracks throughout the time period of May 01, 1884 – December 31, 1908 after quality-control on the geographical positions data from the ship.
The transcribed data produced by oldWeather are all stored in one big database. From this database, Philip’s software makes two separate data files for each ship: one for the latitude and longitude positions and one for the weather observations. I then apply a meticulous data analysis and quality-control process to both of these files for each ship. As a part of this process, to make error detection a bit easier, I import the weather data into a spreadsheet and sort it in ascending and then in descending order, which brings the most extreme values to the top. This is a quick and easy method of detecting and correcting the most obvious errors in the data. I do this for three meteorological variables: surface pressure, 2-meter air temperature, and sea surface temperature. In addition, I manually edit the data wherever errors exist throughout the files, both in the geographical positions file and in the weather observations file. I use a standard set of criteria as a guide for locating errors and quality-controlling the data where I detect problems. For example, for surface pressure, if a value in the original file is lower than 28.00 inches of mercury or higher than 32.00 inches of mercury, I remove that value and replace it in the file with an “NA”, which stands for “Not Available”, because it is classified as a “meteorologically-unrealistic” value. This criterion applies whether the unrealistic value appears only in the original file or was recorded that way in the original ship logbook.
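For readers who like to see the idea in code, here is a minimal Python sketch of the sort-and-screen step described above. It is only an illustration, not the actual spreadsheet workflow or Philip’s software: the function names, the sample readings, and the use of a plain list are all assumptions made for the example. Only the 28.00 and 32.00 inches of mercury limits and the “NA” marker come from the process described in this post.

```python
# Limits quoted in the post, in inches of mercury.
PRESSURE_MIN_INHG = 28.00
PRESSURE_MAX_INHG = 32.00
NA = "NA"  # marker for "Not Available"


def screen_pressure(values):
    """Return a copy of `values` with out-of-range pressures replaced by "NA".

    Hypothetical helper: entries that are already "NA" (or None) stay "NA".
    """
    cleaned = []
    for value in values:
        if isinstance(value, (int, float)) and (
            PRESSURE_MIN_INHG <= value <= PRESSURE_MAX_INHG
        ):
            cleaned.append(value)
        else:
            cleaned.append(NA)
    return cleaned


def extremes(values, n=3):
    """Return the n lowest and n highest numeric values.

    This mimics sorting a spreadsheet column in ascending and then
    descending order and reading off the top few rows.
    """
    numbers = sorted(v for v in values if isinstance(v, (int, float)))
    lowest = numbers[:n]
    highest = numbers[-n:][::-1]
    return lowest, highest


# Made-up sample readings, not real logbook data.
readings = [29.92, 30.10, 2.98, 29.85, 39.95, "NA", 28.00, 32.00]

lowest, highest = extremes(readings, n=2)
cleaned = screen_pressure(readings)
print(lowest, highest)
print(cleaned)
```

In this made-up sample, sorting surfaces 2.98 and 39.95 as the suspicious extremes, and the screen replaces both with “NA” while keeping the boundary values 28.00 and 32.00. In practice, a flagged value is still checked against the logbook before deciding whether to blank it or correct it.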
Another example involves values that were transcribed correctly by the volunteers but were obviously recorded incorrectly in the original ship logbooks. Depending on the exact circumstances, I either replace such a value in the original file with an “NA” or change it to the obviously-correct value. In those circumstances, I carefully compare the “obviously-incorrect” value with the values directly above and below it in the original ship logbook and then manually edit it accordingly. Once I have completed this entire process, I convert the ship’s quality-controlled weather and position data from the transcribed database into a standard format called the International Marine Meteorological Archive (IMMA) format. A final file is created for each oldWeather ship, containing the combined weather and position data in the IMMA format. The overarching goal of this entire process is to produce standardized data, as accurate as possible, that can be easily used by professional scientists in major research projects.
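The comparison with the values directly above and below can also be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the method I actually use: the real comparison is done by eye against the logbook page, and the function name, the sample values, and the 0.5 inch threshold are all hypothetical. The sketch only flags entries for review; it does not correct anything.

```python
def flag_against_neighbors(values, max_jump):
    """Return the indexes of entries that differ from BOTH neighbors
    by more than `max_jump`.

    Hypothetical helper. Only interior entries are checked, and any
    entry whose neighbors are not numeric (for example "NA") is skipped.
    """
    flagged = []
    for i in range(1, len(values) - 1):
        above, current, below = values[i - 1], values[i], values[i + 1]
        if not all(isinstance(v, (int, float)) for v in (above, current, below)):
            continue
        if abs(current - above) > max_jump and abs(current - below) > max_jump:
            flagged.append(i)
    return flagged


# Made-up consecutive pressure readings; the third looks like a slip of
# the pen (20.85 where something near 29.85 would be expected).
pressures = [29.90, 29.88, 20.85, 29.84, 29.80]

suspect = flag_against_neighbors(pressures, max_jump=0.5)
print(suspect)
```

Here only the third entry (index 2) is flagged, because it disagrees with both the reading above and the reading below it. Whether such a value is changed to the obviously-correct one or replaced with an “NA” remains a judgment made by looking at the original logbook.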
This IMMA-formatted data is being used in several ways, notably for assimilation into the Twentieth Century Reanalysis Project (20CR) and for distribution through international databases, such as the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) and the International Surface Pressure Databank (ISPD). For example, the IMMA-formatted data for OldWeather3 ships will be archived as an ICOADS auxiliary dataset in the Research Data Archive (RDA) at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado. The overarching purpose of this work is to develop weather and climate models and to better understand and predict extreme, high-impact weather and climate phenomena on a global scale. We would be far less able to accomplish this without our oldWeather volunteers! So, I want to extend a huge “thank-you” to all of our volunteers for contributing so much time, effort, and dedication to this project, because it all starts with the critical step of getting the observations and positions contained in the original ship logbooks accurately transcribed!