Friday 29 January 2010

GHCN v2

GHCN v2 is a database of monthly mean temperature data for thousands of stations across the world. The data was gathered from existing databases and by requesting station data from various countries in the late 90s. Since then, a subset of the data has been updated each month: the countries or stations that support monthly reporting.

Two papers that describe how the data was compiled, what it contains, etc. can be downloaded here: http://www.ncdc.noaa.gov/oa/climate/ghcn-monthly/index.php

GHCN v2 data can be downloaded from here: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/zipd/

Everything in this folder is zipped (even the readme). I'll skip some detail and mention only the more interesting parts of these files.

readme.temperature describes the files and their format.

The file v2.mean contains raw mean monthly station data for all stations in GHCN v2. It has undergone quality assurance, but not homogenization. Each line in this file is a year of absolute monthly temperature data for a station. So for example here are three lines from the file:

1016035500001968 109 123 130 160 181 207 247 241 223 191 162 128
1016035500001969 117 114 134 149 182 194 223 241 220 185 161 108
1016035500001970 129 110 122 138 165 211 232 249 233-9999 148 114

The year is the four digits that follow the first 12 digits of each line (1968, 1969 and 1970 here). The 12 numbers after the year are the absolute mean temperature readings for each month of that year, in tenths of a degree C (so divide by 10 to get the actual absolute mean). You can see that -9999 is used to denote a month with missing data. The 12 digits before the year essentially identify the station.
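Here is a minimal Python sketch of reading one such line, assuming the layout just described (12-digit identifier, 4-digit year, then twelve integers in tenths of a degree, with -9999 for missing). The name parse_mean_line is my own invention, and readme.temperature remains the authority on the exact column widths. I use a regular expression for the values because they can run together, as in 233-9999.

```python
import re

MISSING = -9999  # value used in v2.mean for a month with no data


def parse_mean_line(line):
    """Return (record_id, year, temps) for one line of v2.mean.

    temps is a list of 12 monthly means in degrees C, with None for
    missing months. Hypothetical helper, not part of any GHCN tooling.
    """
    record_id = line[:12]      # the 12 digits before the year
    year = int(line[12:16])    # the 4-digit year
    # Signed integers; a regex copes with values that touch ("233-9999").
    raw = [int(v) for v in re.findall(r"-?\d+", line[16:])]
    if len(raw) != 12:
        raise ValueError(f"expected 12 monthly values, got {len(raw)}")
    temps = [None if v == MISSING else v / 10 for v in raw]
    return record_id, year, temps


record_id, year, temps = parse_mean_line(
    "1016035500001970 129 110 122 138 165 211 232 249 233-9999 148 114"
)
print(record_id, year, temps[0], temps[9])  # 101603550000 1970 12.9 None
```

Returning None for a missing month rather than -999.9 means a forgotten check fails loudly instead of quietly dragging an average down.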

Here is another line to demonstrate something:

1016035500011970 129 110 122 138 165 211 232 249 233 184 148 114

Note that this line shares the same first 11 digits, but the 12th digit differs from the three above (1 instead of 0). This is actually the same station. During the GHCN data gathering process more than one monthly mean temperature record was found for this station, and they were different enough that GHCN included both and marked them as duplicates. The 12th digit is a 'duplicate' code. In this case record '0' and record '1' have identical values for the months in 1970, except that record '1' is not missing the value for October.
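To make the duplicate code concrete, here is a small Python sketch that splits the 12-digit identifier into its station and duplicate parts and lists the months where the two 1970 records disagree. The helper names are hypothetical, and the values are simply copied from the lines quoted above.

```python
MISSING = -9999  # value used for a month with no data


def split_record_id(record_id):
    """Split the 12-digit id into (11-digit station id, duplicate code)."""
    return record_id[:11], record_id[11]


def differing_months(a, b):
    """Return the 1-based month numbers where two 12-value records differ."""
    return [m for m, (x, y) in enumerate(zip(a, b), start=1) if x != y]


# 1970 values (tenths of a degree C) for duplicates 0 and 1
dup0 = [129, 110, 122, 138, 165, 211, 232, 249, 233, MISSING, 148, 114]
dup1 = [129, 110, 122, 138, 165, 211, 232, 249, 233, 184, 148, 114]

print(split_record_id("101603550001"))  # ('10160355000', '1')
print(differing_months(dup0, dup1))     # [10]
```

The only difference is month 10, October, where record 0 has the missing value.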

The file v2.mean_adj has the same format, but contains adjusted (that is, homogenized) monthly mean temperatures.

The file v2.country.codes is self-explanatory. Here's a line from the file:

101 ALGERIA

That tells us the station we were looking at above is in Algeria. The first 3 digits of the station identifier are the country code.
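The lookup can be sketched in a few lines of Python. This assumes every line of v2.country.codes looks like the one above (a 3-digit code, whitespace, then the country name); parse_country_line is a name I made up for illustration.

```python
def parse_country_line(line):
    """Split a country-codes line into (code, name).

    maxsplit=1 keeps names that contain spaces in one piece.
    """
    code, name = line.strip().split(maxsplit=1)
    return code, name


# Build a code -> name table from the one line quoted above
countries = dict([parse_country_line("101 ALGERIA")])

station_id = "10160355000"
print(countries[station_id[:3]])  # ALGERIA
```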

The file v2.temperature.inv contains a list of all stations in GHCN v2 and information about those stations. Each line is info for a station. Here's an example from the file:

10160355000 SKIKDA 36.93 6.95 7 18U 107HIxxCO 1x-9WARM DECIDUOUS C

The first 11 digits are the station identifier. It's the station we were looking at earlier. The line also has the name of the station location, Skikda, then its latitude and longitude, elevation above sea level, and some other info about the station location.
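As a rough Python sketch, the leading fields of an inventory line can be pulled out like this. It is written against the sample line as it appears in this post, so it matches on whitespace with a regular expression rather than on the column positions given in readme.temperature, and it leaves the remaining fields uninterpreted. The function name and dictionary keys are my own.

```python
import re

# id, name (may contain spaces), latitude, longitude, elevation
INV_PATTERN = re.compile(
    r"^(\d{11})\s+(.+?)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+)\b"
)


def parse_inventory_line(line):
    """Return the leading fields of a v2.temperature.inv line as a dict."""
    m = INV_PATTERN.match(line)
    if m is None:
        raise ValueError("unrecognised inventory line")
    station_id, name, lat, lon, elev = m.groups()
    return {
        "id": station_id,
        "name": name,
        "latitude": float(lat),
        "longitude": float(lon),
        "elevation": int(elev),
    }


station = parse_inventory_line(
    "10160355000 SKIKDA 36.93 6.95 7 18U 107HIxxCO 1x-9WARM DECIDUOUS C"
)
print(station["name"], station["latitude"], station["longitude"])  # SKIKDA 36.93 6.95
```

The name group is matched lazily up to the first decimal number, so a multi-word station name should still come out in one piece.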

I am going to focus on the raw data file first. Tomorrow I will write a program to load the data.

I changed the template for the blog; the old template was ridiculous. Even with a wide monitor the blog text was constrained to a narrow column. I am amazed that any such template could exist - it's unreadable. The new template should allow the text to fit the width of the window. I also removed pointless clutter like the "followers" box and the "about me" box.

