Retrieving precipitation data from NOAA

Fri, Jun 5, 2020 4-minute read

This post details how to retrieve the daily precipitation data needed to create the graph demonstrated here.

Getting the data

This graph requires 3 datasets, all of them from a specific NOAA weather station.

  • current year (2020) daily precipitation counts
  • climatological normal year-to-date precipitation
  • daily record high and low year-to-date precipitation

First, choose a weather station which has been around long enough for NOAA to calculate climatological normals. Download the full list as a .txt file. The first and last columns contain unique station IDs.

I’m using the NOAA station at Milwaukee’s primary airport, General Mitchell Intl (MKE). The station code is USW00014839 and the district code is 726400.

current year weather reports

All of the 2020 daily weather reports are available as individual files for each station in the directory ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2020/. The URL for the MKE station is ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2020/726400-14839-2020.op.gz.

This code downloads the data:

library(tidyverse)

general.mitchell.2020 <- read_table(file = "ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2020/726400-14839-2020.op.gz") %>%
  # just select station_id, date, and precipitation
  select(station_id = 1, year_month_day = 3, precip = PRCP) %>%
  separate(year_month_day, sep = c(4,6), into = c("year","month","day")) %>%
  mutate(
    # this removes the data quality flag
    precip = str_remove(precip, "G|I|H"),
    # values of 99.99 indicate missing value, replace with 0
    precip = as.numeric(replace(precip, precip == 99.99, 0)),
    # create cumulative precipitation
    cum_precip_2020 = cumsum(precip),
    # convert column classes
    year = as.numeric(year),
    month = as.numeric(month),
    day = as.numeric(day),
    station_id = as.character(station_id),
    # create date variable
    date = as.Date(paste(year, month, day, sep = "-"))) %>%
  # just keep the variables we want
  select(month, day, cum_precip_2020)
## Warning: Missing column names filled in: 'X5' [5], 'X7' [7], 'X9' [9],
## 'X11' [11], 'X13' [13], 'X15' [15]
head(general.mitchell.2020)
## # A tibble: 6 x 3
##   month   day cum_precip_2020
##   <dbl> <dbl>           <dbl>
## 1     1     1            0.06
## 2     1     2            0.06
## 3     1     3            0.06
## 4     1     4            0.06
## 5     1     5            0.06
## 6     1     6            0.06

climatological normals

This code downloads the year-to-date precipitation normals for each weather station. After downloading, we filter for just the Milwaukee station.

precip.normals <- read_table("ftp://ftp.ncdc.noaa.gov/pub/data/normals/1981-2010/products/precipitation/ytd-prcp-normal.txt",
                             col_names = c("station_id", "month", paste(1:31))) %>%
  # pivot so that there is one row for each date
  pivot_longer(names_to = "day", values_to = "precip",
               cols = -c(station_id, month)) %>%
  # extract the quality flag from end of precip (details in documentation)
  mutate(quality_flag = str_sub(precip, -1),
         # replace flag with NA when there is no letter flag
         quality_flag = replace(quality_flag, ! quality_flag %in% c("C","P","Q","R","S"), NA),
         # remove flags from precipitation
         precip = str_remove(precip, "C|P|Q|R|S"),
         # convert classes
         day = as.numeric(day),
         month = as.numeric(month),
         precip = as.numeric(precip)) %>%
  # remove dates that don't exists
  filter(precip != -8888) %>%
  # convert precipitation to inches
  mutate(precip = precip/100,
         precip = replace(precip, precip < 0, NA),
         # replace cumprecip is missing with the previous day's value
         precip = zoo::na.locf(precip)) %>%
  # filter for the desired station
  filter(station_id == "USW00014839") %>%
  select(-quality_flag)
head(precip.normals)
## # A tibble: 6 x 4
##   station_id  month   day precip
##   <chr>       <dbl> <dbl>  <dbl>
## 1 USW00014839     1     1   0.06
## 2 USW00014839     1     2   0.12
## 3 USW00014839     1     3   0.18
## 4 USW00014839     1     4   0.24
## 5 USW00014839     1     5   0.3 
## 6 USW00014839     1     6   0.36

records

As far as I know, NOAA doesn’t provide a dataset with station records, so I had to calculate this myself. You could download each annual file from NOAA’s FTP server by changing the year in the file path above, or you could use NOAA’s data download user interface. Once you’ve acquired the data, calculate the cumulative year-to-date precipitation for each day, then group by day of year. Calculate the standard deviation using just the years 1981-2010, because this is the 30-year period used to calculate the climatological normals retrieved above. Your output should look something like this. You can also download this file using the provided dropbox link.

precip.records <- read_rds(url("https://www.dropbox.com/s/62w9jtx5inae3l8/MKE_precip_records.rds?dl=1"))
head(precip.records)
## # A tibble: 6 x 5
##   month   day ytd_precip_sd max_cum_precip low_cum_precip
##   <dbl> <dbl>         <dbl>          <dbl>          <dbl>
## 1     1     1         0.202           1.11              0
## 2     1     2         0.333           1.25              0
## 3     1     3         0.346           1.26              0
## 4     1     4         0.427           1.41              0
## 5     1     5         0.436           1.41              0
## 6     1     6         0.470           1.58              0

combine the data

Join the three data.frames by month and day, then create a day-of-year variable.

all.precip.data <- inner_join(precip.normals, precip.records) %>%
  left_join(general.mitchell.2020) %>%
  mutate(dayofleapyear = 1:length(station_id))
head(all.precip.data)
## # A tibble: 6 x 9
##   station_id month   day precip ytd_precip_sd max_cum_precip low_cum_precip
##   <chr>      <dbl> <dbl>  <dbl>         <dbl>          <dbl>          <dbl>
## 1 USW000148…     1     1   0.06         0.202           1.11              0
## 2 USW000148…     1     2   0.12         0.333           1.25              0
## 3 USW000148…     1     3   0.18         0.346           1.26              0
## 4 USW000148…     1     4   0.24         0.427           1.41              0
## 5 USW000148…     1     5   0.3          0.436           1.41              0
## 6 USW000148…     1     6   0.36         0.470           1.58              0
## # … with 2 more variables: cum_precip_2020 <dbl>, dayofleapyear <int>