Retrieving precipitation data from NOAA
This post details how to retrieve the daily precipitation data needed to create the graph demonstrated here.
Getting the data
This graph requires 3 datasets, all of them from a specific NOAA weather station.
- current year (2020) daily precipitation counts
- climatological normal year-to-date precipitation
- daily record high and low year-to-date precipitation
First, choose a weather station which has been around long enough for NOAA to calculate climatological normals. Download the full list as a .txt file. The first and last columns contain unique station IDs.
I’m using the NOAA station at Milwaukee’s primary airport, General Mitchell Intl (MKE). The station code is USW00014839
and the district code is 726400
.
current year weather reports
All of the 2020 daily weather reports are available as individual files for each station in the directory ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2020/
. The URL for the MKE station is ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2020/726400-14839-2020.op.gz
.
This code downloads the data:
library(tidyverse)
general.mitchell.2020 <- read_table(file = "ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2020/726400-14839-2020.op.gz") %>%
# just select station_id, date, and precipitation
select(station_id = 1, year_month_day = 3, precip = PRCP) %>%
separate(year_month_day, sep = c(4,6), into = c("year","month","day")) %>%
mutate(
# this removes the data quality flag
precip = str_remove(precip, "G|I|H"),
# values of 99.99 indicate missing value, replace with 0
precip = as.numeric(replace(precip, precip == 99.99, 0)),
# create cumulative precipitation
cum_precip_2020 = cumsum(precip),
# convert column classes
year = as.numeric(year),
month = as.numeric(month),
day = as.numeric(day),
station_id = as.character(station_id),
# create date variable
date = as.Date(paste(year, month, day, sep = "-"))) %>%
# just keep the variables we want
select(month, day, cum_precip_2020)
## Warning: Missing column names filled in: 'X5' [5], 'X7' [7], 'X9' [9],
## 'X11' [11], 'X13' [13], 'X15' [15]
head(general.mitchell.2020)
## # A tibble: 6 x 3
## month day cum_precip_2020
## <dbl> <dbl> <dbl>
## 1 1 1 0.06
## 2 1 2 0.06
## 3 1 3 0.06
## 4 1 4 0.06
## 5 1 5 0.06
## 6 1 6 0.06
climatological normals
This code downloads the year-to-date precipitation normals for each weather station. After downloading, we filter for just the Milwaukee station.
precip.normals <- read_table("ftp://ftp.ncdc.noaa.gov/pub/data/normals/1981-2010/products/precipitation/ytd-prcp-normal.txt",
col_names = c("station_id", "month", paste(1:31))) %>%
# pivot so that there is one row for each date
pivot_longer(names_to = "day", values_to = "precip",
cols = -c(station_id, month)) %>%
# extract the quality flag from end of precip (details in documentation)
mutate(quality_flag = str_sub(precip, -1),
# replace flag with NA when there is no letter flag
quality_flag = replace(quality_flag, ! quality_flag %in% c("C","P","Q","R","S"), NA),
# remove flags from precipitation
precip = str_remove(precip, "C|P|Q|R|S"),
# convert classes
day = as.numeric(day),
month = as.numeric(month),
precip = as.numeric(precip)) %>%
# remove dates that don't exists
filter(precip != -8888) %>%
# convert precipitation to inches
mutate(precip = precip/100,
precip = replace(precip, precip < 0, NA),
# replace cumprecip is missing with the previous day's value
precip = zoo::na.locf(precip)) %>%
# filter for the desired station
filter(station_id == "USW00014839") %>%
select(-quality_flag)
head(precip.normals)
## # A tibble: 6 x 4
## station_id month day precip
## <chr> <dbl> <dbl> <dbl>
## 1 USW00014839 1 1 0.06
## 2 USW00014839 1 2 0.12
## 3 USW00014839 1 3 0.18
## 4 USW00014839 1 4 0.24
## 5 USW00014839 1 5 0.3
## 6 USW00014839 1 6 0.36
records
As far as I know, NOAA doesn’t provide a dataset with station records, so I had to calculate this myself. You could download each annual file from NOAA’s FTP server by changing the year in the file path above, or you could use NOAA’s data download user interface. Once you’ve acquired the data, calculate the cumulative year-to-date precipitation for each day, then group by day of year. Calculate the standard deviation using just the years 1981-2010, because this is the 30-year period used to calculate the climatological normals retrieved above. Your output should look something like this. You can also download this file using the provided dropbox link.
precip.records <- read_rds(url("https://www.dropbox.com/s/62w9jtx5inae3l8/MKE_precip_records.rds?dl=1"))
head(precip.records)
## # A tibble: 6 x 5
## month day ytd_precip_sd max_cum_precip low_cum_precip
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 0.202 1.11 0
## 2 1 2 0.333 1.25 0
## 3 1 3 0.346 1.26 0
## 4 1 4 0.427 1.41 0
## 5 1 5 0.436 1.41 0
## 6 1 6 0.470 1.58 0
combine the data
Join the three data.frames
by month and day, then create a day-of-year variable.
all.precip.data <- inner_join(precip.normals, precip.records) %>%
left_join(general.mitchell.2020) %>%
mutate(dayofleapyear = 1:length(station_id))
head(all.precip.data)
## # A tibble: 6 x 9
## station_id month day precip ytd_precip_sd max_cum_precip low_cum_precip
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 USW000148… 1 1 0.06 0.202 1.11 0
## 2 USW000148… 1 2 0.12 0.333 1.25 0
## 3 USW000148… 1 3 0.18 0.346 1.26 0
## 4 USW000148… 1 4 0.24 0.427 1.41 0
## 5 USW000148… 1 5 0.3 0.436 1.41 0
## 6 USW000148… 1 6 0.36 0.470 1.58 0
## # … with 2 more variables: cum_precip_2020 <dbl>, dayofleapyear <int>