pipeline.Rmd
There are a few helper functions to help you find out which datasets are available, as well as the corresponding editions and versions. The `ons_available_datasets()` function returns a dataframe with information about all available datasets. The `id` column is what you need to download a dataset.
library(dplyr)  # for select()

datasets <- ons_available_datasets()
datasets %>%
  select(id)
id
1 cpih01
2 mid-year-pop-est
3 ashe-table-7-hours
4 ashe-table-7-earnings
5 ashe-table-8-hours
6 ashe-table-8-earnings
7 opss-rates
8 opss-membership
9 wellbeing-year-ending
10 wellbeing-local-authority
...
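Because `ons_available_datasets()` returns an ordinary dataframe, you can search it with standard dplyr verbs. For example, a quick sketch that looks for wellbeing-related datasets by matching against the `id` column:

```r
library(dplyr)

# Keep only the dataset ids that mention "wellbeing"
ons_available_datasets() %>%
  filter(grepl("wellbeing", id)) %>%
  select(id)
```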
Once you have picked a dataset, you need to pick the edition you want. This can be done using `ons_available_editions()`.
# Discover the available editions for a particular dataset
ons_available_editions(id = "mid-year-pop-est")
edition
<chr>
1 mid-2018-april-2019-geography
2 mid-2019-april-2020-geography
3 time-series
Finally, you need to find out what versions are available for a specific edition of a dataset.
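A minimal sketch of that step, assuming the package exposes an `ons_available_versions()` helper that mirrors `ons_available_editions()` (check the package documentation for the exact name and arguments):

```r
# Assumed helper: discover the available versions for a specific edition of a dataset
ons_available_versions(id = "mid-year-pop-est", edition = "mid-2019-april-2020-geography")
```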
You should now be ready to download the data. Start by specifying where you want the data to be downloaded to. `monstr_pipeline_defaults()` returns a default folder structure (without creating it). You can specify the base file path using the `download_root` argument. If you do not specify `download_root`, the base file path will be your project root if you are using RStudio projects, and otherwise wherever your working directory is set. The output from `monstr_pipeline_defaults()` is then fed to `ons_datasets_setup()`, which queries the ONS API to get the relevant information needed to download the data. Finally, `ons_download()` downloads the data. The rest of the piped code reads in, cleans and saves a clean version of the data.
monstr_pipeline_defaults(download_root = "/path/to/download/root/") %>%
  ons_datasets_setup() %>%                               # Uses the monstr 'standards' for location and format
  ons_dataset_by_id("weekly-deaths-local-authority") %>%
  ons_download(format = "csv") %>%                       # download the raw data from the ONS API
  monstr_read_file() %>%                                 # read the downloaded file into R
  monstr_clean() %>%                                     # clean the data
  monstr_write_clean(format = "all")                     # save the clean version
If you only want the raw file, stop the pipeline after `ons_download()`:

ons_datasets_setup(monstr_pipeline_defaults()) %>%
  ons_dataset_by_id("weekly-deaths-local-authority") %>%
  ons_download(format = "csv")
# file will be in `{{root}}/data/raw/ons/weekly-deaths-local-authority/time-series/vN.csv`
# metadata about the file will be in `{{root}}/data/raw/ons/weekly-deaths-local-authority/time-series/vN.csv.meta.json`
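If you want to inspect that metadata directly, one option (a sketch, not part of the monstr pipeline itself) is to parse the sidecar file with jsonlite. Replace `{{root}}` and `vN` with your actual download root and version number:

```r
library(jsonlite)

# Read the .meta.json sidecar written next to the downloaded data
meta <- read_json("{{root}}/data/raw/ons/weekly-deaths-local-authority/time-series/vN.csv.meta.json")
str(meta, max.level = 1)  # show the top-level structure of the metadata
```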
The same dataset can be downloaded as an Excel file by changing the `format` argument:

ons_datasets_setup(monstr_pipeline_defaults()) %>%
  ons_dataset_by_id("weekly-deaths-local-authority") %>%
  ons_download(format = "xls")
datasets <- ons_datasets_setup(monstr_pipeline_defaults())

## Get the metadata about v4 of the time-series edition of the weekly-deaths-local-authority dataset
wdla4_meta <- datasets %>%
  ons_dataset_by_id("weekly-deaths-local-authority", edition = "time-series", version = 4)

# Download it (the pipeline defaults were already applied by ons_datasets_setup() above)
wdla4_meta %>%
  ons_download(format = "csv")
# Or get the latest
wdla_latest <- datasets %>% ons_dataset_by_id("weekly-deaths-local-authority", edition="time-series")
# The API also provides "CSV on the Web" (csvw) metadata describing the schema of the data
wdla_latest %>% ons_download(format = "csv")