Perform quality assurance on a raw IEA data file
iea_file_OK.Rd
When starting to work with an IEA data file,
it is important to verify its integrity.
This function performs some validation tests on .iea_file.
Usage
iea_file_OK(
.iea_file = NULL,
text = NULL,
expected_1st_line_start = ",,TIME",
expected_2nd_line_start = "COUNTRY,FLOW,PRODUCT",
expected_simple_start = expected_2nd_line_start,
.slurped_iea_df = NULL,
country = "COUNTRY",
flow = "FLOW",
product = "PRODUCT",
rowid = "rowid"
)
Arguments
- .iea_file
The path to the raw IEA data file for which quality assurance is desired. Can be a vector of file paths, in which case each file is loaded sequentially and stacked together with dplyr::bind_rows().
- text
a string containing text to be parsed as an IEA file.
- expected_1st_line_start
the expected start of the first line of .iea_file. Default is ",,TIME".
- expected_2nd_line_start
the expected start of the second line of .iea_file. Default is "COUNTRY,FLOW,PRODUCT".
- expected_simple_start
an alternative expected start of the first line of .iea_file. Default is the value of expected_2nd_line_start. Note that a first line starting with expected_simple_start is sometimes encountered in data supplied by the IEA. Furthermore, it could be the format of the file when somebody "helpfully" fiddles with the raw data from the IEA.
- .slurped_iea_df
a data frame created by slurp_iea_to_raw_df().
- country
the name of the country column. Default is "COUNTRY".
- flow
the name of the flow column. Default is "FLOW".
- product
the name of the product column. Default is "PRODUCT".
- rowid
the name of a row number column added internally to .iea_file per country. Default is "rowid".
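The three expected_* arguments describe two header layouts: a two-line header, and a simple header whose first line already starts with the column names. A minimal sketch of how such a check could work is shown here. The helper classify_iea_header() and the sample lines are hypothetical and not part of the package, and the way iea_file_OK() actually combines these arguments is an assumption.

```r
# Hypothetical helper (not part of the package): classify the header layout
# of raw IEA text by looking at the start of its first two lines.
classify_iea_header <- function(lines,
                                expected_1st_line_start = ",,TIME",
                                expected_2nd_line_start = "COUNTRY,FLOW,PRODUCT",
                                expected_simple_start = expected_2nd_line_start) {
  stopifnot(is.character(lines), length(lines) >= 1)
  first <- lines[1]
  second <- if (length(lines) >= 2) lines[2] else ""
  # Two-line layout: both lines start as expected.
  if (startsWith(first, expected_1st_line_start) &&
      startsWith(second, expected_2nd_line_start)) {
    return("two-line header")
  }
  # Simple layout: the first line already starts with the column names.
  if (startsWith(first, expected_simple_start)) {
    return("simple header")
  }
  "unrecognized"
}

# Made-up sample lines for illustration only.
two_line <- c(",,TIME,1971,1972",
              "COUNTRY,FLOW,PRODUCT,,",
              "World,Production,Coal,1,2")
simple <- c("COUNTRY,FLOW,PRODUCT,1971,1972",
            "World,Production,Coal,1,2")

classify_iea_header(two_line)
classify_iea_header(simple)
```

Because expected_simple_start defaults to expected_2nd_line_start, overriding the second-line start also changes what counts as a simple header unless expected_simple_start is set explicitly.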
Details
At this time, the only verification step performed by this function is confirming that every country has the same flow and product rows in the same order. The approach is to add a per-country row number column to the data frame and delete all the data in year columns. Then, the resulting data frame is queried for duplicate row numbers. If none are found, the function returns the data frame read from the file.
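The approach described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper same_rows_per_country() and both toy data frames are hypothetical, and the extra check that all countries have the same number of rows is an addition of this sketch.

```r
# Hypothetical helper (not part of the package): TRUE when every country
# lists the same flow/product pairs in the same order.
same_rows_per_country <- function(df, country = "COUNTRY", flow = "FLOW",
                                  product = "PRODUCT", rowid = "rowid") {
  # Keep only the identifying columns, i.e. drop all year columns.
  ids <- df[, c(country, flow, product), drop = FALSE]
  # Number the rows within each country, in file order.
  ids[[rowid]] <- ave(seq_len(nrow(ids)), ids[[country]], FUN = seq_along)
  # After dropping the country, each row number should map to exactly one
  # flow/product pair; a duplicated row number signals a mismatch.
  pairs <- unique(ids[, c(rowid, flow, product)])
  no_duplicate_rowids <- !any(duplicated(pairs[[rowid]]))
  # Assumption of this sketch: all countries must also have equally many rows.
  n_rows <- as.vector(table(ids[[country]]))
  no_duplicate_rowids && length(unique(n_rows)) == 1
}

# Toy data: two countries with identical row order.
good <- data.frame(COUNTRY = c("A", "A", "B", "B"),
                   FLOW = c("Production", "Imports", "Production", "Imports"),
                   PRODUCT = "Coal",
                   `1971` = 1:4,
                   check.names = FALSE)
# Toy data: country B lists its flows in a different order.
bad <- good
bad$FLOW[3:4] <- c("Imports", "Production")

same_rows_per_country(good)
same_rows_per_country(bad)
```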
Note that .iea_file is read internally with data.table::fread() without stripping white space.
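The effect of not stripping white space can be seen in isolation with the strip.white argument of data.table::fread(). This is a minimal sketch that assumes the data.table package is installed. The input text is made up, and the exact arguments that iea_file_OK() passes to fread() are an assumption.

```r
library(data.table)

# Made-up input with padding around the FLOW value.
txt <- "COUNTRY,FLOW,1971\nWorld, Production ,1\n"

# Keep leading and trailing white space in unquoted fields.
kept <- fread(text = txt, sep = ",", header = TRUE, strip.white = FALSE)
# Default behaviour (strip.white = TRUE) trims the padding.
stripped <- fread(text = txt, sep = ",", header = TRUE)

nchar(kept$FLOW)
nchar(stripped$FLOW)
```

Comparing the two character counts shows whether padding around a field survives the read.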
If .slurped_iea_df is supplied, the arguments .iea_file and text are ignored.
If .slurped_iea_df is NULL (the default), either .iea_file or text is required, and the helper function slurp_iea_to_raw_df() is called internally to load the raw data frame.
This function is vectorized over .iea_file.
Examples
library(magrittr)
sample_iea_data_path() %>%
  iea_file_OK()
#> [1] TRUE
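As the Details explain, a data frame that has already been loaded can be checked directly, in which case .iea_file and text are ignored. A minimal sketch follows. It assumes the package providing these functions (IEATools) is installed and that slurp_iea_to_raw_df() accepts the file path through an .iea_file argument.

```r
library(IEATools)

# Read the sample file once with the helper named in the Details ...
raw_df <- slurp_iea_to_raw_df(.iea_file = sample_iea_data_path())

# ... then run the check on the already-loaded data frame.
result <- iea_file_OK(.slurped_iea_df = raw_df)
result
```

This pattern avoids reading the same file twice when the raw data frame is needed for later steps anyway.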