Slurp an IEA extended energy balance data file
This is the internal helper function that reads IEA data files.
This function reads an IEA extended energy balances .csv file and
converts it to a data frame with appropriately-labeled columns.
Exactly one of .iea_file or text must be specified.
The first line of .iea_file or text
is expected to start with expected_1st_line_start, and
the second line is expected to start with expected_2nd_line_start,
optionally followed by any number of commas.
(The extra commas might come from opening and re-saving the file in Excel.)
Alternatively, the file may have a first line that starts with expected_simple_start.
If neither of these conditions is met, execution is halted, and
an error message is provided.
Files should have a return character at the end of their final line.
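The header checks described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: classify_iea_header is a hypothetical name that does not exist in IEATools, it accepts a single string rather than a vector, and it treats "any number of commas appended" as meaning that nothing except commas may follow the expected start of the second line.

```r
# Hypothetical helper (not part of IEATools): decide which header layout
# a piece of IEA-style text uses, or stop with an error.
classify_iea_header <- function(text,
                                expected_1st_line_start = ",,TIME",
                                expected_2nd_line_start = "COUNTRY,FLOW,PRODUCT",
                                expected_simple_start = expected_2nd_line_start) {
  # Split the text into lines and pick out the first two.
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  first <- if (length(lines) >= 1) lines[1] else ""
  second <- if (length(lines) >= 2) lines[2] else ""

  # Whatever follows the expected start of the 2nd line.
  # Assumption: only trailing commas are allowed there.
  rest <- substring(second, nchar(expected_2nd_line_start) + 1)

  two_line <- startsWith(first, expected_1st_line_start) &&
    startsWith(second, expected_2nd_line_start) &&
    grepl("^,*$", rest)
  if (two_line) return("two_line")

  # Single header line that already begins with the column names.
  if (startsWith(first, expected_simple_start)) return("simple")

  stop("Unrecognized IEA header. First line: ", first)
}

classify_iea_header(",,TIME,1960,1961\nCOUNTRY,FLOW,PRODUCT,,,\nWorld,Production,Coal,42,43\n")
classify_iea_header("COUNTRY,FLOW,PRODUCT,1960,1961\nWorld,Production,Coal,42,43\n")
```

The first call has a two-line header with extra commas and the second has a single header line; any other layout reaches stop().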
Usage
slurp_iea_to_raw_df(
.iea_file = NULL,
text = NULL,
expected_1st_line_start = ",,TIME",
country = "COUNTRY",
expected_2nd_line_start = paste0(country, ",FLOW,PRODUCT"),
expected_simple_start = expected_2nd_line_start,
ensure_ascii_countries = TRUE
)
Arguments
- .iea_file
The path to the raw IEA data file for which quality assurance is desired. Can be a vector of file paths, in which case each file is loaded sequentially and stacked together with dplyr::bind_rows().
- text
A string containing text to be parsed as an IEA file. Can be a vector of text strings, in which case each string is processed sequentially and stacked together with dplyr::bind_rows().
- expected_1st_line_start
The expected start of the first line of .iea_file. Default is ",,TIME".
- country
The name of the country column. Default is "COUNTRY".
- expected_2nd_line_start
The expected start of the second line of .iea_file. Default is paste0(country, ",FLOW,PRODUCT"), which is "COUNTRY,FLOW,PRODUCT" when country has its default value.
- expected_simple_start
The expected start of the first line of .iea_file when the file has a single header line. Default is the value of expected_2nd_line_start. Note that expected_simple_start is sometimes encountered in data supplied by the IEA. Furthermore, expected_simple_start could be the format of the file when somebody "helpfully" fiddles with the raw data from the IEA.
- ensure_ascii_countries
A boolean that tells whether to convert country names to pure ASCII, removing diacritical marks and accents. Default is TRUE.
Details
This function is designed to work as more years are added
in columns at the right of the .iea_file,
because column names in the output are constructed from the header line(s) of .iea_file
(which contain years and country, flow, product information).
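To illustrate why adding years at the right does not require code changes, the following base R sketch builds column names from a two-line header. It is an assumption about the general approach, not the package's actual code, and make_col_names is a hypothetical name.

```r
# Hypothetical helper (not part of IEATools): combine the two header
# lines of the older file format into one vector of column names.
make_col_names <- function(first_line, second_line) {
  first_fields <- strsplit(first_line, ",", fixed = TRUE)[[1]]
  second_fields <- strsplit(second_line, ",", fixed = TRUE)[[1]]

  # The non-empty fields of the 2nd line name the identifier columns
  # (COUNTRY, FLOW, PRODUCT); trailing commas give empty fields.
  id_cols <- second_fields[nzchar(second_fields)]

  # The remaining fields of the 1st line are the years, however many
  # there are.
  year_cols <- first_fields[-seq_along(id_cols)]

  c(id_cols, year_cols)
}

make_col_names(",,TIME,1960,1961", "COUNTRY,FLOW,PRODUCT")
make_col_names(",,TIME,1960,1961,1962", "COUNTRY,FLOW,PRODUCT,,,")
```

The second call has one more year in its first line, and the result gains a "1962" entry without any change to the helper.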
Extended energy balance data can be obtained from the IEA as a *.ivt file. To export the data for use with the IEATools package, perform the following actions:
- Open the *.ivt file in the Beyond 20/20 browser on a relatively high-powered computer with lots of memory, because the file is very large.
- Arrange the columns in the following order: "COUNTRY", "FLOW", "PRODUCT", followed by years.
- Change to the unit (ktoe or TJ) desired.
- Save the results in .csv format. (Saving may take a while.)
This function is vectorized over .iea_file.
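The stacking behavior for several inputs can be sketched as follows. This is a rough illustration rather than the function's internals: it assumes the dplyr package is installed, reads each input with utils::read.csv() as a stand-in for the real reader, and uses made-up text in the single-header-line format.

```r
# Two made-up inputs in the single-header-line format.
# The second covers a different range of years than the first.
texts <- c(
  "COUNTRY,FLOW,PRODUCT,1960,1961\nWorld,Production,Hard coal (if no detail),42,43\n",
  "COUNTRY,FLOW,PRODUCT,1961,1962\nWorld,Imports,Hard coal (if no detail),5,6\n"
)

# Read each input separately, keeping year column names such as "1960"
# unchanged (check.names = FALSE).
pieces <- lapply(texts, function(txt) {
  utils::read.csv(text = txt, check.names = FALSE)
})

# Stack the pieces; columns missing from one piece are filled with NA.
stacked <- dplyr::bind_rows(pieces)
stacked
```

Because dplyr::bind_rows() matches columns by name, inputs that cover different years can still be combined into one data frame.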
Examples
# 2018 and earlier file format
slurp_iea_to_raw_df(text = paste0(",,TIME,1960,1961\n",
                                  "COUNTRY,FLOW,PRODUCT\n",
                                  "World,Production,Hard coal (if no detail),42,43"))
#>    COUNTRY       FLOW                  PRODUCT  1960  1961
#>     <char>     <char>                   <char> <int> <int>
#> 1:   World Production Hard coal (if no detail)    42    43
# With extra commas on the 2nd line
slurp_iea_to_raw_df(text = paste0(",,TIME,1960,1961\n",
                                  "COUNTRY,FLOW,PRODUCT,,,\n",
                                  "World,Production,Hard coal (if no detail),42,43"))
#>    COUNTRY       FLOW                  PRODUCT  1960  1961
#>     <char>     <char>                   <char> <int> <int>
#> 1:   World Production Hard coal (if no detail)    42    43
# With a clean first line (2019 file format)
slurp_iea_to_raw_df(text = paste0("COUNTRY,FLOW,PRODUCT,1960,1961\n",
                                  "World,Production,Hard coal (if no detail),42,43"))
#>    COUNTRY       FLOW                  PRODUCT  1960  1961
#>     <char>     <char>                   <char> <int> <int>
#> 1:   World Production Hard coal (if no detail)    42    43