Slurp an IEA extended energy balance data file
slurp_iea_to_raw_df.Rd
This is the internal helper function that reads IEA data files.
This function reads an IEA extended energy balances .csv file and
converts it to a data frame with appropriately-labeled columns.
One of .iea_file or text must be specified, but not both.
The first line of .iea_file or text is expected to start with expected_1st_line_start, and
the second line is expected to start with expected_2nd_line_start,
optionally followed by any number of commas.
(The extra commas might come from opening and re-saving the file in Excel.)
Alternatively, the file may have a first line that starts with expected_simple_start.
If neither of these conditions is met, execution is halted, and
an error message is provided.
Files should have a return character at the end of their final line.
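The header rules described here can be sketched in base R. This is a minimal illustration, not the package's implementation: classify_iea_header() is a hypothetical helper, and it assumes that the second header line contains nothing but expected_2nd_line_start followed by zero or more commas.

```r
# Hypothetical helper (not part of IEATools): classify the header layout
# of an IEA file from its first line(s), or stop with an error.
classify_iea_header <- function(lines,
                                expected_1st_line_start = ",,TIME",
                                expected_2nd_line_start = "COUNTRY,FLOW,PRODUCT",
                                expected_simple_start = expected_2nd_line_start) {
  # Two-line header: line 1 starts with ",,TIME" and line 2 is
  # "COUNTRY,FLOW,PRODUCT" followed only by (possibly zero) commas.
  if (length(lines) >= 2 && startsWith(lines[1], expected_1st_line_start)) {
    rest <- substring(lines[2], nchar(expected_2nd_line_start) + 1)
    if (startsWith(lines[2], expected_2nd_line_start) && grepl("^,*$", rest)) {
      return("two_line")
    }
  }
  # Simple header: a single line carrying both the id columns and the years.
  if (length(lines) >= 1 && startsWith(lines[1], expected_simple_start)) {
    return("simple")
  }
  stop("Unrecognized IEA header; first line was: ", lines[1])
}

classify_iea_header(c(",,TIME,1960,1961", "COUNTRY,FLOW,PRODUCT,,,"))  # "two_line"
classify_iea_header("COUNTRY,FLOW,PRODUCT,1960,1961")                  # "simple"
```

Any other first line stops with an error in this sketch, mirroring the halting behavior described above.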
Usage
slurp_iea_to_raw_df(
.iea_file = NULL,
text = NULL,
expected_1st_line_start = ",,TIME",
country = "COUNTRY",
expected_2nd_line_start = paste0(country, ",FLOW,PRODUCT"),
expected_simple_start = expected_2nd_line_start,
ensure_ascii_countries = TRUE
)
Arguments
- .iea_file
The path to the raw IEA data file for which quality assurance is desired. Can be a vector of file paths, in which case each file is loaded sequentially and stacked together with dplyr::bind_rows().
- text
A string containing text to be parsed as an IEA file. Can be a vector of text strings, in which case each string is processed sequentially and stacked together with dplyr::bind_rows().
- expected_1st_line_start
The expected start of the first line of .iea_file. Default is ",,TIME".
- country
The name of the country column. Default is "COUNTRY".
- expected_2nd_line_start
The expected start of the second line of .iea_file. Default is "COUNTRY,FLOW,PRODUCT".
- expected_simple_start
The expected start of the first line of .iea_file. Default is the value of expected_2nd_line_start. Note that expected_simple_start is sometimes encountered in data supplied by the IEA. Furthermore, expected_simple_start could be the format of the file when somebody "helpfully" fiddles with the raw data from the IEA.
- ensure_ascii_countries
A boolean that tells whether to convert country names to pure ASCII, removing diacritical marks and accents. Default is TRUE.
Details
This function is designed to work as more years are added
in columns at the right of the .iea_file,
because column names in the output are constructed from the header line(s) of .iea_file
(which contain years and country, flow, product information).
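To illustrate why additional year columns need no special handling, here is a minimal base R sketch that builds column names from a two-line header. build_iea_col_names() is a hypothetical helper written for this illustration; the package's own approach may differ.

```r
# Hypothetical helper (not part of IEATools): combine the id columns from
# header line 2 with the years from header line 1.
build_iea_col_names <- function(header_1, header_2) {
  h1 <- strsplit(header_1, ",", fixed = TRUE)[[1]]
  h2 <- strsplit(header_2, ",", fixed = TRUE)[[1]]
  id_cols <- h2[nzchar(h2)]             # drop empty fields from extra commas
  year_cols <- h1[-seq_along(id_cols)]  # skip the leading ",,TIME" fields
  c(id_cols, year_cols)
}

col_names <- build_iea_col_names(",,TIME,1960,1961", "COUNTRY,FLOW,PRODUCT,,,")

# Read a data line without a header, then attach the constructed names.
df <- read.csv(text = "World,Production,Hard coal (if no detail),42,43",
               header = FALSE, stringsAsFactors = FALSE)
names(df) <- col_names
```

Because the year names are taken from whatever follows the id columns on the first line, a file with an extra year column yields an extra name without any change to the code.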
Extended energy balance data can be obtained from the IEA as a *.ivt file. To export the data for use with the IEATools package, perform the following actions:
1. Open the *.ivt file in the Beyond 20/20 browser on a relatively high-powered computer with lots of memory, because the file is very large.
2. Arrange the columns in the following order: "COUNTRY", "FLOW", "PRODUCT", followed by years.
3. Change to the unit (ktoe or TJ) desired.
4. Save the results in .csv format. (Saving may take a while.)
This function is vectorized over .iea_file.
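The vectorized behavior can be pictured as "read each input, then stack the results." The sketch below shows that pattern in base R under stated assumptions: read_one() is a hypothetical stand-in for reading a single input, and do.call(rbind, ...) stands in for dplyr::bind_rows(), which works here only because both inputs have identical columns.

```r
# Hypothetical single-input reader; keeps year column names such as "1960".
read_one <- function(txt) {
  read.csv(text = txt, check.names = FALSE, stringsAsFactors = FALSE)
}

texts <- c(
  "COUNTRY,FLOW,PRODUCT,1960,1961\nWorld,Production,Hard coal (if no detail),42,43",
  "COUNTRY,FLOW,PRODUCT,1960,1961\nWorld,Imports,Hard coal (if no detail),1,2"
)

# Process each string in turn, then stack the resulting data frames by row.
stacked <- do.call(rbind, lapply(texts, read_one))
```

With inputs whose year columns differ, a row-binding function that fills missing columns (as dplyr::bind_rows() does) would be needed instead of rbind().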
Examples
# 2018 and earlier file format
slurp_iea_to_raw_df(text = paste0(",,TIME,1960,1961\n",
                                  "COUNTRY,FLOW,PRODUCT\n",
                                  "World,Production,Hard coal (if no detail),42,43"))
#>    COUNTRY       FLOW                  PRODUCT  1960  1961
#>     <char>     <char>                   <char> <int> <int>
#> 1:   World Production Hard coal (if no detail)    42    43
# With extra commas on the 2nd line
slurp_iea_to_raw_df(text = paste0(",,TIME,1960,1961\n",
                                  "COUNTRY,FLOW,PRODUCT,,,\n",
                                  "World,Production,Hard coal (if no detail),42,43"))
#>    COUNTRY       FLOW                  PRODUCT  1960  1961
#>     <char>     <char>                   <char> <int> <int>
#> 1:   World Production Hard coal (if no detail)    42    43
# With a clean first line (2019 file format)
slurp_iea_to_raw_df(text = paste0("COUNTRY,FLOW,PRODUCT,1960,1961\n",
                                  "World,Production,Hard coal (if no detail),42,43"))
#>    COUNTRY       FLOW                  PRODUCT  1960  1961
#>     <char>     <char>                   <char> <int> <int>
#> 1:   World Production Hard coal (if no detail)    42    43