RCLabels
Matthew Kuperus Heun
2024-01-30
RCLabels.Rmd
Introduction
Working with matrices often requires manipulating row and column
labels to achieve desired outcomes for matrix mathematics. The
RCLabels package (Row and Column Labels) provides
convenient tools for manipulating those labels.
Use cases
Two applications of matrix mathematics are input-output analysis in economics and physical supply-use table (PSUT) matrices for energy conversion chain (ECC) analysis. In those contexts, row and column labels describe processing stages or flows of goods or services between processing stages. Labels benefit those applications by ensuring that only like quantities are added, subtracted, multiplied, or divided, provided that the labels are respected during matrix operations.
One package that respects row and column labels is matsbyname,
thereby making economic and ECC input-output analyses easier.
Easy manipulation of row and column labels is, therefore, an
enabling capability for using the matsbyname package. This
package (RCLabels) provides easy manipulation of row and column
labels. In fact, the matsbyname package uses RCLabels functions
internally.
Label structure
Row and column labels are always character strings, often with a prefix–suffix structure, where the prefix and suffix are denoted by a separator or delimited in other ways. Example row and column labels include
- “pref -> suff” (separator “->”)
- “pref [suff]” (suffix delimited by “ [” and “]”)
- “(pref) (suff)” (prefix and suffix both surrounded by “(” and “)”)
- “pref.suff” (separator “.”)
Prefixes are usually the “thing” of interest, e.g., an energy carrier (“Coal”) or a processing stage in an energy conversion chain (“Main activity producer electricity plants”). Suffixes are usually modifiers or metadata about the thing (the prefix). Suffixes can describe the destination of an energy carrier (“Light [-> Industry in USA]”) or the output of a processing stage (“Production [of Coal in ZAF]”).
Working with row and column labels
The RCLabels package streamlines working with row and
column labels.
Notation
RCLabels enables creation of notation objects that
describe the structure of a row or column label via the
notation_vec() function.
# Create a notation object.
my_notation <- notation_vec(pref_start = "(", pref_end = ") ",
                            suff_start = "[", suff_end = "]")
# Notation objects are character vectors.
my_notation
#> pref_start pref_end suff_start suff_end
#> "(" ") " "[" "]"
Several notation objects are provided for convenience within RCLabels.
arrow_notation
#> pref_start pref_end suff_start suff_end
#> "" " -> " " -> " ""
paren_notation
#> pref_start pref_end suff_start suff_end
#> "" " (" " (" ")"
bracket_notation
#> pref_start pref_end suff_start suff_end
#> "" " [" " [" "]"
first_dot_notation
#> pref_start pref_end suff_start suff_end
#> "" "." "." ""
from_notation
#> pref_start pref_end suff_start suff_end
#> "" " [from " " [from " "]"
of_notation
#> pref_start pref_end suff_start suff_end
#> "" " [of " " [of " "]"
to_notation
#> pref_start pref_end suff_start suff_end
#> "" " [to " " [to " "]"
bracket_arrow_notation
#> pref_start pref_end suff_start suff_end
#> "" " [-> " " [-> " "]"
Note that identical pref_end and suff_start values (as shown in
all notations above) are interpreted as a single delimiter
throughout the RCLabels package. Empty strings ("") mean that
no indication is given for the start or end of a prefix or suffix.
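The single-delimiter rule can be illustrated with a small base-R sketch. The helper paste_with_notation() below is hypothetical (it is not an RCLabels function) and only mimics the rule described above: when pref_end and suff_start are identical, the delimiter is written once.

```r
# Hypothetical helper, not part of RCLabels.
# `notation` is a named character vector shaped like the notation objects above.
paste_with_notation <- function(pref, suff, notation) {
  if (notation[["pref_end"]] == notation[["suff_start"]]) {
    # Identical values are treated as one delimiter.
    middle <- notation[["pref_end"]]
  } else {
    middle <- paste0(notation[["pref_end"]], notation[["suff_start"]])
  }
  paste0(notation[["pref_start"]], pref, middle, suff, notation[["suff_end"]])
}

# Plain named vectors with the same shape as arrow_notation and my_notation.
arrow_like <- c(pref_start = "", pref_end = " -> ",
                suff_start = " -> ", suff_end = "")
custom <- c(pref_start = "(", pref_end = ") ",
            suff_start = "[", suff_end = "]")

res_arrow <- paste_with_notation("Coal", "Industry", arrow_like)
res_custom <- paste_with_notation("Coal", "from Coal mines in USA", custom)
res_arrow
#> [1] "Coal -> Industry"
res_custom
#> [1] "(Coal) [from Coal mines in USA]"
```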
Creating row and column labels
Row and column labels can be created with the
paste_pref_suff() function.
my_label <- paste_pref_suff(pref = "Coal", suff = "from Coal mines in USA",
                            notation = my_notation)
my_label
#> [1] "(Coal) [from Coal mines in USA]"
Manipulating row and column labels (prefixes and suffixes)
Row and column labels can be manipulated using several helpful functions.
# Split the prefix from the suffix to obtain a named list of strings.
split_pref_suff(my_label, notation = my_notation)
#> $pref
#> [1] "Coal"
#>
#> $suff
#> [1] "from Coal mines in USA"
# Flip the prefix and suffix, maintaining the same notation.
flip_pref_suff(my_label, notation = my_notation)
#> [1] "(from Coal mines in USA) [Coal]"
# Change the notation.
switch_notation(my_label, from = my_notation, to = paren_notation)
#> [1] "Coal (from Coal mines in USA)"
# Change the notation and flip the prefix and suffix.
switch_notation(my_label, from = my_notation, to = paren_notation, flip = TRUE)
#> [1] "from Coal mines in USA (Coal)"
The prefix or suffix can be extracted from a row or column label.
get_pref_suff(my_label, which = "pref", notation = my_notation)
#> pref
#> "Coal"
get_pref_suff(my_label, which = "suff", notation = my_notation)
#> suff
#> "from Coal mines in USA"
Vectors and lists of row and column labels
The functions in RCLabels work with vectors and lists of
row and column labels.
labels <- c("a [of b in c]", "d [of e in f]", "g [of h in i]")
labels
#> [1] "a [of b in c]" "d [of e in f]" "g [of h in i]"
split_pref_suff(labels, notation = bracket_notation)
#> $pref
#> [1] "a" "d" "g"
#>
#> $suff
#> [1] "of b in c" "of e in f" "of h in i"
This feature means that the functions in RCLabels can be
used on data frames. Note that transpose = TRUE ensures
that a single list column is created.
labels
#> [1] "a [of b in c]" "d [of e in f]" "g [of h in i]"
df <- tibble::tibble(labels = labels)
# The %>% pipe requires magrittr (or dplyr) to be attached.
result <- df %>%
  dplyr::mutate(
    split = split_pref_suff(labels, notation = bracket_notation, transpose = TRUE)
  )
result$split[[1]]
#> $pref
#> [1] "a"
#>
#> $suff
#> [1] "of b in c"
result$split[[2]]
#> $pref
#> [1] "d"
#>
#> $suff
#> [1] "of e in f"
result$split[[3]]
#> $pref
#> [1] "g"
#>
#> $suff
#> [1] "of h in i"
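As an alternative to a single list column, prefixes and suffixes can be stored in separate character columns. The sketch below assumes that split_pref_suff() without transpose returns a list with pref and suff vectors, as shown in the vector example above. It uses a base data.frame to avoid extra dependencies.

```r
library(RCLabels)

labels <- c("a [of b in c]", "d [of e in f]", "g [of h in i]")

# Without `transpose = TRUE`, the result has one vector of prefixes
# and one vector of suffixes.
parts <- split_pref_suff(labels, notation = bracket_notation)

# One row per label, one column each for prefix and suffix.
df_cols <- data.frame(labels = labels,
                      pref = parts$pref,
                      suff = parts$suff,
                      stringsAsFactors = FALSE)
df_cols
```

Separate columns are convenient for filtering or grouping by prefix, whereas the list column keeps each label's pieces together.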
Nouns and prepositions
As discussed above, the prefix is often the “thing” of interest, and
the remainder of the label (the suffix) modifies the prefix. This use
case is so common that we introduce additional terms that enable
additional functionality. The prefix is usually a noun (one or
more words), and the suffix usually consists of prepositional
phrases (each consisting of a preposition and an object).
RCLabels includes a list of common prepositions.
prepositions_list
#> [1] "in" "into" "from" "of" "->" "to"
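To make the noun-and-preposition view concrete, the base-R sketch below splits a suffix into prepositional phrases. The helper split_pps() is hypothetical (it is not an RCLabels function), and it assumes that the suffix begins with a preposition and that every preposition is followed by at least one object word.

```r
# Same values as prepositions_list above, written out so the sketch
# runs without RCLabels.
prepositions <- c("in", "into", "from", "of", "->", "to")

# Hypothetical helper: returns the objects, named by their prepositions.
split_pps <- function(suffix, prepositions) {
  words <- strsplit(suffix, " ", fixed = TRUE)[[1]]
  is_prep <- words %in% prepositions
  # Each word belongs to the most recent preposition seen so far.
  group <- cumsum(is_prep)
  # Paste together the object words of each phrase.
  objects <- tapply(words[!is_prep], group[!is_prep], paste, collapse = " ")
  stats::setNames(as.character(objects), words[is_prep])
}

pps_simple <- split_pps("of b in c", prepositions)
pps_multi <- split_pps("from Coal mines in USA", prepositions)
pps_simple
#>  of  in 
#> "b" "c" 
pps_multi
#>         from           in 
#> "Coal mines"        "USA" 
```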
Working with row and column labels (nouns and prepositions)
RCLabels supports the “nouns and prepositions” view of
row and column labels with several functions. get_nouns()
extracts the nouns from a row or column label.
labels
#> [1] "a [of b in c]" "d [of e in f]" "g [of h in i]"
# Extract the nouns.
get_nouns(labels, notation = bracket_notation)
#> noun noun noun
#> "a" "d" "g"
# Extract the prepositional phrases.
get_pps(labels, notation = bracket_notation)
#> pps pps pps
#> "of b in c" "of e in f" "of h in i"
# Extract the prepositions themselves.
get_prepositions(labels, notation = bracket_notation)
#> $prepositions
#> [1] "of" "in"
#>
#> $prepositions
#> [1] "of" "in"
#>
#> $prepositions
#> [1] "of" "in"
# Extract the objects of the prepositions.
# Objects are named by the preposition of their phrase.
get_objects(labels, notation = bracket_notation)
#> $objects
#> of in
#> "b" "c"
#>
#> $objects
#> of in
#> "e" "f"
#>
#> $objects
#> of in
#> "h" "i"
# The get_piece() function is a convenience function
# that extracts just what you want.
get_piece(labels, piece = "noun", notation = bracket_notation)
#> noun noun noun
#> "a" "d" "g"
get_piece(labels, piece = "pref")
#> pref pref pref
#> "a" "d" "g"
get_piece(labels, piece = "suff")
#> suff suff suff
#> "of b in c" "of e in f" "of h in i"
get_piece(labels, piece = "of")
#> [[1]]
#> of
#> "b"
#>
#> [[2]]
#> of
#> "e"
#>
#> [[3]]
#> of
#> "h"
get_piece(labels, piece = "in")
#> [[1]]
#> in
#> "c"
#>
#> [[2]]
#> in
#> "f"
#>
#> [[3]]
#> in
#> "i"
# An empty string is returned when the preposition is missing.
get_piece(labels, piece = "bogus")
#> [[1]]
#> bogus
#> ""
#>
#> [[2]]
#> bogus
#> ""
#>
#> [[3]]
#> bogus
#> ""
Labels can be split into their component pieces.
labels
#> [1] "a [of b in c]" "d [of e in f]" "g [of h in i]"
# Split the labels into pieces, named by "noun" and prepositions.
split_labels <- split_noun_pp(labels,
                              prepositions = prepositions_list,
                              notation = bracket_notation)
split_labels
#> [[1]]
#> noun of in
#> "a" "b" "c"
#>
#> [[2]]
#> noun of in
#> "d" "e" "f"
#>
#> [[3]]
#> noun of in
#> "g" "h" "i"
# Recombine split labels.
paste_noun_pp(split_labels, notation = bracket_notation)
#> [1] "a [of b in c]" "d [of e in f]" "g [of h in i]"
# Recombine with a new notation.
paste_noun_pp(split_labels, notation = paren_notation)
#> [1] "a (of b in c)" "d (of e in f)" "g (of h in i)"
Modifying row and column labels
To modify row and column labels, use one of the modify_* functions.
labels
#> [1] "a [of b in c]" "d [of e in f]" "g [of h in i]"
# Set new values for nouns.
modify_nouns(labels,
             new_nouns = c("Coal", "Oil", "Natural gas"),
             notation = bracket_notation)
#> [1] "Coal [of b in c]" "Oil [of e in f]"
#> [3] "Natural gas [of h in i]"
To modify other pieces of labels, use the modify_label_pieces()
function. modify_label_pieces() assigns new values using a
“one-to-many” approach: each new value is mapped to the several
existing values that it replaces, which enables aggregation.
labels
#> [1] "a [of b in c]" "d [of e in f]" "g [of h in i]"
# Change nouns in several labels to "Production" and "Manufacture",
# as indicated by the modification map.
modify_label_pieces(labels,
                    piece = "noun",
                    mod_map = list(Production = c("a", "b", "c", "d"),
                                   Manufacture = c("g", "h", "i", "j")),
                    notation = bracket_notation)
#> [1] "Production [of b in c]" "Production [of e in f]"
#> [3] "Manufacture [of h in i]"
# Change the objects of the "in" preposition,
# according to the modification map.
modify_label_pieces(labels,
                    piece = "in",
                    mod_map = list(GHA = "c", ZAF = c("f", "i")),
                    notation = bracket_notation)
#> [1] "a [of b in GHA]" "d [of e in ZAF]" "g [of h in ZAF]"
# Change the objects of "of" prepositions,
# according to the modification map.
modify_label_pieces(labels,
                    piece = "of",
                    mod_map = list(Coal = "b", `Crude oil` = c("e", "h")),
                    notation = bracket_notation)
#> [1] "a [of Coal in c]" "d [of Crude oil in f]" "g [of Crude oil in i]"
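The “one-to-many” idea behind a modification map can be sketched in base R: the map is inverted into a lookup from each old value to its new value, and values without an entry are left unchanged. This is only an illustration of the concept under those assumptions, not the code that modify_label_pieces() runs internally.

```r
# A modification map: names are new values, elements are the old values
# that should be replaced.
mod_map <- list(Production = c("a", "b", "c", "d"),
                Manufacture = c("g", "h", "i", "j"))

# Invert the map: names are old values, elements are new values.
lookup <- stats::setNames(rep(names(mod_map), lengths(mod_map)),
                          unlist(mod_map, use.names = FALSE))

# "z" has no entry in the map, so it is kept as-is.
nouns <- c("a", "d", "g", "z")
new_nouns <- ifelse(nouns %in% names(lookup), lookup[nouns], nouns)
new_nouns
#> [1] "Production"  "Production"  "Manufacture" "z"
```

Because several old values share one new value, labels that differed only in that piece become identical, which is what makes aggregation possible.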
To eliminate a piece of a label altogether, use the
remove_label_pieces() function.
labels
#> [1] "a [of b in c]" "d [of e in f]" "g [of h in i]"
# Eliminate all of the prepositional phrases that begin with "in".
remove_label_pieces(labels,
                    piece = "in",
                    notation = bracket_notation)
#> [1] "a [of b]" "d [of e]" "g [of h]"
# Eliminate all of the prepositional phrases that begin with "of" and "in".
# Note that some spaces remain.
remove_label_pieces(labels,
                    piece = c("of", "in"),
                    notation = bracket_notation)
#> [1] "a [ ]" "d [ ]" "g [ ]"
With much power comes much responsibility!
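If leftover delimiters such as "a [ ]" are unwanted, one option is to work on split pieces and rebuild the label only from what remains. The base-R sketch below uses a hypothetical helper, drop_piece() (not an RCLabels function), on a named vector shaped like the split_noun_pp() results shown earlier. It assumes bracket-style output and that the noun is never dropped.

```r
# Pieces of one label, named by "noun" and by preposition.
pieces <- c(noun = "a", of = "b", `in` = "c")

# Hypothetical helper: drop the named pieces, then rebuild the label.
drop_piece <- function(pieces, drop) {
  kept <- pieces[!(names(pieces) %in% drop)]
  noun <- kept[["noun"]]
  pps <- kept[names(kept) != "noun"]
  # With no prepositional phrases left, return the bare noun
  # instead of a noun followed by empty brackets.
  if (length(pps) == 0) {
    return(noun)
  }
  paste0(noun, " [", paste(names(pps), pps, collapse = " "), "]")
}

one_removed <- drop_piece(pieces, "in")
all_removed <- drop_piece(pieces, c("of", "in"))
one_removed
#> [1] "a [of b]"
all_removed
#> [1] "a"
```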
Detecting strings in labels
There are times when it is helpful to know if a string is in a label.
match_by_pattern() searches for matches in row and column
labels by regular expression. Internally, match_by_pattern()
uses grepl() for regular expression matching.
labels <- c("Production [of b in c]", "d [of Coal in f]", "g [of h in USA]")
# With default `pieces` argument, matching is done for whole labels.
match_by_pattern(labels, regex_pattern = "Production")
#> [1] TRUE FALSE FALSE
match_by_pattern(labels, regex_pattern = "Coal")
#> [1] FALSE TRUE FALSE
match_by_pattern(labels, regex_pattern = "USA")
#> [1] FALSE FALSE TRUE
# Check beginnings of labels: match!
match_by_pattern(labels, regex_pattern = "^Production")
#> [1] TRUE FALSE FALSE
# Check at ends of labels: no match!
match_by_pattern(labels, regex_pattern = "Production$")
#> [1] FALSE FALSE FALSE
# Search by prefix or suffix.
match_by_pattern(labels, regex_pattern = "Production", pieces = "pref")
#> [1] TRUE FALSE FALSE
match_by_pattern(labels, regex_pattern = "Production", pieces = "suff")
#> [1] FALSE FALSE FALSE
# When pieces is "pref" or "suff", only one can be specified.
# The following function call gives an error.
# match_by_pattern(labels, regex_pattern = "Production", pieces = c("pref", "to"))
# Search by noun or preposition.
match_by_pattern(labels, regex_pattern = "Production", pieces = "noun")
#> [1] TRUE FALSE FALSE
match_by_pattern(labels, regex_pattern = "Production", pieces = "in")
#> [1] FALSE FALSE FALSE
# Searching can be done with complicated regex patterns.
match_by_pattern(labels,
                 regex_pattern = make_or_pattern(c("c", "f")),
                 pieces = "in")
#> [1] TRUE TRUE FALSE
match_by_pattern(labels,
                 regex_pattern = make_or_pattern(c("b", "Coal", "USA")),
                 pieces = "in")
#> [1] FALSE FALSE TRUE
match_by_pattern(labels,
                 regex_pattern = make_or_pattern(c("b", "Coal", "USA")),
                 pieces = c("of", "in"))
#> [1] TRUE TRUE TRUE
# Works with custom lists of prepositions.
match_by_pattern(labels,
                 regex_pattern = make_or_pattern(c("b", "Coal", "GBR", "USA")),
                 pieces = c("noun", "of", "in", "to"),
                 prepositions = c("of", "to", "in"))
#> [1] TRUE TRUE TRUE
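Matching by piece can be pictured as applying grepl() to the extracted pieces rather than to whole labels. The base-R sketch below builds an anchored “or” pattern by hand with a hypothetical helper, or_pattern(). It assumes exact, whole-string matching and does not escape regex metacharacters, so it may differ from what make_or_pattern() actually produces.

```r
# Hypothetical helper: match any one of `strings` exactly.
# Assumption: the strings contain no regex metacharacters.
or_pattern <- function(strings) {
  paste0("^", strings, "$", collapse = "|")
}

pattern <- or_pattern(c("c", "f"))
pattern
#> [1] "^c$|^f$"

# Objects of the "in" preposition for the three labels above.
in_objects <- c("c", "f", "USA")
in_matches <- grepl(pattern, in_objects)
in_matches
#> [1]  TRUE  TRUE FALSE
```

Anchoring with ^ and $ prevents a short string such as "c" from matching inside a longer object such as "Coal".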
Replacing strings in labels
There are times when it is helpful to replace strings in labels. The
replace_by_pattern() function replaces strings in row and column
labels by regular expression pattern. replace_by_pattern() is
similar to match_by_pattern(), except that replace_by_pattern()
has an additional argument, replacement. Internally,
replace_by_pattern() uses gsub() to perform the regular
expression replacement.
labels <- c("Production [of b in c]", "d [of Coal in f]", "g [of h in USA]")
labels
#> [1] "Production [of b in c]" "d [of Coal in f]" "g [of h in USA]"
# If `pieces = "all"` (the default), the entire label is available for replacements.
replace_by_pattern(labels,
                   regex_pattern = "Production",
                   replacement = "Manufacture")
#> [1] "Manufacture [of b in c]" "d [of Coal in f]"
#> [3] "g [of h in USA]"
replace_by_pattern(labels,
                   regex_pattern = "Coal",
                   replacement = "Oil")
#> [1] "Production [of b in c]" "d [of Oil in f]" "g [of h in USA]"
replace_by_pattern(labels,
                   regex_pattern = "USA",
                   replacement = "GHA")
#> [1] "Production [of b in c]" "d [of Coal in f]" "g [of h in GHA]"
# Replace by prefix or suffix.
replace_by_pattern(labels,
                   regex_pattern = "Production",
                   replacement = "Manufacture",
                   pieces = "pref")
#> [1] "Manufacture [of b in c]" "d [of Coal in f]"
#> [3] "g [of h in USA]"
replace_by_pattern(labels,
                   regex_pattern = "Coa",
                   replacement = "Bow",
                   pieces = "suff")
#> [1] "Production [of b in c]" "d [of Bowl in f]" "g [of h in USA]"
# Nothing should change, because USA is in the suffix.
replace_by_pattern(labels,
                   regex_pattern = "SA",
                   replacement = "SSR",
                   pieces = "pref")
#> [1] "Production [of b in c]" "d [of Coal in f]" "g [of h in USA]"
# Now USA --> USSR, because USA is in the suffix.
replace_by_pattern(labels,
                   regex_pattern = "SA",
                   replacement = "SSR",
                   pieces = "suff")
#> [1] "Production [of b in c]" "d [of Coal in f]" "g [of h in USSR]"
# This will throw an error, because only "pref" or "suff" can be specified.
# replace_by_pattern(labels,
#                    regex_pattern = "SA",
#                    replacement = "SSR",
#                    pieces = c("pref", "suff"))
# Replace by noun or preposition.
replace_by_pattern(labels,
                   regex_pattern = "Production",
                   replacement = "Manufacture",
                   pieces = "noun")
#> [1] "Manufacture [of b in c]" "d [of Coal in f]"
#> [3] "g [of h in USA]"
replace_by_pattern(labels,
                   regex_pattern = "^Pro",
                   replacement = "Con",
                   pieces = "noun")
#> [1] "Conduction [of b in c]" "d [of Coal in f]" "g [of h in USA]"
# Won't match: "Pro" is at the start of the noun, not the end.
replace_by_pattern(labels,
                   regex_pattern = "Pro$",
                   replacement = "Con",
                   pieces = "noun")
#> [1] "Production [of b in c]" "d [of Coal in f]" "g [of h in USA]"
# No change, because "Production" is a noun, not an object of "of".
replace_by_pattern(labels,
                   regex_pattern = "Production",
                   replacement = "Manufacture",
                   pieces = "of")
#> [1] "Production [of b in c]" "d [of Coal in f]" "g [of h in USA]"
# "Coal" is the object of "of", so the replacement succeeds.
replace_by_pattern(labels,
                   regex_pattern = "Coal",
                   replacement = "Oil",
                   pieces = "of")
#> [1] "Production [of b in c]" "d [of Oil in f]" "g [of h in USA]"
# No change, because "Coal" is not the object of "in".
replace_by_pattern(labels,
                   regex_pattern = "Coal",
                   replacement = "Oil",
                   pieces = "in")
#> [1] "Production [of b in c]" "d [of Coal in f]" "g [of h in USA]"
# "USA" is the object of "in", so the replacement succeeds.
replace_by_pattern(labels,
                   regex_pattern = "USA",
                   replacement = "GBR",
                   pieces = "in")
#> [1] "Production [of b in c]" "d [of Coal in f]" "g [of h in GBR]"
replace_by_pattern(labels,
                   regex_pattern = "A$",
                   replacement = "upercalifragilisticexpialidocious",
                   pieces = "in")
#> [1] "Production [of b in c]"
#> [2] "d [of Coal in f]"
#> [3] "g [of h in USupercalifragilisticexpialidocious]"
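Piece-wise replacement can be pictured as “split, gsub() on one piece, paste back.” The base-R sketch below does this for the suffix of bracket-style labels. The helper replace_in_suffix() is hypothetical (it is not an RCLabels function), and it assumes every label has the form "prefix [suffix]".

```r
labels <- c("Production [of b in c]", "d [of Coal in f]", "g [of h in USA]")

# Hypothetical helper: apply gsub() to the suffix only.
# Assumption: `label` looks like "prefix [suffix]".
replace_in_suffix <- function(label, pattern, replacement) {
  # m[1] is the whole match, m[2] the prefix, m[3] the suffix.
  m <- regmatches(label, regexec("^(.*) \\[(.*)\\]$", label))[[1]]
  paste0(m[2], " [", gsub(pattern, replacement, m[3]), "]")
}

# "SA" occurs only in a suffix, so only the third label changes.
suffix_replaced <- vapply(labels, replace_in_suffix, character(1),
                          pattern = "SA", replacement = "SSR",
                          USE.NAMES = FALSE)
suffix_replaced
#> [1] "Production [of b in c]" "d [of Coal in f]"       "g [of h in USSR]"

# "Pro" occurs only in a prefix, so nothing changes.
prefix_untouched <- vapply(labels, replace_in_suffix, character(1),
                           pattern = "Pro", replacement = "Con",
                           USE.NAMES = FALSE)
```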