Aggregate a matrix by pieces of row and/or column names
Source: R/aggregates.R
aggregate_pieces_byname.Rd
Aggregate a matrix (or list of matrices or a column in a matsindf data frame) by pieces of the row and column names.
Usage
aggregate_pieces_byname(
  a,
  piece,
  margin = list(c(1, 2)),
  inf_notation = TRUE,
  notation = list(RCLabels::notations_list),
  choose_most_specific = FALSE,
  prepositions = list(RCLabels::prepositions_list),
  aggregation_map = NULL,
  pattern_type = "exact"
)
Arguments
- a
A matrix or list of matrices.
- piece
A character string indicating which piece of the row or column names to retain: one of "noun", "pps", "pref", or "suff", or a preposition.
- margin
As a character, the row type or column type to be renamed. As an integer, the margin to be renamed. Default is c(1, 2), meaning that both rows (margin = 1) and columns (margin = 2) will be renamed.
- inf_notation
A boolean that tells whether to infer notation. Default is TRUE.
- notation
The notation used for row and column labels. Default is list(RCLabels::notations_list). The default value is wrapped in a list, because RCLabels::notations_list is, itself, a list. See RCLabels.
- choose_most_specific
A boolean that indicates whether the most-specific notation will be inferred when more than one of notation matches a row or column label and allow_multiple = FALSE. When FALSE, the first matching notation in notation is returned when allow_multiple = FALSE. Default is FALSE.
- prepositions
Prepositions that can be used in the row and column label. Default is RCLabels::prepositions_list.
- aggregation_map
A named list of rows or columns to be aggregated (or NULL). See Details.
- pattern_type
See RCLabels::make_or_pattern(). Default is "exact".
Details
This is a convenience function that bundles two others for common use cases: rename_to_piece_byname() followed by aggregate_byname().
Note that after renaming to the piece, there may be rows or columns that are identically named. If those identically named rows or columns aren't included in the aggregation_map, an error will result. So, aggregate_byname() is called twice: first with aggregation_map = NULL to sweep up any rows or columns that are identically named after renaming, and second with aggregation_map = aggregation_map to sum the desired rows or columns. See examples.
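
The two-pass idea described above can be sketched in base R. This is only an illustration of the logic, not the matsbyname implementation: the `sub()` call is a hypothetical stand-in for rename_to_piece_byname(), and `rowsum()` stands in for aggregate_byname() on the row margin.

```r
# A small matrix whose rows collide once they are renamed to their nouns.
b <- matrix(1:6, nrow = 3, ncol = 2,
            dimnames = list(c("a [from b]", "c [from d]", "c [from e]"),
                            c("c1", "c2")))

# Step 1: rename each row to its noun, taken here as the text before " [".
# (Assumption: a simple regex replaces rename_to_piece_byname().)
renamed <- b
rownames(renamed) <- sub(" \\[.*$", "", rownames(b))

# Step 2 (first pass, no map): sum rows that now share a name.
pass1 <- rowsum(renamed, group = rownames(renamed))

# Step 3 (second pass, with a map): relabel mapped rows, then sum again.
agg_map <- list(f = c("a", "b"))
groups <- rownames(pass1)
for (new_name in names(agg_map)) {
  groups[groups %in% agg_map[[new_name]]] <- new_name
}
pass2 <- rowsum(pass1, group = groups)
pass2
```

Without the first pass, the two "c" rows would still be duplicated when the map is applied, which is the situation the text says produces an error.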
When aggregation_map is NULL (the default), rows (or columns or both) of the same name are aggregated together.
If aggregation_map is not NULL, it must be a named list. The name of each aggregation_map item is the name of a row or column in the output that will contain the specified aggregation. The value of each item in aggregation_map must be a vector of names of rows or columns in a. The rows or columns named in the value are aggregated and inserted into the output under the name of the item. For example, aggregation_map = list(new_row = c("r1", "r2")) will aggregate rows "r1" and "r2", delete rows "r1" and "r2", and insert a new row whose name is "new_row" and whose value is the sum of rows "r1" and "r2".
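
As a rough illustration of what such a map means, the following base-R sketch applies list(new_row = c("r1", "r2")) by hand. The matrix and the variable names are made up for the example, and the row order of the result is an artifact of this sketch; the package functions may order rows differently.

```r
# A hypothetical 3 x 2 matrix with rows r1, r2, r3.
a <- matrix(1:6, nrow = 3, ncol = 2,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2")))
aggregation_map <- list(new_row = c("r1", "r2"))

# Rows named in the map's value are summed ...
to_sum <- rownames(a) %in% aggregation_map$new_row
summed <- colSums(a[to_sum, , drop = FALSE])

# ... removed from the matrix, and replaced by one row named after the map item.
result <- rbind(a[!to_sum, , drop = FALSE], new_row = summed)
result
```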
The values in the aggregation_map are interpreted as regular expressions, and they are escaped using Hmisc::escapeRegex() prior to use.
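
A short base-R sketch shows why escaping matters for labels that contain brackets. The helper `escape_regex()` is hand-written for this illustration (the text above says the package relies on Hmisc::escapeRegex()), and anchoring with `^` and `$` is an assumption about what an "exact" match implies.

```r
labels <- c("a [from b]", "a f")

# Unescaped, "[from b]" is read as a character class, so the pattern
# misses the literal label and matches "a f" instead.
unescaped <- grepl("^a [from b]$", labels)

# Hypothetical helper: put a backslash in front of regex metacharacters.
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}

# Escaped and anchored, the pattern matches only the literal label.
pattern <- paste0("^", escape_regex("a [from b]"), "$")
escaped <- grepl(pattern, labels)

unescaped
escaped
```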
aggregation_map should aggregate by pieces, not by the full, original row and/or column names.
Examples
a <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("a [from b]", "c [from d]"),
                            c("e [from f]", "g [from h]", "i [from j]")))
a %>%
  aggregate_pieces_byname(piece = "suff",
                          notation = RCLabels::from_notation,
                          aggregation_map = list(rows = c("b", "d"),
                                                 cols = c("h", "j")))
#>      cols f
#> rows   16 5
m <- matrix(c(1, 0, 0,
              0, 1, 1,
              0, 1, 1), nrow = 3, ncol = 3, byrow = TRUE,
            dimnames = list(c("Gasoline [from Oil refineries]",
                              "Electricity [from Main activity producer electricity plants]",
                              "Electricity [from Hydro]"),
                            c("Automobiles", "LED lamps", "CFL lamps"))) %>%
  setrowtype("Product") %>% setcoltype("Industry")
mT <- transpose_byname(m)
# Aggregate the "Electricity" rows.
aggregate_pieces_byname(m, piece = "noun", margin = "Product",
                        notation = RCLabels::bracket_notation)
#>             Automobiles LED lamps CFL lamps
#> Electricity           0         2         2
#> Gasoline              1         0         0
#> attr(,"rowtype")
#> [1] "Product"
#> attr(,"coltype")
#> [1] "Industry"
# Also works in a list.
aggregate_pieces_byname(a = list(m, mT), piece = "noun",
                        margin = "Product",
                        notation = RCLabels::bracket_notation)
#> [[1]]
#>             Automobiles LED lamps CFL lamps
#> Electricity           0         2         2
#> Gasoline              1         0         0
#> attr(,"rowtype")
#> [1] "Product"
#> attr(,"coltype")
#> [1] "Industry"
#>
#> [[2]]
#>             Electricity Gasoline
#> Automobiles           0        1
#> LED lamps             2        0
#> CFL lamps             2        0
#> attr(,"rowtype")
#> [1] "Industry"
#> attr(,"coltype")
#> [1] "Product"
#>
# Use an aggregation map.
aggregate_pieces_byname(a = list(m, mT), piece = "noun",
                        margin = "Product",
                        aggregation_map = list(list(final = c("Electricity", "Gasoline"))),
                        notation = RCLabels::bracket_notation)
#> [[1]]
#>       Automobiles LED lamps CFL lamps
#> final           1         2         2
#> attr(,"rowtype")
#> [1] "Product"
#> attr(,"coltype")
#> [1] "Industry"
#>
#> [[2]]
#>             final
#> Automobiles     1
#> LED lamps       2
#> CFL lamps       2
#> attr(,"rowtype")
#> [1] "Industry"
#> attr(,"coltype")
#> [1] "Product"
#>
# Also works in a data frame.
df <- tibble::tibble(m = list(m, mT),
                     pce = "noun",
                     mgn = "Product",
                     agg_map = list(list(final = c("Electricity", "Gasoline"))),
                     notn = list(RCLabels::bracket_notation)) %>%
  dplyr::mutate(
    agg = aggregate_pieces_byname(a = m, piece = pce, margin = mgn,
                                  aggregation_map = agg_map,
                                  notation = notn)
  )
df$agg
#> [[1]]
#>       Automobiles LED lamps CFL lamps
#> final           1         2         2
#> attr(,"rowtype")
#> [1] "Product"
#> attr(,"coltype")
#> [1] "Industry"
#>
#> [[2]]
#>             final
#> Automobiles     1
#> LED lamps       2
#> CFL lamps       2
#> attr(,"rowtype")
#> [1] "Industry"
#> attr(,"coltype")
#> [1] "Product"
#>
# Works when renaming to the piece results in identical row or col names.
b <- matrix(1:6, nrow = 3, ncol = 2,
            dimnames = list(c("a [from b]", "c [from d]", "c [from e]"),
                            c("c1", "c2")))
b
#>            c1 c2
#> a [from b]  1  4
#> c [from d]  2  5
#> c [from e]  3  6
# This aggregation works, because the "c" rows
# are aggregated before applying the aggregation_map,
# which, itself, does NOT aggregate the "c" rows.
b %>%
  aggregate_pieces_byname(piece = "noun",
                          margin = 1,
                          inf_notation = FALSE,
                          notation = RCLabels::bracket_notation,
                          aggregation_map = list(f = c("a", "b")))
#>   c1 c2
#> c  5 11
#> f  1  4