Using summarise in matsbyname
Source:vignettes/using-summarise-in-matsbyname.Rmd
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(matsbyname)
library(tibble)
Introduction
matsbyname functions in which operands are specified in a ... argument are ambiguous when applied to a data frame: should they operate across rows or down columns? An argument, .summarise, signals the caller's intent, allowing these otherwise ambiguous functions to be used flexibly with data frames.
“Normal” functions
For normal functions, such as + and mean(), there is no ambiguity about their operation in a data frame.
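df <- tibble::tribble(~x, ~y, ~z,
                      1, 2, 3,
                      4, 5, 6)
# Typically, operations are done across rows.
df %>%
  dplyr::mutate(
    a = x + y + z,
    # `.` is the data frame piped into mutate(), so rowMeans(.) averages
    # x, y, and z within each row.
    b = rowMeans(.)
  )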
#> # A tibble: 2 × 5
#> x y z a b
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 3 6 2
#> 2 4 5 6 15 5
To perform the same operations down columns, use dplyr::summarise().
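For comparison, the down-columns counterpart of the mutate() example above might look like the following sketch. The output column names x_sum, y_sum, z_sum, and x_mean are illustrative choices, not part of the original vignette.

```r
library(dplyr)
library(tibble)

df <- tibble::tribble(~x, ~y, ~z,
                      1, 2, 3,
                      4, 5, 6)

# summarise() collapses each column to a single row: sums and means
# are computed down the columns rather than across the rows.
col_stats <- df %>%
  dplyr::summarise(
    x_sum  = sum(x),
    y_sum  = sum(y),
    z_sum  = sum(z),
    x_mean = mean(x)
  )
col_stats
```

Each new column holds a single value (5, 7, 9, and 2.5), because summarise() reduces the two rows to one.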
matsbyname::sum_byname()
What does matsbyname::sum_byname() mean for a data frame? Will it give sums across rows (as + does), or will it give sums down columns (as summarise() does)? This ambiguity is present for all *_byname() functions in which operands are specified via the ... argument, including matrixproduct_byname(), hadamardproduct_byname(), mean_byname(), etc.
To resolve the ambiguity, use the .summarise argument. The default value of .summarise is FALSE, meaning that the functions normally operate across rows. To perform the operation down columns instead, set .summarise = TRUE.
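To make the two readings concrete, here is a toy base-R function, toy_sum(), which is hypothetical and not part of matsbyname. It only mimics the semantics described above for plain numeric vectors: it does not match rows and columns by name, handle matrices, or reproduce matsbyname's implementation. How it treats several operands when .summarise = TRUE (collapsing each one separately into its own list element) is an assumption of this sketch.

```r
# Hypothetical helper, NOT part of matsbyname: a toy n-ary sum that mimics
# the two readings of a .summarise argument for plain numeric columns.
toy_sum <- function(..., .summarise = FALSE) {
  operands <- list(...)
  if (.summarise) {
    # Down columns: collapse each operand (a whole column) to one value.
    # The toy returns a list, so callers need unlist() to get plain numbers.
    lapply(operands, function(col) Reduce(`+`, col))
  } else {
    # Across rows: add the operands element-wise, just like x + y + z.
    Reduce(`+`, operands)
  }
}

x <- c(1, 4)
y <- c(2, 5)
z <- c(3, 6)

toy_sum(x, y, z)                       # row-wise sums: 6 15
unlist(toy_sum(x, .summarise = TRUE))  # sum down the x column: 5
```

The same inputs give different answers depending only on the flag, which is exactly the ambiguity .summarise is meant to resolve.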
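# With the default .summarise = FALSE, *_byname() functions operate
# across rows, just like + and rowMeans() above.
df %>%
  dplyr::mutate(
    a = sum_byname(x, y, z),
    b = mean_byname(x, y, z)
  )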
#> # A tibble: 2 × 5
#> x y z a b
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 3 6 2
#> 2 4 5 6 15 5
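# With .summarise = TRUE, each call collapses its column to a single value;
# unlist() converts the returned value to a plain number.
df %>%
  dplyr::summarise(
    x = sum_byname(x, .summarise = TRUE) %>% unlist(),
    y = sum_byname(y, .summarise = TRUE) %>% unlist(),
    z = sum_byname(z, .summarise = TRUE) %>% unlist()
  )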
#> # A tibble: 1 × 3
#> x y z
#> <dbl> <dbl> <dbl>
#> 1 5 7 9
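The three nearly identical lines above can be condensed with dplyr::across(), assuming dplyr 1.0.0 or later (where across() was introduced). This sketch reuses exactly the sum_byname(..., .summarise = TRUE) %>% unlist() pattern shown above, so it should produce the same one-row result.

```r
library(dplyr)
library(tibble)
library(matsbyname)

df <- tibble::tribble(~x, ~y, ~z,
                      1, 2, 3,
                      4, 5, 6)

# across() applies the same .summarise = TRUE call to every listed column,
# replacing the three hand-written x = ..., y = ..., z = ... lines.
col_sums <- df %>%
  dplyr::summarise(
    dplyr::across(c(x, y, z), ~ unlist(sum_byname(.x, .summarise = TRUE)))
  )
col_sums
```

Listing the columns explicitly in c(x, y, z) keeps the intent clear; dplyr::everything() would also select all three here.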
Summary
The .summarise argument broadens the range of applicability for many matsbyname functions, especially when used with data frames. The default is .summarise = FALSE, meaning that operations are performed across rows. Set .summarise = TRUE to signal intent to perform operations down a column.