Introduction
The *apply_byname()
family of functions are used
internally by many functions in matsbyname
. These functions
(unaryapply_byname()
, binaryapply_byname()
,
naryapply_byname()
) allow additional arguments, besides the
matrices that are being transformed, which are passed via
.FUNdots
. Getting .FUNdots
right can be
challenging, and this vignette provides some background and explanation
to those challenges. After reading this vignette, both callers of these
functions and programmers who use these functions will be in a better
position to take advantage of the various *apply_byname()
functions.
An example
To see how the .FUNdots
argument works, let’s make an
example function that takes advantage of
unaryapply_byname()
. mysum()
adds entries in
matrix a
, the result depending upon the value of
margin
.
mysum <- function(a, margin = c(1, 2)) {
sum_func <- function(a_mat, margin) {
# When we get here, we will have a single matrix a
if (1 %in% margin & 2 %in% margin) {
return(sum(a_mat))
}
if (margin == 1) {
return(rowSums(a_mat) %>% matrix(nrow = nrow(a_mat)))
}
if (margin == 2) {
return(colSums(a_mat) %>% matrix(ncol = ncol(a_mat)))
}
}
unaryapply_byname(sum_func, a, .FUNdots = list(margin = margin))
}
Structuring mysum()
as shown above provides several
interesting capabilities. First, mysum()
works with single
matrices.
m <- matrix(1:4, nrow = 2, byrow = TRUE)
m
#> [,1] [,2]
#> [1,] 1 2
#> [2,] 3 4
# Works for single matrices
mysum(m, margin = 1)
#> [,1]
#> [1,] 3
#> [2,] 7
mysum(m, margin = 2)
#> [,1] [,2]
#> [1,] 4 6
mysum(m, margin = c(1, 2))
#> [1] 10
Second, mysum()
works with lists.
# Works for lists of matrices
mysum(list(one = m, two = m), margin = 1)
#> $one
#> [,1]
#> [1,] 3
#> [2,] 7
#>
#> $two
#> [,1]
#> [1,] 3
#> [2,] 7
mysum(list(one = m, two = m), margin = 2)
#> $one
#> [,1] [,2]
#> [1,] 4 6
#>
#> $two
#> [,1] [,2]
#> [1,] 4 6
Finally, mysum()
works within data frames.
# Works in data frames and tibbles
DF <- tibble::tibble(mcol = list(m, m, m))
res <- DF %>%
dplyr::mutate(
rsums = mysum(mcol, margin = 1),
csums = mysum(mcol, margin = 2)
)
res$rsums
#> [[1]]
#> [,1]
#> [1,] 3
#> [2,] 7
#>
#> [[2]]
#> [,1]
#> [1,] 3
#> [2,] 7
#>
#> [[3]]
#> [,1]
#> [1,] 3
#> [2,] 7
res$csums
#> [[1]]
#> [,1] [,2]
#> [1,] 4 6
#>
#> [[2]]
#> [,1] [,2]
#> [1,] 4 6
#>
#> [[3]]
#> [,1] [,2]
#> [1,] 4 6
The problem
In the above examples, margin
was only 1
or
2
, not c(1, 2)
for the list and data frame
examples. Let’s see what happens when margin = c(1, 2)
and
a
is a list.
tryCatch(mysum(list(m, m, m), margin = c(1, 2)),
error = function(e) {strwrap(e, width = 60)})
#> [1] "Error: In prepare_.FUNdots(), when 'a' is a list, but an"
#> [2] "entry in '.FUNdots' is not a list, every top-level argument"
#> [3] "in .FUNdots must be NULL or have length = 1 or length ="
#> [4] "length(a) (= 3). Found length = 2 for argument 'margin',"
#> [5] "which is not a list. Consider converting argument 'margin'"
#> [6] "into a list of length 1."
To understand better what is happening, let’s try when the list
argument to mysum()
has length 2
.
mysum(list(m, m), margin = c(1, 2))
#> [[1]]
#> [,1]
#> [1,] 3
#> [2,] 7
#>
#> [[2]]
#> [,1] [,2]
#> [1,] 4 6
margin = c(1, 2)
is interpreted by
unaryapply_byname()
as “use margin = 1
for the
first m
in the list, and use margin = 2
for
the second m
in the list.
Now we see why passing margin = c(1, 2)
failed for a
list of length 3
(list(m, m, m)
):
unaryapply_byname()
applied margin = 1
for the
first m
, margin = 2
for the second
m
. But the third m
had no margin available for
it.
We also see why passing margin = c(1, 2)
was successful
for a list of length 2
(list(m, m)
):
unaryapply_byname()
applied margin = 1
for the
first m
, margin = 2
for the second
m
. In that case, we had exactly as many items in margin (2)
as items in the list passed to a
(2).
Now that we understand the problem, how do we fix it? In other words,
how can we apply margin = c(1, 2)
to all entries in
list(m, m, m)
? Or, better yet, how could we apply
margin = 1
to the first m
,
margin = 2
to the second m
, and
margin = c(1, 2)
to the third m
?
The fix
Fixes to the problem identified above can be provided by either the caller or (partially) by the programmer in three ways:
- wrap vector arguments in a list (a caller fix),
- use the
prep_vector_arg()
function (a programmer fix), and - use a data frame (a caller fix).
Wrap vector arguments in a list
If the caller is more specific, flexibility is gained. By wrapping
c(1, 2)
in a list()
, the caller indicates
“Take this margin (c(1, 2)
), replicate it as many times as
we have items in our a
list, using one c(1, 2)
for each item in the list.”
mysum(list(m, m, m), margin = list(c(1, 2)))
#> [[1]]
#> [1] 10
#>
#> [[2]]
#> [1] 10
#>
#> [[3]]
#> [1] 10
The caller can also supply different margin
s for each
item in the list of matrices.
mysum(list(m, m, m), margin = list(1, 2, c(1, 2)))
#> [[1]]
#> [,1]
#> [1,] 3
#> [2,] 7
#>
#> [[2]]
#> [,1] [,2]
#> [1,] 4 6
#>
#> [[3]]
#> [1] 10
But the caller must provide either 1
or
length(a)
items in the margin
argument, else
an error is emitted.
tryCatch(mysum(list(m, m, m), margin = list(1, 2)),
error = function(e) {strwrap(e, width = 60)})
#> [1] "Error: In prepare_.FUNdots(), when both 'a' and '.FUNdots'"
#> [2] "are lists, each top-level argument in .FUNdots must have"
#> [3] "length = 1 or length = length(a) (= 3). Found length = 2"
#> [4] "for argument 'margin', which is a list. Consider wrapping"
#> [5] "argument 'margin' in a list()."
Use the prep_vector_arg()
function
To the extent possible, programmers should remove burdens on users of
their functions. So, it would be helpful if there were a way to
automatically wrap vector arguments in lists. To that end,
matsbyname
includes the prep_vector_arg()
function. prep_vector_arg()
uses heuristics to wrap vector
arguments in lists, if needed and when possible.
mysum2()
demonstrates the use of
prep_vector_arg()
.
mysum2 <- function(a, margin = c(1, 2)) {
margin <- prep_vector_arg(a, margin)
sum_func <- function(a_mat, margin) {
# When we get here, we will have a single matrix a
if (1 %in% margin & 2 %in% margin) {
return(sum(a_mat))
}
if (margin == 1) {
return(rowSums(a_mat) %>% matrix(nrow = nrow(a_mat)))
}
if (margin == 2) {
return(colSums(a_mat) %>% matrix(ncol = ncol(a_mat)))
}
}
unaryapply_byname(sum_func, a, .FUNdots = list(margin = margin))
}
If
- argument
a
is a list, - the vector argument (in this case
margin
) is not a list, and - the vector argument’s length is greater than 1 but not equal to the length of a,
then prep_vector_arg()
wraps the vector argument (in
this case margin
) in a list()
, thereby
relieving the caller of having to remember to make it into a
list
.
mysum2(list(m, m, m), margin = c(1, 2))
#> [[1]]
#> [1] 10
#>
#> [[2]]
#> [1] 10
#>
#> [[3]]
#> [1] 10
Note that if the length of the vector argument is equal to the length
of the list in a
, the caller’s intention is ambiguous, and
the vector argument is passed without modification.
mysum2(list(m, m), margin = c(1, 2))
#> [[1]]
#> [,1]
#> [1,] 3
#> [2,] 7
#>
#> [[2]]
#> [,1] [,2]
#> [1,] 4 6
If the caller wants c(1, 2)
to be applied to each item
in the a
list, the caller must wrap c(1, 2)
in
a list.
Use a data frame
The reason prep_vector_arg()
cannot always wrap
vector arguments in a list is that data frame columns are extracted as
vectors when they are atomic.
DF2 <- tibble::tibble(mcol = list(m, m), margin = c(1, 2))
DF2
#> # A tibble: 2 × 2
#> mcol margin
#> <list> <dbl>
#> 1 <int [2 × 2]> 1
#> 2 <int [2 × 2]> 2
DF2$margin %>% class()
#> [1] "numeric"
It would be a mistake to wrap DF2$margin
in a
list()
for the following call to mysum2
,
because the caller’s intent is clearly “apply margin = 1
for the first row and margin = 2
for the second row.
res2 <- DF2 %>%
dplyr::mutate(
sums = mysum2(mcol, margin = margin)
)
res2$sums
#> [[1]]
#> [,1]
#> [1,] 3
#> [2,] 7
#>
#> [[2]]
#> [,1] [,2]
#> [1,] 4 6
The good news is that within the context of a data frame, the caller’s intent is unambiguous.
Summary
Dealing with vector arguments to the various
*apply_byname()
functions can be tricky. But there are
three ways to solve any problems that arise:
- wrap vector arguments in a list (a caller fix),
- use the
prep_vector_arg()
function (a programmer fix), and - use a data frame (a caller fix).
This vignette illustrated all three fixes.