`vignettes/midf_apply_primer.Rmd`

`matsindf_apply` is a powerful and versatile function that enables analysis of data frames by applying `FUN` in helpful ways. The function is called `matsindf_apply` because it can be used to apply `FUN` to a `matsindf` data frame, a data frame that contains matrices as individual entries. (A `matsindf` data frame can be created by calling `collapse_to_matrices`, as demonstrated below.)

But `matsindf_apply` can apply `FUN` across much more: data frames of single numbers, lists of matrices, lists of single numbers, and individual numbers. This vignette demonstrates `matsindf_apply`, starting with simple examples and proceeding to sophisticated analyses.

The basis of all analyses conducted with `matsindf_apply` is a function (`FUN`) to be applied across data. `FUN` must return a named list of variables, its result. Here is an example function that both adds and subtracts its arguments, `a` and `b`, and returns a list containing its results, `c` and `d`.

```
library(matsbyname)  # sum_byname(), difference_byname()
library(matsindf)    # matsindf_apply()
example_fun <- function(a, b){
  return(list(c = sum_byname(a, b), d = difference_byname(a, b)))
}
```

Similar to `lapply` and its siblings, additional argument(s) to `matsindf_apply` include the data over which `FUN` is to be applied. These arguments can, in the first instance, be supplied as named arguments to the `...` argument of `matsindf_apply`. The `...` arguments to `matsindf_apply` are passed to `FUN` according to their names. In this case, the output of `matsindf_apply` is the named list returned by `FUN`.

```
matsindf_apply(FUN = example_fun, a = 2, b = 1)
#> $c
#> [1] 3
#>
#> $d
#> [1] 1
```
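Passing arguments by name can be illustrated without `matsindf` at all. The following is a minimal base-R sketch, not the package's implementation: `example_fun_base` is a hypothetical stand-in for `example_fun` that uses `+` and `-` in place of `sum_byname` and `difference_byname`, and `do.call` hands it a named list whose items are matched to arguments by name.

```r
# Hypothetical stand-in for example_fun that needs only base R.
example_fun_base <- function(a, b) {
  list(c = a + b, d = a - b)
}

# do.call() matches list items to arguments by name,
# so the order of items in the list does not matter.
args <- list(b = 1, a = 2)
res <- do.call(example_fun_base, args)
str(res)
```

Because matching is by name, `list(b = 1, a = 2)` and `list(a = 2, b = 1)` give the same result here.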

Passing an additional argument (`z = 2`) causes the familiar `unused argument` error, because `example_fun` does not have a `z` argument.

```
tryCatch(
  matsindf_apply(FUN = example_fun, a = 2, b = 1, z = 2),
  error = function(e){e}
)
#> <simpleError in FUN(...): unused argument (z = 2)>
```

Failing to pass a needed argument (`b = 1`) causes the familiar `argument X is missing` error, because `example_fun` requires a value for `b`.

```
tryCatch(
  matsindf_apply(FUN = example_fun, a = 2),
  error = function(e){e}
)
#> <simpleError in sum_byname(a, b): argument "b" is missing, with no default>
```

(If `example_fun` tolerated a missing argument, no such error would be raised.)

Alternatively, arguments to `FUN` can be given in a named list to the first argument of `matsindf_apply` (`.dat`). When a value is assigned to `.dat`, the return value from `matsindf_apply` contains all named variables in `.dat` (in this case both `a` and `b`) in addition to the results provided by `FUN` (in this case both `c` and `d`).

```
matsindf_apply(list(a = 2, b = 1), FUN = example_fun)
#> $a
#> [1] 2
#>
#> $b
#> [1] 1
#>
#> $c
#> [1] 3
#>
#> $d
#> [1] 1
```

Extra variables are tolerated in `.dat`, because `.dat` is considered to be a store of data from which variables can be drawn as needed.

```
matsindf_apply(list(a = 2, b = 1, z = 42), FUN = example_fun)
#> $a
#> [1] 2
#>
#> $b
#> [1] 1
#>
#> $z
#> [1] 42
#>
#> $c
#> [1] 3
#>
#> $d
#> [1] 1
```
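The "store of data" idea can be sketched in base R. This is an illustration under stated assumptions, not how `matsindf_apply` is implemented: `example_fun_base` and `apply_from_store` are hypothetical names, and the helper simply looks up the argument names of `FUN`, passes only those items from the store, and appends the results.

```r
# Hypothetical stand-in for example_fun that needs only base R.
example_fun_base <- function(a, b) {
  list(c = a + b, d = a - b)
}

# Hypothetical helper: treat `store` as a pool of variables, pass FUN only
# the items it asks for, and append FUN's results to the store.
apply_from_store <- function(store, FUN) {
  needed <- names(formals(FUN))       # argument names of FUN: "a" and "b"
  res <- do.call(FUN, store[needed])  # z is never passed to FUN
  c(store, res)                       # inputs first, then results
}

out <- apply_from_store(list(a = 2, b = 1, z = 42), example_fun_base)
names(out)
#> [1] "a" "b" "z" "c" "d"
```

The unused item `z` passes through untouched, which mirrors the output shown above.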

In contrast, named arguments to `...` are specified by the user, so including an extra variable is considered an error, as shown above.

If a named argument is supplied by both `.dat` and `...`, the argument in `...` takes precedence, overriding the argument in `.dat`.

```
matsindf_apply(list(a = 2, b = 1), FUN = example_fun, a = 10)
#> $a
#> [1] 10
#>
#> $b
#> [1] 1
#>
#> $c
#> [1] 11
#>
#> $d
#> [1] 9
```

When supplying **both** `.dat` and `...`, `...` can contain named strings, which are interpreted as mappings from item names in `.dat` to arguments in the signature of `FUN`. In the example below, `a = "z"` indicates that argument `a` to `FUN` should be supplied by item `z` in `.dat`.

```
matsindf_apply(list(a = 2, b = 1, z = 42),
               FUN = example_fun, a = "z")
#> $a
#> [1] 2
#>
#> $b
#> [1] 1
#>
#> $z
#> [1] 42
#>
#> $c
#> [1] 43
#>
#> $d
#> [1] 41
```
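The string-mapping behavior can be mimicked in a few lines of base R. This is a sketch only, with hypothetical names (`example_fun_base`, `store`, `mapping`), and it makes no claim about the package's internals: each string is used as a key to look up an item in the store, and the looked-up values are then passed to the function by argument name.

```r
# Hypothetical stand-in for example_fun that needs only base R.
example_fun_base <- function(a, b) {
  list(c = a + b, d = a - b)
}

store <- list(a = 2, b = 1, z = 42)

# Each string names the item in `store` that supplies the argument.
mapping <- list(a = "z", b = "b")

# lapply() keeps the names of `mapping`, so `args` is list(a = 42, b = 1).
args <- lapply(mapping, function(item) store[[item]])
res <- do.call(example_fun_base, args)
str(res)
```

With `a` drawn from `z`, the sum is 43 and the difference is 41, the same values as in the `matsindf_apply` output above.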

If a named argument appears in both `.dat` and the output of `FUN`, a name collision occurs in the output of `matsindf_apply`, and a warning is issued.

```
tryCatch(
  matsindf_apply(list(a = 2, b = 1, c = 42), FUN = example_fun),
  warning = function(w){w}
)
#> <simpleWarning in matsindf_apply(list(a = 2, b = 1, c = 42), FUN = example_fun): name collision in matsindf_apply: c>
```

`.dat` can be a list (as shown in several examples above), but it can also be a data frame.

```
df <- data.frame(a = 2:4, b = 1:3)
matsindf_apply(df, FUN = example_fun)
#> a b c d
#> 1 2 1 3 1
#> 2 3 2 5 1
#> 3 4 3 7 1
```
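For comparison, a similar row-by-row result can be assembled by hand in base R. The sketch below is an assumption-laden analogue rather than the package's method: `example_fun_base` is a hypothetical stand-in for `example_fun`, `Map` calls it once per row, and the per-row lists are bound back onto the original columns.

```r
# Hypothetical stand-in for example_fun that needs only base R.
example_fun_base <- function(a, b) {
  list(c = a + b, d = a - b)
}

df <- data.frame(a = 2:4, b = 1:3)

# Map() calls the function once per row, pairing a[i] with b[i].
rows <- Map(example_fun_base, df$a, df$b)

# Turn each per-row list into a one-row data frame, stack them,
# and attach the new columns to the inputs.
res <- do.call(rbind, lapply(rows, as.data.frame))
out <- cbind(df, res)
out
```

The manual version needs three steps (apply, stack, bind) where `matsindf_apply` needs one call.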

Furthermore, `matsindf_apply` works with a `matsindf` data frame, a data frame wherein individual entries are matrices. To demonstrate use of `matsindf_apply` with a `matsindf` data frame, we’ll construct a simple `matsindf` data frame (`midf`) using functions in this package.

```
library(dplyr)  # %>%, mutate(), case_when(), group_by()
library(tidyr)  # spread()
# Create a tidy data frame containing data for matrices
tidy <- data.frame(Year = rep(c(rep(2017, 4), rep(2018, 4)), 2),
                   matnames = c(rep("U", 8), rep("V", 8)),
                   matvals = c(1:4, 11:14, 21:24, 31:34),
                   rownames = c(rep(c(rep("p1", 2), rep("p2", 2)), 2),
                                rep(c(rep("i1", 2), rep("i2", 2)), 2)),
                   colnames = c(rep(c("i1", "i2"), 4),
                                rep(c("p1", "p2"), 4))) %>%
  mutate(
    rowtypes = case_when(
      matnames == "U" ~ "product",
      matnames == "V" ~ "industry",
      TRUE ~ NA_character_
    ),
    coltypes = case_when(
      matnames == "U" ~ "industry",
      matnames == "V" ~ "product",
      TRUE ~ NA_character_
    )
  )
tidy
#> Year matnames matvals rownames colnames rowtypes coltypes
#> 1 2017 U 1 p1 i1 product industry
#> 2 2017 U 2 p1 i2 product industry
#> 3 2017 U 3 p2 i1 product industry
#> 4 2017 U 4 p2 i2 product industry
#> 5 2018 U 11 p1 i1 product industry
#> 6 2018 U 12 p1 i2 product industry
#> 7 2018 U 13 p2 i1 product industry
#> 8 2018 U 14 p2 i2 product industry
#> 9 2017 V 21 i1 p1 industry product
#> 10 2017 V 22 i1 p2 industry product
#> 11 2017 V 23 i2 p1 industry product
#> 12 2017 V 24 i2 p2 industry product
#> 13 2018 V 31 i1 p1 industry product
#> 14 2018 V 32 i1 p2 industry product
#> 15 2018 V 33 i2 p1 industry product
#> 16 2018 V 34 i2 p2 industry product
# Convert to a matsindf data frame
midf <- tidy %>%
  group_by(Year, matnames) %>%
  collapse_to_matrices(rowtypes = "rowtypes", coltypes = "coltypes") %>%
  spread(key = "matnames", value = "matvals")
# Take a look at the midf data frame and some of the matrices it contains.
midf
#> Year U V
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34
midf$U[[1]]
#> i1 i2
#> p1 1 2
#> p2 3 4
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
midf$V[[1]]
#> p1 p2
#> i1 21 22
#> i2 23 24
#> attr(,"rowtype")
#> [1] "industry"
#> attr(,"coltype")
#> [1] "product"
```

With `midf` in hand, we can demonstrate use of `tidyverse`-style functional programming to perform matrix algebra within a data frame. The functions of the `matsbyname` package (such as `difference_byname` below) can be used for this purpose.

```
result <- midf %>%
  mutate(
    W = difference_byname(transpose_byname(V), U)
  )
result
#> Year U V W
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24 20, 19, 21, 20
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34 20, 19, 21, 20
result$W[[1]]
#> i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
result$W[[2]]
#> i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
```

This way of performing matrix calculations works equally well within a 2-row `matsindf` data frame (as shown above) or a 1000-row `matsindf` data frame.
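The mechanism that makes this possible is the list column: a data frame column whose entries are arbitrary R objects, here matrices. The base-R sketch below builds such a data frame by hand and computes one matrix per row. It is an illustration only. `make_mat` is a hypothetical helper, and plain `t()` and `-` operate by position, whereas `transpose_byname` and `difference_byname` align rows and columns by name and also carry row and column types.

```r
# Hypothetical helper: a 2x2 matrix filled by row, with row and column names.
make_mat <- function(vals, rn, cn) {
  matrix(vals, nrow = 2, byrow = TRUE, dimnames = list(rn, cn))
}

df <- data.frame(Year = c(2017, 2018))

# Assigning a list with one item per row creates a list column.
df$U <- list(make_mat(1:4,   c("p1", "p2"), c("i1", "i2")),
             make_mat(11:14, c("p1", "p2"), c("i1", "i2")))
df$V <- list(make_mat(21:24, c("i1", "i2"), c("p1", "p2")),
             make_mat(31:34, c("i1", "i2"), c("p1", "p2")))

# One matrix calculation per row. This relies on t(V) and U already
# having their rows and columns in the same order.
df$W <- Map(function(U, V) t(V) - U, df$U, df$V)
df$W[[1]]
#>    i1 i2
#> p1 20 21
#> p2 19 20
```

The same code would run unchanged on a data frame with many more rows, because `Map` visits each row's matrices in turn.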

Users can write their own functions using `matsindf_apply`. A flexible `calc_W` function can be written as follows.

```
calc_W <- function(.DF = NULL, U = "U", V = "V", W = "W"){
  # The inner function does all the work.
  W_func <- function(U_mat, V_mat){
    # When we get here, U_mat and V_mat will be single matrices or single numbers,
    # not a column in a data frame or an item in a list.
    # Calculate W_mat from the inputs U_mat and V_mat.
    W_mat <- difference_byname(transpose_byname(V_mat), U_mat)
    # Return a named list.
    list(W_mat) %>% magrittr::set_names(W)
  }
  # The body of the main function consists of a call to matsindf_apply
  # that specifies the inner function
  matsindf_apply(.DF, FUN = W_func, U_mat = U, V_mat = V)
}
```

This style of writing `matsindf_apply` functions is incredibly versatile, leveraging the capabilities of both the `matsindf` and `matsbyname` packages. (Indeed, the `Recca` package uses `matsindf_apply` heavily and is built upon the functions in the `matsindf` and `matsbyname` packages.)

Functions written like `calc_W` can operate in ways similar to `matsindf_apply` itself. To demonstrate, we’ll use `calc_W` in all the ways that `matsindf_apply` can be used, going in the reverse order to our demonstration of the capabilities of `matsindf_apply` above.

`calc_W` can be used as a specialized `mutate` function that operates on `matsindf` data frames.

```
midf %>% calc_W()
#> Year U V W
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24 20, 19, 21, 20
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34 20, 19, 21, 20
```

The added column can be given a different name from the default (“`W`”) using the `W` argument.

```
midf %>% calc_W(W = "W_prime")
#> Year U V W_prime
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24 20, 19, 21, 20
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34 20, 19, 21, 20
```

As with `matsindf_apply`, the string arguments to `calc_W` can map column names in the data frame to the inputs of the calculation.

```
midf %>%
  rename(X = U, Y = V) %>%
  calc_W(U = "X", V = "Y")
#> Year X Y W
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24 20, 19, 21, 20
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34 20, 19, 21, 20
```

`calc_W` can operate on lists of single matrices, too. This approach works because the default values for the `U` and `V` arguments to `calc_W` are “`U`” and “`V`,” respectively. The input list members (in this case `midf$U[[1]]` and `midf$V[[1]]`) are returned with the output.

```
calc_W(list(U = midf$U[[1]], V = midf$V[[1]]))
#> $U
#> i1 i2
#> p1 1 2
#> p2 3 4
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
#>
#> $V
#> p1 p2
#> i1 21 22
#> i2 23 24
#> attr(,"rowtype")
#> [1] "industry"
#> attr(,"coltype")
#> [1] "product"
#>
#> $W
#> i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
```

It may be clearer to name the arguments as required by the `calc_W` function without wrapping in a list first, as shown below. But in this approach, the input matrices are not returned with the output.

```
calc_W(U = midf$U[[1]], V = midf$V[[1]])
#> $W
#> i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
```

`calc_W` can operate on data frames containing single numbers.

```
data.frame(U = c(1, 2), V = c(3, 4)) %>% calc_W()
#> U V W.1 W.2
#> 1 1 3 2 2
#> 2 2 4 2 2
```

Finally, `calc_W` can be applied to single numbers, and the result is a 1x1 matrix.

```
calc_W(U = 2, V = 3)
#> $W
#> [,1]
#> [1,] 1
```

This vignette demonstrated use of the versatile `matsindf_apply` function. Inputs to `matsindf_apply` can be

- single numbers,
- matrices, or
- data frames with appropriately-named columns.

`matsindf_apply` can be used for programming, and functions constructed as demonstrated above share characteristics with `matsindf_apply`:

- they can be used as specialized `mutate` operators, and
- they can be applied to single numbers, matrices, or data frames with appropriately-named columns.