measures of fit and their standard names #44

Open
corybrunson opened this issue Aug 13, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

corybrunson commented Aug 13, 2022

background

Gower et al. (2011) detail several measures of fit for biplots, most prominently

  • the quality of the $r$-dimensional biplot, measured as the proportion of total variance retained in the plot and calculated as the quotient of the traces of $\Lambda_r = {D_r}^2$ and of $\Lambda = D^2$.
  • the adequacy of the representation of the $j$-th row (respectively, column) in the $r$-dimensional biplot, calculated as the $j$-th diagonal element of $U_r\ {U_r}^\top$ (respectively, $V_r\ {V_r}^\top$), understood as the fidelity of the projections of the standard coordinates.
  • the predictivity of the $j$-th row (respectively, column) in the $r$-dimensional biplot, measured as the quotient of the $j$-th diagonal elements of $U_r\ \Lambda_r\ {U_r}^\top$ and of $U\ \Lambda\ U^\top$ (respectively, of $V_r\ \Lambda_r\ {V_r}^\top$ and of $V\ \Lambda\ V^\top$), understood as the fidelity of the projections of the principal coordinates.

These can be calculated directly from any SVD or EVD and interpreted for any technique based on them. In some cases they may also be calculated for supplementary elements.
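As a rough illustration of the three measures above, here is a minimal base R sketch that computes them from `svd()` on a small simulated, column-centered matrix. The data and all object names (`X`, `U_r`, `row_adequacy`, and so on) are assumptions made for the example, not part of the package. It uses the fact that the $j$-th diagonal element of $U_r\ \Lambda_r\ {U_r}^\top$ is $\sum_{k \le r} u_{jk}^2 \lambda_k$, so no $n \times n$ matrix needs to be formed.

```r
# Toy data (an assumption for illustration): 8 rows, 5 columns, column-centered.
set.seed(1)
X <- scale(matrix(rnorm(40), nrow = 8, ncol = 5), center = TRUE, scale = FALSE)

s <- svd(X)          # X = U D V^T
lambda <- s$d^2      # diagonal of Lambda = D^2
r <- 2L              # dimension of the biplot
k <- seq_len(r)

U_r <- s$u[, k, drop = FALSE]
V_r <- s$v[, k, drop = FALSE]

# Quality: trace(Lambda_r) / trace(Lambda)
quality <- sum(lambda[k]) / sum(lambda)

# Adequacy: diagonal of U_r U_r^T (rows) and of V_r V_r^T (columns),
# computed as row sums of squared standard coordinates
row_adequacy <- rowSums(U_r^2)
col_adequacy <- rowSums(V_r^2)

# Predictivity: diag(U_r Lambda_r U_r^T) / diag(U Lambda U^T), and likewise for V
row_predictivity <- drop(U_r^2 %*% lambda[k]) / drop(s$u^2 %*% lambda)
col_predictivity <- drop(V_r^2 %*% lambda[k]) / drop(s$v^2 %*% lambda)
```

Each adequacy and predictivity lies between 0 and 1, and the row adequacies sum to $r$ because the columns of $U_r$ are orthonormal.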

suggestions

  1. A new .quality column, calculated as cumsum(.prop_var), could be added to the output of tidy.tbl_ord().
  2. The 1- or 2-dimensional adequacy and predictivity could be computed for all wrapped classes by augment_ord(), possibly via an option measures_of_fit = TRUE. (It would not be appropriate to annotate a tbl_ord with all $n \times k$ or $p \times k$ adequacies or predictivities.)
  3. Adequacy and predictivity for a specific value of $r$ could be computed in a mutate_*() call, e.g. mutate_rows(ord, fit_std = adequacy(dimension = 2L)), where adequacy() knows and is able to recover the necessary model components (cf. computing node and edge properties in tidygraph).

The value of (1) is, i think, self-evident. Probably only one of (2) and (3) would be appropriate, and i lean toward (3). Either would be valuable both (a) for downstream analysis of rows and columns and (b) as aesthetic mappings in biplots (e.g. to increase marker/vector opacity with predictivity/adequacy).
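Suggestion (1) amounts to a cumulative sum over the per-dimension proportions of variance. The sketch below mimics it in base R on a `prcomp()` fit of `iris`; the data frame `tidied` and its `dimension` column are hypothetical stand-ins, and only `.prop_var` and `.quality` are names taken from the proposal. It is not the actual `tidy.tbl_ord()` output.

```r
# PCA of the numeric iris columns (illustrative choice of data).
pca <- prcomp(iris[, 1:4])

# Hypothetical stand-in for tidied output: one row per artificial dimension.
tidied <- data.frame(
  dimension = seq_along(pca$sdev),
  .prop_var = pca$sdev^2 / sum(pca$sdev^2)
)

# Quality of the r-dimensional biplot is the cumulative proportion of variance.
tidied$.quality <- cumsum(tidied$.prop_var)
```

The $r$-th entry of `.quality` is then the quality of the $r$-dimensional biplot, and the last entry equals 1.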

implementation

(2) and (3) would be supported by new recovery generics, possibly for the matrices of standard and of principal coordinates. (3) would probably require registration of the underlying model object within the wrapper, as in tidygraph.
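To make the idea of a recovery generic concrete, here is one possible shape for it, written as a plain S3 generic with a `prcomp` method. The names `recover_std_coord()` and `adequacy_sketch()` are hypothetical and exist only for this example. Unlike the `adequacy(dimension = 2L)` call proposed in (3), this helper takes the model explicitly, so it does not address how the model would be registered within the wrapper.

```r
# Hypothetical generic: recover the matrix of standard coordinates (U or V).
recover_std_coord <- function(x, .matrix = c("rows", "cols"), ...) {
  UseMethod("recover_std_coord")
}

recover_std_coord.prcomp <- function(x, .matrix = c("rows", "cols"), ...) {
  .matrix <- match.arg(.matrix)
  if (.matrix == "cols") return(unname(x$rotation))  # V
  # prcomp stores row scores U D; the singular values are sdev * sqrt(n - 1),
  # so dividing each column of the scores by them recovers U.
  d <- x$sdev * sqrt(nrow(x$x) - 1)
  unname(sweep(x$x, 2, d, "/"))
}

# Hypothetical helper: adequacy in the first `dimension` coordinates.
adequacy_sketch <- function(x, .matrix = c("rows", "cols"), dimension = 2L) {
  Z <- recover_std_coord(x, .matrix)
  rowSums(Z[, seq_len(dimension), drop = FALSE]^2)
}

pca <- prcomp(iris[, 1:4])
row_fit <- adequacy_sketch(pca, "rows", dimension = 2L)
col_fit <- adequacy_sketch(pca, "cols", dimension = 2L)
```

A companion generic for principal coordinates would support predictivity in the same way.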

@corybrunson corybrunson added the enhancement New feature or request label Aug 13, 2022
@corybrunson

Item (1) is done in fd71caf.
