Fix to na.rm in decostand methods alr/clr #667

antagomir · 2024-06-30T21:57:39Z

This fixes #661

The na.rm option has been enabled in decostand methods "alr" and "clr".

Some notes:

"clr": With na.rm=TRUE, the NA entries in original data are replaced by 0 in the clr-transformed data; here zero is an arbitrary choice but no better solutions are available afaik, and the same approach is used in compositions::clr so at least we are in sync. With na.rm=FALSE, the entries that are NA in the original data will be also NAs in the clr-transformed data; the NAs are otherwise ignored in the calculations during the transformation. In the alternative implementation, compositions::alr NAs are always replaced with 0 (no na.rm option available). As far as I can see it is feasible to provide both options for na.rm for the "clr" method, we are providing that now.
"alr": similarly done for decostand "alr" transformation, user can choose whether to keep the NAs (na.rm=FALSE) or replace them by 0 (na.rm=TRUE). In comparison, I noticed that in the alternative implementation, compositions::alr NAs are always kept as NAs (no na.rm option available). As far as I can see it is feasible to provide both options for na.rm for the "alr" method, we are providing that now.
The "rclr" method will be handled separately in Matrix completion for robust CLR in decostand #619

@TuomasBorman can you test and confirm that these are as you expected?

TuomasBorman

I think now works as expected. See the comments.

R/decostand.R

antagomir · 2024-07-01T08:35:19Z

Great, fixed the remarks.

antagomir · 2024-11-17T14:37:06Z

This PR implements updated versions for:

Robust CLR decostand(..., method="rclr")
Robust Aitchison distance vegdist(..., method="robust.aitchison")

This PR is now updated with the following:

Earlier separate PR Matrix completion for robust CLR in decostand #619 has been merged here, taking into account the feedback there (I hope the formatting issues are now ok, those seemed to be the main remarks)
Approach is now synced with the Python implementation
Handling of missing values has been improved.

One major point that I wanted to check before giving the very final touch on this PR: can we export the new function OptSpace? This is not only used to calculate the robust.aitchison distance in vegdist but it is also more widely applicable and this would make the methods directly comparable with their equivalents in Python/gemelli, supporting cross-testing etc.

The OptSpace function is used for matrix completion after the rclr transformation. The rclr transformation usually results in some NA values in the transformed matrix, and the matrix completion is commonly done separately, after this step. The reason for separating the rclr transformation and the matrix completion is that the matrix completion returns other (SVD-related) outputs that are useful in further downstream analyses (e.g. robust Aitchison distance, robust PCA).

The new OptSpace function is a modified version of ROptSpace::OptSpace function, with modifications to better fit ecological data and the parallel Python implementations. The original ROptSpace implementation has MIT license and we have cited it properly here. MIT license is compatible with the GPL2 license of vegan and can be included this way.

So I would like to confirm if it is OK to export the OptSpace function? If OK, I can make the final checks - otherwise I would hide or remove it before declaring the PR complete.

antagomir · 2024-11-22T08:01:34Z

Some minor updates added.

This PR accommodates all previous feedback.

It is ready and preliminary tests against the Python/Gemelli implementation have been done and the results are aligned.

If possible, we hope to keep OptSpace as an exported function as it has further use (e.g. robust PCA).

cameronmartino · 2024-11-25T23:49:06Z

Hi,

I am the author of the method being added here & the python implementation (diecode now moved to gemelli). Greatly appreciate @antagomir adding this implementation in R! I have been in some conversation with him on this offline. To help push the PR along, I checked the implementation in this PR and the results are now the same as the python implementation (as pointed out in issue #570 they did not match before). Simple test example below. Hope this helps get this PR through, there has been the demand in R but I didn't have the bandwidth to tackle this myself.

First, grab some (small) real data as an example

wget https://github.com/biocore/gemelli/raw/refs/heads/master/ipynb/tutorials/qiime2-moving-pictures-tutorial/table.biom
# export to tsv for R
biom convert -i table.biom -o table.tsv --to-tsv
sed -i '' 1d table.tsv

Generate the distances and ordination in R (using this PR branch)

library(devtools)
install_github("antagomir/vegan", ref = "nafix")
library(vegan)
# load table
table <- read.table("table.tsv", header = T, comment = '', check = F, sep = '\t', row.names = 1)
table <- t(as.matrix(table))
rclr_table <- vegan::decostand(table, method="rclr")
# calculate the distance
distance = vegan::vegdist(rclr_table, method = "euclidean")
write.table(as.matrix(distance), file = "R-rpca-distances.tsv", row.names=rownames(table), col.names=rownames(table), sep="\t")
# save to test RPCA
svd.res <- svd(rclr_table)
write.table(svd.res$u, file = "R-rpca-sample-loadings.tsv", row.names=rownames(table), sep="\t")
write.table(svd.res$d, file = "R-rpca-eig-vals.tsv", row.names=FALSE, sep="\t")
write.table(svd.res$v, file = "R-rpca-feature-loadings.tsv", row.names=colnames(table), sep="\t")

Generate the distances and ordination in python.

import numpy as np
import pandas as pd
from biom import load_table
from gemelli.rpca import rpca
from gemelli import __version__
print('Version: %s' % __version__)

# load table
bt = load_table('table.biom')
# run RPCA (this runs RCLR internally)
py_ordination, py_distance = rpca(bt)
# extract RPCA results from skbio type
py_rpca_U = py_ordination.samples.values
py_rpca_V = py_ordination.features.values
py_rpca_S = py_ordination.eigvals.values
py_dist = py_distance.to_data_frame().values
# check the OptSpace results
R_rpca_U = pd.read_csv('R-rpca-sample-loadings.tsv', sep='\t', index_col=None).values[:, :3]
R_rpca_S = pd.read_csv('R-rpca-eig-vals.tsv', sep='\t', index_col=None).values[:3]
R_rpca_V = pd.read_csv('R-rpca-feature-loadings.tsv', sep='\t', index_col=None).values[:, :3]
R_rpca_dist = pd.read_csv('R-rpca-distances.tsv', sep='\t', index_col=None).values
# check
print(' Sample loadings match: %s \n' % str(abs((np.abs(R_rpca_U) - np.abs(py_rpca_U)).sum()) < 1e-6),
        'Feature loadings match: %s \n' % str(abs((np.abs(R_rpca_V) - np.abs(py_rpca_V)).sum()) < 1e-6),
        'Singular values match: %s \n' % str(abs((np.abs(R_rpca_S) - np.abs(py_rpca_S)).sum()) < 1e-6),
        'Distances match: %s \n' % str(abs((R_rpca_dist - R_rpca_dist).sum()) < 1e-6))

Version: 0.0.12
 Sample loadings match: True 
 Feature loadings match: True 
 Singular values match: True 
 Distances match: True

antagomir added 9 commits February 3, 2024 16:37

Added rCLR matrix completion

a28cd54

matrix completion documentation

2fa6867

Remove testthat as dep

a04ca36

sync

f7e44bc

Added na.rm option in clr

17dbf8f

Add unit tests for na.rm option in clr

eed5811

Add na.rm option in alr and update unit tests

a728925

rclr tunin

55852fa

Fixed the tests

ba5c70f

antagomir mentioned this pull request Jun 30, 2024

Matrix completion for robust CLR in decostand #619

Closed

removed pkg reference in tests

0f635da

antagomir mentioned this pull request Jun 30, 2024

CLR: na.rm is not working #661

Open

Load vegan library in tests

5a4f4f6

TuomasBorman approved these changes Jul 1, 2024

View reviewed changes

R/decostand.R Outdated Show resolved Hide resolved

R/decostand.R Show resolved Hide resolved

R/decostand.R Outdated Show resolved Hide resolved

R/decostand.R Outdated Show resolved Hide resolved

R/decostand.R Outdated Show resolved Hide resolved

Fixed TB comments

a175b3a

antagomir added 7 commits November 16, 2024 16:27

Merge remote-tracking branch 'upstream/master' into nafix

f55babe

Syncing the implementation with Python Gemelli

dcebf00

Exporting OptSpac

f3f52a2

unit tests added

449955b

Exporting OptSpace in NAMESPACE

4a2b84d

Passing tests and examples

78daefe

Adding remarks from PR 619

f73e773

antagomir added 6 commits November 21, 2024 22:43

adding imputation in rclr

676d71c

no need for na.rm in rclr

dda74fe

syncing the parts

a4afb55

Merge remote-tracking branch 'upstream/master' into nafix

438f6de

updates for the documentation

10255e3

Added matrix completion to the OptSpace output

d96c92a

cameronmartino mentioned this pull request Nov 25, 2024

Extracting the Python code of RobustPCA (rpca.py) biocore/gemelli#106

Closed

antagomir mentioned this pull request Nov 28, 2024

Adding matrix completion in rclr/rAitchison #570

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix to na.rm in decostand methods alr/clr #667

Fix to na.rm in decostand methods alr/clr #667

antagomir commented Jun 30, 2024

TuomasBorman left a comment

antagomir commented Jul 1, 2024

antagomir commented Nov 17, 2024 •

edited

Loading

antagomir commented Nov 22, 2024

cameronmartino commented Nov 25, 2024

Fix to na.rm in decostand methods alr/clr #667

Are you sure you want to change the base?

Fix to na.rm in decostand methods alr/clr #667

Conversation

antagomir commented Jun 30, 2024

TuomasBorman left a comment

Choose a reason for hiding this comment

antagomir commented Jul 1, 2024

antagomir commented Nov 17, 2024 • edited Loading

antagomir commented Nov 22, 2024

cameronmartino commented Nov 25, 2024

antagomir commented Nov 17, 2024 •

edited

Loading