Details on methodology of “Two Class Comparison”

Would it be possible to include some details on the two class comparison custom analysis method? I assume it’s a two sample hypothesis test with some form of multiple hypothesis correction, but the exact method that is used would be useful for reporting and reproducibility.

It’s a great tool overall though, really has sped up the process of working through a lot of datasets quickly.

Hi Shovik,

As you guessed correctly, the two class comparison consists of a simple linear hypothesis test followed by a multiple hypothesis correction step.

In particular, for each feature/column of the selected dataset we fit a simple linear regression model to the chosen phenotype of interest. The estimated regression coefficient and its standard error are then fed into the adaptive shrinkage method described in https://doi.org/10.1093/biostatistics/kxw041 to obtain moderated effect sizes (posterior mean estimates) and corresponding q-values (FDR). For binary features, this methodology is roughly equivalent to a t-test with a pooled variance estimate (the only difference is the shrinkage step).
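
To make the pre-shrinkage step concrete, here is a minimal Python sketch (not the code Data Explorer runs, which is the R code in the repo). It fits a simple linear regression for one binary feature and checks that the resulting t-statistic matches a pooled-variance two-sample t-test. The function names and the toy data are made up for illustration, and the adaptive shrinkage step is deliberately left out.

```python
import math

def simple_linear_regression(x, y):
    """Fit y ~ a + b*x by least squares; return (slope, standard error, t-statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se

def pooled_t_statistic(group0, group1):
    """Two-sample t-statistic (group1 minus group0) with a pooled variance estimate."""
    n0, n1 = len(group0), len(group1)
    m0 = sum(group0) / n0
    m1 = sum(group1) / n1
    ss0 = sum((v - m0) ** 2 for v in group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))

# Toy data (hypothetical): a binary feature and a continuous phenotype
feature = [0, 0, 0, 0, 1, 1, 1, 1]
phenotype = [1.0, 1.4, 0.8, 1.2, 2.1, 2.5, 1.9, 2.3]

slope, se, t_lm = simple_linear_regression(feature, phenotype)
t_pooled = pooled_t_statistic(
    [y for x, y in zip(feature, phenotype) if x == 0],
    [y for x, y in zip(feature, phenotype) if x == 1],
)

# The slope is the difference in group means, and the two t-statistics agree
print(slope, se, t_lm, t_pooled)
```

In the actual analysis the coefficient and standard error for every feature would then go through adaptive shrinkage to produce the posterior means and q-values; this sketch stops before that step.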

Also, for the sake of reproducibility, we keep the code used in this analysis in this GitHub repo: https://github.com/broadinstitute/cdsr_models/blob/master/R/linear_association.R

Warmly,
Mustafa

I would like to confirm whether the function “run_lm_stats_limma” in cdsr_models/R/linear_association.R (master branch of broadinstitute/cdsr_models on GitHub) is still being used for the two class comparison in the latest version of Data Explorer?