No positive values of effect size reported when using "custom analysis" tool


Let me start by saying how great the data explorer is, and how great the custom analysis tool is. I have occasionally observed that when I use the “custom analysis” tool within the data explorer, none of the results (or very, very few) have positive values. I would expect a much larger number to have at least slightly positive values just by chance. I finally have a concrete example that is sufficiently anonymous to share and that reproduces the result. Here’s a screenshot of the results:

Here’s the downloaded table of results:
depmap_download.csv (14.7 KB)

Note that there are only 155 results, and the display is set (by default) to show 1000 rows, so the positive results are not simply being truncated.

I ran this using the two class comparison on the GDSC2 dataset; the “in” and “out” groups of cell lines are listed below.

I ran this calculation myself using python/pandas/scipy (stats.ttest_ind; the effect size is the difference of means between the groups):

(I can share code if you want.) I definitely also observe an asymmetry in effect size, with stronger negative than positive values, but many points do have positive effect sizes (which is what I would expect by chance).
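For anyone who wants to try a comparable calculation, here is a minimal sketch of that kind of two class comparison. It is not my original code and not the tool's implementation: the function name `two_class_comparison`, the table layout (cell lines as rows, features as columns), and the random stand-in data are all assumptions for illustration, and it assumes there are no missing values.

```python
import numpy as np
import pandas as pd
from scipy import stats


def two_class_comparison(data, in_group, out_group):
    """Compare every feature (column) between two groups of cell lines (rows).

    Hypothetical helper: effect size is mean(in group) - mean(out group),
    and the p-value comes from a two-sample t-test (scipy.stats.ttest_ind).
    """
    in_vals = data.loc[in_group]
    out_vals = data.loc[out_group]
    t_stat, p_value = stats.ttest_ind(
        in_vals.to_numpy(), out_vals.to_numpy(), axis=0
    )
    return pd.DataFrame(
        {
            "effect_size": in_vals.mean() - out_vals.mean(),
            "t_stat": t_stat,
            "p_value": p_value,
        },
        index=data.columns,
    )


# Synthetic stand-in data (NOT real GDSC2 values): 40 cell lines x 200 features.
rng = np.random.default_rng(0)
lines = [f"line_{i}" for i in range(40)]
features = [f"feature_{j}" for j in range(200)]
data = pd.DataFrame(rng.normal(size=(40, 200)), index=lines, columns=features)

# First 10 lines play the role of the "in" group, the rest the "out" group.
results = two_class_comparison(data, lines[:10], lines[10:])
n_positive = int((results["effect_size"] > 0).sum())
print(f"{n_positive} of {len(results)} features have a positive effect size")
```

With random data like this there is no real difference between the groups, so roughly half of the effect sizes should come out positive, which is the kind of symmetry I would expect to see at least partially in real data.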

Here is the “in” set of cell lines used in the analysis:

Here is the “out” set of cell lines used in the analysis:

Here’s a zoomed-in screenshot showing that there is just 1 point that is slightly positive:
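To put a rough number on "by chance", here is a small sketch of how unlikely it would be to see at most 1 positive value among 155 results. It assumes each result is independently equally likely to be positive or negative, which is a simplification (drug responses are correlated, and the tool may filter what it reports), so treat it as an order-of-magnitude illustration only. The helper name `prob_at_most_k_positive` is made up for this example.

```python
from math import comb


def prob_at_most_k_positive(n, k, p=0.5):
    """Binomial probability of seeing k or fewer positive values out of n.

    Assumes each value is independently positive with probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# 155 results in the downloaded table, only 1 of them slightly positive.
prob = prob_at_most_k_positive(n=155, k=1)
print(f"P(at most 1 positive out of 155) = {prob:.1e}")
```

Even if the independence assumption is badly off, the probability is so small that chance alone does not look like a plausible explanation.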

Here’s another example: I’m using the same sets of cell lines as above and the 2 class comparison, this time with the expression dataset.

Here is the table of results:
depmap_download_expression.csv (1.4 MB)

I find this example even more striking: it is really hard for me to understand how, for a given set of cell lines, every gene could be expressed at a lower level than in another group of cell lines. To be more precise, for every gene the expression in the “in” set is less than or equal to the expression in the “out” set.

Image of the zoomed-in plot near zero, showing that no genes have a positive effect size:

Image of the table sorted from highest to lowest, showing that the highest (most positive) value is still less than 0:

I have in the past run this calculation myself in python/pandas/scipy and observed a much more symmetric distribution (I don’t have those results to hand, but I can find them and/or the code if you want).