Hello
Let me start by saying how great the data explorer and its custom analysis tool are. I have occasionally noticed that when I use the “custom analysis” tool within the data explorer, none (or very, very few) of the results have positive values. I would expect a much larger number to have at least slightly positive values just by chance. I finally have a concrete example that is sufficiently anonymous to share, so that the result can be reproduced. Here’s a screenshot of the results:
Here’s the downloaded table of results:
depmap_download.csv (14.7 KB)
Note that there are only 155 results, and the display is set (by default) to show 1000 rows, so the positive results are not being cut off by truncation.
I ran this as a two-class comparison on the GDSC2 dataset; the “in” and “out” groups of cell lines are listed below.
I also ran this calculation myself using Python with pandas and SciPy (stats.ttest_ind; the effect size is the difference of means between the two groups):
(I can share code if you want.) In my own results I also see an asymmetry in effect size, with negative effects being stronger than positive ones, but many points do have positive values (which is what I would expect by chance).
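For reference, here is a minimal sketch of the kind of calculation described above. It is not necessarily the exact script that produced my numbers. It assumes a hypothetical pandas DataFrame `data` with cell line IDs as the index and one column per compound, and it takes the effect size as mean("in") minus mean("out"); the sign convention, the `min_per_group` cutoff, and the function name are my assumptions, and the portal may compute things differently. `stats.ttest_ind` is used with its default equal-variance setting.

```python
import pandas as pd
from scipy import stats


def two_class_comparison(data, in_ids, out_ids, min_per_group=3):
    """Per-feature t-test between an "in" and an "out" group of cell lines.

    data: DataFrame indexed by cell line ID, one column per feature
          (hypothetical layout, e.g. one column per compound).
    """
    # Keep only the requested cell lines that are present in the data.
    in_rows = data.loc[data.index.intersection(in_ids)]
    out_rows = data.loc[data.index.intersection(out_ids)]

    records = []
    for feature in data.columns:
        a = in_rows[feature].dropna()
        b = out_rows[feature].dropna()
        # Skip features with too few measurements in either group.
        if len(a) < min_per_group or len(b) < min_per_group:
            continue
        t_stat, p_value = stats.ttest_ind(a, b)
        records.append(
            {
                "feature": feature,
                # Assumed sign convention: in-group mean minus out-group mean.
                "effect_size": a.mean() - b.mean(),
                "t_stat": t_stat,
                "p_value": p_value,
            }
        )
    return pd.DataFrame(
        records, columns=["feature", "effect_size", "t_stat", "p_value"]
    )


def count_signs(results):
    """Count how many features have positive vs. negative effect sizes."""
    return {
        "positive": int((results["effect_size"] > 0).sum()),
        "negative": int((results["effect_size"] < 0).sum()),
    }
```

Calling `count_signs(two_class_comparison(data, in_ids, out_ids))` with the two ID lists below gives a quick check of how lopsided the signs are, which can then be compared against the downloaded table.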
Here is the “in” set of cell lines used in the analysis:
ACH-000080,ACH-001037,ACH-000198,ACH-000487,ACH-000299,ACH-000263,ACH-000498,ACH-002262,ACH-002273,ACH-000362,ACH-000369,ACH-000045,ACH-002290,ACH-000168,ACH-000113,ACH-000336,ACH-001613,ACH-000218,ACH-000034,ACH-000195,ACH-001656
Here is the “out” set of cell lines used in the analysis:
ACH-000557,ACH-001036,ACH-000081,ACH-000004,ACH-000005,ACH-000002,ACH-000166,ACH-000386,ACH-000065,ACH-000751,ACH-001618,ACH-000770,ACH-000373,ACH-000387,ACH-000146
Extra:
Here’s a zoomed-in screenshot showing that there is just one point that is slightly positive: