Thanks for all your work on this awesome project!
I was wondering about agreement between the 2022Q2 “CRISPR_gene_effect.csv” and 2022Q4 “CRISPRGeneEffect.csv” results. After pivoting both files to long format and joining on DepMap_ID and gene identifier, I get the following scatter plots of the Q2 vs Q4 results.
Is this roughly the agreement between the Q2 and Q4 results that would be expected? Would the outliers (i.e. where Q2 and Q4 greatly disagree) be caused by the changes to Chronos described in the Q4 release notes?
Thanks for the help.
It’s difficult to set an expectation for how large changes should be when we change Chronos, but these are not outside what we expected. Overall, the Pearson correlation of the raveled gene effects is 0.976. A monocolor scatter plot is deceptive when there are over 10 million points, as is the case here; it makes the agreement look lower than it is.
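If you want to check the overall agreement yourself, here is a minimal sketch of one way to compute a correlation over the raveled gene effects. It assumes both releases are loaded as wide pandas DataFrames with cell line IDs as the index and genes as the columns, and that both files use the same labels for each; the function name `raveled_pearson` is just illustrative, not part of any DepMap tooling.

```python
import numpy as np
import pandas as pd


def raveled_pearson(old: pd.DataFrame, new: pd.DataFrame) -> float:
    """Pearson correlation over all (cell line, gene) pairs present in both matrices.

    Assumes rows are cell lines and columns are genes, labeled the same
    way in both releases (an assumption, check your files).
    """
    # Keep only the cell lines (rows) and genes (columns) shared by both releases.
    old_aligned, new_aligned = old.align(new, join="inner")

    # Flatten each matrix into one long vector of gene effects.
    x = old_aligned.to_numpy(dtype=float).ravel()
    y = new_aligned.to_numpy(dtype=float).ravel()

    # Drop pairs where either release has a missing value.
    mask = np.isfinite(x) & np.isfinite(y)
    return float(np.corrcoef(x[mask], y[mask])[0, 1])
```

Something like `pd.read_csv("CRISPR_gene_effect.csv", index_col=0)` would load each file, assuming the first column holds the cell line ID. For the plot itself, a density-based view such as matplotlib's `hexbin` with `bins="log"` avoids the overplotting that makes a single-color scatter of this many points look worse than it is.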
The most extreme outliers (the upper left) are due to rare occasions when Chronos assigns a positive gene effect to a single cell line for an otherwise common essential gene. The origin of this annoying effect isn’t clear, but it is probably due to competition between regularization penalties. The problem has been substantially reduced in 22Q4, which is why you see many more outliers in the upper left than the lower right. It should be reduced again in 23Q2, although still not eliminated.
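To list the cases described above in a given release, a rough sketch follows. It uses a strongly negative median gene effect across cell lines as a stand-in for "common essential", which is only a proxy and not DepMap's actual common essential definition. The function name and both cutoffs are hypothetical choices for illustration.

```python
import pandas as pd


def positive_outliers_in_essential_genes(
    gene_effect: pd.DataFrame,
    essential_cutoff: float = -0.5,  # hypothetical threshold, not a DepMap value
    positive_cutoff: float = 0.0,    # hypothetical threshold, not a DepMap value
) -> pd.DataFrame:
    """Find (cell line, gene) pairs with a positive effect in an essential-like gene.

    Assumes rows are cell lines and columns are genes.
    """
    # Proxy for "common essential": the gene's median effect is strongly negative.
    median_effect = gene_effect.median(axis=0)
    essential_like = median_effect.index[median_effect < essential_cutoff]

    # Reshape the essential-like genes to long format: one row per (cell line, gene).
    long = (
        gene_effect[essential_like]
        .rename_axis(index="cell_line")
        .reset_index()
        .melt(id_vars="cell_line", var_name="gene", value_name="gene_effect")
    )

    # Keep only the cell lines where such a gene scores as positive.
    return long[long["gene_effect"] > positive_cutoff].reset_index(drop=True)
```

Running this on both releases and comparing the number of rows returned is one way to see whether the effect is less frequent in the newer file.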