Reading through it, it seems they are largely highlighting the issues one would encounter when combining public data generated by different projects. Different data sources will likely have biases stemming from differences in protocols, and I think that makes sense and is a legitimate concern.
I believe the DepMap RNA data should be less prone to biases from different protocols and processing, because we use the same protocol for all the mRNA data that we generate and process all RNA data with the same pipeline.
That being said, the mRNA data has been generated over the span of many years, and protocols have changed over that time, so there is some risk of bias coming from those changes.
I can circle back to folks here to ask whether they’re aware of biases introduced and whether we’re doing any form of batch correction for different protocols.
But independent of that, I think there’s still evidence that the TPM values across samples are meaningful because we often see them correlate with dependency profiles.
Given that we see this alignment across two orthogonal datasets, I have some confidence that the RNA profiles across samples are capturing real biological signal.
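If you want to run this kind of sanity check yourself, here is a minimal sketch of the idea: for one gene, correlate log2(TPM + 1) expression with its dependency scores across the cell lines present in both datasets. The function names, the dict-per-gene layout, and the toy values are all assumptions made up for illustration; this is not the DepMap pipeline and the numbers are not real data.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)

def expression_dependency_correlation(tpm, dependency):
    """Correlate log2(TPM + 1) with dependency scores over shared cell lines.

    Both arguments are dicts keyed by cell line ID (a hypothetical layout).
    Returns the correlation and the number of shared cell lines used.
    """
    shared = sorted(set(tpm) & set(dependency))
    log_expr = [math.log2(tpm[c] + 1.0) for c in shared]
    dep = [dependency[c] for c in shared]
    return pearson(log_expr, dep), len(shared)

# Made-up toy values for a single gene, for illustration only (not real DepMap data).
toy_tpm = {"line_A": 0.0, "line_B": 1.0, "line_C": 3.0, "line_D": 7.0, "line_E": 15.0}
toy_dep = {"line_A": 0.0, "line_B": -0.25, "line_C": -0.5, "line_D": -0.75, "line_F": -1.0}

r, n = expression_dependency_correlation(toy_tpm, toy_dep)
print(f"r = {r:.3f} over {n} shared cell lines")
```

The toy values are constructed so that higher expression goes with a more negative dependency score, which is why the correlation comes out strongly negative. With real data you would expect something much noisier, and a correlation alone would not rule out a batch effect that happens to track lineage or collection date.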