DepMap cell lines are not all independent

The assumption that observations are independent underlies nearly every statistical test I am familiar with. Yet many of the cancer cell lines whose data are published on the DepMap portal are not independent of one another: some, for example, were derived from a parent cell line by selecting for drug resistance.

It is important for DepMap to clearly highlight this detail for those who wish to use this data.

It would also help if DepMap published the unfiltered genetic variants for each cell line, including both somatic and germline single nucleotide variants (SNVs), so that users could accurately identify related cell lines. Alternatively, if patient privacy is a concern, a published table summarizing the overlap of germline and somatic variants between each pair of cell lines, for example as a Jaccard index, would also be very useful.

I’ve shared a table here containing the Jaccard index computed from the inferred somatic mutations for each pair of cell lines, which people can use to identify related cell lines. It is based on this MAF file (CCLE_mutations.csv – I lost the version number :sweat_smile:). Note, however, that using only somatic variants makes it difficult or impossible to accurately identify related cell lines when they have a low SNV count.
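
For anyone who wants to compute something similar themselves, here is a minimal sketch of a pairwise Jaccard index (shared variants divided by the union of variants, |A ∩ B| / |A ∪ B|) over per-cell-line variant sets. It is an illustration, not necessarily the exact procedure behind the shared table. The column names (`DepMap_ID`, `Chromosome`, `Start_position`, `Reference_Allele`, `Tumor_Seq_Allele1`) are assumptions and differ between releases of the mutations file, so check the header of your copy. The `min_variants` cutoff and the toy cell line names are hypothetical.

```python
import csv
from collections import defaultdict
from itertools import combinations

# Assumed column names; adjust to match the header of your MAF/CSV release.
ID_COL = "DepMap_ID"
KEY_COLS = ("Chromosome", "Start_position", "Reference_Allele", "Tumor_Seq_Allele1")


def variants_by_line(rows, id_col=ID_COL, key_cols=KEY_COLS):
    """Group rows (dicts) into one set of variant keys per cell line."""
    sets = defaultdict(set)
    for row in rows:
        sets[row[id_col]].add(tuple(row[c] for c in key_cols))
    return dict(sets)


def jaccard(a, b):
    """Jaccard index of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(sets, min_variants=1):
    """Jaccard index for every pair of cell lines.

    Pairs where either line has fewer than `min_variants` variants are
    skipped, since a low SNV count makes the index unreliable.
    """
    kept = sorted(k for k, v in sets.items() if len(v) >= min_variants)
    return {(x, y): jaccard(sets[x], sets[y]) for x, y in combinations(kept, 2)}


def load_rows(path):
    """Read a mutations CSV into a list of dicts (one per variant call)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def _toy_row(line, chrom, pos, ref, alt):
    """Build one made-up variant call using the assumed column names."""
    return {ID_COL: line, "Chromosome": chrom, "Start_position": pos,
            "Reference_Allele": ref, "Tumor_Seq_Allele1": alt}


# Toy data with made-up cell line names: LINE_A_R stands in for a
# drug-resistant derivative of LINE_A; LINE_B is unrelated.
toy_rows = [
    _toy_row("LINE_A", "1", "100", "A", "T"),
    _toy_row("LINE_A", "2", "200", "C", "G"),
    _toy_row("LINE_A", "3", "300", "G", "A"),
    _toy_row("LINE_A_R", "1", "100", "A", "T"),
    _toy_row("LINE_A_R", "2", "200", "C", "G"),
    _toy_row("LINE_A_R", "3", "300", "G", "A"),
    _toy_row("LINE_A_R", "4", "400", "T", "C"),
    _toy_row("LINE_B", "5", "500", "A", "C"),
    _toy_row("LINE_B", "6", "600", "G", "T"),
]

scores = pairwise_jaccard(variants_by_line(toy_rows))
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

On real data you would replace `toy_rows` with `load_rows("CCLE_mutations.csv")`. Pairs with a high index are candidates for shared ancestry, but what counts as "high" is a judgment call that depends on how many variants each line has, so it is worth inspecting the distribution of scores before picking a threshold.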