Dear DepMap staff,
I am looking at the PRISM data and have a few questions:
- There are 10 cell lines that failed STR profiling. Some are contaminated and others were misidentified. Should I exclude these 10 cell lines?
- There are 4518 unique drugs, but 4686 drug-batch combinations. Some drugs have different dosages, while others have different batches at the same dosage. For instance, “amoxicillin” has 2 batches: BRD-K55044200-001-14-6::2.5::HTS and BRD-K55044200-001-15-3::2.5::HTS. The correlation between the 2 batches is 0.00727. Could you please clarify this batch notion and why the batches are not correlated? If I am interested in data for the 4518 unique drugs, can I take the average of drugs with different batches? Please advise.
Thank you for your help.
For the lines that failed STR profiling: yes, please exclude them from further analysis. Their Depmap_IDs should be set to NA in any case, but please feel free to reach out if you need further clarification.
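As a minimal sketch of that filtering step, the snippet below drops rows whose cell-line ID is missing. The row layout, the `depmap_id` key, and the values are assumptions made for illustration, not the actual PRISM file format.

```python
import math

def drop_failed_str_lines(rows, id_key="depmap_id"):
    """Keep only rows whose cell-line ID is present.

    An ID counts as missing if it is None, NaN, empty, or the string 'NA'.
    The key name 'depmap_id' is hypothetical; adjust it to your table.
    """
    def is_missing(value):
        if value is None:
            return True
        if isinstance(value, float) and math.isnan(value):
            return True
        return str(value).strip() in ("", "NA")

    return [row for row in rows if not is_missing(row.get(id_key))]

# Toy rows; the IDs and viabilities are placeholders, not real measurements.
rows = [
    {"depmap_id": "ACH-000001", "viability": 0.91},
    {"depmap_id": "NA", "viability": 0.40},
    {"depmap_id": None, "viability": 0.75},
]
kept = drop_failed_str_lines(rows)
```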
For the batch IDs: the last 6 digits of a BRD ID identify the batch of the compound. In your example, it seems we purchased amoxicillin twice (probably on different occasions) and included both batches in the screen. In most cases both batches are concordant; in the others, the failure mode is that one of the batches contains a “stale” compound, so it tends not to show much activity. There is therefore no single “best” way to collapse those batches, but here are some options:
- You can just collapse them by averaging. This may lose some information, but in most cases it shouldn’t matter much. The appeal of this method is its simplicity.
- You can choose the batch that is most correlated with the PRIMARY screen at 2.5 µM.
- You can choose the batch that has the most activity (which can be measured either in terms of variance or the number of cell lines that show viability below a fixed threshold; we have historically used 30%).
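The averaging and most-active options can be sketched roughly as below. This is only an illustration under several assumptions: column IDs follow the `BRD-...::dose::screen` pattern from the question, each profile is a list of viabilities expressed as fractions of control (so 30% is 0.3) in the same cell-line order, and all function names and toy values are made up. The correlation option follows the same pattern if you pass a score function that compares each batch with your reference profile.

```python
from collections import defaultdict
from statistics import mean, pvariance

def split_column_id(column_id):
    """Split 'BRD-K55044200-001-14-6::2.5::HTS' into (compound, batch, dose, screen)."""
    brd_id, dose, screen = column_id.split("::")
    parts = brd_id.split("-")
    compound = "-".join(parts[:2])   # e.g. 'BRD-K55044200'
    batch = "-".join(parts[2:])      # e.g. '001-14-6', the last 6 digits
    return compound, batch, float(dose), screen

def group_batches(profiles):
    """Group column IDs that share compound, dose, and screen."""
    groups = defaultdict(list)
    for column_id in profiles:
        compound, _batch, dose, screen = split_column_id(column_id)
        groups[(compound, dose, screen)].append(column_id)
    return dict(groups)

def collapse_by_mean(profiles, column_ids):
    """Average the batches cell line by cell line, skipping missing values."""
    collapsed = []
    for values in zip(*(profiles[c] for c in column_ids)):
        present = [v for v in values if v is not None]
        collapsed.append(mean(present) if present else None)
    return collapsed

def count_sensitive(profile, threshold=0.3):
    """Activity score: number of cell lines with viability below the threshold."""
    return sum(1 for v in profile if v is not None and v < threshold)

def variance_score(profile):
    """Alternative activity score: variance across cell lines."""
    return pvariance([v for v in profile if v is not None])

def pick_batch(profiles, column_ids, score=count_sensitive):
    """Keep the batch with the highest score."""
    return max(column_ids, key=lambda c: score(profiles[c]))

# Toy viabilities for 4 cell lines; the numbers are invented, not real PRISM data.
profiles = {
    "BRD-K55044200-001-14-6::2.5::HTS": [0.95, 0.20, 0.25, 0.90],
    "BRD-K55044200-001-15-3::2.5::HTS": [1.00, 0.98, None, 1.02],
}
groups = group_batches(profiles)
columns = groups[("BRD-K55044200", 2.5, "HTS")]
averaged = collapse_by_mean(profiles, columns)
most_active = pick_batch(profiles, columns)
```

Note that if your values are log fold changes rather than viability fractions, the threshold would need to be converted to the same scale before counting.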
Hope this helps.