I am looking at features like Aneuploidy and WGD in the OmicsSignatures.csv files, comparing the 24Q2 and 24Q4 versions. Some cell lines show discordant values, and some that were available before are now missing.
What could be the reason for this discordance? It is more prominent for the Aneuploidy and Ploidy features.
Below I am pasting the output of my WGD concordance check. Note that 42 cell lines that had a WGD status in 24Q2 are now missing. In this case, can we impute them from the 24Q2 information, or is that now considered incorrect?
This discrepancy is due to our ongoing “gapfilling” effort, in which we perform WGS on models for which we previously only had WES. Ploidy, LoHFraction, WGD, CIN, and Aneuploidy are all downstream of PureCN, which can give slightly different results for the same model’s WGS and WES data. This results in the model-level changes you see.
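To see which models fall into each category (unchanged, changed, now missing, newly available) for a given feature, a minimal sketch like the one below could help. It assumes both releases are plain CSVs whose first column holds the model ID and whose missing values are empty or written as NA. The function names and the tolerance are illustrative only and are not part of any DepMap tooling.

```python
import csv
import io
import math

MISSING = {"", "NA", "NaN", "nan"}


def parse_value(raw):
    """Return a float, or None for an empty/NA cell (assumed missing-value markers)."""
    if raw is None or raw.strip() in MISSING:
        return None
    return float(raw)


def load_signatures(handle):
    """Read a signatures CSV into {model_id: {feature: value}}.

    Assumes the first column is the model ID, whatever its header is called.
    """
    reader = csv.DictReader(handle)
    id_col = reader.fieldnames[0]
    table = {}
    for row in reader:
        model = row.pop(id_col)
        table[model] = {k: parse_value(v) for k, v in row.items() if k is not None}
    return table


def compare_feature(old, new, feature, tol=1e-6):
    """Bucket every model by how one feature changed between two releases."""
    result = {"concordant": [], "discordant": [], "now_missing": [], "newly_available": []}
    for model in sorted(set(old) | set(new)):
        a = old.get(model, {}).get(feature)
        b = new.get(model, {}).get(feature)
        if a is None and b is None:
            continue  # missing in both releases: nothing to compare
        if b is None:
            result["now_missing"].append(model)
        elif a is None:
            result["newly_available"].append(model)
        elif math.isclose(a, b, abs_tol=tol):
            result["concordant"].append(model)
        else:
            result["discordant"].append(model)
    return result


# Toy, made-up data standing in for the two releases.
q2 = load_signatures(io.StringIO(
    "ModelID,WGD,Ploidy\nACH-000001,1,3.1\nACH-000002,0,2.0\nACH-000003,1,3.8\n"))
q4 = load_signatures(io.StringIO(
    "ModelID,WGD,Ploidy\nACH-000001,1,3.1\nACH-000002,1,2.9\nACH-000003,NA,NA\nACH-000004,0,2.1\n"))
print(compare_feature(q2, q4, "WGD"))
```

With real files you would pass open file handles instead of `io.StringIO`. For continuous features such as Ploidy, a larger `tol` may be more informative than exact agreement, since small shifts between WGS- and WES-based estimates are expected.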
As for the new NAs: PureCN fails to provide ploidy solutions without manual curation on a small subset of models, so it is expected that for some models PureCN fails on the newly generated WGS data even though it succeeded on the WES data. If you wish to view these features at the profile level, where predictions from both WGS and WES are included, please refer to OmicsSignaturesProfile.
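As a rough sketch of working at the profile level, the snippet below groups profile-level values by model and data type, so that models with only a WES-derived value can be spotted. It assumes you have already built a mapping from profile ID to (model ID, data type), for example from a profile metadata table in the same release; that mapping, its column names, and the helper names here are hypothetical and should be checked against the actual release files. Whether a WES-derived value is appropriate in place of a missing WGS one is a judgment call for your analysis.

```python
from collections import defaultdict


def values_by_model(profile_values, profile_info):
    """Group profile-level predictions by model and data type.

    profile_values: {profile_id: value or None}, e.g. one feature column
        from a profile-level signatures table.
    profile_info: {profile_id: (model_id, datatype)}, a hypothetical mapping
        built from profile metadata.
    Returns {model_id: {datatype: [values]}}; missing values are skipped and
    a model with several profiles of one data type keeps all of them.
    """
    grouped = defaultdict(dict)
    for profile_id, value in profile_values.items():
        if value is None or profile_id not in profile_info:
            continue
        model_id, datatype = profile_info[profile_id]
        grouped[model_id].setdefault(datatype.lower(), []).append(value)
    return dict(grouped)


def only_wes_available(grouped):
    """Models with a WES-based value but no usable WGS-based value."""
    return sorted(m for m, d in grouped.items() if "wes" in d and "wgs" not in d)


# Toy, made-up example: the WGS profile of ACH-000003 has no value.
vals = {"PR-a": 1.0, "PR-b": None, "PR-c": 0.0, "PR-d": 1.0}
info = {
    "PR-a": ("ACH-000003", "WES"),
    "PR-b": ("ACH-000003", "WGS"),
    "PR-c": ("ACH-000004", "WES"),
    "PR-d": ("ACH-000004", "WGS"),
}
grouped = values_by_model(vals, info)
print(only_wes_available(grouped))
```

Keeping both values per model, rather than collapsing them into one number, makes it easy to see where WGS and WES disagree (as for ACH-000004 in the toy data) before deciding how to handle the gaps.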