Hello DepMap team,
I would like to ask what the best practice would be to extract the zygosity of mutations from the _AC columns in the CCLE_mutations.csv file. Currently, I interpret any mutation with a REF count of 0 as having a homozygous alteration, but I was wondering if there is a cutoff that is generally used/accepted?
Cheers,
Yuka
Hi Yuka, sorry this got missed. We try to filter out known germline mutations, so homozygosity in our data is in many cases LOH of somatic mutations. We don’t use any particular cutoff for this, and we don’t mark these mutations either.
A cutoff would depend on the read depth and copy number as well as the choice of FDR, and I don’t know of any straightforward way to set one other than a conservative approach similar to yours, where only mutations with 0 REF counts are included. For the zygosity question in general, some arbitrary cutoff such as 0.6 may do the trick (if you know there are no CN aberrations or subclonality). To be more rigorous, you can histogram all allele frequencies (AF) and find the value at the trough where the 0.5 and 1.0 peaks separate. You would have to correct the AFs for copy number, though. A lot of cases with AF=1.0 are actually LOH. Since most mutations that we report are somatic, homozygosity would be somewhat unlikely among them (unless they are rescued hotspots). In the end, since we do not explicitly mark these, I’m not able to comment on best practices, so you should take my response with a grain of salt.
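To make the two options above a bit more concrete, here is a rough Python sketch, not an official DepMap method. It assumes each _AC value is a string of the form ALT:REF (please check this against the column description for your release), and the function names, the `min_depth` filter and the bin count are all made up for illustration. It does not correct AFs for copy number, so the caveats above still apply.

```python
from typing import List, Optional, Tuple


def parse_ac(ac) -> Optional[Tuple[int, int]]:
    """Parse an allele-count value, ASSUMED to look like 'ALT:REF'."""
    try:
        alt, ref = ac.split(":")
        return int(alt), int(ref)
    except (AttributeError, ValueError):
        return None  # missing value or unexpected format


def call_zygosity(ac, cutoff: float = 0.6, min_depth: int = 10) -> str:
    """Label one mutation using a fixed AF cutoff.

    cutoff=0.6 is the arbitrary value mentioned above; min_depth is a
    hypothetical guard against calling from very few reads.
    """
    parsed = parse_ac(ac)
    if parsed is None:
        return "unknown"
    alt, ref = parsed
    depth = alt + ref
    if depth < min_depth:
        return "unknown"
    af = alt / depth
    # High AF may be true homozygosity or LOH; counts alone cannot tell.
    return "hom_or_LOH" if af >= cutoff else "het"


def histogram_trough(afs: List[float], bins: int = 20,
                     lo: float = 0.5, hi: float = 1.0) -> float:
    """Return the centre of the emptiest histogram bin between lo and hi."""
    counts = [0] * bins
    for af in afs:
        if 0.0 <= af <= 1.0:
            counts[min(int(af * bins), bins - 1)] += 1
    # Only consider bins whose centre lies strictly between the two peaks.
    candidates = [i for i in range(bins) if lo < (i + 0.5) / bins < hi]
    best = min(candidates, key=lambda i: counts[i])
    return (best + 0.5) / bins
```

You could call `histogram_trough` on the AFs of all mutations in a cell line and pass the result as `cutoff` to `call_zygosity`. With sparse data several bins may be empty and the first one wins, so it is worth looking at the histogram by eye before trusting the number.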