Determination of 0/1/2 for multi-variant genes in OmicsSomaticMutationsMatrixDamaging

How are the 0/1/2 values calculated per gene in the OmicsSomaticMutationsMatrixDamaging.csv file when there are multiple variants in the same gene in the same cell line?

For example, in the RKO cell line (ACH-000943) there are two variants in RGS22. Neither variant alone is homozygous, but in the OmicsSomaticMutationsMatrixDamaging.csv file it’s listed as ‘2’, which would imply a homozygous mutation.

How are multiple mutations being aggregated to determine the 0/1/2 calls at the per-gene level?


When there are multiple variants in the same gene, the allele frequencies of these variants are summed, and if the sum is greater than 0.95, a value of 2 is assigned in this matrix. You are correct that a 2 doesn’t always mean there is a homozygous mutation; rather, it means that a complete loss of function is more likely than with a 1.
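
To make the aggregation concrete, here is a minimal sketch in Python. It is not the actual pipeline code. The function name, the tuple layout, and the allele frequencies in the example are made up for illustration. It also assumes that a gene with at least one damaging variant but a summed allele frequency of 0.95 or less gets a 1, and that a gene with no damaging variants gets a 0; only the sum-and-threshold rule for 2 is stated above.

```python
from collections import defaultdict

# Threshold from the answer above: summed allele frequency > 0.95 gives a 2.
AF_SUM_THRESHOLD = 0.95


def gene_level_calls(variants):
    """Aggregate damaging variants into per-gene 1/2 calls.

    `variants` is an iterable of (cell_line, gene, allele_frequency) tuples,
    one per damaging variant. This layout is hypothetical, not the real
    file schema.
    """
    af_sums = defaultdict(float)
    for cell_line, gene, allele_frequency in variants:
        af_sums[(cell_line, gene)] += allele_frequency

    # Assumption: any damaging variant at or below the threshold yields 1.
    return {
        key: 2 if total > AF_SUM_THRESHOLD else 1
        for key, total in af_sums.items()
    }


# Illustrative allele frequencies only; these are not the real RKO values.
example_variants = [
    ("ACH-000943", "RGS22", 0.5),
    ("ACH-000943", "RGS22", 0.5),
    ("ACH-000943", "GENE_X", 0.4),  # hypothetical gene with a single variant
]
calls = gene_level_calls(example_variants)

# Genes absent from `calls` have no damaging variants, assumed to map to 0.
print(calls.get(("ACH-000943", "RGS22"), 0))
print(calls.get(("ACH-000943", "GENE_Y"), 0))
```

With these made-up numbers, two heterozygous variants in the same gene sum to 1.0, which exceeds 0.95, so the gene is reported as 2 even though neither variant is homozygous on its own.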

Thanks for pointing this out. We will update the data description in the coming releases.