Defining "deep" deletions and amplifications

Thanks! I asked Dr. Marco Mina about this question before, and here is his reply (I hope it helps other users):

For our purposes, we wish to classify each gene-level CNV the same way GISTIC and cBioportal do, that is, in 5 categories (deep deletion, het loss, diploid, gain and amplified). If you had CNV data on a real linear scale y, you would expect:

  • a diploid gene to have CNV level of y=2 (2 copies)

  • a homozygous loss (that is, a deep deletion in cBioportal jargon) to have y=0 (0 copies)

  • a gain to have roughly 3 copies (y=3)

  • an amp to have >= 4 copies (y>=4)

Now, considering that there is some noise in the estimation of the CNV, we set the following thresholds on scale y: [deep del < 0.87 < het loss < 1.32 < diploid < 2.64 < gain < 3.36 < amp]

  • The transformation CCLE applied to derive their log-scaled CNV values ( x ) was:

x = log2(y/2)

Following this formula, the thresholds we have to apply are: (deep del < -1.2 < het loss < -0.6 < diploid < 0.4 < gain < 0.75 < amp).

*Indeed, you can see that log2(0.87/2) = -1.2, log2(1.32/2) = -0.599 … and so on.
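
To make the conversion concrete, here is a minimal Python sketch that derives the log-scale cut points from the linear ones with x = log2(y/2) and then assigns one of the five categories. The names `LINEAR_CUTS`, `LOG_CUTS`, `to_log_scale` and `classify` are made up for illustration, and sending a value that falls exactly on a cut point to the higher category is my assumption, since the reply does not say how ties are handled.

```python
import math
from bisect import bisect_right

# Linear-scale cut points (copies) quoted in the reply above.
LINEAR_CUTS = [0.87, 1.32, 2.64, 3.36]
LABELS = ["deep deletion", "het loss", "diploid", "gain", "amplified"]


def to_log_scale(y):
    """Apply the transformation x = log2(y / 2) from the reply."""
    return math.log2(y / 2)


# Log-scale cut points, roughly -1.2, -0.6, 0.4 and 0.75.
LOG_CUTS = [to_log_scale(y) for y in LINEAR_CUTS]


def classify(x, cuts=LOG_CUTS):
    """Return the category for a log-scaled CNV value x.

    bisect_right counts how many cut points are <= x, which is
    the index of the matching label (ties go to the higher category;
    that tie rule is an assumption, not part of the reply).
    """
    return LABELS[bisect_right(cuts, x)]


if __name__ == "__main__":
    for x in (-1.5, -1.0, 0.0, 0.5, 1.0):
        print(x, classify(x))
```

Computing the cut points from the linear values, rather than hard-coding the rounded ones, avoids small rounding differences near a boundary.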

I wonder whether these thresholds can only be applied to diploid cell lines. Based on Marco's reply, I think the thresholds for defining the categorical CN status should be:
deep del < log2(0.87/ploidy+1) < het loss < log2(1.32/ploidy+1) < diploid < log2(2.64/ploidy+1) < gain < log2(3.36/ploidy+1) < amp
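
As a rough sketch of what this proposal would give in practice, the snippet below evaluates the four cut points for a given ploidy. It reads each expression with standard operator precedence, i.e. log2((c / ploidy) + 1); that reading, and the names `LINEAR_CUTS` and `ploidy_adjusted_cuts`, are my assumptions and not something confirmed in Marco's reply.

```python
import math

# Linear-scale cut points (copies) from Marco's reply.
LINEAR_CUTS = [0.87, 1.32, 2.64, 3.36]


def ploidy_adjusted_cuts(ploidy):
    """Evaluate log2(c / ploidy + 1) for each linear cut point c.

    Assumption: the formula is read as log2((c / ploidy) + 1),
    following normal operator precedence.
    """
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")
    return [math.log2(c / ploidy + 1) for c in LINEAR_CUTS]


if __name__ == "__main__":
    for ploidy in (2, 3, 4):
        cuts = ploidy_adjusted_cuts(ploidy)
        print(ploidy, [round(c, 3) for c in cuts])
```

Note that under this reading the cut points for ploidy 2 are all positive, so they differ from the -1.2 / -0.6 / 0.4 / 0.75 values above, which come from log2(y/2) without the +1.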
