Defining "deep" deletions and amplifications

Hello,

I had previously used the CN calls from CCLE and earlier versions of DepMap (i.e. 18Q), where the scores were centered around 0. In those data, it was common to define “deep” deletions as < -1.28, while amplifications were set as > +0.75.

Given that the CN scores are now instead “relative copy number” (see “What is relative copy number/copy number ratio?”), are there any general recommendations for defining a “deep” deletion or amplification, analogous to previous versions of the data?

Thanks, and appreciate this awesome resource!
Ryan

I should add that I am aware of the “+,-,0” scoring in the segment-level CN data, but I’m finding that this three-level scheme is very generous in defining a deletion or amplification (e.g., regions that would previously have been called het loss or gain are now labeled del/amp).

Thanks!

Hi. Did you find the answer to this question?

Hi,

No official answer from the DepMap team as far as I know, but I saw that Mina et al, 2020 (https://pubmed.ncbi.nlm.nih.gov/32989323/) used the following cutoffs for CN:
Amp: CN > 2^0.75
Del: CN < 2^-1.2
where CN = 1 means diploid.

Last I checked, the CN values from DepMap are expressed as log2(CN+1) = x, so to get the “CN” from the DepMap values, you’ll first need to transform with 2^(x) - 1.
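As a rough sketch of the two steps above (undo the log2(CN+1) transform, then compare against the Mina et al. cutoffs), something like the following could work. The function names are made up for illustration, and this assumes the values really are stored as log2(CN+1):

```python
import math

# Cutoffs quoted above, on the relative CN scale where CN = 1 means diploid.
AMP_CUTOFF = 2 ** 0.75   # about 1.68
DEL_CUTOFF = 2 ** -1.2   # about 0.435


def depmap_to_ratio(x):
    """Invert x = log2(CN + 1) to recover the relative CN (hypothetical helper)."""
    return 2 ** x - 1


def call_amp_del(x):
    """Label one DepMap gene-level value as 'amp', 'del', or 'neither'."""
    cn = depmap_to_ratio(x)
    if cn > AMP_CUTOFF:
        return "amp"
    if cn < DEL_CUTOFF:
        return "del"
    return "neither"


if __name__ == "__main__":
    # A diploid gene has CN = 1, which is stored as log2(2) = 1.
    for value in (0.2, 1.0, math.log2(3)):
        print(value, call_amp_del(value))
```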

Hope that helps – again, this is not an official answer from the DepMap team, so take it with a grain of salt.

Thanks! I asked Dr. Marco Mina about this question before, and here is his reply (hope this helps other users):

For our purposes, we wish to classify each gene-level CNV the same way GISTIC and cBioPortal do, that is, in 5 categories (deep deletion, het loss, diploid, gain, and amplified). If you had CNV data on a real linear scale y, you would expect:

  • a diploid gene to have CNV level of y=2 (2 copies)

  • a homozygous loss (that is, a deep deletion in cBioPortal jargon) to have y=0 (0 copies)

  • a gain to have roughly 3 copies (y=3)

  • an amp to have >= 4 copies (y>=4)

Now, considering that there is some noise in the estimation of the CNV, we set the following thresholds on scale y: [deep del < 0.87 < het loss < 1.32 < diploid < 2.64 < gain < 3.36 < amp]

The transformation CCLE applied to derive their log-scaled CNV values (x) was:

x = log2(y/2)

Following this formula, the thresholds we have to apply are: (deep del < -1.2 < het loss < -0.6 < diploid < 0.4 < gain < 0.75 < amp).

Indeed, you can see that log2(0.87/2) = -1.2, log2(1.32/2) = -0.599 … and so on.
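To make the arithmetic in the reply concrete, here is a minimal sketch that derives the log-scale cutoffs from the linear ones with x = log2(y/2) and then assigns one of the five categories. The names are illustrative only, and how a value exactly on a boundary is handled is my own assumption, since the reply does not say:

```python
import math
from bisect import bisect_right

# Linear-scale cutoffs (in copies, y) quoted in the reply above.
LINEAR_CUTS = [0.87, 1.32, 2.64, 3.36]
LABELS = ["deep del", "het loss", "diploid", "gain", "amp"]


def to_log_scale(y):
    """Transformation quoted in the reply: x = log2(y / 2)."""
    return math.log2(y / 2)


# Roughly [-1.2, -0.6, 0.4, 0.75] once rounded.
LOG_CUTS = [to_log_scale(y) for y in LINEAR_CUTS]


def classify(x):
    """Map a log-scaled CNV value x to one of the five categories.

    Assumption: a value exactly on a cutoff goes to the higher category.
    """
    return LABELS[bisect_right(LOG_CUTS, x)]


if __name__ == "__main__":
    print([round(c, 2) for c in LOG_CUTS])
    for x in (-2.0, -1.0, 0.0, 0.5, 1.0):
        print(x, classify(x))
```

Note that this applies to values centered on 0 (the older CCLE scale), not to the newer log2(CN+1) values.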

I wonder whether these thresholds apply only to diploid cell lines. Based on Marco’s reply, I think the thresholds for defining the categorical CN status should be:
deep del < log2(0.87/ploidy+1) < het loss < log2(1.32/ploidy+1) < diploid < log2(2.64/ploidy+1) < gain < log2(3.36/ploidy+1) < amp
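A small sketch of how these ploidy-dependent cutoffs could be computed. This assumes the formula is read as log2(y/ploidy + 1), i.e. the copy-number cutoff is divided by ploidy before adding 1; that reading, the function name, and the idea itself are unverified, so treat it as a guess rather than a validated method:

```python
import math

# Linear-scale cutoffs (in copies) from Marco's reply.
LINEAR_CUTS = (0.87, 1.32, 2.64, 3.36)


def ploidy_adjusted_cuts(ploidy):
    """Cutoffs on the log2(ratio + 1) scale for a given cell-line ploidy.

    Assumption: the formula is log2(y / ploidy + 1), not log2(y / (ploidy + 1)).
    """
    return [math.log2(y / ploidy + 1) for y in LINEAR_CUTS]


if __name__ == "__main__":
    for ploidy in (2, 3, 4):
        print(ploidy, [round(c, 3) for c in ploidy_adjusted_cuts(ploidy)])
```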

Thanks for commenting on this thread. Unfortunately we don’t have a specific threshold recommendation for this.

One possible option besides what has been suggested here is to use the thresholds used by TCGA. The values seem to be log2(ratios), with cutoffs at -0.3 and 0.3 separating loss, neutral, and gain.
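A minimal sketch of that three-level scheme, assuming the input is a log2 ratio and that the cutoffs are symmetric at ±0.3. The helper that converts a gene-level log2(CN ratio + 1) value to a log2 ratio is hypothetical and not part of any DepMap tooling:

```python
import math


def gene_value_to_log2_ratio(x):
    """Convert x = log2(ratio + 1) to log2(ratio); -inf when the ratio is not positive."""
    ratio = 2 ** x - 1
    return math.log2(ratio) if ratio > 0 else float("-inf")


def three_level_call(log2_ratio, cutoff=0.3):
    """Return 'gain', 'loss', or 'neutral' using symmetric cutoffs at +/- cutoff."""
    if log2_ratio > cutoff:
        return "gain"
    if log2_ratio < -cutoff:
        return "loss"
    return "neutral"


if __name__ == "__main__":
    # A ratio of 1 is stored as log2(2) = 1 and maps to a log2 ratio of 0.
    print(three_level_call(gene_value_to_log2_ratio(1.0)))
```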

We have internally used the following as well but I cannot comment on its reliability for a general use case:
“Copy number calls are used to identify focal deletions, deep deletions, and gene amplifications. All these calculations start with segment level relative copy number from the CCLE dataset. A gene is considered to be focally deleted if any of its exons have a copy number of less than 0.2. The weighted copy number of the exons is also calculated and if this is less than 0.4, the gene is considered to have undergone deep deletion. A gene is considered amplified if its weighted copy number is greater than 3.”

Please note that our gene-level copy number is log2(CN ratio + 1), whereas the segment-level copy number is approximately the CN ratio.

Hi. Are the weighted copy number and the corresponding thresholds (0.2, 0.4) you mentioned here in log2(Ratio + 1) or log2(Ratio)? What does the weighted CN mean here? That is, can these two values be applied to the CCLE CNV data without any transformation?

This is still experimental, so please take it with a grain of salt (we may update some of the thresholds and make a post). These are relative copy numbers weighted by genomic ranges. The values come from the segment file, which gives approximate CN ratios (not log transformed). The segment file from the DepMap portal can be used directly for this.
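For readers who want to try this, below is a rough sketch of the rule as described: per-exon copy number taken from overlapping segments, a length-weighted gene-level value, then the 0.2 / 0.4 / 3 cutoffs. The exact weighting used internally is not spelled out here, so weighting by overlap length and by exon length is an assumption, as are the function names and the tuple layouts. Coordinates are treated as half-open and on a single chromosome.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def exon_copy_number(exon, segments):
    """Overlap-weighted mean CN ratio for one exon, or None if nothing overlaps.

    exon is (start, end); segments is a list of (start, end, cn_ratio).
    """
    covered = 0
    weighted = 0.0
    for seg_start, seg_end, cn in segments:
        n = overlap(exon[0], exon[1], seg_start, seg_end)
        covered += n
        weighted += n * cn
    return weighted / covered if covered else None


def call_gene(exons, segments):
    """Apply the quoted cutoffs to one gene (assumed exon-length weighting)."""
    per_exon = []
    for exon in exons:
        cn = exon_copy_number(exon, segments)
        if cn is not None:
            per_exon.append((exon[1] - exon[0], cn))
    if not per_exon:
        return None
    total_len = sum(length for length, _ in per_exon)
    weighted_cn = sum(length * cn for length, cn in per_exon) / total_len
    return {
        "weighted_cn": weighted_cn,
        "focal_deletion": any(cn < 0.2 for _, cn in per_exon),
        "deep_deletion": weighted_cn < 0.4,
        "amplified": weighted_cn > 3,
    }


if __name__ == "__main__":
    # Toy data: the second exon sits in a segment with a CN ratio of 0.1.
    toy_segments = [(0, 1000, 1.0), (1000, 2000, 0.1)]
    toy_exons = [(100, 200), (1100, 1200)]
    print(call_gene(toy_exons, toy_segments))
```

In this toy case the gene would be flagged as focally deleted (one exon below 0.2) but not deep deleted, since the weighted value stays above 0.4.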