Best practices for interpreting copy number values in CN WGS data 25Q3

I’m currently working with the 25Q3 WGS copy number data and am noticing some very high, extreme copy number values. Is there any recommended guidance on how these extreme values should be interpreted or handled?

I also want to do quantitative calling of amplifications and deletions. What are the recommended best practices for defining amp/del calls (for example, suggested thresholds)? One challenge I’m running into is that copy number distributions vary widely across cell lines, so a single global threshold seems questionable. Is there a recommended way to handle this variability?
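One possible way to handle per-cell-line variability is to threshold each cell line against its own baseline instead of a single global value. The sketch below assumes a cell lines × genes NumPy array of linear relative copy number. The function name `call_amp_del`, the ±1.0 log2 thresholds, and the floor used for zeros are hypothetical illustrations, not DepMap recommendations.

```python
import numpy as np

def call_amp_del(rel_cn, amp_log2=1.0, del_log2=-1.0, floor=1e-3):
    """Hypothetical per-cell-line amp/del caller.

    rel_cn: (cell lines x genes) array of LINEAR relative copy number (assumed).
    Each row is log2-transformed and centred on its own median, so the same
    thresholds are applied relative to each cell line's baseline.
    Returns an int array: 1 = amp, -1 = del, 0 = neutral (NaNs end up as 0).
    """
    rel_cn = np.asarray(rel_cn, dtype=float)
    # Floor zeros so log2 stays finite for deletion-like values (assumed floor)
    log2_cn = np.log2(np.maximum(rel_cn, floor))
    # Centre each cell line on its own median, ignoring missing genes
    centred = log2_cn - np.nanmedian(log2_cn, axis=1, keepdims=True)
    calls = np.zeros(rel_cn.shape, dtype=int)
    calls[centred >= amp_log2] = 1   # NaN comparisons are False, so NaN stays 0
    calls[centred <= del_log2] = -1
    return calls

calls = call_amp_del([[1.0, 1.0, 2.5, 0.0],
                      [2.0, 2.0, 2.0, 8.0]])
print(calls.tolist())  # [[0, 0, 1, -1], [0, 0, 0, 1]]
```

In this toy example the second cell line sits at a relative copy number of 2 for most genes, so a global log2 ≥ 1 cutoff would call every gene amplified, while the median-centred version flags only the gene at 8. Whether median-centring, a ploidy-based baseline, or something else is appropriate depends on the analysis.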

Hi,

In general, we don’t have specific threshold recommendations for calling amps/dels from relative copy number data, since this can be very context-dependent, but perhaps the discussion in this thread could be helpful.

The variability of copy numbers across cell lines is expected. If there are extremes that look concerning, do you mind sharing some specific examples so we can look into it?

Thanks,

Simone

Here are the distributions of log2 copy number values for cell lines ACH-000017 and ACH-000022, which include examples of extremely high CN values (I can share more if helpful). Interpreting log2 values like 15 or 9 as copy numbers of 2^15 or 2^9 doesn’t seem plausible to me, so I assumed these are artifacts.

If these plots were generated from OmicsCNGeneWGS.csv, the values are actually on a linear scale (the minimum is 0 rather than negative).
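When the scale of a copy number file is in doubt, a quick check along the lines of the reply above can help. This is a minimal sketch assuming the matrix is already loaded as a NumPy array; the helper names and the floor value are hypothetical, and the negative-minimum test is only a rule of thumb.

```python
import numpy as np

def looks_log2_scaled(values):
    """Heuristic: a plain log2 ratio matrix usually contains negative values
    (losses), whereas a linear relative copy number matrix has a minimum of 0.
    A rule of thumb only, not a definitive test."""
    arr = np.asarray(values, dtype=float)
    return bool(np.nanmin(arr) < 0)

def to_log2_for_plotting(linear_values, floor=1e-3):
    """Convert linear relative copy number to log2 for plotting.
    Zeros are floored (assumed value) so they map to a finite number."""
    arr = np.asarray(linear_values, dtype=float)
    return np.log2(np.maximum(arr, floor))

print(looks_log2_scaled([[0.0, 1.0, 15.0]]))   # False
print(to_log2_for_plotting([1.0, 16.0]))        # [0. 4.]
```

Read this way, a linear value of 15 corresponds to a log2 value of about 3.9, not to 2^15.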

Simone

Could you provide, or point me to, documentation or a release note stating that OmicsCNGeneWGS.csv is now on a linear scale?

I found the release announcement “Announcing the 24Q2 Release”, which states that “to be consistent with absolute copy number data, the relative copy number matrix is no longer log2 transformed.” However, explicit confirmation that this applies to OmicsCNGeneWGS.csv would be helpful.