Calculating the frequency of mutated genes

Hi. I am trying to calculate the frequency of mutated genes from the CCLE_mutations.csv file of the 22Q1 release, and then rank the genes by frequency.

But I am confused by the data: the dataset seems to be organized by individual mutation site, and I don’t know the right way to combine those entries into a per-gene value.

Could someone explain a little about how to do this?
Or are there other databases that provide this information?

Thank you for your time,
Best regards,

Hello CXL,

The data is aggregated over all available sequencing types for a given sample, and some samples have more sequencing types than others. Also note that we are only releasing somatic coding mutations.

So a simple method would be to take every mutation available in our dataset, regardless of the sequencing type.

Whichever method you choose, your analysis will be biased by the fact that different samples have different sets of sequencing types, each covering a specific set of genes more or less well.
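The simple method above can be sketched as follows: collapse the per-site rows so that each gene is counted once per cell line, then rank genes by the fraction of cell lines in which they are mutated. This is only a minimal sketch using the Python standard library. The column names `Hugo_Symbol` and `DepMap_ID` are assumptions about the file header, and the demo rows are made up for illustration, so check both against your copy of CCLE_mutations.csv.

```python
import csv
import io
from collections import defaultdict


def gene_mutation_frequency(rows, gene_col="Hugo_Symbol", sample_col="DepMap_ID"):
    """Return [(gene, fraction of cell lines mutated), ...], most frequent first.

    Column names are assumptions; adjust them to the actual file header.
    """
    lines_per_gene = defaultdict(set)  # gene -> set of cell lines carrying a mutation
    all_lines = set()                  # every cell line that appears in the file
    for row in rows:
        all_lines.add(row[sample_col])
        lines_per_gene[row[gene_col]].add(row[sample_col])
    total = len(all_lines)
    freq = {gene: len(lines) / total for gene, lines in lines_per_gene.items()}
    # Sort by descending frequency, then by gene name for a stable order.
    return sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up rows for illustration; for the real file use
# csv.DictReader(open("CCLE_mutations.csv", newline="")).
demo = io.StringIO(
    "Hugo_Symbol,DepMap_ID\n"
    "TP53,ACH-000001\n"
    "TP53,ACH-000001\n"  # second site in the same gene and cell line
    "TP53,ACH-000002\n"
    "KRAS,ACH-000002\n"
)
ranking = gene_mutation_frequency(csv.DictReader(demo))
print(ranking)
```

Note that the denominator here is the number of cell lines with at least one row in the file, so a cell line with no reported mutations at all is not counted. If you have a separate list of all profiled cell lines, using its length as the denominator is probably more appropriate.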

Hope this helps.


Thank you so much for your reply. That really helps.

I also have a question about ALT:REF. If I understand correctly, it gives the number of reads supporting the mutant (ALT) allele versus the number supporting the normal/reference (REF) allele, right?
So when I calculate the mutation frequency of a gene, should I sum the ALT and REF counts over all entries for that gene, or should I just count the number of mutation entries for the gene? Which way is more reasonable and less biased?

And when REF=0, does that really mean that no REF allele was found at that position in sequencing?

Best regards.

Yes, that is right, and REF = 0 really means that no reads supporting the reference allele were found at that position.

I think the two metrics represent different things, and which one to use depends on your underlying question and the point you want to make. From what you describe, I would be inclined to compute mutation frequency by counting all mutations that have a high enough allele frequency.
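As a rough illustration of that suggestion, the sketch below parses an ALT:REF string, computes the allele fraction ALT / (ALT + REF), and counts a gene as mutated in a cell line only if at least one of its entries reaches a threshold. Several things here are assumptions rather than part of the release: the read-count column name `CGA_WES_AC`, the column names `Hugo_Symbol` and `DepMap_ID`, the 0.2 cutoff (chosen arbitrarily), and the demo rows.

```python
from collections import defaultdict


def parse_alt_ref(value):
    """Parse an 'ALT:REF' string into (alt, ref) ints; None if empty or malformed."""
    try:
        alt, ref = value.split(":")
        return int(alt), int(ref)
    except (AttributeError, ValueError):
        return None


def allele_fraction(value):
    """Return ALT / (ALT + REF), or None if the counts are missing or both zero."""
    counts = parse_alt_ref(value)
    if counts is None or sum(counts) == 0:
        return None
    alt, ref = counts
    return alt / (alt + ref)


def mutated_lines_per_gene(rows, ac_cols, min_af=0.2):
    """Count cell lines per gene with at least one entry at or above min_af.

    ac_cols lists the ALT:REF columns to inspect (one per sequencing type);
    an entry passes if any of them reaches the threshold.
    """
    lines_per_gene = defaultdict(set)
    for row in rows:
        fractions = [allele_fraction(row.get(col)) for col in ac_cols]
        if any(af is not None and af >= min_af for af in fractions):
            lines_per_gene[row["Hugo_Symbol"]].add(row["DepMap_ID"])
    return {gene: len(lines) for gene, lines in lines_per_gene.items()}


# Made-up rows for illustration only.
demo_rows = [
    {"Hugo_Symbol": "TP53", "DepMap_ID": "ACH-000001", "CGA_WES_AC": "30:0"},
    {"Hugo_Symbol": "TP53", "DepMap_ID": "ACH-000002", "CGA_WES_AC": "2:98"},
    {"Hugo_Symbol": "KRAS", "DepMap_ID": "ACH-000002", "CGA_WES_AC": "15:15"},
    {"Hugo_Symbol": "KRAS", "DepMap_ID": "ACH-000003", "CGA_WES_AC": ""},
]
counts = mutated_lines_per_gene(demo_rows, ac_cols=["CGA_WES_AC"])
print(counts)
```

In the demo, the "30:0" entry (REF = 0) has an allele fraction of 1.0 and is kept, the "2:98" entry falls below the cutoff and is dropped, and the empty entry is skipped. Dividing each count by the number of profiled cell lines then gives a per-gene frequency that can be ranked.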


I understand. Your help is much appreciated. Thank you for the helpful advice!
Best regards,