Interpretation of mutation data files

SyedHaider · December 8, 2021, 2:15pm

Hi,

We are using pre-processed mutation files:

CCLE_mutations_bool_damaging.csv
CCLE_mutations_bool_nonconserving.csv

However, we are unable to find a perfect overlap between the data in these files and the all-inclusive MAF file:
CCLE_mutations.csv

For example, the full file (CCLE_mutations.csv) contains the following record (x = CCLE_mutations.csv):

x[which(x$DepMap_ID == "ACH-000842" & x$Hugo_Symbol == "ERBB2"), ]

       Hugo_Symbol Entrez_Gene_Id NCBI_Build Chromosome Start_position
376648       ERBB2           2064         37         17       37884214
       End_position Strand Variant_Classification Variant_Type Reference_Allele
376648     37884214      +      Nonsense_Mutation          SNP                G
       Tumor_Seq_Allele1 dbSNP_RS dbSNP_Val_Status       Genome_Change
376648                 T                           g.chr17:37884214G>T
       Annotation_Transcript  DepMap_ID cDNA_Change         Codon_Change
376648     ENST00000269571.5 ACH-000842   c.3685G>T c.(3685-3687)Gag>Tag
       Protein_Change isDeleterious isTCGAhotspot TCGAhsCnt isCOSMIChotspot
376648       p.E1229*          True         False         0           False
       COSMIChsCnt ExAC_AF Variant_annotation CGA_WES_AC HC_AC   RD_AC
376648           0      NA           damaging             9:29 167:554
       RNAseq_AC SangerWES_AC WGS_AC
376648    33:118

This appears as “damaging”, but it is not present in the boolean file: CCLE_mutations_bool_damaging.csv

There appear to be many cases of this. It is likely that we are misinterpreting what the file names say about the files' contents. Any help would be greatly appreciated.

Thank you,
Syed

jnoorbak · December 8, 2021, 8:12pm

Hi. I suspect that many of these mutations are low allele frequency cases, similar to the example you shared. In the boolean matrices, we drop any mutation with an allele frequency below 0.25.
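
As a rough check of this explanation against the example record, the sketch below computes an allele fraction from the read-count strings shown in the question. It assumes that each *_AC value is formatted as "alt reads:ref reads"; the thread does not state this, and it does not say which source column (or combination of columns) the 0.25 cutoff is applied to. The helper name allele_fraction is hypothetical.

```r
# Hypothetical helper: convert "alt:ref" read-count strings to an allele fraction.
# Assumption: the *_AC columns hold "<alt reads>:<ref reads>"; empty or
# malformed entries are treated as missing (NA).
allele_fraction <- function(ac) {
  parts <- strsplit(as.character(ac), ":", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 2) return(NA_real_)
    counts <- suppressWarnings(as.numeric(p))
    if (anyNA(counts) || sum(counts) == 0) return(NA_real_)
    counts[1] / sum(counts)
  }, numeric(1))
}

# The three non-empty read-count values from the ERBB2 record, plus an empty one.
ac_values <- c("9:29", "167:554", "33:118", "")
af <- allele_fraction(ac_values)

# Flag values that fall below the 0.25 cutoff mentioned above.
below_cutoff <- !is.na(af) & af < 0.25

print(round(af, 3))   # 0.237 0.232 0.219 NA
print(below_cutoff)   # TRUE TRUE TRUE FALSE
```

Under that assumption, every available read-count value for this ERBB2 mutation gives an allele fraction below 0.25, which would be consistent with it being dropped from the boolean matrix.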

SyedHaider · December 9, 2021, 1:52pm

Thank you very much for your answer.
