Hi,
We are using pre-processed mutation files:
CCLE_mutations_bool_damaging.csv
CCLE_mutations_bool_nonconserving.csv
However, we cannot get the data in these files to fully agree with the all-inclusive MAF file:
CCLE_mutations.csv
For example, the full file contains the following record (x is CCLE_mutations.csv read into an R data frame):
x[which(x$DepMap_ID == "ACH-000842" & x$Hugo_Symbol == "ERBB2"), ]
Hugo_Symbol Entrez_Gene_Id NCBI_Build Chromosome Start_position
376648 ERBB2 2064 37 17 37884214
End_position Strand Variant_Classification Variant_Type Reference_Allele
376648 37884214 + Nonsense_Mutation SNP G
Tumor_Seq_Allele1 dbSNP_RS dbSNP_Val_Status Genome_Change
376648 T g.chr17:37884214G>T
Annotation_Transcript DepMap_ID cDNA_Change Codon_Change
376648 ENST00000269571.5 ACH-000842 c.3685G>T c.(3685-3687)Gag>Tag
Protein_Change isDeleterious isTCGAhotspot TCGAhsCnt isCOSMIChotspot
376648 p.E1229* True False 0 False
COSMIChsCnt ExAC_AF Variant_annotation CGA_WES_AC HC_AC RD_AC
376648 0 NA damaging 9:29 167:554
RNAseq_AC SangerWES_AC WGS_AC
376648 33:118
This record is annotated as “damaging”, but it is not present in the boolean file CCLE_mutations_bool_damaging.csv.
There appear to be many such cases. We are likely misinterpreting what the file names say about their contents. Any help would be greatly appreciated.
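To make the comparison reproducible, here is a minimal sketch of how such mismatches could be listed in R. It rests on assumptions that are not confirmed by the files' documentation: that the boolean file, once loaded, is a logical matrix with DepMap IDs as row names and columns named like "ERBB2 (2064)". The function name find_missing_damaging and the toy inputs are hypothetical and only illustrate the check; they are not real DepMap values.

```r
# Sketch: list calls annotated "damaging" in the full mutation table that are
# not TRUE in the boolean matrix.
# ASSUMPTION: bool_mat has DepMap IDs as row names and column names of the
# form "HUGO_SYMBOL (ENTREZ_ID)"; adjust if the real layout differs.
find_missing_damaging <- function(maf, bool_mat) {
  # One row per (cell line, gene) pair with at least one damaging call
  dam <- unique(maf[maf$Variant_annotation %in% "damaging",
                    c("DepMap_ID", "Hugo_Symbol", "Entrez_Gene_Id")])
  col_name <- paste0(dam$Hugo_Symbol, " (", dam$Entrez_Gene_Id, ")")
  row_idx <- match(dam$DepMap_ID, rownames(bool_mat))
  col_idx <- match(col_name, colnames(bool_mat))
  # Two-column matrix indexing; an NA index yields NA in the result
  in_bool <- bool_mat[cbind(row_idx, col_idx)]
  status <- rep("not flagged", nrow(dam))
  status[in_bool %in% TRUE] <- "present"
  status[is.na(col_idx)] <- "gene absent"
  status[is.na(row_idx)] <- "cell line absent"
  dam$status <- status
  dam[dam$status != "present", ]
}

# Toy inputs (made up for illustration, NOT real DepMap data)
maf <- data.frame(
  DepMap_ID = c("ACH-000842", "ACH-000001"),
  Hugo_Symbol = c("ERBB2", "TP53"),
  Entrez_Gene_Id = c(2064, 7157),
  Variant_annotation = c("damaging", "damaging"),
  stringsAsFactors = FALSE
)
bool_mat <- matrix(
  c(FALSE, FALSE, FALSE, TRUE), nrow = 2,
  dimnames = list(c("ACH-000842", "ACH-000001"),
                  c("ERBB2 (2064)", "TP53 (7157)"))
)

missing <- find_missing_damaging(maf, bool_mat)
print(missing)
```

With the real files, the boolean CSV would likely need to be read with check.names = FALSE so that column names containing spaces and parentheses are not altered, and then converted to a logical matrix before calling the function. The status column separates pairs that are simply not flagged from those whose cell line or gene is missing from the matrix altogether.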
Thank you,
Syed