We are using pre-processed mutation files:
However, we are unable to find a perfect overlap between the data in these files and the all-inclusive MAF file:
For example, the full file (CCLE_mutations.csv) contains the following record (x = CCLE_mutations.csv):
x[which(x$DepMap_ID == "ACH-000842" & x$Hugo_Symbol == "ERBB2"), ]
Hugo_Symbol Entrez_Gene_Id NCBI_Build Chromosome Start_position
376648 ERBB2 2064 37 17 37884214
End_position Strand Variant_Classification Variant_Type Reference_Allele
376648 37884214 + Nonsense_Mutation SNP G
Tumor_Seq_Allele1 dbSNP_RS dbSNP_Val_Status Genome_Change
376648 T g.chr17:37884214G>T
Annotation_Transcript DepMap_ID cDNA_Change Codon_Change
376648 ENST00000269571.5 ACH-000842 c.3685G>T c.(3685-3687)Gag>Tag
Protein_Change isDeleterious isTCGAhotspot TCGAhsCnt isCOSMIChotspot
376648 p.E1229* True False 0 False
COSMIChsCnt ExAC_AF Variant_annotation CGA_WES_AC HC_AC RD_AC
376648 0 NA damaging 9:29 167:554
RNAseq_AC SangerWES_AC WGS_AC
376648 33:118
This variant is annotated as “damaging”, but it is not present in the boolean file CCLE_mutations_bool_damaging.csv.
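A rough sketch of the comparison we have in mind, using toy data rather than the real files. It assumes the boolean file is a matrix with DepMap IDs as row names, gene symbols as column names, and 0/1 values; that layout is our guess, and the real column headers may need normalising first (for example if they carry an Entrez ID suffix). The objects `b`, `damaging` and `missing` are hypothetical names used only for illustration.

```r
# Toy stand-in for CCLE_mutations.csv (the real file has many more columns).
x <- data.frame(
  DepMap_ID          = c("ACH-000842", "ACH-000001"),
  Hugo_Symbol        = c("ERBB2", "TP53"),
  Variant_annotation = c("damaging", "damaging"),
  stringsAsFactors   = FALSE
)

# Toy stand-in for CCLE_mutations_bool_damaging.csv.
# ASSUMPTION: rows are DepMap IDs, columns are gene symbols, values are 0/1.
b <- matrix(
  c(0, 1,
    0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ACH-000842", "ACH-000001"), c("ERBB2", "TP53"))
)

# Unique (cell line, gene) pairs annotated as damaging in the full file.
damaging <- unique(
  x[x$Variant_annotation == "damaging", c("DepMap_ID", "Hugo_Symbol")]
)

# Only pairs whose cell line and gene both exist in the matrix can be looked up.
in_matrix <- damaging$DepMap_ID %in% rownames(b) &
  damaging$Hugo_Symbol %in% colnames(b)

# Look up each pair by (row name, column name); pairs not in the matrix stay NA.
flag <- rep(NA, nrow(damaging))
flag[in_matrix] <- b[cbind(damaging$DepMap_ID[in_matrix],
                           damaging$Hugo_Symbol[in_matrix])]
damaging$in_bool_file <- flag == 1

# Pairs that are damaging in the full file but not flagged in the boolean file.
missing <- damaging[is.na(damaging$in_bool_file) | !damaging$in_bool_file, ]
print(missing)
```

On the toy data this reports the ACH-000842 / ERBB2 pair as the only mismatch, which mirrors the record shown above.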
There appear to be many such cases. We are likely misinterpreting what these files contain based on their names. Any help would be greatly appreciated.