Annotation version for each release

Hi everyone,

I have been looking through the GitHub pages and different parts of the website, but I couldn’t find a page listing the annotation used for each release. Is there a page that shows which GENCODE version each pipeline uses?

Hi,

At the moment, different Ensembl/Gencode versions are being used in different components of our dataset. We are currently in the process of unifying them across our various released files, but as of 24Q4, here are the annotation sources corresponding to each dataset:

For OmicsCNGene, we used Ensembl BioMart version nov2020 to map genomic coordinates to gene symbols and Entrez IDs.

For gene expression, we used Gencode v38 to create references for RSEM. To generate the gene-level matrices (OmicsExpressionProteinCodingGenesTPMLogp1 and OmicsExpressionAllGenesTPMLogp1Profile), we mapped the ENSG IDs to gene symbols and Entrez IDs using BioMart version nov2020.

As for mutation, we used VEP version 110 and Gencode v44 for variant annotation (please see https://storage.googleapis.com/shared-portal-files/Tools/24Q4_Mutation_Pipeline_Documentation.pdf for detailed documentation on the mutation pipeline). Entrez IDs were then mapped from ENSG IDs using BioMart version nov2020.

Everything on the CRISPR side (CRISPRGeneEffect, CRISPRInferredGuideEfficacy, AvanaLogfoldChange, KYLogfoldChange, HumagneLogfoldChange) uses CCDS release 24 (see "CCDS Release 24" on NCBI Insights).
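
For quick reference, the 24Q4 sources listed above can be collected into a small lookup table. This is only a sketch: the dictionary layout, the key names ("reference", "variant_annotation", "id_mapping"), the generic "mutation" label, and the helper annotation_for are illustrative assumptions and not part of any DepMap tooling.

```python
# Annotation sources per 24Q4 dataset, transcribed from the list above.
# The structure and key names are illustrative, not an official DepMap schema.
BIOMART = "Ensembl BioMart nov2020"
CCDS = "CCDS release 24"

ANNOTATION_SOURCES_24Q4 = {
    "OmicsCNGene": {"id_mapping": BIOMART},
    "OmicsExpressionProteinCodingGenesTPMLogp1": {
        "reference": "Gencode v38",  # used to build the RSEM references
        "id_mapping": BIOMART,       # ENSG -> gene symbol / Entrez ID
    },
    "OmicsExpressionAllGenesTPMLogp1Profile": {
        "reference": "Gencode v38",
        "id_mapping": BIOMART,
    },
    # "mutation" is a generic label here, not a released file name.
    "mutation": {
        "variant_annotation": "VEP 110 with Gencode v44",
        "id_mapping": BIOMART,
    },
    "CRISPRGeneEffect": {"reference": CCDS},
    "CRISPRInferredGuideEfficacy": {"reference": CCDS},
    "AvanaLogfoldChange": {"reference": CCDS},
    "KYLogfoldChange": {"reference": CCDS},
    "HumagneLogfoldChange": {"reference": CCDS},
}


def annotation_for(dataset: str) -> dict:
    """Return the annotation sources recorded for a dataset (hypothetical helper)."""
    try:
        return ANNOTATION_SOURCES_24Q4[dataset]
    except KeyError:
        raise KeyError(f"No annotation source recorded for {dataset!r}") from None
```

For example, annotation_for("CRISPRGeneEffect") returns the CCDS entry, and an unknown dataset name raises a KeyError rather than silently returning nothing.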

Hope this helps,
Simone

Hi Simone,

Thank you very much! May I also ask which annotation version the fusion detection pipeline uses? I am trying to match my analysis with a collaborator who is building a model using DepMap data, so that would be very helpful.

A unified version would be nice! Thanks for working on it!

Many thanks,
Kent

Hi Kent,

The current fusion pipeline uses Gencode v33. You can see our index files for details: gs://ccleparams/references/GRCh38_gencode_v33_CTAT_lib_Apr062020.plug-n-play/ctat_genome_lib_build_dir/*

Best,
Simone

Thank you Simone!