Discrepancies in mutational status between the Cell Line Selector and the DepMap mutations tab for cancer cell lines

Dear Community,

I hope this message finds everyone healthy and safe! I was not sure whether this is the right place for my question, but given the discrepancies in my initial stratification results, I decided to post it here:

Briefly, a few weeks ago, for a current project, I used the Cell Line Selector to download the available colorectal cancer cell lines together with the mutational status of the genes I added to the search. The resulting file is attached, along with a plot showing how I added the mutation information.



CellLineSelector.CRC.DepMap.AddMuts.csv (7.9 KB)

My rationale is to stratify the cell lines by the mutational status of specific genes into KRAS-mutated, BRAF-mutated and RAS_RAF_WT (without any additional NRAS/HRAS/RAF1 mutations). Our ultimate goal is to identify specific dependencies in each subgroup and compare them with our “in-house” patient data.

With this in mind, when I then checked the DepMap portal directly and queried specific cell lines, I found several important discrepancies:

For example, for cell line CL11 (CL11 DepMap Cell Line Summary in the portal), the mutations tab lists only “other non-conserving” mutations for KRAS, whereas the Cell Line Selector reports a hotspot value for KRAS.

Unfortunately, the same holds for other cell lines, with noticeable differences.

Thus, my critical questions are the following:

  1. Are differences between the Cell Line Selector and the DepMap portal itself expected? Or is my approach incorrect, and should I have downloaded the cancer cell lines of interest and their mutational data differently?

  2. If my understanding is correct, should I use only the DepMap portal and query each colorectal cell line separately to check its mutational status and stratify accordingly? In that case, is the only information to keep from the Cell Line Selector the column lineage2, which indicates whether the cell line is colorectal?

  3. Finally, a crucial question about interpreting the mutations: since our project focuses on protein-coding mutations, which types of mutations should I keep as the most “important”? Only hotspot and damaging?
    In addition, the label “Other” does not indicate the presence of a mutation in that gene, correct? And does “other non-conserving” mean something different?

Thank you in advance :slight_smile:

Efstathios

Hi Efstathios,

At this point I believe the cell line selector only indicates a simple binary representation of mutation status (which is a 1 if there is any non-silent mutation), and doesn’t give information on whether the mutation is a hotspot, damaging, etc. Can you elaborate on how you’re seeing a hotspot value for KRAS in the cell line selector? When looking at the cell line summary, it appears that cell line has two KRAS mutations, one of which (Q61H) is a hotspot. We have plans to improve the functionality of selecting by more specific mutations in cell line selector.

For now, if you want to filter cell lines by more specific mutation criteria, I’d suggest downloading the mutation data and doing this manually.

In terms of which mutations are important, I’d say that’s a question researchers should decide on themselves based on the specific application. Hotspot mutations are often more likely to be functionally relevant, and similarly damaging mutations should generally correspond to a LoF effect, but if you want to parse out specific functional effects of variants it might be best to define that in your analysis.

Dear @jkmak,

Thank you very much for your direct and detailed answer!
To elaborate on your question about the Cell Line Selector and the mutation information, please see the csv file I attached above, CellLineSelector.CRC.DepMap.AddMuts.csv (7.9 KB).

Briefly, in the Cell Line Selector, in the upper right section named “Add a data column”, I selected Cell line Metadata → Mutation → Gene and added the mutation information for my 4 genes of interest.

However, in the portal, if you scroll down on the page of the cell line mentioned, you can see (attached image) that it indeed has 2 mutations in KRAS, but the Variant Annotation column shows other non-conserving rather than hotspot:

Or am I misunderstanding something here, and do you retrieve this information differently?

Regarding the second part, on the importance of mutations: as mentioned, we want to stratify the available colorectal cancer cell lines into 3 distinct categories based on the presence of mutations in protein-coding genes. I would therefore like to keep mutations with a confirmed role, such as hotspot and/or damaging, since it is vital to define accurately which of these cell lines are actually “WT”.

In your opinion, since “Other non-conserving” mutations can also be missense and therefore change an amino acid, could I include them as well?
And for the most robust stratification, can I download the CCLE_mutations maf file directly from the DepMap Public 22Q2 Primary Files, keep the available colorectal cancer cell lines based on the sample info, and use the necessary mutational information for further stratification?

Thank you for your consideration.

Efstathios

Thanks Efstathios,

I see the confusion now. There are actually two places to specify mutations in the cell line selector tool (one via selecting “Gene” at first in the dropdown).

When you look at these mutations on the cell line page, our definition of which mutations are ‘hotspot’ does not appear explicitly in the ‘variant annotation’ column, but is based on a threshold on either the TCGA or COSMIC frequency (so the p.Q61H mutation, with 56 instances in TCGA, would qualify as a hotspot). We’ll work on making this information more explicit in the cell line selector and cell line page tools.
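As a rough illustration of that thresholding, here is a minimal Python sketch. The cutoff of 10 is purely hypothetical (the actual threshold is not stated in this thread), and the column names `TCGAhsCnt` and `COSMIChsCnt` are assumptions about the downloaded mutation table. Only the TCGA count of 56 for p.Q61H comes from the discussion above; the second row is a made-up placeholder.

```python
import pandas as pd

# Hypothetical cutoff: the real threshold used by the portal is not given here.
HOTSPOT_MIN_COUNT = 10


def flag_hotspots(muts: pd.DataFrame, min_count: int = HOTSPOT_MIN_COUNT) -> pd.Series:
    """Return True where the TCGA or COSMIC recurrence count reaches the cutoff.

    Assumes count columns named TCGAhsCnt and COSMIChsCnt; missing counts are
    treated as zero.
    """
    tcga = muts["TCGAhsCnt"].fillna(0)
    cosmic = muts["COSMIChsCnt"].fillna(0)
    return (tcga >= min_count) | (cosmic >= min_count)


# Toy table: row 0 mirrors the p.Q61H example, row 1 is a placeholder variant.
muts = pd.DataFrame({
    "Hugo_Symbol": ["KRAS", "KRAS"],
    "Protein_Change": ["p.Q61H", "placeholder_variant"],
    "TCGAhsCnt": [56, None],
    "COSMIChsCnt": [None, 1],
})
muts["is_hotspot"] = flag_hotspots(muts)
```

If the downloaded table already carries ready-made hotspot flags, those would be preferable to recomputing them with a guessed cutoff.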

For now, in your case I would indeed recommend downloading the CCLE_mutations maf file and using that information directly to define the genotypes of interest and find cell lines you want.
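A minimal sketch of what that manual stratification could look like in Python with pandas. Everything specific here is an assumption to check against the files you actually download: the column names (`DepMap_ID`, `Hugo_Symbol`, `lineage`, and the boolean flags `isTCGAhotspot`, `isCOSMIChotspot`, `isDeleterious`), the lineage value `colorectal`, and the choice to count only hotspot or damaging mutations. The toy IDs are placeholders, not real cell lines.

```python
import pandas as pd

RAS_RAF_GENES = ["KRAS", "NRAS", "HRAS", "BRAF", "RAF1"]
# Assumed boolean flag columns; adjust to the file you downloaded.
FLAG_COLS = ["isTCGAhotspot", "isCOSMIChotspot", "isDeleterious"]


def label(genes: set) -> str:
    """Map the set of mutated RAS/RAF genes of one cell line to a group name."""
    if "KRAS" in genes and "BRAF" in genes:
        return "KRAS_and_BRAF_mut"
    if "KRAS" in genes:
        return "KRAS_mut"
    if "BRAF" in genes:
        return "BRAF_mut"
    if genes:
        return "other_RAS_RAF_mut"  # NRAS/HRAS/RAF1 only
    return "RAS_RAF_WT"


def stratify(muts, sample_info, lineage="colorectal", lineage_col="lineage"):
    """Assign each cell line of the given lineage to a RAS/RAF group."""
    ids = sample_info.loc[sample_info[lineage_col] == lineage, "DepMap_ID"]
    # A mutation counts only if at least one flag is True (missing counts as False).
    flagged = pd.concat([muts[c].eq(True) for c in FLAG_COLS], axis=1).any(axis=1)
    keep = muts["DepMap_ID"].isin(ids) & muts["Hugo_Symbol"].isin(RAS_RAF_GENES) & flagged
    mutated = {}
    for cid, gene in zip(muts.loc[keep, "DepMap_ID"], muts.loc[keep, "Hugo_Symbol"]):
        mutated.setdefault(cid, set()).add(gene)
    return pd.Series({cid: label(mutated.get(cid, set())) for cid in ids}, name="group")


# Toy inputs with placeholder IDs standing in for the downloaded tables.
sample_info = pd.DataFrame({
    "DepMap_ID": ["TOY-1", "TOY-2", "TOY-3", "TOY-4"],
    "lineage": ["colorectal", "colorectal", "colorectal", "lung"],
})
muts = pd.DataFrame({
    "DepMap_ID": ["TOY-1", "TOY-2", "TOY-3", "TOY-4"],
    "Hugo_Symbol": ["KRAS", "BRAF", "NRAS", "KRAS"],
    "isTCGAhotspot": [True, False, False, True],
    "isCOSMIChotspot": [False, True, False, False],
    "isDeleterious": [False, False, None, False],
})
groups = stratify(muts, sample_info)
```

In this sketch a line is called RAS_RAF_WT only because no flagged mutation was found, so it is worth confirming that each such line actually has mutation data before treating it as wild type. Whether other non-conserving mutations should also count is a decision for the analysis itself.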

Dear @pmontgom,

Thank you for the confirmation, and apologies for the delayed response. So I should indeed use the CCLE_mutations maf file and select the cell lines of interest based on the sample information and mutational status.

One important additional question in this direction:

Is there any MSI information that accompanies the DepMap cell lines? In the corresponding CCLE publication, I found the following:

https://www.nature.com/articles/s41586-019-1186-3#MOESM10

and in the supplementary table 7: CCLE.MSI.call : MSI call in CCLE dataset

Since the supplementary table also includes DepMap IDs, do you think I could then connect this information with the respective features from DepMap/CCLE?

Best,

Efstathios

Yes, those DepMap IDs should allow you to connect those MSI calls with the other features from DepMap/CCLE.
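For example, a join on the ID column could be sketched as below with pandas. The ID column name `DepMap_ID`, the toy IDs, and the `MSI`/`MSS` values are placeholders, so please check them against the supplementary table; only the column name `CCLE.MSI.call` is taken from your description.

```python
import pandas as pd


def attach_msi(features, msi, id_col="DepMap_ID", msi_col="CCLE.MSI.call"):
    """Left-join MSI calls onto a per-cell-line table, keeping every row of it.

    validate="many_to_one" raises an error if an ID appears more than once in
    the MSI table, which would otherwise silently duplicate rows.
    """
    return features.merge(msi[[id_col, msi_col]], on=id_col, how="left",
                          validate="many_to_one")


# Toy tables with placeholder IDs and placeholder MSI labels.
features = pd.DataFrame({
    "DepMap_ID": ["TOY-1", "TOY-2", "TOY-3"],
    "group": ["KRAS_mut", "BRAF_mut", "RAS_RAF_WT"],
})
msi = pd.DataFrame({
    "DepMap_ID": ["TOY-1", "TOY-2"],
    "CCLE.MSI.call": ["MSI", "MSS"],
})
merged = attach_msi(features, msi)
```

Cell lines without an entry in the MSI table end up with a missing value rather than being dropped, so they can be reviewed separately.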

Thanks,
Phil

Dear @pmontgom,

Thank you very much for the confirmation!