For some targets, such as MERTK and CLDN18, the front end shows ‘Haematopoietic And Lymphoid’ as an enriched lineage for CRISPR and RNAi data.
The lineage enrichment is calculated from the lineage columns ‘OncotreeLineage’, ‘OncotreePrimaryDisease’ and ‘OncotreeSubtype’, yet this term is absent from all three columns in the cell line data file ‘Model.csv’ and appears instead in the ‘SampleCollectionSite’ column.
My question is: which columns are considered, and how is ‘Haematopoietic And Lymphoid’ handled in the lineage enrichment calculation?
Yes, that is a little inconsistent, and it is something that will be improved in the 25Q3 update of the portal (currently scheduled for sometime around September).
Basically, long ago we repeatedly saw a large difference in effects between solid and suspension cell lines, so we defined two contexts specifically for these. I think they were called “Haematopoietic And Lymphoid” (lines annotated as such) and “Solid” (everything not labeled haematopoietic or lymphoid).
This predates our use of oncotree classifications, and when we introduced oncotree, we still used the OncotreeLineage field to define these two contexts so that they could be generated for the portal to use.
As a result, these get context enrichments computed for them just like the other values of OncotreePrimaryDisease and OncotreeSubtype.
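The two legacy contexts amount to a simple rule over the OncotreeLineage column, which can be sketched as below. This is a minimal illustration under stated assumptions, not the portal's actual code: the set of lineage values treated as haematopoietic or lymphoid (`HEME_LINEAGES`) is hypothetical, the helper names are made up, and the table is assumed to have `ModelID` and `OncotreeLineage` header columns. It uses only the Python standard library.

```python
import csv
import io

# ASSUMPTION: hypothetical set of OncotreeLineage values counted as
# haematopoietic/lymphoid; the portal's actual list is not given here.
HEME_LINEAGES = {"Myeloid", "Lymphoid"}


def legacy_context(oncotree_lineage: str) -> str:
    """Assign one model to a legacy context from its OncotreeLineage value."""
    if oncotree_lineage.strip() in HEME_LINEAGES:
        return "Haematopoietic And Lymphoid"
    # Everything not labeled haematopoietic or lymphoid (including blanks) is "Solid".
    return "Solid"


def assign_contexts(model_csv_text: str) -> dict[str, str]:
    """Map ModelID -> legacy context for every row of a Model.csv-like table.

    ASSUMPTION: the table has 'ModelID' and 'OncotreeLineage' header columns.
    """
    reader = csv.DictReader(io.StringIO(model_csv_text))
    return {
        row["ModelID"]: legacy_context(row.get("OncotreeLineage") or "")
        for row in reader
    }


if __name__ == "__main__":
    # Toy, made-up rows standing in for Model.csv.
    toy = "ModelID,OncotreeLineage\nMODEL_A,Lymphoid\nMODEL_B,Lung\nMODEL_C,Myeloid\n"
    for model_id, context in assign_contexts(toy).items():
        print(model_id, context)
```

To try it on a downloaded file, pass `open("Model.csv", newline="").read()` to `assign_contexts`; whether the resulting split matches the portal depends entirely on the assumed `HEME_LINEAGES` set.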
In the 25Q3 update, we’re planning to move to a more consistent system in which it’s completely transparent how we defined the contexts. They will all be defined in two downloadable files: “SubtypeTree.csv” (largely based on oncotree, but with a few additions for models that do not fit within oncotree’s classification system, such as non-cancerous models) and “SubtypeMatrix.csv” (a mapping of which model is a member of which context).
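Since “SubtypeMatrix.csv” is described only as a mapping of which model is a member of which context, its exact layout is not specified here. The sketch below assumes, hypothetically, a model-by-context table whose first column holds the model ID and whose remaining columns hold 1 for membership and 0 otherwise; the function names and the toy context codes are illustrative only. It shows how such a matrix could be turned into per-model and per-context lookups.

```python
import csv
import io

# ASSUMPTION: hypothetical layout, one row per model, one 0/1 column per context.
TRUE_TOKENS = {"1", "1.0", "True", "true"}


def load_membership(matrix_csv_text: str) -> dict[str, set[str]]:
    """Parse a model-by-context 0/1 matrix into ModelID -> set of member contexts."""
    reader = csv.reader(io.StringIO(matrix_csv_text))
    header = next(reader)
    contexts = header[1:]  # first column assumed to hold the model ID
    membership: dict[str, set[str]] = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        model_id, flags = row[0], row[1:]
        membership[model_id] = {
            ctx for ctx, flag in zip(contexts, flags) if flag.strip() in TRUE_TOKENS
        }
    return membership


def members_of(membership: dict[str, set[str]], context: str) -> list[str]:
    """Return the sorted model IDs that belong to a given context."""
    return sorted(m for m, ctxs in membership.items() if context in ctxs)


if __name__ == "__main__":
    # Toy, made-up matrix; the context codes are placeholders.
    toy = "ModelID,CTX_1,CTX_2\nMODEL_A,1,0\nMODEL_B,1,1\n"
    m = load_membership(toy)
    print(sorted(m["MODEL_B"]))
    print(members_of(m, "CTX_1"))
```

Keeping membership as a set per model allows one model to sit in several nested contexts at once (for example a lineage and one of its subtypes), which a single label column cannot express.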
This will hopefully address any confusion about exactly which classifications were used for presentation on the portal.
Thanks,
Phil