I was looking at the raw data from the RNAi experiments and calculated the Pearson correlation of each gene with my gene of interest, but the top 100 on my list is very different from the top 100 co-dependencies shown on the DepMap page. (I ordered my list by absolute value, as was done for the list on the webpage.)
I checked whether the difference came from filtering out non-coding RNAs, genes with few overlapping data points (cell lines in which both genes are tested), or genes with a high standard error, but none of these seems to be a solid criterion that would explain the filtering. Does anyone know how the genes shown on the webpage are filtered?
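For anyone who wants to reproduce the calculation described above, here is a minimal sketch in pandas. It assumes the RNAi gene effect table has been loaded with cell lines as rows and genes as columns (transpose it if your download is the other way round), and that column labels look like "MYC (4609)". The function name `top_codependencies` and the `min_overlap` threshold are my own choices for illustration, not what the portal uses.

```python
import pandas as pd


def top_codependencies(gene_effect: pd.DataFrame, target: str,
                       n: int = 100, min_overlap: int = 3) -> pd.Series:
    """Rank genes by absolute Pearson correlation with `target`.

    gene_effect: cell lines (rows) x genes (columns); NaN where a gene
    was not tested in a cell line (assumed layout).
    min_overlap: hypothetical cutoff on the number of cell lines in
    which both genes have data; pairs below it are dropped.
    """
    target_col = gene_effect[target]
    # Series.corr drops NaN pairwise and returns NaN when fewer than
    # min_periods overlapping observations remain.
    corr = gene_effect.apply(
        lambda col: col.corr(target_col, method="pearson",
                             min_periods=min_overlap)
    )
    corr = corr.drop(index=target).dropna()
    # Order by absolute value, strongest first, but keep the sign.
    return corr.sort_values(key=lambda s: s.abs(), ascending=False).head(n)
```

Calling `top_codependencies(gene_effect, "MYC (4609)")` on the downloaded table gives the unfiltered ranking, which is the list that disagrees with the portal.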
I checked again. There are actually only 84 entries in the top 100 co-dependencies for my gene of interest, which is MYC. And there are just 16 genes with a single gene ID that have a higher correlation with MYC than the last entry in the top 100 list provided on the website. So the filtering seems to be whether the entry has a unique ID, in case anyone needs to know.
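A rough sketch of that unique-ID filter, in case it is useful. It assumes entries that map to more than one gene are labelled with "&" joining the symbols and IDs (something like "GENE1&GENE2 (111&222)", a made-up example), while single-gene entries look like "MYC (4609)". Check the labels in your own download, since the exact format is an assumption here, and `keep_single_gene_entries` is just a hypothetical helper name.

```python
import re

import pandas as pd

# Assumed label format for an unambiguous entry: "SYMBOL (ENTREZ_ID)".
SINGLE_GENE_LABEL = re.compile(r"[^&()]+ \(\d+\)")


def keep_single_gene_entries(corr: pd.Series) -> pd.Series:
    """Drop entries whose label refers to more than one gene ID."""
    mask = [bool(SINGLE_GENE_LABEL.fullmatch(str(label)))
            for label in corr.index]
    return corr[mask]
```

Applying this to the correlation series before taking the top entries should remove the two-ID genes from the ranking.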
I’m not sure I follow what you’re describing. I tried to reproduce this, and downloading the CRISPR top 100 co-dependencies of MYC, I got a list of 100 genes, but for the RNAi I only got 91.
Are you looking at the RNAi dataset in particular?
I’ll need to investigate to confirm, but at least at some point, we were reporting gene effects for genes that we couldn’t disambiguate (because the genes shared the sequence that was used for the guides). It’s been a while since I looked at the RNAi dataset, but if that’s still the case, I believe those are filtered out when we load the data into the portal. (They exist in the raw data you can download, but it was unclear how to handle them in various places in the portal, so we don’t load data for those.)
Yes, I am looking at the RNAi dataset only. The list I have is only 84 genes long, and the last entry on the list is FCN2. If I just do the calculation with the raw data, there are 111 genes (or 112, depending on whether you include MYC) above FCN2. Excluding MYC and the 11 genes with two IDs (assuming they come from non-specific guides), there are 16 genes not on the list, which is exactly the number of genes the original list is short of 100.
I guess you have seen something similar?