Hi,
On the overview page for PTBP1, there is a module of enriched lineages that lists a p-value for three lineages, such as glioma. May I know how this value is calculated?
Thanks!
Hi, which tab are you seeing this in? Could you paste a snapshot of what you’re looking at? Thanks.
The p-value is shown in parentheses. How is this value calculated?
Thanks!
Hello,
The reported p-value is an uncorrected p-value from performing a t-test.
The two groups being compared are the gene effect of the lines annotated with that disease (e.g., Kidney) vs. the gene effect of all other lines. The disease annotation for each line can be found in the sample info file.
Thanks,
Phil
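As a rough illustration of the comparison described above, here is a minimal sketch on synthetic data. The table layout, the column names (`model_id`, `disease`), and the simulated values are made up for the example and are not the real DepMap files. It also assumes `ttest_ind` is called with its default settings.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Hypothetical annotation table: one row per cell line
# (a stand-in for the sample info file, not its real layout).
annotations = pd.DataFrame({
    "model_id": [f"LINE-{i:03d}" for i in range(60)],
    "disease": ["Kidney"] * 10 + ["Other"] * 50,
})

# Hypothetical gene effect scores for a single gene, indexed by cell line.
# Kidney lines are simulated as more dependent (more negative scores).
gene_effect = pd.Series(
    np.concatenate([rng.normal(-1.0, 0.3, 10), rng.normal(-0.2, 0.3, 50)]),
    index=annotations["model_id"],
)

# Boolean mask: True for lines annotated with the disease of interest.
in_group = annotations.set_index("model_id")["disease"] == "Kidney"

# Two-sample t-test: in-group scores vs. scores of all other lines.
t_statistic, p_value = ttest_ind(gene_effect[in_group], gene_effect[~in_group])
print("t statistic:", t_statistic, "uncorrected p value:", p_value)
```

A negative t statistic here means the annotated lines have a lower (more dependent) mean gene effect than the rest.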
Hello,
Can you confirm that these lineage enrichments are still calculated via t-test of gene effect scores from “within lineage” cell lines vs. “other lineage” cell lines? I tried to reproduce this using the CRISPR_gene_effect.csv file, but am seeing different values than what is shown in the DepMap interface.
Specifically, for a given gene and lineage combination (e.g. SOX10 + skin), I would take the column from CRISPR_gene_effect.csv that corresponds to SOX10 Chronos values. I map the DepMap_ID’s to their lineages using the sample_info.csv file. I then break the SOX10 Chronos values into skin vs. all other lineages … and run a t-test comparing these groups. I get a p-value of about 5e-21 for SOX10 in skin, whereas the value shown in DepMap is 2.4e-154 (SOX10 DepMap Gene Summary). Can you advise if I am approaching this p-value calculation incorrectly?
Thanks!
Hello,
Another person also reported difficulty reproducing the values on the portal. I’m hoping to remove any ambiguity by providing code for what the portal is computing. See https://forum.depmap.org/t/re-p-value-of-enriched-lineages/2302/4 for the code that should be equivalent to what the portal is doing.
Thanks,
Phil
Thanks for the information, Phil. However, the link appears to be unreachable; it shows “Oops! That page doesn’t exist or is private.” Do you have any ideas?
Oh, I didn’t realize that thread was private. I’m pasting the final response below:
The portal’s code is verbose, but I’ve reimplemented the section that was computing this, and it seems to get the same answer the portal shows.
Here’s some code that computes the enrichment based on files that are part of the 22Q4 DepMap data release:
import pandas as pd
from scipy.stats import ttest_ind

# Load gene effect scores (rows: models, columns: genes) and model annotations.
ge = pd.read_csv("public-22q4/CRISPRGeneEffect.csv", index_col=0)
models = pd.read_csv("public-22q4/Model.csv")

# Build a boolean matrix: one column per OncotreeSubtype, one row per model.
columns = {
    column_name: (models["OncotreeSubtype"] == column_name)
    for column_name in models["OncotreeSubtype"].unique()
}
context_matrix = pd.DataFrame(columns)
context_matrix.index = models["ModelID"]

# Restrict to models present in both files.
shared_lines = list(set(ge.index).intersection(context_matrix.index))
context = context_matrix["Cutaneous Melanoma"][shared_lines]
print("in group size:", sum(context), "out group size:", sum(~context))

# Split models into the in-group (the subtype) and the out-group (everything else).
in_lines = context.index[context]
out_lines = context.index[~context]

# Two-sample t-test on the gene effect of one gene, ignoring missing values.
t_statistic, p_value = ttest_ind(
    ge.loc[in_lines, "BRAF (673)"], ge.loc[out_lines, "BRAF (673)"], nan_policy="omit"
)
print("t statistic", t_statistic, "p value:", p_value)
This results in the following output:
in group size: 8 out group size: 1070
t statistic -4.190964555422133 p value: 3.0055837095810874e-05
Let me know if you have any questions about this code, or you spot anything that is inconsistent with my earlier description.
Thanks!
Phil
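To run the same comparison for every subtype rather than a single one, the logic above could be wrapped in a helper. This is a sketch, not the portal's code: the function name `enrichment_by_context`, the `min_group_size` cutoff, and the small synthetic inputs are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind


def enrichment_by_context(scores, labels, min_group_size=3):
    """T-test each label's lines against all other lines for one gene.

    scores: Series of gene effect values indexed by model ID.
    labels: Series of context labels (e.g. subtype) indexed by model ID.
    min_group_size: hypothetical cutoff; smaller groups are skipped.
    """
    # Keep only models present in both inputs.
    shared = scores.index.intersection(labels.index)
    scores, labels = scores[shared], labels[shared]
    rows = []
    for label in labels.dropna().unique():
        mask = labels == label
        if mask.sum() < min_group_size or (~mask).sum() < min_group_size:
            continue
        t_stat, p_val = ttest_ind(
            scores[mask].to_numpy(), scores[~mask].to_numpy(), nan_policy="omit"
        )
        rows.append({"context": label, "n_in": int(mask.sum()),
                     "t_statistic": float(t_stat), "p_value": float(p_val)})
    return pd.DataFrame(rows).sort_values("p_value").reset_index(drop=True)


# Synthetic example: subtype "A" is simulated as more dependent than "B" and "C".
rng = np.random.default_rng(1)
ids = [f"M{i}" for i in range(90)]
labels = pd.Series(["A"] * 30 + ["B"] * 30 + ["C"] * 30, index=ids)
scores = pd.Series(
    np.concatenate([rng.normal(-1.0, 0.2, 30), rng.normal(0.0, 0.2, 60)]),
    index=ids,
)
result = enrichment_by_context(scores, labels)
print(result)
```

The p-values returned this way are uncorrected, as on the portal; with many subtypes tested at once, a multiple-testing correction may be worth considering before interpreting them.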
Hi!
I’m checking the enriched lineages for EGFR, and some of them show negative probabilities in parentheses. I don’t understand how a probability can be negative. Can someone please explain what this means?
I don’t understand this myself. I’m very suspicious and I wonder if a bug was introduced in this latest release that went out last week.
I’ll ask a developer to investigate.
Thanks,
Phil
To update this thread: the 23Q2 portal update included a change that switched the numbers in parentheses from p-values to effect sizes (the difference between the two group means).
We’ll be deploying a fix for this soon, after which the values reported in parentheses will be p-values again.
Thanks,
Phil
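This distinction accounts for the negative numbers reported earlier in the thread: a difference of means can be negative, while a p-value always lies between 0 and 1. A small sketch on synthetic values follows. The numbers are made up, and the sign convention (in-group mean minus out-group mean) is an assumption, since the thread does not state which way the portal subtracts.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
in_group = rng.normal(-0.8, 0.3, 20)    # simulated lines in the lineage
out_group = rng.normal(-0.1, 0.3, 200)  # simulated lines in all other lineages

# Effect size as a difference of means (assumed: in-group minus out-group).
effect_size = in_group.mean() - out_group.mean()

# The p-value comes from the t-test on the same two groups.
t_statistic, p_value = ttest_ind(in_group, out_group)

print("effect size:", effect_size)  # negative when the lineage is more dependent
print("p value:", p_value)          # always within [0, 1]
```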