P value of enriched lineages

ljjjj · April 9, 2021, 1:24pm

Hi,
On the overview page for PTBP1, there is an "Enriched lineages" module that lists a p-value for each of 3 lineages, such as glioma. May I know how this value is calculated?
Thanks!

jnoorbak · April 9, 2021, 3:48pm

Hi, which tab are you seeing this in? Could you paste a snapshot of what you’re looking at? -thanks

ljjjj · April 10, 2021, 3:16am

The p-value is shown in parentheses. How is this value calculated?
Thanks!

pmontgom · April 20, 2021, 6:35pm

Hello,

The reported p-value is an uncorrected p-value from performing a t-test.

The two groups being compared are the gene effect scores of the lines annotated with that disease (e.g., Kidney) and the gene effect scores of all other lines. The disease annotation for each line can be found in the sample info file.

Thanks,
Phil
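
For readers who want to see the shape of that comparison, here is a minimal sketch of a two-sample t-test on an "in lineage" group versus an "all other lines" group. The scores are synthetic numbers generated for illustration, not DepMap data, and the group sizes and means are assumptions chosen only to make the example run; it is not the portal's implementation.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Synthetic gene effect scores (NOT real DepMap data).
# in_group: lines annotated with the disease of interest.
# out_group: all other lines.
in_group = rng.normal(loc=-1.0, scale=0.3, size=20)
out_group = rng.normal(loc=-0.2, scale=0.3, size=500)

# Two-sample t-test; the p-value is reported as-is, with no
# multiple-testing correction applied.
t_statistic, p_value = ttest_ind(in_group, out_group)

print("t statistic:", t_statistic, "p value:", p_value)
```

A negative t statistic here means the in-group scores are lower on average than the out-group scores.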

plenehan · February 7, 2022, 2:53am

Hello,

Can you confirm that these lineage enrichments are still calculated via t-test of gene effect scores from “within lineage” cell lines vs. “other lineage” cell lines? I tried to reproduce this using the CRISPR_gene_effect.csv file, but am seeing different values than what is shown in the DepMap interface.

Specifically, for a given gene and lineage combination (e.g. SOX10 + skin), I take the column from CRISPR_gene_effect.csv that corresponds to the SOX10 Chronos values. I map the DepMap_IDs to their lineages using the sample_info.csv file. I then split the SOX10 Chronos values into skin vs. all other lineages … and run a t-test comparing these groups. I get a p-value of about 5e-21 for SOX10 in skin, whereas the value shown in DepMap is 2.4e-154 (SOX10 DepMap Gene Summary). Can you advise if I am approaching this p-value calculation incorrectly?

Thanks!
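
The procedure described in this post can be sketched roughly as follows. This is only an illustration on small synthetic tables: the `lineage` column name, the plain `SOX10` column label, and all of the values are assumptions made for the example (the real gene effect files label columns in a "SYMBOL (EntrezID)" style), so it will not reproduce either of the p-values quoted here.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
model_ids = [f"ACH-{i:06d}" for i in range(60)]

# Hypothetical stand-in for sample_info.csv (column names are assumptions).
sample_info = pd.DataFrame({
    "DepMap_ID": model_ids,
    "lineage": ["skin"] * 10 + ["lung"] * 25 + ["kidney"] * 25,
})

# Hypothetical stand-in for the gene effect matrix: rows are lines,
# one column per gene. Values are synthetic.
values = np.where(sample_info["lineage"] == "skin", -1.5, -0.1)
values = values + rng.normal(0.0, 0.2, size=len(model_ids))
gene_effect = pd.DataFrame({"SOX10": values}, index=model_ids)

# Map each line in the gene effect matrix to its lineage.
lineage_by_id = sample_info.set_index("DepMap_ID")["lineage"]
lineage = lineage_by_id.reindex(gene_effect.index)

# Split the scores into "skin" vs. "all other lineages" and compare.
is_skin = lineage == "skin"
scores = gene_effect["SOX10"]
t_statistic, p_value = ttest_ind(scores[is_skin], scores[~is_skin])

print("in group size:", int(is_skin.sum()), "out group size:", int((~is_skin).sum()))
print("t statistic:", t_statistic, "p value:", p_value)
```

When comparing against the portal, it is worth checking which annotation column defines the groups and which set of lines ends up in each group, since both change the result.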

pmontgom · February 24, 2023, 6:50pm

Hello,

Another person also reported difficulty reproducing the values on the portal. I’m hoping to remove any ambiguity by providing code for what the portal is computing. See https://forum.depmap.org/t/re-p-value-of-enriched-lineages/2302/4 for the code that should be equivalent to what the portal is doing.

Thanks,
Phil

Meng_Liu · March 2, 2023, 2:24pm

Thanks for the information, Phil. However, the link seems to be unreachable; it shows “Oops! That page doesn’t exist or is private.” Do you have any ideas?

pmontgom · March 2, 2023, 2:45pm

Oh, I didn’t realize it but that thread turned out to be a private thread. I’m pasting the final response below:

The portal’s code is verbose, but I’ve reimplemented the section that was computing this, and it seems to get the same answer the portal shows.

Here’s some code that computes the enrichment based on files which are part of the 22Q4 depmap data release:

import pandas as pd
from scipy.stats import ttest_ind

# Gene effect matrix: rows are models (indexed by model ID), columns are genes.
ge = pd.read_csv("public-22q4/CRISPRGeneEffect.csv", index_col=0)
models = pd.read_csv("public-22q4/Model.csv")

# One boolean column per OncotreeSubtype, marking the models annotated with it.
columns = {
    column_name: (models["OncotreeSubtype"] == column_name)
    for column_name in models["OncotreeSubtype"].unique()
}

context_matrix = pd.DataFrame(columns)
context_matrix.index = models["ModelID"]

# Keep only the models that also have gene effect data.
shared_lines = list(set(ge.index).intersection(context_matrix.index))
context = context_matrix["Cutaneous Melanoma"][shared_lines]

print("in group size:", sum(context), "out group size:", sum(~context))
in_lines = context.index[context]
out_lines = context.index[~context]

# t-test of in-group vs. out-group gene effect, ignoring missing values.
t_statistic, p_value = ttest_ind(
    ge.loc[in_lines, "BRAF (673)"], ge.loc[out_lines, "BRAF (673)"], nan_policy="omit"
)

print("t statistic", t_statistic, "p value:", p_value)

This results in the following output:

in group size: 8 out group size: 1070
t statistic -4.190964555422133 p value: 3.0055837095810874e-05

Let me know if you have any questions about this code, or you spot anything that is inconsistent with my earlier description.

Thanks!
Phil

Mariavi · May 15, 2023, 6:55pm

Hi!
I’m checking the enriched lineages for EGFR and some of them show negative probabilities between parentheses. I don’t understand how a probability can be negative. Can someone please explain the meaning of this?

pmontgom · May 15, 2023, 7:22pm

I don’t understand this myself. I’m very suspicious and I wonder if a bug was introduced in this latest release that went out last week.

I’ll ask a developer to investigate.

Thanks,
Phil

pmontgom · May 17, 2023, 6:18pm

To update this thread: the 23Q2 portal update included a change which caused the numbers in the parentheses to change from p-values to the effect size (the difference between the two means).

We’ll be deploying a fix for this soon and then the values reported in parentheses will be p-values again.

Thanks,
Phil
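
To illustrate why a value in parentheses could be negative, here is a small sketch contrasting the two quantities on synthetic scores. The data are made up, and the direction of the subtraction (in-group mean minus out-group mean) is an assumption; the thread only states that the effect size is the difference between the two means.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)

# Synthetic gene effect scores (NOT real DepMap data).
in_group = rng.normal(loc=-0.8, scale=0.3, size=15)
out_group = rng.normal(loc=-0.1, scale=0.3, size=300)

# Effect size as a difference of means: negative whenever the in-group
# mean is lower than the out-group mean (subtraction order is assumed).
effect_size = in_group.mean() - out_group.mean()

# A p-value, by contrast, always lies between 0 and 1.
_, p_value = ttest_ind(in_group, out_group)

print("effect size:", effect_size, "p value:", p_value)
```

So a negative number in that position is consistent with an effect size being displayed rather than a p-value.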
