Find "differentially" significant perturbagens that selectively sensitize different cancer cell lines

Dear DepMap Community,

Briefly, as part of a collaborative cancer project, we have performed computational modeling with 3 distinct breast cancer cell lines, along with some proteomics experiments, to evaluate the effect of specific perturbagens on selected signaling pathways. Based on our initial results, our ultimate goal is to investigate different perturbagens and drugs that act selectively on these three cell lines, and then to investigate downstream any available pathways and the respective gene targets.

On this premise, we initially selected MCF7, T47D, and MDA-MB-231, which were all present in the public data, and we downloaded the dataset from one such screen in this repository: on the left, under the “PRISM Repurposing 19Q4” drug screens, specifically the file secondary-screen-replicate-collapsed-logfold-change.csv

Initially, we filtered for a fold-change < 0.3 in MCF7 and T47D, a fold-change > 0.3 in MDA-MB-231, and a dose < 0.5 (on the assumption that higher doses render the compounds non-specific?). This yields a list of < 1000 hits (the list is redundant with respect to compounds, because some compounds match the filtering criteria at several doses). We also did the opposite (> 0.3 in MCF7 and T47D, < 0.3 in MDA-MB-231, dose < 0.5) to get a second list.
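As a rough illustration of the filter described above, here is a minimal Python sketch. It assumes the data have already been reshaped into one record per (compound, dose, cell line). The `Record` type, the `selective_hits` helper, the compound names, and all values are hypothetical and are not taken from the PRISM file, whose actual layout may differ.

```python
from typing import NamedTuple


class Record(NamedTuple):
    compound: str
    dose: float       # assumed to be in the same units as the screen metadata
    cell_line: str
    lfc: float        # replicate-collapsed log fold change vs. control


def selective_hits(records, sensitive, resistant, cutoff=0.3, max_dose=0.5):
    """Return (compound, dose) pairs where every line in `sensitive` is below
    `cutoff`, every line in `resistant` is above it, and dose < max_dose."""
    lines = list(sensitive) + list(resistant)
    by_condition = {}
    for r in records:
        by_condition.setdefault((r.compound, r.dose), {})[r.cell_line] = r.lfc

    hits = []
    for (compound, dose), lfc in sorted(by_condition.items()):
        if dose >= max_dose:
            continue  # drop high doses
        if any(line not in lfc for line in lines):
            continue  # skip conditions with missing measurements
        if (all(lfc[l] < cutoff for l in sensitive)
                and all(lfc[l] > cutoff for l in resistant)):
            hits.append((compound, dose))
    return hits


# Made-up values for illustration only (not real PRISM data).
LINES = ("MCF7", "T47D", "MDA-MB-231")
toy_table = {
    ("drugA", 0.02): (-0.8, -0.6, 0.5),
    ("drugA", 0.1): (-1.2, -0.9, 0.4),
    ("drugA", 2.5): (-2.0, -1.8, 0.6),   # excluded: dose too high
    ("drugB", 0.1): (0.5, -0.4, 0.6),    # excluded: MCF7 is not below the cutoff
}
toy = [Record(c, d, line, v)
       for (c, d), vals in toy_table.items()
       for line, v in zip(LINES, vals)]

hits = selective_hits(toy, ["MCF7", "T47D"], ["MDA-MB-231"])
unique_compounds = sorted({compound for compound, _ in hits})
print(hits)
print(unique_compounds)
```

Collapsing the hits to unique compound names, as in the last step, removes the redundancy that arises when one compound passes the filter at several doses. Swapping the `sensitive` and `resistant` arguments gives the opposite list.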

My crucial questions are the following:

  1. Regarding the actual fold-change cutoff: would you agree with the above approach, or should a different filtering criterion be applied to the respective fold changes to find drugs that act selectively on (sensitize) the different cell lines? Or is it rather the case that the more negative the value, the more sensitive the cell line? And, for example, might fold changes > 0 indicate that the treated cells grew more than the control cells, which could be considered an artifact, so that different numerical criteria should be applied?

  2. Is there a way to also identify putative gene targets for each respective cell line and/or selected drug? Could the genetic dependencies be utilized in parallel for this goal, for example the combined RNAi scores and/or the Achilles gene effect? Alternatively, could the genetic dependencies be used to identify the “most important” genes for each cell line, followed by a functional enrichment analysis to identify any pathways that are “preferentially” important for each cell line?

(*Thank you in advance for your time and patience, and please excuse any naïve questions, as this is my first time using the portal.)

Dear Efstathios-Iason,

  1. More than the exact cut-offs used, I would be concerned about the number of cell lines used to define the filter. These screens usually suffer from considerable noise, which necessitates using a much larger sample size. For example, one could look for compounds that preferentially kill a set of 10 cell lines more than another set of 10. You can run this type of two-class comparison using the Custom Analyses tool of the portal’s Data Explorer.

  2. One thing you could look at is the top preferentially essential genes for each cell line that are listed on the cell line page (e.g., MCF7’s). To compute pathway enrichment for the top preferential dependencies you’d want to look at more than the top 10 that are currently displayed on the portal though. You can compute the full list yourself by downloading the data and following the methodology described on the page (hover over the question mark symbol).
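One possible way to rank preferential dependencies from downloaded gene effect data is to z-score each gene across cell lines and take the most negative z-scores for the line of interest. The following is only a sketch under that assumption. The portal's own methodology (described behind the question mark symbol) may differ, and the gene names, line names, and scores below are made up.

```python
from statistics import mean, pstdev


def preferential_dependencies(gene_effect, cell_line, top_n=3):
    """gene_effect: {gene: {cell_line: score}}, where a more negative score
    is assumed to mean a stronger dependency.
    Returns the top_n (gene, z) pairs with the most negative z-score
    for `cell_line`, computed across all lines measured for that gene."""
    scored = []
    for gene, scores in gene_effect.items():
        if cell_line not in scores:
            continue
        values = list(scores.values())
        sd = pstdev(values)
        if sd == 0:
            continue  # no variation across lines, z-score undefined
        z = (scores[cell_line] - mean(values)) / sd
        scored.append((z, gene))
    scored.sort()  # most negative z first
    return [(gene, round(z, 2)) for z, gene in scored[:top_n]]


# Hypothetical gene effect scores for four hypothetical lines.
toy_gene_effect = {
    "GENE_A": {"LINE_1": -1.5, "LINE_2": -0.1, "LINE_3": 0.0, "LINE_4": -0.2},
    "GENE_B": {"LINE_1": -1.0, "LINE_2": -1.1, "LINE_3": -0.9, "LINE_4": -1.0},
    "GENE_C": {"LINE_1": 0.1, "LINE_2": -0.3, "LINE_3": -0.2, "LINE_4": 0.0},
}

ranking = preferential_dependencies(toy_gene_effect, "LINE_1")
print(ranking)
```

In this toy example GENE_B is strongly negative in every line, so it is not preferential for LINE_1, whereas GENE_A is. The full ranked list (rather than a top 10) is what one would pass on to a pathway enrichment tool.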

You can also use the Custom Analyses tool to look for genetic dependencies correlated with drug sensitivity profiles. E.g.:

Hope this is helpful!



Dear Aviad,

Thank you very much for your comprehensive response and suggestions! To make sure I fully understand your answer:

  1. Regarding the first approach:

A) If I have understood correctly, what I could do, based on your suggestion, is create two groups in Data Explorer: one with the 2 selected “similar” breast cancer cell lines [group A: MCF7 & T47D] and a second with all the remaining breast cancer cell lines, aiming to increase the sample size as you suggested, but also the specificity of the comparison?

B) Regarding the dataset choice in the group comparison: if in step 1 I select Drug sensitivity AUC (PRISM Repurposing Secondary Screen) 19Q4, does this correspond to the “secondary-screen-replicate-collapsed-logfold-change.csv” file that I mentioned in my description? Or is it an alternative measure that can also identify drugs that “preferentially kill” specific cell lines?

C) If my understanding is correct, and based on the screenshot I have attached from a random two-group comparison: should I filter and keep the top N results based on both the effect size and the Q-value, in particular keeping the most negative effect sizes along with the lowest Q-values?

D) Overall, when comparing the two groups, that is, the two selected cell lines of interest (group A) versus all the other breast cancer cell lines (group B), is the correct interpretation that the more negative the value in the effect size column, the “more selective” the respective drug is for group A compared with the rest of the cell lines?

  2. In parallel, based on the results of Step 1: assuming I have selected the top 10 drugs that differentiate between my selected groups, and following your suggestion to use the custom analysis to look for genetic dependencies correlated with drug sensitivity profiles: can I search for each drug separately, provided that both the drug sensitivity AUC (secondary screen) and CRISPR Avana Public 21Q1 are available for the respective cell lines?

And afterwards, based on those results, should I again look for the highest correlations, either positive or negative (even those with a moderate effect), together with the respective q-value, and similarly select the top N genes?
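To make the idea of correlating a drug sensitivity profile with genetic dependencies concrete, here is a minimal Python sketch. It is not the Custom Analyses implementation. The function names, gene and line names, and all numbers are hypothetical, and it assumes that lower AUC means higher sensitivity and that a more negative gene effect means a stronger dependency.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns None if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def rank_genes_by_correlation(drug_auc, gene_effect, top_n=2, min_lines=3):
    """drug_auc: {cell_line: AUC}; gene_effect: {gene: {cell_line: score}}.
    Uses only cell lines present in both datasets and ranks genes by |r|."""
    results = []
    for gene, scores in gene_effect.items():
        shared = sorted(set(drug_auc) & set(scores))
        if len(shared) < min_lines:
            continue
        r = pearson([drug_auc[l] for l in shared], [scores[l] for l in shared])
        if r is not None:
            results.append((gene, r))
    results.sort(key=lambda item: abs(item[1]), reverse=True)
    return results[:top_n]


# Hypothetical AUC profile for one compound across five hypothetical lines.
toy_auc = {"L1": 0.2, "L2": 0.4, "L3": 0.6, "L4": 0.8, "L5": 1.0}
toy_dependencies = {
    "GENE_X": {"L1": -1.0, "L2": -0.8, "L3": -0.6, "L4": -0.4, "L5": -0.2},
    "GENE_Y": {"L1": 0.0, "L2": -0.5, "L3": 0.1, "L4": -0.4, "L5": 0.0},
    "GENE_Z": {"L1": -0.1, "L2": -0.3, "L3": -0.4, "L4": -0.7, "L5": -0.9},
}

top_genes = rank_genes_by_correlation(toy_auc, toy_dependencies)
print(top_genes)
```

Under the stated sign conventions, a positive correlation (GENE_X here) means the lines most sensitive to the compound are also the most dependent on the gene, which is the "phenocopy" pattern. A negative correlation (GENE_Z) points the other way. With thousands of genes, a multiple-testing correction would also be needed, which this sketch does not show.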



A. That would be a step in the right direction, but I’d still worry about results coming out of an analysis that uses only 2 cell lines as one group.
B. It’s the same dataset.
C. It’s up to you to choose the analysis methodology. I can’t comment on that.
D. More negative effect size for a compound indicates a lower mean value (higher sensitivity of the cell lines) in group A compared to group B.
2. If you’re interested in the genetic dependencies that phenocopy a given compound sensitivity profile, you can search for them one compound at a time.
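The sign convention in point D can be illustrated with a small two-group sketch in Python. It computes the effect size as mean(group A) minus mean(group B), an exact permutation p-value, and Benjamini-Hochberg q-values. This is an assumption-laden illustration with made-up AUC values, and the portal's own statistics may be computed differently.

```python
from itertools import combinations
from statistics import mean


def effect_size(a, b):
    """Difference in means; negative means group A has the lower mean."""
    return mean(a) - mean(b)


def permutation_p(a, b):
    """Exact two-sided permutation p-value for the difference in means."""
    pooled = list(a) + list(b)
    observed = abs(effect_size(a, b))
    n_total = n_extreme = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        ga = [pooled[i] for i in idx]
        gb = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        n_total += 1
        if abs(effect_size(ga, gb)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total


def benjamini_hochberg(pvals):
    """Return BH-adjusted q-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


# Hypothetical AUC values: (group A lines, group B lines) per compound.
toy_auc = {
    "drugA": ([0.20, 0.30, 0.25, 0.35], [0.80, 0.90, 0.85, 0.95]),
    "drugB": ([0.50, 0.70, 0.60, 0.80], [0.60, 0.50, 0.80, 0.70]),
}

names = list(toy_auc)
effects = [effect_size(*toy_auc[n]) for n in names]
pvals = [permutation_p(*toy_auc[n]) for n in names]
qvals = benjamini_hochberg(pvals)
for n, e, p, q in zip(names, effects, pvals, qvals):
    print(n, round(e, 3), round(p, 4), round(q, 4))
```

With only four lines per group, the smallest achievable permutation p-value is 2/70, which shows why very small groups (such as 2 lines) cannot reach convincing significance no matter how large the effect size is.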

Again, while I’m happy to help with guidance on how to use the site, I can’t comment on the analysis methodology you choose to use.


Dear Aviad,

Thank you very much for your response and directions! Without any intention of causing a disturbance, I would just like to add a couple of comments and questions, not about the analysis methodology but rather about how the data from DepMap can be used robustly:

A. First, thank you for your guidance on question 1. Indeed, the sample size for the first part is still small, but we would like to focus on these cell lines as a group that resembles a specific cancer subtype; perhaps we will be able to enlarge this small group.

B. Regarding the second question, you suggested: “If you’re interested in the genetic dependencies that phenocopy a given compound sensitivity profile, you can search for them one compound at a time”. Is this possible with a different type of query, i.e., can I check separately on the site, for each compound, which dependencies (genes) it is highly correlated with? Or did you mean that from the analysis in Step 1 I can select the top N hits based on my methodology, and then use the Custom Analyses tool to look for genetic dependencies correlated with their drug sensitivity profiles?

C. Finally, is there another way, within the DepMap portal, to investigate whether specific sets of genes are essential in specific cancer cell lines?

Thank you in advance for your time and help!