Question about PRISM Drug sensitivity AUC data

Dear DepMap developers,

Thanks for developing this amazing tool!
I have some questions regarding the data structure, especially for the PRISM drug AUC data.

  1. I noticed that some drugs have multiple Broad IDs, which I guess come from different batches. Which one should we use? I also noticed that, for almost all of these drugs, one ID has a large number of data points and another has a much smaller number. Why is that? Please see the snapshots for an example.
    Larger:

Smaller:

Also, if I download the data and plot it on my end, I can replicate the scatterplot for the larger data set but not the smaller one. For the smaller one I find more matching cell lines than the portal's plot shows, and I am not sure why.

  2. I also noticed there are multiple screen IDs even for the same drug Broad ID, and I am not sure which one is the correct one to use.

Any help or suggestions would be appreciated!

Thanks!
Shaowen

re-post the larger snapshot:

Yes, you’re right that some drugs have multiple BRD# IDs, and this is generally due to us having screened different batches.

As for which to use, that’s harder to say. I believe there were cases of two different batches resulting from ordering compound collections from different vendors where those vendors had some overlap. Other examples of different batches may be from re-ordering compounds. The differences that you see may be related to the material that was received in the two different batches, but it may be due to noise in the assay itself or the quality of the specific screen. In many cases, such as these, the portal is trying to present the data as published but doesn’t offer much guidance around which batch to use.

However, in this specific case, when you say that one batch yields a few points and the other plot yields more points, I think what you’re seeing is an example of a compound which was screened twice. In the first screen, the screening quality was likely low, which would result in many of the lines being dropped. There are several reasons why compounds may have been re-screened including data quality. This is why you see multiple screen IDs in addition to multiple compound IDs for this drug.

In general, we don’t have a suggestion for which batch to use, but later screens (as identified by bigger screen ID #) tend to be of higher quality as the data quality has improved the more this assay has been run.

In the portal, if we have multiple screen IDs for a given compound batch, the portal uses this heuristic to decide which screen to present. I suspect that you can't reproduce the plot with fewer points because you are pooling results from multiple screens.

Are you breaking out the data into batch and screen for each plot when you generate it outside of the portal?
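For reference, here is a minimal sketch of what breaking the data out by batch and screen could look like in pandas. It uses a small toy table in place of the download file, and the column names (`broad_id`, `screen_id`, `depmap_id`, `auc`) and all values are assumptions for illustration, so check them against the header of the file you actually downloaded.

```python
import pandas as pd

# Toy stand-in for the dose-response download file.
# Column names and values are assumptions, not real PRISM data.
curves = pd.DataFrame({
    "broad_id": ["BRD-A", "BRD-A", "BRD-A", "BRD-A", "BRD-B"],
    "screen_id": ["S1", "S1", "S2", "S2", "S2"],
    "depmap_id": ["ACH-1", "ACH-2", "ACH-1", "ACH-3", "ACH-1"],
    "auc": [0.9, 0.8, 0.7, 0.6, 0.5],
})

# Count distinct cell lines for every (batch, screen) combination,
# so each plot is built from exactly one batch and one screen.
counts = (
    curves.groupby(["broad_id", "screen_id"])["depmap_id"]
    .nunique()
    .rename("n_cell_lines")
    .reset_index()
)
print(counts)
```

If a single batch shows up under more than one screen here, plotting it without splitting on the screen ID would pool those screens together.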


Hi @pmontgom, Thanks for the reply. This is very helpful!
Yes, I broke out the data into different batches and different screens, and then matched all of them back to the CCLE gene expression levels by cell ID.
Not sure if this is a good way to do it.

Thanks!
Shaowen

The data I am using are CCLE_expression.csv and secondary-screen-dose-response-curve-parameters.csv.

A comparison between plots generated from the DepMap portal and my own plots:

I was wondering whether this is because the gene expression profile also has different batches and screen IDs, so that I cannot simply merge the two data sets on the DepMap ID?

Thanks,
Shaowen

Hello,

I believe the difference in what you’re seeing is due to the filtering I mentioned, “…if we have multiple screen IDs for a given compound batch, the portal uses this heuristic to decide which screen to present.”

However, the way I described it was misleading. The filtering is actually applied per cell line + compound, not per compound.

The reason why there are so few points in the bottom left plot is that all the points for cell lines which were present in the later screen were removed. The small number of remaining cell lines are likely ones that were not included in the later screen. (Several problematic lines were removed from the pools in later screens.)

After some discussion, we’ve decided that this behavior is confusing, so we’re going to change the filter to work on a per-compound basis instead of per-(compound+cell line) basis. The result is that if you pull up this compound in the portal, you’ll only be presented with the higher quality screen (your top plots) and we won’t show people data for the bottom plots. (But the bottom data will continue to be available in the download files if people still want the lower quality data for anything)
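To make the difference between the two filters concrete, here is a rough sketch on toy data. It is not the portal's actual code: the column names, the screen labels, and the rule that the trailing digits of the screen ID order screens from earliest to latest are all assumptions for illustration.

```python
import pandas as pd

# Toy data: one compound batch screened twice. ACH-3 was only in the earlier screen.
# Column names, IDs, and AUC values are hypothetical.
curves = pd.DataFrame({
    "broad_id": ["BRD-X"] * 5,
    "screen_id": ["SCREEN001", "SCREEN001", "SCREEN001", "SCREEN002", "SCREEN002"],
    "depmap_id": ["ACH-1", "ACH-2", "ACH-3", "ACH-1", "ACH-2"],
    "auc": [0.9, 0.8, 0.7, 0.6, 0.5],
})

# Assumption: a larger trailing number in screen_id means a later screen.
curves["screen_rank"] = (
    curves["screen_id"].str.extract(r"(\d+)$", expand=False).astype(int)
)

# Per-(compound + cell line) filter: keep the latest screen for each pair.
latest_per_pair = curves.groupby(["broad_id", "depmap_id"])["screen_rank"].transform("max")
per_pair = curves[curves["screen_rank"] == latest_per_pair]

# Per-compound filter: keep only rows from the latest screen of each compound.
latest_per_compound = curves.groupby("broad_id")["screen_rank"].transform("max")
per_compound = curves[curves["screen_rank"] == latest_per_compound]

# Rows from the earlier screen that survive the per-pair filter.
leftover_early = per_pair[per_pair["screen_id"] == "SCREEN001"]
print(leftover_early)
print(per_compound)
```

Under the per-pair filter, the earlier screen is left with only the cell lines that never made it into the later screen, which is the sparse plot. Under the per-compound filter, the earlier screen drops out entirely.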

I’ve put this change into the queue of upcoming changes and don’t currently have an ETA. However, once it’s in, it should be announced on the change-log on the portal’s home page.