Re-computed dose response values for GDSC datasets

A common user question about the GDSC datasets is why we re-compute the dose-response parameters (in addition to providing the original published values), and how to compute these values from the raw data.

The re-computation of the dose-response parameters is driven entirely by visualization needs: the original source (https://www.cancerrxgene.org/downloads/bulk_download) provides only the raw data and the single-number summaries (AUC and IC50), so the curve parameters needed to draw the fitted curves have to be re-estimated.

On the portal, we processed the raw data following the vignette at https://github.com/CancerRxGene/gdscdata/blob/master/vignettes/gdsc_v17.Rmd. Once we obtained the viability values for each replicate, we fit classical four-parameter log-logistic dose-response curves using the dr4pl R package (https://cran.r-project.org/web/packages/dr4pl/index.html).
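For reference, the curve and the two summaries evaluated by the example snippet further down can be written as follows. The symbols l, u, EC50, h, md and MD mirror the argument names in that snippet; V(d) is shorthand introduced here for the fitted viability at dose d (in uM), and this is a reading of the snippet rather than an official definition.

```latex
V(d) = l + \frac{u - l}{1 + \left( d / \mathrm{EC}_{50} \right)^{h}}

\mathrm{AUC} = \frac{1}{\log_2(MD / md)}
    \int_{\log_2 md}^{\log_2 MD} \min\bigl(\max\bigl(V(2^{x}),\, 0\bigr),\, 1\bigr)\, dx

V(\mathrm{IC}_{50}) = 0.5, \qquad md \le \mathrm{IC}_{50} \le MD
```

The AUC is therefore the average clipped viability over the tested dose range on a log2 scale, and the IC50 is reported as log2 of the dose at which the fitted curve crosses 50% viability.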

Unfortunately, the source code for our analysis is not available at this point, but an example snippet for a single perturbation and cell line (with the replicate-level doses and viabilities stored in `data`) is provided below:

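library(dr4pl)  # four-parameter logistic fitting
library(dplyr)  # provides tibble(), mutate() and %>%

# Normalized area under the fitted curve over the tested dose range.
# l, u: lower/upper limits; ec50: inflection point (uM); h: slope;
# md, MD: minimum and maximum tested dose (uM).
compute_auc = function(l, u, ec50, h, md, MD) {
    # Fitted curve as a function of x = log2(dose), clipped to [0, 1]
    f1 = function(x) pmax(pmin((l + (u - l)/(1 + (2^x/ec50)^h)), 1), 0)
    # Integrate over log2(dose) and divide by the width of the dose range
    integrate(f1, log2(md), log2(MD))$value/(log2(MD/md))
}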

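# log2 of the dose (uM) at which the fitted curve crosses 50% viability.
# Returns NA when the curve does not cross 0.5 within the tested dose range.
compute_log.ic50 = function(l, u, ec50, h, md, MD) {
    if ((l >= 0.5) || (u <= 0.5)) {
        return(NA)
    } else {
        f1 = function(x) (l + (u - l)/(1 + (2^x/ec50)^h) - 0.5)
        return(tryCatch(uniroot(f1, c(log2(md), log2(MD)))$root, error = function(x) NA))
    }
}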

param = dr4pl(dose = data$dose,           # doses in uM
              response = data$viability,  # viability values relative to DMSO
              method.init = "logistic", trend = "decreasing")$coefficients$Estimate

results = tibble(
        upper_limit = param[1],
        ec50 = param[2],
        slope = -param[3],               # sign flipped to match the curve form used in the helper functions
        lower_limit = param[4],
        MD = data$dose %>% max(),        # maximum tested dose (uM)
        md = data$dose %>% min()) %>%    # minimum tested dose (uM)
    dplyr::mutate(auc = compute_auc(lower_limit, upper_limit, ec50, slope, md, MD),
                  log_ic50 = compute_log.ic50(lower_limit, upper_limit, ec50, slope, md, MD))
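
As a quick sanity check of the two helper functions, the sketch below applies them to hypothetical curve parameters (not GDSC values) chosen so that the answer is known in advance: with limits 0 and 1, ec50 = 1 uM, slope 1, and a dose range symmetric around 1 uM on the log scale, the curve is symmetric around its midpoint. The helpers are repeated only so that the block runs on its own in base R; the parameter values and the variable names `ic50_uM` and `no_ic50` are illustrative assumptions.

```r
# Helper definitions repeated from the snippet above so this block is self-contained.
compute_auc = function(l, u, ec50, h, md, MD) {
    f1 = function(x) pmax(pmin((l + (u - l)/(1 + (2^x/ec50)^h)), 1), 0)
    integrate(f1, log2(md), log2(MD))$value/(log2(MD/md))
}

compute_log.ic50 = function(l, u, ec50, h, md, MD) {
    if ((l >= 0.5) || (u <= 0.5)) {
        return(NA)
    }
    f1 = function(x) (l + (u - l)/(1 + (2^x/ec50)^h) - 0.5)
    tryCatch(uniroot(f1, c(log2(md), log2(MD)))$root, error = function(x) NA)
}

# Hypothetical parameters: full-range curve centred at 1 uM, doses from 0.01 to 100 uM.
auc = compute_auc(l = 0, u = 1, ec50 = 1, h = 1, md = 0.01, MD = 100)
log_ic50 = compute_log.ic50(l = 0, u = 1, ec50 = 1, h = 1, md = 0.01, MD = 100)

# log_ic50 is on a log2 scale; convert back to uM for reporting.
ic50_uM = 2^log_ic50

# A curve whose lower limit stays above 50% viability has no IC50 and yields NA.
no_ic50 = compute_log.ic50(l = 0.6, u = 1, ec50 = 1, h = 1, md = 0.01, MD = 100)

print(c(auc = auc, log_ic50 = log_ic50, ic50_uM = ic50_uM, no_ic50 = no_ic50))
```

By symmetry, the AUC should come out at approximately 0.5 and the log2 IC50 at approximately 0 (that is, about 1 uM), while the last call returns NA. Note that log_ic50 in the snippet is a log2 value, so it has to be exponentiated with base 2 before it can be compared with an IC50 reported in uM.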

In our analyses, we observed that the differences between the original and the re-fitted values are marginal, and we recommend using the original values for downstream analyses unless there is a good reason to do otherwise.