A common user question about the GDSC dataset is why we recomputed the dose-response parameters (in addition to the originally published values), and how to compute these values from the raw data.
The re-computation of the dose-response parameters is driven entirely by visualization needs: the original source (https://www.cancerrxgene.org/downloads/bulk_download) provides only the raw data and the single-number summaries (AUC and IC50), so the curve parameters needed to draw the dose-response curves have to be re-fitted.
On the portal, we processed the raw data following the vignette at https://github.com/CancerRxGene/gdscdata/blob/master/vignettes/gdsc_v17.Rmd. Once we had the viability values per replicate, we fit the classical 4-parameter log-logistic dose-response curve using the dr4pl R package (https://cran.r-project.org/web/packages/dr4pl/index.html).
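For reference, the curve and the two summaries derived from it can be written out as follows. This is a reading of the helper functions in the example snippet, not an additional specification: l and u are the lower and upper limits, h is the slope, d is the dose in uM, and md and MD are the minimum and maximum tested doses. The symbols v(d) and x* are introduced here only for illustration.

```latex
v(d) = l + \frac{u - l}{1 + \left( d / \mathrm{EC}_{50} \right)^{h}}

\mathrm{AUC} = \frac{1}{\log_2(MD / md)}
  \int_{\log_2 md}^{\log_2 MD} \min\bigl(\max\bigl(v(2^{x}),\, 0\bigr),\, 1\bigr)\, dx

v\bigl(2^{x^{*}}\bigr) = 0.5, \qquad \log_2 md \le x^{*} \le \log_2 MD
```

Here x* corresponds to the reported log_ic50, that is, the log2 of the IC50 in uM. It is reported as NA when l >= 0.5 or u <= 0.5, and also when the 50% crossing falls outside the tested dose range.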
Unfortunately, the source code for our analysis is not available at this point, but an example snippet for a single perturbation and cell line is provided below:
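# Area under the fitted curve over the tested dose range, computed on the
# log2-dose scale, clipped to [0, 1], and normalized by the range width.
compute_auc = function(l, u, ec50, h, md, MD) {
  f1 = function(x) pmax(pmin((l + (u - l)/(1 + (2^x/ec50)^h)), 1), 0)
  integrate(f1, log2(md), log2(MD))$value/(log2(MD/md))
}

# log2 of the dose at which the fitted curve crosses 50% viability;
# NA if the curve cannot cross 0.5 or no root is found in the tested range.
compute_log.ic50 = function(l, u, ec50, h, md, MD) {
  if ((l >= 0.5) || (u <= 0.5)) {
    return(NA)
  } else {
    f1 = function(x) (l + (u - l)/(1 + (2^x/ec50)^h) - 0.5)
    return(tryCatch(uniroot(f1, c(log2(md), log2(MD)))$root, error = function(e) NA))
  }
}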
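library(dr4pl)
library(tibble)
library(dplyr)  # mutate() and the %>% pipe

# `data` holds the doses and replicate viabilities for one perturbation and cell line
param = dr4pl(dose = data$dose,           # doses in uM
              response = data$viability,  # viability values relative to DMSO
              method.init = "logistic", trend = "decreasing")$coefficients$Estimate

results = tibble(
  upper_limit = param[1],
  ec50 = param[2],
  slope = -param[3],  # negated, then passed as the exponent h to the helpers
  lower_limit = param[4],
  MD = data$dose %>% max(),  # maximum tested dose
  md = data$dose %>% min()) %>%  # minimum tested dose
  dplyr::mutate(auc = compute_auc(lower_limit, upper_limit, ec50, slope, md, MD),
                log_ic50 = compute_log.ic50(lower_limit, upper_limit, ec50, slope, md, MD))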
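Since the re-fitted parameters exist mainly for visualization, here is a minimal sketch of how they might be turned into a plottable curve. The helper `predict_viability` and all numeric values below are hypothetical and are not part of our analysis; only the curve formula is taken from the helper functions above. It uses base R only.

```r
# Hypothetical helper (not from the original analysis): evaluate the fitted
# 4-parameter curve at arbitrary doses, using the same formula as
# compute_auc() and compute_log.ic50().
predict_viability = function(dose, lower_limit, upper_limit, ec50, slope) {
  lower_limit + (upper_limit - lower_limit) / (1 + (dose / ec50)^slope)
}

# Illustrative parameter values only; in practice these would be read from
# the one-row `results` table.
fit = list(lower_limit = 0.1, upper_limit = 1.0, ec50 = 0.5, slope = 1.2)
md = 0.01  # assumed minimum tested dose (uM)
MD = 10    # assumed maximum tested dose (uM)

# Log-spaced dose grid, so the curve looks smooth on a log-scaled dose axis.
grid = data.frame(dose = 2^seq(log2(md), log2(MD), length.out = 100))
grid$viability = predict_viability(grid$dose, fit$lower_limit, fit$upper_limit,
                                   fit$ec50, fit$slope)
```

The resulting `grid` can then be drawn with, for example, `plot(grid$dose, grid$viability, log = "x", type = "l")` and overlaid on the per-replicate viability values.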
In our analyses, we observed that the differences between the original and the re-fitted values are marginal, and we recommend using the original values for downstream analyses unless there is a good reason to do otherwise.