Novartis Project DRIVE data questions


First of all, thanks to everyone who has contributed to making this great resource! I’m pretty new to working with all of this data, so sorry if my question is a bit naive.

I was wondering whether the Novartis DRIVE data in the combined RNAi file D2_combined_gene_dep_scores.csv is the most recent version released by the DRIVE team, or whether it comes from a particular snapshot in time.

Also, I noticed that different cell lines come up when I rank dependencies on a given gene using the Novartis DRIVE interface (DRIVE Data Portal) versus the D2_combined data from the file above. I assume this is due to different ways of processing the data and accounting for off-target shRNA effects (ATARiS vs DEMETER2). Is this a correct assumption, or is there something else I should consider as well?


The published Novartis DRIVE dataset is available here. You can download the raw and processed versions of the data they generated there.

We reprocessed the DRIVE dataset, along with a couple of other large-scale RNAi screens, using the DEMETER2 algorithm. The files labeled “D2_DRIVE” contain just the Novartis DRIVE data reprocessed with DEMETER2. The files labeled “D2_combined” are the result of running DEMETER2 on the combination of three datasets: Novartis DRIVE, Broad Achilles, and the Marcotte et al. breast cancer screen. So if you’re looking for the version most comparable to what the Novartis team published, I’d use the “D2_DRIVE” dataset. Its per-gene relative dependency profiles should be pretty similar to theirs.
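
If you want to check how much the two versions agree for a gene you care about, here is a rough sketch in Python with pandas. It is only an illustration: the helper names (`top_dependent_lines`, `compare_gene_profiles`) and the toy tables are made up for this example, and it assumes the scores are laid out with genes as rows and cell lines as columns, with more negative values meaning stronger dependency. Check the layout and gene labels of the files you actually download before relying on it.

```python
import pandas as pd


def top_dependent_lines(scores: pd.DataFrame, gene: str, n: int = 10) -> pd.Series:
    """Return the n cell lines with the most negative scores for a gene.

    Assumes rows are genes and columns are cell lines.
    """
    return scores.loc[gene].dropna().nsmallest(n)


def compare_gene_profiles(a: pd.DataFrame, b: pd.DataFrame, gene: str) -> float:
    """Rank correlation of one gene's scores over cell lines present in both tables."""
    shared = a.columns.intersection(b.columns)
    pair = pd.DataFrame({"a": a.loc[gene, shared], "b": b.loc[gene, shared]}).dropna()
    # Pearson correlation of the ranks is the Spearman correlation.
    return pair["a"].rank().corr(pair["b"].rank())


# Toy stand-ins for a DRIVE-only table and a combined table (hypothetical values).
drive = pd.DataFrame(
    {"LINE_A": [-1.2, 0.1], "LINE_B": [-0.3, 0.0], "LINE_C": [-0.8, -0.1]},
    index=["GENE1", "GENE2"],
)
combined = pd.DataFrame(
    {
        "LINE_A": [-1.0, 0.2],
        "LINE_B": [-0.2, 0.1],
        "LINE_C": [-0.9, 0.0],
        "LINE_D": [-1.5, 0.0],  # a cell line screened only in another dataset
    },
    index=["GENE1", "GENE2"],
)

print(top_dependent_lines(drive, "GENE1", n=2))
print(top_dependent_lines(combined, "GENE1", n=2))
print(compare_gene_profiles(drive, combined, "GENE1"))
```

In this toy example the top-ranked lines differ between the two tables simply because the combined table includes a cell line that the DRIVE-only table lacks, even though the profiles agree perfectly on the shared lines. With the real files you would load each table with something like `pd.read_csv(path, index_col=0)` in place of the toy tables, assuming the first column holds the gene labels.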

Hope that helps!