Download processed data just before Ceres/Chronos

Hi! I’m interested in having the data just before the copy number correction is implemented (Ceres, Chronos or whatever). I saw you can download the raw logFC as well as the mappings and quality checks that were used to aggregate/discard the samples/genes…

I was wondering whether you could provide the final resulting dataset (before Chronos) or at least the code to implement the missing steps from the logFC so that I can be sure I have the same data used to feed Chronos/Ceres.

Thanks! :relaxed:

Hello! For CERES, you can use the Achilles_gene_effect_unscaled file to get the gene effect scores prior to CN correction (please note that CERES will not be run in future releases). For Chronos, you can use the code below to generate the model inputs. Please let me know if there are any issues!

'''
Generate the inputs for Chronos from the downloads available on 
the DepMap Downloads page (https://depmap.org/portal/download/).  

Required files:
- Achilles_guide_map.csv
- Achilles_replicate_map.csv
- Achilles_raw_readcounts.csv
'''

import pandas as pd



## Guide Map ##

#read in guide map
full_guide_map = pd.read_csv('Achilles_guide_map.csv')
#filter out guides which map to multiple genes
guide_counts = full_guide_map.groupby('sgrna').gene.count()
guide_map = full_guide_map[full_guide_map.sgrna.isin(
    guide_counts.loc[lambda x: x == 1].index
)]
#filter out guides with multiple alignments
alignment_counts = guide_map.groupby('sgrna').genome_alignment.count()
guide_map = guide_map[guide_map.sgrna.isin(
    alignment_counts.loc[lambda x: x == 1].index
)]
#only keep genes targeted by multiple guides
gene_counts = guide_map.groupby('gene').sgrna.count()
guide_map = guide_map[guide_map.gene.isin(gene_counts.loc[lambda x: x > 1].index)]



## Readcounts ##

#read in readcounts then trim to match selected sgrnas
readcounts = pd.read_csv('Achilles_raw_readcounts.csv', index_col=0)
readcounts = readcounts.loc[readcounts.index.isin(guide_map.sgrna)]



## Sequence Map ##

#read in replicate map to make sequence map
sequence_map = pd.read_csv('Achilles_replicate_map.csv')
#rename columns
sequence_map.rename(columns={'replicate_ID': 'sequence_ID', 'DepMap_ID': 'cell_line_name'}, inplace=True)
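#annotate pDNA batches
#(assign back instead of chained inplace fillna, which does not update the frame under pandas copy-on-write)
sequence_map['cell_line_name'] = sequence_map['cell_line_name'].fillna('pDNA')
#set number of days for screens to 21 and pDNA to 0
sequence_map.loc[sequence_map['cell_line_name'] != 'pDNA', 'days'] = 21
sequence_map['days'] = sequence_map['days'].fillna(0)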



## Remove QC Failures ##

#keep only replicates which pass QC
sequence_map = sequence_map.query('passes_QC')
#trim sequence map and readcounts to contain same replicates
readcounts = readcounts.loc[:,readcounts.columns.isin(sequence_map.sequence_ID)].T
sequence_map = sequence_map.loc[sequence_map.sequence_ID.isin(readcounts.index)]

print("Chronos Inputs Generated")
print("- guide map (table with {} sgrnas, {} genes)".format(guide_map.sgrna.nunique(), guide_map.gene.nunique()))
print("- readcounts (matrix of {} replicates x {} sgrnas)".format(readcounts.shape[0], readcounts.shape[1]))
print("- sequence map (table with {} replicates, {} cell lines)".format(sequence_map.sequence_ID.nunique(), sequence_map.cell_line_name.nunique()))



## Example Chronos Run ##
#
# import chronos
# chronos.nan_outgrowths( #note will introduce NAs into dataframe inplace
#     readcounts=readcounts, 
#     guide_gene_map=guide_map, 
#     sequence_map=sequence_map
# )
# model = chronos.Chronos(
#     sequence_map={"avana": sequence_map},
#     guide_gene_map={"avana": guide_map},
#     readcounts={"avana": readcounts}
# )
# model.train(nepochs=1001)
# model.save("chronos_outputs")
#
# Note: additional post-processing steps include scaling and copy number correction
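
If you want to double check that the three inputs line up before training, here is a minimal sketch of a consistency check. It assumes the same layout the script above produces (readcounts as replicates x sgrnas, a guide map with an `sgrna` column, a sequence map with a `sequence_ID` column). The helper name `check_chronos_inputs` and the toy tables are made up for illustration; they are not part of Chronos or the DepMap downloads, and whether a given mismatch actually matters for your run is something to verify yourself.

```python
import pandas as pd


def check_chronos_inputs(readcounts, guide_map, sequence_map):
    """Return sets of IDs that appear in one table but not in its counterpart.

    Hypothetical helper: empty sets everywhere mean the tables are consistent.
    """
    guides_in_counts = set(readcounts.columns)
    guides_in_map = set(guide_map["sgrna"])
    reps_in_counts = set(readcounts.index)
    reps_in_map = set(sequence_map["sequence_ID"])
    return {
        "guides_missing_from_map": guides_in_counts - guides_in_map,
        "guides_missing_from_counts": guides_in_map - guides_in_counts,
        "replicates_missing_from_map": reps_in_counts - reps_in_map,
        "replicates_missing_from_counts": reps_in_map - reps_in_counts,
    }


# Toy tables (invented values, only to show the shape of the check)
toy_guide_map = pd.DataFrame(
    {"sgrna": ["g1", "g2", "g3"], "gene": ["A", "A", "B"]}
)
toy_sequence_map = pd.DataFrame(
    {
        "sequence_ID": ["rep1", "rep2", "pDNA1"],
        "cell_line_name": ["ACH-1", "ACH-1", "pDNA"],
        "days": [21, 21, 0],
    }
)
# Rows are replicates, columns are sgrnas (same orientation as the script above)
toy_readcounts = pd.DataFrame(
    [[10, 20, 30], [11, 21, 31]],
    index=["rep1", "rep2"],
    columns=["g1", "g2", "g4"],
)

report = check_chronos_inputs(toy_readcounts, toy_guide_map, toy_sequence_map)
for name, ids in report.items():
    print(name, sorted(ids))
```

On the real data you would call it as `check_chronos_inputs(readcounts, guide_map, sequence_map)` after the script has run, and look into any non-empty set before passing the tables to Chronos.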