Raw Read Counts Accession

Hello,

We want to compare our WGS data with DepMap, so we would like to access the raw read counts and analyze them with our own pipeline.
We downloaded the latest (20Q2) and several older (20Q1, 18Q1) Achilles raw read counts. In these files we are looking for HMEL (ACH-000642), 22Rv1 (ACH-000956), LNCAPCLONEFGC (ACH-000977), and DU145 (ACH-000979). We checked sample_info for 20Q2, and achilles_n_replicates is empty for HMEL, 22Rv1, and DU145. For LNCAPCLONEFGC, sample_info lists 2 replicates, but we could not find the corresponding raw read counts. None of these cell lines appear in achilles_replicate_map either.

In addition, the 20Q2 raw read counts appear to provide pDNA read counts only for batch 4; the other batches seem to be missing. We need to compare the cell lines against specific batches, which we identified from the achilles_replicate_map file from DepMap.

Best Regards,
A. Cenk Aksu
aaksu16@ku.edu.tr

Hi, are you sure you want Achilles readcounts? These are the results of CRISPR screens and have no connection to WGS. If a cell line has no Achilles replicates, it has not been screened in Achilles.

LNCAPCLONEFGC is scheduled for future release. 20Q2 readcounts contains all pDNA batches.

Hi,
Thank you for pointing out my mistake. I do want the Achilles read counts, for comparison with CRISPR screens we ran in our lab, so I’m sure that I need the Achilles raw counts.

Is there a file that lists which cell lines were added in the latest release?

I’ve checked the 20Q2 data again, and unfortunately I only see three different concentrations (10 ng, 50 ng, 100 ng) from batch 4 (screenshot added). To be sure, I downloaded the data again, and the file still contains the same pDNAs. Would it be possible to get the other pDNA batches, especially pDNA batch 3, which is the batch of the cell lines we want to compare?

Thanks.

I think the problem is that you are only looking at columns that start with “pDNA”. All columns in the readcounts file should be identified using the file Achilles_replicate_map, never the column name. Columns are pDNA if their DepMap_ID in the Achilles_replicate_map is null. The readcounts include pDNA measurements from batches 2, 3, and 4.
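For anyone reading along, the lookup described above can be sketched as follows. This is only an illustration, not DepMap code: the map rows are a toy stand-in for Achilles_replicate_map, the cell-line row and its ID are made up, and treating an empty string, "NA", or "NULL" as null is an assumption about how the file encodes missing DepMap_IDs.

```python
import csv
import io

# Toy stand-in for Achilles_replicate_map.csv. The last row is a made-up
# cell-line replicate; it does not come from the real file.
REPLICATE_MAP_CSV = """replicate_ID,DepMap_ID,pDNA_batch
pDNA_batch4,,4
Avana4pDNA20160601-311cas9 RepG12_batch2,,2
EXAMPLE_LINE RepA_batch3,ACH-XXXXXX,3
"""

# Assumption: a null DepMap_ID shows up as one of these strings in the CSV.
NULL_MARKERS = {"", "NA", "NULL"}


def pdna_replicate_ids(map_file):
    """Return the replicate_IDs whose DepMap_ID is null, i.e. pDNA columns."""
    reader = csv.DictReader(map_file)
    return [
        row["replicate_ID"]
        for row in reader
        if row["DepMap_ID"].strip() in NULL_MARKERS
    ]


pdna_ids = pdna_replicate_ids(io.StringIO(REPLICATE_MAP_CSV))
print(pdna_ids)
```

The returned IDs are the column names to pull from the readcounts file, regardless of whether they start with "pDNA".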

Hello Joshua,

I’ve checked Achilles_replicate_map and found the pDNA entries for all batches, which are also present in the raw read counts.

Thanks for all the help.

I apologize if this information is elsewhere, but I have a few quick related questions:

  1. Do the replicate IDs with DepMap_ID == NULL in “Achilles_replicate_map.csv” correspond to the initial read counts? If so, which should we use per batch, and are the initial reads the same for replicates with the same value for pDNA_batch?
  2. Are the values in “Achilles_raw_readcounts.csv” the initial read counts or the final read counts? I believe they are the latter, but I just want to confirm.

Thank you

  1. Replicate IDs with null DepMap IDs are all pDNA measurements, which are what we use instead of initial read counts. You should use all the pDNA measurements that share a batch with the replicate of interest.

  2. They are the pDNA and replicate final readcounts.
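As a rough sketch of point 1, the pDNA measurements for a given replicate can be picked out by matching pDNA_batch. The rows below are toy data in the layout of Achilles_replicate_map; the cell-line replicate "EXAMPLE_LINE RepA_batch4" and its ID are hypothetical, and an empty DepMap_ID is assumed to mean null.

```python
import csv
import io

# Toy rows in the layout of Achilles_replicate_map.csv (last row is made up).
REPLICATE_MAP_CSV = """replicate_ID,DepMap_ID,pDNA_batch
pDNA 50ng_batch4,,4
pDNA 10ng_batch4,,4
Avana4pDNA20160601-311cas9 RepG12_batch2,,2
EXAMPLE_LINE RepA_batch4,ACH-XXXXXX,4
"""


def matching_pdna(rows, replicate_id):
    """Return every pDNA replicate_ID that shares a batch with replicate_id."""
    # Look up the batch of the replicate of interest.
    batch = next(
        r["pDNA_batch"] for r in rows if r["replicate_ID"] == replicate_id
    )
    # Keep rows with a null (here: empty) DepMap_ID from the same batch.
    return [
        r["replicate_ID"]
        for r in rows
        if not r["DepMap_ID"].strip() and r["pDNA_batch"] == batch
    ]


rows = list(csv.DictReader(io.StringIO(REPLICATE_MAP_CSV)))
controls = matching_pdna(rows, "EXAMPLE_LINE RepA_batch4")
print(controls)
```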

Thank you for your quick response.

I’m still a bit confused about which pDNA to use because there are multiple for each batch. Below are the entries without a DepMap ID.

| replicate_ID | DepMap_ID | pDNA_batch |
|---|---|---|
| Avana4pDNA20160601-311cas9 RepG12_batch2 | | 2 |
| Avana4pDNA20160601-311cas9 RepG09_batch2 | | 2 |
| Avana4pDNA20160601-311cas9 RepG11_batch2 | | 2 |
| Avana4pDNA20160601-311cas9 RepG10_batch2 | | 2 |
| Avana 4+ Hu pDNA (M-AA40, 9/30/15) (0.2pg/uL)_batch3 | | 3 |
| Avana 4+ Hu pDNA (M-AA40, 9/30/15) 0.2pg/uL_batch3 | | 3 |
| Avana 4+ Hu pDNA (M-AA40, 9/30/15)_batch3 | | 3 |
| Avana 4+ Hu pDNA (M-AF34, 11/27/18) 0.2pg/uL_batch3 | | 3 |
| pDNA 100pg_batch4 | | 4 |
| pDNA 100 pg/well_batch4 | | 4 |
| pDNA 100pg/well_batch4 | | 4 |
| pDNA 50ng_batch4 | | 4 |
| pDNA 10 pg/well_batch4 | | 4 |
| pDNA 50 pg/well_batch4 | | 4 |
| pDNA 100ng_batch4 | | 4 |
| pDNA 10ng_batch4 | | 4 |
| pDNA_batch4 | | 4 |

To me, it looks like there are 4 replicates that correspond to the pDNA for batch 2. Am I interpreting this correctly? If so, how do I use all four (they are all included in the “Achilles_raw_readcounts.csv” file)? Say, for instance, I wanted to calculate the log-fold change for an sgRNA in some replicate that uses pDNA batch 2.

Thank you

We take the median of reads per million, but of course that’s not the only way you could imagine combining or using them. I’d recommend reading https://www.biorxiv.org/content/10.1101/720243v1.full for more details on the Achilles processing pipeline.
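To make the "median of reads per million" idea concrete, here is a minimal sketch with made-up counts. It is not the Achilles pipeline: only the median of reads per million across a batch's pDNA columns comes from the reply above, while the log2 ratio, the pseudocount of 1, and all function and sgRNA names are assumptions for illustration.

```python
import math
from statistics import median


def reads_per_million(counts):
    """Scale one column of raw counts so that it sums to one million."""
    total = sum(counts.values())
    return {guide: c * 1e6 / total for guide, c in counts.items()}


def pdna_reference(pdna_columns):
    """Per-sgRNA median of reads per million across a batch's pDNA columns."""
    rpms = [reads_per_million(col) for col in pdna_columns]
    return {guide: median(r[guide] for r in rpms) for guide in rpms[0]}


def log_fold_change(replicate_counts, pdna_columns, pseudocount=1.0):
    """log2 of replicate RPM over the pDNA reference (pseudocount assumed)."""
    rep = reads_per_million(replicate_counts)
    ref = pdna_reference(pdna_columns)
    return {
        guide: math.log2((rep[guide] + pseudocount) / (ref[guide] + pseudocount))
        for guide in rep
    }


# Made-up counts: three pDNA measurements from one batch and one replicate.
pdna = [
    {"sgA": 100, "sgB": 300, "sgC": 600},
    {"sgA": 200, "sgB": 300, "sgC": 500},
    {"sgA": 100, "sgB": 400, "sgC": 500},
]
replicate = {"sgA": 50, "sgB": 300, "sgC": 650}

lfc = log_fold_change(replicate, pdna)
print(lfc)
```

In this toy example sgA drops to about half its reference abundance, sgB is unchanged, and sgC increases.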

That makes much more sense, thank you for your help.