Hello, I have noticed that there is a significant difference between the Copy_Number_(Absolute) data in the Custom Downloads interface and the OmicsAbsoluteCNGene data in the All Data interface, both in terms of the number of genes and the number of cell lines. I would like to know more details about the Copy_Number_(Absolute) information currently available in the Custom Downloads interface. Which file in the All Data interface does it correspond to?
It’s worth noting that there are two separate areas where you can download data. The “All Data” section of Downloads has files as they were originally released to the public.
Files that covered a large number of models and contained measurements we thought would be useful to portal users were imported into the portal’s database. Since then, some lines have been dropped from the portal’s database when we discovered they were untrustworthy, and things like gene identifiers have changed over time. Only data about entities that are known to the portal is loaded, which means some samples from the original historical files may be omitted.
The “Custom Downloads” interface is exporting data from the portal’s database, and so will not contain any data that could not be loaded into the portal’s database.
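If you want to see exactly which cell lines and genes differ between an original file and a Custom Downloads export, a small script can list them. This is only a rough sketch: it assumes both files are matrix-style CSVs with one identifier per row in the first column and one identifier per column in the header, and the sample data and function names are made up for illustration. It compares labels as plain strings, so if the two files label genes or cell lines differently you would need to normalize the labels first.

```python
import csv
import io


def read_labels(handle):
    """Return (row_labels, column_labels) for a matrix-style CSV.

    Assumption: the first column holds one identifier per row and the
    header row holds one identifier per remaining column.
    """
    reader = csv.reader(handle)
    header = next(reader)
    columns = set(header[1:])
    rows = {record[0] for record in reader if record}
    return rows, columns


def compare_labels(original, exported):
    """List the row and column labels present in only one of the two files."""
    orig_rows, orig_cols = read_labels(original)
    exp_rows, exp_cols = read_labels(exported)
    return {
        "rows_only_in_original": sorted(orig_rows - exp_rows),
        "rows_only_in_export": sorted(exp_rows - orig_rows),
        "columns_only_in_original": sorted(orig_cols - exp_cols),
        "columns_only_in_export": sorted(exp_cols - orig_cols),
    }


# Made-up stand-ins for the two downloads (not real data).
original = io.StringIO(
    ",GENE_A,GENE_B,GENE_C\n"
    "LINE-1,2,2,3\n"
    "LINE-2,1,2,2\n"
    "LINE-3,2,4,2\n"
)
exported = io.StringIO(
    ",GENE_A,GENE_B\n"
    "LINE-1,2,2\n"
    "LINE-3,2,4\n"
)

report = compare_labels(original, exported)
for name, labels in report.items():
    print(name, labels)
```

For real files, replace the `io.StringIO` objects with file handles opened using `open(path, newline="")`, where `path` is wherever you saved each download.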
Also, Data Explorer fetches all of its data from the portal’s database as well, so the easiest way to find the connection between a dataset and the files it came from is to look in Data Explorer:
If you select a dataset there, and click “details” next to “Data Version” it will give you a popup with the file info:
You can download the source file directly from here by clicking on the download link, or you can copy and paste the filename into the search bar at the top of the page to be taken to the dataset that contains it.
In this case, the file was one of the files released with the CCLE 2019 paper.
The OmicsAbsoluteCNGene file comes from our latest pipeline and is regenerated with each release. It uses “PureCN” rather than “Absolute” as the method for determining absolute copy number.