DepMap Metadata

The DepMap Portal’s metadata files provide contextual information about the data available on the platform, helping researchers understand the context and limitations of that data. They cover:

  • Cell Line Information: Details about the origin, tissue type, and genetic background of each cancer cell line.
  • Experimental Conditions: Descriptions of the conditions under which experiments were conducted.
  • Data Provenance: Information on the sources and processing of data, ensuring transparency and reproducibility.

The DepMap release dataset has two primary metadata files, Model.csv and ModelCondition.csv.

Model.csv

This file describes a model’s original characteristics, condition, and state. It also contains information about the original tumor the model was created from, specific molecular information about the model, and other helpful details such as where a model can be acquired and whether genomic data is available in dbGaP.

  • ModelID: Unique identifier for the model
  • PatientID: Unique identifier for models derived from the same tissue sample
  • CellLineName: Commonly used cell line name
  • StrippedCellLineName: Commonly used cell line name without special characters or spaces
  • DepmapModelType: Abbreviated ID for model type. For cancer models, this field is from Oncotree; information for other disease types is generated by DepMap
  • OncotreeLineage: Lineage of model. For cancer models, this field is from Oncotree; information for other disease types is generated by DepMap
  • OncotreePrimaryDisease: Primary disease of model. For cancer models, this field is from Oncotree; information for other disease types is generated by DepMap
  • OncotreeSubtype: Subtype of model. For cancer models, this field is from Oncotree; information for other disease types is generated by DepMap
  • OncotreeCode: For cancer models, this field is based on Oncotree. It is left blank for models that have no corresponding code
  • PatientSubtypeFeatures: Aggregated features known for the patient tumor
  • RRID: Cellosaurus ID
  • Age: Age at time of sampling
  • AgeCategory: Age category at time of sampling (Adult, Pediatric, Fetus, Unknown)
  • Sex: Sex at time of sampling (Female, Male, Unknown)
  • PatientRace: Patient/clinical indicated race (not derived)
  • PrimaryOrMetastasis: Indicates whether the sample came from the primary tumor site, where the cancer originated, or from another site (Primary, Metastatic, Recurrence, Other, Unknown)
  • SampleCollectionSite: Site of tissue sample collection
  • SourceType: Indicates where model was onboarded from (Commercial, Academic lab, Other)
  • SourceDetail: Details on where model was onboarded from
  • CatalogNumber: Catalog number of cell model, if commercial
  • ModelType: Type of model at onboarding (e.g. Organoid, Cell Line)
  • TissueOrigin: Indicates tissue model was derived from (Human, Mouse, Other)
  • ModelDerivationMaterial: Indicates what material a model was derived from (Fresh tissue, PDX, Other)
  • ModelTreatment: Indicates which virus was used to transform a cell line (hTERT, SV40, etc.)
  • PatientTreatmentStatus: Indicates if sample was collected before, during, or after the patient’s cancer treatment (Pre-treatment, Active treatment, Post-treatment, Unknown)
  • PatientTreatmentType: Type of treatment patient received prior to, or at the time of, sampling (e.g. chemotherapy, immunotherapy, etc.), if known
  • PatientTreatmentDetails: Details about patient treatment
  • Stage: Stage of patient tumor
  • StagingSystem: Classification system used to categorize disease stage (e.g. AJCC Pathologic Stage), if known
  • PatientTumorGrade: Grade (or other marker of proliferation) of the patient tumor, if known
  • PatientTreatmentResponse: Any response to treatment, if known
  • GrowthPattern: Format the model was onboarded in (Adherent, Suspension, Dome, Spheroid, Unknown)
  • OnboardedMedia: Description of onboarding media
  • FormulationID: The unique identifier of the onboarding media
  • SerumFreeMedia: Indicates a non-serum based media (<1% serum)
  • PlateCoating: Coating on the plate the model was onboarded in (Laminin, Matrigel, Collagen, None)
  • EngineeredModel: Indicates if model was engineered (genetic knockout, genetic knock down, cultured to resistance, other)
  • EngineeredModelDetails: Detailed information for genetic knockdown/out models
  • CulturedResistanceDrug: Drug the model was made resistant to, for cultured to resistance models
  • PublicComments: Comments released to portals
  • CCLEName: CCLE name for the cell line
  • HCMIID: Identifier for models available through the Human Cancer Models Initiative (HCMI)
  • ModelAvailableInDbgap: Indicates whether data for a model is available on dbGaP. Refer to the “SharedToDbgap” column in OmicsProfile.csv for the specific Omics Profile data available
  • ModelSubtypeFeatures: Curated list of confirmed molecular features seen in the model
  • WTSIMasterCellID: WTSI ID
  • SangerModelID: Sanger ID
  • COSMICID: Cosmic ID
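
As a minimal sketch of how these columns might be used, the Python snippet below reads a few rows in the shape of Model.csv with the standard csv module, filters models by OncotreeLineage, finds models with a blank OncotreeCode, and groups models that share a PatientID. The sample rows, IDs, and the helper names load_rows and models_by_patient are hypothetical and exist only for illustration; a real Model.csv has many more columns and real identifiers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The IDs and values are invented for illustration;
# only the column names come from the Model.csv description above.
SAMPLE_MODEL_CSV = """ModelID,PatientID,CellLineName,OncotreeLineage,OncotreeCode
M-1,P-1,LineA,Lung,LUAD
M-2,P-1,LineB,Lung,LUAD
M-3,P-2,LineC,Breast,
"""


def load_rows(handle):
    """Read a metadata CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def models_by_patient(rows):
    """Group ModelID values by PatientID to find models sharing a patient."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["PatientID"]].append(row["ModelID"])
    return dict(groups)


models = load_rows(io.StringIO(SAMPLE_MODEL_CSV))

# Models from one lineage.
lung_ids = [row["ModelID"] for row in models if row["OncotreeLineage"] == "Lung"]

# OncotreeCode is blank when no corresponding code exists.
missing_code = [row["ModelID"] for row in models if not row["OncotreeCode"]]

by_patient = models_by_patient(models)

print(lung_ids)
print(missing_code)
print(by_patient)
```

To run the same logic against a downloaded file, the in-memory sample could be swapped for something like `open("Model.csv", newline="")`, assuming the file sits in the working directory.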

ModelCondition.csv

A model condition describes the specific conditions under which a model was assayed. These conditions can differ from the model’s initial conditions, for instance if a model was screened in the presence of a compound.

  • ModelConditionID: Unique identifier for each model condition
  • ModelID: Unique identifier for each model (same ID as in model table)
  • ParentModelConditionID: ID of parental model condition for new model conditions derived from other model conditions
  • DataSource: Site where source data was generated (e.g. Broad, etc.)
  • CellFormat: Format the cell line is being grown in (Adherent, Suspension, Mixed, Dome, Spheroid, Unknown). This can be different from GrowthPattern in Model.csv
  • Morphology: Description of morphological features of the model in a particular growth condition
  • PassageNumber: Number of cell line passages (<5, 6-10, 10+)
  • GrowthMedia: Media the model was grown in for this model condition
  • FormulationID: Media formulation
  • PlateCoating: Substrate used to coat plates (Laminin, Matrigel, None, Unknown)
  • SerumFreeMedia: Indicates a non-serum based media (<1% serum)
  • PrescreenTreatmentDrug: Drug used in pretreatment prior to screening
  • PrescreenTreatmentDrugDays: Duration of drug treatment prior to screening
  • AnchorDrug: Name of drug treatment used in anchor screen
  • AnchorDrugConcentration: Concentration of drug treatment used in anchor screen (with units)
  • AnchorDaysWithDrug: Range of days the model was treated with drug in anchor screens
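
Because ModelID appears in both files, model conditions can be linked back to their models, and ParentModelConditionID can be followed to trace how one condition was derived from another. The sketch below illustrates both steps with the standard csv module. All sample rows and IDs are invented, and the helpers read_csv, annotate, and lineage are hypothetical names, not part of any DepMap tooling.

```python
import csv
import io

# Hypothetical sample data. Column names come from the descriptions above;
# every ID and value is invented for illustration.
SAMPLE_MODEL_CSV = """ModelID,CellLineName,GrowthPattern
M-1,LineA,Adherent
M-2,LineB,Suspension
"""

SAMPLE_CONDITION_CSV = """ModelConditionID,ModelID,ParentModelConditionID,CellFormat,AnchorDrug
MC-1,M-1,,Adherent,
MC-2,M-1,MC-1,Spheroid,DrugX
MC-3,M-2,,Suspension,
"""


def read_csv(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def annotate(condition, models_by_id):
    """Attach model-level fields to one model condition via ModelID."""
    model = models_by_id.get(condition["ModelID"], {})
    return {
        "ModelConditionID": condition["ModelConditionID"],
        "CellLineName": model.get("CellLineName", ""),
        # CellFormat can differ from the GrowthPattern recorded at onboarding.
        "FormatDiffers": condition["CellFormat"] != model.get("GrowthPattern"),
    }


def lineage(condition_id, conditions_by_id):
    """Follow ParentModelConditionID links from a condition to its root."""
    chain, seen = [], set()
    current = condition_id
    while current and current in conditions_by_id and current not in seen:
        chain.append(current)
        seen.add(current)  # guards against accidental cycles in the data
        current = conditions_by_id[current]["ParentModelConditionID"]
    return chain


models_by_id = {row["ModelID"]: row for row in read_csv(SAMPLE_MODEL_CSV)}
conditions = read_csv(SAMPLE_CONDITION_CSV)
conditions_by_id = {row["ModelConditionID"]: row for row in conditions}

annotated = [annotate(row, models_by_id) for row in conditions]
parents_of_mc2 = lineage("MC-2", conditions_by_id)

print(annotated)
print(parents_of_mc2)
```

Keying both tables by their ID columns keeps each lookup a dictionary access. For the full release files, a dataframe library would be a reasonable alternative to this hand-rolled join.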