Which expression dataset is appropriate for machine learning?

Hi, thank you for your effort in releasing the newest dataset version, 2024Q2.
I’m a little confused, though, about the two kinds of gene expression data: original and batch-normalized.
If I want to use the gene expression data to train a machine learning model, which one should I use?
I noticed that the original data contains some 0 values, while the batch-normalized data contains no 0 values but does contain negative values.
Can anyone recommend which dataset to use?

Hi,

I’d recommend using the batch-corrected data, as we are planning to phase out the non-batch-corrected data in the future. For details, please see the PDF attached at the end of the 24Q2 release announcement.

Thanks,
Simone
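
A follow-up note on the zeros versus negative values raised in the question: negative values on their own are usually not an obstacle for model training, since many pipelines standardize each gene before fitting anyway. The sketch below is only an illustration under assumptions, not the release's own processing. The two small matrices are invented toy values (rows are samples, columns are genes), and `summarize` and `standardize_columns` are hypothetical helper names. It counts zeros and negatives in each matrix and then z-scores each gene column.

```python
from statistics import mean, pstdev


def summarize(matrix):
    """Return (minimum, number of zeros, number of negatives) over all entries."""
    flat = [v for row in matrix for v in row]
    return min(flat), sum(v == 0 for v in flat), sum(v < 0 for v in flat)


def standardize_columns(matrix):
    """Z-score each column (gene) across rows (samples)."""
    scaled_columns = []
    for col in zip(*matrix):
        mu, sigma = mean(col), pstdev(col)
        # A constant column has sigma == 0; map it to zeros instead of dividing by zero.
        scaled_columns.append([0.0 if sigma == 0 else (v - mu) / sigma for v in col])
    # Transpose back so rows are samples again.
    return [list(row) for row in zip(*scaled_columns)]


# Toy values, invented for illustration only (not taken from any release file).
original = [[0.0, 3.2, 5.1], [1.4, 0.0, 4.8], [2.0, 2.9, 5.5]]
batch_corrected = [[-0.3, 3.0, 5.0], [1.2, -0.1, 4.9], [1.9, 2.7, 5.4]]

print(summarize(original))         # zeros present, no negatives
print(summarize(batch_corrected))  # no zeros, some negatives

# After standardization every gene is centered at 0, so roughly half of the
# values are negative regardless of which matrix was used as input.
scaled = standardize_columns(batch_corrected)
```

Whether standardization suits a particular model is a separate choice; the point here is only that the sign of the input values does not by itself rule out either matrix.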