Differences in the number of mutations between 23Q2 and 24Q2

Hi, I have a question about the differences in the number of mutations between 23Q2 and 24Q2.
For example, when I checked the mutations in the AARS2 gene in the Jurkat cell line, I found three mutations in 23Q2 but only one in 24Q2.
There are several similar cases where the number of mutations decreased.
I checked the documentation, but I couldn’t understand what caused this.
Can anyone explain it?

+) Also, unrelated to mutations: some genes were removed from the latest version of the file even though they exist in NCBI Gene (e.g., BPY2/2B/2C, CSAG2/3, most of PRAMEFxx, etc.). Why did those genes disappear from the current files?


To your first question: I looked into the two variants that were dropped, and it looks like they’re both synonymous mutations. We had a major mutation pipeline update in 23Q4 that implemented this filtering logic, which is why you are seeing these differences between 23Q2 and 24Q2. Please see our documentation for details on our filtering/rescuing logic.
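If you want to check other genes the same way, here is a minimal sketch of the comparison. It uses toy rows rather than the real release files, and the field names (`model`, `gene`, `variant`, `consequence`) and variant labels are hypothetical, so you would need to map them to the actual columns in the mutation tables. It is not the pipeline's filtering code; it only lists what was dropped between two releases for one gene in one cell line.

```python
from collections import Counter

# Toy rows standing in for the two releases. All field names and values
# here are hypothetical placeholders, not the real release columns.
old_release = [
    {"model": "JURKAT", "gene": "AARS2", "variant": "v1", "consequence": "missense"},
    {"model": "JURKAT", "gene": "AARS2", "variant": "v2", "consequence": "synonymous"},
    {"model": "JURKAT", "gene": "AARS2", "variant": "v3", "consequence": "synonymous"},
]
new_release = [
    {"model": "JURKAT", "gene": "AARS2", "variant": "v1", "consequence": "missense"},
]


def dropped_variants(old_rows, new_rows, model, gene):
    """Return rows for (model, gene) present in old_rows but absent from new_rows."""
    kept = {
        row["variant"]
        for row in new_rows
        if row["model"] == model and row["gene"] == gene
    }
    return [
        row
        for row in old_rows
        if row["model"] == model
        and row["gene"] == gene
        and row["variant"] not in kept
    ]


dropped = dropped_variants(old_release, new_release, "JURKAT", "AARS2")

# Tally the dropped variants by their annotated consequence.
dropped_by_consequence = Counter(row["consequence"] for row in dropped)
print(dropped_by_consequence)
```

If every dropped variant for a gene falls in the synonymous category, the decrease is most likely explained by the pipeline update rather than by a change in the underlying calls.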

To your second question, what files are you referring to?


Thank you for your kind reply.
For the second question, I compared the 23Q2 and 24Q2 gene expression data.
As shown in the screenshot, the number of genes decreased from 19193 to 19137.

Thanks for sharing the screenshots. It appears that you are comparing non-batch-corrected data with batch-corrected data. We introduced batch correction for expression data in 24Q2, which is based on stranded RSEM runs for samples run with the stranded sequencing protocol (for the first time) and unstranded RSEM runs for samples run with the unstranded sequencing protocol. The separate RSEM runs resulted in slightly different numbers of genes detected, and since the batch correction takes the intersection of these two gene sets, a small number of genes are missing from the batch-corrected matrix.
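To illustrate the intersection step, here is a small sketch with made-up gene names. The set names and contents are hypothetical and only show why the batch-corrected matrix can end up with fewer genes than either run on its own; this is not the actual batch-correction code.

```python
# Hypothetical gene sets detected by the two separate RSEM runs.
stranded_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
unstranded_genes = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}

# Only genes detected in both runs can be batch corrected together.
shared_genes = stranded_genes & unstranded_genes

# Genes seen in just one of the two runs drop out of the corrected matrix.
missing_genes = (stranded_genes | unstranded_genes) - shared_genes

print(sorted(shared_genes))
print(sorted(missing_genes))
```

Applying the same set difference to the gene columns of the two expression files you are comparing should list exactly which genes are absent from the batch-corrected one.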


Thank you for the clear explanation. I now fully understand the dataset.