Differences in the number of mutations between 23Q2 and 24Q2

SongSong · July 9, 2024, 2:45am

Hi, I have a question about the differences in the number of mutations between 23Q2 and 24Q2.
For example, when I checked the mutation frequency of the AARS2 gene in the Jurkat cell line, I found three mutations in 23Q2 but only one in 24Q2.
There are several similar cases where the number of mutations decreased.
I checked the documentation, but I could not work out why this happens.
Can anyone explain it?

+) Also, apart from the mutations, some genes were removed from the latest version of the file even though they exist in NCBI Gene (e.g., BPY2/2B/2C, CSAG2/3, most of PRAMEFxx, etc.). Why did those genes disappear from the current files?

simz · July 9, 2024, 2:21pm

Hi,

To your first question, I looked into the two variants that were dropped, and it looks like they're both synonymous mutations. We had a major mutation pipeline update in 23Q4 that implemented this filtering logic, which is why you are seeing these differences between 23Q2 and 24Q2. Please see our documentation for details on our filtering/rescuing logic.
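As a rough illustration of how one might check which calls were dropped between two releases, here is a minimal Python sketch. It is not the DepMap pipeline: the field names (`model`, `gene`, `variant`, `effect`), the variant labels, and the effect of the retained variant are all made-up placeholders, and only the counts (three calls before, one after, two synonymous calls dropped) follow the case described above.

```python
# Toy records standing in for rows of a mutation table from two releases.
# Field names and values are hypothetical, not the real file schema.
old_release = [
    {"model": "Jurkat", "gene": "AARS2", "variant": "variant_1", "effect": "missense"},
    {"model": "Jurkat", "gene": "AARS2", "variant": "variant_2", "effect": "synonymous"},
    {"model": "Jurkat", "gene": "AARS2", "variant": "variant_3", "effect": "synonymous"},
]
new_release = [
    {"model": "Jurkat", "gene": "AARS2", "variant": "variant_1", "effect": "missense"},
]


def variant_key(row):
    """Identify a call by cell line, gene, and variant label."""
    return (row["model"], row["gene"], row["variant"])


def dropped_variants(old_rows, new_rows):
    """Return the rows present in the old release but absent from the new one."""
    kept = {variant_key(row) for row in new_rows}
    return [row for row in old_rows if variant_key(row) not in kept]


dropped = dropped_variants(old_release, new_release)
for row in dropped:
    # Inspecting the effect of each dropped call shows whether a filter explains it.
    print(row["gene"], row["variant"], row["effect"])
```

With real data, the same comparison would need a key built from whatever columns uniquely identify a variant in the released files.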

To your second question, what files are you referring to?

Best,
Simone

SongSong · July 10, 2024, 1:08am

Thank you for your kind reply.
For the second question, I compared the 23Q2 and 24Q2 gene expression data.
As shown in the screenshot, the number of genes decreased from 19193 to 19137.

simz · July 10, 2024, 7:56pm

Thanks for sharing the screenshots. It appears that you are comparing non-batch-corrected data with batch-corrected data. We introduced batch correction for expression data in 24Q2, which is based on stranded RSEM runs for samples run in the stranded sequencing protocol (for the first time) and unstranded RSEM runs for samples with nonstranded sequencing. The separate RSEM runs resulted in slightly different numbers of genes detected, and since the batch correction takes the intersection of these two gene sets, a small number of genes are missing from the batch-corrected matrix.
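To make the intersection step concrete, here is a minimal Python sketch. The gene names and set contents are placeholders, not the actual gene lists from either RSEM run, and the variable names are hypothetical.

```python
# Placeholder gene sets for the two separate RSEM runs (hypothetical names).
stranded_genes = {"GENE_A", "GENE_B", "GENE_C"}
unstranded_genes = {"GENE_A", "GENE_B", "GENE_D"}

# Batch correction, as described above, keeps only genes present in both sets.
shared_genes = stranded_genes & unstranded_genes

# Genes detected in only one of the two runs drop out of the corrected matrix.
missing_genes = (stranded_genes | unstranded_genes) - shared_genes

print(sorted(shared_genes))
print(sorted(missing_genes))
```

Any gene detected in only one of the two runs falls outside the intersection, which is how the corrected matrix can end up with slightly fewer genes than an uncorrected one.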

Simone

SongSong · July 11, 2024, 12:54am

Thank you for your clear explanation. I now fully understand the dataset.
