Updated gene names in 21Q4 mutations data

SongSong · January 28, 2022, 6:44am

Dear DepMap team,

Greetings. I hope this problem can be resolved as soon as possible.
A similar problem was raised in "Number of genes mutated in cell lines" and I already asked about it there, but I am creating a new topic to get an answer.

I found a mismatch in the number of genes between the dataset (19536) and the main download page (18784).
When I manually compared the data, I found genes with Entrez_ID = 0.
Those genes have since been assigned other HGNC IDs.
So I tried to convert them to the newest IDs as below:
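
A minimal Python sketch of this kind of conversion, not the code originally used here. It assumes the mutation table has a `Hugo_Symbol` column; `SYMBOL_UPDATES` and `update_symbols` are hypothetical names, and only the DARC to ACKR1 pair comes from this thread. The sample rows are made up for illustration.

```python
# Hypothetical alias table: outdated symbol -> current HGNC symbol.
# Only DARC -> ACKR1 is taken from this thread; a real table would have to
# be built from an HGNC download.
SYMBOL_UPDATES = {"DARC": "ACKR1"}


def update_symbols(rows, updates=SYMBOL_UPDATES):
    """Return copies of the rows with outdated Hugo_Symbol values replaced."""
    updated = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left unchanged
        new_row["Hugo_Symbol"] = updates.get(row["Hugo_Symbol"], row["Hugo_Symbol"])
        updated.append(new_row)
    return updated


# Toy rows for illustration only (not real DepMap records).
rows = [
    {"Hugo_Symbol": "DARC", "Entrez_Gene_Id": "0"},
    {"Hugo_Symbol": "TP53", "Entrez_Gene_Id": "7157"},
]
renamed = update_symbols(rows)
print([r["Hugo_Symbol"] for r in renamed])
```

This only renames symbols; it does not touch the mutation coordinates.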

But I realized that the gene locations had also been updated.
As you can see, the start position listed for DARC in the data is 159176106, but the current location of DARC is 159204875…159206500.

As a result, according to the original data, ACKR1 (= DARC) was not mutated in any cancer cell line.
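
The coordinate comparison can be sketched as a simple interval check. This is only an illustration: `position_in_gene` is a hypothetical helper, and it assumes both numbers refer to the same chromosome and the same reference genome build, which may not be true for the two values quoted here.

```python
def position_in_gene(position, gene_start, gene_end):
    """True if the position lies inside the closed interval [gene_start, gene_end]."""
    return gene_start <= position <= gene_end


# Values quoted in this post.
mutation_start = 159176106
current_gene_start, current_gene_end = 159204875, 159206500

inside = position_in_gene(mutation_start, current_gene_start, current_gene_end)
print(inside)  # False: the mutation start lies before the current gene start
```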

In summary: is it okay to simply update the gene symbol and ID without considering the changed location?

Thank you for reading this post.
Sincerely,
Songyeon

SongSong · February 2, 2022, 5:12am

I resolved this problem.
Anyone facing this problem (i.e. a mismatch in the number of genes between entrez_ID and Hugo_ID) can handle it by excluding the genes whose entrez_ID = 0.

I share the code that I ran:

Also, I suggest correcting the number of genes reported for the mutation dataset.
As you can see, there are 18,783 genes in the dataset once Entrez gene ID = 0 is excluded.
The count obtained with unique() in R includes ID 0 as one of the values, so that value should be excluded.
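
A minimal Python sketch of the exclusion and counting described above, not the original R code. It assumes the file has an `Entrez_Gene_Id` column in which every value is numeric (possibly written like `7157.0`); `count_genes` is a hypothetical helper and the CSV text is a made-up toy sample.

```python
import csv
import io


def count_genes(rows, id_column="Entrez_Gene_Id"):
    """Return (unique IDs including 0, unique IDs excluding 0)."""
    # int(float(...)) also accepts IDs stored as floats, e.g. "7157.0".
    all_ids = {int(float(row[id_column])) for row in rows}
    valid_ids = all_ids - {0}  # drop the placeholder ID 0
    return len(all_ids), len(valid_ids)


# Toy data for illustration only; OLDGENE is a made-up symbol.
toy_csv = """Hugo_Symbol,Entrez_Gene_Id
DARC,0
OLDGENE,0
TP53,7157
TP53,7157.0
KRAS,3845
"""
rows = list(csv.DictReader(io.StringIO(toy_csv)))

with_zero, without_zero = count_genes(rows)
print(with_zero, without_zero)  # 3 2

# Rows kept after excluding Entrez ID 0.
kept = [r for r in rows if int(float(r["Entrez_Gene_Id"])) != 0]
```

In the toy sample the count drops by exactly one when 0 is removed, which mirrors the difference between 18,784 and 18,783.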
