Repeated UniProt accessions and odd q-values for proteomic data in the analysis tool

Hi,

I have some questions about how the proteomic data is processed and how the q-values are calculated in the custom analysis tool.

Regarding the data, some gene symbols are associated with multiple UniProt accession numbers. In particular, there is a set of genes (for example AMY1B, AMY1C, ERFL…) that have ~24 UniProt accession numbers each. Unsurprisingly, entries that share an accession, such as AMY1B (A4QPH2.3), AMY1C (A4QPH2.3) and ERFL (A4QPH2.3), all show the same values. According to UniProt, A4QPH2.3 corresponds to PI4KAP2. In my downloaded data there are 134 entries of the form <gene symbol (A4QPH2.3)>.

Is this expected behaviour?
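
For anyone who wants to check this on their own download, here is a minimal sketch in R of how entries sharing an accession could be counted. It assumes the labels follow the "<gene symbol> (<accession>)" pattern described above; the `labels` vector is a toy stand-in and the `FAKE1 (Z99999.1)` entry is made up for illustration.

```r
# Toy labels in the "<gene symbol> (<accession>)" format; FAKE1/Z99999.1 is hypothetical.
labels <- c("AMY1B (A4QPH2.3)", "AMY1C (A4QPH2.3)", "ERFL (A4QPH2.3)", "FAKE1 (Z99999.1)")

# Extract the accession inside the trailing parentheses.
accession <- sub("^.*\\(([^()]+)\\)[[:space:]]*$", "\\1", labels)

# Count how many labels share each accession, most frequent first.
counts <- sort(table(accession), decreasing = TRUE)
print(counts)

# Labels whose accession occurs more than once.
shared <- labels[accession %in% names(counts)[counts > 1]]
print(shared)
```

On the real data, `labels` would be replaced by the column names (or row names) of the downloaded proteomics table.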

Another, somewhat related, issue is that when running a Pearson correlation between MetMap500 and the proteomic data, most of the q-values are identical to the p-values. At a glance, it seemed to me that correlations with values for all queried cell lines (numCellLines) had a reasonable q-value, while the rest (although not always) had q-values exactly equal to the p-value.
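
A rough way to quantify that impression is to tabulate, for each value of numCellLines, the fraction of rows where the q-value equals the p-value. The sketch below uses made-up numbers; `Pvalue` and `numCellLines` are the names used in this post, while `Qvalue` is an assumed column name for the tool's q-values.

```r
# Toy stand-in for the tool's output; all numbers are made up, "Qvalue" is an assumed name.
res <- data.frame(
  numCellLines = c(500, 500, 500, 120, 87),
  Pvalue       = c(0.001, 0.02, 0.3, 0.04, 0.5),
  Qvalue       = c(0.003, 0.03, 0.3, 0.04, 0.5)
)

# Flag rows where the q-value is numerically identical to the p-value.
res$q_equals_p <- abs(res$Qvalue - res$Pvalue) < 1e-12

# Fraction of such rows for each value of numCellLines.
by_n <- aggregate(q_equals_p ~ numCellLines, data = res, FUN = mean)
print(by_n)
```

Note that with a Benjamini-Hochberg adjustment the largest p-value in an adjusted set always ends up with q equal to p, so a few exact matches are expected even when everything is working correctly.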

Thank you!

Just a little update. After downloading the data and running the correlations against MetMap500 locally in R, I have:

  • Replicated the correlations and p-values for the proteomic data.
  • Replicated the correlations, p-values and q-values for gene expression and metabolomics.

The q-values were computed with p.adjust(data$Pvalue, method='fdr').

The main difference I see is that in transcriptomics and metabolomics numCellLines is the same across all genes/metabolites, whereas in the proteomic data it varies between proteins.
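
One hypothesis that could be tested locally (purely an assumption on my side, I have not confirmed what the tool actually does) is that the adjustment is applied separately to rows with the same numCellLines rather than to all rows at once. The sketch below compares the two on made-up p-values. In `p.adjust`, `method = "fdr"` is an alias for the Benjamini-Hochberg method (`"BH"`).

```r
# Made-up p-values; numCellLines varies between rows, as in the proteomic data.
res <- data.frame(
  numCellLines = c(500, 500, 500, 120, 87),
  Pvalue       = c(0.001, 0.02, 0.3, 0.04, 0.5)
)

# Adjustment over all rows at once (what I ran locally).
res$q_global <- p.adjust(res$Pvalue, method = "fdr")

# Adjustment within each numCellLines group (hypothetical, for comparison only).
res$q_grouped <- ave(res$Pvalue, res$numCellLines,
                     FUN = function(p) p.adjust(p, method = "fdr"))

print(res)
```

In the grouped version, any row that is alone in its numCellLines group gets a q-value equal to its p-value, because it is adjusted against a set of size one. If the q-values from the tool match `q_grouped` better than `q_global` on the real data, that would be consistent with this explanation, although it would not prove it.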