FEV_KEGG.Experiments.38 module

Context

The approach of building a consensus/majority graph of enzymes/EC numbers to find a core metabolism shared among several organisms has to be validated against previous research. One such study, Lee et al. (2014), deals with Thermus thermophilus HB27, covering not its complete metabolism but only the essential “core” parts. The authors enhance the information provided by the KEGG database with information from MetaCyc and, most importantly, with manual curation based on organism-specific knowledge. They list EC numbers in additional file 1, “12934_2014_968_MOESM1_ESM.xls”. As with previous validations, we have to filter out EC numbers which are outdated or otherwise not represented by KEGG’s standard pathways. This is done here by restricting the EC numbers to the ones in NUKA, for both their set and ours.

Question

Does the consensus/majority graph approach to core metabolism yield a similar set of EC numbers as the approach of Lee et al. (2014)?

Method

  • extract EC numbers from Lee et al. (2014) by hand
  • sanitise them by leaving only the ones found in NUKA
  • also remove the ones with wildcards
  • REPEAT with different groups:
    1. get group of organisms ‘Thermus thermophilus’
    2. get only the organism ‘Thermus thermophilus HB27’
  • REPEAT for varying majority percentages:
    • calculate EC numbers occurring in the group’s core metabolism
    • sanitise them by leaving only the ones found in NUKA
    • also remove the ones with wildcards
    • overlap their set with ours and print the number of EC numbers inside the intersection and falling off either side
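
The steps above can be sketched with plain Python sets. This is a minimal illustration, not this module's actual implementation: the function names (`required_organisms`, `majority_ec_numbers`, `sanitise`, `compare`), the representation of EC numbers as dotted strings, the use of `-` as the wildcard marker, the rounding-up rule for the majority threshold, and the toy data are all assumptions.

```python
from collections import Counter


def required_organisms(group_size, majority_percent):
    """Smallest number of organisms covering majority_percent % of the group.

    Rounds up with integer arithmetic and never drops below one organism.
    The rounding rule is an assumption made for this sketch.
    """
    return max(1, (group_size * majority_percent + 99) // 100)


def majority_ec_numbers(ec_sets, majority_percent):
    """Return EC numbers occurring in at least majority_percent % of organisms.

    ec_sets holds one set of EC number strings per organism (hypothetical format).
    """
    threshold = required_organisms(len(ec_sets), majority_percent)
    counts = Counter(ec for ec_set in ec_sets for ec in ec_set)
    return {ec for ec, n in counts.items() if n >= threshold}


def sanitise(ec_numbers, nuka_ec_numbers):
    """Keep only EC numbers that are found in NUKA and contain no wildcard."""
    return {ec for ec in ec_numbers if ec in nuka_ec_numbers and "-" not in ec}


def compare(theirs, ours):
    """Return the counts (only in theirs, in both, only in ours)."""
    return len(theirs - ours), len(theirs & ours), len(ours - theirs)


# Toy data: three organisms, a tiny stand-in for NUKA, and a tiny "their" set.
organisms = [
    {"1.1.1.1", "2.7.1.1", "4.2.1.11"},
    {"1.1.1.1", "2.7.1.1"},
    {"1.1.1.1", "3.1.3.-"},
]
nuka = {"1.1.1.1", "2.7.1.1", "4.2.1.11", "5.3.1.9"}
theirs = sanitise({"1.1.1.1", "5.3.1.9", "9.9.9.9"}, nuka)

for percent in (100, 50, 1):
    ours = sanitise(majority_ec_numbers(organisms, percent), nuka)
    print(f"{percent}%:", *compare(theirs, ours))
```

Under this rounding rule, a hypothetical group of four organisms would need 4, 4, 4, 3, 3, 2, 2, 2, 1, 1 and 1 organisms for the percentages 100 down to 1 used in this experiment, so several neighbouring percentages give identical results. Whether the module rounds the same way is not stated here.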

Result

Maj. %   others   both   ours
All Thermus thermophilus:
100%:       118    306    106
 90%:       118    306    106
 80%:       118    306    106
 70%:       111    313    119
 60%:       111    313    119
 50%:       102    322    138
 40%:       102    322    138
 30%:       102    322    138
 20%:        95    329    160
 10%:        95    329    160
  1%:        95    329    160

Thermus thermophilus HB27:
100%:        98    326    119
 90%:        98    326    119
 80%:        98    326    119
 70%:        98    326    119
 60%:        98    326    119
 50%:        98    326    119
 40%:        98    326    119
 30%:        98    326    119
 20%:        98    326    119
 10%:        98    326    119
  1%:        98    326    119
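
A quick consistency check on these numbers: the sanitised set of Lee et al. does not depend on the group or the majority percentage, so the first two columns (only in their set, and in both sets) should add up to the same size in every row. Our set, in contrast, should grow as the majority percentage drops. The sketch below copies the distinct rows from the table; the variable names are illustrative only.

```python
# Distinct rows of the Result table as (others, both, ours).
all_thermus = {
    "100-80%": (118, 306, 106),
    "70-60%": (111, 313, 119),
    "50-30%": (102, 322, 138),
    "20-1%": (95, 329, 160),
}
hb27 = (98, 326, 119)

# Size of their sanitised set implied by each row: others + both.
their_sizes = {others + both for others, both, _ in [*all_thermus.values(), hb27]}

# Size of our sanitised set per row of the whole group: both + ours.
our_sizes = [both + ours for _, both, ours in all_thermus.values()]

print(their_sizes)  # {424}
print(our_sizes)    # [412, 432, 460, 489]
```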

Conclusion

When using the core metabolism of all Thermus thermophilus at high majority percentages, we see less overlap between both sets of EC numbers than with HB27 alone (306 vs. 326 at 100%), which was to be expected. Still, this shows that the variance between Thermus thermophilus subspecies is not huge, but certainly visible. Three EC numbers only overlap when using the majority metabolism of the whole group at 20% or below (329 vs. 326), suggesting that Lee et al. manually added these EC numbers to their HB27 model, and that they also exist in other subspecies known to KEGG.

For further analysis, we only regard the metabolism of the HB27 subspecies.

Measured against the 326 overlapping EC numbers, about 30% as many (98) occur only in their set and about 37% as many (119) only in ours. This shows a significant discrepancy, which may stem from changes in KEGG’s data since 2014 and/or from the data Lee et al. added manually. If the non-overlapping EC numbers occurred mainly in their set, they would clearly stem from the manual addition. However, even more EC numbers occur only in our set, which raises the question of whether that many EC numbers could have been added to these organisms in KEGG since 2014. While this is possible, we sadly have no way to verify it. All in all, the overlap is about 60% of all EC numbers occurring in either set (326 of 543). This at least shows a fundamental consensus between both approaches and data sets.
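
The percentages in this paragraph follow directly from the HB27 row of the Result table. The short calculation below reproduces them; the ratio of the intersection to the union is also known as the Jaccard index. Variable names are illustrative only.

```python
# HB27 row of the Result table: only theirs, both, only ours.
others, both, ours = 98, 326, 119

# All EC numbers occurring in either set.
union = others + both + ours

print(f"only theirs relative to overlap: {others / both:.1%}")  # 30.1%
print(f"only ours relative to overlap:   {ours / both:.1%}")    # 36.5%
print(f"overlap relative to union:       {both / union:.1%}")   # 60.0%
```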