FEV_KEGG.Experiments.38 module
Context
The approach of building a consensus/majority graph of enzymes/EC numbers to find a core metabolism shared among several organisms has to be validated against previous research. One such study, Lee et al. (2014), deals with Thermus thermophilus HB27, though not with its complete metabolism, only with the essential “core” parts. The authors enhance the information provided by the KEGG database with information from MetaCyc and, most importantly, with manual curation based on organism-specific knowledge. They list EC numbers in additional file 1, “12934_2014_968_MOESM1_ESM.xls”. As with previous validations, we have to filter out EC numbers that are outdated or otherwise not represented in KEGG’s standard pathways. This is done here by restricting EC numbers to the ones found in NUKA, which is applied to both their set of EC numbers and ours.
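The filtering step described here can be illustrated with plain Python sets. This is only a minimal sketch, not the module's actual code: the function `sanitise`, the pattern `FULL_EC`, and all EC numbers in the example data are hypothetical stand-ins, and the real NUKA object is replaced by a simple set.

```python
import re

# Matches a fully specified EC number such as "2.7.1.1". Wildcard forms
# like "2.7.1.-" do not match, because every level must be numeric.
FULL_EC = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def sanitise(ec_numbers, reference):
    """Keep only fully specified EC numbers that also occur in `reference`."""
    return {ec for ec in ec_numbers if FULL_EC.match(ec) and ec in reference}


# Hypothetical stand-in for the set of EC numbers present in NUKA.
nuka_ecs = {"1.1.1.1", "2.7.1.1", "4.2.1.11"}

# Hypothetical hand-extracted list, containing one wildcard entry and
# one entry that the reference set does not know.
lee_raw = {"1.1.1.1", "2.7.1.-", "9.9.9.9", "4.2.1.11"}

lee_clean = sanitise(lee_raw, nuka_ecs)
print(sorted(lee_clean))  # ['1.1.1.1', '4.2.1.11']
```

Applying the same function to both sets keeps the comparison symmetric: neither side can gain or lose EC numbers merely because of a differing filter.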
Question
Does the consensus/majority graph approach to core metabolism yield a similar set of EC numbers as the approach of Lee et al. (2014)?
Method
- extract EC numbers from Lee et al. (2014) by hand
- sanitise them by leaving only the ones found in NUKA
- also remove the ones with wildcards
- REPEAT with different groups:
  - get group of organisms ‘Thermus thermophilus’
  - get only the organism ‘Thermus thermophilus HB27’
  - REPEAT for varying majority-percentages:
    - calculate EC numbers occurring in group’s core metabolism
    - sanitise them by leaving only the ones found in NUKA
    - also remove the ones with wildcards
    - overlap their set with ours and print the number of EC numbers inside the intersection and falling off either side
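The loop over majority percentages and the final overlap can be sketched as follows. This is an illustration under simplifying assumptions: it works directly on sets of EC numbers per organism, whereas the module builds graphs, and its exact threshold rule may differ. The helpers `majority_ecs` and `overlap`, the organism names, and all EC numbers are hypothetical.

```python
from collections import Counter
from math import ceil


def majority_ecs(ecs_per_organism, percentage):
    """EC numbers present in at least `percentage` percent of the organisms."""
    needed = max(1, ceil(len(ecs_per_organism) * percentage / 100))
    counts = Counter(ec for ecs in ecs_per_organism.values() for ec in set(ecs))
    return {ec for ec, n in counts.items() if n >= needed}


def overlap(theirs, ours):
    """Return the sizes of (only theirs, both, only ours)."""
    return len(theirs - ours), len(theirs & ours), len(ours - theirs)


# Hypothetical toy data: three organisms with made-up EC number sets.
group = {
    "org_a": {"1.1.1.1", "2.7.1.1", "4.2.1.11"},
    "org_b": {"1.1.1.1", "2.7.1.1"},
    "org_c": {"1.1.1.1", "5.3.1.9"},
}
theirs = {"1.1.1.1", "2.7.1.1", "6.3.1.2"}

for percentage in (100, 50, 1):
    ours = majority_ecs(group, percentage)
    print(f"{percentage}%:", *overlap(theirs, ours))
# 100%: 2 1 0
# 50%: 1 2 0
# 1%: 1 2 2
```

Lowering the percentage can only add EC numbers to our set, so the "both" and "ours" counts never shrink as the threshold drops. For a group containing a single organism, every threshold yields the same set.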
Result
Maj. %   others   both   ours
All Thermus thermophilus:
100%:    118      306    106
90%:     118      306    106
80%:     118      306    106
70%:     111      313    119
60%:     111      313    119
50%:     102      322    138
40%:     102      322    138
30%:     102      322    138
20%:     95       329    160
10%:     95       329    160
1%:      95       329    160
Thermus thermophilus HB27:
100%:    98       326    119
90%:     98       326    119
80%:     98       326    119
70%:     98       326    119
60%:     98       326    119
50%:     98       326    119
40%:     98       326    119
30%:     98       326    119
20%:     98       326    119
10%:     98       326    119
1%:      98       326    119
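A quick consistency check helps when reading the three columns. Assuming “others” counts EC numbers found only in the set of Lee et al., “both” the intersection, and “ours” those found only in our set, the sum of “others” and “both” should equal the size of their sanitised set in every row. The sketch below copies the distinct rows from the table; the row labels are only descriptive.

```python
# (others, both, ours) for each distinct row of the table above.
rows = {
    "all, 100-80%": (118, 306, 106),
    "all, 70-60%": (111, 313, 119),
    "all, 50-30%": (102, 322, 138),
    "all, 20-1%": (95, 329, 160),
    "HB27, any %": (98, 326, 119),
}

for label, (others, both, ours) in rows.items():
    # others + both is the size of their set, both + ours the size of ours.
    print(f"{label}: their total {others + both}, our total {both + ours}")
```

Their total comes out as 424 in every row, which supports this reading of the columns, while our total grows from 412 to 489 as the majority percentage decreases.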
Conclusion
When comparing against the core metabolism of all Thermus thermophilus strains, we see less overlap between the two sets of EC numbers than with HB27 alone (306 vs. 326 at a 100% majority), which was to be expected. Still, this shows that the variance among Thermus thermophilus strains is not huge, but certainly visible. Three EC numbers overlap only when using the majority metabolism of the whole group at thresholds of 20% or lower (329 vs. 326), indicating that Lee et al. manually added these EC numbers to their HB27 model, while they do exist in other strains known to KEGG.
For further analysis, we only regard the metabolism of the HB27 strain.
Relative to the 326 overlapping EC numbers, a further 98 (about 30%) occur only in their set and 119 (about 37%) only in ours. This shows a significant discrepancy, which may stem from changes in KEGG’s data since 2014 and/or from the data Lee et al. added manually. If the EC numbers outside the overlap occurred mainly in their set, they would clearly stem from the manual additions. However, even more EC numbers occur only in our set, which raises the question of whether that many EC numbers could have been added to these organisms in KEGG since 2014. While this is possible, we have no way to verify it. All in all, the overlap makes up about 60% of all EC numbers occurring in either set (326 of 543). This at least shows a fundamental agreement between both approaches and data sets.
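The percentages quoted in this paragraph follow directly from the HB27 row of the results. As a worked check, the ratio of the overlap to all EC numbers in either set is what is commonly called the Jaccard index of the two sets. The variable names are illustrative only.

```python
# HB27 row of the results: only theirs, both, only ours.
theirs_only, both, ours_only = 98, 326, 119

# All EC numbers occurring in either set.
union = theirs_only + both + ours_only

print(f"only theirs / overlap: {theirs_only / both:.1%}")  # 30.1%
print(f"only ours / overlap:   {ours_only / both:.1%}")    # 36.5%
print(f"overlap / union:       {both / union:.1%}")        # 60.0%
```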