FEV_KEGG.Experiments.39 module
Context
The approach of building a consensus/majority graph of enzymes/EC numbers to find a core metabolism shared among several organisms has to be validated against previous research. One such study covers seven representative genomes of Thaumarchaeota, but curates all genes manually, including their associated EC numbers. Kerou et al. (2016) list EC numbers in the additional file “Dataset_S02.xlsx”, from which we extracted the ones on the sheets “Cell surface & glycosyl” and “Metabolism”. As in previous validations, we have to filter out EC numbers which are outdated or otherwise not represented in KEGG’s standard pathways. This is done here by restricting the EC numbers to those occurring in NUKA, which is applied to both their set and ours.
Question
Does the consensus/majority graph approach to core metabolism yield a similar set of EC numbers as the approach of Kerou et al. (2016)?
Method
- extract EC numbers from Kerou et al. (2016) by hand
- sanitise them by leaving only the ones found in NUKA
- also remove the ones with wildcards
- REPEAT with different groups:
  - get group of organisms by clade ‘Thaumarchaeota’
  - get only the seven organisms used by Kerou et al. (2016)
  - REPEAT for varying majority-percentages:
    - calculate EC numbers occurring in the group’s core metabolism
    - sanitise them by leaving only the ones found in NUKA
    - also remove the ones with wildcards
    - overlap their set with ours and print the number of EC numbers inside the intersection and falling off either side
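The sanitising and overlap steps above can be sketched with plain Python sets. This is only an illustration, not the module's actual code: the helper names `sanitise` and `overlap`, the regular expression, and all EC numbers in the toy data are assumptions made for this example. In particular, NUKA is represented here simply as a set of EC number strings.

```python
import re

# Fully specified EC number: four dot-separated numeric levels, e.g. "1.1.1.1".
# Wildcard forms such as "1.14.-.-" do not match this pattern and are dropped.
FULL_EC = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def sanitise(ec_numbers, nuka_ecs):
    """Keep only fully specified EC numbers that also occur in NUKA."""
    return {ec for ec in ec_numbers if FULL_EC.match(ec) and ec in nuka_ecs}


def overlap(theirs, ours):
    """Return the counts (only in theirs, in both, only in ours)."""
    return len(theirs - ours), len(theirs & ours), len(ours - theirs)


# Made-up toy data, for illustration only (hypothetical, not from the study).
nuka_ecs = {"1.1.1.1", "2.7.1.1", "4.2.1.11", "6.3.1.2"}
theirs = sanitise({"1.1.1.1", "2.7.1.1", "1.14.-.-", "9.9.9.9"}, nuka_ecs)
ours = sanitise({"2.7.1.1", "4.2.1.11", "6.3.1.2", "3.1.-.-"}, nuka_ecs)

others, both, ours_only = overlap(theirs, ours)
print(others, both, ours_only)
```

The three counts correspond to the columns “others”, “both”, and “ours” in the result table.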
Result
Maj. %  others  both  ours
Thaumarchaeota:
100%:      102    65    80
 90%:       73    94   139
 80%:       66   101   154
 70%:       65   102   156
 60%:       62   105   162
 50%:       60   107   167
 40%:       58   109   174
 30%:       53   114   188
 20%:       51   116   203
 10%:       46   121   240
  1%:       38   129   334
Representative organisms:
100%:       74    93   142
 90%:       74    93   142
 80%:       65   102   155
 70%:       65   102   161
 60%:       65   102   161
 50%:       61   106   174
 40%:       53   114   191
 30%:       53   114   191
 20%:       50   117   209
 10%:       47   120   245
  1%:       47   120   245
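In the results for the seven representative organisms, several adjacent rows are identical (100% and 90%, 70% and 60%, 40% and 30%, 10% and 1%). One plausible reading is that a majority percentage is rounded up to a whole number of organisms, so that different percentages can demand the same organism count. The sketch below assumes exactly that ceiling rule. The rule and the helper names `min_organisms` and `majority_core` are assumptions for illustration, not confirmed details of the FEV_KEGG implementation, which works on graphs rather than on bare sets.

```python
from collections import Counter


def min_organisms(percent, n_organisms):
    """Smallest organism count reaching `percent` % of the group (ceiling, at least 1)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, -(-percent * n_organisms // 100))


def majority_core(ec_sets, percent):
    """EC numbers occurring in at least `percent` % of the per-organism EC sets."""
    counts = Counter(ec for ecs in ec_sets for ec in set(ecs))
    needed = min_organisms(percent, len(ec_sets))
    return {ec for ec, seen_in in counts.items() if seen_in >= needed}


# Required organism counts for a group of seven, per majority percentage.
for percent in (100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 1):
    print(f"{percent:>3}%: {min_organisms(percent, 7)} of 7 organisms")
```

Under this assumption, the percentage pairs with identical rows map to the same organism count (7, 5, 3, and 1 of 7, respectively), which would explain why their results coincide.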
Conclusion
When comparing the core metabolisms of all Thaumarchaeota known today with those of only the seven representative organisms, there is not much difference, except at a 100% majority (65 versus 93 shared EC numbers). This suggests that these seven organisms are indeed very well chosen representatives.
Considering the number of EC numbers falling off to either side: the number of ECs only in our set is larger than the overlap. Thus, we again see that core metabolisms created with our approach tend to be bigger than manually curated ones. This is most likely because ECs which occur in all genomes are not necessarily essential, while Kerou et al. aimed at including only essential ECs. The number of ECs only in their set is also very high, amounting to roughly 65% of the overlap, or 40% of their set, and 20% of the overall set. These ECs cannot stem from ECs that are absent from KEGG pathways altogether, since we pre-filtered them using NUKA. The most likely explanation seems to be that Kerou et al. were able to annotate many more EC numbers manually than KEGG’s GENE database has stored to this date. This, again, would mean that KEGG’s data is incomplete. This is strongly implied by the fact that even the collective graph (1% majority) of the representative organisms lacks 47 of the EC numbers listed by Kerou et al., which can only happen if these EC numbers are not annotated for any of the seven organisms in today’s KEGG.
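The rough proportions quoted above can be reproduced from the result table. The text does not state which row they refer to. The sketch below assumes the 80% row of the “Thaumarchaeota” group, which matches the quoted figures after rounding. The variable names are chosen for this example only.

```python
# Counts from the 80% row of the "Thaumarchaeota" group in the result table.
# Assumption: the source does not say which row the quoted percentages use.
others, both, ours = 66, 101, 154

their_set = others + both        # all sanitised EC numbers from Kerou et al.
overall = others + both + ours   # union of their set and our set

print(f"size of their set:            {their_set}")
print(f"only theirs / overlap:        {others / both:.1%}")
print(f"only theirs / their set:      {others / their_set:.1%}")
print(f"only theirs / overall set:    {others / overall:.1%}")
```

Note that “others” plus “both” is the same in every row of the table (167), as expected, because the sanitised set of Kerou et al. does not depend on the majority percentage.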
Regarding the effectiveness of our approach to building a core metabolism, we have to conclude that the completeness and quality of EC number annotations vary greatly, both within the literature and within KEGG. Therefore, to achieve the most exact model of an organism’s metabolism, further steps beyond our approach are needed. Such steps may involve flux balance analysis with a manually curated list of ‘essential’ metabolites. Still, when the set of EC numbers is reduced to those known to standard KEGG pathways (using NUKA), core metabolisms created via our approach can be used to roughly compare the metabolic capabilities of closely or even remotely related organisms, groups of organisms, and whole clades.