FEV_KEGG.Experiments.31 module

Context

As concluded in experiment 30, we first need to remove outdated EC numbers from Poot-Hernandez’ set. Then we have to reduce both sets, theirs and ours, to the first three EC levels before comparing them. Only two groups are used now: the group of representative organisms, and the Gammaproteobacteria group excluding ‘unclassified’ organisms.

Question

Does the consensus/majority graph approach to core metabolism yield a set of EC numbers similar to that of the approach of Poot-Hernandez et al. (2015)?

Method

  • extract EC numbers from Poot-Hernandez et al. (2015) by hand, taking all that are marked blue (preserved)
  • remove outdated EC numbers
  • reduce set of EC numbers to first three levels
  • REPEAT with different groups:
    1. get group of organisms deemed representative by Poot-Hernandez et al.
    2. get group of organisms ‘Gammaproteobacteria’, excluding unclassified
  • REPEAT for varying majority-percentages:
    1. calculate EC numbers occurring in group’s core metabolism
    2. reduce set of EC numbers to first three levels
    3. overlap Poot-Hernandez’ set with ours and print the number of EC numbers inside the intersection and falling off either side
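
The steps above can be sketched in plain Python. This is only a minimal illustration and does not use the FEV_KEGG API: the function names `reduce_to_three_levels`, `majority_core` and `overlap`, the toy organism data, and the rounding-up of the majority threshold to whole organisms are all assumptions made for this example.

```python
from collections import Counter


def reduce_to_three_levels(ec_numbers):
    """Truncate EC numbers such as '1.1.1.1' to their first three levels ('1.1.1')."""
    return {".".join(ec.split(".")[:3]) for ec in ec_numbers}


def majority_core(ec_sets_by_organism, majority_percent):
    """EC numbers occurring in at least `majority_percent` % of the organisms.

    Assumption: the threshold is rounded up to a whole number of organisms.
    """
    organism_count = len(ec_sets_by_organism)
    needed = (organism_count * majority_percent + 99) // 100  # integer ceiling
    counts = Counter(ec for ecs in ec_sets_by_organism.values() for ec in set(ecs))
    return {ec for ec, n in counts.items() if n >= needed}


def overlap(theirs, ours):
    """Return the sizes (only in theirs, in both, only in ours)."""
    return len(theirs - ours), len(theirs & ours), len(ours - theirs)


# Hypothetical toy data, NOT taken from KEGG or from Poot-Hernandez et al.
organisms = {
    "org_a": {"1.1.1.1", "2.7.1.2", "4.2.1.11"},
    "org_b": {"1.1.1.1", "2.7.1.2"},
    "org_c": {"1.1.1.37", "2.7.1.2", "5.3.1.9"},
}
theirs = reduce_to_three_levels({"1.1.1.1", "2.7.1.1", "6.3.2.4"})

for percent in (100, 60, 1):
    # Same order as in the method: majority core first, then reduce, then overlap.
    ours = reduce_to_three_levels(majority_core(organisms, percent))
    print(percent, overlap(theirs, ours))
# 100 (2, 1, 0)
# 60 (1, 2, 0)
# 1 (1, 2, 2)
```

Reducing after the majority step, as in the method, means that two different four-level EC numbers sharing the same first three levels only count together once each has passed the majority threshold on its own.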

Result

Maj. %   only theirs   both   only ours

Representative:
100%:             43      7           1
 90%:              8     42          11
 80%:              1     49          21
 70%:              0     50          32
 60%:              0     50          39
 50%:              0     50          41
 40%:              0     50          49
 30%:              0     50          56
 20%:              0     50          67
 10%:              0     50          83
  1%:              0     50         108

Gammaproteobacteria without unclassified:
100%:             49      1           0
 90%:              2     48          18
 80%:              0     50          36
 70%:              0     50          40
 60%:              0     50          44
 50%:              0     50          51
 40%:              0     50          56
 30%:              0     50          66
 20%:              0     50          80
 10%:              0     50          88
  1%:              0     50         104

Conclusion

For the representative group, from a 70% majority downwards there are no enzyme reactions in their core metabolism which do not also occur in ours. While this method cannot verify our approach, it at least rules out the most obvious path of falsification.

Regarding the full group of Gammaproteobacteria on our side, their set of EC numbers is already fully covered by ours at a higher majority percentage (80%), as is to be expected. Therefore, it seems to be a good idea to use the whole taxon instead of only the representative organisms, even though the taxon is more diverse than the chosen representatives. This diversity shows in the higher count of EC numbers found only in our set at every majority percentage from 90% down to 10%.

On the other hand, our core metabolism is consistently larger, apart from the special case of a consensus (100%) core metabolism. This could indicate that Poot-Hernandez et al. used a high percentage of occurrence, between 90% and 100%, to define ‘preserved’. Which percentage they used is, sadly, not documented. Still, there is no majority percentage at which the two sets coincide exactly, showing that the two approaches yield fundamentally different results, with ours yielding a bigger core metabolism.
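
The statements in this conclusion can be checked against the Result table with a few lines of Python. The sketch below is only an illustration: the numbers are copied from the table, while the names `RESULTS`, `full_coverage_threshold`, `clean_overlaps` and `ours_larger` are made up for this example. Each tuple holds the majority percentage followed by the three counts of a table row: EC numbers only in their set, in both sets, and only in ours.

```python
# (majority %, only in theirs, in both, only in ours), copied from the Result table.
RESULTS = {
    "Representative": [
        (100, 43, 7, 1), (90, 8, 42, 11), (80, 1, 49, 21), (70, 0, 50, 32),
        (60, 0, 50, 39), (50, 0, 50, 41), (40, 0, 50, 49), (30, 0, 50, 56),
        (20, 0, 50, 67), (10, 0, 50, 83), (1, 0, 50, 108),
    ],
    "Gammaproteobacteria without unclassified": [
        (100, 49, 1, 0), (90, 2, 48, 18), (80, 0, 50, 36), (70, 0, 50, 40),
        (60, 0, 50, 44), (50, 0, 50, 51), (40, 0, 50, 56), (30, 0, 50, 66),
        (20, 0, 50, 80), (10, 0, 50, 88), (1, 0, 50, 104),
    ],
}


def full_coverage_threshold(rows):
    """Highest majority % at which no EC number is found only in their set."""
    covered = [pct for pct, only_theirs, _both, _only_ours in rows if only_theirs == 0]
    return max(covered) if covered else None


def clean_overlaps(rows):
    """Majority percentages at which both sets are identical."""
    return [pct for pct, only_theirs, _both, only_ours in rows
            if only_theirs == 0 and only_ours == 0]


def ours_larger(rows):
    """Majority percentages at which our set holds more EC numbers than theirs."""
    return [pct for pct, only_theirs, _both, only_ours in rows
            if only_ours > only_theirs]


for group, rows in RESULTS.items():
    print(group)
    print("  fully covered from:", full_coverage_threshold(rows))
    print("  clean overlaps:", clean_overlaps(rows))
    print("  ours larger at:", ours_larger(rows))
# Representative
#   fully covered from: 70
#   clean overlaps: []
#   ours larger at: [90, 80, 70, 60, 50, 40, 30, 20, 10, 1]
# Gammaproteobacteria without unclassified
#   fully covered from: 80
#   clean overlaps: []
#   ours larger at: [90, 80, 70, 60, 50, 40, 30, 20, 10, 1]
```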