FEV_KEGG.Experiments.33 module¶
Context¶
A closer look at the possible reasons why EC numbers completely unknown to us occurred in experiment 32:
1. EC number is associated with the organism, but not listed in one of KEGG’s hand-drawn pathways. For example 1.13.11.24 is associated with all our 15 organisms, but not present in any pathway.
2. As seen in experiment 30, there may be EC numbers predicted by Oh et al. which are outdated.
3. Oh et al. used a compilation of several sources. Some of these may have predicted EC numbers for B. subtilis which never made their way into KEGG, which is our only source.
Question¶
Does the high number of EC numbers found only in Oh et al.'s set in experiment 32 result from outdated/faulty data?
Method¶
- get all EC numbers known to any organism in KEGG, using NUKA.
- take the EC numbers only in Oh’s set at 1% majority (see 32).
- keep only the ones occurring in NUKA.
- keep only the ones with a gene occurring in any of our 15 KEGG organisms found as “Bacillus subtilis”.
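The steps above amount to two successive set filters. The following is a minimal sketch using plain Python sets, not the FEV_KEGG API: the function name `filter_unexplained_ecs`, its parameters, and the placeholder labels such as `ec_a` are hypothetical and stand in for the real EC number sets.

```python
def filter_unexplained_ecs(only_in_theirs, ecs_in_nuka, ecs_in_our_organisms):
    """Apply both filters in order and return what each step removes.

    All three arguments are sets of EC number strings (hypothetical inputs).
    """
    # Step 1: ECs that no KEGG pathway of any organism contains.
    not_in_nuka = only_in_theirs - ecs_in_nuka
    remaining = only_in_theirs & ecs_in_nuka

    # Step 2: ECs without a gene in any of our own organisms.
    not_in_ours = remaining - ecs_in_our_organisms
    unexplained = remaining & ecs_in_our_organisms

    return not_in_nuka, not_in_ours, unexplained


# Placeholder labels, not real EC numbers.
only_in_theirs = {"ec_a", "ec_b", "ec_c", "ec_d"}
ecs_in_nuka = {"ec_b", "ec_c", "ec_d", "ec_x"}
ecs_in_our_organisms = {"ec_d", "ec_x"}

not_in_nuka, not_in_ours, unexplained = filter_unexplained_ecs(
    only_in_theirs, ecs_in_nuka, ecs_in_our_organisms
)

print(f"ECs not in any pathway: {len(not_in_nuka)}")
print(f"Leaving ECs only in theirs: {len(only_in_theirs) - len(not_in_nuka)}")
print("---------------------------------")
print(f"ECs not in any of our organisms: {len(not_in_ours)}")
print(f"Leaving ECs only in theirs: {len(unexplained)}")
```

Keeping the removed sets, rather than only the survivors, makes it possible to report how many EC numbers each filter accounts for.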
Result¶
ECs not in any pathway: 50
Leaving ECs only in theirs: 58
---------------------------------
ECs not in any of our organisms: 57
Leaving ECs only in theirs: 1
3.1.3.3
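The printed counts can be cross-checked with simple arithmetic. This sketch assumes that the starting number of EC numbers only in Oh's set, which the output does not print, equals the sum of the first two lines; the variable names are illustrative.

```python
# Counts as printed in the result above.
not_in_any_pathway = 50
left_after_pathway_filter = 58
not_in_our_organisms = 57
left_after_organism_filter = 1

# Inferred, not printed: size of the set before any filtering.
initial_only_in_theirs = not_in_any_pathway + left_after_pathway_filter

# The second filter must account for everything except the final remainder.
assert left_after_pathway_filter - not_in_our_organisms == left_after_organism_filter

share_removed_first = not_in_any_pathway / initial_only_in_theirs
print(initial_only_in_theirs, round(share_removed_first, 2))  # 108 0.46
```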
Conclusion¶
- EC number is associated with the organism, but not listed in one of KEGG’s hand-drawn pathways. For example 1.13.11.24 is associated with all our 15 organisms, but not present in any pathway.
About half (50) of the EC numbers only in Oh’s set are not listed in any of KEGG’s pathways today. While B. subtilis may contain them, KEGG does not list them in a pathway, and thus our model cannot contain them.
- As seen in experiment 30, there may be EC numbers predicted by Oh et al. which are outdated.
- Oh et al. used a compilation of several sources. Some of these may have predicted EC numbers for B. subtilis which never made their way into KEGG, which is our only source.
After removing all EC numbers unknown to our organisms in KEGG today, only one remains: 3.1.3.3. This seems to be another case of inconsistent data in KEGG: 3.1.3.3 is supposedly part of pathways 00260 and 00680, and even has three associated genes in ‘bsu’, but it is nowhere to be found in the actual pathways bsu00260 or bsu00680.
Answering the question: yes, all EC numbers missing in our model can be explained by outdated or faulty data in either their set or in KEGG. Discriminating between these two possibilities is impossible for us.
While this experiment does not imply that our model is complete, it does imply that it is correct under the constraints imposed by the completeness and correctness of the KEGG database.