FEV_KEGG.Evolution.LUCA module
-
class FEV_KEGG.Evolution.LUCA.CoreLUCA(clade: FEV_KEGG.Evolution.LUCA.CoreLUCA.CladeType)
Bases: object
Last Universal Common Ancestor by intersection of many or all organisms in KEGG.
This is the Last Universal Common Ancestor, defined as the common “core metabolism” shared among all organisms known to KEGG within a certain NCBI top-clade. Taken universally, this includes Bacteria, Archaea, and Eukaryota, which is a very large data set. Alternatively, use clade to restrict the computation to an isolated top-clade, yielding e.g. the Bacteria-LUCA or the Archaea-LUCA. For each species, only the first organism is considered, to prevent statistical overrepresentation.
Conversion into another type of graph is not supported, because LUCA is a strictly hypothetical organism without any exactly known genes.
Parameters: clade (CoreLUCA.CladeType) – Which clade to use for defining a LUCA. Using ‘archaea’ only gives an Archaea-LUCA, not the “true” LUCA, and likewise for the other clades.
Variables:
- self.nameAbbreviation (str)
- self.clade (FEV_KEGG.Evolution.Clade.Clade)
- self.cladeType (CoreLUCA.CladeType)
Raises:
- HTTPError – If any underlying organism, pathway, or gene does not exist.
- URLError – If connection to KEGG fails.
Warning
This function takes hours to days to complete, and requires several gigabytes of memory, disk space, and network traffic!
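The rule "for each species only the first organism is considered" can be illustrated with a minimal, self-contained sketch. It does not use FEV_KEGG itself. The function name `first_organism_per_species`, the (abbreviation, species) tuple layout, and the reading of "first" as "first in listing order" are all assumptions made for illustration, and the listing is toy data.

```python
from typing import Iterable, List, Tuple


def first_organism_per_species(organisms: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep only the first organism abbreviation seen for each species.

    `organisms` yields (abbreviation, species) pairs in listing order.
    Hypothetical helper, not part of FEV_KEGG.
    """
    seen_species = set()
    kept = []
    for abbreviation, species in organisms:
        if species not in seen_species:
            seen_species.add(species)
            kept.append(abbreviation)
    return kept


# Toy listing: three strains of one species, one strain of another.
listing = [
    ("eco", "Escherichia coli"),
    ("ecj", "Escherichia coli"),
    ("bsu", "Bacillus subtilis"),
    ("ece", "Escherichia coli"),
]
representatives = first_organism_per_species(listing)
```

With this toy listing, only one Escherichia coli entry survives, so a heavily sequenced species does not outweigh the others when edges are counted.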
-
class CladeType
Bases: enum.Enum
Possible types of CoreLUCA.
Each corresponds to a single top-clade of NCBI, or a combination of them. Only the ‘universal’ clade gives you the “true” LUCA.
- archaea = '/Archaea'
- archaeaBacteria = ['/Archaea', '/Bacteria']
- archaeaEukaryota = ['/Archaea', '/Eukaryota']
- bacteria = '/Bacteria'
- bacteriaEukaryota = ['/Bacteria', '/Eukaryota']
- eukaryota = '/Eukaryota'
- universal = '/'
-
collectiveMetabolism() → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph
CoreLUCA’s collective metabolism, i.e. the core metabolism with the lowest possible majorityPercentage value.
Returns: Contains all substrates/products and all EC numbers of any organism in the top-clade you chose.
Return type: SubstanceEcGraph
-
coreMetabolism(majorityPercentage) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph
CoreLUCA’s core metabolism.
Parameters: majorityPercentage (float) – Percentage of organisms that have to possess an EC edge for it to be included in this ‘core metabolism’.
Returns: Contains all substrates/products and all EC numbers in the “core metabolism” of the top-clade you chose.
Return type: SubstanceEcGraph
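The relationship between core and collective metabolism can be sketched on plain sets, without FEV_KEGG or any KEGG download. This is a toy model under stated assumptions: edges are reduced to EC number strings, the helper name `core_edges` is hypothetical, and the threshold rule "present in at least majority_percentage percent of organisms" is an assumption about how the majority is counted, not the library's confirmed implementation.

```python
from collections import Counter
from typing import Iterable, Set


def core_edges(organisms: Iterable[Set[str]], majority_percentage: float) -> Set[str]:
    """Return the edges possessed by at least `majority_percentage` % of organisms.

    Hypothetical helper, not part of FEV_KEGG.
    """
    organisms = list(organisms)
    total = len(organisms)
    if total == 0:
        return set()
    # Count in how many organisms each edge occurs.
    counts = Counter(edge for edges in organisms for edge in edges)
    # Compare n / total >= p / 100 without dividing, to avoid rounding issues.
    return {edge for edge, n in counts.items() if n * 100 >= majority_percentage * total}


# Toy data: EC numbers per organism (illustrative only).
toy_organisms = {
    "org1": {"1.1.1.1", "2.7.1.1", "4.2.1.11"},
    "org2": {"1.1.1.1", "2.7.1.1"},
    "org3": {"1.1.1.1", "6.3.1.2"},
}

strict_core = core_edges(toy_organisms.values(), 100)  # shared by every organism
majority_core = core_edges(toy_organisms.values(), 60)  # shared by at least 60 %
collective = core_edges(toy_organisms.values(), 0)      # lowest threshold: the union
```

In this model, a threshold of 100 gives the strict intersection, while the lowest threshold degenerates into the union of all edges, which matches the description of collectiveMetabolism() as the core metabolism with the lowest possible majorityPercentage.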
-
class FEV_KEGG.Evolution.LUCA.GoldmanLUCA
Bases: object
Last Universal Common Ancestor by Goldman et al.
This is the Last Universal Common Ancestor, as described in [1]. The original work on LUCA, however, does not specify enzyme function, but merely COGs [3]. This class already contains the list of LUCA’s enzymes from the above paper, as depicted in the first table of said paper [2]. As the most plausible minimal set of enzymatic functions, the authors chose the intersection of EC numbers found in universal sequence + structure, combined with the ones found in universal sequence + structure + function. See the original source for details.
This list is parsed and converted into a SubstanceEcGraph. Conversion is done by using the graph of a hypothetical ‘complete’ organism, NUKA, which possesses all EC numbers known to all metabolic KEGG pathways, see FEV_KEGG.KEGG.NUKA. All EC numbers not present in LUCA are filtered out. Keep in mind, though, that LUCA’s EC numbers only contain three levels, to more adequately model the likely patchwork evolution in ancient times. Therefore, all EC numbers sharing these first three levels (the sub-subclass) remain, regardless of substrate specificity.
Conversion into another type of graph is not supported, because LUCA is a strictly hypothetical organism without any exactly known genes.
Variables: self.nameAbbreviation (str)
References
[1] Goldman et al. (2012), “The Enzymatic and Metabolic Capabilities of Early Life”, https://doi.org/10.1371/journal.pone.0039912
[2] Goldman et al. (2012), Table 1, https://doi.org/10.1371/journal.pone.0039912.t001
[3] Mirkin et al. (2003), “Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes”, https://doi.org/10.1186/1471-2148-3-2
-
ecNumbers
GoldmanLUCA’s EC numbers.
Generalised to the first three levels. The last level is always a wildcard.
Returns: Set of the EC numbers predicted by Goldman et al. to belong to LUCA.
Return type: Set[EcNumber]
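How three-level EC numbers with a wildcard last level select full EC numbers can be sketched with plain strings. This is a minimal illustration, not the library's EcNumber class: the helper names `generalise` and `filter_by_wildcards` are hypothetical, and writing the wildcard as "-" (as in "1.1.1.-") is assumed here as the usual EC notation for an unspecified level.

```python
from typing import Iterable, Set


def generalise(ec: str) -> str:
    """Replace the fourth EC level with a wildcard, e.g. '1.1.1.1' -> '1.1.1.-'.

    Hypothetical helper, not part of FEV_KEGG.
    """
    levels = ec.split(".")
    if len(levels) != 4:
        raise ValueError(f"expected four EC levels, got {ec!r}")
    return ".".join(levels[:3] + ["-"])


def filter_by_wildcards(candidates: Iterable[str], wildcards: Set[str]) -> Set[str]:
    """Keep every candidate whose first three levels match one of the wildcards."""
    return {ec for ec in candidates if generalise(ec) in wildcards}


# Toy data: EC numbers of a 'complete' organism and a one-entry wildcard list.
all_ec_numbers = {"1.1.1.1", "1.1.1.2", "1.1.2.3", "2.7.1.1"}
luca_wildcards = {"1.1.1.-"}
remaining = filter_by_wildcards(all_ec_numbers, luca_wildcards)
```

Both "1.1.1.1" and "1.1.1.2" remain here, since they differ only in the fourth level, which the wildcard ignores.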
-
substanceEcGraph
GoldmanLUCA’s substance-EC graph.
Returns: Contains all substrates/products and all EC numbers in FEV_KEGG.KEGG.NUKA, filtered by the EC numbers predicted by Goldman et al. for LUCA.
Return type:
Raises:
- HTTPError – If any underlying organism, pathway, or gene of NUKA does not exist.
- URLError – If connection to KEGG fails.
-