FEV_KEGG.Evolution.LUCA module

class FEV_KEGG.Evolution.LUCA.CoreLUCA(clade: FEV_KEGG.Evolution.LUCA.CoreLUCA.CladeType)[source]

Bases: object

Last Universal Common Ancestor by intersection of many or all organisms in KEGG.

This is the Last Universal Common Ancestor, as defined by a common “core metabolism” shared among all organisms known to KEGG within a certain NCBI top-clade. By default, this would include Bacteria, Archaea, and Eukaryota, which is a very big data set! Alternatively, you can restrict the calculation to an isolated top-clade using clade, e.g. yielding the Bacteria-LUCA or the Archaea-LUCA. For each species, only the first organism is considered, to prevent statistical overrepresentation.

Conversion into another type of graph is not supported, because LUCA is a strictly hypothetical organism without any exactly known genes.

Parameters:

clade (CoreLUCA.CladeType) – Which clade to use for defining a LUCA. Using ‘archaea’ obviously only gives an Archaea-LUCA, not the “true” LUCA, etc.

Variables:
Raises:
  • HTTPError – If any underlying organism, pathway, or gene does not exist.
  • URLError – If connection to KEGG fails.

Warning

This function takes hours to days to complete, and requires several gigabytes of memory, disk space, and network traffic!

class CladeType[source]

Bases: enum.Enum

Possible types of CoreLUCA.

Each corresponds to a single top-clade of NCBI, or to a combination of top-clades. Only the ‘universal’ clade gives you the “true” LUCA.

archaea = '/Archaea'
archaeaBacteria = ['/Archaea', '/Bacteria']
archaeaEukaryota = ['/Archaea', '/Eukaryota']
bacteria = '/Bacteria'
bacteriaEukaryota = ['/Bacteria', '/Eukaryota']
eukaryota = '/Eukaryota'
universal = '/'
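
Note that some members above hold a single path string while others hold a list of paths. The following is a minimal standalone sketch, not the library's code, of how such mixed values could be normalised into a list of NCBI clade paths. The names CladeTypeSketch and clade_paths are hypothetical and exist only for this illustration.

```python
from enum import Enum
from typing import List


class CladeTypeSketch(Enum):
    """Standalone re-declaration of the values listed above (illustration only)."""
    archaea = '/Archaea'
    archaeaBacteria = ['/Archaea', '/Bacteria']
    archaeaEukaryota = ['/Archaea', '/Eukaryota']
    bacteria = '/Bacteria'
    bacteriaEukaryota = ['/Bacteria', '/Eukaryota']
    eukaryota = '/Eukaryota'
    universal = '/'


def clade_paths(clade: CladeTypeSketch) -> List[str]:
    """Hypothetical helper: always return a list of clade paths."""
    value = clade.value
    # Single clades are stored as a plain string, combinations as a list.
    return [value] if isinstance(value, str) else list(value)
```

With this helper, code that consumes a clade does not need to distinguish between single and combined top-clades.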
collectiveMetabolism() → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph[source]

CoreLUCA’s collective metabolism, i.e. core metabolism with the lowest possible majorityPercentage value.

Returns: Contains all substrates/products and all EC numbers of any organism in the top-clade you chose.
Return type: SubstanceEcGraph
coreMetabolism(majorityPercentage) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph[source]

CoreLUCA’s core metabolism.

Parameters: majorityPercentage (float) – Percentage of organisms that have to possess an EC edge for it to be included in this ‘core metabolism’.
Returns: Contains all substrates/products and all EC numbers in the “core metabolism” of the top-clade you chose.
Return type: SubstanceEcGraph
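
The majority idea behind coreMetabolism and collectiveMetabolism can be illustrated on plain sets of EC numbers. This is a simplified sketch, not the library's implementation: it assumes a 0 to 100 percentage scale, rounds the required organism count up, and works on EC numbers instead of graph edges. The function name core_ec_numbers and the toy organisms are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Set


def core_ec_numbers(organisms: Dict[str, Set[str]],
                    majority_percentage: float) -> Set[str]:
    """Return the EC numbers found in at least the given percentage of organisms.

    Assumption: the percentage is on a 0-100 scale and the required
    number of organisms is rounded up, with a minimum of one.
    """
    if not 0 < majority_percentage <= 100:
        raise ValueError("majority_percentage must be in (0, 100]")
    # Count in how many organisms each EC number occurs.
    counts = Counter(ec for ecs in organisms.values() for ec in ecs)
    required = max(1, math.ceil(len(organisms) * majority_percentage / 100))
    return {ec for ec, n in counts.items() if n >= required}


# Toy data, for illustration only.
toy_organisms = {
    "orgA": {"1.1.1.1", "2.7.1.1", "4.2.1.11"},
    "orgB": {"1.1.1.1", "2.7.1.1"},
    "orgC": {"1.1.1.1"},
}
strict_core = core_ec_numbers(toy_organisms, 100)   # shared by all organisms
majority_core = core_ec_numbers(toy_organisms, 60)  # shared by at least two
collective = core_ec_numbers(toy_organisms, 1)      # present in any organism
```

At 100 percent the result is a strict intersection, while a very low percentage degenerates into the union, which mirrors the relationship between the core and the collective metabolism described above.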
class FEV_KEGG.Evolution.LUCA.GoldmanLUCA[source]

Bases: object

Last Universal Common Ancestor by Goldman et al.

This is the Last Universal Common Ancestor, as described in [1]. The original work on LUCA, however, does not specify enzyme function, but merely COGs [3]. This class already contains the list of LUCA’s enzymes from the above paper, as depicted in the first table of said paper [2]. As the most plausible minimal set of enzymatic functions, the authors chose the intersection of EC numbers found in universal sequence + structure, combined with the ones found in universal sequence + structure + function. See the original source for details.

This list is parsed and converted into a SubstanceEcGraph. Conversion is done by using the graph of a hypothetical ‘complete’ organism, NUKA, which possesses all EC numbers known to all metabolic KEGG pathways; see FEV_KEGG.KEGG.NUKA. All EC numbers not present in LUCA are filtered out. Keep in mind, though, that LUCA’s EC numbers only contain three levels, to more adequately model the likely patchwork evolution in ancient times. Therefore, all EC numbers sharing these first three levels remain, regardless of substrate specificity.
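
The filtering step described above can be sketched on a plain list of edges. This is an illustration under stated assumptions, not the library's code: edges are modelled as (substrate, product, EC number) tuples, the wildcard level is written as "-", and the names ec_matches and filter_edges as well as the toy data are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Assumed edge representation: (substrate ID, product ID, EC number).
Edge = Tuple[str, str, str]


def ec_matches(ec: str, pattern: str) -> bool:
    """True if every non-wildcard level of pattern equals the same level of ec."""
    ec_levels = ec.split('.')
    pattern_levels = pattern.split('.')
    if len(ec_levels) != 4 or len(pattern_levels) != 4:
        raise ValueError("EC numbers must have exactly four levels")
    return all(p == '-' or p == e for e, p in zip(ec_levels, pattern_levels))


def filter_edges(edges: Iterable[Edge], luca_patterns: Set[str]) -> List[Edge]:
    """Keep only edges whose EC number matches one of the three-level patterns."""
    return [edge for edge in edges
            if any(ec_matches(edge[2], pattern) for pattern in luca_patterns)]


# Toy stand-in for the 'complete' graph, for illustration only.
toy_edges = [
    ("C00031", "C00092", "2.7.1.1"),
    ("C00031", "C00092", "2.7.1.2"),
    ("C00022", "C00186", "1.1.1.27"),
]
kept = filter_edges(toy_edges, {"2.7.1.-"})
```

Because the pattern leaves the fourth level open, both 2.7.1.1 and 2.7.1.2 survive the filter, which is the "regardless of substrate specificity" behaviour mentioned above.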

Conversion into another type of graph is not supported, because LUCA is a strictly hypothetical organism without any exactly known genes.

Variables: self.nameAbbreviation (str)

References

[1] Goldman et al. (2012), “The Enzymatic and Metabolic Capabilities of Early Life”, https://doi.org/10.1371/journal.pone.0039912
[2] Goldman et al. (2012), Table 1, https://doi.org/10.1371/journal.pone.0039912.t001
[3] Mirkin et al. (2003), “Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes”, https://doi.org/10.1186/1471-2148-3-2
ecNumbers

GoldmanLUCA’s EC numbers.

Generalised to the first three levels. The last level is always a wildcard.

Returns: Set of the EC numbers predicted by Goldman et al. to belong to LUCA.
Return type: Set[EcNumber]
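
As a small illustration of what "generalised to the first three levels" means, the sketch below turns a full four-level EC number into its three-level form with a wildcard in the last position. It works on plain strings and assumes "-" as the wildcard symbol; it does not use the library's EcNumber class, and the name generalise_ec is hypothetical.

```python
def generalise_ec(ec: str, levels: int = 3) -> str:
    """Keep the first `levels` levels of an EC number and wildcard the rest."""
    parts = ec.split('.')
    if len(parts) != 4:
        raise ValueError("an EC number must have exactly four levels")
    if not 0 <= levels <= 4:
        raise ValueError("levels must be between 0 and 4")
    # Replace every level beyond the kept ones with the wildcard '-'.
    return '.'.join(parts[:levels] + ['-'] * (4 - levels))


generalised = generalise_ec("2.7.1.1")
```

Here, generalised holds the string "2.7.1.-".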
substanceEcGraph

GoldmanLUCA’s substance-EC graph.

Returns:

Contains all substrates/products and all EC numbers in FEV_KEGG.KEGG.NUKA filtered by the EC numbers predicted by Goldman et al. for LUCA.

Return type:

SubstanceEcGraph

Raises:
  • HTTPError – If any underlying organism, pathway, or gene of NUKA does not exist.
  • URLError – If connection to KEGG fails.