FEV_KEGG.Evolution.Events module¶
-
class
FEV_KEGG.Evolution.Events.ChevronGeneDuplication(possiblyOrthologousOrganisms: Iterable[Organism] or KEGG.Organism.Group)[source]¶ Bases:
FEV_KEGG.Evolution.Events.GeneDuplication
Evolutionary event of duplicating a gene, depending on a certain ancestral bond.
- The conditions for a ‘chevron’ gene duplication are:
- The gene has at least one paralog.
- The gene has at least one ortholog in a pre-defined set of organisms.
Chevron gene duplication extends simple gene duplication by limiting the possibly duplicated genes via a set of possibly orthologous organisms. In contrast to
SimpleGeneDuplication, this class has to be instantiated, using the aforementioned set of possibly orthologous organisms.
Parameters: possiblyOrthologousOrganisms (Iterable[Organism] or Organism.Group) – Organisms which will be searched for the occurrence of orthologs, i.e. which are considered ancestral.
Variables: self.possiblyOrthologousOrganisms (Iterable[Organism]) –
Raises: ValueError – If possiblyOrthologousOrganisms is of the wrong type.
Warning
This takes much longer than
SimpleGeneDuplication, because each found paralog is additionally searched for an ortholog in all organisms of the other group. However, if you set returnMatches == False and ignoreDuplicatesOutsideSelf == False, the search is aborted at the very first ortholog, which is much faster than getting all orthologs. Because in this model even a single orthologous match is enough to prove gene duplication, we do not necessarily have to search all organisms fully.
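The shortcut described in the warning can be sketched as follows. This is a minimal illustration, not the library's implementation: `find_orthologs`, `has_any_ortholog`, `all_orthologs` and the lookup table are hypothetical stand-ins for a KEGG ortholog search.

```python
from typing import Callable, Iterable, List, Set


def has_any_ortholog(
    gene_id: str,
    organisms: Iterable[str],
    find_orthologs: Callable[[str, str], Set[str]],
) -> bool:
    """Return True as soon as one organism yields an ortholog (early abort)."""
    for organism in organisms:
        if find_orthologs(gene_id, organism):
            return True  # a single match is enough in the chevron model
    return False


def all_orthologs(
    gene_id: str,
    organisms: Iterable[str],
    find_orthologs: Callable[[str, str], Set[str]],
) -> Set[str]:
    """Collect every ortholog, which requires searching all organisms."""
    found: Set[str] = set()
    for organism in organisms:
        found |= find_orthologs(gene_id, organism)
    return found


# Hypothetical lookup table standing in for the database search.
FAKE_ORTHOLOGS = {
    ("eco:b0001", "sfl"): {"sfl:SF0001"},
    ("eco:b0001", "sty"): {"sty:STY0001"},
}
searched: List[str] = []


def fake_search(gene_id: str, organism: str) -> Set[str]:
    searched.append(organism)  # record which organisms were actually queried
    return FAKE_ORTHOLOGS.get((gene_id, organism), set())


early = has_any_ortholog("eco:b0001", ["sfl", "sty", "ype"], fake_search)
organisms_searched_early = list(searched)

searched.clear()
complete = all_orthologs("eco:b0001", ["sfl", "sty", "ype"], fake_search)
organisms_searched_full = list(searched)
```

In this sketch the early-abort variant queries a single organism, while collecting all matches queries every organism in the group.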
filterEnzymes(substanceEnzymeGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEnzymeGraph, eValue=1e-15, ignoreDuplicatesOutsideSelf=False) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEnzymeGraph[source]¶ Remove all enzymes from a graph which have not been gene-duplicated.
Parameters: - substanceEnzymeGraph (SubstanceEnzymeGraph) – Graph of enzymes to be checked for gene duplication.
- eValue (float, optional) – Threshold for the statistical expectation value (E-value), below which a sequence alignment is considered significant.
- ignoreDuplicatesOutsideSelf (bool, optional) – If True, count only such enzymes as gene duplicated, which have at least one of their duplicates inside the set of enzymes searched for duplicates. This can, for example, serve to exclude duplicates in secondary metabolism.
Returns: A copy of the substanceEnzymeGraph containing only enzymes which fulfil the conditions of this gene duplication definition.
Return type: SubstanceEnzymeGraph
Raises: - ValueError – If any organism does not exist.
- HTTPError – If any gene does not exist.
- URLError – If connection to KEGG fails.
-
getEnzymePairs(enzymes: Set[FEV_KEGG.Graph.Elements.Enzyme], eValue=1e-15, ignoreDuplicatesOutsideSelf=False, geneIdToEnzyme=None) → Set[Tuple[FEV_KEGG.Graph.Elements.Enzyme, FEV_KEGG.Graph.Elements.Enzyme]][source]¶ Get gene-duplicated enzymes, in pairs of duplicates.
If enzyme A is a duplicate of enzyme B and vice versa, this does not return both (A, B) and (B, A), but only one pair, with the “smaller” enzyme as the first value. An enzyme is “smaller” if its gene ID string is “smaller”.
Parameters: - enzymes (Set[Enzyme] or SubstanceEnzymeGraph) – Set of enzymes to be checked for gene duplication, or a graph.
- eValue (float, optional) – Threshold for the statistical expectation value (E-value), below which a sequence alignment is considered significant.
- ignoreDuplicatesOutsideSelf (bool, optional) – If True, count only such enzymes as gene duplicated, which have at least one of their duplicates inside the set of enzymes searched for duplicates. This can, for example, serve to exclude duplicates in secondary metabolism.
- geneIdToEnzyme (Dict[GeneID, Enzyme], optional) – Dictionary for mapping each gene ID of every found duplicate to an enzyme object. If None, gets the enzyme from the database. This avoids the KeyError, but can cause a lot of network load.
Returns: Set of pairs of gene-duplicated enzymes, realised as tuples. The order is arbitrary.
Return type: Set[Tuple[Enzyme, Enzyme]]
Raises: KeyError – If geneIdToEnzyme is passed, but does not contain the gene ID of every duplicate.
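The pair ordering described for getEnzymePairs() can be illustrated with plain gene ID strings. This is only a sketch of the "smaller gene ID first" rule; `canonical_pairs` and the gene IDs are hypothetical and do not use the library's Enzyme objects.

```python
from typing import Dict, Set, Tuple


def canonical_pairs(duplicates: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Turn a symmetric 'gene -> duplicates' mapping into unique ordered pairs."""
    pairs: Set[Tuple[str, str]] = set()
    for gene_id, matches in duplicates.items():
        for match in matches:
            if match == gene_id:
                continue  # a gene is not its own duplicate
            # The "smaller" gene ID string always comes first, so that
            # (A, B) and (B, A) collapse into a single pair.
            pairs.add((min(gene_id, match), max(gene_id, match)))
    return pairs


duplicates = {
    "eco:b0002": {"eco:b0001"},
    "eco:b0001": {"eco:b0002", "eco:b0003"},
    "eco:b0003": {"eco:b0001"},
}
pairs = canonical_pairs(duplicates)
```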
-
getEnzymes(enzymes: Set[FEV_KEGG.Graph.Elements.Enzyme], eValue=1e-15, returnMatches=False, ignoreDuplicatesOutsideSelf=False)[source]¶ Get gene-duplicated enzymes.
Parameters: - enzymes (Set[Enzyme] or SubstanceEnzymeGraph) – Set of enzymes to be checked for gene duplication, or a graph.
- eValue (float, optional) – Threshold for the statistical expectation value (E-value), below which a sequence alignment is considered significant.
- returnMatches (bool, optional) – If True, return not only enzymes that have homologs, but also which homologs they have. Useful for filtering for relevant homologs afterwards.
- ignoreDuplicatesOutsideSelf (bool, optional) – If True, count only such enzymes as gene duplicated, which have at least one of their duplicates inside the set of enzymes searched for duplicates. This can, for example, serve to exclude duplicates in secondary metabolism.
Returns: If returnMatches == False, all enzymes in enzymes which fulfil the conditions of this gene duplication definition. If returnMatches == True, all enzymes in enzymes which fulfil the conditions of this gene duplication definition, pointing to a set of gene IDs of the found homologs.
Return type: Set[Enzyme] or Dict[Enzyme, Set[GeneID]]
Raises: - ValueError – If any organism does not exist.
- HTTPError – If any gene does not exist.
- URLError – If connection to KEGG fails.
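How the two flags of getEnzymes() shape the result can be sketched on a precomputed mapping of homologs. This is an assumption-laden simplification: `gene_duplicated` is a hypothetical helper, gene ID strings stand in for enzymes, and the sketch returns all homologs of a qualifying enzyme, which the documentation above does not specify.

```python
from typing import Dict, Set, Union


def gene_duplicated(
    homologs: Dict[str, Set[str]],
    return_matches: bool = False,
    ignore_outside: bool = False,
) -> Union[Set[str], Dict[str, Set[str]]]:
    """Select enzymes with at least one homolog, optionally only inside the searched set."""
    searched = set(homologs)  # the enzymes that were searched for duplicates
    result: Dict[str, Set[str]] = {}
    for enzyme, found in homologs.items():
        if not found:
            continue  # no homolog at all: not gene-duplicated
        if ignore_outside and not (found & searched):
            continue  # every duplicate lies outside the searched set
        result[enzyme] = set(found)
    return result if return_matches else set(result)


homologs = {
    "eco:b0001": {"eco:b0002"},
    "eco:b0002": {"eco:b0001"},
    "eco:b0003": {"eco:b9999"},  # duplicate outside the searched set
    "eco:b0004": set(),
}
plain = gene_duplicated(homologs)
inside_only = gene_duplicated(homologs, ignore_outside=True)
with_matches = gene_duplicated(homologs, return_matches=True)
```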
-
class
FEV_KEGG.Evolution.Events.FunctionChange(ecA: FEV_KEGG.Graph.Elements.EcNumber, ecB: FEV_KEGG.Graph.Elements.EcNumber)[source]¶ Bases:
object
Possible evolutionary change of enzymatic function, from one EC number to another.
The direction of change, or whether it really happened, cannot be determined here! The order of EC numbers in this object is arbitrarily chosen to reflect their lexicographic order.
A function change represents the possibility that the first EC number has evolutionarily changed into the second one, or the other way around, since the direction of evolution cannot be determined here. A function change can never have the same EC number twice, nor can the first be lexicographically “bigger” than the second, see the examples section.
Parameters: - ecA (EcNumber) –
- ecB (EcNumber) –
Raises: ValueError – If ecA is lexicographically “bigger” than, or equal to, ecB.
classmethod
fromNeofunctionalisation(neofunctionalisation: FEV_KEGG.Evolution.Events.Neofunctionalisation) → Set[FEV_KEGG.Evolution.Events.FunctionChange][source]¶ Create combinations of function changes from a neofunctionalisation.
Parameters: neofunctionalisation (Neofunctionalisation) –
Returns: Set of function changes which might have been caused by the neofunctionalisation. Since an enzyme of a neofunctionalisation can have multiple EC numbers, all combinations of the two enzymes’ EC numbers are formed and treated as separate possible function changes.
Return type: Set[FunctionChange]
Examples
A: 1 B: 2 = (1, 2)
A: 1 B: 1, 2 = (1, 2)
A: 1, 2 B: 1, 3 = (1, 3) (1, 2) (2, 3)
A: 1, 2 B: 3, 4 = (1, 3) (1, 4) (2, 3) (2, 4)
A: 1, 2 B: 1, 2, 3 = (1, 3) (2, 3)
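The examples above can be reproduced with a small sketch. It is not the library's code: `possible_function_changes` is a hypothetical helper working on plain strings, and the rule that a pair is skipped when both EC numbers occur in both enzymes is inferred from the last example rather than stated in the documentation.

```python
from typing import Set, Tuple


def possible_function_changes(ecs_a: Set[str], ecs_b: Set[str]) -> Set[Tuple[str, str]]:
    """Combine the EC numbers of two enzymes into ordered, non-trivial pairs."""
    shared = ecs_a & ecs_b
    changes: Set[Tuple[str, str]] = set()
    for ec_a in ecs_a:
        for ec_b in ecs_b:
            if ec_a == ec_b:
                continue  # a function change never has the same EC number twice
            if ec_a in shared and ec_b in shared:
                continue  # assumption: both enzymes already have both functions
            # lexicographically "smaller" EC number first
            changes.add((min(ec_a, ec_b), max(ec_a, ec_b)))
    return changes


example_1 = possible_function_changes({"1"}, {"2"})
example_2 = possible_function_changes({"1"}, {"1", "2"})
example_3 = possible_function_changes({"1", "2"}, {"1", "3"})
example_4 = possible_function_changes({"1", "2"}, {"3", "4"})
example_5 = possible_function_changes({"1", "2"}, {"1", "2", "3"})
```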
-
getDifferingEcLevels() → int[source]¶ Get the number of EC levels in which the EC numbers differ.
Returns: Number of differing EC levels between the two EC numbers, starting with the substrate level. For example, 1.2.3.4 and 1.2.3.7 returns 1, while 1.2.3.4 and 1.8.9.10 returns 3. However, wildcards do not match numbers: 1.2.3.4 and 1.2.3.- returns 1!
Return type: int
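One way to compute such a level difference is sketched below. It assumes that every level below the first mismatch counts as differing, which agrees with the three documented examples but is not confirmed by the source; `differing_ec_levels` is a hypothetical helper operating on EC number strings.

```python
def differing_ec_levels(ec_a: str, ec_b: str) -> int:
    """Count EC levels from the first mismatch down to the substrate level."""
    levels_a = ec_a.split(".")
    levels_b = ec_b.split(".")
    shared_prefix = 0
    for level_a, level_b in zip(levels_a, levels_b):
        if level_a != level_b:  # a wildcard "-" is compared literally, so it never matches a number
            break
        shared_prefix += 1
    return len(levels_a) - shared_prefix


substrate_only = differing_ec_levels("1.2.3.4", "1.2.3.7")
three_levels = differing_ec_levels("1.2.3.4", "1.8.9.10")
wildcard = differing_ec_levels("1.2.3.4", "1.2.3.-")
```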
-
class
FEV_KEGG.Evolution.Events.GeneDuplication[source]¶ Bases:
object
Abstract class for any type of gene duplication.
-
class
FEV_KEGG.Evolution.Events.GeneFunctionAddition[source]¶ Bases:
object
Evolutionary event of adding a gene function (EC number) between a pair of arbitrary ancestor and descendant.
- The conditions for a gene function addition are:
- The EC number has been added, from an unknown origin, along the way from an older group of organisms to a newer one.
-
static
getECs(ancestorEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph, descendantEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph) → Set[FEV_KEGG.Graph.Elements.EcNumber][source]¶ Get EC numbers which have been added between ancestor and descendant, existing only in the descendant.
Parameters: - ancestorEcGraph (SubstanceEcGraph) –
- descendantEcGraph (SubstanceEcGraph) –
Returns: Set of EC numbers which occur in the descendant’s EC graph, but not in the ancestor’s, i.e. EC numbers which are new to the descendant.
Return type: Set[EcNumber]
-
static
getGraph(ancestorEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph, descendantEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph[source]¶ Get graph containing EC numbers which have been added between ancestor and descendant, existing only in the descendant.
Parameters: - ancestorEcGraph (SubstanceEcGraph) –
- descendantEcGraph (SubstanceEcGraph) –
Returns: Graph of EC numbers which occur in the descendant’s EC graph, but not in the ancestor’s, i.e. EC numbers which are new in the descendant. Substance-EC-product edges are only included if both graphs, ancestor and descendant, have both nodes, substrate and product.
Return type: SubstanceEcGraph
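The edge rule in the Returns paragraph can be illustrated on plain edge tuples. This is a rough sketch under stated assumptions: graphs are modelled as sets of hypothetical (substrate, product, EC number) tuples, and `added_ec_edges` is not part of the library.

```python
from typing import Set, Tuple

Edge = Tuple[str, str, str]  # (substrate, product, EC number)


def nodes(edges: Set[Edge]) -> Set[str]:
    """All substances occurring as substrate or product."""
    return {node for substrate, product, _ in edges for node in (substrate, product)}


def added_ec_edges(ancestor: Set[Edge], descendant: Set[Edge]) -> Set[Edge]:
    """Edges of EC numbers new to the descendant, restricted to shared substances."""
    ancestor_ecs = {ec for _, _, ec in ancestor}
    shared_nodes = nodes(ancestor) & nodes(descendant)
    return {
        (substrate, product, ec)
        for substrate, product, ec in descendant
        if ec not in ancestor_ecs
        and substrate in shared_nodes
        and product in shared_nodes
    }


ancestor = {("A", "B", "1.1.1.1"), ("B", "C", "2.2.2.2")}
descendant = {("A", "B", "1.1.1.1"), ("A", "C", "3.3.3.3"), ("C", "D", "4.4.4.4")}
added = added_ec_edges(ancestor, descendant)
```

Here 4.4.4.4 is new to the descendant, but its edge is dropped because substance D does not occur in the ancestor graph.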
-
class
FEV_KEGG.Evolution.Events.GeneFunctionConservation[source]¶ Bases:
object
Evolutionary event of conserving a gene function (EC number) between a pair of arbitrary ancestor and descendant.
- The conditions for a gene function conservation are:
- The EC number has been conserved, along the way from an older group of organisms to a newer one.
-
static
getECs(ancestorEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph, descendantEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph) → Set[FEV_KEGG.Graph.Elements.EcNumber][source]¶ Get EC numbers which have been conserved between ancestor and descendant, existing in both.
Parameters: - ancestorEcGraph (SubstanceEcGraph) –
- descendantEcGraph (SubstanceEcGraph) –
Returns: Set of EC numbers which occur in the ancestor’s EC graph and in the descendant’s, i.e. EC numbers which are conserved in the descendant.
Return type: Set[EcNumber]
-
static
getGraph(ancestorEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph, descendantEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph[source]¶ Get graph containing EC numbers which have been conserved between ancestor and descendant, existing in both.
Parameters: - ancestorEcGraph (SubstanceEcGraph) –
- descendantEcGraph (SubstanceEcGraph) –
Returns: Graph of EC numbers which occur in the ancestor’s EC graph and in the descendant’s, i.e. EC numbers which are conserved in the descendant. Substance-EC-product edges are only included if both graphs, ancestor and descendant, have both nodes, substrate and product.
Return type: SubstanceEcGraph
-
class
FEV_KEGG.Evolution.Events.GeneFunctionDivergence[source]¶ Bases:
object
Evolutionary event of diverging (adding or losing) a gene function (EC number) between a pair of arbitrary ancestor and descendant.
- The conditions for a gene function divergence are:
- The EC number exists in an older group of organisms, but not in a newer one, or the other way around.
-
static
getECs(ancestorEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph, descendantEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph) → Set[FEV_KEGG.Graph.Elements.EcNumber][source]¶ Get EC numbers which have diverged between ancestor and descendant, existing only in either one of them.
Obviously, ancestorEcGraph and descendantEcGraph can be swapped here without changing the result.
Parameters: - ancestorEcGraph (SubstanceEcGraph) –
- descendantEcGraph (SubstanceEcGraph) –
Returns: Set of EC numbers which occur in the ancestor’s EC graph, but not in the descendant’s, and vice versa, i.e. EC numbers which only exist in either one of the organism groups.
Return type: Set[EcNumber]
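In terms of plain sets, the getECs() methods of the GeneFunction classes correspond to the usual set operations. The sketch below uses made-up EC number strings instead of SubstanceEcGraph objects, and the function names are hypothetical.

```python
from typing import Set


def added(ancestor: Set[str], descendant: Set[str]) -> Set[str]:
    return descendant - ancestor  # only in the descendant


def lost(ancestor: Set[str], descendant: Set[str]) -> Set[str]:
    return ancestor - descendant  # only in the ancestor


def conserved(ancestor: Set[str], descendant: Set[str]) -> Set[str]:
    return ancestor & descendant  # in both


def diverged(ancestor: Set[str], descendant: Set[str]) -> Set[str]:
    return ancestor ^ descendant  # in exactly one of them


ancestor_ecs = {"1.1.1.1", "2.7.1.1", "4.2.1.2"}
descendant_ecs = {"1.1.1.1", "2.7.1.2", "4.2.1.2", "6.3.1.2"}

added_ecs = added(ancestor_ecs, descendant_ecs)
lost_ecs = lost(ancestor_ecs, descendant_ecs)
conserved_ecs = conserved(ancestor_ecs, descendant_ecs)
diverged_ecs = diverged(ancestor_ecs, descendant_ecs)
```

Divergence is the union of addition and loss, which is why swapping the two arguments does not change its result.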
-
static
getGraph(ancestorEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph, descendantEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph[source]¶ Get graph containing EC numbers which have diverged between ancestor and descendant, existing only in either one of them.
Parameters: - ancestorEcGraph (SubstanceEcGraph) –
- descendantEcGraph (SubstanceEcGraph) –
Returns: Graph of EC numbers which occur in the ancestor’s EC graph, but not in the descendant’s, and vice versa, i.e. EC numbers which only exist in either one of the organism groups. Substance-EC-product edges are only included if both graphs, ancestor and descendant, have both nodes, substrate and product.
Return type: SubstanceEcGraph
-
class
FEV_KEGG.Evolution.Events.GeneFunctionLoss[source]¶ Bases:
object
Evolutionary event of losing a gene function (EC number) between a pair of arbitrary ancestor and descendant.
- The conditions for a gene function loss are:
- The EC number has been lost, along the way from an older group of organisms to a newer one.
-
static
getECs(ancestorEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph, descendantEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph) → Set[FEV_KEGG.Graph.Elements.EcNumber][source]¶ Get EC numbers which have been lost between ancestor and descendant, existing only in the ancestor.
Parameters: - ancestorEcGraph (SubstanceEcGraph) –
- descendantEcGraph (SubstanceEcGraph) –
Returns: Set of EC numbers which occur in the ancestor’s EC graph, but not in the descendant’s, i.e. EC numbers which are lost to the descendant.
Return type: Set[EcNumber]
-
static
getGraph(ancestorEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph, descendantEcGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph[source]¶ Get graph containing EC numbers which have been lost between ancestor and descendant, existing only in the ancestor.
Parameters: - ancestorEcGraph (SubstanceEcGraph) –
- descendantEcGraph (SubstanceEcGraph) –
Returns: Graph of EC numbers which occur in the ancestor’s EC graph, but not in the descendant’s, i.e. EC numbers which are lost in the descendant. Substance-EC-product edges are only included if both graphs, ancestor and descendant, have both nodes, substrate and product.
Return type: SubstanceEcGraph
-
class
FEV_KEGG.Evolution.Events.Neofunctionalisation(enzymeA: FEV_KEGG.Graph.Elements.Enzyme, enzymeB: FEV_KEGG.Graph.Elements.Enzyme)[source]¶ Bases:
object
Evolutionary event of neofunctionalisation between a pair of enzymes.
- The conditions for a neofunctionalisation are:
- The enzyme’s gene has been duplicated, according to a certain class of GeneDuplication.
- The duplicated enzyme is associated with a different EC number than its duplicate.
The order of the two enzymes has no meaning; it has been arbitrarily chosen to reflect the lexicographic order of their associated EC numbers. The enzyme possessing the “smallest” EC number comes first. This absolute ordering prevents duplicate events: without it, there would always be a second event with the exact same enzymes in swapped positions, because neofunctionalisation has no direction here and is, thus, symmetric.
Parameters: - enzymeA (Enzyme) –
- enzymeB (Enzyme) –
Variables: self.enzymePair (Tuple[Enzyme, Enzyme]) – Tuple of the two enzymes, sorted by the lexicographic order of their “smallest” EC number.
Raises: ValueError – If the enzymes are equal, have the same set of EC numbers, or one has no EC number.
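The ordering and the error conditions can be sketched with simple stand-ins. Here an enzyme is modelled as a hypothetical (gene ID, EC numbers) tuple and `ordered_enzyme_pair` is not a library function; how ties between equal “smallest” EC numbers are broken is not specified above, so the sketch simply keeps the input order in that case.

```python
from typing import FrozenSet, Tuple

Enzyme = Tuple[str, FrozenSet[str]]  # stand-in: (gene ID, EC numbers)


def ordered_enzyme_pair(enzyme_a: Enzyme, enzyme_b: Enzyme) -> Tuple[Enzyme, Enzyme]:
    """Sort two enzymes by the lexicographic order of their smallest EC number."""
    for _, ec_numbers in (enzyme_a, enzyme_b):
        if not ec_numbers:
            raise ValueError("each enzyme needs at least one EC number")
    if enzyme_a == enzyme_b or enzyme_a[1] == enzyme_b[1]:
        raise ValueError("enzymes must differ and must not share the same set of EC numbers")
    # sorted() is stable, so a tie keeps the input order (an assumption of this sketch)
    first, second = sorted((enzyme_a, enzyme_b), key=lambda enzyme: min(enzyme[1]))
    return first, second


enzyme_x: Enzyme = ("eco:b0002", frozenset({"2.7.1.1"}))
enzyme_y: Enzyme = ("eco:b0001", frozenset({"1.1.1.1", "3.1.1.1"}))

pair = ordered_enzyme_pair(enzyme_x, enzyme_y)
swapped = ordered_enzyme_pair(enzyme_y, enzyme_x)
```

Both argument orders yield the same tuple, so the two symmetric events collapse into one.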
getDifferingEcLevels() → int[source]¶ Get the maximum number of EC levels in which the enzymes’ EC numbers differ.
Returns: Number of differing EC levels between the two enzymes’ EC numbers, starting with the substrate level. If an enzyme has multiple EC numbers, returns the biggest difference. For example, 1.2.3.4 and 1.2.3.7 returns 1, while 1.2.3.4 and 1.8.9.10 returns 3. However, wildcards do not match numbers: 1.2.3.4 and 1.2.3.- returns 1!
Return type: int
-
getEcNumbers() → Tuple[Set[FEV_KEGG.Graph.Elements.EcNumber], Set[FEV_KEGG.Graph.Elements.EcNumber]][source]¶ Get the enzymes’ EC numbers.
Returns: Same order as in getEnzymes(). Because an enzyme could have multiple EC numbers, they are given as sets.
Return type: Tuple[Set[EcNumber], Set[EcNumber]]
-
getEnzymes() → Tuple[FEV_KEGG.Graph.Elements.Enzyme, FEV_KEGG.Graph.Elements.Enzyme][source]¶ Get the pair of enzymes.
Returns: Return type: Tuple[Enzyme, Enzyme]
-
class
FEV_KEGG.Evolution.Events.NeofunctionalisedECs(neofunctionalisedEnzymes: FEV_KEGG.Evolution.Events.NeofunctionalisedEnzymes)[source]¶ Bases:
object
EC numbers which are affected by neofunctionalisation events.
Parameters: neofunctionalisedEnzymes (NeofunctionalisedEnzymes) – Neofunctionalisation events among certain enzymes.
colourGraph(ecGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph, colour: FEV_KEGG.Drawing.Export.Colour = <Colour.GREEN: '#55FF55'>, minimumEcDifference: int = None, minimumOrganismsCount: int = None) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph[source]¶ Colour EC graph’s “neofunctionalised” EC number edges.
Parameters: - minimumEcDifference (int, optional) – May only be one of [1, 2, 3, 4]. If None or 1, all neofunctionalisations are returned. If > 1, return only neofunctionalisations in which the EC numbers differ in at least minimumEcDifference of the lowest levels. They then describe a different reaction, instead of only a different substrate. For example, minimumEcDifference == 2 means that 1.2.3.4/1.2.3.5 is not reported, while 1.2.3.4/1.2.5.6 is.
- minimumOrganismsCount (int, optional) – Minimum number of organisms which have to be involved in the neofunctionalisations of each EC number. If None, there is no filtering due to organism involvement. This sums the occurrences of organisms across function changes, for each EC number the function changes overlap with. Hence, it is much less likely that a neofunctionalisation is filtered, compared to filtering per function change. For example, the function change 1->2 is associated with two neofunctionalisations ‘eco:12345’->‘eco:69875’ and ‘obc:76535’->‘abc:41356’, which involves three organisms in total (eco, obc, abc). Also, the function change 1->3 involves two organisms (‘eco:53235’->‘iuf:34587’). If minimumOrganismsCount == 4, neither 1->2 nor 1->3 is reported. However, if we look at single EC numbers, 1 is involved in function changes affecting four organisms (eco, obc, abc, iuf). Thus, 1 would be reported here, but neither 2 nor 3.
Returns: A copy of ecGraph in which edges with “neofunctionalised” ECs as key have an additional colour attribute, see
FEV_KEGG.Drawing.Export.addColourAttribute().
Return type: SubstanceEcGraph
-
filterGraph(ecGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph, minimumEcDifference: int = None, minimumOrganismsCount: int = None) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEcGraph[source]¶ Filter EC graph to only contain “neofunctionalised” EC numbers.
Parameters: - minimumEcDifference (int, optional) – May only be one of [1, 2, 3, 4]. If None or 1, all neofunctionalisations are returned. If > 1, return only neofunctionalisations in which the EC numbers differ in at least minimumEcDifference of the lowest levels. They then describe a different reaction, instead of only a different substrate. For example, minimumEcDifference == 2 means that 1.2.3.4/1.2.3.5 is not reported, while 1.2.3.4/1.2.5.6 is.
- minimumOrganismsCount (int, optional) – Minimum number of organisms which have to be involved in the neofunctionalisations of each EC number. If None, there is no filtering due to organism involvement. This sums the occurrences of organisms across function changes, for each EC number the function changes overlap with. Hence, it is much less likely that a neofunctionalisation is filtered, compared to filtering per function change. For example, the function change 1->2 is associated with two neofunctionalisations ‘eco:12345’->‘eco:69875’ and ‘obc:76535’->‘abc:41356’, which involves three organisms in total (eco, obc, abc). Also, the function change 1->3 involves two organisms (‘eco:53235’->‘iuf:34587’). If minimumOrganismsCount == 4, neither 1->2 nor 1->3 is reported. However, if we look at single EC numbers, 1 is involved in function changes affecting four organisms (eco, obc, abc, iuf). Thus, 1 would be reported here, but neither 2 nor 3.
Returns: A copy of ecGraph, leaving only edges with a “neofunctionalised” EC as key.
Return type: SubstanceEcGraph
-
getECs(minimumEcDifference: int = None, minimumOrganismsCount: int = None) → Set[FEV_KEGG.Graph.Elements.EcNumber][source]¶ Get EC numbers participating in the change of function due to neofunctionalisations.
They could also be called “neofunctionalised” EC numbers.
Parameters: - minimumEcDifference (int, optional) – May only be one of [1, 2, 3, 4]. If None or 1, all neofunctionalisations are returned. If > 1, return only neofunctionalisations in which the EC numbers differ in at least minimumEcDifference of the lowest levels. They then describe a different reaction, instead of only a different substrate. For example, minimumEcDifference == 2 means that 1.2.3.4/1.2.3.5 is not reported, while 1.2.3.4/1.2.5.6 is.
- minimumOrganismsCount (int, optional) – Minimum number of organisms which have to be involved in the neofunctionalisations of each EC number. If None, there is no filtering due to organism involvement. This sums the occurrences of organisms across function changes, for each EC number the function changes overlap with. Hence, it is much less likely that a neofunctionalisation is filtered, compared to filtering per function change. For example, the function change 1->2 is associated with two neofunctionalisations ‘eco:12345’->‘eco:69875’ and ‘obc:76535’->‘abc:41356’, which involves three organisms in total (eco, obc, abc). Also, the function change 1->3 involves two organisms (‘eco:53235’->‘iuf:34587’). If minimumOrganismsCount == 4, neither 1->2 nor 1->3 is reported. However, if we look at single EC numbers, 1 is involved in function changes affecting four organisms (eco, obc, abc, iuf). Thus, 1 would be reported here, but neither 2 nor 3.
Returns: Set of EC numbers which are part of function changes which possibly happened due to neofunctionalisations.
Return type: Set[EcNumber]
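The difference between counting organisms per function change and per EC number, as described for minimumOrganismsCount, can be worked through with the gene IDs from the example. This is only a sketch of the counting; the data layout and helper names are hypothetical and the library's internal bookkeeping may differ.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

FunctionChange = Tuple[str, str]
GenePair = Tuple[str, str]

# Function changes and the neofunctionalisations (gene pairs) behind them.
events: Dict[FunctionChange, List[GenePair]] = {
    ("1", "2"): [("eco:12345", "eco:69875"), ("obc:76535", "abc:41356")],
    ("1", "3"): [("eco:53235", "iuf:34587")],
}


def organism(gene_id: str) -> str:
    """The organism abbreviation is the part before the colon."""
    return gene_id.split(":")[0]


def organisms_per_function_change(
    events: Dict[FunctionChange, List[GenePair]]
) -> Dict[FunctionChange, Set[str]]:
    return {
        change: {organism(gene) for pair in pairs for gene in pair}
        for change, pairs in events.items()
    }


def organisms_per_ec(events: Dict[FunctionChange, List[GenePair]]) -> Dict[str, Set[str]]:
    per_ec: Dict[str, Set[str]] = defaultdict(set)
    for change, organisms in organisms_per_function_change(events).items():
        for ec in change:
            per_ec[ec] |= organisms  # sum organisms across all changes the EC takes part in
    return dict(per_ec)


per_change = organisms_per_function_change(events)
per_ec = organisms_per_ec(events)

minimum = 4
changes_kept = {change for change, orgs in per_change.items() if len(orgs) >= minimum}
ecs_kept = {ec for ec, orgs in per_ec.items() if len(orgs) >= minimum}
```

With a minimum of 4, no function change passes, but EC number 1 does, because it collects the organisms of both function changes.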
-
getEnzymesForEC(minimumEcDifference: int = None, minimumOrganismsCount: int = None) → Dict[FEV_KEGG.Graph.Elements.EcNumber, Set[FEV_KEGG.Graph.Elements.Enzyme]][source]¶ Get enzymes of neofunctionalisations, keyed by an EC number of a possible function change.
Parameters: - minimumEcDifference (int, optional) – May only be one of [1, 2, 3, 4]. If None or 1, all neofunctionalisations are returned. If > 1, return only neofunctionalisations in which the EC numbers differ in at least minimumEcDifference of the lowest levels. They then describe a different reaction, instead of only a different substrate. For example, minimumEcDifference == 2 means that 1.2.3.4/1.2.3.5 is not reported, while 1.2.3.4/1.2.5.6 is.
- minimumOrganismsCount (int, optional) – Minimum number of organisms which have to be involved in the neofunctionalisations of each EC number. If None, there is no filtering due to organism involvement. This sums the occurrences of organisms across function changes, for each EC number the function changes overlap with. Hence, it is much less likely that a neofunctionalisation is filtered, compared to filtering per function change. For example, the function change 1->2 is associated with two neofunctionalisations ‘eco:12345’->‘eco:69875’ and ‘obc:76535’->‘abc:41356’, which involves three organisms in total (eco, obc, abc). Also, the function change 1->3 involves two organisms (‘eco:53235’->‘iuf:34587’). If minimumOrganismsCount == 4, neither 1->2 nor 1->3 is reported. However, if we look at single EC numbers, 1 is involved in function changes affecting four organisms (eco, obc, abc, iuf). Thus, 1 would be reported here, but neither 2 nor 3.
Returns: Dictionary of EC numbers, pointing to a set of enzymes involved in the neofunctionalisations which might have caused the function changes the EC number is part of. This can lead to many duplicated enzymes.
Return type: Dict[EcNumber, Set[Enzyme]]
-
getEnzymesForFunctionChange(minimumEcDifference: int = None, minimumOrganismsCount: int = None) → Dict[FEV_KEGG.Evolution.Events.FunctionChange, Set[FEV_KEGG.Graph.Elements.Enzyme]][source]¶ Get enzymes of neofunctionalisations, keyed by a possible change of function.
Parameters: - minimumEcDifference (int, optional) – May only be one of [1, 2, 3, 4]. If None or 1, all neofunctionalisations are returned. If > 1, return only neofunctionalisations in which the EC numbers differ in at least minimumEcDifference of the lowest levels. They then describe a different reaction, instead of only a different substrate. For example, minimumEcDifference == 2 means that 1.2.3.4/1.2.3.5 is not reported, while 1.2.3.4/1.2.5.6 is.
- minimumOrganismsCount (int, optional) – Minimum number of organisms which have to be involved in the neofunctionalisations of each function change. If None, there is no filtering due to organism involvement. For example, the function change 1->2 is associated with two neofunctionalisations ‘eco:12345’->‘eco:69875’ and ‘obc:76535’->‘abc:41356’, which involves three organisms in total (eco, obc, abc). Thus, if minimumOrganismsCount <= 3, the function change 1->2 is returned.
Returns: Dictionary of function changes, pointing to a set of enzymes involved in the neofunctionalisations which might have caused the function change. This can lead to many duplicated enzymes.
Return type: Dict[FunctionChange, Set[Enzyme]]
-
getFunctionChanges(minimumEcDifference: int = None, minimumOrganismsCount: int = None) → Set[FEV_KEGG.Evolution.Events.FunctionChange][source]¶ Get all possible changes of function between the two enzymes of every neofunctionalisation.
Parameters: - minimumEcDifference (int, optional) – May only be one of [1, 2, 3, 4]. If None or 1, all neofunctionalisations are returned. If > 1, return only neofunctionalisations in which the EC numbers differ in at least minimumEcDifference of the lowest levels. They then describe a different reaction, instead of only a different substrate. For example, minimumEcDifference == 2 means that 1.2.3.4/1.2.3.5 is not reported, while 1.2.3.4/1.2.5.6 is.
- minimumOrganismsCount (int, optional) – Minimum number of organisms which have to be involved in the neofunctionalisations of each function change. If None, there is no filtering due to organism involvement.
Returns: Set of all function changes, which meet the criteria.
Return type: Set[FunctionChange]
-
getNeofunctionalisationsForEC(minimumEcDifference: int = None, minimumOrganismsCount: int = None) → Dict[FEV_KEGG.Graph.Elements.EcNumber, Set[FEV_KEGG.Evolution.Events.Neofunctionalisation]][source]¶ Get neofunctionalisation events, keyed by an EC number participating in the change of function between the two enzymes.
Parameters: - minimumEcDifference (int, optional) – May only be one of [1, 2, 3, 4]. If None or 1, all neofunctionalisations are returned. If > 1, return only neofunctionalisations in which the EC numbers differ in at least minimumEcDifference of the lowest levels. They then describe a different reaction, instead of only a different substrate. For example, minimumEcDifference == 2 means that 1.2.3.4/1.2.3.5 is not reported, while 1.2.3.4/1.2.5.6 is.
- minimumOrganismsCount (int, optional) – Minimum number of organisms which have to be involved in the neofunctionalisations of each EC number. If None, there is no filtering due to organism involvement. This sums the occurrences of organisms across function changes, for each EC number the function changes overlap with. Hence, it is much less likely that a neofunctionalisation is filtered, compared to filtering per function change. For example, the function change 1->2 is associated with two neofunctionalisations ‘eco:12345’->‘eco:69875’ and ‘obc:76535’->‘abc:41356’, which involves three organisms in total (eco, obc, abc). Also, the function change 1->3 involves two organisms (‘eco:53235’->‘iuf:34587’). If minimumOrganismsCount == 4, neither 1->2 nor 1->3 is reported. However, if we look at single EC numbers, 1 is involved in function changes affecting four organisms (eco, obc, abc, iuf). Thus, 1 would be reported here, but neither 2 nor 3.
Returns: Dictionary of EC numbers which are part of function changes, pointing to a set of neofunctionalisations which might have caused them. Very likely has duplicated neofunctionalisations, because there are always at least two EC numbers involved in a neofunctionalisation.
Return type: Dict[EcNumber, Set[Neofunctionalisation]]
-
getNeofunctionalisationsForFunctionChange(minimumEcDifference: int = None, minimumOrganismsCount: int = None) → Dict[FEV_KEGG.Evolution.Events.FunctionChange, Set[FEV_KEGG.Evolution.Events.Neofunctionalisation]][source]¶ Get neofunctionalisation events, keyed by a change of function between the two enzymes.
Parameters: - minimumEcDifference (int, optional) – May only be one of [1, 2, 3, 4]. If None or 1, all neofunctionalisations are returned. If > 1, return only neofunctionalisations in which the EC numbers differ in at least minimumEcDifference of the lowest levels. They then describe a different reaction, instead of only a different substrate. For example, minimumEcDifference == 2 means that 1.2.3.4/1.2.3.5 is not reported, while 1.2.3.4/1.2.5.6 is.
- minimumOrganismsCount (int, optional) – Minimum number of organisms which have to be involved in the neofunctionalisations of each function change. If None, there is no filtering due to organism involvement. For example, the function change 1->2 is associated with two neofunctionalisations ‘eco:12345’->‘eco:69875’ and ‘obc:76535’->‘abc:41356’, which involves three organisms in total (eco, obc, abc). Thus, if minimumOrganismsCount <= 3, the function change 1->2 is returned.
Returns: Dictionary of function changes, pointing to a set of neofunctionalisations which might have caused them.
Since an enzyme of a neofunctionalisation can have multiple EC numbers, all combinations of the two enzymes’ EC numbers are formed and treated as separate possible function changes. The neofunctionalisation is then saved again for each function change, which obviously leads to duplicated neofunctionalisation objects.
Return type: Dict[FunctionChange, Set[Neofunctionalisation]]
Examples
A: 1 B: 2 = (1, 2)
A: 1 B: 1, 2 = (1, 2)
A: 1, 2 B: 1, 3 = (1, 3) (1, 2) (2, 3)
A: 1, 2 B: 3, 4 = (1, 3) (1, 4) (2, 3) (2, 4)
A: 1, 2 B: 1, 2, 3 = (1, 3) (2, 3)
-
-
class
FEV_KEGG.Evolution.Events.NeofunctionalisedEnzymes(enzymes: Set[FEV_KEGG.Graph.Elements.Enzyme], geneDuplicationModel: FEV_KEGG.Evolution.Events.GeneDuplication, eValue=1e-15, ignoreDuplicatesOutsideSet: bool = True)[source]¶ Bases:
object
Neofunctionalisation events among certain enzymes.
Parameters: - enzymes (Set[Enzyme]) – Enzymes among which to test for neofunctionalisation. Neofunctionalisations involving enzymes outside this set are not reported.
- geneDuplicationModel (GeneDuplication) – The model of gene duplication to use.
- eValue (float, optional) – Threshold for the statistical expectation value (E-value), below which a sequence alignment is considered significant.
- ignoreDuplicatesOutsideSet (bool, optional) – If True, any neofunctionalisation involving an enzyme outside the enzymes set is not reported. This helps to exclude secondary metabolism when examining core metabolism.
Raises: ValueError – If a gene duplication model is used which requires instantiation, but only its class was given.
colourGraph(enzymeGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEnzymeGraph, colour: FEV_KEGG.Drawing.Export.Colour = <Colour.GREEN: '#55FF55'>, minimumEcDifference: int = None) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEnzymeGraph[source]¶ Colour enzyme graph’s neofunctionalised enzyme edges.
Parameters: - enzymeGraph (SubstanceEnzymeGraph) – The enzyme graph to colour.
- colour (Export.Colour, optional) – The colour to use for edges with neofunctionalised enzymes as key.
- minimumEcDifference (int, optional) – May only be one of [1, 2, 3, 4]. If None or 1, all neofunctionalisations are returned. If > 1, return only neofunctionalisations in which the EC numbers differ in at least minimumEcDifference of the lowest levels. They then describe a different reaction, instead of only a different substrate. For example, minimumEcDifference == 2 means that 1.2.3.4/1.2.3.5 is not reported, while 1.2.3.4/1.2.5.6 is.
Returns: A copy of enzymeGraph in which edges with neofunctionalised enzymes as key have an additional colour attribute, see FEV_KEGG.Drawing.Export.addColourAttribute().
Return type: SubstanceEnzymeGraph
-
filterGraph(enzymeGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEnzymeGraph, minimumEcDifference: int = None) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEnzymeGraph
Filter enzyme graph to only contain neofunctionalised enzymes.
Parameters: - enzymeGraph (SubstanceEnzymeGraph) – The enzyme graph to filter.
- minimumEcDifference (int, optional) – May only be one of [1, 2, 3, 4]. If None or 1, all neofunctionalisations are returned. If > 1, return only neofunctionalisations in which the EC numbers differ in at least the minimumEcDifference lowest levels. Such pairs describe a different reaction, instead of only a different substrate. For example, minimumEcDifference == 2 means that 1.2.3.4/1.2.3.5 is not reported, while 1.2.3.4/1.2.5.6 is.
Returns: A copy of enzymeGraph, leaving only edges with a neofunctionalised enzyme as key.
Return type: SubstanceEnzymeGraph
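The difference between colourGraph and filterGraph can be illustrated with a minimal sketch in plain Python. It does not use FEV_KEGG: the edge representation (tuples of substrate, product, enzyme key), the attribute name "colour", and the helpers colour_edges and filter_edges are assumptions made only for illustration.

```python
# Hypothetical stand-in for a substance-enzyme graph: each edge is
# (substrate, product, enzyme_key). IDs are illustrative only.
edges = [
    ("C00001", "C00002", "eco:b0001"),
    ("C00002", "C00003", "eco:b0002"),
    ("C00003", "C00004", "eco:b0003"),
]
neofunctionalised = {"eco:b0002", "eco:b0003"}


def colour_edges(edges, keys, colour="#55FF55"):
    """Keep every edge; edges whose key is in `keys` get a colour attribute."""
    return {
        edge: ({"colour": colour} if edge[2] in keys else {})
        for edge in edges
    }


def filter_edges(edges, keys):
    """Return a new list with only the edges whose key is in `keys`."""
    return [edge for edge in edges if edge[2] in keys]


coloured = colour_edges(edges, neofunctionalised)
filtered = filter_edges(edges, neofunctionalised)
```

In both cases the input is left untouched and a new object is returned, mirroring the "copy of enzymeGraph" wording used for the two methods.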
-
getEnzymes(minimumEcDifference: int = None) → Set[FEV_KEGG.Graph.Elements.Enzyme]
Get all neofunctionalised enzymes.
Parameters: minimumEcDifference (int, optional) – May only be one of [1, 2, 3, 4]. If None or 1, all neofunctionalisations are returned. If > 1, return only neofunctionalisations in which the EC numbers differ in at least the minimumEcDifference lowest levels. Such pairs describe a different reaction, instead of only a different substrate. For example, minimumEcDifference == 2 means that 1.2.3.4/1.2.3.5 is not reported, while 1.2.3.4/1.2.5.6 is.
Returns: Set of all possibly neofunctionalised enzymes, regardless of the real direction of neofunctionalisation, which cannot be determined here.
Return type: Set[Enzyme]
-
getNeofunctionalisations(minimumEcDifference: int = None) → Set[FEV_KEGG.Evolution.Events.Neofunctionalisation]
Get neofunctionalisation events between two enzymes each.
Parameters: minimumEcDifference (int, optional) – May only be one of [1, 2, 3, 4]. If None or 1, all neofunctionalisations are returned. If > 1, return only neofunctionalisations in which the EC numbers differ in at least the minimumEcDifference lowest levels. Such pairs describe a different reaction, instead of only a different substrate. For example, minimumEcDifference == 2 means that 1.2.3.4/1.2.3.5 is not reported, while 1.2.3.4/1.2.5.6 is.
Returns: Set of possible neofunctionalisation events.
Return type: Set[Neofunctionalisation]
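The minimumEcDifference rule can be made concrete with a small sketch. This is not the library's implementation: the helper names ec_difference and passes are hypothetical, and the sketch assumes that EC levels are hierarchical (everything below the first differing level counts as different) and that the filter is a ">=" comparison, which is what the documented example (1.2.3.4/1.2.3.5 excluded, 1.2.3.4/1.2.5.6 included for a value of 2) suggests.

```python
def ec_difference(ec_a: str, ec_b: str) -> int:
    """Number of lowest EC levels in which two four-level EC numbers differ.

    Assumption: levels are hierarchical, so every level below the first
    differing one is counted as different too.
    """
    levels_a = ec_a.split(".")
    levels_b = ec_b.split(".")
    if len(levels_a) != 4 or len(levels_b) != 4:
        raise ValueError("expected EC numbers with exactly four levels")
    for index, (a, b) in enumerate(zip(levels_a, levels_b)):
        if a != b:
            return 4 - index
    return 0  # identical EC numbers


def passes(ec_a: str, ec_b: str, minimum_ec_difference=None) -> bool:
    """Mimic the documented filter; None behaves like 1."""
    minimum = 1 if minimum_ec_difference is None else minimum_ec_difference
    if minimum not in (1, 2, 3, 4):
        raise ValueError("minimumEcDifference must be one of 1, 2, 3, 4")
    return ec_difference(ec_a, ec_b) >= minimum
```

With these assumptions, passes("1.2.3.4", "1.2.3.5", 2) is False, while passes("1.2.3.4", "1.2.5.6", 2) is True.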
-
class
FEV_KEGG.Evolution.Events.SimpleGeneDuplication
Bases: FEV_KEGG.Evolution.Events.GeneDuplication
Evolutionary event of duplicating a gene, regardless of ancestral bonds.
- The conditions for a ‘simple’ gene duplication are:
- The gene has at least one paralog.
-
classmethod
filterEnzymes(substanceEnzymeGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEnzymeGraph, eValue=1e-15, ignoreDuplicatesOutsideSet=None, preCalculatedEnzymes=None) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEnzymeGraph
Remove all enzymes from a graph which have not been gene-duplicated.
Parameters: - substanceEnzymeGraph (SubstanceEnzymeGraph) – Graph of enzymes to be checked for gene duplication.
- eValue (float, optional) – Threshold for the statistical expectation value (E-value), below which a sequence alignment is considered significant.
- ignoreDuplicatesOutsideSet (Set[GeneID] or True, optional) – If None, report all found duplicates. If True, automatically restrict to all enzymes in substanceEnzymeGraph. If a set, count an enzyme as gene-duplicated only if at least one of its duplicates is inside this set. Beware, the set has to contain the enzymes’ gene IDs, not the enzymes themselves! This can, for example, serve to exclude duplicates in secondary metabolism.
Returns: A copy of the substanceEnzymeGraph containing only enzymes which fulfil the conditions of this gene duplication definition.
Return type: SubstanceEnzymeGraph
Raises: - ValueError – If any organism does not exist.
- HTTPError – If any gene does not exist.
- URLError – If connection to KEGG fails.
-
classmethod
getEnzymePairs(enzymes: Set[FEV_KEGG.Graph.Elements.Enzyme], eValue=1e-15, ignoreDuplicatesOutsideSet=None, geneIdToEnzyme=None, preCalculatedEnzymes=None) → Set[Tuple[FEV_KEGG.Graph.Elements.Enzyme, FEV_KEGG.Graph.Elements.Enzyme]]
Get gene-duplicated enzymes, in pairs of duplicates.
If enzyme A is a duplicate of enzyme B and vice versa, this does not return both mirrored pairs, but only one pair, with the “smaller” enzyme as the first value. An enzyme is “smaller” if its gene ID string compares as “smaller”.
Parameters: - enzymes (Set[Enzyme] or SubstanceEnzymeGraph) – Set of enzymes to be checked for gene duplication, or a graph.
- eValue (float, optional) – Threshold for the statistical expectation value (E-value), below which a sequence alignment is considered significant.
- ignoreDuplicatesOutsideSet (Set[GeneID] or True, optional) – If None, report all found duplicates. If True, automatically restrict to all enzymes in enzymes. If a set, count an enzyme as gene-duplicated only if at least one of its duplicates is inside this set. Beware, the set has to contain the enzymes’ gene IDs, not the enzymes themselves! This can, for example, serve to exclude duplicates in secondary metabolism.
- geneIdToEnzyme (Dict[GeneID, Enzyme], optional) – Dictionary for mapping each gene ID of every found duplicate to an enzyme object. If None, gets the enzyme from the database. This avoids the KeyError, but can cause a lot of network load.
Returns: Set of pairs of gene-duplicated enzymes, realised as tuples. The order of the pairs within the set is arbitrary.
Return type: Set[Tuple[Enzyme, Enzyme]]
Raises: KeyError – If geneIdToEnzyme is passed, but does not contain the gene ID of every duplicate.
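The "smaller enzyme first" rule described for getEnzymePairs can be sketched as follows. This is an illustration only, not the library's code: it works on gene ID strings instead of Enzyme objects, the helper canonical_pairs is hypothetical, the gene IDs are illustrative, and "smaller" is assumed to mean ordinary Python string comparison.

```python
def canonical_pairs(duplicates):
    """Collapse a symmetric duplicate relation into one pair per duplication.

    `duplicates` maps a gene ID to the set of gene IDs it matched.
    """
    pairs = set()
    for gene_id, matches in duplicates.items():
        for match in matches:
            if gene_id == match:
                continue  # a gene is not counted as its own duplicate
            # Plain string comparison decides which ID comes first.
            pairs.add((min(gene_id, match), max(gene_id, match)))
    return pairs


duplicates = {
    "eco:b0002": {"eco:b3940"},
    "eco:b3940": {"eco:b0002", "eco:b4024"},
    "eco:b4024": {"eco:b3940"},
}
pairs = canonical_pairs(duplicates)
```

Although four directed matches go in, only two pairs come out, each with the smaller gene ID in first position.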
-
static
getEnzymes(enzymes: Set[FEV_KEGG.Graph.Elements.Enzyme], eValue=1e-15, returnMatches=False, ignoreDuplicatesOutsideSet=None, preCalculatedEnzymes=None)
Get gene-duplicated enzymes.
Parameters: - enzymes (Set[Enzyme] or SubstanceEnzymeGraph) – Set of enzymes to be checked for gene duplication, or a graph.
- eValue (float, optional) – Threshold for the statistical expectation value (E-value), below which a sequence alignment is considered significant.
- returnMatches (bool, optional) – If True, return not only enzymes that have homologs, but also which homologs they have. Useful for filtering for relevant homologs afterwards.
- ignoreDuplicatesOutsideSet (Set[GeneID] or True, optional) – If None, report all found duplicates. If True, automatically restrict to all enzymes in enzymes. If a set, count an enzyme as gene-duplicated only if at least one of its duplicates is inside this set. Beware, the set has to contain the enzymes’ gene IDs, not the enzymes themselves! This can, for example, serve to exclude duplicates in secondary metabolism.
Returns: If returnMatches == False, all enzymes in enzymes which fulfil the conditions of this gene duplication definition. If returnMatches == True, all enzymes in enzymes which fulfil the conditions of this gene duplication definition, pointing to a set of gene IDs of the found homologs.
Return type:
Raises: - ValueError – If any organism does not exist.
- HTTPError – If any gene does not exist.
- URLError – If connection to KEGG fails.
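How returnMatches and ignoreDuplicatesOutsideSet interact can be sketched in plain Python. This is a simplified model under stated assumptions, not the library's implementation: the function duplicated and its arguments are hypothetical, genes are represented by their gene ID strings, and the sketch assumes that matches outside the allowed set are dropped from the returned matches as well.

```python
def duplicated(matches_by_gene, return_matches=False, ignore_outside=None):
    """Select genes that have at least one counted duplicate.

    matches_by_gene: dict mapping gene ID -> set of gene IDs of its homologs.
    ignore_outside:  None (count every homolog), True (count only homologs
                     that are themselves keys of matches_by_gene), or a set
                     of gene IDs (count only homologs inside that set).
    """
    if ignore_outside is True:
        allowed = set(matches_by_gene)
    else:
        allowed = ignore_outside  # None or a set of gene IDs

    result = {}
    for gene_id, matches in matches_by_gene.items():
        kept = set(matches) if allowed is None else set(matches) & allowed
        if kept:
            result[gene_id] = kept
    # Either only the genes, or the genes pointing to their counted matches.
    return result if return_matches else set(result)


matches = {
    "eco:b0002": {"eco:b3940"},
    "eco:b3940": {"eco:b0002"},
    "eco:b1000": {"eco:b9999"},  # its only match lies outside the input
    "eco:b2000": set(),          # no homolog at all
}
```

Here duplicated(matches) reports three genes, while duplicated(matches, ignore_outside=True) drops "eco:b1000", because its only duplicate is not part of the input.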
-
class
FEV_KEGG.Evolution.Events.SimpleGroupGeneDuplication(sameGroupOrganisms: Iterable[Organism] or KEGG.Organism.Group)
Bases: FEV_KEGG.Evolution.Events.GeneDuplication
Evolutionary event of duplicating a gene, regardless of ancestral bonds in the comparison group.
- The conditions for a ‘simple group’ gene duplication are:
- The gene has at least one homolog within the set of organisms its organism belongs to.
Simple group gene duplication extends simple gene duplication by expanding the term ‘paralog’ to every organism in the set of organisms the gene’s organism belongs to. In contrast to SimpleGeneDuplication, this class has to be instantiated, using the aforementioned set of organisms belonging to each other. This would usually be a FEV_KEGG.KEGG.Organism.Group of the same FEV_KEGG.Evolution.Clade.Clade.
Parameters: sameGroupOrganisms (Iterable[Organism] or Organism.Group) – Organisms which will be searched for the occurrence of homologs, i.e. are considered “semi-paralogously” related.
Variables: self.sameGroupOrganisms (Iterable[Organism]) –
Raises: ValueError – If sameGroupOrganisms is of the wrong type.
Warning
This takes much longer than SimpleGeneDuplication, because each sought gene is compared between all organisms of the same group, not only within its own organism.
-
filterEnzymes(substanceEnzymeGraph: FEV_KEGG.Graph.SubstanceGraphs.SubstanceEnzymeGraph, eValue=1e-15) → FEV_KEGG.Graph.SubstanceGraphs.SubstanceEnzymeGraph
Remove all enzymes from a graph which have not been gene-duplicated.
Parameters: - substanceEnzymeGraph (SubstanceEnzymeGraph) – Graph of enzymes to be checked for gene duplication.
- eValue (float, optional) – Threshold for the statistical expectation value (E-value), below which a sequence alignment is considered significant.
Returns: A copy of the substanceEnzymeGraph containing only enzymes which fulfil the conditions of this gene duplication definition.
Return type: SubstanceEnzymeGraph
Raises: - ValueError – If any organism does not exist.
- HTTPError – If any gene does not exist.
- URLError – If connection to KEGG fails.
-
getEnzymePairs(enzymes: Set[FEV_KEGG.Graph.Elements.Enzyme], eValue=1e-15, ignoreDuplicatesOutsideSet=None, geneIdToEnzyme=None) → Set[Tuple[FEV_KEGG.Graph.Elements.Enzyme, FEV_KEGG.Graph.Elements.Enzyme]]
Get gene-duplicated enzymes, in pairs of duplicates.
If enzymeA is a duplicate of enzymeB and vice versa, this returns symmetric duplicates of the form (enzymeA, enzymeB) and (enzymeB, enzymeA).
Parameters: - enzymes (Set[Enzyme] or SubstanceEnzymeGraph) – Set of enzymes to be checked for gene duplication, or a graph.
- eValue (float, optional) – Threshold for the statistical expectation value (E-value), below which a sequence alignment is considered significant.
- ignoreDuplicatesOutsideSet (Set[GeneID], optional) – If None, report all found duplicates. If not None, count an enzyme as gene-duplicated only if at least one of its duplicates is inside this set. This can, for example, serve to exclude duplicates in secondary metabolism.
- geneIdToEnzyme (Dict[GeneID, Enzyme], optional) – Dictionary for mapping each gene ID of every found duplicate to an enzyme object. If None, gets the enzyme from the database. This avoids the KeyError, but can cause a lot of network load.
Returns: Set of pairs of gene-duplicated enzymes, realised as tuples. The order is arbitrary, and almost every pair can be expected to appear in its mirrored form as well.
Return type: Set[Tuple[Enzyme, Enzyme]]
Raises: KeyError – If geneIdToEnzyme is passed, but does not contain the gene ID of every duplicate.
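Because this method returns both (enzymeA, enzymeB) and (enzymeB, enzymeA), callers who only need each duplication once may want to collapse the mirrored pairs. A minimal sketch, assuming the pair elements are hashable (gene ID strings stand in for Enzyme objects here, and the helper unordered_pairs and all IDs are hypothetical):

```python
def unordered_pairs(pairs):
    """Collapse (a, b) and (b, a) into a single unordered pair."""
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


symmetric = {
    ("eco:b0002", "ecs:ECs0002"),
    ("ecs:ECs0002", "eco:b0002"),
    ("eco:b0002", "eco:b3940"),
    ("eco:b3940", "eco:b0002"),
}
unique = unordered_pairs(symmetric)
```

Using frozenset avoids having to define an ordering between the two elements of a pair.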
-
getEnzymes(enzymes: Set[FEV_KEGG.Graph.Elements.Enzyme], eValue=1e-15, returnMatches=False, ignoreDuplicatesOutsideSet=None)
Get gene-duplicated enzymes.
Parameters: - enzymes (Set[Enzyme] or SubstanceEnzymeGraph) – Set of enzymes to be checked for gene duplication, or a graph.
- eValue (float, optional) – Threshold for the statistical expectation value (E-value), below which a sequence alignment is considered significant.
- returnMatches (bool, optional) – If True, return not only enzymes that have homologs, but also which homologs they have. Useful for filtering for relevant homologs afterwards.
- ignoreDuplicatesOutsideSet (Set[GeneID], optional) – If None, report all found duplicates. If not None, count an enzyme as gene-duplicated only if at least one of its duplicates is inside this set. This can, for example, serve to exclude duplicates in secondary metabolism.
Returns: If returnMatches == False, all enzymes in enzymes which fulfil the conditions of this gene duplication definition. If returnMatches == True, all enzymes in enzymes which fulfil the conditions of this gene duplication definition, pointing to a set of gene IDs of the found homologs.
Return type:
Raises: - ValueError – If any organism does not exist.
- HTTPError – If any gene does not exist.
- URLError – If connection to KEGG fails.
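The difference between the ‘simple’ and the ‘simple group’ condition can be sketched in a few lines. This is an illustration under assumptions, not the library's code: the helpers are hypothetical, the homolog sets are made up, and the organism is taken from the prefix of a KEGG gene ID of the form "organism:gene".

```python
def organism_of(gene_id: str) -> str:
    """Return the organism prefix of a gene ID such as 'eco:b0002'."""
    return gene_id.split(":", 1)[0]


def is_simple_duplicate(gene_id, homologs):
    """'Simple' condition: at least one homolog in the gene's own organism."""
    own = organism_of(gene_id)
    return any(h != gene_id and organism_of(h) == own for h in homologs)


def is_simple_group_duplicate(gene_id, homologs, group):
    """'Simple group' condition: at least one homolog in any organism
    of the group the gene's organism belongs to."""
    return any(h != gene_id and organism_of(h) in group for h in homologs)


group = {"eco", "ecs"}  # hypothetical group of organism abbreviations
```

A gene whose only homolog lies in another organism of the same group fails the ‘simple’ condition but fulfils the ‘simple group’ condition.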
-
FEV_KEGG.Evolution.Events.defaultEValue = 1e-15
Default threshold for the statistical expectation value (E-value), below which a sequence alignment is considered significant. This can be overridden via the eValue parameter of each relevant method in this module.
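As a rough illustration of how such a threshold acts, the following sketch filters a list of alignment hits. The hit list, its E-values, and the helper significant_hits are hypothetical; the strict "<" comparison is an assumption derived from the wording "below which".

```python
DEFAULT_E_VALUE = 1e-15  # mirrors the documented defaultEValue


def significant_hits(hits, e_value=DEFAULT_E_VALUE):
    """Keep only the gene IDs of hits whose E-value lies below the threshold."""
    return [gene_id for gene_id, hit_e_value in hits if hit_e_value < e_value]


# Hypothetical (gene ID, E-value) pairs, invented for this example.
hits = [
    ("eco:b3940", 1e-120),
    ("eco:b4024", 3e-20),
    ("eco:b0003", 2e-4),
]
```

Passing a smaller e_value makes the filter stricter, a larger one more permissive.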