FEV_KEGG.Evolution.Taxonomy module¶
-
class
FEV_KEGG.Evolution.Taxonomy.KEGG(rawLines, isNCBI)[source]¶ Bases:
FEV_KEGG.Evolution.Taxonomy.Taxonomy
The taxonomy of organisms in KEGG, following KEGG’s own scheme: http://www.kegg.jp/kegg-bin/get_htext?br08601.keg
Generic taxonomy of organisms in KEGG.
Parameters: - rawLines (List[str]) – List of lines making up the raw data of a known taxonomy, either NCBI or KEGG.
- isNCBI (bool) – If True, rawLines is parsed as NCBI taxonomy. If False, rawLines is parsed as KEGG taxonomy.
Variables: - self.indexOnAbbreviation (Dict[str, anytree.node.node.Node]) – Index to find the anytree.node.node.Node for an organism abbreviation, with .type == TaxonType.ORGANISM.
- self.tree (anytree.node.node.Node) – The root node of the taxonomy, with .type == TaxonType.ROOT.
-
class
FEV_KEGG.Evolution.Taxonomy.NCBI(rawLines, isNCBI)[source]¶ Bases:
FEV_KEGG.Evolution.Taxonomy.Taxonomy
The taxonomy of organisms in KEGG, following the NCBI scheme: http://www.kegg.jp/kegg-bin/get_htext?br08610.keg
Generic taxonomy of organisms in KEGG.
Parameters: - rawLines (List[str]) – List of lines making up the raw data of a known taxonomy, either NCBI or KEGG.
- isNCBI (bool) – If True, rawLines is parsed as NCBI taxonomy. If False, rawLines is parsed as KEGG taxonomy.
Variables: - self.indexOnAbbreviation (Dict[str, anytree.node.node.Node]) – Index to find the anytree.node.node.Node for an organism abbreviation, with .type == TaxonType.ORGANISM.
- self.tree (anytree.node.node.Node) – The root node of the taxonomy, with .type == TaxonType.ROOT.
-
class
FEV_KEGG.Evolution.Taxonomy.TaxonType[source]¶ Bases:
enum.Enum
Type of a taxon.
-
ORGANISM= 1¶ Organism taxon, i.e. a leaf with a unique sequenced genome.
-
OTHER= 3¶ Other taxon, i.e. any other taxonomic rank in between.
-
ROOT= 0¶ Root taxon, i.e. ‘/’.
-
SPECIES= 2¶ Species taxon, e.g. ‘Escherichia coli’.
-
-
class
FEV_KEGG.Evolution.Taxonomy.Taxonomy(rawLines, isNCBI)[source]¶ Bases:
object
Generic taxonomy of organisms in KEGG.
Parameters: - rawLines (List[str]) – List of lines making up the raw data of a known taxonomy, either NCBI or KEGG.
- isNCBI (bool) – If True, rawLines is parsed as NCBI taxonomy. If False, rawLines is parsed as KEGG taxonomy.
Variables: - self.indexOnAbbreviation (Dict[str, anytree.node.node.Node]) – Index to find the anytree.node.node.Node for an organism abbreviation, with .type == TaxonType.ORGANISM.
- self.tree (anytree.node.node.Node) – The root node of the taxonomy, with .type == TaxonType.ROOT.
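The relationship between the tree and the abbreviation index can be pictured with a small self-contained sketch. It does not use FEV_KEGG or anytree: `TaxonNode` and `build_index` are hypothetical stand-ins, the sample taxa are illustrative, and how the real class parses rawLines is not shown here.

```python
from enum import Enum


class TaxonType(Enum):
    # Values mirror the TaxonType enum documented on this page.
    ROOT = 0
    ORGANISM = 1
    SPECIES = 2
    OTHER = 3


class TaxonNode:
    """Hypothetical stand-in for anytree.node.node.Node."""

    def __init__(self, name, type, parent=None, abbreviation=None):
        self.name = name
        self.type = type
        self.parent = parent
        self.abbreviation = abbreviation
        self.children = []
        if parent is not None:
            parent.children.append(self)


def build_index(root):
    """Map each organism abbreviation to its node (depth-first walk)."""
    index = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.type is TaxonType.ORGANISM:
            index[node.abbreviation] = node
        stack.extend(node.children)
    return index


# Illustrative sample data, not taken from a real KEGG listing.
tree = TaxonNode("/", TaxonType.ROOT)
bacteria = TaxonNode("Bacteria", TaxonType.OTHER, parent=tree)
species = TaxonNode("Escherichia coli", TaxonType.SPECIES, parent=bacteria)
TaxonNode("Escherichia coli K-12 MG1655", TaxonType.ORGANISM,
          parent=species, abbreviation="eco")

indexOnAbbreviation = build_index(tree)
print(indexOnAbbreviation["eco"].name)
```

In this sketch the index holds only leaves of type ORGANISM, which is what the variable description above suggests for the real attribute.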
-
getOrganismAbbreviations(nodes: Iterable[anytree.node.node.Node]) → List[str][source]¶ Get abbreviations of organisms for organism taxon nodes.
Parameters: nodes (List[anytree.node.node.Node]) – List of organism taxon nodes. These nodes are not traversed to find child nodes!
Returns: List of organism abbreviations from the nodes passed. None if no TaxonType.ORGANISM node was passed.
Return type: List[str]
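The two documented edge cases (no traversal into children, and None instead of an empty list) might look like this in a minimal sketch. The function `get_organism_abbreviations` and the `SimpleNamespace` nodes are hypothetical stand-ins, not the library's code.

```python
from enum import Enum
from types import SimpleNamespace


class TaxonType(Enum):
    ROOT = 0
    ORGANISM = 1
    SPECIES = 2
    OTHER = 3


def get_organism_abbreviations(nodes):
    """Collect abbreviations of ORGANISM nodes only; children are never visited."""
    abbreviations = [n.abbreviation for n in nodes if n.type is TaxonType.ORGANISM]
    # Assumption taken from the description above: None signals "nothing found".
    return abbreviations if abbreviations else None


# Illustrative nodes; the species node has an organism child that is ignored.
eco = SimpleNamespace(type=TaxonType.ORGANISM, abbreviation="eco", children=[])
species = SimpleNamespace(type=TaxonType.SPECIES, abbreviation=None, children=[eco])

print(get_organism_abbreviations([eco, species]))
print(get_organism_abbreviations([species]))
```

Callers therefore need to check for None before iterating over the result.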
-
getOrganismAbbreviationsByName(name: Escherichia, oneOrganismPerSpecies=True) → List[str][source]¶ Get abbreviations of organisms by a part of their name.
Parameters: - name (str) – Part of the name of the desired organism taxons. The name may be abbreviated, e.g. ‘Escherichia’ will match ‘Escherichia coli K-12 MG1655’.
- oneOrganismPerSpecies (bool, optional) – If True, return only the first organism node of each species node.
Returns: List of organism abbreviations containing name in their name attribute. None if none found.
Return type: List[str]
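The effect of oneOrganismPerSpecies can be illustrated with a short sketch over plain tuples. Everything here is an assumption for illustration: the function name, the sample data, and in particular the choice to keep the first matching organism of each species, which is one reading of the parameter description.

```python
def abbreviations_by_name(organisms, name, one_organism_per_species=True):
    """organisms: iterable of (species, organism_name, abbreviation) tuples."""
    seen_species = set()
    result = []
    for species, organism_name, abbreviation in organisms:
        if name not in organism_name:  # plain substring match
            continue
        if one_organism_per_species:
            if species in seen_species:
                continue  # a matching organism of this species was already kept
            seen_species.add(species)
        result.append(abbreviation)
    return result or None  # None instead of an empty list, as documented


# Illustrative sample data.
ORGANISMS = [
    ("Escherichia coli", "Escherichia coli K-12 MG1655", "eco"),
    ("Escherichia coli", "Escherichia coli K-12 W3110", "ecj"),
    ("Escherichia fergusonii", "Escherichia fergusonii", "efe"),
    ("Salmonella enterica", "Salmonella enterica LT2", "stm"),
]

print(abbreviations_by_name(ORGANISMS, "Escherichia"))
print(abbreviations_by_name(ORGANISMS, "Escherichia", one_organism_per_species=False))
```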
-
getOrganismAbbreviationsByPath(path: Gammaproteobacteria/Enterobacterales, exceptPaths: List[Gammaproteobacteria/unclassified] = None, oneOrganismPerSpecies=True) → List[str][source]¶ Get abbreviations of organisms by a part of their path.
Parameters: - path (str) – Part of the path of the desired organism taxons. The parts of the path specified here have to match the wording of the path nodes exactly, e.g. ‘Enterobac’ will not match ‘Enterobacterales’.
- exceptPaths (Iterable[str] or str) – Paths which match any of these will not be returned. Accepts iterables of exceptions or a single string exception.
- oneOrganismPerSpecies (bool, optional) – If True, return only the first organism node of each species node.
Returns: List of organism abbreviations from the organism taxon nodes found at the end of path. None if no path leading to a TaxonType.ORGANISM node was passed.
Return type: List[str]
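Exact matching of path elements, combined with exceptPaths, can be sketched as follows. This is not the library's implementation: `path_matches` and `filter_paths` are hypothetical helpers, the paths are illustrative, and treating a query as a contiguous run of whole path elements is an assumption about how partial paths are compared.

```python
def path_matches(full_path, query):
    """True if query's elements occur as a contiguous run of whole elements."""
    full = full_path.split("/")
    parts = query.split("/")
    n = len(parts)
    return any(full[i:i + n] == parts for i in range(len(full) - n + 1))


def filter_paths(paths, path, except_paths=None):
    """Keep paths matching `path` unless they also match an exception."""
    if except_paths is None:
        except_paths = []
    elif isinstance(except_paths, str):
        except_paths = [except_paths]  # a single string exception is accepted
    result = [
        p for p in paths
        if path_matches(p, path)
        and not any(path_matches(p, e) for e in except_paths)
    ]
    return result or None


# Illustrative full paths of organism nodes.
PATHS = [
    "Bacteria/Gammaproteobacteria/Enterobacterales/Escherichia coli K-12 MG1655",
    "Bacteria/Gammaproteobacteria/unclassified/Some organism",
    "Bacteria/Gammaproteobacteria/Vibrionales/Vibrio cholerae",
]

print(filter_paths(PATHS, "Gammaproteobacteria",
                   except_paths="Gammaproteobacteria/unclassified"))
print(filter_paths(PATHS, "Enterobac"))
```

Comparing whole elements rather than substrings is what keeps ‘Enterobac’ from matching ‘Enterobacterales’ in this sketch.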
-
getOrganismNodeByAbbreviation(abbreviation: eco) → anytree.node.node.Node[source]¶ Get node for an organism by its abbreviation.
Parameters: abbreviation (str) – Abbreviation of the organism in KEGG.
Returns: Node of .type == TaxonType.ORGANISM with .abbreviation == abbreviation. None if none can be found.
Return type: anytree.node.node.Node
-
getOrganismNodesByName(name: Escherichia, oneOrganismPerSpecies=True) → List[anytree.node.node.Node][source]¶ Get nodes for organisms by a part of their name.
Parameters: - name (str) – Part of the name of the desired organism taxons. This does not search parts of the path! The name may be abbreviated, e.g. ‘Escherichia’ will match ‘Escherichia coli K-12 MG1655’.
- oneOrganismPerSpecies (bool, optional) – If True, return only the first organism node of each species node.
Returns: List of organism nodes containing name in their name attribute. None if none found.
Return type: List[anytree.node.node.Node]
-
getOrganismNodesByPath(path: Gammaproteobacteria/Enterobacterales, exceptPaths: List[Gammaproteobacteria/unclassified] = None, oneOrganismPerSpecies=True) → List[anytree.node.node.Node][source]¶ Get nodes for organisms by a part of their path.
Parameters: - path (str) – Part of the path of the desired organism taxons. The parts of the path specified here have to match the wording of the path nodes exactly, e.g. ‘Enterobac’ will not match ‘Enterobacterales’.
- exceptPaths (Iterable[str] or str) – Paths which match any of these will not be returned. Accepts iterables of exceptions or a single string exception.
- oneOrganismPerSpecies (bool, optional) – If True, return only the first organism node of each species node.
Returns: List of organism nodes containing path in their path. None if none found.
Return type: List[anytree.node.node.Node]
-
static
nodePath2String(node: anytree.node.node.Node) → str[source]¶ Parameters: node (anytree.node.node.Node) – Node whose path is to be expressed as a string.
Returns: Full path of node, expressed as string. Each taxon level is delimited by a slash (‘/’).
Return type: str
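Turning a node into a slash-delimited path amounts to walking up the parent links. The sketch below uses a hypothetical `TaxonNode` class and `node_path_to_string` function; it leaves out the root node's own name, which is an assumption, since the exact output format of the real method is not stated here.

```python
class TaxonNode:
    """Hypothetical stand-in for anytree.node.node.Node (name and parent only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def node_path_to_string(node):
    """Join the names from just below the root down to `node` with '/'."""
    names = []
    current = node
    # Stop at the root (the node without a parent) and do not include it.
    while current is not None and current.parent is not None:
        names.append(current.name)
        current = current.parent
    return "/".join(reversed(names))


# Illustrative sample data.
root = TaxonNode("/")
bacteria = TaxonNode("Bacteria", parent=root)
gamma = TaxonNode("Gammaproteobacteria", parent=bacteria)
organism = TaxonNode("Escherichia coli K-12 MG1655", parent=gamma)

print(node_path_to_string(organism))
```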
-
searchNodesByName(name: Escherichia, taxonType: FEV_KEGG.Evolution.Taxonomy.TaxonType = None) → List[anytree.node.node.Node][source]¶ Search taxons of a certain type by their name.
Parameters: - name (str) – Name of the taxon to be found. The name may be abbreviated, e.g. ‘Escherichia’ will match ‘Escherichia coli K-12 MG1655’.
- taxonType (TaxonType, optional) – Type of the taxons to be searched. Taxons of any other type are ignored. If None, all taxon types are searched.
Returns: All nodes containing name in their name attribute. None if none can be found. Only taxons of TaxonType taxonType are returned. If None, all taxon types are considered.
Return type: List[anytree.node.node.Node]
-
searchNodesByPath(path: Gammaproteobacteria/Enterobacterales, taxonType: FEV_KEGG.Evolution.Taxonomy.TaxonType = None, exceptPaths: list of "Gammaproteobacteria/unclassified Bacteria" etc. = None) → List[anytree.node.node.Node][source]¶ Search taxons of a certain type by their path, allowing exceptions.
Parameters: - path (str) – Part of the path of the desired organism taxons. The parts of the path specified here have to match the wording of the path nodes exactly, e.g. ‘Enterobac’ will not match ‘Enterobacterales’.
- taxonType (TaxonType, optional) – Type of the taxons to be searched. Taxons of any other type are ignored. If None, all taxon types are searched.
- exceptPaths (Iterable[str] or str) – Paths which match any of these will not be returned. Accepts iterables of exceptions or a single string exception.
Returns: All nodes containing path in their path. None if none can be found. Each path element has to be delimited by a slash (‘/’). Each path element has to match the name of the intermediate taxon exactly, e.g. ‘Enterobac’ will not match ‘Enterobacterales’.
Return type: List[anytree.node.node.Node]