FEV_KEGG.KEGG.Database module
-
exception FEV_KEGG.KEGG.Database.GeneDoesNotExistError
Bases: ValueError
Raised if trying to download a certain gene that does not exist.
-
exception FEV_KEGG.KEGG.Database.ImpossiblyOrthologousError
Bases: ValueError
Raised if trying to find orthologs in an organism using a GeneID from the very same organism.
-
exception FEV_KEGG.KEGG.Database.NoKnownPathwaysError
Bases: ValueError
Raised if an organism has no known pathways and is therefore rather useless.
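Because all three exceptions derive from ValueError, a generic `except ValueError` handler catches any of them, and more specific handlers have to come first. A minimal sketch of that ordering, using locally defined stand-in classes (they only mirror the documented hierarchy and are not imported from FEV_KEGG) and a hypothetical helper `classify`:

```python
# Stand-ins mirroring the documented hierarchy. The real classes live in
# FEV_KEGG.KEGG.Database; these local copies only make the sketch runnable.
class GeneDoesNotExistError(ValueError):
    pass


class ImpossiblyOrthologousError(ValueError):
    pass


class NoKnownPathwaysError(ValueError):
    pass


def classify(error: Exception) -> str:
    """Hypothetical helper: map an error to a short handling label."""
    try:
        raise error
    except NoKnownPathwaysError:
        # Specific subclass first, otherwise the ValueError branch wins.
        return "skip organism"
    except ValueError:
        # Catches the other two exceptions and any plain ValueError.
        return "bad input"
```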
-
FEV_KEGG.KEGG.Database._filterHomologsBySignificanceBulk(matchings: Dict[FEV_KEGG.Graph.Elements.GeneID, FEV_KEGG.KEGG.SSDB.Matching], eValue, onlyGeneID=False)
Filter sequence alignments by statistical significance.
Parameters: - matchings (Dict[GeneID, SSDB.Matching]) – Dictionary of homolog matchings, each including homologous gene IDs and statistical data, keyed by the gene ID used to search for homologs.
- eValue (float) – Statistical expectation value (E-value), below which a sequence alignment is considered significant.
- onlyGeneID (bool, optional) – If True, return only the set of homologous gene IDs, not the whole matching including statistical data.
Returns: matchings reduced to the significant sequence alignments, with an E-value below eValue. If onlyGeneID == True, matchings is further reduced to only contain the homologous gene IDs, not the complete matching.
Return type: Dict[GeneID, SSDB.Matching] or Dict[GeneID, Set[GeneID]]
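The filtering described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `Match` and `filter_by_significance` are hypothetical names, a matching is simplified to a plain list of matches, gene IDs are plain strings, and the strict `<` comparison follows the "below eValue" wording of this entry.

```python
from typing import Dict, List, NamedTuple, Set, Union


class Match(NamedTuple):
    """Hypothetical stand-in for one alignment result."""
    gene_id: str
    e_value: float


def filter_by_significance(
    matchings: Dict[str, List[Match]],
    e_value: float,
    only_gene_id: bool = False,
) -> Dict[str, Union[List[Match], Set[str]]]:
    """Keep only matches whose E-value is below the threshold."""
    result: Dict[str, Union[List[Match], Set[str]]] = {}
    for query_gene, matches in matchings.items():
        significant = [m for m in matches if m.e_value < e_value]
        if only_gene_id:
            # Drop the statistics and keep just the homologous gene IDs.
            result[query_gene] = {m.gene_id for m in significant}
        else:
            result[query_gene] = significant
    return result
```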
-
FEV_KEGG.KEGG.Database.doesOrganismExist(organismAbbreviation: eco) → bool
Check whether an organism exists.
Parameters: organismAbbreviation (str) – The abbreviation of the organism to check.
Returns: True if something was downloaded, and thus the organism exists. False if the download was empty (400 Bad Request), because this organism does not exist.
Return type: bool
Raises: URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.doesOrganismExistBulk(organismAbbreviations: List[str]) → List[str]
Check whether multiple organisms exist.
This is done in parallel in a thread pool, see FEV_KEGG.settings.downloadThreads.
Parameters: organismAbbreviations (List[str]) – The abbreviations of the organisms to check.
Returns: List of organism abbreviations, taken from organismAbbreviations, for which doesOrganismExist() would return True.
Return type: List[str]
Raises: URLError – If connection to KEGG fails.
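The thread-pool pattern mentioned above can be illustrated with the standard library. This is only a sketch of the general technique, not the module's actual code: `exists_bulk`, the `threads` default, and the offline `fake_exists` check are assumptions made so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def exists_bulk(
    abbreviations: List[str],
    exists: Callable[[str], bool],
    threads: int = 4,  # hypothetical default, stands in for a configured pool size
) -> List[str]:
    """Return the abbreviations for which `exists` is True, in input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in the same order as the inputs.
        flags = list(pool.map(exists, abbreviations))
    return [abbr for abbr, ok in zip(abbreviations, flags) if ok]


# Offline stand-in for a real per-organism check.
KNOWN = {"eco", "hsa"}


def fake_exists(abbreviation: str) -> bool:
    return abbreviation in KNOWN
```

Calling `exists_bulk(["eco", "xyz", "hsa"], fake_exists)` keeps only the abbreviations that the check accepts.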
-
FEV_KEGG.KEGG.Database.getEcEnzymeBulk(ecNumbers: Iterable[FEV_KEGG.Graph.Elements.EcNumber]) → Dict[str, FEV_KEGG.KEGG.DataTypes.EcEnzyme]
Get multiple enzyme descriptions, each defined by its EC number.
Downloads the data from KEGG in bulk, if not already present on disk. This is done in parallel in a thread pool, see FEV_KEGG.settings.downloadThreads.
Parameters: ecNumbers (Iterable[EcNumber]) – Enzymes to be downloaded.
Returns: Each found enzyme, keyed by the unique ID of the EC number used to search it.
Return type: Dict[str, EcEnzyme]
Raises: - IOError – If result is too small. Possibly because none of the genes of a download-chunk existed.
- URLError – If connection to KEGG fails.
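The IOError description refers to download-chunks. How the library splits its requests is not documented here, so the following is only a generic sketch of chunking an iterable of IDs; the function name `chunks` and the chunk size are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunks(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Each chunk could then be requested separately, which is why a chunk in which no requested entry exists can yield a suspiciously small result.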
-
FEV_KEGG.KEGG.Database.getEnzymeEcNumbers(enzymeAbbreviation: MiaB) → List[str]
Get the EC numbers of an enzyme from the enzyme’s abbreviation.
Also works for everything else in the description of an enzyme, not just the abbreviation.
Parameters: enzymeAbbreviation (str) – Part of the enzyme’s description string.
Returns: All EC numbers, as strings, for a given enzyme, identified by its abbreviation, from KEGG. Or None if no EC numbers could be found.
Return type: List[str] or None
Raises: URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getGene(geneIdString: eco:b0004) → FEV_KEGG.KEGG.DataTypes.Gene
Get a certain gene.
Downloads the data from KEGG, if not already present on disk.
Parameters: geneIdString (str) – Unique ID of the gene to be downloaded, represented as a string, including organism abbreviation and gene name, e.g. ‘eco:b0004’.
Returns: Gene object.
Return type: Gene
Raises: - HTTPError – If gene does not exist.
- URLError – If connection to KEGG fails.
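The gene ID string combines an organism abbreviation and a gene name, separated by a colon. A small sketch of splitting and validating such a string before passing it on; `split_gene_id` is a hypothetical helper and not part of FEV_KEGG.

```python
from typing import Tuple


def split_gene_id(gene_id_string: str) -> Tuple[str, str]:
    """Split '<organism>:<gene name>' into its two parts."""
    organism, separator, gene_name = gene_id_string.partition(":")
    if not separator or not organism or not gene_name:
        raise ValueError(
            f"expected '<organism>:<gene name>', got {gene_id_string!r}"
        )
    return organism, gene_name
```

For the example from the parameter description, `split_gene_id("eco:b0004")` yields the organism abbreviation `eco` and the gene name `b0004`.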
-
FEV_KEGG.KEGG.Database.getGeneBulk(geneIDs: Iterable[FEV_KEGG.Graph.Elements.GeneID]) → Dict[FEV_KEGG.Graph.Elements.GeneID, FEV_KEGG.KEGG.DataTypes.Gene]
Get multiple genes.
Downloads the data from KEGG in bulk, if not already present on disk. This is done in parallel in a thread pool, see FEV_KEGG.settings.downloadThreads.
Parameters: geneIDs (Iterable[GeneID]) – Unique IDs of the genes to be downloaded, represented as FEV_KEGG.Graph.Elements.GeneID objects.
Returns: Each found Gene object, keyed by the GeneID used to search it.
Return type: Dict[GeneID, Gene]
Raises: - IOError – If result is too small. Possibly because none of the genes of a download-chunk existed.
- URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getOrganismInfo(organismAbbreviation: eco, checkExpiration=False) → str
Get organism info.
Parameters: - organismAbbreviation (str) – The abbreviation of the organism.
- checkExpiration (bool, optional) – If True, check whether the last download of the organism info is older than FEV_KEGG.settings.organismInfoExpiration. If yes, download it again. This can be useful when relying upon a current database size for calculating E-values for a FEV_KEGG.KEGG.SSDB.Match.
Returns: Raw organism info.
Return type: str
Raises: - ValueError – If organism with organismAbbreviation does not exist.
- URLError – If connection to KEGG fails.
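The expiration check behind checkExpiration amounts to comparing the age of the last download with a maximum age. The sketch below is an assumption-laden illustration: `is_expired` is a hypothetical helper, timestamps are taken to be seconds since the epoch, and the unit of the real FEV_KEGG.settings.organismInfoExpiration value is not stated in this section.

```python
import time
from typing import Optional


def is_expired(
    downloaded_at: float,
    max_age_seconds: float,
    now: Optional[float] = None,
) -> bool:
    """Return True if the download is older than the allowed maximum age."""
    if now is None:
        now = time.time()
    return now - downloaded_at > max_age_seconds
```

A caller would re-download the organism info only when this returns True, and otherwise reuse the copy on disk.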
-
FEV_KEGG.KEGG.Database.getOrganismList() → List[str]
Get list of all organisms known to KEGG.
Returns: All organism descriptions known to KEGG.
Return type: List[str]
Raises: URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getOrthologOverviewsBulk(geneIDs: Iterable[FEV_KEGG.Graph.Elements.GeneID]) → Dict[FEV_KEGG.Graph.Elements.GeneID, FEV_KEGG.KEGG.SSDB.MatchingOverview]
Get best orthologous matches for genes in all organisms in bulk.
This is done in parallel in a thread pool, see FEV_KEGG.settings.downloadThreads. If orthologs exist in a certain organism, you can use getOrthologsBulk() in a second step to get all orthologs in that organism, not only the best match. Filtering the number of possibly orthologous organisms with this function before using the aforementioned function is much faster in total. However, using this function with only a single organism in mind is not.
Parameters: geneIDs (Iterable[GeneID]) – Genes to use for searching orthologs.
Returns: A dictionary of matching overviews, each containing the best FEV_KEGG.KEGG.SSDB.Match for each possibly orthologous organism, using each gene ID from geneIDs, searching the genomes of all organisms, keyed by the used gene ID. Matching overviews are downloaded from KEGG SSDB.
Return type: Dict[GeneID, MatchingOverview]
Raises: - ValueError – If any organism does not exist.
- URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getOrthologs(geneID: FEV_KEGG.Graph.Elements.GeneID, comparisonOrganism: Organism or str, eValue: float = 1e-15) → FEV_KEGG.KEGG.SSDB.Matching
Get orthologs for a gene in a certain organism, including metadata.
Parameters: - geneID (GeneID) – Gene to use for searching orthologs.
- comparisonOrganism (Organism or str) – Organism to check for orthologs. May be an Organism object or an organism abbreviation string.
- eValue (float, optional) – Statistical expectation value (E-value), below which a sequence alignment is considered significant.
Returns: A matching of orthologs for gene geneID, searching the genome of comparisonOrganism. Only matches with an E-value smaller than or equal to eValue are returned. Matches are downloaded from KEGG SSDB.
Return type: SSDB.Matching
Raises: - ImpossiblyOrthologousError – If geneID is from comparisonOrganism.
- ValueError – If any organism does not exist.
- URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getOrthologsBulk(geneIDs: Iterable[FEV_KEGG.Graph.Elements.GeneID], comparisonOrganism: Iterable[Organism] or Iterable[str] or Organism or str, eValue: float = 1e-15, ignoreImpossiblyOrthologous=False) → Dict[FEV_KEGG.Graph.Elements.GeneID, List[FEV_KEGG.KEGG.SSDB.Matching]]
Get orthologs for genes in a certain organism in bulk, including metadata.
This is done in parallel in a thread pool, see FEV_KEGG.settings.downloadThreads.
Parameters: - geneIDs (Iterable[GeneID]) – Genes to use for searching orthologs.
- comparisonOrganism (Iterable[Organism] or Iterable[str] or Organism or str) – Organism(s) to check for orthologs. May be an organism abbreviation string.
- eValue (float, optional) – Statistical expectation value (E-value), below which a sequence alignment is considered significant.
- ignoreImpossiblyOrthologous (bool, optional) – If True, ignore if a searched gene is from any comparisonOrganism. Simply do not search for this particular gene in its own organism, but in all others from comparisonOrganism.
Returns: A dictionary of lists of matchings of orthologous genes, using each gene ID from geneIDs, searching the genome of each comparisonOrganism, keyed by the used gene ID. Only matches with an E-value smaller than or equal to eValue are returned. Matches are downloaded from KEGG SSDB.
Return type: Dict[GeneID, List[SSDB.Matching]]
Raises: - ImpossiblyOrthologousError – If any gene ID in geneIDs is from comparisonOrganism, unless ignoreImpossiblyOrthologous == True.
- ValueError – If any organism does not exist.
- URLError – If connection to KEGG fails.
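The effect of ignoreImpossiblyOrthologous can be pictured as deciding which (gene, organism) pairs are searched at all. The following sketch only models that decision with plain strings; `plan_searches` and the local exception class are hypothetical stand-ins, and nothing here performs a real SSDB query.

```python
from typing import Iterable, List, Tuple


class ImpossiblyOrthologousError(ValueError):
    """Local stand-in for the exception documented in this module."""


def plan_searches(
    gene_ids: Iterable[str],
    comparison_organisms: Iterable[str],
    ignore_impossibly_orthologous: bool = False,
) -> List[Tuple[str, str]]:
    """List the (gene ID, organism) pairs that would be searched."""
    organisms = list(comparison_organisms)
    pairs: List[Tuple[str, str]] = []
    for gene_id in gene_ids:
        own_organism = gene_id.split(":", 1)[0]
        for organism in organisms:
            if organism == own_organism:
                if ignore_impossibly_orthologous:
                    continue  # skip the gene's own organism, keep the others
                raise ImpossiblyOrthologousError(
                    f"{gene_id} belongs to comparison organism {organism}"
                )
            pairs.append((gene_id, organism))
    return pairs
```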
-
FEV_KEGG.KEGG.Database.getOrthologsOnlyGeneID(geneID: FEV_KEGG.Graph.Elements.GeneID, comparisonOrganism: Organism or str, eValue: float = 1e-15) → Set[FEV_KEGG.Graph.Elements.GeneID]
Get orthologs for a gene in a certain organism, without metadata.
Parameters: - geneID (GeneID) – Gene to use for searching orthologs.
- comparisonOrganism (Organism or str) – Organism to check for orthologs. May be an Organism object or an organism abbreviation string.
- eValue (float, optional) – Statistical expectation value (E-value), below which a sequence alignment is considered significant.
Returns: Set of orthologous genes, using geneID to search the genome of comparisonOrganism. Only matches with an E-value smaller than or equal to eValue are returned. Matches are downloaded from KEGG SSDB.
Return type: Set[GeneID]
Raises: - ImpossiblyOrthologousError – If geneID is from comparisonOrganism.
- ValueError – If any organism does not exist.
- URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getParalogs(geneID: FEV_KEGG.Graph.Elements.GeneID, eValue: float = 1e-15) → FEV_KEGG.KEGG.SSDB.Matching
Get paralogs for a gene, including metadata.
Parameters: - geneID (GeneID) – Gene to use for searching paralogs.
- eValue (float, optional) – Statistical expectation value (E-value), below which a sequence alignment is considered significant.
Returns: A matching of paralogous genes, using geneID to search the genome of the same organism. Only matches with an E-value smaller than or equal to eValue are returned. Matches are downloaded from KEGG SSDB.
Return type: SSDB.Matching
Raises: - ValueError – If any organism does not exist.
- URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getParalogsBulk(geneIDs: Iterable[FEV_KEGG.Graph.Elements.GeneID], eValue: float = 1e-15) → Dict[FEV_KEGG.Graph.Elements.GeneID, FEV_KEGG.KEGG.SSDB.Matching]
Get paralogs for genes in bulk, including metadata.
This is done in parallel in a thread pool, see FEV_KEGG.settings.downloadThreads.
Parameters: - geneIDs (Iterable[GeneID]) – Genes to use for searching paralogs.
- eValue (float, optional) – Statistical expectation value (E-value), below which a sequence alignment is considered significant.
Returns: A dictionary of matchings of paralogous genes, using each gene ID from geneIDs to search the genome of the same organism, keyed by the used gene ID. Only matches with an E-value smaller than or equal to eValue are returned. Matches are downloaded from KEGG SSDB.
Return type: Dict[GeneID, SSDB.Matching]
Raises: - ValueError – If any organism does not exist.
- URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getParalogsOnlyGeneID(geneID: FEV_KEGG.Graph.Elements.GeneID, eValue: float = 1e-15) → Set[FEV_KEGG.Graph.Elements.GeneID]
Get paralogs for a gene, without metadata.
Parameters: - geneID (GeneID) – Gene to use for searching paralogs.
- eValue (float, optional) – Statistical expectation value (E-value), below which a sequence alignment is considered significant.
Returns: Set of paralogous genes, using geneID to search the genome of the same organism. Only matches with an E-value smaller than or equal to eValue are returned. Matches are downloaded from KEGG SSDB.
Return type: Set[GeneID]
Raises: - ValueError – If any organism does not exist.
- URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getPathway(organismAbbreviation: eco, pathwayName: 00260) → FEV_KEGG.lib.Biopython.KEGG.KGML.KGML_pathway.Pathway
Get a certain pathway object of an organism.
Downloads the data from KEGG, if not already present on disk.
Parameters: - organismAbbreviation (str) – The organism for which to retrieve the pathway.
- pathwayName (str) – The code of the pathway, e.g. ‘00260’.
Returns: Pathway object. None if pathway does not exist.
Return type: KGML_pathway.Pathway or None
Raises: URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getPathwayBulk(organismAbbreviation: eco, pathwayNames: Iterable[str]) → Dict[str, FEV_KEGG.lib.Biopython.KEGG.KGML.KGML_pathway.Pathway]
Get multiple pathway objects of an organism.
Downloads the data from KEGG in bulk, if not already present on disk. This is done in parallel in a thread pool, see FEV_KEGG.settings.downloadThreads.
Parameters: - organismAbbreviation (str) – The organism for which to retrieve the pathway.
- pathwayNames (Iterable[str]) – The codes of the pathways, e.g. [‘00260’, ‘00530’].
Returns: Pathway objects, keyed by their respective pathway name. A pathway object is None if the pathway does not exist.
Return type: Dict[str, KGML_pathway.Pathway]
Raises: URLError – If connection to KEGG fails.
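Since a pathway object in the returned dictionary is None when the pathway does not exist, callers usually need to separate found from missing entries before using them. A minimal sketch with a hypothetical helper `split_found_missing`; plain `object()` values stand in for real pathway objects.

```python
from typing import Dict, List, Optional, Tuple


def split_found_missing(
    pathways: Dict[str, Optional[object]],
) -> Tuple[Dict[str, object], List[str]]:
    """Separate existing pathway objects from names that mapped to None."""
    found = {name: p for name, p in pathways.items() if p is not None}
    missing = sorted(name for name, p in pathways.items() if p is None)
    return found, missing
```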
-
FEV_KEGG.KEGG.Database.getPathwayDescriptions(organismAbbreviation: eco) → Set[str]
Get full pathway descriptions for an organism.
Downloads the data from KEGG, if not already present on disk.
Parameters: organismAbbreviation (str) – The organism for which to retrieve all known pathways.
Returns: Set of pathway description lines for given organism.
Return type: Set[str]
Raises: - NoKnownPathwaysError – If the organism has no known pathways.
- URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getPathwayGeneIDs(organismAbbreviation: eco, pathwayName: 00260) → Set[str]
Get all gene ID strings in an organism’s pathway, if previously saved.
Parameters: - organismAbbreviation (str) – The organism for which to retrieve the pathway.
- pathwayName (str) – The code of the pathway, e.g. ‘00260’.
Returns: Gene ID strings from a pathway, or None if not previously saved on disk.
Return type: Set[str] or None
Note
This requires a previous call to setPathwayGeneIDs()!
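The save-then-read contract described in the note can be illustrated with an in-memory stand-in. The real functions persist to disk; `PathwayGeneIdStore` below is a hypothetical class that keeps everything in a dictionary, purely to show why reading before saving yields None.

```python
from typing import Dict, Optional, Set, Tuple


class PathwayGeneIdStore:
    """In-memory stand-in for the on-disk storage of pathway gene IDs."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Set[str]] = {}

    def set_gene_ids(self, organism: str, pathway: str, gene_ids: Set[str]) -> None:
        # Copy the set so later changes by the caller do not leak in.
        self._store[(organism, pathway)] = set(gene_ids)

    def get_gene_ids(self, organism: str, pathway: str) -> Optional[Set[str]]:
        # None signals that nothing was saved for this organism and pathway.
        return self._store.get((organism, pathway))
```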
-
FEV_KEGG.KEGG.Database.getSubstanceBulk(substances: Iterable[FEV_KEGG.Graph.Elements.SubstanceID]) → Dict[str, FEV_KEGG.KEGG.DataTypes.Substance]
Get multiple substance descriptions.
Downloads the data from KEGG in bulk, if not already present on disk. This is done in parallel in a thread pool, see FEV_KEGG.settings.downloadThreads.
Parameters: substances (Iterable[SubstanceID]) – Substances to be downloaded.
Returns: Each found substance, keyed by the unique ID of the substance used to search it.
Return type: Dict[str, Substance]
Raises: - IOError – If result is too small. Possibly because none of the genes of a download-chunk existed.
- URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getTaxonomyKEGG() → List[str]
Get KEGG taxonomy from KEGG BRITE.
Returns: Taxonomy of organisms in KEGG, in special text format, following KEGG’s own scheme, line by line.
Return type: List[str]
Raises: URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.getTaxonomyNCBI() → List[str]
Get NCBI taxonomy from KEGG BRITE.
Returns: Taxonomy of organisms in KEGG, in special text format, following the NCBI scheme, line by line.
Return type: List[str]
Raises: URLError – If connection to KEGG fails.
-
FEV_KEGG.KEGG.Database.hasOrthologsBulk(geneIDs: Iterable[FEV_KEGG.Graph.Elements.GeneID], comparisonOrganisms: Iterable[Organism] or Iterable[str], eValue: float = 1e-15) → Dict[FEV_KEGG.Graph.Elements.GeneID, List[str]]
Find out whether orthologs for genes exist in certain organisms in bulk.
This is done in parallel in a thread pool, see FEV_KEGG.settings.downloadThreads. If orthologs exist in a certain organism, you can use getOrthologsBulk() in a second step to get all orthologs in that organism. Filtering the number of possibly orthologous organisms with this function before using the aforementioned function is much faster in total. However, using this function with only a single organism in comparisonOrganisms is not. If you want to find only the best matches in every organism, use getOrthologOverviewsBulk() instead.
Parameters: - geneIDs (Iterable[GeneID]) – Genes to use for searching orthologs.
- comparisonOrganisms (Iterable[Organism] or Iterable[str]) – Organisms to check for orthologs. May be given as organism abbreviation strings.
- eValue (float, optional) – Statistical expectation value (E-value), below which a sequence alignment is considered significant.
Returns: A dictionary of lists of organisms which have at least one orthologous gene, using each gene ID from geneIDs, searching the genome of each organism in comparisonOrganisms, keyed by the used gene ID. Only organisms with matches with an E-value smaller than or equal to eValue are returned. Matching overviews are downloaded from KEGG SSDB.
Return type: Dict[GeneID, List[str]]
Raises: - ValueError – If any organism does not exist.
- URLError – If connection to KEGG fails.
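For the two-step approach described above, the per-gene result of the first step has to be regrouped by organism before the detailed search is run only where orthologs were reported. A sketch of that regrouping, assuming gene IDs and organisms are plain strings; `group_by_organism` is a hypothetical helper and the sample values are illustrative only.

```python
from typing import Dict, List


def group_by_organism(has_orthologs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert {gene ID: [organisms]} into {organism: [gene IDs]}."""
    by_organism: Dict[str, List[str]] = {}
    for gene_id, organisms in has_orthologs.items():
        for organism in organisms:
            by_organism.setdefault(organism, []).append(gene_id)
    return by_organism


# Illustrative first-step result: which organisms reported an ortholog per gene.
first_step = {"eco:b0004": ["sfl", "stm"], "eco:b0002": ["sfl"]}
second_step_plan = group_by_organism(first_step)
```

Each entry of `second_step_plan` then names one organism and only the genes worth searching there in detail.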
-
FEV_KEGG.KEGG.Database.setPathwayGeneIDs(organismAbbreviation: eco, pathwayName: 00260, geneIDs: Set[str])
Save all gene ID strings in an organism’s pathway.
Parameters: - organismAbbreviation (str) – The organism for which to retrieve the pathway.
- pathwayName (str) – The code of the pathway, e.g. ‘00260’.
- geneIDs (Set[str]) – Gene ID strings of the specified organism-specific pathway.