FEV_KEGG.KEGG.SSDB module

This module models all intermediate stages of matches acquired via the KEGG SSDB database, partly in conjunction with KEGG GENE.

The methods to actually perform the retrieval are not part of this module. See FEV_KEGG.KEGG.Database and FEV_KEGG.KEGG.Download for these.

class FEV_KEGG.KEGG.SSDB.JSONpickable

Bases: object

__str__()

Encodes the object as JSON that can be “unpickled” again.

Returns: Object in JSON format, including the information needed to “unpickle” it back into an object.
Return type: str
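
The idea of JSON that carries enough information to rebuild the object can be sketched with the standard library alone. This is only an illustration: the "py/object" key, the Point subclass, and the decode helper are assumptions made for this example, and the module itself may rely on a different mechanism.

```python
import json


class JSONpickable:
    """Illustrative stand-in, not the library's implementation."""

    def __str__(self):
        # Store the class name next to the attributes so the object can be rebuilt.
        payload = {"py/object": type(self).__name__}
        payload.update(vars(self))
        return json.dumps(payload, sort_keys=True)


class Point(JSONpickable):
    """Hypothetical subclass used only for this demonstration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def decode(text, known_classes):
    """Rebuild an object from the JSON produced by __str__ (hypothetical helper)."""
    data = json.loads(text)
    cls = known_classes[data.pop("py/object")]
    obj = cls.__new__(cls)  # bypass __init__, then restore the attributes
    obj.__dict__.update(data)
    return obj


encoded = str(Point(1, 2))
restored = decode(encoded, {"Point": Point})
```

Passing an explicit mapping of known classes keeps decoding from instantiating arbitrary types named in the input.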
class FEV_KEGG.KEGG.SSDB.Match(foundGeneIdString, swScore, bitScore, identity, overlap, length)

Bases: FEV_KEGG.KEGG.SSDB.PreMatch

A sequence comparison match between two distinct genes, including calculated attributes.

During creation, foundGeneIdString is used to create a GeneID object, which is stored as foundGeneID.

Parameters:
  • foundGeneIdString (str) – ID of the gene found by SSDB to be a paralog/ortholog, e.g. syn:sll1452.
  • swScore (int) – Smith-Waterman score of the match between the gene which was searched for and the found gene specified by foundGeneIdString.
  • bitScore (float) – Length-normalised swScore scaled to bits.
  • identity (float) – Percentage of equal amino acids, without substitution.
  • overlap (int) – Number of amino acids the found gene sequence overlaps with the gene which was searched for. Maximum is the length of the searched-for gene.
  • length (int) – Length in amino acids of the found gene. Derived from downloading the gene’s information file.
Variables:
  • self.foundGeneIdString (str) –
  • self.swScore (int) –
  • self.bitScore (float) –
  • self.identity (float) –
  • self.overlap (int) –
  • self.length (int) –
  • self.foundGeneID (FEV_KEGG.Graph.Elements.GeneID) –

See also

PreMatch
Handles all other parameters.
classmethod fromPreMatch(preMatch: FEV_KEGG.KEGG.SSDB.PreMatch, length)

Cast a PreMatch object to an object of this class.

During casting, foundGeneIdString is used to create a GeneID object, stored as foundGeneID. Also, length is stored.

Parameters:
  • preMatch (PreMatch) – The object to cast into this class’ type.
  • length (int) – Length in amino acids of the found gene. Derived from downloading the gene’s information file.

Note

This class method casts the PreMatch object in place rather than constructing a new Match object. This improves performance without significantly increasing complexity.
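
One way such a cast can be done in Python is by reassigning the object's `__class__` and attaching the extra attributes. The sketch below uses simplified stand-in classes and a plain tuple in place of GeneID. Whether the module casts in exactly this way is an assumption.

```python
class PreMatch:
    """Simplified stand-in for the documented PreMatch class."""

    def __init__(self, foundGeneIdString, swScore, bitScore, identity, overlap):
        self.foundGeneIdString = foundGeneIdString
        self.swScore = swScore
        self.bitScore = bitScore
        self.identity = identity
        self.overlap = overlap


class Match(PreMatch):
    """Simplified stand-in for the documented Match class."""

    @staticmethod
    def _toGeneID(foundGeneIdString):
        # Placeholder for GeneID: an (organism, gene name) tuple.
        return tuple(foundGeneIdString.split(":", 1))

    def __init__(self, foundGeneIdString, swScore, bitScore, identity, overlap, length):
        super().__init__(foundGeneIdString, swScore, bitScore, identity, overlap)
        self.length = length
        self.foundGeneID = self._toGeneID(foundGeneIdString)

    @classmethod
    def fromPreMatch(cls, preMatch, length):
        # Re-label the existing object instead of copying its attributes.
        preMatch.__class__ = cls
        preMatch.length = length
        preMatch.foundGeneID = cls._toGeneID(preMatch.foundGeneIdString)
        return preMatch


pre = PreMatch("syn:sll1452", 100, 50.0, 0.5, 80)
match = Match.fromPreMatch(pre, 120)
```

With this approach the original object is modified, so `pre` and `match` refer to the same instance afterwards.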

class FEV_KEGG.KEGG.SSDB.Matching(queryGeneID: FEV_KEGG.Graph.Elements.GeneID, queryLength, databaseOrganism, databaseSize, matches: Iterable[FEV_KEGG.KEGG.SSDB.Match], timestamp)

Bases: FEV_KEGG.KEGG.SSDB.JSONpickable

Result of a search for orthologs or paralogs in SSDB, concerning a single target organism.

The E-values for the resulting Matches depend on database size and are therefore only valid at the specified timestamp.

Parameters:
  • queryGeneID (GeneID) – ID of the gene to search homologs for, e.g. “syn:sll1450”.
  • queryLength (int) – Length of the gene product in amino acids.
  • databaseOrganism (str) – Organism to search in to find homologs for queryGeneID, e.g. “eco”.
  • databaseSize (int) – Number of genes known to belong to the databaseOrganism. This can be queried by http://rest.kegg.jp/info/eco, currently yielding “4,498 entries”.
  • matches (Iterable[Match]) – Iterable of Match objects, one for each match found during the matching.
  • timestamp (int) – Time at which the query was run, as a UNIX epoch timestamp in seconds.
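
As a rough illustration of how databaseSize could be derived from the text returned by the info endpoint mentioned above, the following sketch extracts the number in front of "entries". The sample text is made up for the example and its layout is an assumption. The helper name parse_entry_count is hypothetical, and no network request is made.

```python
import re


def parse_entry_count(info_text):
    """Return the integer preceding 'entries', e.g. '4,498 entries' -> 4498."""
    found = re.search(r"(\d[\d,]*)\s+entries", info_text)
    if found is None:
        raise ValueError("no entry count found in info text")
    # Remove the thousands separators before converting.
    return int(found.group(1).replace(",", ""))


# Illustrative sample only; the real response layout may differ.
sample = "eco    Escherichia coli genes\n       4,498 entries\n"
database_size = parse_entry_count(sample)
```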
class FEV_KEGG.KEGG.SSDB.MatchingOverview(queryGeneID: FEV_KEGG.Graph.Elements.GeneID, queryLength, bestMatches: Iterable[FEV_KEGG.KEGG.SSDB.Match], timestamp)

Bases: FEV_KEGG.KEGG.SSDB.JSONpickable

Result of a search for orthologs in SSDB, concerning all possible target organisms.

Because all possible organisms are searched, only the best matches are returned. If you want all matches, you will have to use Matching in a second step. The E-values for the resulting Matches depend on database size and are therefore only valid at the specified timestamp.

Parameters:
  • queryGeneID (GeneID) – ID of the gene to search homologs for, e.g. “syn:sll1450”.
  • queryLength (int) – Length of the gene product in amino acids.
  • bestMatches (Iterable[Match]) – Iterable of best Match objects, one for each orthologous organism found during the matching overview.
  • timestamp (int) – Time at which the query was run, as a UNIX epoch timestamp in seconds.
getTransientMatches(relevantOrganisms: Iterable[str]) → List[FEV_KEGG.KEGG.SSDB.TransientMatch]

Get full transient matches, considering only relevant orthologous organisms.

Restricting the result to relevant organisms is necessary because a gene can have several thousand orthologs, including ones from organisms completely out of scope. Calculating the E-value for each of those matches is rather slow and involves several downloads.

Parameters: relevantOrganisms (Iterable[str]) – Iterable of organism abbreviations, one for each organism to be considered relevant.
Returns: List of transient matches. These include E-values, which are slow to calculate, so only the matches in self.bestMatches that come from one of the relevantOrganisms are converted to transient matches.
Return type: List[TransientMatch]
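
The filtering step described above can be sketched as follows. The example assumes that the organism abbreviation is the part of the gene ID before the colon, as in syn:sll1452. BestMatch and filter_relevant are hypothetical names used only here, and the slow E-value calculation is left out.

```python
from collections import namedtuple

# Hypothetical stand-in for a Match, reduced to two fields.
BestMatch = namedtuple("BestMatch", ["foundGeneIdString", "bitScore"])


def filter_relevant(best_matches, relevant_organisms):
    """Keep only matches whose gene ID belongs to a relevant organism."""
    relevant = set(relevant_organisms)  # set gives fast membership tests
    return [
        m for m in best_matches
        if m.foundGeneIdString.split(":", 1)[0] in relevant
    ]


best = [
    BestMatch("eco:b0001", 120.0),
    BestMatch("hsa:7157", 45.0),
    BestMatch("syn:sll1452", 300.0),
]
kept = filter_relevant(best, ["eco", "syn"])
```

Only the matches that survive this filter would then need the expensive conversion to transient matches.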
class FEV_KEGG.KEGG.SSDB.PreMatch(foundGeneIdString, swScore, bitScore, identity, overlap)

Bases: object

A sequence comparison match between two distinct genes, without calculated attributes.

The parameters can be retrieved via KEGG SSDB [1].

Parameters:
  • foundGeneIdString (str) – ID of the gene found by SSDB to be a paralog/ortholog, e.g. syn:sll1452.
  • swScore (int) – Smith-Waterman score of the match between the gene which was searched for and the found gene specified by foundGeneIdString.
  • bitScore (float) – Length-normalised swScore scaled to bits.
  • identity (float) – Percentage of equal amino acids, without substitution.
  • overlap (int) – Number of amino acids the found gene sequence overlaps with the gene which was searched for. Maximum is the length of the searched-for gene.
Variables:
  • self.foundGeneIdString (str) –
  • self.swScore (int) –
  • self.bitScore (float) –
  • self.identity (float) –
  • self.overlap (int) –

See also

FEV_KEGG.KEGG.Database.getOrthologs
Function to retrieve PreMatches from KEGG SSDB.

References

[1] Sato et al. (2001), “SSDB: Sequence Similarity Database in KEGG”, https://www.researchgate.net/publication/254718427_SSDB_Sequence_Similarity_Database_in_KEGG
class FEV_KEGG.KEGG.SSDB.TransientMatch(foundGeneIdString, swScore, bitScore, identity, overlap, length, eValue)

Bases: FEV_KEGG.KEGG.SSDB.Match

A sequence comparison match between two distinct genes, only valid at a certain point in time.

This match is transient because eValue changes with the size of the database, so it is only valid at the point in time for which it was calculated.
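
To illustrate why an E-value depends on database size, the sketch below uses the standard Karlin-Altschul relation E = m · n · 2^(−S′), where S′ is the bit score, m the query length, and n the size of the searched database in residues. How this module actually calculates eValue is not stated here, so applying this formula is an assumption, and the function name e_value is hypothetical.

```python
def e_value(bit_score, query_length, database_residues):
    """Expected number of chance hits scoring at least bit_score."""
    # Search space (m * n) scaled by the probability implied by the bit score.
    return query_length * database_residues * 2.0 ** (-bit_score)


# Same match, same bit score, but the database has doubled in size.
before = e_value(50.0, 300, 1_000_000)
after = e_value(50.0, 300, 2_000_000)
```

Under this relation, doubling the database doubles the E-value, which is why a stored E-value is only meaningful together with its timestamp.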

Parameters:
  • foundGeneIdString (str) – ID of the gene found by SSDB to be a paralog/ortholog, e.g. syn:sll1452.
  • swScore (int) – Smith-Waterman score of the match between the gene which was searched for and the found gene specified by foundGeneIdString.
  • bitScore (float) – Length-normalised swScore scaled to bits.
  • identity (float) – Percentage of equal amino acids, without substitution.
  • overlap (int) – Number of amino acids the found gene sequence overlaps with the gene which was searched for. Maximum is the length of the searched-for gene.
  • length (int) – Length in amino acids of the found gene. Derived from downloading the gene’s information file.
  • eValue (float) – Expected number of matches with the same or a better score that would occur by chance alone.
Variables:
  • self.foundGeneIdString (str) –
  • self.swScore (int) –
  • self.bitScore (float) –
  • self.identity (float) –
  • self.overlap (int) –
  • self.length (int) –
  • self.foundGeneID (FEV_KEGG.Graph.Elements.GeneID) –
  • self.eValue (float) –

See also

Match
Handles all other parameters.
classmethod fromMatch(match: FEV_KEGG.KEGG.SSDB.Match, eValue)

Cast a Match object to an object of this class.

During casting, eValue is stored.

Note

This class method casts the Match object in place rather than constructing a new TransientMatch object. This improves performance without significantly increasing complexity.