bridge.pipelines.gh2bt_for_meta.map_funcs package#
Public Interface#
This section documents the user-facing interface of the bridge.pipelines.gh2bt_for_meta.map_funcs package (as defined in its __init__.py file).
Functions#
map_documentation | Map and reconcile GitHub documentation-related metadata to the bio.tools documentation field.
map_homepage | Map and reconcile GitHub and bio.tools homepage URLs using the generic GitHub-over-bio.tools policy with URL canonicalization.
map_description | Map and reconcile GitHub description metadata and bio.tools description metadata.
map_language | Map and reconcile GitHub and bio.tools programming languages using the generic GitHub-over-bio.tools policy.
map_license | Map and reconcile GitHub and bio.tools license annotations using the generic GitHub-over-bio.tools policy.
map_maturity | Map GitHub repository signals to bio.tools maturity metadata.
map_version | Map and reconcile GitHub release metadata and bio.tools version metadata.
map_publication | Map and reconcile GitHub CITATION.cff metadata and bio.tools publication entries.
map_name | Map and reconcile GitHub and bio.tools names using the generic GitHub-over-bio.tools policy.
map_biotools_id | Map and reconcile GitHub repository name to bio.tools ID.
Individual mapping functions for GitHub to bio.tools.
- async bridge.pipelines.gh2bt_for_meta.map_funcs.map_biotools_id(gh_name, bt_id)
Map and reconcile GitHub repository name to bio.tools ID.
Policy:
1. If no GitHub repo name is available, preserve the existing bio.tools ID (even if None).
2. If the existing bio.tools ID and the GitHub repo name contain each other (case-insensitive), preserve the existing bio.tools ID.
3. If they do not contain each other, log a conflict but continue.
4. If the GitHub repo name is not already used as a bio.tools ID, use it as the new bio.tools ID.
5. If the GitHub repo name is already used as a bio.tools ID, attempt to generate a unique ID by appending the suffixes -1, -2, … up to -99. If a unique ID is found, use it. If no unique ID can be generated, log a note and return None, requiring manual intervention.
- Parameters:
gh_name (str | None) – GitHub repository name.
bt_id (str | None) – Existing bio.tools ID.
- Returns:
Mapped bio.tools ID, or None if mapping failed.
- Return type:
str | None
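The ID policy above can be sketched roughly as follows. This is an illustration only, not the package's implementation: the real function is async and presumably checks bio.tools for existing IDs, whereas the function name `resolve_biotools_id` and the `taken_ids` set used here are hypothetical stand-ins. "Contain each other" is interpreted as either name containing the other, which is an assumption.

```python
from typing import Optional, Set


def resolve_biotools_id(
    gh_name: Optional[str],
    bt_id: Optional[str],
    taken_ids: Set[str],
    max_suffix: int = 99,
) -> Optional[str]:
    """Hypothetical, synchronous sketch of the ID policy described above."""
    # 1. No GitHub name: keep whatever bio.tools already has (possibly None).
    if not gh_name:
        return bt_id

    if bt_id:
        gh_lower, bt_lower = gh_name.lower(), bt_id.lower()
        # 2. One name contains the other (case-insensitive): keep the existing ID.
        if gh_lower in bt_lower or bt_lower in gh_lower:
            return bt_id
        # 3. Otherwise a conflict would be logged here, and processing continues.

    # 4. The GitHub name is not yet used as an ID: adopt it.
    if gh_name not in taken_ids:
        return gh_name

    # 5. Try gh_name-1 ... gh_name-99; return None if every candidate is taken.
    for n in range(1, max_suffix + 1):
        candidate = f"{gh_name}-{n}"
        if candidate not in taken_ids:
            return candidate
    return None
```

Passing the set of taken IDs explicitly keeps the sketch free of network calls; a real lookup would replace the `in taken_ids` checks.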
- async bridge.pipelines.gh2bt_for_meta.map_funcs.map_description(gh_params, bt_description)
Map and reconcile GitHub description metadata and bio.tools description metadata.
Policy:
1. If GitHub provides no metadata, the existing bio.tools description is preserved.
2. If GitHub provides a description:
  - If it is effectively identical to the bio.tools description (ignoring trailing punctuation and whitespace), no change is made.
  - Otherwise, the GitHub description overwrites the bio.tools value.
3. If GitHub provides no description and bio.tools is empty:
  - If a README is available, a short description is generated from the README using an LLM and normalized before storage.
  - If no README is available, no description is set.
4. LLM failures never overwrite existing bio.tools descriptions.
- Parameters:
gh_params (dict | None) – GitHub metadata dictionary. Expected keys include:
  - "description": Repository description string (optional)
  - "readme": README contents used for LLM-based description generation
bt_description (str | None) – Existing bio.tools description, or None if unset.
- Returns:
The reconciled bio.tools description, or None if no description could be determined.
- Return type:
str | None
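The "effectively identical" comparison in the policy can be illustrated with a small sketch. It covers only the non-LLM branches; the helper names `_normalize` and `reconcile_description` are hypothetical, and the exact normalization the package applies (for example, whether it is case-sensitive) is not stated in this documentation, so the rule below is an assumption.

```python
import string
from typing import Optional


def _normalize(text: Optional[str]) -> str:
    """Assumed comparison form: no surrounding whitespace, no trailing punctuation."""
    return (text or "").strip().rstrip(string.punctuation + string.whitespace)


def reconcile_description(
    gh_description: Optional[str],
    bt_description: Optional[str],
) -> Optional[str]:
    """Hypothetical sketch of the description policy without the README/LLM step."""
    # No GitHub description: the existing bio.tools value is preserved.
    if not gh_description:
        return bt_description
    # Effectively identical: leave the bio.tools value untouched.
    if bt_description and _normalize(gh_description) == _normalize(bt_description):
        return bt_description
    # Otherwise GitHub overwrites bio.tools.
    return gh_description
```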
- bridge.pipelines.gh2bt_for_meta.map_funcs.map_documentation(gh_repo_data, bt_documentation)
Map and reconcile GitHub documentation-related metadata to the bio.tools documentation field.
This function applies the documentation mapping policies for all supported GitHub documentation sources:
  - Repository wiki
  - Code of conduct
  - GitHub Pages site
Each source is mapped independently and contributes a DocumentationItem entry when a corresponding URL is present on GitHub and not already recorded in bio.tools.
- Parameters:
gh_repo_data (dict[str, Any] | None) – GitHub repository metadata dictionary. Expected keys include:
  - "html_url"
  - "has_wiki"
  - "code_of_conduct"
  - "github_pages"
bt_documentation (list[DocumentationItem] | None) – Existing bio.tools documentation entries.
- Returns:
The updated bio.tools documentation list after applying all documentation mappings.
- Return type:
list[DocumentationItem] | None
- bridge.pipelines.gh2bt_for_meta.map_funcs.map_homepage(gh_schema, bt_homepage)
Map and reconcile GitHub and bio.tools homepage URLs using the generic GitHub-over-bio.tools policy with URL canonicalization.
Homepage comparison is performed on canonicalized URLs. If neither GitHub nor bio.tools defines a homepage, the GitHub repository URL (gh_schema["html_url"]) is used as a fallback.
- Parameters:
gh_schema (dict[str, AnyUrl | str | None]) – GitHub repository metadata dictionary. Expected keys include:
  - "homepage": The homepage URL configured on GitHub (may be None).
  - "html_url": The GitHub repository URL (used as fallback).
bt_homepage (UrlftpType | None) – Existing homepage value from bio.tools metadata, or None if none is defined.
- Returns:
The resolved homepage as a UrlftpType instance, or None if no homepage could be determined (only possible if gh_schema is malformed).
- Return type:
UrlftpType | None
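A minimal sketch of comparing homepages on canonicalized URLs, with the repository URL as fallback, might look like this. The canonical form used here (lowercased scheme and host, no trailing slash, no fragment) is an assumption, since the package's actual canonicalization rules are not documented above. Plain strings stand in for UrlftpType, and `canonicalize` and `resolve_homepage` are hypothetical names.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Assumed canonical form: lowercase scheme/host, no trailing slash, no fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def resolve_homepage(gh_schema: dict, bt_homepage: Optional[str]) -> Optional[str]:
    """Hypothetical sketch: GitHub wins unless both URLs are canonically equal."""
    gh_homepage = gh_schema.get("homepage")
    if gh_homepage:
        if bt_homepage and canonicalize(str(gh_homepage)) == canonicalize(str(bt_homepage)):
            return bt_homepage  # same page, keep the bio.tools spelling
        return str(gh_homepage)
    if bt_homepage:
        return bt_homepage
    # Neither side defines a homepage: fall back to the repository URL.
    html_url = gh_schema.get("html_url")
    return str(html_url) if html_url else None
```

Comparing canonical forms avoids reporting a change when the two sources differ only by a trailing slash or letter case in the host.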
- bridge.pipelines.gh2bt_for_meta.map_funcs.map_language(gh_languages, bt_languages)
Map and reconcile GitHub and bio.tools programming languages using the generic GitHub-over-bio.tools policy.
GitHub language keys and bio.tools LanguageEnum values are normalized to lowercased string sets for comparison. When GitHub is authoritative, the GitHub set is mapped back to LanguageEnum values; unknown languages are skipped with a log entry.
- Parameters:
gh_languages (Language | None) – GitHub languages object, or None if no language data is available.
bt_languages (list[LanguageEnum] | None) – Existing bio.tools language annotations for the tool, or None if no languages are currently recorded in bio.tools.
- Returns:
The reconciled list of bio.tools language enums following the policy. May be None if both inputs are None.
- Return type:
list[LanguageEnum] | None
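The set-based comparison described above could be sketched as follows. The `LanguageEnum` defined here is a three-member stand-in for the real bio.tools enum, the GitHub languages object is assumed to behave like a mapping from language name to byte count, and `reconcile_languages` is a hypothetical name. Treating an empty GitHub mapping as "no data" is also an assumption.

```python
from enum import Enum
from typing import Dict, List, Optional


class LanguageEnum(Enum):
    """Stand-in for the bio.tools language enum (only three members)."""
    Python = "Python"
    R = "R"
    Java = "Java"


def reconcile_languages(
    gh_languages: Optional[Dict[str, int]],
    bt_languages: Optional[List[LanguageEnum]],
) -> Optional[List[LanguageEnum]]:
    """Hypothetical sketch of the lowercase-set comparison and back-mapping."""
    if not gh_languages:
        return bt_languages

    # Normalize both sides to lowercased string sets.
    gh_set = {name.lower() for name in gh_languages}
    bt_set = {lang.value.lower() for lang in (bt_languages or [])}
    if gh_set == bt_set:
        return bt_languages  # nothing to change

    # GitHub is authoritative: map names back to enum members.
    by_lower = {member.value.lower(): member for member in LanguageEnum}
    # Names without an enum member are skipped (the real function logs them).
    return [by_lower[name] for name in sorted(gh_set) if name in by_lower]
```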
- bridge.pipelines.gh2bt_for_meta.map_funcs.map_license(gh_license, bt_license)
Map and reconcile GitHub and bio.tools license annotations using the generic GitHub-over-bio.tools policy.
- Parameters:
gh_license (str | None) – GitHub repository license (SPDX ID), or None if no license is configured.
bt_license (License | None) – Existing bio.tools license annotation, or None if no license is currently recorded.
- Returns:
The reconciled license as a License enum member, or None if neither GitHub nor bio.tools provides a usable license.
- Return type:
License | None
- bridge.pipelines.gh2bt_for_meta.map_funcs.map_maturity(gh_schema, bt_maturity)
Map GitHub repository signals to bio.tools maturity metadata.
Policy:
1. If the GitHub repository is archived, maturity is always set to Maturity.Legacy, regardless of existing bio.tools values.
2. Otherwise, a popularity score based on stars, forks, watchers, and subscribers is computed.
3. Scores above a fixed threshold are classified as Maturity.Mature; lower scores as Maturity.Emerging.
4. Existing bio.tools maturity is preserved only when it matches the GitHub-derived classification.
5. Conflicting values are overwritten in favor of GitHub-derived maturity and logged as conflicts.
- Parameters:
gh_schema (dict[str, Any] | None) – GitHub repository metadata dictionary, or None if unavailable. Expected keys include:
  - "archived": Boolean indicating if the repository is archived.
  - "stargazers_count": Number of stars.
  - "forks_count": Number of forks.
  - "watchers_count": Number of watchers.
  - "subscribers_count": Number of subscribers.
bt_maturity (Maturity | None) – Existing bio.tools maturity annotation, or None if unset.
- Returns:
The reconciled bio.tools maturity classification, or the existing value if no GitHub-derived maturity could be computed.
- Return type:
Maturity | None
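The classification steps can be illustrated with the sketch below. Neither the scoring formula nor the threshold is given in this documentation, so the unweighted sum and the value `HYPOTHETICAL_THRESHOLD = 100` are placeholders chosen for illustration. The `Maturity` enum is a stand-in for the real one, and `derive_maturity` is a hypothetical name.

```python
from enum import Enum
from typing import Any, Dict, Optional


class Maturity(Enum):
    """Stand-in for the bio.tools maturity enum."""
    Emerging = "Emerging"
    Mature = "Mature"
    Legacy = "Legacy"


# Placeholder value; the real threshold is not documented here.
HYPOTHETICAL_THRESHOLD = 100
SIGNAL_KEYS = ("stargazers_count", "forks_count", "watchers_count", "subscribers_count")


def derive_maturity(
    gh_schema: Optional[Dict[str, Any]],
    bt_maturity: Optional[Maturity],
) -> Optional[Maturity]:
    """Hypothetical sketch of the archived check and score-based classification."""
    # No GitHub data: nothing can be derived, keep the existing value.
    if not gh_schema:
        return bt_maturity
    # Archived repositories are always Legacy.
    if gh_schema.get("archived"):
        return Maturity.Legacy
    # Assumed score: plain sum of the four popularity signals.
    score = sum(int(gh_schema.get(key) or 0) for key in SIGNAL_KEYS)
    derived = Maturity.Mature if score > HYPOTHETICAL_THRESHOLD else Maturity.Emerging
    # A mismatch with bt_maturity would be logged as a conflict; GitHub wins either way.
    return derived
```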
- bridge.pipelines.gh2bt_for_meta.map_funcs.map_name(gh_name, bt_name)
Map and reconcile GitHub and bio.tools names using the generic GitHub-over-bio.tools policy.
- Parameters:
gh_name (str | None) – GitHub repository name.
bt_name (str | None) – Existing bio.tools name.
- Returns:
Mapped bio.tools name.
- Return type:
str | None
- bridge.pipelines.gh2bt_for_meta.map_funcs.map_publication(gh_citation_cff, bt_publications)
Map and reconcile GitHub CITATION.cff metadata and bio.tools publication entries.
Policy:
  - If no CITATION.cff data is present, the existing bio.tools publication entries are unchanged.
  - If CITATION.cff data is present, its references are converted to bio.tools PublicationItem instances and merged with existing bio.tools entries. Duplicates are removed, prioritizing CITATION.cff entries.
- Parameters:
gh_citation_cff (dict[str, Any]) – Parsed content of an existing CITATION.cff file from the GitHub repository.
bt_publications (list[PublicationItem] | None) – Existing bio.tools publication entries, or None if none are defined.
- Returns:
Updated list of bio.tools publication entries after reconciliation, or None if no publications are defined.
- Return type:
list[PublicationItem] | None
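The merge step, with duplicates resolved in favor of CITATION.cff entries, can be sketched as below. Plain dictionaries stand in for PublicationItem, the entries are assumed to be already converted from CITATION.cff references, and duplicates are detected by case-insensitive DOI. The documentation does not say which key the package uses for this, so that key is an assumption, and `merge_publications` is a hypothetical name.

```python
from typing import Dict, List, Optional

Publication = Dict[str, str]  # stand-in for PublicationItem


def merge_publications(
    cff_items: Optional[List[Publication]],
    bt_items: Optional[List[Publication]],
) -> Optional[List[Publication]]:
    """Hypothetical sketch: CITATION.cff entries first, duplicates dropped by DOI."""
    # No CITATION.cff data: existing entries stay unchanged.
    if not cff_items:
        return bt_items

    merged: List[Publication] = []
    seen = set()
    # Iterating CITATION.cff entries first makes them win over bio.tools duplicates.
    for item in list(cff_items) + list(bt_items or []):
        doi = (item.get("doi") or "").lower()
        if doi and doi in seen:
            continue
        if doi:
            seen.add(doi)
        merged.append(item)
    return merged
```

Entries without a DOI are always kept in this sketch, since there is nothing to compare them on.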
- bridge.pipelines.gh2bt_for_meta.map_funcs.map_version(gh_latest_version_tag, bt_versions)
Map and reconcile GitHub release metadata and bio.tools version metadata.
Ensures that the GitHub latest release tag is present in the bio.tools version list. If any existing bio.tools version appears newer than the GitHub latest (based on parsed version comparison), the bio.tools version list is reset to contain only the GitHub version.
Policy:
1. If GitHub provides no latest tag, existing bio.tools versions are preserved.
2. If bio.tools has no versions, the GitHub version is adopted.
3. If the GitHub version is already present, no change is made.
4. If a newer bio.tools version is detected, a conflict is logged and the list is reset to the GitHub version only.
5. Otherwise, the GitHub version is appended to the existing list.
- Parameters:
gh_latest_version_tag (str | None) – Latest GitHub release tag, or None if unavailable.
bt_versions (list[VersionType] | None) – Existing bio.tools versions.
- Returns:
Updated list of bio.tools versions after reconciliation.
- Return type:
list[VersionType] | None
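The five policy steps can be sketched in order as follows. Plain strings stand in for VersionType, and the version parser here is a deliberately simplified assumption (it compares only the numeric components of a tag). The package's actual parsed-version comparison is not described above, and `parse_version` and `reconcile_versions` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Simplified parser: numeric components of a tag such as 'v1.2.3'."""
    return tuple(int(n) for n in re.findall(r"\d+", tag))


def reconcile_versions(
    gh_tag: Optional[str],
    bt_versions: Optional[List[str]],
) -> Optional[List[str]]:
    """Hypothetical sketch of the version policy, applied in the listed order."""
    # 1. No GitHub tag: keep existing versions.
    if not gh_tag:
        return bt_versions
    # 2. bio.tools has no versions: adopt the GitHub one.
    if not bt_versions:
        return [gh_tag]
    # 3. Already present: no change.
    if gh_tag in bt_versions:
        return bt_versions
    # 4. A bio.tools version looks newer: conflict (logged in the real function), reset.
    gh_parsed = parse_version(gh_tag)
    if any(parse_version(v) > gh_parsed for v in bt_versions):
        return [gh_tag]
    # 5. Otherwise append the GitHub version.
    return bt_versions + [gh_tag]
```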
Submodules#
Map GitHub repository name to bio.tools ID. |
|
Mapping functions for description metadata. |
|
Mapping functions for documentation metadata. |
|
Map homepage metadata from GitHub to bio.tools. |
|
Map language metadata from GitHub to bio.tools. |
|
Map license metadata from GitHub to bio.tools. |
|
Mapping functions for maturity metrics. |
|
Mapping name from GitHub to bio.tools. |
|
Mapping functions for publication metadata. |
|
Mapping releases for version metadata. |
Dependencies diagram#
Each architecture diagram below visualizes the internal dependency structure of the bridge.pipelines.gh2bt_for_meta.map_funcs package.
It shows how modules and subpackages within the package depend on each other, based on direct Python imports.
Packages are shown as purple rectangles
Modules are shown as pink rectangles
Arrows (A → B) indicate that A directly imports B
Each subpackage’s diagram focuses only on its own internal structure; it does not include imports to or from higher-level packages (those appear in the parent package’s diagram).