bridge.pipelines.gh2bt_for_meta.map_funcs package

Public Interface

This section documents the user-facing interface of the bridge.pipelines.gh2bt_for_meta.map_funcs package (as defined in its __init__.py file).

Functions

map_documentation(gh_repo_data, bt_documentation)

Map and reconcile GitHub documentation-related metadata to the bio.tools documentation field.

map_homepage(gh_schema, bt_homepage)

Map and reconcile GitHub and bio.tools homepage URLs using the generic GitHub-over-bio.tools policy with URL canonicalization.

map_description(gh_params, bt_description)

Map and reconcile GitHub description metadata and bio.tools description metadata.

map_language(gh_languages, bt_languages)

Map and reconcile GitHub and bio.tools programming languages using the generic GitHub-over-bio.tools policy.

map_license(gh_license, bt_license)

Map and reconcile GitHub and bio.tools license annotations using the generic GitHub-over-bio.tools policy.

map_maturity(gh_schema, bt_maturity)

Map GitHub repository signals to bio.tools maturity metadata.

map_version(gh_latest_version_tag, bt_versions)

Map and reconcile GitHub release metadata and bio.tools version metadata.

map_publication(gh_citation_cff, bt_publications)

Map and reconcile GitHub CITATION.cff metadata and bio.tools publication entries.

map_name(gh_name, bt_name)

Map and reconcile GitHub and bio.tools names using the generic GitHub-over-bio.tools policy.

map_biotools_id(gh_name, bt_id)

Map and reconcile GitHub repository name to bio.tools ID.

Individual mapping functions for GitHub to bio.tools.

async bridge.pipelines.gh2bt_for_meta.map_funcs.map_biotools_id(gh_name, bt_id)

Map and reconcile GitHub repository name to bio.tools ID.

Policy:

  1. If no GitHub repo name is available, preserve the existing bio.tools ID (even if None).

  2. If the existing bio.tools ID and the GitHub repo name contain each other (case-insensitive), preserve the existing bio.tools ID.

  3. If they do not contain each other, log a conflict but continue.

  4. If the GitHub repo name is not already used as a bio.tools ID, use it as the new bio.tools ID.

  5. If the GitHub repo name is already used as a bio.tools ID, attempt to generate a unique ID by appending suffixes -1, -2, … up to -99. If a unique ID is found, use it.

  6. If no unique ID can be generated, log a note and return None, requiring manual intervention.

Parameters:
  • gh_name (str | None) – GitHub repository name.

  • bt_id (str | None) – Existing bio.tools ID.

Returns:

Mapped bio.tools ID, or None if mapping failed.

Return type:

str | None
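
The policy for map_biotools_id can be illustrated with a minimal synchronous sketch. This is not the package's implementation: the real function is async, and the is_taken callback below is a hypothetical stand-in for whatever lookup the package uses to check whether an ID already exists in bio.tools. The sketch also assumes that "contain each other" means that either string contains the other.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def sketch_map_biotools_id(
    gh_name: Optional[str],
    bt_id: Optional[str],
    is_taken: Callable[[str], bool],  # hypothetical ID lookup, not part of the package
) -> Optional[str]:
    # Step 1: without a GitHub name, keep whatever bio.tools already has.
    if not gh_name:
        return bt_id

    if bt_id:
        gh_lower, bt_lower = gh_name.lower(), bt_id.lower()
        # Step 2 (assumed reading): either string contains the other.
        if gh_lower in bt_lower or bt_lower in gh_lower:
            return bt_id
        # Step 3: report the mismatch, then carry on.
        logger.warning("ID conflict: GitHub %r vs bio.tools %r", gh_name, bt_id)

    # Step 4: use the repository name if it is still free.
    if not is_taken(gh_name):
        return gh_name

    # Step 5: try the suffixes -1 through -99.
    for n in range(1, 100):
        candidate = f"{gh_name}-{n}"
        if not is_taken(candidate):
            return candidate

    # Step 6: give up and leave the decision to a human.
    logger.info("No unique bio.tools ID could be generated for %r", gh_name)
    return None
```

Passing the lookup as a callable keeps the sketch testable without network access; for example, `{"bar"}.__contains__` simulates a registry in which only "bar" is taken.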

async bridge.pipelines.gh2bt_for_meta.map_funcs.map_description(gh_params, bt_description)

Map and reconcile GitHub description metadata and bio.tools description metadata.

Policy:

  1. If GitHub provides no metadata, the existing bio.tools description is preserved.

  2. If GitHub provides a description:

    • If it is effectively identical to the bio.tools description (ignoring trailing punctuation and whitespace), no change is made.

    • Otherwise, the GitHub description overwrites the bio.tools value.

  3. If GitHub provides no description and bio.tools is empty:

    • If a README is available, a short description is generated from the README using an LLM and normalized before storage.

    • If no README is available, no description is set.

  4. LLM failures never overwrite existing bio.tools descriptions.

Parameters:
  • gh_params (dict | None) – GitHub metadata dictionary. Expected keys include:

    • "description": Repository description string (optional).

    • "readme": README contents used for LLM-based description generation.

  • bt_description (str | None) – Existing bio.tools description, or None if unset.

Returns:

The reconciled bio.tools description, or None if no description could be determined.

Return type:

str | None
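
A minimal synchronous sketch of the map_description policy follows. It rests on several assumptions: the real function is async, the summarize callback is a hypothetical placeholder for the LLM call, the comparison is taken to be case-sensitive, and "normalized before storage" is reduced here to stripping surrounding whitespace.

```python
import string
from typing import Callable, Optional

_TRAILING = string.punctuation + string.whitespace


def _comparable(text: str) -> str:
    # Ignore trailing punctuation and whitespace when comparing.
    return text.rstrip(_TRAILING)


def sketch_map_description(
    gh_params: Optional[dict],
    bt_description: Optional[str],
    summarize: Optional[Callable[[str], str]] = None,  # hypothetical LLM wrapper
) -> Optional[str]:
    # Step 1: no GitHub metadata, so keep the bio.tools value.
    if not gh_params:
        return bt_description

    gh_description = gh_params.get("description")
    if gh_description:
        # Step 2: effectively identical values leave bio.tools untouched.
        if bt_description and _comparable(gh_description) == _comparable(bt_description):
            return bt_description
        return gh_description

    # An existing bio.tools description is never replaced by generated text.
    if bt_description:
        return bt_description

    # Step 3: both sides are empty, so fall back to the README if there is one.
    readme = gh_params.get("readme")
    if not readme or summarize is None:
        return None
    try:
        generated = summarize(readme)
    except Exception:
        # Step 4: a failed generation changes nothing.
        return bt_description
    return generated.strip() or bt_description
```

Because generation is attempted only when both descriptions are empty, a failing summarize call can at worst leave the field unset.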

bridge.pipelines.gh2bt_for_meta.map_funcs.map_documentation(gh_repo_data, bt_documentation)

Map and reconcile GitHub documentation-related metadata to the bio.tools documentation field.

This function applies the documentation mapping policies for all supported GitHub documentation sources:

  • Repository wiki

  • Code of conduct

  • GitHub Pages site

Each source is mapped independently and contributes a DocumentationItem entry when a corresponding URL is present on GitHub and not already recorded in bio.tools.

Parameters:
  • gh_repo_data (dict[str, Any] | None) – GitHub repository metadata dictionary. Expected keys include:

    • "html_url"

    • "has_wiki"

    • "code_of_conduct"

    • "github_pages"

  • bt_documentation (list[DocumentationItem] | None) – Existing bio.tools documentation entries.

Returns:

The updated bio.tools documentation list after applying all documentation mappings.

Return type:

list[DocumentationItem] | None
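
The independent, per-source mapping described for map_documentation can be sketched as below. DocItem is a hypothetical stand-in for DocumentationItem, whose actual fields are not shown in this reference. The shapes of the "code_of_conduct" value (a dict with an "html_url" key) and the "github_pages" value (a URL string), as well as the type labels, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class DocItem:  # hypothetical stand-in for DocumentationItem
    url: str
    type: str


def sketch_map_documentation(
    gh_repo_data: Optional[dict[str, Any]],
    bt_documentation: Optional[list[DocItem]],
) -> Optional[list[DocItem]]:
    if not gh_repo_data:
        return bt_documentation

    candidates: list[DocItem] = []

    # Wiki: derived from the repository URL when the wiki is enabled.
    html_url = gh_repo_data.get("html_url")
    if gh_repo_data.get("has_wiki") and html_url:
        candidates.append(DocItem(f"{str(html_url).rstrip('/')}/wiki", "General"))

    # Code of conduct: assumed to be a dict carrying its own "html_url".
    conduct = gh_repo_data.get("code_of_conduct")
    if isinstance(conduct, dict) and conduct.get("html_url"):
        candidates.append(DocItem(conduct["html_url"], "Code of conduct"))

    # GitHub Pages: assumed to be a plain URL string.
    pages = gh_repo_data.get("github_pages")
    if isinstance(pages, str) and pages:
        candidates.append(DocItem(pages, "General"))

    # Add only the URLs that bio.tools does not already record.
    result = list(bt_documentation or [])
    known = {item.url for item in result}
    for item in candidates:
        if item.url not in known:
            result.append(item)
            known.add(item.url)
    return result if result else bt_documentation
```

Each source is evaluated on its own, so a missing wiki does not prevent the GitHub Pages URL from being added.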

bridge.pipelines.gh2bt_for_meta.map_funcs.map_homepage(gh_schema, bt_homepage)

Map and reconcile GitHub and bio.tools homepage URLs using the generic GitHub-over-bio.tools policy with URL canonicalization.

Homepage comparison is performed on canonicalized URLs. If neither GitHub nor bio.tools defines a homepage, the GitHub repository URL (gh_schema["html_url"]) is used as a fallback.

Parameters:
  • gh_schema (dict[str, AnyUrl | str | None]) – GitHub repository metadata dictionary. Expected keys include:

    • "homepage": The homepage URL configured on GitHub (may be None).

    • "html_url": The GitHub repository URL (used as fallback).

  • bt_homepage (UrlftpType | None) – Existing homepage value from bio.tools metadata, or None if none is defined.

Returns:

The resolved homepage as a UrlftpType instance, or None if no homepage could be determined (only possible if gh_schema is malformed).

Return type:

UrlftpType | None
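
The comparison on canonicalized URLs and the html_url fallback can be sketched as follows. The canonicalization rules used here (lowercase host, no trailing slash, no fragment) are assumptions, since this reference does not spell them out, and the sketch works with plain strings rather than UrlftpType. It also assumes that the generic policy lets a differing GitHub homepage win.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    # Assumed rules: lowercase scheme and host, drop trailing slash and fragment.
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query, "")
    )


def sketch_map_homepage(gh_schema: Optional[dict], bt_homepage: Optional[str]) -> Optional[str]:
    gh_schema = gh_schema or {}
    gh_homepage = gh_schema.get("homepage")

    if gh_homepage:
        # Equivalent URLs: keep the bio.tools spelling unchanged.
        if bt_homepage and canonicalize(str(gh_homepage)) == canonicalize(str(bt_homepage)):
            return bt_homepage
        # Assumed generic policy: a differing GitHub value wins.
        return str(gh_homepage)

    if bt_homepage:
        return bt_homepage

    # Neither side defines a homepage: fall back to the repository URL.
    fallback = gh_schema.get("html_url")
    return str(fallback) if fallback else None
```

Comparing canonical forms avoids reporting a change when the two URLs differ only in case or a trailing slash.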

bridge.pipelines.gh2bt_for_meta.map_funcs.map_language(gh_languages, bt_languages)

Map and reconcile GitHub and bio.tools programming languages using the generic GitHub-over-bio.tools policy.

GitHub language keys and bio.tools LanguageEnum values are normalized to lowercased string sets for comparison. When GitHub is authoritative, the GitHub set is mapped back to LanguageEnum values; unknown languages are skipped with a log entry.

Parameters:
  • gh_languages (Language | None) – GitHub languages object, or None if no language data is available.

  • bt_languages (list[LanguageEnum] | None) – Existing bio.tools language annotations for the tool, or None if no languages are currently recorded in bio.tools.

Returns:

The reconciled list of bio.tools language enums following the policy. May be None if both inputs are None.

Return type:

list[LanguageEnum] | None
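
The set-based comparison described for map_language can be sketched as below. Lang is a small hypothetical stand-in for LanguageEnum, and the GitHub languages object is modeled as a plain dict of language name to byte count, which is an assumption about the Language type. The sketch treats GitHub as authoritative whenever it has data and the two normalized sets differ.

```python
import logging
from enum import Enum
from typing import Optional

logger = logging.getLogger(__name__)


class Lang(Enum):  # hypothetical stand-in for LanguageEnum
    PYTHON = "Python"
    C = "C"
    R = "R"


def sketch_map_language(
    gh_languages: Optional[dict[str, int]],
    bt_languages: Optional[list[Lang]],
) -> Optional[list[Lang]]:
    if not gh_languages:
        return bt_languages

    # Normalize both sides to lowercased string sets for comparison.
    gh_set = {name.lower() for name in gh_languages}
    bt_set = {lang.value.lower() for lang in bt_languages or []}
    if gh_set == bt_set:
        return bt_languages

    # Map the GitHub set back to enum members, skipping unknown names.
    by_lower = {lang.value.lower(): lang for lang in Lang}
    mapped: list[Lang] = []
    for name in sorted(gh_set):
        lang = by_lower.get(name)
        if lang is None:
            logger.info("Skipping language not known to bio.tools: %r", name)
            continue
        mapped.append(lang)
    return mapped if mapped else bt_languages
```

Sorting the GitHub set makes the output order deterministic; the real function may order languages differently.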

bridge.pipelines.gh2bt_for_meta.map_funcs.map_license(gh_license, bt_license)

Map and reconcile GitHub and bio.tools license annotations using the generic GitHub-over-bio.tools policy.

Parameters:
  • gh_license (str | None) – GitHub repository license (SPDX ID), or None if no license is configured.

  • bt_license (License | None) – Existing bio.tools license annotation, or None if no license is currently recorded.

Returns:

The reconciled license as a License enum member, or None if neither GitHub nor bio.tools provides a usable license.

Return type:

License | None
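
The generic GitHub-over-bio.tools policy is not spelled out in this reference. The sketch below shows one plausible reading for map_license: a usable GitHub value takes precedence, and the bio.tools value is kept otherwise. Lic is a hypothetical three-member stand-in for the License enum, and the lookup by SPDX ID is an assumption.

```python
from enum import Enum
from typing import Optional


class Lic(Enum):  # hypothetical stand-in for the License enum
    MIT = "MIT"
    APACHE_2_0 = "Apache-2.0"
    GPL_3_0 = "GPL-3.0"


def sketch_map_license(gh_license: Optional[str], bt_license: Optional[Lic]) -> Optional[Lic]:
    if gh_license:
        try:
            # Assumed: enum values are SPDX IDs, so lookup by value works.
            return Lic(gh_license)
        except ValueError:
            # The SPDX ID has no matching enum member; fall through.
            pass
    # No usable GitHub license: keep whatever bio.tools has (possibly None).
    return bt_license
```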

bridge.pipelines.gh2bt_for_meta.map_funcs.map_maturity(gh_schema, bt_maturity)

Map GitHub repository signals to bio.tools maturity metadata.

Policy:

  1. If the GitHub repository is archived, maturity is always set to Maturity.Legacy, regardless of existing bio.tools values.

  2. Otherwise, a popularity score based on stars, forks, watchers, and subscribers is computed.

  3. Scores above a fixed threshold are classified as Maturity.Mature; lower scores as Maturity.Emerging.

  4. Existing bio.tools maturity is preserved only when it matches the GitHub-derived classification.

  5. Conflicting values are overwritten in favor of GitHub-derived maturity and logged as conflicts.

Parameters:
  • gh_schema (dict[str, Any] | None) – GitHub repository metadata dictionary, or None if unavailable. Expected keys include:

    • "archived": Boolean indicating if the repository is archived.

    • "stargazers_count": Number of stars.

    • "forks_count": Number of forks.

    • "watchers_count": Number of watchers.

    • "subscribers_count": Number of subscribers.

  • bt_maturity (Maturity | None) – Existing bio.tools maturity annotation, or None if unset.

Returns:

The reconciled bio.tools maturity classification, or the existing value if no GitHub-derived maturity could be computed.

Return type:

Maturity | None
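
The maturity policy can be sketched as follows. The scoring formula and the threshold are not given in this reference, so the unweighted sum and the value 100 below are placeholders chosen purely for illustration. Mat is a hypothetical stand-in for the Maturity enum.

```python
import logging
from enum import Enum
from typing import Any, Optional

logger = logging.getLogger(__name__)


class Mat(Enum):  # hypothetical stand-in for Maturity
    EMERGING = "Emerging"
    MATURE = "Mature"
    LEGACY = "Legacy"


ASSUMED_THRESHOLD = 100  # placeholder; the real threshold is not documented here
SIGNALS = ("stargazers_count", "forks_count", "watchers_count", "subscribers_count")


def sketch_map_maturity(
    gh_schema: Optional[dict[str, Any]],
    bt_maturity: Optional[Mat],
) -> Optional[Mat]:
    # No GitHub data: nothing can be derived, so keep the existing value.
    if not gh_schema:
        return bt_maturity

    if gh_schema.get("archived"):
        # Step 1: archived repositories are always Legacy.
        derived = Mat.LEGACY
    else:
        # Steps 2-3: assumed score is an unweighted sum of the four signals.
        score = sum(int(gh_schema.get(key) or 0) for key in SIGNALS)
        derived = Mat.MATURE if score > ASSUMED_THRESHOLD else Mat.EMERGING

    # Steps 4-5: a differing bio.tools value is logged and overwritten.
    if bt_maturity is not None and bt_maturity != derived:
        logger.warning("Maturity conflict: bio.tools %s vs GitHub-derived %s", bt_maturity, derived)
    return derived
```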

bridge.pipelines.gh2bt_for_meta.map_funcs.map_name(gh_name, bt_name)

Map and reconcile GitHub and bio.tools names using the generic GitHub-over-bio.tools policy.

Parameters:
  • gh_name (str | None) – GitHub repository name.

  • bt_name (str | None) – Existing bio.tools name.

Returns:

Mapped bio.tools name.

Return type:

str | None

bridge.pipelines.gh2bt_for_meta.map_funcs.map_publication(gh_citation_cff, bt_publications)

Map and reconcile GitHub CITATION.cff metadata and bio.tools publication entries.

Policy:

  • If no CITATION.cff data is present, the existing bio.tools publication entries are unchanged.

  • If CITATION.cff data is present, its references are converted to bio.tools PublicationItem instances and merged with existing bio.tools entries. Duplicates are removed, prioritizing CITATION.cff entries.

Parameters:
  • gh_citation_cff (dict[str, Any]) – Parsed content of an existing CITATION.cff file from the GitHub repository.

  • bt_publications (list[PublicationItem] | None) – Existing bio.tools publication entries, or None if none are defined.

Returns:

Updated list of bio.tools publication entries after reconciliation, or None if no publications are defined.

Return type:

list[PublicationItem] | None
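
The merge-and-deduplicate step for map_publication can be sketched as below. Pub is a hypothetical stand-in for PublicationItem that carries only a DOI, and using the lowercased DOI as the duplicate key is an assumption; the real function may also compare other identifiers. The "references" and "doi" keys follow the Citation File Format.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Pub:  # hypothetical stand-in for PublicationItem
    doi: str


def sketch_map_publication(
    gh_citation_cff: Optional[dict[str, Any]],
    bt_publications: Optional[list[Pub]],
) -> Optional[list[Pub]]:
    # No CITATION.cff data: leave bio.tools entries unchanged.
    if not gh_citation_cff:
        return bt_publications

    # Convert CFF references that carry a DOI (assumed minimal conversion).
    from_cff = [
        Pub(ref["doi"])
        for ref in gh_citation_cff.get("references") or []
        if isinstance(ref, dict) and ref.get("doi")
    ]

    # CFF entries come first, so they win when a duplicate key appears.
    merged: list[Pub] = []
    seen: set[str] = set()
    for pub in from_cff + list(bt_publications or []):
        key = pub.doi.lower()
        if key not in seen:
            seen.add(key)
            merged.append(pub)
    return merged or None
```

Placing the CITATION.cff entries ahead of the bio.tools entries is what gives them priority during deduplication.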

bridge.pipelines.gh2bt_for_meta.map_funcs.map_version(gh_latest_version_tag, bt_versions)

Map and reconcile GitHub release metadata and bio.tools version metadata.

Ensures that the GitHub latest release tag is present in the bio.tools version list. If any existing bio.tools version appears newer than the GitHub latest (based on parsed version comparison), the bio.tools version list is reset to contain only the GitHub version.

Policy:

  1. If GitHub provides no latest tag, existing bio.tools versions are preserved.

  2. If bio.tools has no versions, the GitHub version is adopted.

  3. If the GitHub version is already present, no change is made.

  4. If a newer bio.tools version is detected, a conflict is logged and the list is reset to the GitHub version only.

  5. Otherwise, the GitHub version is appended to the existing list.

Parameters:
  • gh_latest_version_tag (str | None) – Latest GitHub release tag, or None if unavailable.

  • bt_versions (list[VersionType] | None) – Existing bio.tools versions.

Returns:

Updated list of bio.tools versions after reconciliation.

Return type:

list[VersionType] | None

Submodules

biotools_id

Map GitHub repository name to bio.tools ID.

description

Mapping functions for description metadata.

documentation

Mapping functions for documentation metadata.

homepage

Map homepage metadata from GitHub to bio.tools.

language

Map language metadata from GitHub to bio.tools.

license

Map license metadata from GitHub to bio.tools.

maturity

Mapping functions for maturity metrics.

name

Mapping name from GitHub to bio.tools.

publication

Mapping functions for publication metadata.

version

Mapping releases for version metadata.

Dependencies diagram

Each architecture diagram below visualizes the internal dependency structure of the bridge.pipelines.gh2bt_for_meta.map_funcs package. It shows how modules and subpackages within the package depend on each other, based on direct Python imports.

  • Packages are shown as purple rectangles

  • Modules are shown as pink rectangles

  • Arrows (A → B) indicate that A directly imports B

Each subpackage’s diagram focuses only on its own internal structure. It does not include imports to or from higher-level packages (those appear in the parent package’s diagram).

bridge package dependencies