bridge.pipelines.utils package

Public Interface

This section documents the user-facing interface of the bridge.pipelines.utils package (as defined in its __init__.py file).

Functions

check_file_with_extension_exists(...)

Check whether a folder contains at least one file with a given extension.

get_file_content(file_path)

Read the contents of a text file as UTF-8.

load_dict_from_yaml_file(file_path)

Load a YAML file and return its content as a dictionary.

compose_badge(label, message, color, ...[, ...])

Construct a Badge with a Shields.io URL and optional embedded SVG logo.

canonicalize_shields_url(url)

Canonicalize a shields.io image URL for stable comparison.

canonicalize_url(url)

Canonicalize a generic URL for stable comparison.

escape_shields_part(value)

Prepare a label or message for use in a Shields.io badge path segment.

normalize_color(value)

Normalize a color value by stripping a leading '#' and percent-encoding.

normalize_text(value[, normalize_multiline])

Clean and normalize free-text values.

normalize_dict_strings(d)

Recursively normalize all string-like values in a dictionary.

normalize_pydantic_model_strings(model)

Recursively normalize all string fields in a Pydantic model in-place.

fill_template(template, placeholders)

Render a simple string template by substituting named placeholders.

remove_first_snippet_from_text(text, snippet)

Remove the first occurrence of a snippet from a string.

svg_to_base64(svg_path)

Convert an SVG file into a cleaned, base64-encoded string.

find_matching_enum_member(value, enum_cls)

Resolve a free-text string to a member of a given Enum via case-insensitive matching on both member .value and .name.

object_to_primitive(obj)

Recursively convert complex objects into plain Python types.

str_contain_each_other(str1, str2)

Check whether either of two strings contains the other (case-insensitive).

Classes

Badge(**data)

Representation of a README badge and its Markdown rendering.

Utilities for pipelines.

class bridge.pipelines.utils.Badge(**data)

Bases: BaseModel

Representation of a README badge and its Markdown rendering.

Two Badge instances are considered equal if their canonical image URL and link URL match, irrespective of superficial formatting differences in the original Markdown.

Parameters:
  • alt_text (str) – The alternative text for the badge image (used in the Markdown alt field and as a textual fallback).

  • image_url (str) – The URL of the badge image (e.g. a Shields.io badge endpoint).

  • link_url (str | None) – The URL to link to when the badge is clicked. If None, the badge will be rendered as an image without a surrounding link.

  • full_match (str | None) – The original Markdown string representing the badge, if this Badge was created from a parsed README. When set, as_markdown() will return this exact string, preserving original formatting.

alt_text: str
as_markdown()

Render the badge as a Markdown-formatted string.

If full_match is set (e.g. when this badge came from a parsed README), that original Markdown string is returned verbatim. This preserves existing formatting and parameter ordering, even if internal fields were canonicalized.

Otherwise, a canonical Markdown representation is generated:

  • If link_url is not None:

    [![alt_text](image_url)](link_url)

  • If link_url is None:

    ![alt_text](image_url)

Returns:

The Markdown representation of the badge.

Return type:

str
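The rendering rules above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the package's implementation: the name render_badge_markdown is hypothetical, the Badge fields are passed as plain arguments so the sketch runs without Pydantic, and "full_match is set" is taken to mean "is not None".

```python
from typing import Optional


def render_badge_markdown(
    alt_text: str,
    image_url: str,
    link_url: Optional[str] = None,
    full_match: Optional[str] = None,
) -> str:
    """Hypothetical stand-in for Badge.as_markdown(), following the documented rules."""
    # A badge parsed from an existing README is returned verbatim.
    if full_match is not None:
        return full_match
    image = f"![{alt_text}]({image_url})"
    # Wrap the image in a link only when a link target is given.
    if link_url is not None:
        return f"[{image}]({link_url})"
    return image
```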

full_match: str | None
image_url: str
bridge.pipelines.utils.canonicalize_shields_url(url)

Canonicalize a shields.io image URL for stable comparison.

This helper focuses on shields.io badge URLs served from img.shields.io. The main goal is to strip out purely cosmetic parameters that shouldn’t affect logical equality (e.g. user-chosen logos) and to provide a stable query parameter ordering.

Behaviour:

  • If the URL does not contain img.shields.io, it is returned unchanged.

  • If it does, the query string is parsed, the logo parameter is removed (case-insensitive), and the remaining parameters are sorted lexicographically and re-encoded.

Parameters:

url (str) – The shields.io URL to canonicalize.

Returns:

The canonicalized URL, suitable for equality checks or deduplication.

Return type:

str
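The behaviour above can be approximated with the standard urllib.parse helpers. This is a sketch, not the package's code; the function name is hypothetical, and keeping blank query values (keep_blank_values=True) is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize_shields_url_sketch(url: str) -> str:
    """Hypothetical re-implementation of the documented behaviour."""
    # Non-Shields URLs pass through untouched.
    if "img.shields.io" not in url:
        return url
    parts = urlsplit(url)
    # Drop the cosmetic "logo" parameter, matching its name case-insensitively.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != "logo"
    ]
    # Sort the remaining (key, value) pairs so parameter order no longer matters.
    query = urlencode(sorted(params))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```

With this approach, two badge URLs that differ only in their logo or in parameter order canonicalize to the same string.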

bridge.pipelines.utils.canonicalize_url(url)

Canonicalize a generic URL for stable comparison.

This performs a minimal, well-defined normalization intended to make string-based URL comparisons less fragile without changing semantics for typical HTTP(S) URLs.

Normalizations applied:

  • Lowercase the scheme and netloc (host + port).

  • Strip trailing slashes from the path, but ensure the path is at least “/”.

  • Parse the query string into key/value pairs, sort them, and re-encode them (preserving multiplicity via doseq=True).

  • Drop the fragment entirely (anything after ‘#’).

Parameters:

url (str) – The URL to canonicalize.

Returns:

The canonicalized URL.

Return type:

str
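A minimal sketch of these four normalizations using urllib.parse. The function name is hypothetical and keep_blank_values=True is an assumption; the package's actual code may differ in edge cases.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize_url_sketch(url: str) -> str:
    parts = urlsplit(url)
    # Trailing slashes are removed, but an empty path becomes "/".
    path = parts.path.rstrip("/") or "/"
    # Sorting (key, value) pairs keeps repeated keys; doseq=True re-encodes them all.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)), doseq=True)
    # Scheme and netloc are lowercased; the fragment is dropped.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))
```

Note that the path itself keeps its case, since paths are case-sensitive on most servers.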

bridge.pipelines.utils.check_file_with_extension_exists(in_folder_path, file_extension)

Check whether a folder contains at least one file with a given extension.

Parameters:
  • in_folder_path (str) – Path to the folder that should be searched.

  • file_extension (str) – File extension to look for, including the leading dot (e.g. ".md", ".json").

Returns:

True if at least one file with the specified extension is found anywhere under in_folder_path; False otherwise.

Return type:

bool
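Because the match may be "anywhere under in_folder_path", the search is recursive. One way to express that with pathlib is shown below; the function name is hypothetical and this is a sketch, not the package's implementation.

```python
from pathlib import Path


def check_file_with_extension_exists_sketch(in_folder_path: str, file_extension: str) -> bool:
    # rglob walks the folder recursively; any() stops at the first regular file found.
    return any(
        path.is_file()
        for path in Path(in_folder_path).rglob(f"*{file_extension}")
    )
```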

bridge.pipelines.utils.compose_badge(label, message, color, label_color, alt_text, url=None, svg_path=None)

Construct a Badge with a Shields.io URL and optional embedded SVG logo.

This high-level helper:

  1. Optionally reads an SVG file from svg_path and encodes it as base64.

  2. Builds a Shields.io badge URL with label, message, color, label_color, and the embedded logo (if any).

  3. Instantiates a Badge with alt_text, the generated image_url, and an optional link_url.

  4. Pre-populates the full_match field with the Markdown representation of the badge, so that as_markdown() returns a ready-to-use snippet.

Parameters:
  • label (str) – The text shown on the left-hand side of the badge.

  • message (str) – The text shown on the right-hand side of the badge.

  • color (str) – The color for the right-hand side of the badge (e.g. hex or named).

  • label_color (str) – The color for the label side (left-hand side) of the badge.

  • alt_text (str) – The alternative text for the badge image (used as the alt attribute in Markdown).

  • url (str | None, optional) – The URL to link to when the badge is clicked. If None, the badge is rendered as a plain image without a link.

  • svg_path (str | None, optional) – Filesystem path to an SVG file to embed as a logo in the badge. If None, no logo is embedded.

Returns:

A fully constructed Badge instance with image_url pointing to a Shields.io badge URL and full_match containing the Markdown snippet.

Return type:

Badge
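Steps 2 and 4 might look roughly like the sketch below. The exact URL layout the package produces is not documented here, so the /badge/<label>-<message>-<color> path, the labelColor and logo query parameters, and the function name compose_badge_markdown_sketch are assumptions modelled on Shields.io's static badge format. The sketch takes an already-encoded logo_base64 string instead of svg_path and returns the Markdown string rather than a Badge, so it runs without Pydantic.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def compose_badge_markdown_sketch(
    label: str,
    message: str,
    color: str,
    label_color: str,
    alt_text: str,
    url: Optional[str] = None,
    logo_base64: Optional[str] = None,
) -> str:
    def escape(part: str) -> str:
        # Shields path syntax: "--" is a literal dash, "__" a literal underscore.
        return quote(part.replace("-", "--").replace("_", "__"), safe="_")

    def strip_hash(value: str) -> str:
        value = value.strip()
        return value[1:] if value.startswith("#") else value

    path = f"{escape(label)}-{escape(message)}-{quote(strip_hash(color), safe='')}"
    params = {"labelColor": strip_hash(label_color)}
    if logo_base64 is not None:
        # Assumption: the logo is passed to Shields as a base64 data URI.
        params["logo"] = f"data:image/svg+xml;base64,{logo_base64}"
    image_url = f"https://img.shields.io/badge/{path}?{urlencode(params)}"
    image = f"![{alt_text}]({image_url})"
    # Same linked/unlinked rule as Badge.as_markdown().
    return f"[{image}]({url})" if url is not None else image
```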

bridge.pipelines.utils.escape_shields_part(value)

Prepare a label or message for use in a Shields.io badge path segment.

Shields.io encodes meaning into certain characters in the path portion of the URL. This function escapes a free-form string so that it can be safely embedded in that position without accidentally triggering Shields’ special syntax.

Shields path semantics:

  • - = segment separator

  • -- = literal -

  • _ = space

  • __ = literal _

This function:

  • Converts - to --.

  • Converts _ to __.

  • Percent-encodes anything else that needs escaping, but leaves _ unencoded so that Shields can interpret it as a space.

Parameters:

value (str) – The label or message to escape.

Returns:

The escaped value, suitable for direct inclusion in a Shields.io URL path segment.

Return type:

str
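A minimal sketch of the escaping rules listed above, using urllib.parse.quote with "_" marked as safe. The function name is hypothetical; the package's implementation may handle further characters differently.

```python
from urllib.parse import quote


def escape_shields_part_sketch(value: str) -> str:
    # Double the characters Shields treats specially so they render literally.
    doubled = value.replace("-", "--").replace("_", "__")
    # Percent-encode the rest; "_" stays as-is, and "/" becomes %2F.
    return quote(doubled, safe="_")
```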

bridge.pipelines.utils.fill_template(template, placeholders)

Render a simple string template by substituting named placeholders.

This function performs lightweight templating by replacing occurrences of {{ key }} in the template with corresponding values from the placeholders dictionary. Whitespace immediately inside the curly braces is ignored, so all of the following forms are supported and treated equivalently:

  • {{key}}

  • {{ key }}

  • {{    key   }}

Substitution is performed for each key in placeholders using a regular expression that matches the placeholder pattern. Placeholders without a corresponding key in the placeholders dictionary are left unchanged.

Parameters:
  • template (str) – The template string containing one or more placeholders of the form {{ key }}.

  • placeholders (dict[str, str]) – A mapping from placeholder names (without braces) to replacement strings. Each key is treated literally (escaped in the underlying regular expression).

Returns:

The rendered template with all matching placeholders replaced by their corresponding values.

Return type:

str
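The substitution described above can be sketched with one re.sub call per key. The function name is hypothetical. Passing a function as the replacement is a design choice of this sketch: it prevents re.sub from interpreting backslashes in the substituted value.

```python
import re


def fill_template_sketch(template: str, placeholders: dict[str, str]) -> str:
    for key, replacement in placeholders.items():
        # re.escape treats the key literally; \s* absorbs whitespace inside the braces.
        pattern = r"\{\{\s*" + re.escape(key) + r"\s*\}\}"
        template = re.sub(pattern, lambda _match: replacement, template)
    return template
```

Placeholders with no matching key, such as {{other}} below, are left in place.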

bridge.pipelines.utils.find_matching_enum_member(value, enum_cls)

Resolve a free-text string to a member of a given Enum via case-insensitive matching on both member .value and .name.

The input is matched against:
  1. str(member.value) (primary match target)

  2. member.name (fallback match target)

Matching is performed in a case-insensitive manner. No fuzzy matching, partial matching, or alias expansion is applied.

Parameters:
  • value (str) – Free-text input to be normalized.

  • enum_cls (type[E]) – The Enum class to match against.

Returns:

The matching Enum member if an exact case-insensitive match is found against either .value or .name; otherwise None.

Return type:

E | None

Notes

  • This function assumes Enum values are string-like or safely castable to str.

  • If multiple Enum members share the same normalized value, the first match in definition order is returned.
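The two-stage lookup can be sketched as two passes over the Enum: values first, then names, each in definition order. The function name is hypothetical, and using str.casefold() for the case-insensitive comparison is an assumption.

```python
from enum import Enum
from typing import Optional, Type, TypeVar

E = TypeVar("E", bound=Enum)


def find_matching_enum_member_sketch(value: str, enum_cls: Type[E]) -> Optional[E]:
    needle = value.casefold()
    # Primary target: str(member.value), in definition order.
    for member in enum_cls:
        if str(member.value).casefold() == needle:
            return member
    # Fallback target: member.name.
    for member in enum_cls:
        if member.name.casefold() == needle:
            return member
    return None
```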

bridge.pipelines.utils.get_file_content(file_path)

Read the contents of a text file as UTF-8.

Parameters:

file_path (str | Path) – Path to the file whose contents should be read.

Returns:

The file contents as a string if the file exists, otherwise None.

Return type:

str | None
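A minimal pathlib-based sketch of this contract. The function name is hypothetical, and treating "exists" as "is a regular file" is an assumption.

```python
from pathlib import Path
from typing import Optional, Union


def get_file_content_sketch(file_path: Union[str, Path]) -> Optional[str]:
    path = Path(file_path)
    # A missing file yields None instead of raising.
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8")
```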

bridge.pipelines.utils.load_dict_from_yaml_file(file_path)

Load a YAML file and return its content as a dictionary.

Parameters:

file_path (str | Path) – Path to the YAML file to load.

Returns:

The parsed YAML content as a dictionary, or an empty dictionary if the file is missing, empty, invalid, or cannot be parsed.

Return type:

dict[str, Any]

bridge.pipelines.utils.normalize_color(value)

Normalize a color value by stripping a leading ‘#’ and percent-encoding.

Behaviour:

  • Coerces the value to a string and strips surrounding whitespace.

  • Removes a leading # if present.

  • Percent-encodes the remaining value with no safe characters.

Parameters:

value (str) – The color value to normalize (e.g. “#4c1”, “brightgreen”).

Returns:

The normalized color string, without a leading ‘#’, and percent-encoded for safe use in URLs.

Return type:

str
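The three steps map directly onto str.strip() and urllib.parse.quote(..., safe=""). This is a sketch with a hypothetical name; it assumes at most one leading "#" is removed.

```python
from urllib.parse import quote


def normalize_color_sketch(value) -> str:
    text = str(value).strip()
    # Remove a single leading "#".
    if text.startswith("#"):
        text = text[1:]
    # safe="" means every reserved character, including "/", is percent-encoded.
    return quote(text, safe="")
```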

bridge.pipelines.utils.normalize_dict_strings(d)

Recursively normalize all string-like values in a dictionary.

This function walks the entire structure of the given dict and applies normalize_text() to any string or None value it encounters.

Keys are left unchanged. Non-string scalar values (numbers, booleans, etc.) are preserved as-is.

Parameters:

d (dict[str, Any]) – The dictionary whose string values (and nested structures) should be normalized.

Returns:

A new dictionary with the same structure as d, where all nested string/None values have been normalized.

Return type:

dict[str, Any]
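The recursive walk can be sketched as follows. Both names are hypothetical, and collapse_whitespace is a simplified stand-in for normalize_text() so the example stays self-contained. Which container types are descended into (dict, list, tuple, set) is an assumption based on the description of normalize_pydantic_model_strings.

```python
from typing import Any, Callable, Optional


def collapse_whitespace(value: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in for normalize_text()."""
    return None if value is None else " ".join(value.split())


def normalize_structure_sketch(obj: Any, normalize: Callable = collapse_whitespace) -> Any:
    # Strings and None go through the normalizer.
    if obj is None or isinstance(obj, str):
        return normalize(obj)
    # Keys are kept; values are walked recursively into a new dict.
    if isinstance(obj, dict):
        return {key: normalize_structure_sketch(val, normalize) for key, val in obj.items()}
    # Lists, tuples and sets are rebuilt with their original container type.
    if isinstance(obj, (list, tuple, set)):
        return type(obj)(normalize_structure_sketch(item, normalize) for item in obj)
    # Numbers, booleans and other scalars are returned unchanged.
    return obj
```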

bridge.pipelines.utils.normalize_pydantic_model_strings(model)

Recursively normalize all string fields in a Pydantic model in-place.

For each declared field on the model:

  • If the current value is a string or None, it is passed through normalize_text().

  • If the current value is a container (dict, list, tuple, set), its contents are normalized recursively via _normalize_structure().

  • If the current value is another Pydantic model, it is normalized recursively by calling normalize_pydantic_model_strings on it.

If the object does not look like a Pydantic model (i.e. has neither model_fields nor __fields__), it is returned unchanged.

Parameters:

model (Any) – The Pydantic model instance to normalize, or any other object.

Returns:

The same model object, potentially modified in-place if it is a Pydantic model with string fields or nested containers.

Return type:

Any

bridge.pipelines.utils.normalize_text(value, normalize_multiline=True)

Clean and normalize free-text values.

This is a general-purpose text scrubber intended to remove presentation artefacts (HTML entities/tags, box-drawing characters) and normalize whitespace so that strings are more suitable for storage, comparison, or inclusion in metadata formats.

Steps performed:

  1. Decode HTML entities (e.g. &lt;i&gt; → <i>, &amp; → &).

  2. Strip all remaining HTML tags (e.g. <i>name</i> → name).

  3. Replace the box-drawing dash with a plain ASCII -.

  4. Remove non-printable characters.

  5. Optionally collapse all whitespace, including newlines, into single spaces (normalize_multiline=True).

  6. Strip leading and trailing whitespace.

Parameters:
  • value (str | None) – The text to normalize. If None, the function returns None.

  • normalize_multiline (bool, optional) – If True (default), newlines and runs of whitespace are collapsed into single spaces. If False, existing line breaks are preserved and only non-printable characters and HTML artefacts are removed.

Returns:

The normalized text, or None if the input was None.

Return type:

str | None
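The six steps can be sketched with the standard html and re modules. This is an approximation, not the package's code: the function name is hypothetical, the "box-drawing dash" is assumed to be U+2500, and the regex tag stripper and the choice to keep "\n" and "\t" during the non-printable filter (so line breaks survive when normalize_multiline=False) are assumptions.

```python
import html
import re
from typing import Optional

# Assumption: the box-drawing dash is U+2500 (BOX DRAWINGS LIGHT HORIZONTAL).
BOX_DRAWING_DASH = "\u2500"


def normalize_text_sketch(value: Optional[str], normalize_multiline: bool = True) -> Optional[str]:
    if value is None:
        return None
    # 1. Decode HTML entities.
    text = html.unescape(value)
    # 2. Strip tags, including ones that were entity-encoded before step 1.
    text = re.sub(r"<[^>]+>", "", text)
    # 3. Replace the box-drawing dash with an ASCII hyphen.
    text = text.replace(BOX_DRAWING_DASH, "-")
    # 4. Drop non-printable characters, keeping line breaks and tabs.
    text = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    # 5. Optionally collapse every whitespace run, newlines included.
    if normalize_multiline:
        text = re.sub(r"\s+", " ", text)
    # 6. Trim the ends.
    return text.strip()
```

Decoding entities before stripping tags means escaped markup such as &lt;i&gt; is removed too, which matches the examples in the step list.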

bridge.pipelines.utils.object_to_primitive(obj)

Recursively convert complex objects into plain Python types.

This function walks an arbitrary Python object and produces a structure composed only of “primitive” container-friendly types:

  • dict with primitive values

  • list of primitive values

  • str, int, float, bool, or None

It is useful before serializing data to JSON, YAML, or other text-based formats where custom classes (e.g. Pydantic models, enums) would otherwise introduce unwanted artifacts or non-serializable types.

Conversion rules

  • Pydantic models: for v2 models, model_dump(mode="python", exclude_none=True) is used; for v1 models, dict(exclude_none=True) is used. The resulting dict is then processed recursively.

  • Enum instances: replaced with their .value.

  • Mappings / dicts: keys are left as-is; values are passed through object_to_primitive recursively.

  • Iterables (list, tuple, set): converted to a list, with each element converted recursively.

  • Anything else: returned unchanged, under the assumption that it is already a primitive type or is otherwise safely serializable.

Parameters:

obj (Any) – The object (or nested structure of objects) to convert.

Returns:

A recursively converted object that only contains primitive types and containers thereof.

Return type:

Any
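The conversion rules can be sketched as a single recursive function. The name is hypothetical, and Pydantic models are detected by duck typing (model_dump for v2; dict plus __fields__ for v1) instead of importing Pydantic, which is an assumption about how the check might be done.

```python
from collections.abc import Mapping
from enum import Enum
from typing import Any


def object_to_primitive_sketch(obj: Any) -> Any:
    # Pydantic v2 models expose model_dump(); v1 models expose dict() and __fields__.
    if hasattr(obj, "model_dump"):
        return object_to_primitive_sketch(obj.model_dump(mode="python", exclude_none=True))
    if hasattr(obj, "__fields__") and hasattr(obj, "dict"):
        return object_to_primitive_sketch(obj.dict(exclude_none=True))
    # Enum members are replaced with their value.
    if isinstance(obj, Enum):
        return obj.value
    # Mappings keep their keys; values are converted recursively.
    if isinstance(obj, Mapping):
        return {key: object_to_primitive_sketch(val) for key, val in obj.items()}
    # Lists, tuples and sets all become lists.
    if isinstance(obj, (list, tuple, set)):
        return [object_to_primitive_sketch(item) for item in obj]
    # Anything else is assumed to be primitive already.
    return obj
```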

bridge.pipelines.utils.remove_first_snippet_from_text(text, snippet)

Remove the first occurrence of a snippet from a string.

This helper searches for the first occurrence of snippet in text and returns a new string with that occurrence removed. All content before and after the snippet is preserved. If the snippet is not found, the original string is returned unchanged.

Behaviour:

  • If text is None or an empty string, an empty string is returned.

  • If snippet is None or an empty string, text is returned unchanged.

  • Only the first match is removed; subsequent occurrences of snippet remain untouched.

Parameters:
  • text (str | None) – The original text from which the snippet should be removed. May be None, in which case an empty string is returned.

  • snippet (str | None) – The snippet to remove from the text. If None or empty, no removal is performed.

Returns:

A new string with the first occurrence of snippet removed, or the original text (or an empty string) if no removal is performed.

Return type:

str
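The documented behaviour corresponds closely to str.replace with a count of 1. This is a sketch with a hypothetical name, not necessarily how the package implements it.

```python
from typing import Optional


def remove_first_snippet_sketch(text: Optional[str], snippet: Optional[str]) -> str:
    if not text:
        return ""
    if not snippet:
        return text
    # count=1 removes only the leftmost occurrence; a missing snippet leaves text as-is.
    return text.replace(snippet, "", 1)
```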

bridge.pipelines.utils.str_contain_each_other(str1, str2)

Check whether either of two strings contains the other (case-insensitive).

Parameters:
  • str1 (str) – First string.

  • str2 (str) – Second string.

Returns:

True if either string contains the other, False otherwise.

Return type:

bool
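"Contain each other" means containment in either direction. A minimal sketch follows; the function name is hypothetical and the use of str.casefold() for case-insensitivity is an assumption.

```python
def str_contain_each_other_sketch(str1: str, str2: str) -> bool:
    first, second = str1.casefold(), str2.casefold()
    # Containment is checked in both directions.
    return first in second or second in first
```

In this sketch an empty string counts as contained in any string, so it returns True whenever either argument is empty.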

bridge.pipelines.utils.svg_to_base64(svg_path)

Convert an SVG file into a cleaned, base64-encoded string.

This helper is intended for scenarios where an SVG needs to be embedded directly into another format (e.g. HTML img tags with data URIs, Markdown, or JSON/YAML configuration files) rather than referenced by filesystem path.

Parameters:

svg_path (str) – Path to the SVG file on disk.

Returns:

A base64-encoded string representing the cleaned SVG content. The resulting string contains only ASCII characters and no newlines, and can be safely used in data URIs such as:

f"data:image/svg+xml;base64,{svg_to_base64('icon.svg')}"

Return type:

str

Raises:

FileNotFoundError – If the SVG file does not exist at the given path.
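The section does not say what "cleaned" involves, so the sketch below hypothetically assumes it means trimming each line and joining the non-blank lines with single spaces. The function name is also hypothetical. The base64 step uses the standard library, whose output is ASCII-only with no line breaks.

```python
import base64
from pathlib import Path


def svg_to_base64_sketch(svg_path: str) -> str:
    # read_text raises FileNotFoundError for a missing path.
    raw = Path(svg_path).read_text(encoding="utf-8")
    # Hypothetical cleaning step: trim each line and drop blank ones.
    cleaned = " ".join(line.strip() for line in raw.splitlines() if line.strip())
    # b64encode produces ASCII output without newlines, suitable for a data URI.
    return base64.b64encode(cleaned.encode("utf-8")).decode("ascii")
```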

Submodules

badges

Utilities for constructing and handling badge assets (e.g. Shields.io badges).

cleaning

Utilities for cleaning and canonicalizing objects.

comparisons

Utility functions for comparisons.

conversions

Utility functions for converting object types and preparing them for serialization.

files

Utility functions for basic file handling.

templating

Utilities for handling string templating.

Dependencies diagram

Each architecture diagram below visualizes the internal dependency structure of the bridge.pipelines.utils package. It shows how modules and subpackages within the package depend on each other, based on direct Python imports.

  • Packages are shown as purple rectangles

  • Modules are shown as pink rectangles

  • Arrows (A → B) indicate that A directly imports B

Each subpackage’s diagram focuses only on its own internal structure; it does not include imports to or from higher-level packages (those appear in the parent package’s diagram).

bridge package dependencies