bridge.pipelines.shared.publications module

Shared publication utilities for pipelines.

bridge.pipelines.shared.publications.deduplicate_references(references)

Deduplicate a list of publication-like objects based on shared identifiers.

Two references are considered duplicates if they share at least one normalized identifier (DOI, PMID, PMCID, or title). The first occurrence in the input list is retained; all later duplicates are dropped.

Parameters:

references (list[Publication | Mapping[str, Any]]) – List of references (models or dicts) to deduplicate.

Returns:

Deduplicated list of references, preserving original order for the first occurrence of each logical publication.

Return type:

list[Publication | Mapping[str, Any]]
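
The first-occurrence rule described above can be illustrated with a minimal sketch. It is not the module's implementation: `dedupe_sketch` and `_ids` are hypothetical names, only dict inputs are handled, and the `key:value` identifier format and lower-casing are assumptions about how normalization might work. The sketch also assumes that identifiers of dropped duplicates are still remembered for later comparisons, which the documentation does not specify.

```python
from collections.abc import Mapping
from typing import Any


def _ids(ref: Mapping[str, Any]) -> set[str]:
    """Hypothetical stand-in for ref_ids: stripped, lower-cased identifiers."""
    ids: set[str] = set()
    for key in ("doi", "pmid", "pmcid", "title"):
        value = ref.get(key)
        if value:
            ids.add(f"{key}:{str(value).strip().lower()}")
    return ids


def dedupe_sketch(references: list[Mapping[str, Any]]) -> list[Mapping[str, Any]]:
    """Keep the first reference for each group that shares an identifier."""
    seen: set[str] = set()
    kept: list[Mapping[str, Any]] = []
    for ref in references:
        ids = _ids(ref)
        is_duplicate = bool(ids & seen)
        # Assumption: identifiers of dropped duplicates are remembered too.
        seen |= ids
        if not is_duplicate:
            kept.append(ref)
    return kept


refs = [
    {"doi": "10.1/ABC", "title": "A"},
    {"doi": "10.1/abc "},          # same DOI after normalization
    {"pmid": "1"},
    {"title": "a", "pmid": "2"},   # same title as the first entry
]
result = dedupe_sketch(refs)
```

Here `result` contains only the first and third entries, in their original order. A reference with no identifiers at all never matches anything in this sketch, so it is always kept.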

bridge.pipelines.shared.publications.extract_cff_references(citation_cff)

Extract publication information from a parsed CITATION.cff dictionary.

This helper parses an existing CFF structure and returns:
- the list of reference entries, and
- the preferred-citation entry, if present.

The preferred citation is always included in the returned references list: if it is not already present, it is appended.

Parameters:

citation_cff (dict[str, Any] | None) – Parsed content of an existing CITATION.cff file, or None if no file exists.

Returns:

A tuple of:
- A list of reference dictionaries extracted from the CFF.
- The preferred-citation dictionary, or None if not present.

Return type:

tuple[list[dict[str, Any]], dict[str, Any] | None]

bridge.pipelines.shared.publications.ref_ids(ref)

Extract a normalized set of identifiers for a publication-like object. This function collects all available identifiers and returns them as normalized strings, which can then be used for deduplication.

Supports:
- Europe PMC Publication model
- bio.tools PublicationItem model
- plain dict-like mappings

Normalized identifiers include (when present):
- DOI
- PMID
- PMCID
- title

Parameters:

ref (Publication | Mapping[str, Any]) – The Publication object or dictionary representing a publication.

Returns:

A set of normalized identifier strings. The set may be empty if no identifiers are present.

Return type:

set[str]

Raises:

TypeError – If ref is neither a Publication instance nor a mapping.
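
As an illustration of what identifier normalization could involve, here is a minimal sketch restricted to dict-like inputs. `ref_ids_sketch` is a hypothetical name, and the specific rules (stripping a `https://doi.org/` or `doi:` prefix, lower-casing, collapsing whitespace in titles, prefixing each value with its identifier type) are assumptions, not the documented behaviour. The Europe PMC and bio.tools models are not reproduced here.

```python
import re
from collections.abc import Mapping
from typing import Any

# Assumption: DOIs may arrive as URLs or with a "doi:" prefix.
_DOI_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def ref_ids_sketch(ref: Any) -> set[str]:
    """Collect normalized identifier strings from a dict-like reference."""
    if not isinstance(ref, Mapping):
        raise TypeError(f"Unsupported reference type: {type(ref).__name__}")
    ids: set[str] = set()
    doi = ref.get("doi")
    if doi:
        ids.add("doi:" + _DOI_PREFIX.sub("", str(doi).strip()).lower())
    for key in ("pmid", "pmcid"):
        value = ref.get(key)
        if value:
            ids.add(f"{key}:{str(value).strip().lower()}")
    title = ref.get("title")
    if title:
        # Collapse runs of whitespace so trivially different titles match.
        ids.add("title:" + " ".join(str(title).split()).lower())
    return ids


ids = ref_ids_sketch({"doi": "https://doi.org/10.1000/XYZ", "title": "  A  Title "})
```

Prefixing each value with its identifier type keeps a numeric PMID from colliding with an unrelated value of another type when the sets are intersected during deduplication.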