Small library for validating and normalising persistent identifiers used in scholarly communication.
Free software: Revised BSD license
Documentation: https://idutils.readthedocs.io.
Features¶
Addition of custom schemes that support all features of predefined schemes.
Validation and normalization of persistent identifiers.
Detection of persistent identifier scheme.
Generation of resolving links for persistent identifiers.
Supported schemes: ISBN10, ISBN13, ISSN, ISTC, DOI, Handle, EAN8, EAN13, ISNI, ORCID, ARK, PURL, LSID, URN, Bibcode, arXiv, PubMed ID, PubMed Central ID, GND, SRA, BioProject, BioSample, Ensembl, UniProt, RefSeq, Genome Assembly, GEO, ArrayExpress.
Installation¶
The IDUtils package is on PyPI so all you need is:
$ pip install idutils
API¶
Small library for persistent identifiers used in scholarly communication.
- idutils.detect_identifier_schemes(val)[source]¶
Detect persistent identifier scheme for a given value.
Note
Some schemes like PMID are very generic.
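Why generic schemes matter for detection can be shown with a small standalone sketch. The three regular expressions and the `detect_schemes_sketch` helper are simplified assumptions made for illustration, not the patterns idutils uses. The point is only that a value made of digits can satisfy more than one scheme at once.

```python
import re

# Simplified, hypothetical patterns for illustration only;
# idutils' real validators are stricter and cover many more schemes.
SKETCH_PATTERNS = {
    "doi": re.compile(r"10\.\d{4,9}/\S+"),
    "ean8": re.compile(r"\d{8}"),
    "pmid": re.compile(r"\d+"),
}


def detect_schemes_sketch(val):
    """Return every scheme name whose toy pattern matches ``val`` in full."""
    return [
        name
        for name, pattern in SKETCH_PATTERNS.items()
        if pattern.fullmatch(val)
    ]


print(detect_schemes_sketch("10.1234/foo"))  # only the DOI pattern matches
print(detect_schemes_sketch("73513537"))     # matches both ean8 and pmid
```

In this sketch any string of digits is reported as a PMID, so callers that need a single answer have to apply their own priority rules.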
- idutils.is_arrayexpress_experiment(val)[source]¶
Test if argument is an ArrayExpress experiment accession.
- idutils.is_ean(val)[source]¶
Test if argument is an International Article Number (EAN-13 or EAN-8).
See http://en.wikipedia.org/wiki/International_Article_Number_(EAN).
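For readers unfamiliar with the format, the following is a minimal sketch of the standard EAN check-digit rule. The helper name `has_valid_ean_checksum` is hypothetical, and the code is not taken from idutils, whose `is_ean` may apply different or additional checks.

```python
def has_valid_ean_checksum(val):
    """Check the length and check digit of an EAN-8 or EAN-13 string."""
    if len(val) not in (8, 13) or not (val.isascii() and val.isdigit()):
        return False
    digits = [int(ch) for ch in val]
    body, check = digits[:-1], digits[-1]
    # Moving left from the digit next to the check digit,
    # the weights alternate 3, 1, 3, 1, ...
    total = sum(
        d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body))
    )
    return (10 - total % 10) % 10 == check
```

Counting the weights from the right lets one loop serve both lengths, since EAN-8 and EAN-13 differ only in how many digits precede the check digit.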
- idutils.is_handle(val)[source]¶
Test if argument is a Handle.
Note: DOIs are also Handles, and Handles are very generic, so this test will also match, e.g., any URL you parse.
- idutils.is_istc(val)[source]¶
Test if argument is an International Standard Text Code.
See http://www.istc-international.org/html/about_structure_syntax.aspx
- idutils.is_orcid(val)[source]¶
Test if argument is an ORCID ID.
See http://support.orcid.org/knowledgebase/articles/116780-structure-of-the-orcid-identifier
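ORCID iDs end in a check character computed with ISO 7064 MOD 11-2. The sketch below covers only that checksum. The function names are hypothetical, and the code is not idutils' implementation, which according to the changelog also takes ISNI block ranges into account.

```python
def orcid_check_char(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def has_valid_orcid_checksum(val):
    """Check only the checksum of an ORCID such as 0000-0002-1825-0097."""
    compact = val.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_char(base) == compact[15]
```

A value that passes this check is only well-formed; it does not show that the ORCID record exists.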
- idutils.is_pmid(val)[source]¶
Test if argument is a PubMed ID.
Warning: PMIDs are just integers with no structure, so this function reports any integer as a PubMed ID.
- idutils.normalize_pid(val, scheme)[source]¶
Normalize an identifier.
For example, doi:10.1234/foo, http://dx.doi.org/10.1234/foo, and 10.1234/foo are all normalized to 10.1234/foo.
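The DOI case can be sketched with a single regular expression. The prefix list and the `normalize_doi_sketch` helper are assumptions derived from the examples given here, not idutils' actual normalization rules.

```python
import re

# Prefixes stripped in this sketch: a "doi:" tag (optionally followed by
# whitespace) or a doi.org / dx.doi.org resolver URL. This list is an
# assumption for illustration.
_DOI_PREFIX = re.compile(
    r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE
)


def normalize_doi_sketch(val):
    """Strip a leading ``doi:`` tag or resolver URL from a DOI string."""
    return _DOI_PREFIX.sub("", val.strip())


for raw in ("doi:10.1234/foo", "http://dx.doi.org/10.1234/foo", "10.1234/foo"):
    print(normalize_doi_sketch(raw))
```

All three inputs produce the same bare DOI, which is what makes the normalized form usable as a deduplication key.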
- idutils.to_url(val, scheme, url_scheme='http')[source]¶
Convert a resolvable identifier into a URL for a landing page.
- Parameters:
val – The identifier’s value.
scheme – The identifier’s scheme.
url_scheme – Scheme to use for URL generation, ‘http’ or ‘https’.
- Returns:
URL for the identifier.
Added in version 0.3.0: url_scheme used for URL generation.
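How the url_scheme parameter affects the result can be sketched as follows. The resolver table, the `to_url_sketch` name, and the empty-string fallback for unknown schemes are all assumptions for illustration. The function mirrors the documented signature but is not the library's code.

```python
# Hypothetical resolver templates for two schemes; idutils covers many
# more, and its exact URL templates may differ.
SKETCH_RESOLVERS = {
    "doi": "doi.org/{pid}",
    "orcid": "orcid.org/{pid}",
}


def to_url_sketch(val, scheme, url_scheme="http"):
    """Build a landing-page URL, or return "" when the scheme is unknown."""
    template = SKETCH_RESOLVERS.get(scheme)
    if template is None:
        # Assumption: schemes without a resolver yield an empty string.
        return ""
    return f"{url_scheme}://{template.format(pid=val)}"


print(to_url_sketch("10.1234/foo", "doi"))
print(to_url_sketch("10.1234/foo", "doi", url_scheme="https"))
```

The identifier is expected to be normalized already, so the function only has to prepend the protocol and the resolver host.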
Changes¶
Version 1.5.0 (2025-07-14)
chores: replaced importlib_metadata with importlib.metadata
Version 1.4.5 (2025-06-05)
ark: fix regex to match new ARK identifiers without slash
Version 1.4.4 (2025-06-03)
swhid: improved SWHID validation
tests: additional tests
Version 1.4.3 (2025-05-12)
is_url: allow URL parameters (i.e. semicolon)
gnd: improve validation and normalization
pmcid: fix url to a working location
pmid: add trailing slash
new: email and sha1 identifiers
Version 1.4.2 (2024-11-01)
setup: remove pytest-invenio to make imports cleaner
setup: install importlib_metadata for compatibility
bibcode/ads: normalize unicode
Version 1.4.1 (2024-10-18)
install: add importlib_metadata
Version 1.4.0 (2024-10-17)
Restructure module to be configurable and readable.
Adds a new entrypoint to register new custom schemes
Adds deprecations for direct imports of schemes
Version 1.3.0 (2024-10-15) (yanked due to undesired flask dependency)
Restructure module to be configurable and readable.
Adds a new entrypoint to register new custom schemes
Adds deprecations for direct imports of schemes
Version 1.2.1 (2023-03-02)
Fixes ORCiD validation, by adding the new ISNI block range.
Version 1.2.0 (2023-01-30)
schemes: add support for viaf and urn
Version 1.1.12 (2022-02-28)
Replaces isbnid_fork with isbnlib
Version 1.1.11 (2022-01-28)
Normalize pmid + their URL identifiers
Version 1.1.10 (2022-01-11)
Add purl.fdlp.gov as a valid PURL netloc
Normalize ror identifiers
Version 1.1.9 (2021-08-30)
Update ARK’s NAAN regex per https://datatracker.ietf.org/doc/html/draft-kunze-ark-28#section-2.3.
Version 1.1.8 (2020-08-13)
Adds support for GEO and ArrayExpress identifiers.
Version 1.1.7 (2020-06-22)
Updates Software Heritage identifiers
Adds Research Organization Registry identifiers
Fixes DeprecationWarnings by using raw strings for regular expressions
Version 1.1.6 (2020-05-07)
Deprecates Python versions lower than 3.6.0. Now supporting 3.6.0 and 3.7.0.
Version 1.1.5 (2020-02-26)
Adds support for Software Heritage identifiers.
Fixes handling of non-digit characters in DOI detection.
Version 1.1.4 (2019-09-27)
Adds support for ASCL identifiers.
Fixes the ADS identifier regex to also detect lower-case author initials.
Version 1.1.3 (2019-09-17)
Adds support for HTTPS ORCiD identifiers.
Version 1.1.2 (2019-02-12)
Adds support for HAL identifiers.
Version 1.1.1 (2018-11-18)
Changes URL resolution for bibcodes to use https://ui.adsabs.harvard instead of https://adsabs.harvard.edu/abs/.
Allows choosing HTTP/HTTPS for any generated URL by idutils.to_url.
Version 1.1.0 (2018-08-17)
Adds support for genomic identifiers: SRA, BioProject, BioSample, Ensembl, UniProt, RefSeq, GenBank/RefSeq.
Fixes bug in bibcode detection for non-capitalized journals.
Version 1.0.1 (2018-05-02)
Fixes bug causing invalid DOIs to be accepted.
Version 1.0.0 (2017-12-07)
Fixes handling of unicode characters in DOIs.
Adds support for APS style arXiv identifiers.
Version 0.2.4 (2017-01-30)
Removes Python 3.3 from the list of supported Python versions and adds Python 3.6.
Moves from isbnid (v0.3.4) to isbnid_fork (v0.4.4) library.
Version 0.2.3 (2016-09-21)
Adds an optional parameter in idutils.to_url to use HTTPS scheme for PID providers that support it.
Detects and parses Handles and DOIs without the “http(s)://” prefix, and ignores whitespace after scheme tags (e.g. “doi: 10.123/456”).
Version 0.2.2 (2016-09-16)
Fixes issue where a valid ISBN with dashes and spaces could not be normalized.
Version 0.2.1 (2016-06-17)
Changes ISBN normalization to use isbnid instead of isbnlib. Importing this library no longer changes the default socket timeout, which previously caused unwanted side effects.
Version 0.2.0 (2016-04-07)
Changes URL resolution for DOIs to use https://doi.org instead of http://dx.doi.org according to https://www.doi.org/doi_handbook/3_Resolution.html#3.8
Version 0.1.1 (2015-07-22)
Fixes GND validation and normalization.
Replaces invalid package name in run-tests.sh and makes run-tests.sh file executable. One can now use docker-compose run --rm web /code/run-tests.sh to run all the CI tests (pep257, sphinx, test suite).
Initial release of Docker configuration suitable for local development. docker-compose build rebuilds the image, docker-compose run --rm web /code/run-tests.sh runs the test suite.
Version 0.1.0 (2015-07-02)
First public release.
Contributing¶
Bug reports, feature requests, and other contributions are welcome. If you find a demonstrable problem that is caused by the code of this library, please:
Search for already reported problems.
Check if the issue has been fixed or is still reproducible on the latest master branch.
Create an issue with a test case.
If you create a feature branch, you can run the tests to ensure everything is operating correctly:
$ ./run-tests.sh
How to add your own schemes¶
Extension class to collect and register new schemes via entrypoints.
To define your own custom schemes, register them with the following entry point:
[options.entry_points]
idutils.custom_schemes =
    my_new_scheme = my_module.get_scheme_config_func
The entry point 'my_new_scheme = my_module.get_scheme_config_func' defines an entry
point named my_new_scheme pointing to the function my_module.get_scheme_config_func
which returns the config for your new registered scheme.
That function must return a dictionary with the following format:
def get_scheme_config_func():
    return {
        # See examples in `idutils.validators` file.
        # Placeholder: return True if `value` is valid, else False.
        "validator": lambda value: True,
        # Used in `idutils.normalizers.normalize_pid` function.
        # Placeholder: return the normalized form of `value`.
        "normalizer": lambda value: value,
        # See examples in `idutils.detectors.IDUTILS_SCHEME_FILTER` config.
        "filter": ["list_of_schemes_to_filter_out"],
        # Used in `idutils.normalizers.to_url` function.
        "url_generator": lambda scheme, normalized_pid: "normalized_url",
    }
Each key is optional; if a key is not provided, a default value is set by the idutils.ext._set_default_custom_scheme_config() function.
Note: You can only add new schemes but not override existing ones.
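A more concrete config function might look like the sketch below. Everything specific in it is made up for illustration: the `labid` scheme, its "LAB-" plus four digits format, and the example.org URL are hypothetical, and only the four dictionary keys come from the format described above.

```python
import re

# Hypothetical identifier format: "LAB-" followed by exactly four digits.
_LABID_RE = re.compile(r"LAB-\d{4}", re.IGNORECASE)


def get_scheme_config_func():
    """Return the config for a made-up ``labid`` scheme."""
    return {
        # True when the trimmed value looks like a lab ID, else False.
        "validator": lambda value: bool(_LABID_RE.fullmatch(value.strip())),
        # Canonical form: trimmed and upper-cased.
        "normalizer": lambda value: value.strip().upper(),
        # Nothing is filtered out in this sketch.
        "filter": [],
        # example.org is a placeholder host, not a real resolver.
        "url_generator": lambda scheme, normalized_pid: (
            f"https://example.org/{scheme}/{normalized_pid}"
        ),
    }


config = get_scheme_config_func()
print(config["normalizer"](" lab-0042 "))
```

Keeping the validator tolerant of case and whitespace while the normalizer produces one canonical form means the URL generator only ever has to handle normalized values.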
- class idutils.ext.CustomSchemesRegistry[source]¶
Singleton class for loading and storing custom schemes from entry points.
- property custom_schemes¶
Return the registered custom schemes.
Each item of the registry has the following format:
{
    "custom_scheme": {
        # See examples in `idutils.validators` file.
        # Placeholder: return True if `value` is valid, else False.
        "validator": lambda value: True,
        # Used in `idutils.normalizers.normalize_pid` function.
        # Placeholder: return the normalized form of `value`.
        "normalizer": lambda value: value,
        # See examples in `idutils.detectors.IDUTILS_SCHEME_FILTER` config.
        "filter": ["list_of_schemes_to_filter_out"],
        # Used in `idutils.normalizers.to_url` function.
        "url_generator": lambda scheme, normalized_pid: "normalized_url",
    }
}
License¶
IDUtils is free software; you can redistribute it and/or modify it under the terms of the Revised BSD License quoted below.
Copyright (C) 2015-2018 CERN. Copyright (C) 2018 Alan Rubin.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
In applying this license, CERN does not waive the privileges and immunities granted to it by virtue of its status as an Intergovernmental Organization or submit itself to any jurisdiction.