Small library for validating and normalizing persistent identifiers used in scholarly communication.

Features

  • Validation and normalization of persistent identifiers.
  • Detection of persistent identifier scheme.
  • Generation of resolving links for persistent identifiers.
  • Supported schemes: ISBN10, ISBN13, ISSN, ISTC, DOI, Handle, EAN8, EAN13, ISNI, ORCID, ARK, PURL, LSID, URN, Bibcode, arXiv, PubMed ID, PubMed Central ID, GND, SRA, BioProject, BioSample, Ensembl, UniProt, RefSeq, Genome Assembly.

Installation

The IDUtils package is on PyPI so all you need is:

$ pip install idutils

API

Small library for persistent identifiers used in scholarly communication.

idutils.is_isbn10(val)

Test if argument is an ISBN-10 number.

Courtesy Wikipedia: http://en.wikipedia.org/wiki/International_Standard_Book_Number

idutils.is_isbn13(val)

Test if argument is an ISBN-13 number.

Courtesy Wikipedia: http://en.wikipedia.org/wiki/International_Standard_Book_Number

idutils.is_isbn(val)

Test if argument is an ISBN-10 or ISBN-13 number.
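
To illustrate what such a test involves, here is a minimal sketch of the standard ISBN check-digit rules. The function names isbn10_checksum_ok and isbn13_checksum_ok are hypothetical and are not part of IDUtils; the library's own checks may differ in detail.

```python
_DIGITS = "0123456789"


def isbn10_checksum_ok(value: str) -> bool:
    """Check the ISBN-10 weighted sum (weights 10..1) modulo 11."""
    chars = value.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in _DIGITS:
            digit = int(char)
        elif char == "X" and position == 9:
            digit = 10  # "X" stands for 10 and is only allowed as the last character
        else:
            return False
        total += (10 - position) * digit
    return total % 11 == 0


def isbn13_checksum_ok(value: str) -> bool:
    """Check the ISBN-13 weighted sum (weights 1, 3, 1, 3, ...) modulo 10."""
    chars = value.replace("-", "").replace(" ", "")
    if len(chars) != 13 or any(char not in _DIGITS for char in chars):
        return False
    total = sum(int(char) * (3 if index % 2 else 1) for index, char in enumerate(chars))
    return total % 10 == 0
```

Dashes and spaces are stripped before checking, so "0-306-40615-2" and "0306406152" are treated alike.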

idutils.is_issn(val)

Test if argument is an ISSN number.
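
As a rough illustration, an ISSN carries a modulo-11 check character computed from its first seven digits. The helper names below are hypothetical, and the sketch is not taken from the IDUtils source.

```python
def issn_check_character(first_seven: str) -> str:
    """Compute the ISSN check character from the first seven digits."""
    if len(first_seven) != 7 or any(c not in "0123456789" for c in first_seven):
        raise ValueError("expected exactly seven ASCII digits")
    # Weights run 8, 7, ..., 2 from left to right.
    total = sum(int(digit) * weight for digit, weight in zip(first_seven, range(8, 1, -1)))
    remainder = (11 - total % 11) % 11
    return "X" if remainder == 10 else str(remainder)


def issn_checksum_ok(value: str) -> bool:
    """Return True if the eighth character matches the computed check character."""
    chars = value.replace("-", "").upper()
    if len(chars) != 8:
        return False
    try:
        return issn_check_character(chars[:7]) == chars[7]
    except ValueError:
        return False
```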

idutils.is_istc(val)

Test if argument is an International Standard Text Code.

See http://www.istc-international.org/html/about_structure_syntax.aspx

idutils.is_doi(val)

Test if argument is a DOI.

idutils.is_handle(val)

Test if argument is a Handle.

Note that DOIs are also Handles, and Handles are very generic, so this test will also match, for example, any URL you parse.

idutils.is_ean8(val)

Test if argument is an International Article Number (EAN-8).

idutils.is_ean13(val)

Test if argument is an International Article Number (EAN-13).

idutils.is_ean(val)

Test if argument is an International Article Number (EAN-13 or EAN-8).

See http://en.wikipedia.org/wiki/International_Article_Number_(EAN).
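
For reference, the EAN check digit can be verified with a single routine for both lengths. This is a minimal sketch with a hypothetical function name (ean_checksum_ok), not the library's implementation.

```python
def ean_checksum_ok(value: str) -> bool:
    """Verify the check digit of an EAN-8 or EAN-13 number."""
    if len(value) not in (8, 13) or any(char not in "0123456789" for char in value):
        return False
    total = 0
    # Counting from the right, the check digit has weight 1,
    # and the weights then alternate 3, 1, 3, ...
    for index, char in enumerate(reversed(value)):
        weight = 3 if index % 2 else 1
        total += int(char) * weight
    return total % 10 == 0
```

Weighting from the right is what lets EAN-8 and EAN-13 share one rule.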

idutils.is_isni(val)

Test if argument is an International Standard Name Identifier.

idutils.is_orcid(val)

Test if argument is an ORCID ID.

See http://support.orcid.org/knowledgebase/articles/116780-structure-of-the-orcid-identifier
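
The ORCID structure uses an ISO 7064 MOD 11-2 check character over the first fifteen digits. A minimal sketch of that calculation follows; the function names are hypothetical and the code is not taken from IDUtils.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def orcid_checksum_ok(value: str) -> bool:
    """Return True if the last character matches the computed check character."""
    chars = value.replace("-", "").upper()
    if len(chars) != 16 or any(char not in "0123456789" for char in chars[:15]):
        return False
    return orcid_check_character(chars[:15]) == chars[15]
```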

idutils.is_purl(val)

Test if argument is a PURL.

idutils.is_url(val)

Test if argument is a URL.

idutils.is_lsid(val)

Test if argument is an LSID.

idutils.is_urn(val)

Test if argument is a URN.

idutils.is_ads(val)

Test if argument is an ADS bibliographic code.

idutils.is_arxiv_post_2007(val)

Test if argument is a post-2007 arXiv ID.

idutils.is_arxiv_pre_2007(val)

Test if argument is a pre-2007 arXiv ID.

idutils.is_arxiv(val)

Test if argument is an arXiv ID.

See http://arxiv.org/help/arxiv_identifier and
http://arxiv.org/help/arxiv_identifier_for_services.
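
The two arXiv identifier styles can be told apart with simple patterns. The sketch below is an approximation under stated assumptions: the regular expressions and the function name classify_arxiv are hypothetical, they do not validate the month, and the patterns IDUtils actually uses may be stricter or looser.

```python
import re

# Post-2007 style: YYMM.NNNN or YYMM.NNNNN, optionally followed by a version.
NEW_STYLE = re.compile(r"(?:arxiv:)?\d{4}\.\d{4,5}(?:v\d+)?", re.IGNORECASE)

# Pre-2007 style: archive or archive.SC, a slash, then seven digits.
OLD_STYLE = re.compile(r"(?:arxiv:)?[a-z-]+(?:\.[a-z]{2})?/\d{7}(?:v\d+)?", re.IGNORECASE)


def classify_arxiv(value: str):
    """Return 'post-2007', 'pre-2007', or None for a candidate arXiv ID."""
    candidate = value.strip()
    if NEW_STYLE.fullmatch(candidate):
        return "post-2007"
    if OLD_STYLE.fullmatch(candidate):
        return "pre-2007"
    return None
```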

idutils.is_pmid(val)

Test if argument is a PubMed ID.

Warning: PMIDs are just integers with no structure, so this function reports any integer as a PubMed ID.

idutils.is_pmcid(val)

Test if argument is a PubMed Central ID.

idutils.is_gnd(val)

Test if argument is a GND Identifier.

idutils.is_sra(val)

Test if argument is an SRA accession.

idutils.is_bioproject(val)

Test if argument is a BioProject accession.

idutils.is_biosample(val)

Test if argument is a BioSample accession.

idutils.is_ensembl(val)

Test if argument is an Ensembl accession.

idutils.is_uniprot(val)

Test if argument is a UniProt accession.

idutils.is_refseq(val)

Test if argument is a RefSeq accession.

idutils.is_genome(val)

Test if argument is a GenBank or RefSeq genome assembly accession.

idutils.detect_identifier_schemes(val)

Detect persistent identifier scheme for a given value.

Note

Some schemes like PMID are very generic.
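
To see why generic schemes matter, consider a minimal sketch of detection as "run every test and collect the names that pass". The checks, their patterns, and the name detect_schemes_sketch are simplified assumptions for illustration and are not the library's code.

```python
import re


def _looks_like_doi(value: str) -> bool:
    # Simplified pattern: "10.", a registrant code, a slash, then a suffix.
    return re.fullmatch(r"10\.\d{4,9}/\S+", value) is not None


def _looks_like_ean8(value: str) -> bool:
    if re.fullmatch(r"[0-9]{8}", value) is None:
        return False
    # EAN-8 weights from the left are 3, 1, 3, 1, ...
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(value))
    return total % 10 == 0


def _looks_like_pmid(value: str) -> bool:
    # Any run of digits passes, which is what makes this scheme so generic.
    return re.fullmatch(r"[0-9]+", value) is not None


CHECKS = [
    ("doi", _looks_like_doi),
    ("ean8", _looks_like_ean8),
    ("pmid", _looks_like_pmid),
]


def detect_schemes_sketch(value: str) -> list:
    """Return the names of all checks that accept the value, in CHECKS order."""
    candidate = value.strip()
    return [name for name, check in CHECKS if check(candidate)]
```

With these checks, a value such as "73513537" is reported as both "ean8" and "pmid", so callers typically need context to choose between the candidates.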

idutils.normalize_doi(val)

Normalize a DOI.

idutils.normalize_handle(val)

Normalize a Handle identifier.

idutils.normalize_ads(val)

Normalize an ADS bibliographic code.

idutils.normalize_orcid(val)

Normalize an ORCID identifier.

idutils.normalize_gnd(val)

Normalize a GND identifier.

idutils.normalize_pmid(val)

Normalize a PubMed ID.

idutils.normalize_arxiv(val)

Normalize an arXiv identifier.

idutils.normalize_pid(val, scheme)

Normalize an identifier.

E.g. doi:10.1234/foo and http://dx.doi.org/10.1234/foo and 10.1234/foo will all be normalized to 10.1234/foo.
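
The DOI case above can be sketched as stripping a known prefix. The pattern and the name strip_doi_prefix are hypothetical and only cover the three forms listed (plus doi.org URLs and whitespace after "doi:"); the library's normalization may handle more.

```python
import re

# Matches a leading "doi:" tag (with optional whitespace) or a DOI resolver URL.
_DOI_PREFIX = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def strip_doi_prefix(value: str) -> str:
    """Return the bare DOI with any recognized prefix removed."""
    return _DOI_PREFIX.sub("", value.strip(), count=1)
```

All three spellings from the example reduce to the same string, which is what makes normalized identifiers safe to compare or use as keys.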

idutils.to_url(val, scheme, url_scheme='http')

Convert a resolvable identifier into a URL for a landing page.

Parameters:
  • val – The identifier’s value.
  • scheme – The identifier’s scheme.
  • url_scheme – Scheme to use for URL generation, ‘http’ or ‘https’.

Returns: URL for the identifier.

New in version 0.3.0: url_scheme used for URL generation.
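
Conceptually, generating a resolving link amounts to filling a per-scheme URL template. The sketch below assumes a small, hypothetical template table (URL_TEMPLATES) and function name (to_url_sketch); the resolver hosts shown are illustrative assumptions, and the set of schemes and hosts IDUtils actually uses may differ.

```python
# Hypothetical templates; the value is assumed to be normalized already.
URL_TEMPLATES = {
    "doi": "{url_scheme}://doi.org/{value}",
    "orcid": "{url_scheme}://orcid.org/{value}",
    "arxiv": "{url_scheme}://arxiv.org/abs/{value}",
}


def to_url_sketch(value: str, scheme: str, url_scheme: str = "http") -> str:
    """Build a landing-page URL, or return '' for schemes without a template."""
    if url_scheme not in ("http", "https"):
        raise ValueError("url_scheme must be 'http' or 'https'")
    template = URL_TEMPLATES.get(scheme)
    if template is None:
        return ""
    return template.format(url_scheme=url_scheme, value=value)
```

Keeping url_scheme as a keyword with an 'http' default mirrors the signature above, so existing callers are unaffected when they opt in to 'https'.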

Changes

Version 1.1.1 (2018-11-18)

Version 1.1.0 (2018-08-17)

  • Adds support for genomic identifiers: SRA, BioProject, BioSample, Ensembl, UniProt, RefSeq, GenBank/RefSeq.
  • Fixes bug in bibcode detection for non-capitalized journals.

Version 1.0.1 (2018-05-02)

  • Fixes bug causing invalid DOIs to be accepted.

Version 1.0.0 (2017-12-07)

  • Fixes handling of unicode characters in DOIs.
  • Adds support for APS style arXiv identifiers.

Version 0.2.4 (2017-01-30)

  • Removes Python 3.3 from the list of supported Python versions and adds Python 3.6.
  • Moves from isbnid (v0.3.4) to isbnid_fork (v0.4.4) library.

Version 0.2.3 (2016-09-21)

  • Adds an optional parameter in idutils.to_url to use HTTPS scheme for PID providers that support it.
  • Detects and parses Handles and DOIs without the “http(s)://”, and ignores whitespace after scheme tags (e.g. “doi: 10.123/456”).

Version 0.2.2 (2016-09-16)

  • Fixes issue where a valid ISBN with dashes and spaces could not be normalized.

Version 0.2.1 (2016-06-17)

  • Changes ISBN normalization to use isbnid instead of isbnlib. Importing this library no longer changes the default socket timeout, which previously resulted in unwanted side effects.

Version 0.2.0 (2016-04-07)

Version 0.1.1 (2015-07-22)

  • Fixes GND validation and normalization.
  • Replaces invalid package name in run-tests.sh and makes run-tests.sh file executable. One can now use docker-compose run --rm web /code/run-tests.sh to run all the CI tests (pep257, sphinx, test suite).
  • Initial release of Docker configuration suitable for local development. docker-compose build rebuilds the image, docker-compose run --rm web /code/run-tests.sh runs the test suite.

Version 0.1.0 (2015-07-02)

  • First public release.

Contributing

Bug reports, feature requests, and other contributions are welcome. If you find a demonstrable problem that is caused by the code of this library, please:

  1. Search for already reported problems.
  2. Check if the issue has been fixed or is still reproducible on the latest master branch.
  3. Create an issue with a test case.

If you create a feature branch, you can run the tests to ensure everything is operating correctly:

$ ./run-tests.sh

License

IDUtils is free software; you can redistribute it and/or modify it under the terms of the Revised BSD License quoted below.

Copyright (C) 2015-2018 CERN. Copyright (C) 2018 Alan Rubin.

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

In applying this license, CERN does not waive the privileges and immunities granted to it by virtue of its status as an Intergovernmental Organization or submit itself to any jurisdiction.

Authors

  • Adrian Pawel Baran
  • Alan Rubin
  • Alexander Ioannidis
  • Jiri Kuncar
  • Lars Holm Nielsen
  • Pedro Gaudencio
  • Tibor Simko