Isbn hyphenation #2921

Merged: 5 commits, Aug 6, 2023

axiomizer (Contributor) commented Jul 25, 2023

For #2892

Information about the range message is available here: https://www.isbn-international.org/range_file_generation. That page also links to a download of the range message and gives the URL to which a request for the range message can be sent. I noticed that the range message had one very minor update over a period of about 3 days, which could help inform a decision about how often, if at all, we need to update the range message in production. This PR includes a function that can be used to update the range message, but it isn't called anywhere.

axiomizer (Contributor, Author) commented Jul 25, 2023

As detailed on Wikipedia and in this manual, an ISBN-13 has 5 parts: GS1 prefix, registration group, registrant, publication, and checksum. The GS1 prefix is always 3 digits, and the checksum is always 1 digit. The other three parts are variable length, and the range message helps determine their lengths for a given ISBN-13.
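To make the fixed-length checksum part concrete, here is a minimal sketch of the standard ISBN-13 check digit calculation (weights alternating 1 and 3 over the first twelve digits). It is not code from this PR, and the function names `isbn13_check_digit` and `has_valid_check_digit` are hypothetical.

```python
def isbn13_check_digit(first_twelve: str) -> int:
    """Return the check digit for the first twelve digits of an ISBN-13."""
    if len(first_twelve) != 12 or not (first_twelve.isascii() and first_twelve.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first_twelve))
    # The check digit brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10


def has_valid_check_digit(isbn13: str) -> bool:
    """Check that the 13th digit matches the computed check digit."""
    if len(isbn13) != 13 or not (isbn13.isascii() and isbn13.isdigit()):
        return False
    return isbn13_check_digit(isbn13[:12]) == int(isbn13[-1])
```

A check like this only validates the last digit; it says nothing about where the hyphens go, which is what the range message is for.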

axiomizer (Contributor, Author) commented Jul 25, 2023

Tests:

```python
>>> from bookwyrm.isbn.isbn import hyphenator_singleton as hyphenator
>>> hyphenator.hyphenate('9780439554930')  # 978-0 (English language) 3700000-6389999
'978-0-439-55493-0'
>>> hyphenator.hyphenate('9782070100927')  # 978-2 (French language) 0000000-1999999
'978-2-07-010092-7'
>>> hyphenator.hyphenate('9783518188125')  # 978-3 (German language) 2000000-6999999
'978-3-518-18812-5'
>>> hyphenator.hyphenate('9784101050454')  # 978-4 (Japan) 0000000-1999999
'978-4-10-105045-4'
>>> hyphenator.hyphenate('9786269533251')  # 978-626 (Taiwan) 9500000-9999999
'978-626-95332-5-1'
>>> hyphenator.hyphenate('9798627974040')  # 979-8 (United States) 4000000-8499999
'979-8-6279-7404-0'
>>> hyphenator.hyphenate('9786268533251')  # 978-626 (Taiwan) 8000000-9499999 (unassigned)
'9786268533251'
>>> hyphenator.hyphenate('9786769533251')  # 978 range 6600000-6999999 (unassigned)
'9786769533251'
>>> hyphenator.hyphenate('9798311111111')  # 979-8 (United States) 2300000-3499999 (unassigned)
'9798311111111'
```
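The lookups behind the tests above can be sketched with a toy range table. This is not the PR's implementation. The six rules are hand-copied from the test comments, the registrant lengths are inferred from the expected outputs, and `RULES` and `hyphenate_sketch` are hypothetical names. Padding the digits after the group to seven places is also an assumption about how a rule is matched. Unassigned ranges simply have no rule here, so those inputs come back unchanged.

```python
# Hypothetical subset of range rules; the real range message is far larger.
# Each rule: (GS1 prefix + registration group, range low, range high, registrant length).
RULES = [
    ("9780", 3700000, 6389999, 3),    # English language
    ("9782", 0, 1999999, 2),          # French language
    ("9783", 2000000, 6999999, 3),    # German language
    ("9784", 0, 1999999, 2),          # Japan
    ("978626", 9500000, 9999999, 5),  # Taiwan
    ("9798", 4000000, 8499999, 4),    # United States
]


def hyphenate_sketch(isbn13: str) -> str:
    """Hyphenate an unhyphenated ISBN-13, or return it unchanged if no rule matches."""
    for prefix, low, high, registrant_len in RULES:
        if not isbn13.startswith(prefix):
            continue
        # Digits between the registration group and the check digit.
        rest = isbn13[len(prefix):-1]
        # Assumption: compare the first seven of those digits, zero-padded, to the range.
        key = int(rest[:7].ljust(7, "0"))
        if low <= key <= high:
            parts = [
                prefix[:3],             # GS1 prefix, always 3 digits
                prefix[3:],             # registration group
                rest[:registrant_len],  # registrant
                rest[registrant_len:],  # publication
                isbn13[-1],             # checksum, always 1 digit
            ]
            return "-".join(parts)
    return isbn13
```

A plain regex cannot do this on its own, because the registrant length depends on which numeric range the digits fall into, not just on the leading prefix.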

mouse-reeve (Member) commented

In my naivety I would have thought you could solve this with a simple regex! Thank you for such a thorough PR. I think it may make the most sense to update the range file from a management command? But that's a question for another day.

@mouse-reeve mouse-reeve merged commit e76b44f into bookwyrm-social:main Aug 6, 2023