Add support for Croatian #643

Open
Jetman80 wants to merge 4 commits into savoirfairelinux:master from microdevops-com:master

Conversation

@Jetman80

@Jetman80 Jetman80 commented Aug 2, 2025

This commit adds support for the Croatian language. Implementation includes cardinal numbers, EUR, USD, HRK currencies, tests. It is based on the Serbian language implementation as the closest one.

Changes proposed in this pull request:

  • Croatian language support (num2words/lang_HR.py)
  • Tests (tests/test_hr.py)
  • __init__.py and README.rst updates to reflect the changes

Status

  • READY
  • HOLD
  • WIP (Work-In-Progress)

How to verify this change

Run the Croatian tests with python3 -m pytest tests/test_hr.py -v. You can also test manually with examples like:

from num2words import num2words
print(num2words(42, lang='hr')) # četrdeset dva
print(num2words(1.5, lang='hr', to='currency', currency='EUR')) # jedan euro, pedeset centi
print(num2words(2020, lang='hr', to='year')) # dvije tisuće dvadeset

@tihomirjauk

@Jetman80, thanks for putting this together — Croatian support has been a long gap in num2words, and the cardinal + currency implementation here is solid. I tested it against real-world Croatian text (TTS pre-normalisation for narration) and the cardinals match what a native speaker would say.

Two pieces are still missing for the PR to fully cover Croatian usage:

  • to_ordinal() raises NotImplementedError. Croatian ordinals are essential: “17. rujna”, “21. stoljeću”, and “1986. godine” all involve ordinal forms.
  • to_year() falls through to to_cardinal() via the base class default, so num2words(1986, lang='hr', to='year') returns "jedna tisuća devetsto osamdeset šest". That is grammatically correct as a count, but it is not how a Croatian speaker reads “1986. godine”. The natural form is "tisuću devetsto osamdeset šeste" (feminine genitive ending on the last word, with “jedna tisuća” collapsed to “tisuću”).

I have a follow-up branch ready that adds both, on top of this PR’s commit:

  • Masculine-nominative ordinals via last-cardinal-word substitution (prvi, drugi, treći, četvrti, peti, …, dvadeseti, stoti, tisućiti)
  • to_year() that collapses jedna tisuća → tisuću for 1000-1999 and applies feminine-genitive endings to the last word
  • ~30 new test cases covering ordinals 1-1,000,000 and four-digit year forms
  • All 1,493 existing tests still pass
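
The last-cardinal-word substitution in the first bullet could look roughly like the sketch below. It is not the code from the follow-up branch: `IRREGULAR_ORDINALS` and `ordinal_from_cardinal` are hypothetical names, the function works on an already-rendered cardinal string rather than on num2words internals, and the fallback of appending "i" is a simplifying assumption that only holds for words such as pet, šest, or dvadeset.

```python
# Hypothetical lookup table for illustration; not the PR's actual data.
# Maps a cardinal word to its masculine-nominative ordinal.
IRREGULAR_ORDINALS = {
    "jedan": "prvi",
    "dva": "drugi",
    "tri": "treći",
    "četiri": "četvrti",
    "sedam": "sedmi",
    "osam": "osmi",
    "sto": "stoti",
    "tisuća": "tisućiti",
}


def ordinal_from_cardinal(cardinal_words: str) -> str:
    """Turn a Croatian cardinal phrase into an ordinal by rewriting
    only its last word (assumed approach, simplified)."""
    words = cardinal_words.split()
    if not words:
        raise ValueError("expected at least one cardinal word")
    last = words[-1]
    # Assumption: any word not in the table is regular and takes "-i"
    # (pet -> peti, dvadeset -> dvadeseti). Plural forms such as
    # "tisuće" are NOT handled by this sketch.
    words[-1] = IRREGULAR_ORDINALS.get(last, last + "i")
    return " ".join(words)


print(ordinal_from_cardinal("četrdeset dva"))  # četrdeset drugi
```

Only the final word changes, so the leading part of the phrase stays exactly as the cardinal implementation produced it.
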
I’m happy to:

(a) submit it as a separate PR after this one merges, or
(b) push it to your branch as additional commits if that’s easier for review.
Either way, would love to see this Croatian PR move forward — it’s the closest thing the Slavic-language community has to a complete num2words implementation, and it’d unblock several downstream projects (TTS pipelines, accounting localisation).

Could a maintainer (@savoirfairelinux team) take a look?

@Jetman80
Author

Jetman80 commented May 3, 2026

(a) submit it as a separate PR after this one merges, or
(b) push it to your branch as additional commits if that’s easier for review.

@tihomirjauk I don't believe (a) can happen in the near future; it's been almost a year since I posted my PR. For now I just reference the GitHub source in my pip installs.

I'm happy to merge your commits if you open a PR against my fork, so that this implementation and PR are more complete. Thanks!

Add to_ordinal() and to_year() for Croatian

PR savoirfairelinux#643 implements cardinals + currency for Croatian but leaves
to_ordinal raising NotImplementedError, and to_year falls through to
to_cardinal via the base class default.

This commit:
- Implements masculine-nominative-singular ordinals (prvi, drugi, ...,
  šesti, dvadeseti, stoti, tisućiti) via last-cardinal-word substitution.
- Implements to_year() with feminine-genitive ending (the form actually
  used before 'godine' in Croatian: 1986 → tisuću devetsto osamdeset
  šeste). Collapses 'jedna tisuća' → 'tisuću' for years 1000-1999.
- Adds ~25 ordinal tests + 5 year tests covering ones (regular and
  irregular), tens, twenties, hundreds, exact powers of 10, and full
  4-digit year forms.
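
To make the year rule in this commit concrete, here is a minimal sketch under stated assumptions. It is not the commit's code: `FEMININE_GENITIVE` and `year_from_cardinal` are hypothetical names, the input is an already-rendered cardinal string, and only final words that are in the small table or end in "t" (pet, šest, dvadeset, ...) are supported. Anything else raises an error rather than guessing a form.

```python
# Hypothetical table for illustration; not the commit's actual data.
# Maps a final cardinal word to its feminine-genitive ordinal form.
FEMININE_GENITIVE = {
    "jedan": "prve",
    "dva": "druge",
    "tri": "treće",
    "četiri": "četvrte",
    "sedam": "sedme",
    "osam": "osme",
}


def year_from_cardinal(cardinal_words: str) -> str:
    """Rewrite a Croatian cardinal phrase into the form read before
    'godine' (assumed approach, simplified)."""
    words = cardinal_words.split()
    # Years 1000-1999: collapse "jedna tisuća" into "tisuću".
    if words[:2] == ["jedna", "tisuća"]:
        words[:2] = ["tisuću"]
    if not words:
        raise ValueError("expected at least one cardinal word")
    last = words[-1]
    if last in FEMININE_GENITIVE:
        words[-1] = FEMININE_GENITIVE[last]
    elif last.endswith("t"):
        # Assumption: regular words ending in "t" take "-e"
        # (šest -> šeste, dvadeset -> dvadesete).
        words[-1] = last + "e"
    else:
        # Round hundreds and thousands are out of scope for this sketch.
        raise ValueError(f"unsupported final word: {last!r}")
    return " ".join(words)


print(year_from_cardinal("jedna tisuća devetsto osamdeset šest"))
# tisuću devetsto osamdeset šeste
```

Raising on unsupported endings keeps the sketch from silently producing a wrong form for years such as 1000 or 1900, which a full implementation would need to cover.
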