lang tags: using BCP47 instead of ISO639-1 codes #113

Open
eroux opened this issue Sep 15, 2022 · 2 comments
eroux commented Sep 15, 2022

Hello, first, thank you very much for your work on hOCR! I'm part of an organization that gets hOCR from Google Books, and I'm quite new to the specification. Something that caught my eye is the reference to ISO639-1 for language codes. Since ISO639-1 doesn't cover all languages, I think referring to BCP47 would be more generic and future-proof. What do you think? It's a backward-compatible change, since ISO639-1 codes are valid BCP47 tags (at least to a first approximation).
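
To illustrate the compatibility point, here is a minimal Python sketch. It assumes a deliberately simplified subset of the BCP47 grammar (language, optional script, optional region); variants, extensions, private-use and grandfathered tags are left out, and nothing is checked against the actual registries. The function names are hypothetical and only serve the example.

```python
import re

# Simplified subset of the BCP47 (RFC 5646) "langtag" shape:
#   language: 2-3 letters, optional script: 4 letters,
#   optional region: 2 letters or 3 digits.
# This is NOT the full grammar and does no registry lookup.
SIMPLE_LANGTAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)

# Shape of an ISO639-1 code: exactly two letters.
TWO_LETTER = re.compile(r"[A-Za-z]{2}")


def is_simple_langtag(tag: str) -> bool:
    """True if the tag fits the simplified BCP47 shape above."""
    return SIMPLE_LANGTAG.fullmatch(tag) is not None


def is_iso639_1_shaped(tag: str) -> bool:
    """True if the tag has the two-letter shape of an ISO639-1 code."""
    return TWO_LETTER.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ["de", "zh-Hant", "sr-Latn-RS", "yue", "en_US"]:
        print(tag, is_iso639_1_shaped(tag), is_simple_langtag(tag))
```

Every two-letter code passes both checks, which is the sense in which the change is backward-compatible. A tag such as `yue` (Cantonese, which has no ISO639-1 code) or `zh-Hant` passes only the BCP47-shaped check.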

Owner

kba commented Sep 15, 2022

I don't feel strongly either way, but it might be a good opportunity to align with how ALTO and PAGE handle language/script.

In ALTO we decided on using what xsd:language expects, i.e. RFC 1766, which in turn references ISO639-1. IIUC this might not be expressive enough for your purposes?

Author

eroux commented Sep 15, 2022

Thanks for your answer!

My understanding is that the latest XSD spec requires BCP47 lang tags; the 1.0 spec does indeed refer to RFC1766. I can't think of any reason to recommend RFC1766 over BCP47, but perhaps there is one?
