
Commas in non-splittable translation metadata #37

Open
amir-zeldes opened this issue Oct 1, 2019 · 3 comments

Comments

@amir-zeldes
Member

The OT translation field contains commas, which the CTS repo reserves for separating multiple translators in a splittable field:

The Septuagint Version of the Old Testament, L.C.L. Brenton, 1851, available at <a href='https://ebible.org/eng-Brenton/'>ebible.org</a>

@lgessler has applied a patch to the CTS repo that prevents splitting if any segment is longer than 50 characters. Ultimately this metadatum should be fixed and possibly shortened, with faceted search in the repo in mind (we don't want to display a long value for users to search by). Once the commas are removed, the patch could possibly be disabled.
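For illustration, the splitting guard can be sketched in Python as follows. The function name `split_translators` and the whitespace stripping are hypothetical; only the comma separator and the 50-character limit come from this thread, and the actual CTS patch may be implemented differently.

```python
# Hypothetical sketch of the comma-splitting guard: a translator value is
# split into separate facet values only if every segment is short.
# The 50-character limit is taken from the issue; everything else is assumed.
MAX_SEGMENT_LEN = 50


def split_translators(value: str, max_len: int = MAX_SEGMENT_LEN) -> list[str]:
    """Split a translator metadata value on commas unless any segment is too long."""
    segments = [seg.strip() for seg in value.split(",")]
    # A very long segment suggests the commas belong to free text
    # (e.g. a citation) rather than separating translator names,
    # so the original value is kept whole.
    if any(len(seg) > max_len for seg in segments):
        return [value]
    return segments


if __name__ == "__main__":
    brenton = (
        "The Septuagint Version of the Old Testament, L.C.L. Brenton, 1851, "
        "available at <a href='https://ebible.org/eng-Brenton/'>ebible.org</a>"
    )
    print(split_translators(brenton))
    print(split_translators("World English Bible (WEB)"))
```

With the Brenton value, the last segment (the one containing the link) is longer than 50 characters, so the sketch keeps the value unsplit; a short multi-name value such as "A, B" would still be split into two entries.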

@lgessler
Contributor

lgessler commented Oct 1, 2019

Adding to what @amir-zeldes wrote, the other Bible corpora (1 Corinthians, Mark, NT) have the translation value World English Bible (WEB). Something similar, preferably without a hyperlink (links cause all kinds of trouble that is best avoided), would be consistent and, I think, preferable unless there's a reason we want all this information in the translation field. Even L.C.L. Brenton might be the right value here; I think the rest of the data is probably better either left out of the translation field or moved into other metadata fields.

@ctschroeder
Member

Oh yes, those commas would be a problem! I'm fine with L.C.L. Brenton 1851 in "translation" and moving the rest of the information plus the link to "source" or "source_info". (Source is usually the names of people, so source_info might be better? But really it doesn't matter to me.)

@amir-zeldes
Member Author

Alright, I've added it to the OT metagenerator script, so whenever we rerun that corpus it should get fixed. I'm not retroactively fixing it in ANNIS/GH, since that would constitute a new version, but we can aim to re-do the Bible corpora with the newest NLP in an upcoming release.


3 participants