[Bible/Data] Restructure the Input Paths and Formats #193

Open
pishoyg opened this issue Aug 15, 2024 · 1 comment
Labels
data Why: Data user Why: User convenience

Comments

@pishoyg
Owner

pishoyg commented Aug 15, 2024

In #130 and #131, we declared our intention to modify the Bohairic Bible text.
In #123, we defined our data/ directory conventions.

Regarding directory names:

  • We should read all data under data/raw/, since these will be displayed as is.
  • We will probably only read the Bohairic from data/input/ because that is the only language that we are interested in editing at the moment.
  • For languages other than Bohairic, the data under data/raw/ and data/input/ will likely remain identical, so the input/ copies should be deleted and the data read directly from raw/. An input/ copy should exist only for data that we intend to edit; an unedited duplicate only causes confusion.
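
The directory rule above could be sketched roughly as follows. This is only an illustration, not the pipeline's actual code: `DATA_DIR`, `EDITED_LANGUAGES`, and `source_dir` are hypothetical names, and the layout inside `raw/` and `input/` is not specified in this issue.

```python
from pathlib import Path

# Hypothetical names: the issue only fixes the data/raw/ and data/input/
# directories, not how the code refers to them.
DATA_DIR = Path("data")

# Languages we intend to edit. Per this issue, only Bohairic for now.
EDITED_LANGUAGES = {"Bohairic"}


def source_dir(language: str, data_dir: Path = DATA_DIR) -> Path:
    """Return the directory a language's text should be read from."""
    # Edited languages have a working copy under input/; every other
    # language is read unmodified from raw/.
    subdir = "input" if language in EDITED_LANGUAGES else "raw"
    return data_dir / subdir
```

With this shape, starting to edit another language would mean creating its input/ copy and adding it to the set, with no other change to the reading logic.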

Regarding format:

@pishoyg pishoyg added the user Why: User convenience label Aug 15, 2024
@pishoyg pishoyg added this to the Bible Pipeline milestone Aug 15, 2024
pishoyg added a commit that referenced this issue Aug 16, 2024
This essentially reverts 72c791f and
5714157.

We no longer intend to use JSON as the input (as opposed to raw) format.
To make the pipeline simpler, we will use TSV for input.

We also don't intend to edit all languages. We will only edit Bohairic.
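
As a rough illustration of why TSV keeps the input side simple, a file with a header row can be parsed with the standard library alone. The column names and the placeholder row below are assumptions made for the example; the issue does not specify the actual schema.

```python
import csv
import io


def read_tsv(stream):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(stream, delimiter="\t"))


# Hypothetical columns and a placeholder row, for illustration only.
sample = io.StringIO("book\tchapter\tverse\ttext\nGenesis\t1\t1\t<verse text>\n")
rows = read_tsv(sample)
```

Every value comes back as a string, so numeric fields such as the chapter would need an explicit conversion if the pipeline relies on them.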
@pishoyg pishoyg self-assigned this Aug 16, 2024
pishoyg added a commit that referenced this issue Aug 17, 2024
This will make it possible to keep the data private. The current setup,
which uses `curl`, forces us to make the data public.

Retrieve `JSON_KEYFILE_NAME` from the environment variables.
@pishoyg pishoyg modified the milestones: Pipeline: Bible, Data Collection v1.0, Bible v1.0 Aug 26, 2024
@pishoyg pishoyg added the data Why: Data label Aug 31, 2024
@pishoyg pishoyg removed their assignment Sep 2, 2024
@pishoyg pishoyg added this to coptic Sep 11, 2024
@pishoyg pishoyg modified the milestones: Bible v1.0, Pipeline: Bible Sep 22, 2024
@pishoyg
Owner Author

pishoyg commented Mar 2, 2025

Status:

Abandoned:
everything else

TODO:

  • Allow reading input data from a Google Docs spreadsheet.
