Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

full FUA(s) for full algo run #38

Closed
6 of 7 tasks
jGaboardi opened this issue Oct 11, 2024 · 8 comments · Fixed by #41
Closed
6 of 7 tasks

full FUA(s) for full algo run #38

jGaboardi opened this issue Oct 11, 2024 · 8 comments · Fixed by #41
Assignees
Labels

Comments

@jGaboardi
Copy link
Collaborator

jGaboardi commented Oct 11, 2024

xref: #29 (comment)

Which (if any) FUAs should we test against in our CI and where should we store the data?

  • 1133 – Aleppo, Syria, Middle East / Asia
  • 869 – Auckland, New Zealand, Oceania / Asia
  • 809 – Douala, Cameroon, Africa
  • 1656 – Liège, Belgium, Europe
  • 4617 – Bucaramanga, Colombia, S. America
  • 4881 – Salt Lake City, Utah, USA, N. America
  • 8989 – Wuhan, China, Far East / Asia

cc @martinfleis @anastassiavybornova

@martinfleis
Copy link
Contributor

If we want full coverage, we may even need more of those from simplification repo. Plus potentially those that just errored recently on Central Europe.

Everything apart from Wuhan takes minutes. But every single one of these triggered some fixes of the algo.

@jGaboardi
Copy link
Collaborator Author

So maybe let's just include them all and "suffer" from longer testing runs? It beats spending hours debugging.

The question remains as to where to store the data? We are probably looking at ≈20mb storage for the "original" and "known" results for the 6 FUA (excluding Wuhan). Not the end of the world if we have copies that live here in sgeop, but do we have a better idea to source the data directly from simplification?

@martinfleis
Copy link
Contributor

I think that this repo should be self-sufficient so we should store our test data here. But certainly not package them.

@jGaboardi
Copy link
Collaborator Author

I think that this repo should be self-sufficient so we should store our test data here. But certainly not package them.

Agreed

@jGaboardi
Copy link
Collaborator Author

But let's agree on where to put the data before I start with getting a PR in for it. For the current little testing data we just merged in we have that in sgeop/sgeop/tests/data/*. Should these full FUA results perhaps live in a top-level directory simply named data/, like over in simplification/data/*? Though perhaps a more friendly storage and naming schema than we have over there.

@anastassiavybornova
Copy link
Collaborator

Should these full FUA results perhaps live in a top-level directory simply named data/, like over in simplification/data/*? Though perhaps a more friendly storage and naming schema than we have over there.

it sounds good to me with sgeop/data/*/file

(where * is currently the FUA code, and if we want a more friendly naming schema - use city name instead?)

@jGaboardi
Copy link
Collaborator Author

@anastassiavybornova How about like the following (which does not include everything in the repo for brevity):

--- sgeop
      |--- ...
      |      |--- ...
      |
      |--- ci
      |      |--- ...
      |
      |--- data
      |      |--- README.md (for stuff in data dir)
      |      |--- <creation_script>.py
      |      |--- <fua_name>_<fua_number>
      |      |      |--- original.parquet
      |      |      |--- simplified.parquet
      |      |--- ...
      |
      |--- sgeop
      |      |--- ...
      |
      |--- README.md (top-dir)
      |--- ...

Here we'd have in the ./data/ directory:

  • a README.md + some script for (re)creating the data to assist with potential testing schema #7
  • 1 directory for each FUA's data with an input (original.parquet) and known result (simplified.parquet)

@anastassiavybornova
Copy link
Collaborator

looks good to me! @jGaboardi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants