Geospatial Search behavior #9798

Closed
stevenferey opened this issue Aug 21, 2023 · 5 comments
Labels
Feature: Geospatial NIH CAFE Issues related to and/or funded by the NIH CAFE project Size: 0.5 A percentage of a sprint. 0.35 hours Status: Needs Input Applied to issues in need of input from someone currently unavailable Type: Bug a defect User Role: Depositor Creates datasets, uploads data, etc.

Comments

@stevenferey
Contributor

entrepot.recherche.data.gouv.fr team

What steps does it take to reproduce the issue?

Populate a dataset with geospatial data (Geographic Bounding Box)

  • When does this issue occur?

When we search for this dataset with the search API and with the geo_point and geo_radius parameters

  • Which page(s) does it occur on?

The search API result page

  • What happens?

For example for a Geographic Bounding Box =
westLongitude "-2.258631"
eastLongitude "-2.392748"
northLongitude "47.518038"
southLongitude "47.496346"

Link for details : https://linestrings.com/bbox/#-2.258631,47.496346,-2.392748,47.518038
Example in dataverse demo : https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/UYLYFK

and a search from Paris (about 400km):
geo_point=48.872895,2.354527

The search results do not match the actual distances (OK = expected, KO = unexpected):
https://demo.dataverse.org/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=400 => result found (OK)
https://demo.dataverse.org/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=150 => result found (KO)
https://demo.dataverse.org/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=50 => result not found (OK)
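
As a rough sanity check on the expected results, the great-circle distance from the search point to the intended box can be estimated with the haversine formula. This is only a sketch: the Earth radius constant and the choice of the box corner nearest to Paris are assumptions made here, and the search index may measure distance differently.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


PARIS = (48.872895, 2.354527)
# Corner of the intended box closest to Paris: northern edge, easternmost longitude.
NEAREST_CORNER = (47.518038, -2.258631)

distance_km = haversine_km(*PARIS, *NEAREST_CORNER)
print(f"Paris to nearest corner of the box: {distance_km:.0f} km")
```

The distance comes out well above 150 km and below 400 km, which is why a hit at geo_radius=150 is unexpected.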

  • To whom does it occur (all users, curators, superusers)?

all users

  • What did you expect to happen?

A more precise search result, depending on the geo_point and the geo_radius given as query parameters.

Which version of Dataverse are you using?

5.13, 5.14

Any related open or closed issues to this bug report?

Google Group topic :
https://groups.google.com/g/dataverse-community/c/0NynPGQAnE0

@pdurbin pdurbin added Type: Bug a defect User Role: Depositor Creates datasets, uploads data, etc. Feature: Geospatial labels Oct 13, 2023
@cmbz cmbz added the NIH CAFE Issues related to and/or funded by the NIH CAFE project label Mar 12, 2024
@cmbz cmbz moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Mar 12, 2024
@cmbz cmbz added the Size: 10 A percentage of a sprint. 7 hours. label Mar 14, 2024
@cmbz

cmbz commented Mar 14, 2024

2024/03/14

  • Sized at 10 for investigation, please resize based upon results
  • Also, note that the API has changed

@cmbz cmbz moved this from SPRINT- NEEDS SIZING to SPRINT READY in IQSS Dataverse Project Mar 14, 2024
@jp-tosca jp-tosca assigned jp-tosca and unassigned jp-tosca Apr 9, 2024
@jp-tosca jp-tosca moved this from SPRINT READY to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Apr 25, 2024
@jp-tosca jp-tosca moved this from This Sprint 🏃‍♀️ 🏃 to In Progress 💻 in IQSS Dataverse Project May 3, 2024
@jp-tosca
Contributor

jp-tosca commented May 3, 2024

Hi @stevenferey, I am trying to test the example that you gave us but it seems the Geographic bounds on your post are not correct.

  • What version of Dataverse are you using?
  • Could you please let me know whether you were able to enter this data into the system without it being caught by the validation?
image

Best,
Juan

@jp-tosca
Contributor

jp-tosca commented May 3, 2024

Hi @stevenferey, I was talking a bit with the team, and from what I see entrepot.recherche.data.gouv.fr is using Dataverse 5.14, so I have a couple of points.

In Dataverse 6.1 we added validation to this field, as you can see in the picture I posted. This validation was not in 5.14, so it is possible that the data you posted (which is invalid) was saved to the database. I would suggest fixing this data and then re-indexing to check whether this solves the search problem.
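
For illustration, a check of this kind could look like the sketch below. It is not the actual Dataverse 6.1 validation code; the function name `bounding_box_problems` and the exact rules are assumptions. Note that some conventions allow west > east for boxes that cross the antimeridian, which this sketch deliberately rejects.

```python
def bounding_box_problems(west, east, north, south):
    """Return a list of human-readable problems found in a bounding box (hypothetical helper)."""
    problems = []
    if not all(-180.0 <= lon <= 180.0 for lon in (west, east)):
        problems.append("longitude out of range [-180, 180]")
    if not all(-90.0 <= lat <= 90.0 for lat in (north, south)):
        problems.append("latitude out of range [-90, 90]")
    if west > east:
        problems.append("west longitude is greater than east longitude")
    if south > north:
        problems.append("south latitude is greater than north latitude")
    return problems


# Values exactly as given in the original report: west and east are swapped.
reported = bounding_box_problems(west=-2.258631, east=-2.392748,
                                 north=47.518038, south=47.496346)
# Same box with the east/west values flipped.
flipped = bounding_box_problems(west=-2.392748, east=-2.258631,
                                north=47.518038, south=47.496346)
print(reported)
print(flipped)
```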

We also renamed some of these fields recently: northLongitude and southLongitude no longer exist. They were renamed to northLatitude and southLatitude, as they should be.

Best,
Juan

@qqmyers
Member

qqmyers commented May 3, 2024

My guess would be that in earlier versions, this box was indexed as a strip around the whole Earth, excluding the small east-west region intended (as in the image). That could explain why there was a hit at 150 km: the box extended directly south of Paris (and wasn't ~400 km west). Flipping the east/west coordinates should give the expected results. If not, this is still an issue. @stevenferey - can you check and close/update this issue as appropriate?

image
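
The guess above can be illustrated with a small sketch. It assumes that a box with west > east is read as running eastward from west all the way around to east; the helper names and the kilometres-per-degree constant are assumptions, and the sketch does not model how the search index actually stores or matches shapes.

```python
def lon_in_box(lon, west, east):
    """True if lon lies in the box when travelling eastward from west to east."""
    if west <= east:
        return west <= lon <= east
    # west > east: the box wraps around the antimeridian.
    return lon >= west or lon <= east


def lon_span_degrees(west, east):
    """Width of the box in degrees of longitude, travelling eastward."""
    return (east - west) % 360.0


PARIS_LAT, PARIS_LON = 48.872895, 2.354527
# Coordinates as originally reported (east/west swapped).
WEST, EAST, NORTH = -2.258631, -2.392748, 47.518038

KM_PER_DEGREE_LAT = 111.195  # approximate length of one degree of latitude

span = lon_span_degrees(WEST, EAST)
paris_inside_strip = lon_in_box(PARIS_LON, WEST, EAST)
# If Paris's longitude is inside the strip, the nearest point is due south.
due_south_km = (PARIS_LAT - NORTH) * KM_PER_DEGREE_LAT

print(f"box spans {span:.2f} degrees of longitude")
print(f"Paris longitude inside strip: {paris_inside_strip}")
print(f"distance due south to the strip: {due_south_km:.1f} km")
```

Under this reading the box covers almost the full circle of longitude, Paris's meridian crosses it, and the strip lies roughly 150 km due south. That sits right at the edge of the geo_radius=150 query, so whether a hit is reported there would depend on the precision of the index.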

@jp-tosca jp-tosca self-assigned this May 6, 2024
@scolapasta scolapasta added the Status: Needs Input Applied to issues in need of input from someone currently unavailable label May 7, 2024
@cmbz cmbz added Size: 0.5 A percentage of a sprint. 0.35 hours and removed Size: 10 A percentage of a sprint. 7 hours. labels May 8, 2024
@stevenferey
Contributor Author

Hello,
Thank you for your feedback. I tested, and the results are OK with the inverted values:

/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=400 => result found (OK)
/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=150 => result not found (OK)
/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=50 => result not found (OK)
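
For anyone who wants to repeat the test, the three queries can be built as follows. This is a minimal sketch: the helper name `geo_search_url` is hypothetical, the base URL is the demo instance from the original report, and the radius is assumed to be in kilometres.

```python
from urllib.parse import urlencode

BASE_URL = "https://demo.dataverse.org/api/search"  # demo instance from the report


def geo_search_url(lat, lon, radius_km, query="*"):
    """Build a search API URL with geo_point and geo_radius (hypothetical helper)."""
    params = {"q": query, "geo_point": f"{lat},{lon}", "geo_radius": radius_km}
    # Keep '*' and ',' unescaped so the URL matches the form used in this issue.
    return f"{BASE_URL}?{urlencode(params, safe='*,')}"


urls = [geo_search_url(48.872895, 2.354527, radius) for radius in (400, 150, 50)]
for url in urls:
    print(url)
```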

Data validation in the form is a good thing for data quality, thank you very much.
I am closing the ticket.
Steven.

@jp-tosca jp-tosca removed their assignment May 15, 2024
@DS-INRAE DS-INRAE moved this to Done in Recherche Data Gouv Jul 10, 2024