
Data too large error from very large data products #133

Closed
jordanpadams opened this issue Sep 27, 2023 · 6 comments
Assignees
Labels
B14.1 bug Something isn't working duplicate This issue or pull request already exists s.medium

Comments

@jordanpadams
Member

jordanpadams commented Sep 27, 2023

Checked for duplicates

Yes - I've already checked

πŸ› Describe the bug

When I harvested a data set containing some very large data products, I got a "data too large" error and the data was not loaded into the Registry.

πŸ•΅οΈ Expected behavior

I expected the data to be loaded into the Registry nominally.

πŸ“œ To Reproduce

  1. Download TBD data product
  2. Attempt to harvest the product
  3. Note the error
[ERROR] LIDVID = urn:esa:psa:em16_tgo_acs:data_raw:acs_raw_hk_nir_20170907t000000-20170907t055959::3.0, 
Message = [parent] Data too large, data for [indices:data/write/bulk[s]] would be [16591820628/15.4gb], 
which exceeds the limit of [16287753830/15.1gb]. Current usage: [16591415264/15.4gb], new bytes reserved: [405364/395.8kb], 
usages [request=0/0b, fielddata=0/0b, in_flight_requests=405364/395.8kb, accounting=613644/599.2kb]
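A quick sanity check on the numbers in the error message (a sketch, assuming a node with roughly 16 GB of RAM as suggested later in this thread, and OpenSearch's default parent circuit breaker of 95% of the JVM heap):

```python
# Numbers copied from the error message above.
limit_bytes = 16_287_753_830    # "exceeds the limit of [16287753830/15.1gb]"
current_usage = 16_591_415_264  # "Current usage: [16591415264/15.4gb]"
heap_16gib = 16 * 1024**3       # 17,179,869,184 bytes

# The breaker limit is ~95% of a 16 GiB heap, consistent with the default
# parent circuit breaker rather than a per-request size cap.
print(f"limit as fraction of a 16 GiB heap: {limit_bytes / heap_16gib:.3f}")
print(f"usage over the limit by: {current_usage - limit_bytes:,} bytes")
```

This supports the diagnosis below: heap pressure on the node, not an oversized individual product.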

πŸ–₯ Environment Info

Linux

πŸ“š Version of Software Used

3.7.6

🩺 Test Data / Additional context

TBD

πŸ¦„ Related requirements

No response

βš™οΈ Engineering Details

No response

@alexdunnjpl
Contributor

@jordanpadams what's the best way to get a copy of the label for this product?

@jordanpadams
Member Author

@alexdunnjpl a ping is out to the user.

@alexdunnjpl
Contributor

@jordanpadams looking deeper into this error, it appears to be due to imminent exhaustion of the JVM heap on OpenSearch, rather than any one request/product being too large. (Presumably RAM allocation is currently 16GB on that node)

The fix here is to bump up the instance size to cope with peak throughput, and/or incorporate pause/retry behaviour in harvest.

Closing as a duplicate of #125 on that basis, since the fix for that is a fix for this.
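The pause/retry behaviour suggested above could look something like the following. This is a minimal sketch, not the actual Harvest implementation: `bulk_index` is a hypothetical callable standing in for the bulk-write call, and the exception handling assumes the caller raises on the circuit-breaker / HTTP 429 response OpenSearch returns when the parent breaker trips.

```python
import time

def bulk_with_backoff(bulk_index, batch, max_retries=5, base_delay=2.0):
    """Retry a bulk write with exponential backoff when the cluster
    reports memory pressure (circuit_breaking_exception / HTTP 429)."""
    for attempt in range(max_retries):
        try:
            return bulk_index(batch)
        except RuntimeError:  # stand-in for a breaker/429 error type
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Back off 2s, 4s, 8s, ... to let the heap drain before retrying.
            time.sleep(base_delay * 2 ** attempt)
```

Backing off gives the node time to finish in-flight bulk requests and release heap, which is usually enough when the breaker trips only at peak throughput.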

@github-project-automation github-project-automation bot moved this from Release Backlog to 🏁 Done in B14.1 Oct 3, 2023
@jordanpadams jordanpadams added the duplicate This issue or pull request already exists label Oct 4, 2023
@jordanpadams
Member Author

@alexdunnjpl nice sleuthing. πŸŽ‰

@alexdunnjpl
Contributor

alexdunnjpl commented Oct 4, 2023

@sjoshi-jpl I see that psa is currently r5.4xlarge.search (128GB RAM) - did this get bumped up from r5.xlarge.search (16GB RAM) at some point recently?

@sjoshi-jpl

@alexdunnjpl yes, this was recently bumped up after our last conversation with @jordanpadams and @tloubrieu-jpl, where we discussed how PSA could be as large / resource-intensive as GEO.

3 participants