Large Data Storage in Drake #6124
Comments
Sounds good to me.
It's important that the Bazel integration offer a way to prefetch the data files without compiling or running all of the tests, to support road warrior builds. Even better if
Quick question about GitHub authentication:
Not sure that functionality is there at the moment, but I imagine we can add it for you.
All issues must have owners. I'm assigning one arbitrarily. (Fix it up if I'm wrong.)
As a latent update, we've been using a flavor of this repo in Anzu for about a year now: \cc @RussTedrake
@EricCousineau-TRI I'm not sure what additional action we should anticipate under the umbrella of this issue? Is there more we should do, or should we close this issue? We have a few other issues open about handling model assets via
I'd like to move for keeping this open for 2 months. I'd like to come back to this and prototype using I've set a calendar item for me to close this if I do not get back to it.
If the only further action is you doing some personal testing, then it seems more suitable to keep that in your personal TODO list instead of the team's collaboratively-maintained TODO list. On the other hand, we've kept this open for several years without any change, so I can't really object to another two months, either.
Didn't make it happen in time, gonna close. Can re-open later if need be.
EDIT (eric), as of 2019-01-29: I've generalized this title to not necessarily be Girder-specific, but just handle large data storage in general.
The text below is relevant to stuff being Girder-specific.
This issue is to land some information on the table and permit discussion for interested parties. The problem being solved here is how to store, retrieve, and consume large data files (meshes, ...) in the Drake Bazel workspace.
Kitware recently presented a demo triggered by previous discussions w/ David and others.
From @jamiesnape:
The backstory is really these issues, and a discussion that we had when we visited Cambridge in March, of which you possibly have the minutes:
#3257
In this demo, we will show using Girder to store large object files referenced from a Git repository. Girder is a scalable, extensible open source, Python based data management framework for the web, developed by Kitware based on years of experience working in the scientific-data-management space. For this demo, we have deployed Girder to Amazon EC2, and are using Amazon S3 as a file storage backend to match the existing type of infrastructure that Kitware maintains for the Drake project.
We have created an example code repository (https://github.com/jcfr/bazel-large-files-with-girder) with a Bazel build system and test files. The test files are STL meshes, and rather than an actual unit test, our “tests” will display a mesh viewer to demonstrate the current contents of the test object file. Using this system, a developer can add a large test file as test data; the test file will be stored in Girder, and only a description of the SHA-512 checksum of the full object will be added to the Git repository. The test file can change contents, and Girder will support hosting the multiple versions of the file, along with downloading the full object contents via its SHA-512 checksum. Our approach directly integrates with the Bazel build system to leverage its dependency resolution mechanism in order to selectively download to the sandbox only those files needed to run the specific tests requested, potentially caching the downloaded files.
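The checksum-pointer idea described above can be sketched roughly as follows. This is only an illustration and not the code from the demo repository: the `.sha512` pointer-file suffix, the function names, and the `download` callable (a stand-in for whatever client would talk to the storage server) are all assumptions made for this example.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_file: Path) -> Path:
    """Write a small pointer file holding only the checksum.

    Hypothetical convention: `mesh.stl` gets a sibling `mesh.stl.sha512`,
    and only that pointer would be committed to Git.
    """
    pointer = data_file.with_name(data_file.name + ".sha512")
    pointer.write_text(sha512_of(data_file) + "\n")
    return pointer


def fetch(pointer: Path, cache_dir: Path,
          download: Callable[[str], bytes]) -> Path:
    """Resolve a pointer file to real content, using a local cache.

    `download` is a placeholder for a server client: it takes a hex
    checksum and returns the object's bytes.
    """
    expected = pointer.read_text().strip()
    cached = cache_dir / expected
    # Reuse the cached copy only if it still matches the checksum.
    if cached.exists() and sha512_of(cached) == expected:
        return cached
    payload = download(expected)
    actual = hashlib.sha512(payload).hexdigest()
    if actual != expected:
        raise ValueError(f"checksum mismatch: expected {expected}, got {actual}")
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(payload)
    return cached
```

Because the cache is keyed by checksum, a changed test file simply gets a new pointer value and a new cache entry, while older versions remain retrievable by their old checksums.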
Interesting features
In its current state, for this particular use case, it could be almost a drop-in. Note that this is separate from the OSRC work to develop a more general solution that can support use cases beyond a Bazel workspace (it could feasibly amalgamate with this Bazel support once in place).
Current State
We have ~100MB+ of data files in Drake. There is occasional decision paralysis when deciding whether to add more data files. There is not a strict need right now, but an anticipated need.
Proposal (after discussion with @sammy-tri)
Have Kitware drop in a solution (while they're active on it). Trial it for a couple of months and see if there is uptake - this will answer the question of whether the anticipated need is real or otherwise.