Add option to cache integer data sets to disk #641

Merged
merged 48 commits into from
Mar 12, 2019

Conversation

Contributor

@drroe drroe commented Oct 2, 2018

This PR is aimed at reducing data set memory usage in certain cases. The first data set to be used this way for testing is DataSet_integer, for use with the hbond/lifetime commands (addresses a user issue on the Amber mailing list). Adds a new command:

usediskcache {on|off}

When usediskcache on is specified, any integer data set allocated by the master DataSetList is cached to disk via a NetCDF file instead of being held in memory. This uses less memory at the expense of speed: the code seems to be about 1-2 orders of magnitude slower when caching to disk. This will be a work in progress while I try to improve the speed somewhat.
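To illustrate the memory/speed trade-off described above, here is a minimal Python sketch of a disk-backed integer data set. It is only an analogy: cpptraj is written in C++ and caches via NetCDF, whereas this sketch uses a flat temporary binary file from the standard library. The class name `DiskCachedIntSet` and its methods are hypothetical and do not correspond to anything in the cpptraj source.

```python
import struct
import tempfile


class DiskCachedIntSet:
    """Hypothetical disk-backed integer data set (not cpptraj's actual class)."""

    _FMT = "<i"                       # 4-byte little-endian signed integer
    _SIZE = struct.calcsize(_FMT)     # bytes per stored value

    def __init__(self):
        # Assumption: a temporary flat binary file stands in for the NetCDF cache.
        self._file = tempfile.TemporaryFile()
        self._count = 0

    def append(self, value):
        """Write one value at the end of the cache file."""
        self._file.seek(self._count * self._SIZE)
        self._file.write(struct.pack(self._FMT, value))
        self._count += 1

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        """Read one value back from disk; only this value is held in memory."""
        if not 0 <= index < self._count:
            raise IndexError(index)
        self._file.seek(index * self._SIZE)
        (value,) = struct.unpack(self._FMT, self._file.read(self._SIZE))
        return value

    def close(self):
        self._file.close()
```

Every access goes through a seek plus a read or write, which is why a scheme like this is expected to be much slower than an in-memory array while keeping the resident memory footprint roughly constant.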

A test is added for cached data sets.

This PR also adds a new keyword aimed at reducing memory usage in another way: for 1D standard data file reads you can use onlycols to only read certain columns from the file:

readdata <file> onlycols <range>

For example:

readdata myfile.dat onlycols 1,3-5 index 1

This reads columns 1, 3, 4, and 5, using column 1 as the index column. Note that if index is used in conjunction with onlycols, the index column must be one of the columns specified.
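The column selection semantics described above can be sketched in Python as follows. This is not the cpptraj implementation; the function names `parse_range` and `read_only_cols` are hypothetical, and the sketch assumes whitespace-delimited numeric columns numbered from 1 and `#` comment lines.

```python
def parse_range(arg):
    """Expand a range string such as '1,3-5' into [1, 3, 4, 5]."""
    columns = []
    for part in arg.split(","):
        if "-" in part:
            start, end = part.split("-")
            columns.extend(range(int(start), int(end) + 1))
        else:
            columns.append(int(part))
    return columns


def read_only_cols(lines, onlycols, index=None):
    """Return {column_number: [values]} for the selected 1-based columns."""
    wanted = parse_range(onlycols)
    # Mirrors the rule stated above: the index column must be among the selected ones.
    if index is not None and index not in wanted:
        raise ValueError("index column must be one of the onlycols columns")
    data = {col: [] for col in wanted}
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split()
        for col in wanted:
            data[col].append(float(fields[col - 1]))
    return data
```

Only the requested columns are ever stored, so unwanted columns cost no memory beyond the line currently being parsed.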

Daniel R. Roe added 21 commits September 28, 2018 11:06
…, lets us introduce transparent disk caching of data sets.
@swails
Contributor

swails commented Feb 28, 2019

[ci-skip] only works for Jenkins on the amber repository, just FYI (and it has a - after the ci)

@drroe
Contributor Author

drroe commented Feb 28, 2019

> [ci-skip] only works for Jenkins on the amber repository, just FYI (and it has a - after the ci)

I think it works for Travis as well (see here). At least it seems to be working in this case.

@swails
Contributor

swails commented Feb 28, 2019

I stand corrected. :) Thanks

@drroe drroe merged commit cc6b700 into Amber-MD:master Mar 12, 2019
@drroe drroe deleted the datasetcache branch March 12, 2019 20:18
@hainm
Contributor

hainm commented Mar 13, 2019

I think this PR breaks the pytraj build (error in dataset integer).

https://travis-ci.org/Amber-MD/pytraj/jobs/505631275
