Add option to cache integer data sets to disk #641

drroe · 2018-10-02T18:17:00Z

This PR is aimed at reducing data set memory usage in certain cases. The first data set to be used this way for testing is DataSet_integer, for use with the hbond/lifetime commands (this addresses a user issue from the Amber mailing list). It adds a new command:

usediskcache {on|off}

When usediskcache on is specified, any integer data set allocated by the master DataSetList is cached to disk in a NetCDF file instead of being held in memory. This uses less memory at the expense of speed: the code appears to be about 1-2 orders of magnitude slower when caching to disk. This will be a work in progress while I try to improve the speed somewhat.

A test is added for cached data sets.
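To illustrate the memory-for-speed trade-off described above, here is a minimal sketch of a disk-backed integer set. It is not the cpptraj implementation: it is written in Python rather than C++, it uses a plain binary temporary file rather than NetCDF, and the names `DiskCachedIntSet`, `set_element`, and `element` are hypothetical. It only shows why element writes go through an explicit setter (a reference to an in-memory slot cannot exist when the value lives on disk) and why every access pays for a seek plus I/O.

```python
import os
import struct
import tempfile


class DiskCachedIntSet:
    """Hypothetical integer data set that keeps its values in a temp file, not RAM."""

    _FMT = "<i"                        # 4-byte little-endian signed integer
    _WIDTH = struct.calcsize(_FMT)

    def __init__(self):
        # mkstemp creates and opens the file in one step, so no name-guessing race.
        fd, self.path = tempfile.mkstemp(suffix=".cache")
        self._fh = os.fdopen(fd, "w+b")
        self._size = 0

    def __len__(self):
        return self._size

    def set_element(self, idx, value):
        """Write one value to disk; any gap before idx is zero-filled."""
        if idx > self._size:
            self._fh.seek(self._size * self._WIDTH)
            self._fh.write(b"\x00" * ((idx - self._size) * self._WIDTH))
        self._fh.seek(idx * self._WIDTH)
        self._fh.write(struct.pack(self._FMT, value))
        self._size = max(self._size, idx + 1)

    def element(self, idx):
        """Read one value back from disk."""
        if not 0 <= idx < self._size:
            raise IndexError(idx)
        self._fh.seek(idx * self._WIDTH)
        return struct.unpack(self._FMT, self._fh.read(self._WIDTH))[0]

    def close(self):
        """Close the cache and remove the backing file."""
        self._fh.close()
        os.remove(self.path)
```

Only the file handle and a size counter stay in memory, which is the point of the option; the cost is that each `set_element` or `element` call becomes a file operation.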

This PR also adds a new keyword aimed at reducing memory usage in another way: for 1D standard data file reads you can use onlycols to only read certain columns from the file:

readdata <file> onlycols <range>

For example:

readdata myfile.dat onlycols 1,3-5 index 1

This will read columns 1, 3, 4, and 5, using column 1 as the index column. Note that if index is used in conjunction with onlycols, the index column must be one of the columns specified.
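The column selection rule above can be sketched as follows. This is an illustration only, not cpptraj's parser: `parse_range` and `read_only_cols` are hypothetical names, columns are assumed to be numbered from 1 (as the example suggests), and lines starting with `#` are assumed to be headers.

```python
def parse_range(spec):
    """Expand a range string such as '1,3-5' into a sorted list of ints."""
    cols = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            cols.update(range(lo, hi + 1))
        else:
            cols.add(int(part))
    return sorted(cols)


def read_only_cols(lines, onlycols, index=None):
    """Return (index_values, {column: values}) for the selected 1-based columns."""
    cols = parse_range(onlycols)
    # Mirrors the stated rule: the index column must be among the selected columns.
    if index is not None and index not in cols:
        raise ValueError("index column must be one of the onlycols columns")
    data = {c: [] for c in cols if c != index}
    index_vals = []
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split()
        if index is not None:
            index_vals.append(float(fields[index - 1]))
        for c in data:
            data[c].append(float(fields[c - 1]))
    return index_vals, data
```

With `onlycols="1,3-5"` and `index=1`, column 2 is never converted or stored, which is where the memory saving comes from.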

swails · 2019-02-28T01:11:25Z

~~[ci-skip] only works for Jenkins on the amber repository, just FYI (and it has a - after the ci)~~

drroe · 2019-02-28T19:38:07Z

> [ci-skip] only works for Jenkins on the amber repository, just FYI (and it has a - after the ci)

I think it works for Travis as well (see here). At least it seems to be working in this case.

swails · 2019-02-28T19:39:59Z

I stand corrected. :) Thanks

hainm · 2019-03-13T09:35:22Z

I think this PR breaks the pytraj build (error in dataset integer).

https://travis-ci.org/Amber-MD/pytraj/jobs/505631275

Daniel R. Roe added 21 commits September 28, 2018 11:06

DRR - Cpptraj: Test splitting DataSet_integer into memory and disk-cached versions.

43a326d

DRR - Cpptraj: Use switch statement to allocate sets instead of array, lets us introduce transparent disk caching of data sets.

db2d810

DRR - Cpptraj: Update dependencies

b43474a

DRR - Cpptraj: Get rid of DataSet_integer iterators for now - could be tricky to do with disk caching.

b8c29ac

DRR - Cpptraj: Change reference operator to SetElement to allow disk caching to work properly

8d26f13

DRR - Cpptraj: Finish initial disk cache class for integer data set.

6dd2302

DRR - Cpptraj: Add utility for making temporary file names that doesn't rely on tmpnam

5189f34

DRR - Cpptraj: Should be pure virtual.

9f2d3d2

DRR - Cpptraj: Ensure actual temp file is removed as well.

13a5678

DRR - Cpptraj: Ensure definitions are ended

bf871ab

DRR - Cpptraj: Add usediskcache command

8a70c29

DRR - Cpptraj: Replace non-const bracket operator with SetElement

a25226e

DRR - Cpptraj: Add the 'onlycols' option to 1d standard data file reads.

dee06b5

DRR - Cpptraj: Add 'title' option for gnuplot output

780e3e1

DRR - Cpptraj: Add lifetime test with data set caching

1db1792

DRR - Cpptraj: Add onlycols test for 1d standard data read

2d95b3c

DRR - Cpptraj: When integer values are detected in standard data file read, allocate integer data set.

88d3119

DRR - Cpptraj: Clean up some functions

7d039b2

DRR - Cpptraj: Protect with ifdefs

02695e1

DRR - Cpptraj: Revision bump. Disk caching should be fully backwards compatible.

0d7261b

DRR - Cpptraj: Improve code docs.

a16a095

drroe added the enhancement and Work in Progress labels Oct 2, 2018

drroe self-assigned this Oct 2, 2018

Daniel R. Roe and others added 6 commits October 30, 2018 11:13

Merge branch 'master' into datasetcache

fe48179

Merge branch 'master' into datasetcache

27ede4b

Merge branch 'master' into datasetcache

cc31689

DRR - Cpptraj: Add Send and Recv functions for DataSet_integer.

52220d6

Merge branch 'master' into datasetcache

172577c

DRR - Cpptraj: Remove extra title key grab leftover from merge

0b7e732

drroe added 5 commits February 27, 2019 07:24

DRR - Cpptraj: Protect test in parallel

f5b2486

Merge branch 'master' into datasetcache

da0cfcb

DRR - Cpptraj: Add disk cache test for hbond

81114de

DRR - Cpptraj: Hide some debug info

6be736a

DRR - Cpptraj: Start implementing SizeInBytes [ci skip]

5307c21

drroe added 16 commits March 12, 2019 08:46

Merge branch 'master' into datasetcache

f0211ee

DRR - Cpptraj: Add more size routines.

3625d71

DRR - Cpptraj: Change it so DataSet has only one function, MemUsageInBytes, instead of the two SizeInBytes routines. Name is better, and size estimation is only for special cases.

b1e1707

DRR - Cpptraj: Size routines for COORDS classes

5d5f74e

DRR - Cpptraj: Size routines for 1D data sets

02261ef

DRR - Cpptraj: Add size routines for grids

e418a4c

DRR - Cpptraj: misc size routines.

0082d22

DRR - Cpptraj: Size for constant pH data sets.

814f085

DRR - Cpptraj: Remlog data set size

49b38d3

DRR - Cpptraj: Add remaining size routines.

0d67b80

DRR - Cpptraj: Ensure functions are inlined

7ea4949

DRR - Cpptraj: Print data set memory usage when listing sets. Remove duplicate routines.

b4d8cf2

DRR - Cpptraj: Protect test in parallel

ac063e7

DRR - Cpptraj: Revision bump for usediskcache

a1832ea

DRR - Cpptraj: Add usediskcache entry

01224b5

DRR - Cpptraj: Fix up readdata standard options

b2d607e

drroe removed the Work in Progress label Mar 12, 2019

drroe merged commit cc6b700 into Amber-MD:master Mar 12, 2019

drroe deleted the datasetcache branch March 12, 2019 20:18

drroe mentioned this pull request Mar 13, 2019

Pytraj needs to handle new disk-cached data sets (CPPTRAJ) Amber-MD/pytraj#1478

Closed

hainm mentioned this pull request Mar 17, 2019

First steps towards pytraj testing as part of cpptraj merge gate #690

Merged
