Finish converting calcstate to use DataSet/DataFile framework, add data file 'groupby' options #635

Merged: 12 commits, Sep 6, 2018

@drroe (Contributor) commented Sep 6, 2018

This finishes the conversion of calcstate to the DataSet/DataFile framework begun in #629 by converting the transitions data into DataSets that can use DataFiles. Since the transitions data has a different dimension than the states data (there is more transition data than state data), this PR also adds a groupby option to standard data file writes so that data can be grouped in different ways.

	groupby <type> : (1D) group data sets by <type>.
		name   : Group by name.
		aspect : Group by aspect.
		idx    : Group by index.
		ens    : Group by ensemble number.
		dim    : Group by dimension.

The groupby option controls which data sets are written side by side in standard data file output. Normally, all data sets in a standard data file are printed as columns; if a data set does not contain data for a given index, a default blank value is printed. For example, when running calcstate with 2 states, the state data will have size 3 (the 2 states plus Undefined), while the transitions data will probably be larger.

calcstate state 1,d1,3.0,4.0 state 2,a1,100,120 out state.dat curveout curve.agr \
  stateout States.dat transout States.dat name d1_a1

The output file may look like this:

#d1_a1[Nlifetimes] d1_a1[Avglife] d1_a1[Maxlife] d1_a1[Name] d1_a1[Xlifetimes] d1_a1[Xavglife] d1_a1[Xmaxlife] d1_a1[Xname]
                21         3.5238             10   Undefined                19          3.4737              10 Undefined->1
                19         1.3158              3           1                 1          1.0000               1 Undefined->2
                 1         1.0000              1           2                19          1.3158               3 1->Undefined
                 0         0.0000              0      NoData                 1          1.0000               1 2->Undefined

Here the state data has size 3 but the transitions data has size 4, so on the last line the state columns are all 0 and the state name is NoData (the blank string value). This is hard to read. In this case it makes more sense to group by dimension.
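To make the padding behavior concrete, here is a minimal Python sketch of writing columns side by side. It is only an illustration of the idea, not cpptraj's implementation: the dictionaries, the `side_by_side` helper, and the `FILLERS` table are hypothetical, and the filler values are simply chosen to match the output shown above.

```python
# Hypothetical stand-ins for a subset of the data sets in the example above.
states = {
    "Nlifetimes": [21, 19, 1],
    "Name": ["Undefined", "1", "2"],
}
transitions = {
    "Xlifetimes": [19, 1, 19, 1],
    "Xname": ["Undefined->1", "Undefined->2", "1->Undefined", "2->Undefined"],
}

# Assumed filler values per type, picked to mirror the listing above.
FILLERS = {int: 0, float: 0.0, str: "NoData"}

def side_by_side(columns):
    """Return one row per index, padding short columns with a filler value."""
    nrows = max(len(values) for values in columns.values())
    rows = []
    for i in range(nrows):
        row = []
        for values in columns.values():
            if i < len(values):
                row.append(values[i])
            else:
                # This column has no data at index i, so use its filler.
                row.append(FILLERS[type(values[0])])
        rows.append(row)
    return rows

rows = side_by_side({**states, **transitions})
for row in rows:
    print(row)
```

The last row comes out padded on the state side only, which is the same effect as the NoData line in the listing.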

calcstate state 1,d1,3.0,4.0 state 2,a1,100,120 out state.dat curveout curve.agr \
  stateout States.dat transout States.dat name d1_a1
datafile States.dat groupby dim

Now the output file looks like this:

#d1_a1[Nlifetimes] d1_a1[Avglife] d1_a1[Maxlife] d1_a1[Name]
                21         3.5238             10   Undefined
                19         1.3158              3           1
                 1         1.0000              1           2

#d1_a1[Xlifetimes] d1_a1[Xavglife] d1_a1[Xmaxlife] d1_a1[Xname]
                19          3.4737              10 Undefined->1
                 1          1.0000               1 Undefined->2
                19          1.3158               3 1->Undefined
                 1          1.0000               1 2->Undefined

Much easier to read.
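The grouping idea can be sketched in a few lines of Python. This is a hedged illustration rather than the code in this PR: `group_by_key` is a hypothetical helper, and the number of elements in each set is used as a stand-in for its dimension, which is an assumption made only for this example.

```python
# Hypothetical stand-ins for a subset of the data sets in the example above.
columns = {
    "Nlifetimes": [21, 19, 1],
    "Name": ["Undefined", "1", "2"],
    "Xlifetimes": [19, 1, 19, 1],
    "Xname": ["Undefined->1", "Undefined->2", "1->Undefined", "2->Undefined"],
}

def group_by_key(columns, key):
    """Split columns into blocks whose members share the same key value."""
    groups = {}
    for name, values in columns.items():
        groups.setdefault(key(name, values), {})[name] = values
    return list(groups.values())

# Assumption: set length stands in for "dimension" in this sketch.
blocks = group_by_key(columns, key=lambda name, values: len(values))

for block in blocks:
    print("#" + " ".join(block))
    # Every column in a block has the same length, so no padding is needed.
    for row in zip(*block.values()):
        print(" ".join(str(v) for v in row))
    print()
```

Passing a different `key` function (for example one that returns the set name or aspect) would mimic the other groupby types in the same way.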

@drroe self-assigned this Sep 6, 2018
@drroe merged commit 08380d5 into Amber-MD:master Sep 6, 2018
@drroe deleted the state.datasets branch September 6, 2018 14:22
drroe pushed a commit to drroe/cpptraj that referenced this pull request Sep 7, 2018
…sition output uses dataset datafile framework, changing output format slightly) and revision bump for splitcoords and 'for VAR in list'.