
A few questions #15

Closed

rkr8 opened this issue Oct 29, 2024 · 6 comments


rkr8 commented Oct 29, 2024

Hi everyone,

I noticed that parallel tempering is now supported, which is fantastic! I have a few questions about some specific functionalities in the library that I’d love to clarify:

  1. Task Naming for Logging: Is there a way to customize task names based on specific parameter combinations (e.g., L, T values)? Something like this: (...) task_16_0.5 is done. Merging.. Would you recommend overloading Carlo.current_task_name(tm::TaskMaker) to achieve this?

  2. Progress Logging During Execution: Is it possible to log progress for each task while it’s running? I was thinking of inserting logging in Carlo.measure!(mc::myMC, ctx::MCContext), but I’m not sure if that’s the best approach.

  3. Manual Binsize for Rebinning: Is it possible to manually set the binsize during postprocessing? For instance, I would like to fit the autocorrelation time for each observable and then set the binsize based on the longest autocorrelation time. Additionally, is there an option to obtain the resampled values directly in the merged results, without averaging? This would help in further analyses, like calculating crossing points of the Binder cumulant for each sample to estimate error bars.

  4. Bootstrap Resampling in Postprocessing: Finally, can the Bootstrap resampling method be used during postprocessing?

Thanks a lot for your help, and I appreciate all the work that has gone into this project!

lukas-weber (Owner) commented

Hi, thanks for your interest!

1. Task Naming for Logging

Overloading current_task_name is not going to work at this point, but it sounds like a useful feature that would make the logs easier to understand. A nice way to implement it might be to introduce an optional property tm.taskname that you can set to change the task name.
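
Roughly, I imagine it could look like this in a job script. This is purely a sketch: tm.taskname does not exist yet, and the surrounding TaskMaker loop is just the usual job-script pattern with example parameters.

using Carlo
using Carlo.JobTools

tm = TaskMaker()
tm.sweeps = 10_000
tm.thermalization = 1_000
tm.binsize = 100

for L in (8, 16), T in (0.5, 1.0)
    tm.L = L
    tm.T = T
    tm.taskname = "task_$(L)_$(T)"  # hypothetical property, not implemented yet
    task(tm)
end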

2. Progress Logging During Execution

The recommended way is to use the status command (julia myjob.jl status), which gives you a summary of the current progress.

If you want to estimate how long your sweeps are taking and how long the simulation will take in total, you can do a premature merge (julia myjob.jl merge) and look at the observables _ll_measure_time and _ll_sweep_time, which give the average time per measurement or sweep in seconds. Maybe there’s some room for a little external tool that does this automatically.
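
A minimal sketch of such a tool, assuming your job script is called myjob.jl and a ten-minute polling interval is acceptable:

# Re-run the status command periodically; the script name and interval are just examples.
while true
    run(`julia myjob.jl status`)
    sleep(600)
end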

A sillier approach, which I frequently use, is to look at status in the morning and in the evening and do some mental math based on the numbers.

For debugging, I sometimes insert logging into the measure! or sweep! functions, but in parallel runs the interleaved output is often hard to follow.

3. Manual Binsize for Rebinning

You can set a custom rebinning binsize by setting rebin_length in the task parameters (see TaskInfo for the full list of optional parameters).
The raw samples (averaged only over binsize, which is also set in the task parameters) are available in the myjob.data/$TASKNAME/run$N.meas.h5 files. $N is the number of the run, i.e. of the work-sharing replica that Carlo scheduled to run in parallel.

So one possible workflow would be:
1. Run the simulation normally.
2. Fit the autocorrelation times on the h5 files (see the sketch after this list).
3. Set rebin_length in the job script.
4. Merge the results with julia myjob.jl merge.
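
For the fitting in step 2, here is a minimal sketch of estimating the integrated autocorrelation time from a vector of bin samples. Reading the samples out of the run$N.meas.h5 files (e.g. with HDF5.jl) is left to your script, since the internal layout is easiest to inspect directly in the files.

using Statistics

# Integrated autocorrelation time of a sample series; the sum is cut off at the
# first lag where the autocorrelation estimate drops below zero.
function integrated_autocorr_time(x::AbstractVector{<:Real})
    n = length(x)
    μ, σ² = mean(x), var(x)
    τ = 0.5
    for t in 1:n-1
        ρ = sum((x[1:n-t] .- μ) .* (x[1+t:n] .- μ)) / ((n - t) * σ²)
        ρ <= 0 && break
        τ += ρ
    end
    return τ
end

# A rebin length of a few autocorrelation times (e.g. 4τ) is usually a safe choice:
# rebin_length = ceil(Int, 4 * integrated_autocorr_time(samples))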

To avoid setting rebin_length by hand, you can read it from a file in the job script:

using JSON

rebin_file = "myjob_rebin_lengths.json"
if isfile(rebin_file)
    # Look up the rebin length for the current task by its name.
    tm.rebin_length = JSON.parsefile(rebin_file)[current_task_name(tm)]
end

This is ignored on the first run and then read once your fitting script has created the file.
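
The counterpart in your fitting script could look roughly like this; the task names and values are placeholders, so use whatever current_task_name reports for your tasks:

using JSON

# Map each task name to the rebin length obtained from the autocorrelation fit.
rebin_lengths = Dict(
    "task0001" => 64,    # placeholder names and values
    "task0002" => 128,
)

open("myjob_rebin_lengths.json", "w") do io
    JSON.print(io, rebin_lengths)
end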

Let me know if that is an acceptable workflow (or if you have alternative ideas).

4. Bootstrap Resampling in Postprocessing

Concerning the Binder cumulant: the last time I calculated crossings with parallel tempering enabled (in the old C++ version), I ran into a similar issue and wrote a small script that calculates them via bootstrap from the HDF5 files.

Basically, for now, for bootstrapping or more complicated analyses, you need to script it yourself and work on the HDF5 files directly.
Maybe in the future it makes sense to implement bootstrapping as an alternative to jackknifing.
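
As a rough sketch of the core of such a script, assuming you have already extracted per-bin samples of m^2 and m^4 from the HDF5 files (the names and the number of resamples are arbitrary):

using Statistics

# Bootstrap estimate of the Binder cumulant U = 1 - <m^4> / (3 <m^2>^2)
# from per-bin samples m2 and m4.
function binder_bootstrap(m2::Vector{Float64}, m4::Vector{Float64}; nboot = 1000)
    n = length(m2)
    U = map(1:nboot) do _
        idx = rand(1:n, n)              # resample bins with replacement
        1 - mean(m4[idx]) / (3 * mean(m2[idx])^2)
    end
    return mean(U), std(U)              # bootstrap mean and error estimate
end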


rkr8 commented Oct 29, 2024

Hi, thank you very much for the quick and detailed response - this is very helpful! Given the workflow you suggested, I think the best approach in my case might indeed be to skip the merge step and perform the analysis directly on the HDF5 files in some script, to handle the autocorrelation & bootstrap resampling. Is there a flag or parameter I can set to bypass the built-in analysis and perform only the MC simulation itself?

Thanks again for your help!

lukas-weber (Owner) commented

Sounds good. Unfortunately, that flag does not exist, but likely the cost of the analysis is negligible compared to the simulation time, so you can probably ignore it for now. On a side note, I just stumbled on Bootstrap.jl which might be useful.

If at the end you come up with something that may be useful to other people (either postprocessing logic or proposals for APIs for working with the raw HDF5 data), we can think about integrating it into Carlo somehow.

For now I’m opening some separate issues for the points you mentioned.


rkr8 commented Oct 29, 2024

Thank you very much, this sounds good!


rkr8 commented Nov 11, 2024

Hi!

Just a quick follow-up:
For the direct postprocessing of the raw data in myjob.data/$TASKNAME/run$N.meas.h5, can I read parameters like the system size, temperature etc. directly from there? Or are they only stored in the results.json files?

Thanks a lot for your help!

lukas-weber (Owner) commented

The parameters are not saved there.

The recommended workflows are either

  1. read them from the .results.json file or
  2. write your job script in a way that allows you to call the postprocessing code with knowledge about the parameters.

The second case is helpful when parameters have complicated values that are hard to reconstruct after they have been serialized to JSON.
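
One way to realize the second option is to keep the parameter combinations in a small file that both the job script and the postprocessing script include (the file and function names here are only illustrative):

# params.jl -- shared parameter definitions
all_parameters() = [(L = L, T = T) for L in (8, 16, 32) for T in (0.4, 0.5, 0.6)]

# postprocess.jl -- sees the same parameters the job script looped over
include("params.jl")

for p in all_parameters()
    # find the matching task directory in myjob.data/ and analyze its run$N.meas.h5 files
    println("analyzing L = ", p.L, ", T = ", p.T)
end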

I have enabled the Discussions feature for this repository as a better-fitting space for this kind of Q&A topic. Feel free to open a new discussion for a new question.

Repository owner locked and limited conversation to collaborators Nov 11, 2024
lukas-weber converted this issue into discussion #20 on Nov 11, 2024

