Call 15 Dec 2016

Matteo Turilli edited this page Mar 8, 2017 · 1 revision

Notes

  • Mockup example tested locally. Small-scale example also tested both locally and remotely (on Stampede).
  • Currently, we have N "meshfem" tasks for N "specfem" tasks. It might be useful to have just 1 "meshfem" task for N "specfem" tasks (from email exchanges on the mailing list).

Some points to discuss:

  • If OLCF policy limits the number of tasks that can read shared files concurrently, wouldn't N "meshfem" tasks be the better approach? Regardless, we can create copies of the files so that tasks do not read the same file.

  • If we have to use multiple machines (Titan for simulation, Rhea for analysis, both blocking tasks), we have to either A) keep 2 pilots (Titan, Rhea) for the entire experiment (multiple iterations), or B) submit pilots on Titan and Rhea as and when required.

Approach A leaves each pilot with 0 utilization of its resources for part of its duration; approach B incurs multiple wait times due to multiple pilot submissions.
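The trade-off between A and B can be sketched with a rough cost model. This is only an illustration: the function names, the per-iteration durations, the core counts, and the queue wait times below are all hypothetical and were not discussed on the call. The model also ignores the one-time queue wait of approach A and any idle time inside approach B.

```python
def idle_core_hours_a(iterations, sim_hours, ana_hours, titan_cores, rhea_cores):
    """Approach A: both pilots stay alive for the whole experiment.

    Because simulation and analysis are blocking, the Titan pilot idles
    while analysis runs on Rhea, and the Rhea pilot idles while the
    simulation runs on Titan.
    """
    per_iteration = ana_hours * titan_cores + sim_hours * rhea_cores
    return iterations * per_iteration


def queue_wait_hours_b(iterations, titan_wait_hours, rhea_wait_hours):
    """Approach B: a new pilot is submitted for every stage of every iteration."""
    return iterations * (titan_wait_hours + rhea_wait_hours)


# Hypothetical numbers, for illustration only.
idle_a = idle_core_hours_a(iterations=3, sim_hours=2.0, ana_hours=1.0,
                           titan_cores=1024, rhea_cores=64)
wait_b = queue_wait_hours_b(iterations=3, titan_wait_hours=4.0, rhea_wait_hours=1.0)
print(f"Approach A: {idle_a} idle core-hours")
print(f"Approach B: {wait_b} hours spent waiting in queues")
```

The two quantities have different units (core-hours of allocation versus hours of wall-clock time), so the choice depends on whether allocation or time to completion is the tighter constraint.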

  • We will send out an RP example to help understand the pilot concept, in which we look at two cases:
    • number of tasks * number of cores/task = total number of cores acquired
    • number of tasks * number of cores/task > total number of cores acquired
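The two cases above can be illustrated with a little arithmetic. The sketch below is not the RP example itself and does not use the RP API; the function name and the numbers are hypothetical, and it assumes all tasks need the same number of cores and take about the same time.

```python
import math


def execution_waves(num_tasks, cores_per_task, pilot_cores):
    """Return how many sequential waves are needed to run all tasks on a pilot.

    Assumes identical tasks; a wave is a set of tasks running concurrently.
    """
    if cores_per_task > pilot_cores:
        raise ValueError("a single task does not fit in the pilot")
    concurrent_tasks = pilot_cores // cores_per_task
    return math.ceil(num_tasks / concurrent_tasks)


# Case 1: tasks * cores/task = cores acquired, so everything runs at once.
waves_equal = execution_waves(num_tasks=4, cores_per_task=16, pilot_cores=64)

# Case 2: tasks * cores/task > cores acquired, so tasks queue inside the
# pilot and run in several waves without new batch submissions.
waves_over = execution_waves(num_tasks=16, cores_per_task=16, pilot_cores=64)

print(waves_equal, waves_over)
```

In the second case the pilot holds the cores once and the tasks are scheduled onto them as cores free up.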

TODOs

  • VB: Send the RP example (Nt >> Nr) [Done on 12/16/2016]
  • VB: Work on extending the current example with different parameters (details in the email thread with Matthieu). [Done on 12/25/2016][Due on 1/1/2017]
From email (Matthieu): For the next step, it would be nice to be able to:

· Launch “meshfem” only once. Its output is used by all instances of “specfem”, and it needs to be run only once per iteration.
· (Matthieu) Reproduce what you did with the mockup version, that is, use the “simultaneous run” feature by setting:
  o NUMBER_OF_SIMULTANEOUS_RUNS = 2  (for instance)
  o BROADCAST_SAME_MESH_AND_MODEL = .true.

These parameters are set in the Par_file: https://github.com/radical-cybertools/radical.ensemblemd/blob/usecase/seisflow/examples/usecase_seisflow/regional_Greece_small/input_data/DATA/Par_file#L287
  • VB: Work towards an example that can use multiple HPCs [TBD]

Next meeting: TBD over emails