DLB_TALP_Attach() creates the shared-memory segment if it does not exist yet #7
Comments
I created a commit on my fork (link).
I'm not sure. The thing with Imagine you initiate first just a monitor program. If there's no other DLB program running, the monitor program will exit with an error because there's no shared memory to attach to. With the suggested change, one needs to start an application that uses TALP before a third-party program may attach to it. Whereas now, the third-party program may start, and sit idle waiting for TALP processes to start and monitor. Is calling |
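The intended behaviour described above, where a monitor may start first and sit idle until TALP processes appear, can be sketched as a simple polling loop. This is only an illustration under stated assumptions, not DLB code: `pid_list_fn`, `wait_for_processes`, and `demo_pid_source` are hypothetical names, and the callback merely stands in for whatever query the monitor actually uses to list PIDs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type (an assumption, not part of DLB): fills pidlist
 * with up to max_len PIDs, stores the count in *nelems, returns 0 on success. */
typedef int (*pid_list_fn)(int *pidlist, int *nelems, int max_len);

/* Poll get_pids until it reports at least one process or max_attempts polls
 * have been made. Returns the number of PIDs found, or 0 if none appeared. */
int wait_for_processes(pid_list_fn get_pids, int *pidlist, int max_len,
                       int max_attempts, long interval_ms) {
    assert(get_pids != NULL && pidlist != NULL);
    struct timespec pause = { interval_ms / 1000,
                              (interval_ms % 1000) * 1000000L };
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        int nelems = 0;
        if (get_pids(pidlist, &nelems, max_len) == 0 && nelems > 0) {
            return nelems;              /* something to monitor showed up */
        }
        nanosleep(&pause, NULL);        /* stay idle between polls */
    }
    return 0;
}

/* Demonstration stand-in for a real query: reports no processes on the
 * first two calls, then a single made-up PID. */
int demo_calls = 0;
int demo_pid_source(int *pidlist, int *nelems, int max_len) {
    demo_calls++;
    if (demo_calls < 3 || max_len < 1) {
        *nelems = 0;
        return 0;
    }
    pidlist[0] = 4242;
    *nelems = 1;
    return 0;
}
```

With the stand-in source, `wait_for_processes(demo_pid_source, pids, 8, 10, 1)` returns once the third poll reports a process. A real monitor would pass a wrapper around its own PID-list query instead.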
My monitoring code looks like:
If I run the following command:
Running the monitoring code after this makes it fetch
The monitoring code loops forever to get the |
By the way, thank you for explaining the intended behaviour. This is helpful, and I now understand that creating a shared-memory segment is ideally not a problem as long as it works as intended.
Oh, I see. There's a bug when
If you need it to work right now, I can think of a workaround:

diff --git a/src/LB_comm/shmem_procinfo.c b/src/LB_comm/shmem_procinfo.c
index 04ab8e4..7bb9a59 100644
--- a/src/LB_comm/shmem_procinfo.c
+++ b/src/LB_comm/shmem_procinfo.c
@@ -244,7 +244,8 @@ static int shmem_procinfo__init_(pid_t pid, pid_t preinit_pid, const cpu_set_t *
         if (shdata->allow_cpu_sharing != allow_cpu_sharing) {
             // For now we require all processes registering the procinfo
             // to have the same value in 'allow_cpu_sharing'
-            error = DLB_ERR_NOCOMP;
+            // error = DLB_ERR_NOCOMP;
+            shdata->allow_cpu_sharing = allow_cpu_sharing;
         }
     }

In any case, in the following days I will try to upload a proper fix. Thanks.
Thanks Victor for the intermediate fix. I can confirm that this works. |
Right, I've done some tests with an external profiler doing Thanks for pointing it out, I will do a fix for all these things in this issue, no need for creating another for now. |
I think it should be fixed, but let us know if you find anything. You can also undo the workaround in We've also implemented a function to do

Using

DLB_TALP_Attach();
while(...) {
    int pidlist[MAX_PROCS], nelems;
    DLB_TALP_GetPidList(pidlist, &nelems, MAX_PROCS);
    /* Query each PID currently registered on the node */
    for (int n = 0; n < nelems; ++n) {
        int pid = pidlist[n];
        double mpi_time, useful_time;
        if (DLB_TALP_GetTimes(pid, &mpi_time, &useful_time) == DLB_SUCCESS) {
            printf("Found pid: %d, mpi_time: %f s, useful_time: %f s\n",
                    pid, mpi_time, useful_time);
        }
    }
}
DLB_TALP_Detach();

Using

DLB_TALP_Attach();
while(...) {
    dlb_node_times_t node_times[MAX_PROCS];
    int nelems;
    /* Fetch the times of every process in the node with a single call */
    DLB_TALP_GetNodeTimes(DLB_MPI_REGION, node_times, &nelems, MAX_PROCS);
    for (int n = 0; n < nelems; ++n) {
        printf("Found pid: %d, mpi_time: %"PRId64" ns, useful_time: %"PRId64" ns\n",
                node_times[n].pid,
                node_times[n].mpi_time,
                node_times[n].useful_time);
    }
}
DLB_TALP_Detach();

You could also call
Calling DLB_TALP_Attach() from outside calls shmem_cpuinfo_ext__init() and shmem_procinfo_ext__init(). They both call open_shmem() -> shmem_init() -> shm_open() + ftruncate(). This creates the segment even when it does not exist.

Perhaps something similar to what is done in DLB_DROM_PreInit() (calls shmem_procinfo_ext__preinit()) to check for its existence would be helpful. Or, it can be checked from /dev/shm as well.
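To illustrate the existence check suggested above, here is a minimal sketch using plain POSIX calls. It is not DLB's implementation: `shm_segment_exists` and `shm_create_segment` are hypothetical helper names, and the segment name passed to them is up to the caller (the name DLB uses is not stated in this issue). The point is only that `shm_open()` without `O_CREAT` fails with `ENOENT` when the segment is absent, whereas `shm_open()` with `O_CREAT` followed by `ftruncate()` creates it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Probe for a POSIX shared-memory segment without creating it.
 * Returns 1 if it exists, 0 if it does not, -1 on any other error. */
int shm_segment_exists(const char *name) {
    assert(name != NULL);
    int fd = shm_open(name, O_RDONLY, 0);   /* no O_CREAT: never creates */
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return (errno == ENOENT) ? 0 : -1;
}

/* Create (or open) a segment and size it, i.e. the shm_open() + ftruncate()
 * pattern that creates the segment when it is missing.
 * Returns 0 on success, -1 on error. */
int shm_create_segment(const char *name, off_t size) {
    assert(name != NULL);
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, size) != 0) {
        close(fd);
        shm_unlink(name);                   /* do not leave a zero-sized segment */
        return -1;
    }
    close(fd);
    return 0;
}
```

An attach path could call the probe first and return an error instead of creating the segment. On Linux, POSIX shared-memory objects normally show up as files under /dev/shm, which is why checking that directory gives the same answer. On older glibc versions, linking may require `-lrt`.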