Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure create_table sets allocation mode #5953

Merged
merged 13 commits into from
Nov 9, 2021
Merged

Conversation

PaulWessel
Copy link
Member

I noticed that gmtmath were unable to free datasets it had allocated because the tables never had their alloc_mode set to GMT_ALLOC_INTERNALLY. No error related to this, just a memory leak I am trying to track down in gmtmath.c and this is part of it.

I noticed that gmtmath were unable to free datasets it had allocated because the tables never had their alloc_mode set to GMT_ALLOC_INTERNALLY.
@PaulWessel PaulWessel added the maintenance Boring but important stuff for the core devs label Nov 5, 2021
@PaulWessel PaulWessel added this to the 6.3.0 milestone Nov 5, 2021
@PaulWessel PaulWessel requested a review from maxrjones November 5, 2021 21:33
@PaulWessel PaulWessel self-assigned this Nov 5, 2021
maxrjones
maxrjones previously approved these changes Nov 5, 2021
@maxrjones
Copy link
Member

maxrjones commented Nov 5, 2021

Sorry, approved before noticing the CI issues - looks like there may be some problems here. Edit: many tests fail locally too.

@maxrjones maxrjones self-requested a review November 5, 2021 22:12
@maxrjones maxrjones dismissed their stale review November 5, 2021 22:13

Failing tests

@PaulWessel
Copy link
Member Author

Yep, working on it.

@PaulWessel PaulWessel changed the title Make sure create_table sets allocation mode WIP Make sure create_table sets allocation mode Nov 5, 2021
@PaulWessel
Copy link
Member Author

Problem is this: While we set the alloc_mode for a dataset we create in GMT to GMT_ALLOC_INTERNALLY, we failed to set the alloc_mode for its tables to that, so it defaulted to GMT_ALLOC_EXTERNALLY [0]. Hence, a bunch of tables created in gmtmath did not get freed and resulted in a memory leak (only seen if compiling with -DMEMDEBUG). This PR tries to fix this by setting the correct alloc_mode, but this opened up another issue that had been hidden due to the first bugs: The current example (there may be more) is when pscoast creates a dataset from DCW and then passes that to psxy for plotting. The dataset and tables are now properly alloc_mode'd but since it was created at the same allocation level as psxy (2), the psxy module destroys the data upon completion. This is not known in pscoast that also tries to free it and by that time there is junk in that memory, causing other crashes.

The general problem seems to be this:

  1. Module 1 (level 1) calls module 2 to create a dataset. This is allocated at level 2.
  2. Module 1 (level 1) then calls module 3 to make a plot with this data.

Module 3 operates at level 2 and hence seems nothing wrong in destroying memory allocated at that level.
Module 1 then tries to free it but it has already happened. Seems I need to rethink how that can be communicated. I've renamed this branch WIP. So master works fine but there are unseen memory leaks in places.

@PaulWessel PaulWessel added the bug Something isn't working label Nov 6, 2021
@PaulWessel
Copy link
Member Author

PaulWessel commented Nov 6, 2021

This PR grew a bit because there were several pretty bad situations in the API related mostly to GMT modules passing GMT_DATASET structures via GMT_IS_REFERENCE to other module, so I do not think this should have any effect on the externals, but we will leave it as WIP until @meghanrjones @seisman @weiji14 @joa-quim have tried this branch and given green light.

Here is the main things in this PR:

  1. The functions that allocate a dataset, table, or segment structure (e.g., gmt_get_dataset) now set the allocation level and mode right off the bat. This was missing in some cases and that drove the garbage collector man crazy.
  2. Module pslegend was sloppily written where perhaps 64 segments were allocated up front but then only 13 were actually used, so we just set S->n_segments to 13 without reallocating the memory like normal people would do. So once the stricter checking on when to free kicked in, this causes major troubles. Now, we properly reallocate to the needed space before passing things into other modules.
  3. When a dataset structure is passed by reference we make sure the passing object is set to NULL once the receiving object holds the data, otherwise we would have two pointers to the same memory and that is never good when one of them frees the data and then the other tries the same later...

A few comments were also added. This should strengthen the API for dataset virtual use. Please give it a spin asap.

Copy link
Member

@joa-quim joa-quim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No visible effects on Julia

@weiji14
Copy link
Member

weiji14 commented Nov 7, 2021

I got a crash on test_plot_lines_with_arrows in PyGMT. This regression test was added to PyGMT in GenericMappingTools/pygmt#490 to fix GenericMappingTools/pygmt#406, as a follow up to #3528.

Edit: triggered tests at GenericMappingTools/pygmt@388d3a2 using GMT_IN and all tests pass (see https://github.com/GenericMappingTools/pygmt/runs/4128825392?check_suite_focus=true#step:15:451).

Will see if I can find any other problems using GMT_IN | GMT_IS_REFERENCE.

Edit2: test_wiggle also crashed.

weiji14 added a commit to GenericMappingTools/pygmt that referenced this pull request Nov 7, 2021
@PaulWessel
Copy link
Member Author

I am unable to test the PyGMT until Tuesday since conda does not yet support M1 natively. If you can see what is happening in debug please let me know.

@maxrjones
Copy link
Member

That is very odd. All those pass for me in this branch. Are you sure? I just double-checked I am on the table-alloc-mode branch, pulled, built from scratch (btr) and ran ctr. On M1 I only get the ex49 failure (which I will soon post on for discussion - but it has nothing to do with this branch).

Yes, just double-checked:

git checkout table-alloc-mode
git pull
bnrnp
crb
cd rbuild
ctest --rerun-failed --verbose
261: Test command: /opt/local/bin/bash "gmtest" "test/api/apiuservec.sh"
261: Test timeout computed to be: 10000000
261: Set GMT_SESSION_NAME = 65631
261: gmtest: line 78: 65653 Abort trap: 6           "$@"
261: ERROR: gmtest:82
261: memtrack errors: 0
261: exit status: 1
2/7 Test #261: test/api/apiuservec.sh ...........***Failed    0.27 sec
test 264
    Start 264: test/api/apivectorplot.sh

264: Test command: /opt/local/bin/bash "gmtest" "test/api/apivectorplot.sh"
264: Test timeout computed to be: 10000000
264: Set GMT_SESSION_NAME = 65654
264: gmtest: line 78: 65675 Abort trap: 6           "$@"
264: ERROR: gmtest:82
264: memtrack errors: 0
264: exit status: 1
3/7 Test #264: test/api/apivectorplot.sh ........***Failed    0.10 sec
test 555
    Start 555: test/grdview/categorical.sh

555: Test command: /opt/local/bin/bash "gmtest" "test/grdview/categorical.sh"
555: Test timeout computed to be: 10000000
555: Set GMT_SESSION_NAME = 65676
555: gmtest: line 78: 65703 Abort trap: 6           "$@"
555: ERROR: gmtest:82
555: memtrack errors: 0
555: exit status: 1
4/7 Test #555: test/grdview/categorical.sh ......***Failed    0.16 sec
test 558
    Start 558: test/grdview/grdview.sh

558: Test command: /opt/local/bin/bash "gmtest" "test/grdview/grdview.sh"
558: Test timeout computed to be: 10000000
558: Set GMT_SESSION_NAME = 65704
558: gmtest: line 78: 65736 Abort trap: 6           "$@"
558: ERROR: gmtest:82
558: memtrack errors: 0
558: exit status: 1
5/7 Test #558: test/grdview/grdview.sh ..........***Failed    0.34 sec
test 562
    Start 562: test/grdview/tiles.sh

562: Test command: /opt/local/bin/bash "gmtest" "test/grdview/tiles.sh"
562: Test timeout computed to be: 10000000
562: Set GMT_SESSION_NAME = 65737
562: gmtest: line 78: 65759 Abort trap: 6           "$@"
562: ERROR: gmtest:82
562: memtrack errors: 0
562: exit status: 1
6/7 Test #562: test/grdview/tiles.sh ............***Failed    0.13 sec
test 753

@maxrjones
Copy link
Member

I am unable to test the PyGMT until Tuesday since conda does not yet support M1 natively. If you can see what is happening in debug please let me know.

Sounds good, I'll take a look.

@PaulWessel
Copy link
Member Author

I ssh'ed into macnut (Intel) and ran the tests there too and no errors. Please cd into your build/src directory and run

testapi_uservectors -q

Does it crash?

@maxrjones
Copy link
Member

I ssh'ed into macnut (Intel) and ran the tests there too and no errors. Please cd into your build/src directory and run

testapi_uservectors -q

Does it crash?

Yes:

(base) DarthMac:rbuild meghanj$ gmt --version
6.3.0_10f666e_2021.11.08
(base) DarthMac:rbuild meghanj$ src/testapi_uservectors -q
testapi_uservectors(33555,0x10fe55e00) malloc: *** error for object 0x7fd07a41f520: pointer being freed was not allocated
testapi_uservectors(33555,0x10fe55e00) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

Even more bizarre - when I run test/grdview/tiles.sh from the terminal I get an abort trap but if I try to use lldb with vscode to debug then the commands run fine.

@PaulWessel
Copy link
Member Author

If you run this in Xcode you can under Diagnostic tab select Malloc guard, malloc etc (3 separate things I think) and then it will find the first violation instead of the above error.

@PaulWessel
Copy link
Member Author

I do get lots of warnings on this one but no crash, so I will look at that here.

@PaulWessel
Copy link
Member Author

Thanks. I am working on a change to address numerous meg issues related to segments. I had a single alloc_mode variable for a segment, but when externals provide columns it is possible that only some are provided externally and others are allocated in GMT. So the segment alloc_mode must be a vector as well and we must set the mode for each data pointer to INTERNAL or EXTERNAL so that freeing can work. Still testing now after many changes.

Since GMT_Put_Vector adds individual vectors we really need to track allocation mode accordingly.
@PaulWessel
Copy link
Member Author

Updated with per-column allocation mode. Test again pass for me and a lot less memory issues in the logs. If you have not seen these before you will need to set MEM_DEBUG in the Advanced.cmake settings and then you will see things like

gmt [WARNING]: Max total memory allocated was 929.559 kb [951868 bytes]
gmt [WARNING]: Single largest allocation was 805.141 kb [824464 bytes]
gmt [WARNING]: MEMORY NOT FREED: 1.656 kb [1696 bytes]
gmt [WARNING]: Items allocated: 115 reallocated: 13 freed: 111
gmt [WARNING]: Items NOT PROPERLY FREED: 4
gmt [WARNING]: Memory not freed first allocated in gmt_api.c:2269(gmt_get_header) (ID = 82): 0.977 kb [1000 bytes]
gmt [WARNING]: Memory not freed first allocated in gmt_api.c:2270(gmt_get_header) (ID = 83): 0.609 kb [624 bytes]
gmt [WARNING]: Memory not freed first allocated in gmt_grdio.c:2790(gmt_get_grid) (ID = 80): 0.039 kb [40 bytes]
gmt [WARNING]: Memory not freed first allocated in gmt_grdio.c:2791(gmt_get_grid) (ID = 81): 0.031 kb [32 bytes]

I have a few of these I am still tracking down but the majority of the ones associated with testapi_uservectors are gone.

@maxrjones
Copy link
Member

Down to three failing tests:

555 - test/grdview/categorical.sh (Failed)
558 - test/grdview/grdview.sh (Failed)
562 - test/grdview/tiles.sh (Failed)

Here's an example simplified to one line that fails for me:

gmt grdview @earth_relief_05m -Cturbo -JM5i -T+owhite -R-1/3/-1/3 > test.ps

I think all three failures relate to the grdview -T option:

gmt/src/grdview.c

Lines 1300 to 1326 in fd54c73

if (Ctrl->T.active) { /* Plot image as polygonal pieces. Here, -JZ is not set */
double *xx = NULL, *yy = NULL;
struct GMT_FILL fill;
struct GMT_DATASEGMENT *S = gmt_get_segment (GMT);
gmt_init_fill (GMT, &fill, -1.0, -1.0, -1.0); /* Initialize fill structure */
GMT_Report (API, GMT_MSG_INFORMATION, "Tiling without interpolation\n");
if (Ctrl->T.outline) gmt_setpen (GMT, &Ctrl->T.pen);
S->data = gmt_M_memory (GMT, NULL, 2, double *);
S->n_columns = 2;
gmt_M_grd_loop (GMT, Z, row, col, ij) { /* Compute rgb for each pixel */
if (gmt_M_is_fnan (Topo->data[ij]) && Ctrl->T.skip) continue;
if (use_intensity_grid && Ctrl->T.skip && gmt_M_is_fnan (Intens->data[ij])) continue;
gmt_get_fill_from_z (GMT, P, Topo->data[ij], &fill);
if (use_intensity_grid)
gmt_illuminate (GMT, Intens->data[ij], fill.rgb);
else
gmt_illuminate (GMT, Ctrl->I.value, fill.rgb);
n = gmt_graticule_path (GMT, &xx, &yy, 1, true, xval[col] - inc2[GMT_X], xval[col] + inc2[GMT_X], yval[row] - inc2[GMT_Y], yval[row] + inc2[GMT_Y]);
gmt_setfill (GMT, &fill, Ctrl->T.outline);
S->data[GMT_X] = xx; S->data[GMT_Y] = yy; S->n_rows = n;
gmt_geo_polygons (GMT, S);
gmt_M_free (GMT, xx);
gmt_M_free (GMT, yy);
}
gmt_free_segment (GMT, &S);

@PaulWessel
Copy link
Member Author

Are you sure you are on the latest commit? gmt_get_segment takes two arguments now.

Fix bad grdview usage.
Copy link
Member

@maxrjones maxrjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problems with the gmt and pygmt (both on main and gmt-in) tests for me after the latest update to grdview.

@PaulWessel
Copy link
Member Author

OK, I still have a few memory leaks but all tests pass for me etc. I can continue to work on this but it does not have to be fixed for 6.3 since those things were already there from before.

@PaulWessel PaulWessel changed the title WIP Make sure create_table sets allocation mode Make sure create_table sets allocation mode Nov 9, 2021
@PaulWessel PaulWessel merged commit c25e954 into master Nov 9, 2021
@PaulWessel PaulWessel deleted the table-alloc-mode branch November 9, 2021 01:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working maintenance Boring but important stuff for the core devs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants