Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unbounded %MEM growth #2628

Open
dcnorris opened this issue Oct 19, 2024 · 4 comments
Open

Unbounded %MEM growth #2628

dcnorris opened this issue Oct 19, 2024 · 4 comments

Comments

@dcnorris
Copy link

dcnorris commented Oct 19, 2024

Below is a self-contained repro for my problem. The unbounded %MEM growth in top apparently happens where I try to use d_n_qs_int/4 'backwards' to convert the integer 4th argument to the list-of-quotients 3rd arg. (I would really like not to have to implement separate 'forward' and 'backward' predicates! 🤡 )

I'm very glad to pare this repro down further, if the problem is not immediately apparent.

% Repro for unbounded mem growth on large inputs
:- use_module(library(lists)).
:- use_module(library(clpz)).
:- use_module(library(lambda)).
:- use_module(library(error)).
:- use_module(library(debug)).

clpz:monotonic.

q_r(T/N, T:U) :- 0 #=< #T, 0 #=< #U, #N #= T + U.

qs_Ts_Us(Qs, ΣTs, ΣUs) :-
    maplist(\Q^T^U^(q_r(Q, T:U)), Qs, Ts, Us),
    intlist_partsums(Ts, ΣTs),
    intlist_partsums(Us, ΣUs).

intlist_partsums([X|Xs], [X|Ss]) :-
    intlist_partsums_acc(Xs, Ss, X).

intlist_partsums_acc([], [], _).
intlist_partsums_acc([X|Xs], [S|Ss], A) :-
    #S #= #X + #A,
    intlist_partsums_acc(Xs, Ss, S).

qs_d_nmax(Qs, D, Nmax) :-
    length(Qs, D),
    maplist(\Q^T^N^(Q = T/N), Qs, Ts, Ns),
    Ns ins 0..Nmax, label(Ns),
    maplist(\T^N^(T in 0..N), Ts, Ns), label(Ts).

% d_n_qs_int(+D, +N, ?Qs, ?K)
d_n_qs_int(D, N, Qs, K) :-
    #B #= #D * #N + 1, % K is a base-(DN+1) number
    #M #= #B ^ #D, % M-1 is maximum D-digit, base-B number
    length(Qs, D),
    qs_Ts_Us(Qs, Ts, Us),
    base_digits_int(B, Ts, TK),
    base_digits_int(B, Us, UK),
    %   Top D digits +  Low D digits
    #K #= (#M * #TK) + (#M - 1 - #UK). % a (2*D)-digit number

horner(X, A, P0, P) :- #P #= #A + #X * #P0. % https://en.wikipedia.org/wiki/Horner%27s_method

base_digits_int(B, Digs, K) :-
    #Bminus1 #= #B-1,
    Digs ins 0..Bminus1,
    foldl(horner(B), Digs, 0, K).

/*
?- D=3, Nmax=6,
   L+\(findall(Q, qs_d_nmax(Q, D, Nmax), Qs),
   maplist(d_n_qs_int(D,Nmax), Qs, Ks),
   sort(Ks, SKs),
   same_length(SQs, Qs),
   maplist(same_length, SQs, Qs),
   maplist(d_n_qs_int(D,Nmax), SQs, SKs),
   length(SKs, L)).
%@    error('$interrupt_thrown',repl/0). %MEM in 'top' grows w/o limit
%@    D = 2, Nmax = 6, L = 784. % D=2 case completes in mere seconds
*/

% For context, the rationale for above is the following predicate,
% which accepts a _partially ordered_ +Qs and returns -SQs sorted
% according to a *complete* order that contains the partial order.
% In my application, this enables certain calculations to be done
% in a single pass through a large list such as the one generated
% by findall/3 above.
qs_sorted(Qs, SQs) :-
    N = 6, % TODO: Generalize
    maplist(d_n_qs_int(D,N), Qs, Ks),
    sort(Ks, SKs),
    same_length(SQs, Qs),
    maplist(same_length, SQs, Qs),
    maplist(d_n_qs_int(D,N), SQs, SKs).
@triska
Copy link
Contributor

triska commented Oct 20, 2024

One general note, after a brief look at this example: Some computations inherently take a lot of space, this does not necessarily mean that the memory growth is unbounded.

In this concrete example, there seems to be a huge internal "branching" of terms. For instance, if we consider just the following fragment:

?- D=3, Nmax=6,
   L+\(findall(Q, qs_d_nmax(Q, D, Nmax), Qs),
       length(Qs, L)).
   D = 3, Nmax = 6, L = 21952.

The step from D = 2 to D = 3 causes a huge increase in this length, from 784 to more than 20_000 elements.

This list with more than 20_000 elements is represented in memory, and subsequent goals seem to fan this out further.

I once had a similar situation with a huge Binary Decision Diagram (more than 40 GB) I wanted to represent in memory. For such purposes, you can rent a server with 256 GB of RAM for less than 200 USD per month:

https://www.hetzner.com/de/dedicated-rootserver/matrix-ex/

@dcnorris
Copy link
Author

dcnorris commented Oct 20, 2024

My little laptop has 64GB ram, I think. I know I will soon want to refine the generation of test cases to something less profligate, and no doubt I will take advantage of CLP(ℤ) for this! But my sense is that it's the "fanning-out" of subsequent goals you reference that really causes the problem. (I conjecture that a more 'eager' clpz might just eat the goals as they are posted, but don't know how to express that with any sophistication. I'm going to try rewriting the encoding predicate in the hope of promoting such eagerness.)

%?- D=2, Nmax=6, time(L+\(findall(Q, qs_d_nmax(Q, D, Nmax), Qs), length(Qs, L))).
%@    % CPU time: 0.068s, 249_976 inferences
%@    D = 2, Nmax = 6, L = 784.

%?- D=3, Nmax=6, time(L+\(findall(Q, qs_d_nmax(Q, D, Nmax), Qs), length(Qs, L))).
%@    % CPU time: 1.519s, 6_660_470 inferences
%@    D = 3, Nmax = 6, L = 21952.

%?- D=4, Nmax=6, time(L+\(findall(Q, qs_d_nmax(Q, D, Nmax), Qs), length(Qs, L))).
%@    % CPU time: 41.612s, 182_781_624 inferences
%@    D = 4, Nmax = 6, L = 614656.

@triska
Copy link
Contributor

triska commented Oct 20, 2024

Also, Scryer Prolog does not yet have a garbage collector and therefore currently reclaims memory only on backtracking.

You can use this to reclaim unneeded memory, by wrapping a computation in findall/3 (which uses a failure-driven loop internally to find all solutions), especially if you need only a small number of terms as the result of the computation. For instance, with findall(N, computation(..., N, ...), [N]), you would get only the integer N (as presumed result of a computation), while the memory that was allocated during that computation is efficiently reclaimed. That's not elegant, but may work as a last resort and should of course ideally be kept as localized as possible.

Another way out could be store what you need more compactly in memory. For instance, you can use the fundamental theorem of arithmetic to store a sequence of positive integers [A,B,C,...] as a single integer by regarding the numbers as powers of different primes and computing a single composite number where the sequence can be read off from the prime decomposition when you need it, i.e., 2A×3B×5C×···. If you can represent a sequence of, say 6 integers, as a single integer that fits in a 64-bit WAM cell, then this will reduce the memory overhead by a factor of 18.

Another approach may be to introduce program-specific "abbreviations" of sequences or other data structures that occur frequently in your program, such as compressed(1), compressed(2), compressed(3) as variants of (say) verbatim([1,2,3]), verbatim(4,3,0) etc.

Yet another way could be to make use of Scryer Prolog's compact internal string representation, and store a sequence of integers as a list of characters with these code points. For instance, we would expect the following to create such a compact string from a list of codes, and also automatically reclaim data that was created during the conversion:

:- use_module(library(lists)).
:- use_module(library(iso_ext)).

codes_to_chars(Codes, Chars) :-
        findall(Cs, (maplist(char_code, Cs0, Codes),
                     partial_string(Cs0, Cs, [])),
                [Chars]).

Example:

?- codes_to_chars([1,2,3,4], Chars).
   Chars = "\x1\\x2\\x3\\x4\".

Chars can be represented extremely compactly in memory, especially with the rebis-dev development branch, which is in fact also needed to reclaim such strings on backtracking.

@triska
Copy link
Contributor

triska commented Oct 22, 2024

One additional point: The rebis-dev branch stores short atoms in a single WAM heap cell (64-bit) without affecting the atom table (and which therefore also yields efficient memory reclamation on backtracking!), so you can store a single atom that takes up to 6 bytes (small integers) very compactly:

?- atom_codes(A, [1,2,3,4,5,6]).
   A = '\x1\\x2\\x3\\x4\\x5\\x6\'.

dcnorris referenced this issue in Precisfice/DEDUCTION Oct 22, 2024
Also implements what *should* (algorithmically) have been
[I think] a more efficient computation d_gs/2 of all D+1 gₓ,
but without demonstrated speedup and still very high peak
memory usage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants