Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ER: avoid explosion in memory requirements for simple foreach loops #148

Closed
pkoppstein opened this issue Nov 18, 2021 · 6 comments
Closed

Comments

@pkoppstein
Copy link

pkoppstein commented Nov 18, 2021

I've appended a simplified version of a foreach-within-foreach program from
https://rosettacode.org/wiki/Law_of_cosines_-_triples#jq
together with some /usr/bin/time statistics that illustrate the problem.

Hopefully the combination of the program's simplicity and its explosive
growth in memory requirements will make it easy to identify a way to
improve gojq.

The memory requirements should be sublinear in N (as illustrated by the
statistics for jq) but gojqMaster's memory requirements are quadratic or
worse, even though there is no possibility of backtracking, and even
though the state variable is just a two-key object, both keys of which
are numeric.

For n=10,000, gojq fails to complete the task on a machine with 16GB of RAM.

My guess is that the explosion is related to the foreach-within-foreach structure of the program.

rc-law-of-cosines.gojq

def squares(n):
  reduce range(1; 1+n) as $i ({}; .[$i*$i|tostring] = $i);

def solve(maxLen):
  squares(maxLen) as $squares

  | def qsqrt($n):
    $squares[$n|tostring] as $sqrt
    | if $sqrt then $sqrt else null end;

  reduce range(1; maxLen+1) as $a ({};
      reduce range($a; maxLen+1) as $b (.;
        .lhs = $a*$a + $b*$b
        | .lhs += ( - $a*$b)
        | qsqrt(.lhs) as $c
        | if $c != null
          then if $a != $b or $b != $c
               then .solutions += 1
               else .
               end
           else .
           end
        )
    )
  | .solutions ;

def task2(n):
  "For sides in the range [1, \(n)] where they cannot ALL be of the same length:",
  (solve(n)
   | "  For an angle of 60 degrees, there are \(.) solutions.") ;

task2($n)

jqMaster

for n in 100 1000 2000 10000 ; do /usr/bin/time -lp jqMaster -r --argjson n $n -ncf rc-law-of-cosines.gojq ; done
For sides in the range [1, 100] where they cannot ALL be of the same length:
For an angle of 60 degrees, there are 70 solutions.
user 0.05
sys 0.00
2043904 maximum resident set size

For sides in the range [1, 1000] where they cannot ALL be of the same length:
For an angle of 60 degrees, there are 1260 solutions.
user 4.93
sys 0.05
2170880 maximum resident set size

For sides in the range [1, 2000] where they cannot ALL be of the same length:
For an angle of 60 degrees, there are 2872 solutions.
user 17.21
sys 0.15
2166784 maximum resident set size

For sides in the range [1, 10000] where they cannot ALL be of the same length:
For an angle of 60 degrees, there are 18394 solutions.
user 515.65
sys 4.81
3862528 maximum resident set size

gojqMaster

for n in 100 1000 2000 10000 ; do /usr/bin/time -lp gojqMaster -r --argjson n $n -ncf rc-law-of-cosines.gojq ; done
For sides in the range [1, 100] where they cannot ALL be of the same length:
For an angle of 60 degrees, there are 70 solutions.
user 0.05
sys 0.00
10391552 maximum resident set size

For sides in the range [1, 1000] where they cannot ALL be of the same length:
For an angle of 60 degrees, there are 1260 solutions.
user 6.11
sys 0.22
423104512 maximum resident set size

For sides in the range [1, 2000] where they cannot ALL be of the same length:
For an angle of 60 degrees, there are 2872 solutions.
user 24.85
sys 0.92
1657319424 maximum resident set size

terminated

@wader
Copy link
Contributor

wader commented Nov 20, 2021

I wonder if this might be related to the memory usage observed in #86? (the later discussion not related to TCO).

@itchyny
Copy link
Owner

itchyny commented Nov 21, 2021

This is the issue of the absence of reference counting in gojq, due to difficulty of interface design for custom internal functions. Someone suggested destructive mode (LINK) to make their generated code to run, but I didn't accept the patch for various reasons.

@pkoppstein
Copy link
Author

@itchyny - I believe that in a foreach loop (or a reduction), when the state variable is a "flat object" or "flat array", it should be relatively straightforward to manage memory without reference counting, precisely because the state is flat, since jq atoms are immutable.

@itchyny
Copy link
Owner

itchyny commented Nov 21, 2021

I don't get how straightforward it is but considering a reducing query with comma in its update query, reduce range(10) as $x ({}; .x = 1, .y = 1), state value needs to be copied in some cases.

@pkoppstein
Copy link
Author

Please note that I'm not suggesting that all book-keeping can be
magically waved away. I'm mainly suggesting that it should not be
too hard to avoid explosive memory requirements when handling simple
loops with simple state.

Interestingly, however, jq'sreduce semantics are quite simple.
Consider, for example:

gojqMaster -nc 'reduce range(3) as $x ({}; .x += 1, .y += 2)'
{"y":6}

In other words, the "left-hand" fork is effectively ignored: reduce really does reduce relentlessly!

By contrast, foreach does exhibit some forking behavior, but not exponentially:

gojqMaster -nc 'foreach range(3) as $x ({}; .i = $x | (.x += 1, .y += 2))'
{"i":0,"x":1}
{"i":0,"y":2}
{"i":1,"x":1,"y":2}
{"i":1,"y":4}
{"i":2,"x":1,"y":4}
{"i":2,"y":6}

That is, at the end of each iteration, there are only two JSON objects,
not 2^n as one might expect. (Maybe this was an oversight?)

Anyway, however many state objects (using
"object" in a generic sense) may be needed at the end of each
iteration, and however much copying may be needed to obtain them, it
should be possible to avoid retaining copies of the prior states of
those objects, without requiring reference counting.

Thanks.

@itchyny
Copy link
Owner

itchyny commented Nov 25, 2021

There is nothing I can do to this ambiguous request so closing. The reduce body of rc-law-of-cosines.gojq is complex enough for me. But patch suggestion is welcome.

@itchyny itchyny closed this as completed Nov 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants