Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Py_EnterRecursiveCall and Py_LeaveRecursiveCall? #2510

Closed
samuelcolvin opened this issue Jul 14, 2022 · 6 comments
Closed

Support Py_EnterRecursiveCall and Py_LeaveRecursiveCall? #2510

samuelcolvin opened this issue Jul 14, 2022 · 6 comments

Comments

@samuelcolvin
Copy link
Contributor

I'm struggling with some recursive code in pydantic-core, particularly related to wasm and pyodide, see pydantic/pydantic-core#167.

It was suggested I use Py_EnterRecursiveCall, is there any case where this would help? Is it something pyo3 could help/support?

I had assumed recursive code in rust avoided the stack and python's recursion limit even when working with python objects, but messing with setrecursionlimit is effecting when tests pass and fail in wasm.

To be honest, I'm out of my depth here, any guidance here would be appreciated. Thanks in advance.

@davidhewitt
Copy link
Member

recursive code in rust avoided the stack

Recursive Rust code definitely uses the stack - it's where all function locals and return addresses etc reside. e.g. here's the simplest stack overflow you could possibly hit: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f47a031011c7aff7d62a65519ca66700

Not sure if this is helpful, if you really need to have unlimited recursion the way I tend to do it is transform recursive function calls into loops.

e.g.

enum RecursiveData {
    Many(Vec<RecursiveData>),
    One(String),
}

fn recursive_flatten(data: RecursiveData) -> Vec<String> {
    match data {
        RecursiveData::Many(datas) => datas.into_iter().map(recursive_flatten).flatten().collect(),
        RecursiveData::One(string) => vec![string],
    }
}

can become this loop version which is bounded only by the memory of the machine:

enum RecursiveData {
    Many(Vec<RecursiveData>),
    One(String),
}

fn loop_flatten(data: RecursiveData) -> Vec<String> {
    match data {
        RecursiveData::Many(datas) => {
            let mut paused = Vec::new();
            let mut out = Vec::new();
            let mut current = datas.into_iter();
            loop {
                // Process next value from deepest iterator
                while let Some(next) = current.next() {
                    match next {
                        RecursiveData::Many(next_datas) => {
                            // Save current for later, descend into new level
                            paused.push(std::mem::replace(&mut current, next_datas.into_iter()));
                        },
                        RecursiveData::One(string) => out.push(string),
                    }
                }
                // Current is exhausted, try resuming
                if let Some(prev) = paused.pop() {
                    current = prev;
                } else {
                    // All done
                    return out;
                }
            }
        },
        RecursiveData::One(string) => vec![string],
    }
}

Whether it's possible to do such a transformation will depend on the complexity of your recursive algo :)

@samuelcolvin
Copy link
Contributor Author

Recursive Rust code definitely uses the stack

Sorry, I wasn't very clear.

What I meant was

I had assumed recursive code in rust avoided the python call stack

In other words, you could call recursive code in rust without the significant performance penalty applicable in python because rust doesn't need to do the hard work of maintaining a call stack.

I was fairly confident about that in pure rust, my question was about recursive rust code which accessed a python object, e.g. a nested dict or list.

I think an example is in order:

Let's say we have the following rust code which counts the items in a list, and any sublists:

#[pyfunction]
fn count(x: &PyAny, depth: usize) -> PyResult<i64> {
    if depth > 100 {
        Ok(0)
    } else if let Ok(list) = x.cast_as::<PyList>() {
        let mut sum: i64 = 0;
        for item in list.iter() {
            sum += count(item, depth + 1)?;
        }
        Ok(sum)
    } else {
        x.extract::<i64>()
    }
}

We can call it thus:

from ... import count

print(count([1, 2, 3], 0))
#> 6
print(count([1, 2, [4, 5]], 0))
#> 12
x = [1]
x.append(x)
print(count(x, 0))
#> 100

My questions is: how does count interact with the python call stack?

On the one hand, if I remove the if depth > 100 { check, and call this function, it seg faults, I never get a python RecursionError.

On the other hand, similar code in pydantic-core can pass or fail based on changing sys.setrecursionlimit(...) with both pypy and emscripten/wasm.

Hence my confusion.

Secondly, is the above code a good place to use Py_EnterRecursiveCall?


the way I tend to do it is transform recursive function calls into loops.

I'm afraid that's really not possible in pydantic-core it's inherently a pile of nested validators which call each other recursively.

When a recursive model needs to be validated, e.g.

class Branch:
    name: str
    sub_branches: 'list[Branch]'

It's hard to imagine a way to validate this (in a general way, with no knowledge of the schema at compile time) without recursion.

@davidhewitt
Copy link
Member

Right, sorry for my misunderstanding above.

So when you call the count(item, depth + 1) from Rust, you are indeed calling the original Rust function and not some wrapper which modifies the Python call stack.

So yes, to avoid infinite recursion causing a stack overflow a depth check is reasonable. It would probably make sense to use Py_EnterRecursiveCall (and Py_LeaveRecursiveCall) because it'll allow users to configure the depth with sys.setrecursionlimit?

Why you're having different experience with PyPy / wasm is a surprise to me, I had a quick peek at the Python implementation of the recursion limit and it looks like it's essentially just a counter on all platforms.

@davidhewitt
Copy link
Member

Circling back here, with Py_EnterRecursiveCall and Py_LeaveRecursiveCall now exposed as ffi symbols, I'm going to close this issue as I don't think we're planning to add support for a "safe" API for this for now. If anyone does have a need for this, please comment and we can discuss a design.

@davidhewitt davidhewitt closed this as not planned Won't fix, can't repro, duplicate, stale Aug 21, 2022
@davidhewitt
Copy link
Member

Circling back here, with Py_EnterRecursiveCall and Py_LeaveRecursiveCall now exposed as ffi symbols, I'm going to close this issue as I don't think we're planning to add support for a "safe" API for this for now. If anyone does have a need for this, please comment and we can discuss a design.

@samuelcolvin
Copy link
Contributor Author

Makes sense, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants