Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fails to capture long output from C stdio #16

Open
marcom opened this issue Jul 10, 2023 · 3 comments
Open

Fails to capture long output from C stdio #16

marcom opened this issue Jul 10, 2023 · 3 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@marcom
Copy link

marcom commented Jul 10, 2023

This currently fails:

using IOCapture, Test
function test_puts(str)
    cap = IOCapture.capture() do
        @ccall puts(str::Cstring)::Cint
    end
    @test cap.value == length(str) + 1
end
test_puts("Hello World\n"^100_000)

For Julia-only code using e.g. print this works fine.

Using Libc.flush_cstdio() doesn't seem to help:

function test_puts_flush(str)
    cap = IOCapture.capture() do
        ret = @ccall puts(str::Cstring)::Cint
        Libc.flush_cstdio()
        ret
    end
    @test cap.value == length(str) + 1
end
test_puts_flush("Hello World\n"^100_000)

Related: JuliaIO/Suppressor.jl#46 and JuliaIO/Suppressor.jl#47

@mortenpi mortenpi added bug Something isn't working help wanted Extra attention is needed labels Jul 11, 2023
@mortenpi
Copy link
Member

Okay, interesting. Probably not too surprising that directly calling into C would not work properly (IOCapture just uses redirect_stdour/err functions).

I also see an issue with a short string, where we don't actually return the capture. With

function test_puts(str)
    IOCapture.capture() do
        @ccall puts(str::Cstring)::Cint
    end
end

I get

julia> test_puts("Hello World\n"^10)
(value = 121, output = "", error = false, backtrace = Ptr{Nothing}[])

value is fine, but output is empty. With a longer string, or multiple calls, it eventually shows up. It seems like there is some ~4KB buffer somewhere.

I can also replicate the issue with the long string. I guess the -1 from puts is because it runs into some error and returns the EOF? Could we extract the C error somehow?

Would be great to get this working if possible, but I don't have the bandwidth currently to go down the IO rabbit hole.

@marcom
Copy link
Author

marcom commented Jul 11, 2023

Here is a bit more systematic testing script and output, I think I can confirm your results:

Test script
using IOCapture, Test

nbytes = [10^0, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6]

function test_print(str)
    cap = IOCapture.capture() do
        print(str)
    end
    # save in a bool to avoid printing long strings when test fails
    cap_output_equals_str = cap.output == str
    @test cap_output_equals_str
end

function test_puts(str)
    cap = IOCapture.capture() do
        @ccall puts(str::Cstring)::Cint
    end
    @test cap.value == length(str) + 1
    # save in a bool to avoid printing long strings when test fails
    cap_output_equals_str = cap.output == str * "\n"
    @test cap_output_equals_str
end

function test_puts_flush(str)
    cap = IOCapture.capture() do
        ret = @ccall puts(str::Cstring)::Cint
        Libc.flush_cstdio()
        return ret
    end
    @test cap.value == length(str) + 1
    # save in a bool to avoid printing long strings when test fails
    cap_output_equals_str = cap.output == str * "\n"
    @test cap_output_equals_str
end

function run_testset(fn::Function)
    @testset verbose=true "$fn" begin
        for (i, n) in enumerate(nbytes)
            # print different letters for each test
            c = collect('A':'Z')[i]
            @testset "$n" begin
                fn(c^n)
            end
        end
    end
end

@testset "IOCapture cstdio" verbose=true begin
    run_testset(test_print)
    run_testset(test_puts)
    run_testset(test_puts_flush)
end
Test script output
Test Summary:     | Pass  Fail  Total  Time
IOCapture cstdio  |   22    13     35  3.1s
  test_print      |    7            7  0.2s
    1             |    1            1  0.1s
    10            |    1            1  0.0s
    100           |    1            1  0.0s
    1000          |    1            1  0.0s
    10000         |    1            1  0.0s
    100000        |    1            1  0.0s
    1000000       |    1            1  0.0s
  test_puts       |    5     9     14  2.1s
    1             |    1     1      2  2.0s
    10            |    1     1      2  0.0s
    100           |    1     1      2  0.0s
    1000          |    1     1      2  0.0s
    10000         |    1     1      2  0.0s
    100000        |          2      2  0.0s
    1000000       |          2      2  0.0s
  test_puts_flush |   10     4     14  0.1s
    1             |    2            2  0.0s
    10            |    2            2  0.0s
    100           |    2            2  0.0s
    1000          |    2            2  0.0s
    10000         |    2            2  0.0s
    100000        |          2      2  0.0s
    1000000       |          2      2  0.0s

For the the smaller sizes (10k bytes and below), puts still reports that it has written all bytes, and adding Libc.flush_cstdio() allows the output to be captured completely. At larger sizes, puts cannot write all bytes anymore because the pipe buffer is full.

So I guess a first step would be to:

  • add a disclaimer to the README that it will fail with cstdio

Second step:

  • add a Libc.flush_cstdio() to the IOCapture code, perhaps making it optional if there is a large performance impact

Third step maybe (I'm not sure if this is correct):

  • let the capture code and the function to be captured run on different threads, so IOCapture can keep on reading from the pipe while the cstdio code is writing to it

What do you think?

Edit: looks like the code to copy from the pipe to the IOBuffer is already running on another thread

buffer_redirect_task = @async write(output, pipe)

so there must be something else going on.

@mortenpi
Copy link
Member

Regarding the async stuff -- those should be tasks, not threads. So it's still all single-threaded. But regardless, those tasks should be taking care of keeping the buffer empty.

add a disclaimer to the README that it will fail with cstdio

Yeah, happy to take a PR for this. The edge cases should definitely be documented.

  • add a Libc.flush_cstdio() to the IOCapture code, perhaps making it optional if there is a large performance impact

This also sounds good to me, and would happily accept a PR here too. I wouldn't worry too much about performance here -- I generally assume that the IOCapture calls are not very performance critical, and we can add the option to disable it later once it becomes a problem for someone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants