Problem parsing large messages (i.e. 1MB) #151

gegles · 2016-07-13T02:28:12Z

Version of protoc (`protoc --version`)

3.0

Version of ProtocolBuffers.framework

ProtoBuf3.0-Swift3.0 branch (but most likely all the branches)

`.proto` file to reproduce

message Data {
    uint64              offset              = 1;
    bool                last                = 2;
    bytes               data                = 3;
}

Description

If this input.read is not able to read the full rSize of the message but only a subset, the function still returns a message with only the partial data.

At a minimum, this should give an error or a way for users to detect the issue, at best, a mechanism to progressively build the message (in multiple reads) should be provided.

Let me know I am completely off here or if more details are needed.

The text was updated successfully, but these errors were encountered:

alexeyxo · 2016-07-13T18:10:52Z

I need some details about this. How do you use this method? Do you have UnitTest or example?

gegles · 2016-07-13T18:25:56Z

I will try to give you more details or spend time building a UT for it.

In the meantime, understand that the problem is very localized to the input.read call I am referring to.

Since you do not check/use the return value (the actual number of bytes read or 0 (EOF) or -1 (error)), you basically assume input.read will always successfully read the full "maxLength" number of bytes. That is not always the case as explained in the InputStream.read() doc.

alexeyxo · 2016-07-21T09:27:34Z

Can you try last version of protobuf-swift repository?

gegles · 2016-07-23T18:50:45Z

Thanks for bring the codebase up to speed with Swift 3.0, I've already moved to it.

I haven't yet tried with large messages, I will soon, but I don't see how anything could be different, since the return value of inputStream.read() is still not being checked.

alexeyxo · 2016-07-25T13:48:49Z

You talk about "length-delimited encoding/decoding"... If your file/stream has a correct format, then all will be ok. Please look test case:

    func testDelimitedEncodingEncoding()
    {
         do {
            let str =  (NSBundle(forClass:TestUtilities.self).resourcePath! as NSString).stringByAppendingPathComponent("delimitedFile.dat")
            let stream = NSOutputStream(toFileAtPath: str, append: false)
            stream?.open()
            for i in 1...100 {
                let mes = try PBProtoPoint.Builder().setLatitude(1.0).setLongitude(Float(i)).build()
                try mes.writeDelimitedToOutputStream(stream!)
            }
            stream?.close()

            let input = NSInputStream(fileAtPath: str)
            input?.open()
            let message = try PBProtoPoint.parseArrayDelimitedFromInputStream(input!)
            XCTAssertTrue(message[1].longitude == 2.0, "")
        }
        catch
        {
            XCTFail("Fail testDelimitedEncodingEncoding")
        }
    }

I compare return value with "!= 1", then return nil(== Invalid ProtocolBuffers stream) value.

gegles · 2016-07-25T19:17:54Z

Alex,

Maybe my understanding is still off, but I do think there is a clear problem here. I think your test is only succeeding because your file-based InputStream is able to keep up and always read the "maxLength" requested fully...

Let me try to describe it better.

I have a very simple "Data" message (simplified here):

message DataMessage {
    bytes               data                = 1;
}

The code I use to read/receive this message is like this:

    func processMessage() {
        do {
            if let msg = try DataMessage.parseDelimitedFrom(inputStream: inputStream!) {
                onMessage?(msg)
            }
            print("No more message for now")
        } catch {
            print("Parsing error: \(error)")
        }
    }

Important: My InputStream is a TCP connection here (Not a filesystem-based IO)

When the message/data size is relatively small (1K to 30K), things work just fine....

When the size gets bigger, here is what happens,

A Message is still produced by parseDelimitedFrom but the data is clearly not fully there.
If I modify your AbstractMessage.mergeDelimitedFrom() function like this:

    //Delimited Encoding/Decoding
    public func mergeDelimitedFrom(inputStream: InputStream) throws -> Self? {
        var firstByte:UInt8 = 0
        if inputStream.read(&firstByte, maxLength: 1) != 1 {
            return nil
        }
        let rSize = try CodedInputStream.readRawVarint32(firstByte: firstByte, inputStream: inputStream)
        let data  = Data(bytes: [0],count: Int(rSize))
        let pointer = UnsafeMutablePointer<UInt8>((data as NSData).bytes)

    // Original
    // inputStream.read(pointer, maxLength: Int(rSize))

        // Now just printing the return value if different then rSize
    let actuallyReadSize = inputStream.read(pointer, maxLength: Int(rSize))

    if (actuallyReadSize != Int(rSize)) {
        print("DANGER: actuallyReadSize(\(actuallyReadSize)) != rSize(\(rSize))")
    }
        return try mergeFrom(data: data)
    }

I do see the following printout:
DANGER: actuallyReadSize(400729) != rSize(1048594)

Subsequent message parsing is screwed up as the current inputStream is still receiving data from the previous message.....

Yes, clearly from this, you do the proper checking "!= 1" for the "firstByte" reading. No issue with that.

The problem is that you do no such check on the second inputStream.read() and there is no guarantee that the full data can be read (in one shot)..... My understanding is that InputStream.read() is a non-blocking call that will only return what it has read (up to maxLength).

Let me know if this makes it clearer or if I am still missing something. I will see how I can come up with a solution, but I don't know your codebase too well, so it may take some time.

Cheers. G.

gegles · 2016-07-29T21:36:44Z

@alexeyxo could you just confirm to me that you too consider this as a bug?

gegles · 2016-08-17T17:49:07Z

FYI, I still "know" that there is a bug here, but for now, instead of decoding the message directly from the socket InputStream, I first make sure I read all the data for the message (into a Data buffer) and only then use CodedInputStream....

thandang · 2016-08-25T12:15:03Z

Hi @gegles
How can you read data into Data buffer?

Kind regards,

thandang · 2016-08-25T14:20:14Z

I had tried something like
`let rawInput:NSInputStream = NSInputStream(data: dataResponse)
rawInput.open()

        var buffer = [UInt8](count: dataResponse.length, repeatedValue: 0)

        while rawInput.hasBytesAvailable {
            let len = rawInput.read(&buffer, maxLength: buffer.count)
            if len > 0 {
                let outData = NSData(bytes: &buffer, length: buffer.count)
                let codedInputStream = CodedInputStream(data: outData)

                let lengthRaw = try codedInputStream.readRawVarint32()
                let lengthThrough = try codedInputStream.pushLimit(lengthRaw)
                protobufMessage = try Com.Celertech.Baseserver.Communication.Protobuf.ProtobufMessage.parseFromCodedInputStream(codedInputStream)
                codedInputStream.popLimit(lengthThrough)
            }
        }`

But still like it doesn't work. The protobufMessage sill be nil

Do you have any idea?
Cheers.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem parsing large messages (i.e. 1MB) #151

Problem parsing large messages (i.e. 1MB) #151

gegles commented Jul 13, 2016

alexeyxo commented Jul 13, 2016

gegles commented Jul 13, 2016

alexeyxo commented Jul 21, 2016

gegles commented Jul 23, 2016

alexeyxo commented Jul 25, 2016

gegles commented Jul 25, 2016

gegles commented Jul 29, 2016

gegles commented Aug 17, 2016

thandang commented Aug 25, 2016

thandang commented Aug 25, 2016

Problem parsing large messages (i.e. 1MB) #151

Problem parsing large messages (i.e. 1MB) #151

Comments

gegles commented Jul 13, 2016

Version of protoc (protoc --version)

3.0

Version of ProtocolBuffers.framework

ProtoBuf3.0-Swift3.0 branch (but most likely all the branches)

.proto file to reproduce

Description

alexeyxo commented Jul 13, 2016

gegles commented Jul 13, 2016

alexeyxo commented Jul 21, 2016

gegles commented Jul 23, 2016

alexeyxo commented Jul 25, 2016

gegles commented Jul 25, 2016

gegles commented Jul 29, 2016

gegles commented Aug 17, 2016

thandang commented Aug 25, 2016

thandang commented Aug 25, 2016

Version of protoc (`protoc --version`)

`.proto` file to reproduce