Chunk padding not recognized #90

Closed
Jazzdoodle opened this issue Sep 9, 2020 · 3 comments

Comments

@Jazzdoodle

The current parser assumes that the data fields in a (sub)chunk fill the entire chunk size. I have encountered WAV files with chunk padding, meaning the next chunk starts only after a few unused padding bytes. Before reading the next subchunk header, the parser should explicitly seek to the start of the next subchunk, as indicated by the chunk size in the chunk header, instead of relying on things lining up.
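To illustrate the idea, here is a minimal sketch of a chunk walker that trusts only the declared chunk size. `list_chunks` is a hypothetical helper written for this example and is not part of WAV.jl. It assumes a plain RIFF/WAVE stream that supports `position` and `seek`, and it deliberately ignores any other alignment rules.

```julia
# Hypothetical helper (not part of WAV.jl): list the subchunks of a RIFF/WAVE
# stream as (id, payload_offset, declared_size) tuples.
function list_chunks(io::IO)
    riff = read(io, 4)                        # should be "RIFF"
    riff_size = Int(ltoh(read(io, UInt32)))   # bytes following this size field
    wave = read(io, 4)                        # should be "WAVE"
    (riff == b"RIFF" && wave == b"WAVE") || error("not a RIFF/WAVE stream")

    # The RIFF size counts the 4-byte "WAVE" tag that was just read.
    stop = position(io) - 4 + riff_size
    chunks = Tuple{String,Int,Int}[]
    while !eof(io) && position(io) + 8 <= stop
        id = String(read(io, 4))
        len = Int(ltoh(read(io, UInt32)))
        payload_start = position(io)
        push!(chunks, (id, payload_start, len))
        # Jump to the next header using the declared size, no matter how many
        # payload bytes a real parser would have consumed.
        seek(io, payload_start + len)
    end
    return chunks
end

# Usage: an in-memory file whose "fmt " chunk declares 20 bytes, i.e. the
# usual 16 bytes of fields followed by 4 bytes of padding.
buf = IOBuffer()
write(buf, "RIFF"); write(buf, htol(UInt32(4 + 8 + 20 + 8 + 4))); write(buf, "WAVE")
write(buf, "fmt "); write(buf, htol(UInt32(20))); write(buf, zeros(UInt8, 20))
write(buf, "data"); write(buf, htol(UInt32(4))); write(buf, zeros(UInt8, 4))
seekstart(buf)
chunks = list_chunks(buf)
```

Because the walker never depends on how much of a payload was consumed, the padded "fmt " chunk does not shift the position of the "data" header.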

@Jazzdoodle
Author

I've fixed the issue, and while I was at it I also removed the restriction on the order of the format and data chunks. I cannot be bothered to fork and make a pull request, so here's my new version of wavread in WAV.jl:

function wavread(io::IO; subrange=(:), format="double")
    chunk_size = read_header(io)
    samples = Array{Float64, 1}()
    nbits = 0
    sample_rate = Float32(0.0)
    opt = WAVChunk[]

    # Subtract the size of the format field from chunk_size; now it holds the size
    # of all the sub-chunks
    chunk_size -= 4
    # GitHub Issue #18: Check if there is enough data to read another chunk
    subchunk_header_size = 4 + sizeof(UInt32)
    fmt = WAVFormat()
    data_position = 0
    data_size = 0
    while chunk_size >= subchunk_header_size
        # Read subchunk ID and size
        subchunk_id = Vector{UInt8}(undef, 4)
        read!(io, subchunk_id)
        subchunk_size = read_le(io, UInt32)
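        # Remember where the next subchunk starts according to the declared size,
        # so padding or partially read chunks cannot desynchronize the parser
        nextchunk_start = position(io) + subchunk_size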
        if subchunk_size > chunk_size
            chunk_size = 0
            break
        end
        chunk_size -= subchunk_header_size + subchunk_size
        # check the subchunk ID
        if subchunk_id == b"fmt "
            fmt = read_format(io, subchunk_size)
            sample_rate = Float32(fmt.sample_rate)
            nbits = bits_per_sample(fmt)
            push!(opt, WAVChunk(fmt))
        elseif subchunk_id == b"data"
            data_position = position(io)
            data_size = subchunk_size
        else
            subchunk_data = Vector{UInt8}(undef, subchunk_size)
            read!(io, subchunk_data)
            push!(opt, WAVChunk(Symbol(subchunk_id), subchunk_data))
        end
        # Skip any unread payload or padding bytes before the next subchunk header
        seek(io, nextchunk_start)
    end
    if data_size > 0 && data_position > 0
        seek(io, data_position)
        if format == "size"
            return convert(Int, data_size / fmt.block_align), convert(Int, fmt.nchannels)
        end
        samples = read_data(io, data_size, fmt, format, make_range(subrange))
    end
    return samples, sample_rate, nbits, opt
end

mgkuhn pushed a commit to mgkuhn/WAV.jl that referenced this issue Oct 3, 2020
@mgkuhn
Contributor

mgkuhn commented Oct 3, 2020

A slightly fixed version of that fix is now PR #91.

@dancasimiro
Owner

Fixed by #91
