This repository has been archived by the owner on May 13, 2022. It is now read-only.

Support for stream block processing #37

Open
DoCode opened this issue Jun 19, 2018 · 5 comments
DoCode commented Jun 19, 2018

Provide support for stream block processing, like a default .NET HashAlgorithm:

var sourceStream = ... // From anywhere
var hashAlgorithm = ... // HashFunction

const int bufferSize = 8192;
long blobLength = 0;

using (Stream stream = new MemoryStream())
{
    var buffer = new byte[bufferSize];
    int bytesRead;
    while ((bytesRead = sourceStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        hashAlgorithm.TransformBlock(buffer, 0, bytesRead, null, 0);

        stream.Write(buffer, 0, bytesRead);

        blobLength += bytesRead;
    }

    hashAlgorithm.TransformFinalBlock(new byte[0], 0, 0);
}
@DoCode DoCode changed the title Stream block processing Support for stream block processing Jun 19, 2018
@netclectic

+1 for something like this.

It would be nice to use an action, similar to what's already happening with the foreach methods in IUnifiedData, something like this:

using (var outputStream = new MemoryStream())
{
    hash = _hash.ComputeHash(inputStream, outputStream.Write);
}

@brandondahler
Owner

I'm considering that it might make sense to do something like:

IHashValue ComputeHash(Stream inputStream, Stream outputStream, CancellationToken cancellationToken);
Task<IHashValue> ComputeHashAsync(Stream inputStream, Stream outputStream, CancellationToken cancellationToken);
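A rough usage sketch of the proposed overload (the `hashFunction` instance, the file paths, and the `AsHexString()` call are illustrative assumptions, not part of the proposal):

```csharp
// Hypothetical consumption of the proposed signature -- hashes the input
// bytes while simultaneously copying them to the output stream.
using (var input = File.OpenRead("data.bin"))
using (var output = File.Create("data.bin.copy"))
{
    IHashValue hash = hashFunction.ComputeHash(input, output, CancellationToken.None);
    Console.WriteLine(hash.AsHexString());
}
```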

@netclectic

Yep, perfect. I tried out the action method with the xxHash function I've been using and managed to make it work, but having an input/output stream would make more sense.

@netclectic

I had a look through your WIP work; any reason why you didn't add an output stream to the byte array methods?

I made a fork and implemented it on those methods to do some testing with. I can make a PR if you're interested. https://github.com/netclectic/Data.HashFunction/commit/c16e7794d719a55c804a1f3369299043f59c2253

@brandondahler
Owner

I recognize it's been over a year, but I'm now taking another look at this.

Use cases to be solved for

Read + calculate hash value

Have a stream of some unknown (possibly large size), for instance from the network or file system.
With that stream you want to a) calculate the hash value of the data and b) do some other processing on the same-sized chunks of data, all without reading more than necessary into memory or re-buffering the data.

Write + calculate hash value

Have a stream of some unknown (possibly large size), for instance from the network or file system.
With that stream you want to a) calculate the hash value of the data and b) stream that data to some other endpoint, all without reading more than necessary into memory or re-buffering the data.
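For comparison, both use cases are already expressible against the BCL's `IncrementalHash` (`System.Security.Cryptography`); a minimal read-and-hash sketch, assuming SHA-256 and an 8 KiB buffer:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Read + calculate hash value: process the stream in fixed-size chunks,
// hashing each chunk as it passes through while handing the same chunk
// to the caller for any other per-chunk work.
static byte[] HashWhileReading(Stream source, Action<byte[], int> processChunk)
{
    using (var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256))
    {
        var buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            hasher.AppendData(buffer, 0, bytesRead);
            processChunk(buffer, bytesRead); // caller's per-chunk processing
        }

        return hasher.GetHashAndReset();
    }
}
```

The caller never holds more than one buffer's worth of data, which is exactly the property both use cases ask for.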

Current WIP solution

Being a year later, I'm not sure I still like my idea of having input/output streams. I think that from a usability standpoint it is awkward and error-prone: streams do not behave strictly like pipes or buffers; they have only a single read/write head, so having something simultaneously reading from and writing to a stream doesn't make sense.

In the input/output streams case, we solve for the "Write + calculate hash value" use case, but we do not effectively solve for the "Read + calculate hash value" use case.

Thoughts on better solution

I think a better path would be to have underlying support for the type of TransformBlock / FinalizeBlock API which can be used by end consumers, while maintaining our current ComputeHash functionality as well.
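A rough sketch of how a consumer might drive such an API (the `CreateBlockTransformer`, `TransformBytes`, and `FinalizeHashValue` names are illustrative guesses at the shape, not a committed surface):

```csharp
// Illustrative only -- a block-transform surface lets the caller own the
// read loop, covering both the read+hash and write+hash use cases.
var transformer = hashFunction.CreateBlockTransformer();

var buffer = new byte[8192];
int bytesRead;
while ((bytesRead = sourceStream.Read(buffer, 0, buffer.Length)) > 0)
{
    transformer.TransformBytes(buffer, 0, bytesRead);

    // Caller keeps full control of the chunk: write it onward, parse it, etc.
    destinationStream.Write(buffer, 0, bytesRead);
}

IHashValue hash = transformer.FinalizeHashValue();
```

Because the loop lives in caller code rather than inside `ComputeHash`, neither stream has to be read more than once or re-buffered.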

Since I will be doing #46 as well as a v3.0, I plan on punting this change to that milestone and making this change dependent on that issue.
