This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

API Proposal: BufferWriter<T> #2177

Closed
KrzysztofCwalina opened this issue Mar 22, 2018 · 22 comments
Labels: area-System.Buffers, OpenBeforeArchiving (these issues were open before the repo was archived; to re-open them, file them in the new repo)

@KrzysztofCwalina
Member

KrzysztofCwalina commented Mar 22, 2018

The prototype of this type is in https://github.com/dotnet/corefxlab/tree/master/src/System.Buffers.ReaderWriter/System/Buffers/Writer

namespace System.Buffers.Writer {
    public ref struct BufferWriter<T> where T : IBufferWriter<byte> {

        public BufferWriter(T output);
        public ReadOnlySpan<byte> NewLine; 

        public Span<byte> Buffer { get; }
        public void Advance(int count);
        public void Ensure(int count=1);
        public void Flush();

        // primitive APIs will be available for:
        // byte, sbyte, ushort, short, uint, int, ulong, long, float, double, char, Utf8Char
        // DateTime, DateTimeOffset, TimeSpan, Guid, Uri, BigInteger, Decimal
        public void Write(int value);
        public void WriteLine(int value);
        public void Write(int value, StandardFormat format=default);
        public void WriteLine(int value, StandardFormat format=default);
        public void Write(int value, TransformationFormat format);
        public void WriteLine(int value, TransformationFormat format);

        // string APIs
        public void Write(string value);
        public void WriteLine(string value);
        public void Write(string value, TransformationFormat format);
        public void WriteLine(string value, TransformationFormat format);

        public void Write(ReadOnlySpan<char> value);
        public void WriteLine(ReadOnlySpan<char> value);
        public void Write(ReadOnlySpan<char> value, TransformationFormat format);
        public void WriteLine(ReadOnlySpan<char> value, TransformationFormat format);

        public void Write(Utf8String value);
        public void WriteLine(Utf8String value);
        public void Write(Utf8String value, TransformationFormat format);
        public void WriteLine(Utf8String value, TransformationFormat format);

        public void Write(ReadOnlySpan<Utf8Char> value);
        public void WriteLine(ReadOnlySpan<Utf8Char> value);
        public void Write(ReadOnlySpan<Utf8Char> value, TransformationFormat format);
        public void WriteLine(ReadOnlySpan<Utf8Char> value, TransformationFormat format);

        // writables
        public void Write<TWritable>(TWritable value) where TWritable : IWritable;
        public void WriteLine<TWritable>(TWritable value) where TWritable : IWritable;

        public void Write<TWritable>(TWritable value, StandardFormat format) where TWritable : IWritable;
        public void WriteLine<TWritable>(TWritable value, StandardFormat format) where TWritable : IWritable;

        public void Write<TWritable>(TWritable value, TransformationFormat format) where TWritable : IWritable;
        public void WriteLine<TWritable>(TWritable value, TransformationFormat format) where TWritable : IWritable;

        // buffers
        public void WriteBytes(ReadOnlySpan<byte> value);
        public void WriteBytes(ReadOnlySpan<byte> value, TransformationFormat format);

        // Binary write APIs will be available for:
        // byte, sbyte, ushort, short, uint, int, ulong, long, float, double
        // binary. format is L for Little Endian, and B for Big Endian.
        public void WriteBytes(int value, StandardFormat format=default);
        public void WriteBytes(int value, TransformationFormat format);
    }

    public struct TransformationFormat {
        public TransformationFormat(IBufferTransformation transformation);
        public TransformationFormat(params IBufferTransformation[] transformations);
        public StandardFormat Format { get; }
        public bool TryTransform(Span<byte> buffer, ref int written);
    }
}
namespace System.Buffers {
    public interface IWritable {
        bool TryWrite(Span<byte> buffer, out int written, StandardFormat format=default(StandardFormat));
    }
}
namespace System.Buffers.Operations {
    public interface IBufferOperation {
        OperationStatus Execute(ReadOnlySpan<byte> input, Span<byte> output, out int consumed, out int written);
    }
    public interface IBufferTransformation : IBufferOperation {
        OperationStatus Transform(Span<byte> buffer, int dataLength, out int written);
    }
}

cc: @davidfowl, @pakrym, @GrabYourPitchforks, @ahsonkhan, @joshfree, @terrajobst

@pakrym
Contributor

pakrym commented Mar 22, 2018

Do we need to have all these Write methods on BufferWriter<T>? Can they be implemented as extension methods?

@KrzysztofCwalina
Member Author

What's the upside to making them extension methods?

@pakrym
Contributor

pakrym commented Mar 22, 2018

What about Commit? We decided that calling an interface method (Advance) per write is too expensive, so we added Commit to the BufferWriter we use.

@pakrym
Contributor

pakrym commented Mar 22, 2018

What's the upside to making them extension methods?

Same reason we did it for spans and sequence: fewer methods to JIT when the type gets used with different Ts.

@pakrym
Contributor

pakrym commented Mar 22, 2018

I see, Commit is Flush in your prototype.

@pakrym
Contributor

pakrym commented Mar 22, 2018

BufferWriter -> ByteBufferWriter?

@KrzysztofCwalina
Member Author

Same reason we did it for spans and sequence: fewer methods to JIT when the type gets used with different Ts.

I think it might make sense for very low level types like Span. I don't think we should be unnaturally moving members that belong to a type out in general.

@pakrym
Contributor

pakrym commented Mar 22, 2018

Does IWritable represent conversion to binary form or to textual form? What if a type wants to have both?

Can you please add the other types used in the proposal (like TransformationFormat, IWritable, etc.) to the API listing?

@KrzysztofCwalina
Member Author

StandardFormat controls whether it's binary or text.
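As a rough illustration of that answer, here is a minimal sketch of a type that implements the proposed IWritable and picks binary or text output from the format symbol. The `Celsius` type is hypothetical, the `IWritable` interface is copied from the proposal so the block is self-contained, and using the symbol 'B' for big-endian binary is an assumption borrowed from the comment in the API listing, not a settled convention.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.Buffers.Text;

// Copied from the proposal so that the sketch compiles on its own.
public interface IWritable
{
    bool TryWrite(Span<byte> buffer, out int written, StandardFormat format = default);
}

// Hypothetical example type; not part of the proposal.
public readonly struct Celsius : IWritable
{
    private readonly int _degrees;

    public Celsius(int degrees) => _degrees = degrees;

    public bool TryWrite(Span<byte> buffer, out int written, StandardFormat format = default)
    {
        // Assumption: 'B' selects a big-endian binary encoding.
        if (format.Symbol == 'B')
        {
            written = 0;
            if (!BinaryPrimitives.TryWriteInt32BigEndian(buffer, _degrees))
            {
                return false; // buffer too small
            }
            written = sizeof(int);
            return true;
        }

        // Any other format is treated as UTF-8 text.
        return Utf8Formatter.TryFormat(_degrees, buffer, out written, format);
    }
}
```

With this shape, a single TryWrite covers both representations, which is what prompts the follow-up question about the separate WriteBytes overloads.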

@pakrym
Contributor

pakrym commented Mar 22, 2018

StandardFormat controls whether it's binary or text.

Why do we need two sets of methods for int then?

    public void Write(int value, StandardFormat format=default);
    public void WriteBytes(int value, StandardFormat format=default);

@KrzysztofCwalina
Member Author

Why do we need two sets of methods for int then?

Yeah, I was thinking about it. We could remove WriteBytes. We need to measure perf impact and see if we have consistent format chars that could mean "BE" and "LE" and not be already taken by existing text formats.

@benaadams
Member

Should be called:

public ref struct BufferedWriter<TBufferWriter> where TBufferWriter : IBufferWriter<byte>

Otherwise it's a bit weird: why are you passing something that's already an IBufferWriter to a BufferWriter?

The answer is because you are buffering it; hence the addition of Flush.

So BufferWriter<T> -> BufferedWriter<TBufferWriter>

Also add a static .Create<TBufferWriter> method to a non-generic BufferedWriter to avoid specifying the generic parameters, as the angle brackets get heavy:

var writer = new BufferedWriter<BufferWriterFormatter<PipeWriter>>(formatter);

vs

var writer = BufferedWriter.Create(formatter);
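A minimal sketch of how such a factory could look, assuming a generic ref struct and a non-generic static class that share a name (the same pattern as Tuple and Tuple<T>). The `BufferedWriterSketch` names and the `Output` property are hypothetical and only exist to show the type inference.

```csharp
using System.Buffers;

// Hypothetical stand-in for the proposed writer; only the constructor matters here.
public ref struct BufferedWriterSketch<TBufferWriter> where TBufferWriter : IBufferWriter<byte>
{
    public TBufferWriter Output { get; }

    public BufferedWriterSketch(TBufferWriter output)
    {
        Output = output;
    }
}

// Non-generic companion class: the compiler infers TBufferWriter from the argument.
public static class BufferedWriterSketch
{
    public static BufferedWriterSketch<TBufferWriter> Create<TBufferWriter>(TBufferWriter output)
        where TBufferWriter : IBufferWriter<byte>
        => new BufferedWriterSketch<TBufferWriter>(output);
}
```

A caller can then write `var writer = BufferedWriterSketch.Create(output);` without spelling out the nested generic arguments.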

@ahsonkhan
Member

Assigning to @JeremyKuhne

@jnm2
Contributor

jnm2 commented Jun 21, 2018

@benaadams BufferingWriter<TBufferWriter>?

@KrzysztofCwalina
Member Author

One thing I think we should be careful about is that "buffering" and even "buffered" might imply that data is written to some intermediary buffer and then copied to the ultimate destination (this is how buffered streams work). But there is really not any intermediary buffer here and no data copy on Flush. Flush merely advances a "committed pointer".
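The distinction above can be sketched roughly as follows. This is not the prototype's code: `BatchingWriterSketch<T>` and `FlushDemo` are hypothetical names, and the sketch assumes the output is a reference type such as ArrayBufferWriter<byte>. Bytes are written straight into the span handed out by the underlying writer, Advance only updates local bookkeeping, and Flush reports the pending count through a single interface call.

```csharp
using System;
using System.Buffers;

// Hypothetical sketch, not the corefxlab prototype.
public ref struct BatchingWriterSketch<T> where T : IBufferWriter<byte>
{
    private T _output;         // underlying writer; it owns the memory
    private Span<byte> _span;  // unwritten remainder of the output's current buffer
    private int _buffered;     // bytes written but not yet reported to the output

    public BatchingWriterSketch(T output)
    {
        _output = output;
        _span = output.GetSpan();
        _buffered = 0;
    }

    public Span<byte> Buffer => _span;

    // Local bookkeeping only; no interface call per write.
    public void Advance(int count)
    {
        _buffered += count;
        _span = _span.Slice(count);
    }

    // Moves the "committed pointer" of the output. No bytes are copied.
    public void Flush()
    {
        if (_buffered > 0)
        {
            _output.Advance(_buffered);
            _buffered = 0;
        }
    }

    // Commits pending bytes before asking the output for a larger span.
    public void Ensure(int count = 1)
    {
        if (_span.Length < count)
        {
            Flush();
            _span = _output.GetSpan(count);
        }
    }
}

public static class FlushDemo
{
    // Returns how many bytes the output sees as written before and after Flush.
    public static (int Before, int After) Run()
    {
        var output = new ArrayBufferWriter<byte>();
        var writer = new BatchingWriterSketch<ArrayBufferWriter<byte>>(output);

        writer.Buffer[0] = (byte)'h';
        writer.Advance(1);
        writer.Buffer[0] = (byte)'i';
        writer.Advance(1);

        int before = output.WrittenCount;
        writer.Flush();
        return (before, output.WrittenCount);
    }
}
```

In this sketch the two bytes already sit in the output's memory before Flush runs; Flush only makes the output aware of them.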

@benaadams
Member

BatchingWriter<TBufferWriter>?

@benaadams
Member

IBufferWriter is in corefx so we can't change its name, but it does seem weird to give an IBufferWriter to a BufferWriter?

E.g. "wrap your BufferWriter in a BufferWriter<BufferWriter> to get better performance, and don't forget to call the additional method Flush". It's not a clear API?

@benaadams
Member

benaadams commented Jun 21, 2018

The other nit is the field

public ReadOnlySpan<byte> NewLine; 

It basically doubles the size of the struct and adds an additional GC copy barrier? Just for the additional WriteLine convenience methods?

public void WriteLine(int value);
public void WriteLine(int value, StandardFormat format=default);
...

Would it be better to have it default to Environment.NewLine, with an overload for a custom newLine? e.g.

private static readonly byte[] s_newLine = Encoding.UTF8.GetBytes(Environment.NewLine);

public void WriteLine(string value) => WriteLine(value, s_newLine);
public void WriteLine(string value, ReadOnlySpan<byte> newLine);
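To make the suggestion concrete, here is a minimal sketch of the two overloads written as static helpers over IBufferWriter<byte>. The `NewLineSketch` class is hypothetical, and writing directly to the IBufferWriter rather than through the proposed ref struct is a simplification made for the example.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical helper class; it only illustrates the overload shape.
public static class NewLineSketch
{
    // Encoded once and shared, so the writer itself carries no NewLine field.
    private static readonly byte[] s_newLine = Encoding.UTF8.GetBytes(Environment.NewLine);

    public static void WriteLine(IBufferWriter<byte> output, string value)
        => WriteLine(output, value, s_newLine);

    public static void WriteLine(IBufferWriter<byte> output, string value, ReadOnlySpan<byte> newLine)
    {
        // Reserve the worst-case UTF-8 size plus room for the line terminator.
        int max = Encoding.UTF8.GetMaxByteCount(value.Length) + newLine.Length;
        Span<byte> span = output.GetSpan(max);

        int written = Encoding.UTF8.GetBytes(value.AsSpan(), span);
        newLine.CopyTo(span.Slice(written));
        output.Advance(written + newLine.Length);
    }
}
```

Callers that need a protocol-specific terminator (for example "\r\n") pass it explicitly; everyone else gets the platform default.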

@benaadams
Member

And finally...

Since it converts strings to UTF-8, should it be BufferWriterUtf8? Or Utf8Writer?

Or would there be a BufferWriter<T> and a BufferWriterUtf16<T> ; i.e. how would different encodings be handled?

@benaadams
Member

Perhaps that's the answer? It's an EncodingWriter?

public ref struct EncodingWriterUtf8<T> where T : IBufferWriter<byte>

@KrzysztofCwalina
Member Author

Or just Utf8Writer

@benaadams
Member

Oh, one more thing 🍎...

Add HTML writers. The current HtmlEncoding in the framework takes a string, jumps through a bunch of non-inlinable virtual calls, then does some defensive encoding (as it doesn't know what the final character encoding will be), then returns a string, which you then have to UTF-8 encode.

If you are encoding directly to UTF-8, then 90% of that can be skipped and you only really need to worry about control chars (as they bother infosec people) and <, >, &, ', "

So something like

public ref struct Utf8Writer<T> where T : IBufferWriter<byte>
{
    public void WriteHtmlEncoded(string text);
    public void WriteHtmlEncoded(ReadOnlySpan<char> text);
    public void WriteHtmlEncoded(Utf8String text);
    public void WriteHtmlEncoded(ReadOnlySpan<Utf8Char> text);
}

There's an implementation here if that's of any benefit?
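For readers without access to that implementation, here is a minimal sketch of the idea under stated assumptions: it escapes only the five characters named above, writes UTF-8 straight into an IBufferWriter<byte>, and does not handle control characters at all. `HtmlUtf8Sketch` is a hypothetical name, and the choice of `&#39;` for the apostrophe is one common convention, not something the thread specifies.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical sketch; control characters are deliberately left out.
public static class HtmlUtf8Sketch
{
    public static void WriteHtmlEncoded(IBufferWriter<byte> output, ReadOnlySpan<char> text)
    {
        int runStart = 0;
        for (int i = 0; i < text.Length; i++)
        {
            string entity = Escape(text[i]);
            if (entity.Length == 0)
            {
                continue; // ordinary character: stays in the current run
            }

            // Emit the unescaped run before this character, then the entity.
            WriteUtf8(output, text.Slice(runStart, i - runStart));
            WriteUtf8(output, entity.AsSpan());
            runStart = i + 1;
        }

        WriteUtf8(output, text.Slice(runStart));
    }

    // Returns the replacement entity, or an empty string when no escaping is needed.
    private static string Escape(char c)
    {
        switch (c)
        {
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '&': return "&amp;";
            case '"': return "&quot;";
            case '\'': return "&#39;";
            default: return "";
        }
    }

    // Transcodes a run of chars to UTF-8 directly in the output's buffer.
    private static void WriteUtf8(IBufferWriter<byte> output, ReadOnlySpan<char> chars)
    {
        if (chars.IsEmpty)
        {
            return;
        }

        Span<byte> span = output.GetSpan(Encoding.UTF8.GetMaxByteCount(chars.Length));
        output.Advance(Encoding.UTF8.GetBytes(chars, span));
    }
}
```

Because every escaped character is ASCII, splitting the input at those positions never cuts a surrogate pair in half, so each run can be transcoded independently.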

@JeremyKuhne JeremyKuhne removed their assignment Mar 25, 2020
@pgovind pgovind added the OpenBeforeArchiving label Mar 11, 2021
@pgovind pgovind closed this as completed Mar 11, 2021