[API Proposal]: UTF8 Support for System.Data.Common.DbDataReader #57262
Comments
Tagging subscribers to this area: @eiriktsarpalis, @layomia
Issue Details
Background and motivation
There is Utf8JsonWriter which allows us to write UTF8 Strings directly to the browser.
API Proposal
namespace System.Data.Common
{
    public class DbDataReader
    {
        public virtual ReadOnlySpan<byte> GetUtf8String(int ordinal)
        {
            // Default implementation needed. One possible fallback (an assumption,
            // not part of the proposal) re-encodes the UTF-16 string:
            return System.Text.Encoding.UTF8.GetBytes(GetString(ordinal));
        }
    }
}
// TBD: Add to IDataRecord as well (as we now have default implementations for interfaces), but that seems overkill to me
API Usage
DbCommand cmd;                               // assumed to be initialized elsewhere
System.Text.Json.Utf8JsonWriter jsonWriter;  // assumed to be initialized elsewhere
using var rdr = await cmd.ExecuteReaderAsync();
while (await rdr.ReadAsync())
{
    // One JSON object per row, one property per column
    jsonWriter.WriteStartObject();
    for (int i = 0; i < rdr.FieldCount; i++)
    {
        jsonWriter.WriteString(propertyName: rdr.GetName(i), utf8Value: rdr.GetUtf8String(i));
    }
    jsonWriter.WriteEndObject();
}
Risks
As long as db providers do not implement UTF8 in their drivers, the new API provides no benefit. It would just mean doing the UTF8->UTF16 conversion earlier.
|
Hey msft-bot: It's actually area-System.Data :) Therefore @ajcvickers @cheenamalhotra @David-Engel, please have a look |
Tagging subscribers to this area: @roji, @ajcvickers |
It must have triggered off mention of Utf8JsonWriter |
@danmoseley Another use case for a first-class Though I should mention that UTF-8 <-> UTF-16 conversion is fast. Very, very fast. And JSON's |
Duplicate of #28135 |
Returning a ReadOnlySpan from DbDataReader is problematic in various ways...
Ideally, if one day we get a 1st-class Utf8String (@GrabYourPitchforks), providers could then add support for returning that. It would still probably mean copying data in-memory once, so that we can hand the user an independent Utf8String instance that avoids all the lifecycle/buffering issues above; but it would still avoid the UTF8<->UTF16 conversion. Note that you can simply read into a byte[] today (via DbDataReader.GetBytes or just |
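The "read into a byte[] today" route can be sketched roughly as follows. This is only a sketch under assumptions: it presumes the provider lets DbDataReader.GetBytes read the text column's raw UTF-8 bytes (provider-specific, not guaranteed), and the names Utf8ReaderSketch, ReadAllBytes and WriteRow are hypothetical helpers, not existing APIs. NULL columns are not handled.

```csharp
using System;
using System.Data.Common;
using System.Text.Json;

public static class Utf8ReaderSketch
{
    // Hypothetical helper: copies the whole field into a new byte[].
    // Assumes the provider allows GetBytes on this column and that a null
    // buffer makes GetBytes return the total field length in bytes.
    public static byte[] ReadAllBytes(DbDataReader reader, int ordinal)
    {
        long length = reader.GetBytes(ordinal, 0, null, 0, 0);
        var buffer = new byte[length];
        long read = 0;
        while (read < length)
        {
            long n = reader.GetBytes(ordinal, read, buffer, (int)read, (int)(length - read));
            if (n == 0) break; // provider returned no more data
            read += n;
        }
        return buffer;
    }

    // Writes the current row as one JSON object, handing the bytes to
    // Utf8JsonWriter without going through a UTF-16 string.
    public static void WriteRow(DbDataReader reader, Utf8JsonWriter writer)
    {
        writer.WriteStartObject();
        for (int i = 0; i < reader.FieldCount; i++)
        {
            writer.WriteString(propertyName: reader.GetName(i), utf8Value: ReadAllBytes(reader, i));
        }
        writer.WriteEndObject();
    }
}
```

This still copies the data once, which matches the trade-off described above: the copy stays, but the UTF8<->UTF16 round trip is avoided.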
I get the point about lifecycle/buffering, but I actually did not want to go that far: I can live with a copy and just thought the data type for Utf8 Strings is I did not know that GetBytes works with strings; however, I would have to rely on metadata that tells me whether it's a utf8 column, a varchar column, or an nvarchar column (in the case of MS SQL Server), right? That sounds very error-prone |
@aersamkull if you're OK with getting a copy, then yeah, simply asking for a byte array with You're right that with SQL Server you'd need to know whether the column is varchar or nvarchar (that can be checked via DbDataReader.GetDataTypeName); in that sense it sounds like you're proposing a new API whose sole purpose would be to check whether the column indeed contains UTF8 data, and then return a byte[] as usual. That sounds like something that could be easily done as an extension method (call |
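A rough sketch of such an extension method might look like this. Everything here is an assumption for illustration: the name GetUtf8Bytes, the caller-supplied set of type names, and the premise that the provider can return the column as byte[] through GetFieldValue<byte[]>. The method cannot verify the encoding itself; it only checks the type name reported by DbDataReader.GetDataTypeName against names the caller asserts hold UTF-8 data.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;

public static class DbDataReaderUtf8Extensions
{
    // Hypothetical extension method, not an existing API.
    // utf8TypeNames: provider-specific type names that the caller knows
    // to contain UTF-8 text (this depends on the database and its
    // collation/encoding settings).
    public static byte[] GetUtf8Bytes(
        this DbDataReader reader, int ordinal, ISet<string> utf8TypeNames)
    {
        string typeName = reader.GetDataTypeName(ordinal);
        if (!utf8TypeNames.Contains(typeName))
        {
            throw new InvalidOperationException(
                $"Column {ordinal} has type '{typeName}', which is not listed as UTF-8.");
        }

        // Assumes the provider supports reading this column as byte[].
        return reader.GetFieldValue<byte[]>(ordinal);
    }
}
```

Keeping the list of type names as a parameter leaves the provider-specific knowledge with the caller, which is where the "error prone" part of the check lives anyway.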
I think the perfect solution would be a Utf8String type and something like |
Sounds like the right way forward - and thanks for the proposal, it's important to know that people are interested in UTF8 strings etc. I'll keep #28135 to track the more extreme proposal (which avoids copying in addition to avoiding decoding), but otherwise I don't think there's anything left to do here at the runtime level. |
Background and motivation
There is Utf8JsonWriter which allows us to write UTF8 Strings directly to the browser.
However, my source is often some database which always returns data as common C# Strings, which are UTF16. This means that some encoding is needed. In fact, MS SQL Server and many other databases (e.g. MySQL, Postgres) support storing text data in UTF8. It does not make sense to me to convert all that UTF8 data to UTF16 and then back to UTF8. Of course one cannot support every possible charset, but UTF8 is different: after all, it has its own special JsonWriter.
API Proposal
API Usage
Risks
As long as db providers do not implement UTF8 in their drivers, the new API provides no benefit. It would just mean doing the UTF8->UTF16 conversion earlier.
Also, it might be a rare use case as it's quite low-level. And it's unclear to me whether ORMs like Entity Framework could benefit from this or not. If they did use this interface, they would expose a lot more UTF8 strings in C#, which is not friendly to newbies. I think UTF8 strings should stay low-level.