Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why is uuid/char translated as string? #58

Closed
mvdan opened this issue Oct 20, 2016 · 8 comments
Closed

Why is uuid/char translated as string? #58

mvdan opened this issue Oct 20, 2016 · 8 comments
Labels

Comments

@mvdan
Copy link

mvdan commented Oct 20, 2016

If I use a uuid (as a primary key) in a postgres table, sqlboiler generates a string field in the model. Is there a reasoning behind this?

I'm not too bothered by this, but I think it should either be [16]byte or a type UUID [16]byte. It would make using https://github.com/satori/go.uuid easier, and it would consume less memory as it would be 16 bytes instead of 36 bytes (characters) plus the string header.

CC @teasherm

@aarondl
Copy link
Member

aarondl commented Oct 23, 2016

Hi @mvdan. First let me apologize for not getting back to you quicker. I saw this issue right away and then promptly got distracted by other things.

I'm going to respond to both issues here since they're essentially the same kind of problem. And the first thing I want to say is that I agree with you that there are inefficiencies in the in-memory representation. However it's a trade-off on multiple levels. The reason they behave like they do currently is that was the path of least resistance to get it working for these cases. Let me elaborate on the reasons:

UUID:

  1. [16]byte is not supported by the lib/pq driver (which we shamelessly assume everyone uses). Meaning we would have to have created another type and the less types we create the better since we then become the maintainers of those types. I know we have a lot already but they were all absolutely necessary, a UUID type is not.
  2. [16]byte is not supported nicely by things like JSON and we'd be forced to implement MarshalJSON, MarshalTOML, TextMarshal, BinaryMarshal etc etc and support all that too.

Now these things could be circumvented by using []byte but then you've lost a lot of the reason for making the change since you're saving some 3 integers or so by not having the slice header. And recall that you'll always have multiple allocations of this anyway since you'll be forced into converting to slice all over the place (as usual when using non-slices).

Char:

  1. byte is not a supported type for the lib/pq library to scan a char(N) into, because it's a variable width type. So in order to accomplish this one we'd not only need a custom byte type, but we'd have to dig into another level of metadata for field types inside databases (the length of variable width columns) which we'd like to avoid if possible since it puts more requirements on information_schema support and may impact later additions of say SQLite.

In summary the positive side of these changes are:

  1. Leaner (and truer) in-memory representation of these types

The downsides are:

  1. More custom SQLBoiler types for users to deal with, rather than pure Go types.
  2. Array usage in Go, which complicates usage of the struct fields, with no usability gains since you're still converting to/from a separate UUID package
  3. Unpleasant representations in other serialization formats that we currently try to support.
  4. Requires additional metadata from the information_schema

I recognize the inefficiencies, and hopefully you recognize the tradeoffs that I've outlined and that neither is an ideal state. I don't want to say we'll never do this because this kind of inefficiency makes me really sad but I think we need some more data on whether or not this amounts to being a bottleneck in real-world applications. I'm sure there's some awful cases like a table with 80 char(1) columns... but I'd like to see something in the wild where this is hurting enough to make the change. I think rather than have this, I'd prefer a mechanism whereby users can select the types generated via config files - but that has it's own downsides in terms of complexity.

Thoughts?

@mvdan
Copy link
Author

mvdan commented Oct 25, 2016

[16]byte is not supported by the lib/pq driver

Ah, didn't know that.

[16]byte is not supported nicely by things like JSON and we'd be forced to implement MarshalJSON, MarshalTOML, TextMarshal, BinaryMarshal etc etc and support all that too.

JSON/TOML and the rest would be covered by the Text one. And I don't know why Binary would be important here - currently it's just the string version, so you could just use Text as well.

Now these things could be circumvented by using []byte but then you've lost a lot of the reason for making the change since you're saving some 3 integers or so by not having the slice header.

Remember that the []byte would contain 16 bytes, not the 36 bytes (chars) that it takes when stringified. But yes, it would be a small gain to go from string to []byte.

And recall that you'll always have multiple allocations of this anyway since you'll be forced into converting to slice all over the place (as usual when using non-slices).

Depends on the use-case. My understanding is that right now there is one stringification and allocation when gathering the fields from postgres. If we used [16]byte, this would not happen upfront and it would be up to the user to do if needed.

In summary the positive side of these changes are:

  1. Leaner (and truer) in-memory representation of these types

I would add the ability to use go.uuid a lot easier. Right now, you have to use String() everywhere.

The downsides are:

  1. More custom SQLBoiler types for users to deal with, rather than pure Go types.

I understand that's only because lib/pq does not support it.

  1. Array usage in Go, which complicates usage of the struct fields, with no usability gains since you're still converting to/from a separate UUID package

Complicates how?

There would be usability gains. If go.uuid defines type UUID [16]byte, you can assign and convert to [16]byte easily. If not, you have to do str = uuid.String() everywhere.

  1. Unpleasant representations in other serialization formats that we currently try to support.

If this is a problem, you could always fall back to the text marshaling which is what is currently done, right?

  1. Requires additional metadata from the information_schema

I don't know anything about that :)

I'm not going to insist, so feel free to close this issue if you want to decline this request/proposal. This was more of a "I wonder why they do it this way" issue.

I have a different view of the ups and downs of [16]byte vs string, but I do see that there are still disadvantages. And since you guys know more about sqlboiler and SQL in general than me, I'll leave it to your judgement.

Thanks again!

@mvdan
Copy link
Author

mvdan commented Oct 25, 2016

Also, since you merged the "char" issue into this one.

  1. byte is not a supported type for the lib/pq library to scan a char(N) into, because it's a variable width type. So in order to accomplish this one we'd not only need a custom byte type, but we'd have to dig into another level of metadata for field types inside databases (the length of variable width columns) which we'd like to avoid if possible since it puts more requirements on information_schema support and may impact later additions of say SQLite.

I understood that "char" and char(1) were different things. Like byte and [1]byte are in Go.

If "char" really is a byte in postgres, then I think what I said makes sense. If "char" is just a synonim for char(1), then I misread the documentation and it's nonsense.

@aarondl
Copy link
Member

aarondl commented Oct 25, 2016

You're right about serialization I THINK. Having said that I've never had the TextMarshal work for JSON encoding for me. Not sure why but I'm sure I tried several times. Going to test this again :)
The one point I made that was sort of lost here I think was any use of the [16]byte requires a slice struct to be created regardless, since no method in Go takes [n]byte you may end up doing something like this: strings.ToUpper(uuid[:]), but this is also assuming you actually need to use it in some function you didn't write, and since it's a uuid you probably don't. This is why I mentioned extra allocs. It's possible they live on the stack though. I also considered this a "complication" hence my point where I said "Array usage in Go, which complicated the usage of struct fields". Arrays in Go are more cumbersome to use than a string or []byte.
lib/pq not supporting it does force us to make additional types yes, you're right.
I hadn't realized that a type alias for [16]byte would allow assignment, somehow I thought you still needed to have an explicit conversion.
Just so you're aware we harvest metadata from columns from the information_schema parts, this is an SQL Standard and if we're to maintain compatibility across multiple database engines, we need to make sure they all support the attributes that we ask for. It -is- a standard but like most things it's half implemented by every sql database :|
You're also absolutely right about char, thanks very much for pointing it out! This may change things for us depending on what's in the information_schema tables.

Anyway. I'm going to think about this a little more with renewed enthusiasm about char. It's really not a bad idea. Thanks for all the discussion :)

@aarondl
Copy link
Member

aarondl commented Nov 4, 2016

Hey @mvdan. We've been busy on other projects so let this issue drift a little.

Did a bit more research on this one, we're good to go on all fronts. Aside from it being a breaking and just an optimization forgive me for a little more drift :) The facts we gathered were all correct -and- I found that the TextMarshaler problems I've seen in the past were probably my own doing. The information_schema data is also sufficient.

We'll get to this one.

@aarondl aarondl added the planned label Nov 4, 2016
@mvdan
Copy link
Author

mvdan commented Nov 4, 2016

Awesome! I assume you're talking about "char" - if so, could you change the title?

@aarondl aarondl changed the title Why is uuid translated as string? Why is uuid/char translated as string? Nov 8, 2016
@aarondl
Copy link
Member

aarondl commented Nov 8, 2016

I still want the discussion searchable so that we can point to it later. Looked at UUID again and found several other negatives, so definitely not going to do that unless there's solid data and people having trouble keeping a lot of these in ram for long periods of time. Starting on "char" implementation.

@aarondl
Copy link
Member

aarondl commented Nov 12, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants