Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datasize, blueprint: add new datasizes.Size type and use in blueprints #1049

Closed
wants to merge 4 commits into from

Conversation

mvo5
Copy link
Contributor

@mvo5 mvo5 commented Nov 18, 2024

This is split out from #1041 and also includes a cleanup from #951

The idea is to make simplify the filesystem customization decoding to only use primitive types (string and the new datasizes.Size type) which means simpler code (as the mountpoint to string conversion will be done automatically for us) and also that we can use meta.Undecoded() for the toml library.

The tradeoff is of couse that it's slightly ugly that we now need to cast MinSize to uint64, if that is considered too ugly I can close this again.

@schutzbot
Copy link
Contributor

This PR changes the images API or behaviour causing integration failures with osbuild-composer. The next update of the images dependency in osbuild-composer will need work to adapt to these changes.

This is simply a notice. It will not block this PR from being merged.

@achilleas-k
Copy link
Member

if that is considered too ugly I can close this again.

I don't think it's ugly at all. Maybe it could be ever so slightly cleaner if we have a size.Uint64()? Not a big difference though.

achilleas-k
achilleas-k previously approved these changes Nov 18, 2024
Comment on lines 68 to 70
assert.Equal(t, 2147483648, int(bp.Customizations.Filesystem[0].MinSize))
assert.Equal(t, "/opt", bp.Customizations.Filesystem[1].Mountpoint)
assert.Equal(t, uint64(20*datasizes.GiB), bp.Customizations.Filesystem[1].MinSize)
assert.Equal(t, 20*datasizes.GiB, int(bp.Customizations.Filesystem[1].MinSize))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that the original test is more correct, since it creates an expected value that is the same type as the one being tested (uint64) rather than converting the value being tested to match the type of the literal, but it doesn't really matter much. Unless we make a size.Uint64() method.

Not an issue at all though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the suggestion with the size.Uint64() and I agree that the original test was/is nicer - I updated the PR. Thanks for the excellent feedback.

@bcl
Copy link
Contributor

bcl commented Nov 21, 2024

I don't see the point in making it a custom type. It's a uint64 so that's the type it should have, and the toml parser should be doing the conversion to that, not a custom type that you then have to always call a helper function on.
If there was more to it than that then it would make sense, but in this limited set of features I think it just makes it harder to understand the code.

@mvo5 mvo5 force-pushed the disk-new-size-parser branch 4 times, most recently from 4ddd026 to 80bfa30 Compare November 21, 2024 09:03
@mvo5
Copy link
Contributor Author

mvo5 commented Nov 21, 2024

I don't see the point in making it a custom type. It's a uint64 so that's the type it should have, and the toml parser should be doing the conversion to that, not a custom type that you then have to always call a helper function on. If there was more to it than that then it would make sense, but in this limited set of features I think it just makes it harder to understand the code.

Thanks for your review! Maybe it makes more sense now in this slightly broader context (after #1049) is merged (as the new type is used more and there is more custom parsing that can be avoided now).

There are certainly drawbacks, it increases complexity by moving things into a different package, adds a new type and the error message are less less nice (because encoding(json is not very good with that for custom unmarshalers).

To me the benefits are separation of concern, i.e. a central place that can decode json/toml (and later yaml I presume) numbers that can be str/int/str-with-unit. From a certain PoV the existing MinSize uint64 is a "lie" because while this is what ends up in the datastruct its not what we read from the on-disk representation. Using datasizes.Size here hints that something more is going on there. Another (small) advantage is that toml is not good with complex decoding, it will always mark the complex type as "undecoded" which lead to issues in the past osbuild/bootc-image-builder#656

But as everything it's about tradeoffs, if it's considered too ugly, it can be closed.

Copy link
Member

@ondrejbudai ondrejbudai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach, it centralizes the parsing logic in one place and since we seem to be dealing with datasizes quite a lot in blueprints, I think this is less error-prone in the future.

Copy link
Member

@thozza thozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit with @bcl on this PR. While I like the changes for (un)marshaling values from/to JSON/TOML, I find the use of datasize.SIze in disk module adding unnecessary complexity. I think that we should internally use primitive types if possible and hide all the complexity in custom (un)marshallers. The new data type does not add anything for any use case other than (un)marshaling.

Maybe the BP customizations structure could keep using the primitive type, and then its (un)mashalers could use a private struct that uses datasize.Size, and converts the value to uint64.

mvo5 added a commit to mvo5/images that referenced this pull request Nov 22, 2024
This commit fixes the issue that minsize cannot be a string
by using the existing pattern.

See osbuild#1049 for alternative
ideas.
@mvo5
Copy link
Contributor Author

mvo5 commented Nov 22, 2024

Thanks for the feedback @thozza - I will stare long and hard at this and see if I can find a more agreeable way of doing this. Part of the issue I think is the "quantum" state nature of our blueprints, they are both the "resolved" (unmarshaled) structs and also contain the json/toml keywords and the custom unmarshaler. Which leads to the situation now where the consumer of the structs want a simple uint64 for minsize (as that is what the resolved result is and should be) but the implementation of the struct must know that it's not really a uint64 and not forget to add the custom unmarshaling into all the right places (and do it right which for nested structs is not trivial(IMHO)). Which then leads to bugs like: 3b3912b that we currently have where the "DiskPartition" structre needs a custom unmarshaler to deal with minsize as a string (I will push a fix using the existing pattern shortly). Anyway, going back to starring long and hard at this :)

[edit: fwiw, the fix with datasize.Size for the above issue is:

diff --git a/pkg/blueprint/disk_customizations.go b/pkg/blueprint/disk_customizations.go
index 09b92099f..76ea92e2f 100644
--- a/pkg/blueprint/disk_customizations.go
+++ b/pkg/blueprint/disk_customizations.go
 type DiskCustomization struct {
        // TODO: Add partition table type: gpt or dos
-       MinSize    uint64                   `json:"minsize,omitempty" toml:"minsize,omitempty"`
+       MinSize    datasizes.Size           `json:"minsize,omitempty" toml:"minsize,omitempty"`
        Partitions []PartitionCustomization `json:"partitions,omitempty" toml:"partitions,omitempty"`
 }

(and it got fixed "by-accident" by just introducing the new datatype)

the fix using the exiting pattern:

diff --git a/pkg/blueprint/disk_customizations.go b/pkg/blueprint/disk_customizations.go
index 09b92099f..53aa4e471 100644
--- a/pkg/blueprint/disk_customizations.go
+++ b/pkg/blueprint/disk_customizations.go
@@ -18,6 +18,27 @@ type DiskCustomization struct {
        Partitions []PartitionCustomization `json:"partitions,omitempty" toml:"partitions,omitempty"`
 }
 
+func (dc *DiskCustomization) UnmarshalJSON(data []byte) error {
+       var dcAnySize struct {
+               MinSize    any                      `json:"minsize,omitempty" toml:"minsize,omitempty"`
+               Partitions []PartitionCustomization `json:"partitions,omitempty" toml:"partitions,omitempty"`
+       }
+       if err := json.Unmarshal(data, &dcAnySize); err != nil {
+               return err
+       }
+
+       dc.Partitions = dcAnySize.Partitions
+
+       if dcAnySize.MinSize != nil {
+               size, err := decodeSize(dcAnySize.MinSize)
+               if err != nil {
+                       return err
+               }
+               dc.MinSize = size
+       }
+       return nil
+}
+
 // PartitionCustomization defines a single partition on a disk. The Type
 // defines the kind of "payload" for the partition: plain, lvm, or btrfs.
 //   - plain: the payload will be a filesystem on a partition (e.g. xfs, ext4).

mvo5 added a commit to mvo5/images that referenced this pull request Nov 22, 2024
This commit fixes the issue that minsize cannot be a string
by using the existing pattern.

See osbuild#1049 for alternative
ideas.
@thozza
Copy link
Member

thozza commented Nov 22, 2024

@mvo5 FWIW, I like your approach with datasize.Size in general. I just don't like its universal use. I would limit its use only to (un)marshalers because, in my opinion, it adds unnecessary complexity everywhere else.

My suggestion is something along these lines:

diff --git a/pkg/blueprint/filesystem_customizations.go b/pkg/blueprint/filesystem_customizations.go
index fba50da52..dc00c8aad 100644
--- a/pkg/blueprint/filesystem_customizations.go
+++ b/pkg/blueprint/filesystem_customizations.go
@@ -9,11 +9,16 @@ import (
        "github.com/osbuild/images/pkg/pathpolicy"
 )
 
-type FilesystemCustomization struct {
+type filesystemCustomizationMarshaling struct {
        Mountpoint string         `json:"mountpoint,omitempty" toml:"mountpoint,omitempty"`
        MinSize    datasizes.Size `json:"minsize,omitempty" toml:"minsize,omitempty"`
 }
 
+type FilesystemCustomization struct {
+       Mountpoint string
+       MinSize    uint64
+}
+
 // CheckMountpointsPolicy checks if the mountpoints are allowed by the policy
 func CheckMountpointsPolicy(mountpoints []FilesystemCustomization, mountpointAllowList *pathpolicy.PathPolicies) error {
        var errs []error
@@ -35,14 +40,14 @@ func (fsc *FilesystemCustomization) UnmarshalJSON(data []byte) error {
        // if the custom unmarshal for minsize failed (as encoding/json
        // provides sadly no context), c.f.
        // https://github.com/golang/go/issues/58655
-       type filesystemCustomization FilesystemCustomization
-       var fc filesystemCustomization
+       var fc filesystemCustomizationMarshaling
        if err := json.Unmarshal(data, &fc); err != nil {
                if fc.Mountpoint != "" {
                        return fmt.Errorf("JSON unmarshal: error decoding minsize value for mountpoint %q: %w", fc.Mountpoint, err)
                }
                return err
        }
-       *fsc = FilesystemCustomization(fc)
+       fsc.Mountpoint = fc.Mountpoint
+       fsc.MinSize = fc.MinSize.Uint64()
        return nil
 }

We should ideally decouple the external API for (un)marshaling for any internally used and exported structures. Initially, the external API for (un)marshaling was supposed to be the blueprint package in osbuild-composer, and the one in images was supposed to be the internal representation. We then added tags for (un)marshaling to the blueprint structures in images for testing purposes, and eventually, it grew into another external API for (un)marshaling user data.

This is IMO an initially unintended outcome, and it creates a mess when we need to modify internally used structures and complicate them just to handle the external (un)marshaling API. We shouldn't need to do this.

This is similar to our (never finished) goal to not directly use osbuild stage options for image customizations and in OSCustomizations structure.

The bottom line is that IMHO, pkg/blueprints in images should not handle user-facing (un)marshaling of any kind for its exported structures. These should be decoupled so as not to taint the implementation.

@mvo5 mvo5 requested a review from achilleas-k November 22, 2024 10:32
@mvo5 mvo5 marked this pull request as draft November 22, 2024 11:20
mvo5 added a commit to mvo5/images that referenced this pull request Nov 22, 2024
This commit decouples the marshaling representation of the
`FilesystemCustomization` from the actual struct. This means that
we can reuse custom unmarshaling via datasizes.Size and also
have our external represenation "pure".

If we continue using this pattern is also means that we can rename
fields in the marshaling and provide compatbility easily.

Thanks to Thozza, c.f.
osbuild#1049 (comment)
and
osbuild#983
mvo5 added a commit to mvo5/images that referenced this pull request Nov 22, 2024
This commit decouples the marshaling representation of the
`FilesystemCustomization` from the actual struct. This means that
we can reuse custom unmarshaling via datasizes.Size and also
have our external represenation "pure".

If we continue using this pattern is also means that we can rename
fields in the marshaling and provide compatbility easily.

Thanks to Thozza, c.f.
osbuild#1049 (comment)
and
osbuild#983
@mvo5
Copy link
Contributor Author

mvo5 commented Nov 22, 2024

I moved this to draft and opened #1057 with the first part of your idea, if it is well received we can continue to convert the other DiskPartitioning structs to the same pattern - which will be a bit verbose but go does not make this easy it seems :(

This commit adds a new `datasizes.Size` type that is an alias
for uint64 but has supports direct json/toml decoding. So sizes
can be specified as 123, "123" or "123 GiB" and this is transparently
handled.

datasizes: tweak error return to avoid "stuttering"

The `datasizes.Size.UnmarshalJSON()` returns a `JSON unmarshal`
prefix. However this is a bit too generic as other (nested)
unmarshalers may also use the prefix. So instead just indicate
the type in the error string itself:
```
error decoding {TOML,JSON} size: ...
```
mvo5 added 3 commits November 22, 2024 12:29
This commit makes use of the new `datasizes.Size` type in
`blueprint.FilesystemCustomization` so that all
toml decoding happens on "primitive" datatypes and we can use
`meta.Undecoded()` after the decoding.
This commit makes use of the new `datasizes.Size` type that gives
us automatic json/toml unmarshal for string/int sizes. This allows
us to remove some code (not that much but a bit).
@mvo5 mvo5 force-pushed the disk-new-size-parser branch from 80bfa30 to 02d485e Compare November 22, 2024 11:29
mvo5 added a commit to mvo5/images that referenced this pull request Nov 22, 2024
This commit fixes the issue that minsize cannot be a string
by using the existing pattern.

See osbuild#1049 for alternative
ideas.
mvo5 added a commit to mvo5/images that referenced this pull request Nov 22, 2024
This commit decouples the marshaling representation of the
`FilesystemCustomization` from the actual struct. This means that
we can reuse custom unmarshaling via datasizes.Size and also
have our external represenation "pure".

If we continue using this pattern is also means that we can rename
fields in the marshaling and provide compatbility easily.

Thanks to Thozza, c.f.
osbuild#1049 (comment)
and
osbuild#983
@mvo5
Copy link
Contributor Author

mvo5 commented Nov 22, 2024

Closing in favor of #1057

@mvo5 mvo5 closed this Nov 22, 2024
github-merge-queue bot pushed a commit that referenced this pull request Nov 22, 2024
This commit decouples the marshaling representation of the
`FilesystemCustomization` from the actual struct. This means that
we can reuse custom unmarshaling via datasizes.Size and also
have our external represenation "pure".

If we continue using this pattern is also means that we can rename
fields in the marshaling and provide compatbility easily.

Thanks to Thozza, c.f.
#1049 (comment)
and
#983
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants