Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(project): config merging #1460

Merged
merged 8 commits into from
Jan 12, 2024
Merged

Conversation

arendjr
Copy link
Contributor

@arendjr arendjr commented Jan 7, 2024

Summary

This is my first shot at fixing #1440. There are two main changes:

  • All the configuration structs (the ones where all the fields are already wrapped in Option) now have a NoneState in addition to the Default state. As you might guess, the NoneState represents the state where all the Options are None. This state is used as the starting point for deserialization instead of Default, which allows merging to detect which fields may be overridden, and which ones shouldn't.
  • The MergeWith trait has been replaced with a Merge trait, which has also been moved from biome_service to biome_deserialize. All the implementations of MergeWith have been removed in favor of a Mergeable derive macro that generates all the Merge implementations.

It is worth mentioning that the Merge trait has a dedicated merge_in_defaults() method, which we use to merge in the defaults after merging of extends configs has been done. The reason it requires a separate method (as opposed to simply Configuration::default().merge_with(config)) is because merge_with() overrides None fields with the value from the struct it is merging with, while in the case of merging defaults we need to also recursively check if that value has other None fields for which a Default might be available.

I have added test cases for the issue reported in #1440 and another to highlight that it is possible for a config that extends another to revert values to their default value (which the original merge_if_not_default() method would have had trouble with).

One complication I ran into is that the configurations coming out of bpaf arguments are initialized with Default instead of NoneState, and yet we want to merge them into the configuration from the config file. This was previously resolved by custom MergeWith implementations, although I have to say this was easy to overlook. I have now resolved this by explicitly patching the configs as needed in the check/lint/format commands. It's not super pretty, but at least it's explicit and localized.

Test Plan

I only had to update 1 test case (see comment) and added two more test cases to demonstrate more merging behavior. Still, I might very well have missed something (especially around the CLI invocations), so it would be great if someone could help identify pain points.

Future Work

I think it is possible to simplify the Override structs as well based on this, but I've left it out of scope for now since the PR is big enough as it is.

There is also still an issue where rule options cannot be deeply merged. I've left it for now, since I don't think it's a regression. I expect it's also not very high priority, but it would be nice if it would work as expected. It will require wrapping all the rule options in Option as well though.

Copy link

netlify bot commented Jan 7, 2024

Deploy Preview for biomejs ready!

Name Link
🔨 Latest commit 71f76c7
🔍 Latest deploy log https://app.netlify.com/sites/biomejs/deploys/65a0fbd3e8b8510008b50d15
😎 Deploy Preview https://deploy-preview-1460--biomejs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
Lighthouse
Lighthouse
1 paths audited
Performance: 99 (no change from production)
Accessibility: 97 (no change from production)
Best Practices: 100 (no change from production)
SEO: 93 (no change from production)
PWA: -
View the detailed breakdown and full score reports

To edit notification comments on pull requests, go to your Netlify site configuration.

@github-actions github-actions bot added A-CLI Area: CLI A-Project Area: project A-Formatter Area: formatter A-Tooling Area: internal tools L-JavaScript Language: JavaScript and super languages labels Jan 7, 2024
@@ -52,7 +52,7 @@ check ━━━━━━━━━━━━━━━━━━━━━━━━
```block
test.js:1:1 lint/suspicious/noDebugger FIXABLE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

× This is an unexpected use of the debugger statement.
! This is an unexpected use of the debugger statement.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only test case where I think the original snapshot was actually incorrect. The config already specified it should be a warning, but it looks like it was previously still reported as an error.

Copy link

codspeed-hq bot commented Jan 7, 2024

CodSpeed Performance Report

Merging #1460 will degrade performances by 12.97%

⚠️ No base runs were found

Falling back to comparing arendjr:fix-config-merging (71f76c7) with main (925df81)

Summary

❌ 1 regressions
✅ 92 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main arendjr:fix-config-merging Change
eucjp.json[cached] 4.3 ms 4.9 ms -12.97%

@Conaclos
Copy link
Member

Conaclos commented Jan 7, 2024

The MergeWith trait has been replaced with a Merge trait, which has also been moved from biome_service to biome_deserialize.

I am not sure that biome_deserialize is the good place for this trait and some of its impls...

All the configuration structs (the ones where all the fields are already wrapped in Option) now have a NoneState in addition to the Default state.
One complication I ran into is that the configurations coming out of bpaf arguments are initialized with Default instead of NoneState

Couldn't this be simpler to remove the current implementation of Default and use derive(Default) that set to None all optional fields?

Instead of merging the deserialized and overridden state with a default state, we could expose queries.
Let's me take an example:

#[derive(Default, Mergeable)]
struct Config {
  enabled: Option<bool>,
  threshold: Option<u32>,
}

impl Config {
  fn enabled(&self) -> bool {
    self.enabled.unwrap_or(true)
  }

  fn threshold(&self) -> u32 {
    self.threshold.unwrap_or(255)
  }
}

fn main() {
  let first_config = Config::default();
  let override_config = Config { enabled: Some(false) };
  let final = first_config.merge_with(override_config);
  assert_eq!(final.enabled(), false);
  assert_eq!(final.threshold(), 255);
}

@arendjr
Copy link
Contributor Author

arendjr commented Jan 8, 2024

I am not sure that biome_deserialize is the good place for this trait and some of its impls...

Yeah, I should have provided some rationale for that. The Merge trait couldn't stay in biome_service, since all types used in config files need to have it, so I also had to add it to types in biome_formatter and others. I could have given it its own crate, but it seemed overkill, especially since it would need to have its own macro crate too. Of course it's slightly questionable whether merging belongs with deserialization, but there's even a precedent for it: StringSet. Just like StringSet is part of biome_deserialize because it's a common type we deserialize to, so too do we commonly want our deserialized types to be mergeable.

Couldn't this be simpler to remove the current implementation of Default and use derive(Default) that set to None all optional fields?

Instead of merging the deserialized and overridden state with a default state, we could expose queries.

I can see the appeal for using queries indeed, although I would have some concerns with that approach as well. Semantically I would even consider it more risky, since the Merge trait relies on the initial values being None, which is what NoneState guarantees. Default doesn't offer those guarantees; it's merely what the Default derive does based on the assumption the struct contains only Options. But that means custom Default impls would be forbidden by convention, and there would be no compile-time guarantee the structs only contain Options. I think those concerns might make it harder for new contributors to understand how things work, and there would be additional burden on reviewers to verify the conventions are being followed.

A similar concern would arise with the queries themselves, because by convention direct field accesses should then be avoided in favor of using the query methods, and it's easy to make a mistake there. Migration-wise, hunting down all configuration field accesses and updating them to use queries also feels like a bigger task than I'd be willing to undertake.

@ematipico
Copy link
Member

I haven't seen any issue in terms of performance:

PR

  Time (mean ± σ):     825.5 ms ±  15.7 ms    [User: 141.4 ms, System: 72.3 ms]
  Range (min … max):   801.5 ms … 855.9 ms    10 runs

main

  Time (mean ± σ):     821.1 ms ±   9.8 ms    [User: 141.2 ms, System: 66.5 ms]
  Range (min … max):   806.5 ms … 840.8 ms    10 runs

crates/biome_cli/src/commands/check.rs Outdated Show resolved Hide resolved
crates/biome_cli/src/commands/format.rs Outdated Show resolved Hide resolved
crates/biome_deserialize/src/string_set.rs Show resolved Hide resolved
@ematipico
Copy link
Member

Personally, I think this solution works.

Sure, Merge doesn't feel that it belongs to the deserializer crate, but I don't mind that much, considering the issue we're trying to fix.

@Conaclos
Copy link
Member

Conaclos commented Jan 9, 2024

Just like StringSet is part of biome_deserialize because it's a common type we deserialize to, so too do we commonly want our deserialized types to be mergeable.

I still think it is not the good place for these traits. We could move them (also StringSet) in a dedicated crate (biome_configuration or something else). However, I will not block the PR for this. We could do that later.

By the way, I am not fond of the NoneState trait and the Merge::merge_in_defaults method. However, I will not block the PR for that.

@ematipico
Copy link
Member

By the way, I am not fond of the NoneState trait and the Merge::merge_in_defaults method. However, I will not block the PR for that.

What's your opinion on this? It would be great to collect some feedback on the PR and then address it later.

@Conaclos
Copy link
Member

Conaclos commented Jan 9, 2024

By the way, I am not fond of the NoneState trait and the Merge::merge_in_defaults method. However, I will not block the PR for that.

What's your opinion on this? It would be great to collect some feedback on the PR and then address it later.

NoneState::none is only used inside the deserializers of objects and maps. This seems like overkill to introduce a trait just for that. Instead of introducing the trait NoneState, we could directly set the fields of a struct with the defaulkt value of the field type (that is None in the case of Option.
Moreover, in the future, we could automatically generate most of deserializers. I am not comfortable with the idea of requiring that every Deserializable type to also implement NoneState.

Merge::merge_in_defaults looks like a code sniff to me. However, I don't know exactly what's wrong with it - it's more of an intuition. I will comment here if anything comes to mind.

@arendjr
Copy link
Contributor Author

arendjr commented Jan 9, 2024

To be honest I was a bit bumped I needed merge_in_defaults() as well, but it’s necessary because unlike merging in an instance, recursion shouldn’t stop at None values.

For instance, Configuration::formatter has a default value of None, so if we merged in Configuration::default() like any other instance, no fields of FormatterConfiguration would be initialized. merge_in_defaults() makes sure recursion can continue by instantiating FormatterConfiguration::default() when needed.

Short of using queries, like you suggested, I don’t see how we can avoid that.

I do like the idea of generating all deserializers, btw. Maybe once we do that, we could just let the macro for it generate a Deserializable::none() method instead of needing it as a separate trait indeed?

@Conaclos
Copy link
Member

Conaclos commented Jan 9, 2024

I do like the idea of generating all deserializers, btw. Maybe once we do that, we could just let the macro for it generate a Deserializable::none() method instead of needing it as a separate trait indeed?

We should just generate the necessary code directly in the deserializer, no need of introducing a none method.

To be honest I was a bit bumped I needed merge_in_defaults() as well, but it’s necessary because unlike merging in an instance, recursion shouldn’t stop at None values.

Yes, I understand your point.

@arendjr
Copy link
Contributor Author

arendjr commented Jan 9, 2024

@Conaclos I think I may also have another proposal that hopefully addresses most of your concerns.

What if we just got rid of all the Option wrappers in Configuration and its related structs? Then we could make a new derive macro, Partial, that generates a second struct in which all the fields are automatically wrapped in Option again. That secondary struct would derive Default and Merge, and we would also generate a From impl for it. Like this:

#[derive(Partial)]
struct Configuration {
    formatter: FormatterConfiguration
}

Would generate:

#[derive(Default, Deserializable, Merge)]
struct PartialConfiguration {
    formatter: Option<PartialFormatterConfiguration>,
}

impl From<PartialConfiguration> for Configuration {
    fn from(other: PartialConfiguration) -> Self {
        Self {
            formatter: other
                .formatter
                .map(FormatterConfiguration::from)
                .unwrap_or_default()
        }
    }
}

The From impl effectively replaces the merge_in_defaults() method and because the Partial struct has all the None values, which we force by generating it, the regular struct remains free to specify semantic defaults like we do today, and no separate NoneState trait is needed.

What I personally really like about this is that it makes it very easy to extend mergeability to the lint rule options, which currently all lack the Option wrappers.

But there’s two difficulties with this approach still:

  • How does the Partial derive on Configuration know that FormatterConfiguration also had a Partial derive? It cannot assume that for all fields.
  • Deserializable would be implemented on the Partial struct. We could still use the handwritten impls we have today, but it feels wrong to hand-maintain impls for generated structs, so we’re practically making generation of the deserializers a blocker for this as well.

What do you think?

@ematipico ematipico changed the title Fix config merging fix(project): config merging Jan 10, 2024
@github-actions github-actions bot added A-Website Area: website A-Changelog Area: changelog labels Jan 10, 2024
@Conaclos
Copy link
Member

Conaclos commented Jan 10, 2024

That's an interesting approach. I also thought about something like this (although less achieved than yours).

However, I am not comfortable of generating a Partial that automatically derives Deserializable, serde::{Deserialize, Serialize}, schemars:JsonSchema, and Bpaf. This breaks modularity and composability.
Unless we use a declarative macro-rule?

create_partial! {
  #[derive(Bpaf, Deserializable, Deserialize, ...)}
  struct MyConfiguration {
    prop: bool
  }
}

This also implies many code change?

I have to think about this...

NOTE: Feel free to open another PR to propose a derive macro for Deserializable.

@arendjr
Copy link
Contributor Author

arendjr commented Jan 10, 2024

I think the derives of the generated struct can also be specified like this:

#[derive(Partial)]
#[partial(derive(Bpaf, Deserialize, JsonSchema))]
struct Configuration {
    #[partial] // Signals that `FormatterConfiguration` also has a Partial derive.
    formatter: FormatterConfiguration
}

Speaking of Bpaf. It’s currently slightly problematic in the current state of the PR that Bpaf initializes its structs with Default::default() rather than NoneState::none. That would be another issue fixed this way.

I do agree it will take quite a few code changes, but at least they’ll be pointed out by the compiler. I also expect they’ll be mostly simplifications, since code won’t need to deal with Option anymore.

@Conaclos
Copy link
Member

I think the derives of the generated struct can also be specified like this:

#[derive(Partial)]
#[partial(derive(Bpaf, Deserialize, JsonSchema))]
struct Configuration {
    #[partial] // Signals that `FormatterConfiguration` also has a Partial derive.
    formatter: FormatterConfiguration
}

Speaking of Bpaf. It’s currently slightly problematic in the current state of the PR that Bpaf initializes its structs with Default::default() rather than NoneState::none. That would be another issue fixed this way.

I do agree it will take quite a few code changes, but at least they’ll be pointed out by the compiler. I also expect they’ll be mostly simplifications, since code won’t need to deal with Option anymore.

I personally like your idea. Maybe you can make a try in another PR? And keep this one around in case you find some shortcomings / technical issues.

@ematipico
Copy link
Member

@Conaclos please let us know if want us to merge the pr

@Conaclos
Copy link
Member

Conaclos commented Jan 11, 2024

Looks good to me! We can try another approach in a follow-up PR. Let's merge the PR since this fixes some bugs.

@ematipico ematipico merged commit a100bec into biomejs:main Jan 12, 2024
19 of 20 checks passed
@arendjr
Copy link
Contributor Author

arendjr commented Jan 12, 2024

Thanks for merging! I will follow up with the Deserializable derive first, then I'll try the Partial approach.

@arendjr arendjr deleted the fix-config-merging branch January 12, 2024 11:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Changelog Area: changelog A-CLI Area: CLI A-Formatter Area: formatter A-Project Area: project A-Tooling Area: internal tools A-Website Area: website L-JavaScript Language: JavaScript and super languages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants