Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve formatting for chained boolean operators with complex expressions #189

Closed
tconbeer opened this issue Jun 2, 2022 · 10 comments
Closed
Labels
enhancement New feature or request
Milestone

Comments

@tconbeer
Copy link
Owner

tconbeer commented Jun 2, 2022

Describe the bug
Reported by @Rainymood on discourse.

We're not very smart about building a hierarchy of chained boolean expressions; mixing simple expressions with complex expressions can produce unexpected and hard-to-read formatting.

To Reproduce

select *
from historic_orders h
where 1 = 1
    and order_type = 'clothing'
    and category = "pants"
    and dt_utc > date('2019-10-01')
    and is_test is false

Expected behavior

select *
from historic_orders h
where
    1 = 1
    and order_type = 'clothing'
    and category = "pants"
    and dt_utc > date('2019-10-01')
    and is_test is false

Actual behavior

select *
from historic_orders h
where
    1 = 1 and order_type = 'clothing' and category = "pants" and dt_utc > date(
        '2020-10-01'
    ) and is_test is false

Additional context
What is the output of sqlfmt --version?
0.8.0

@tconbeer tconbeer added the enhancement New feature or request label Jun 2, 2022
@tconbeer tconbeer added this to the v0.10.0 milestone Jun 2, 2022
@Rainymood
Copy link

How would you sketch a solution for this bug?

I've been browsing through the code a bit, getting set up with a dev environment, running all the tests, etc. I'm diving into the code/data now but I'm having a tough time seeing where I would have to change what to make this fix.

This bug is the one big thing that is stopping our team from using sqlfmt everywhere so I'm eager to get this fixed :)

@tconbeer
Copy link
Owner Author

tconbeer commented Jun 7, 2022

That's great to hear!

I'm not quite sure what a solution will look like just yet. The changes will be in merger.py, specifically in _maybe_merge_segment() and _maybe_merge_operators()

I think _maybe_merge_operators() needs some pretty major changes to scale to more edge cases.

  1. I think ideally we would establish a hierarchy of sorts from the operators (we already sort of do this with "low priority" operators but that's much too simple).
  2. The other change is to that this method probably should be "all or nothing" for operators at a certain depth, so we don't create specifically the situation you encountered, where we merge the first 4 lines but not the last 1

Finally, right now, _maybe_merge_operators() runs first, and then _maybe_merge_segment() runs on the output, but I'm not convinced that is the right order of operations or overall approach.

Sorry I can't be more helpful here. merger.py is the hardest/most complicated logic of this whole app, and it's hard to intuit solutions. I'd really recommend test-driven development here, so you can try some things and see what works (and doesn't cause regressions).

@tconbeer
Copy link
Owner Author

tconbeer commented Jun 7, 2022

Thinking about this a little more.

If we change your query to:

select *
from historic_orders h
where 1 = 1
    and order_type = 'clothing'
    and category = "pants"
    and is_test is false
    and dt_utc > date('2020-10-01')  -- moved this to the end

sqlfmt does what you want.

The easiest/least destructive way to change this behavior is by changing _line_continues_operator_sequence() and the code that calls it to allow for blocks that are indented from the initial run of operators. After splitting, your code becomes

select
    *
from
    historic_orders h
where 
    1 
    = 1
    and order_type 
    = 'clothing'
    and category 
    = "pants"
    and dt_utc 
    > date(
        '2019-10-01'
    )
    and is_test 
    is false

We get the "bad" formatting because we merge the lines up to and including and dt_utc > date(, and don't consider the fact that and testschool... is in the same run of operators that should be merged together.

@tconbeer
Copy link
Owner Author

@Rainymood did you want to try to contribute to this? Otherwise I might be able to pick it up next week

@Rainymood
Copy link

No no go ahead! Would love to see this fixed :)

Thanks for fixing the other issue so fast btw! Awesome!

@tconbeer
Copy link
Owner Author

@Rainymood This is merged to main now, but 0.10.0 may not be released for a while yet. If you want, you can install this patch with:

python -m pip install "shandy-sqlfmt[jinjafmt] @ git+https://github.com/tconbeer/sqlfmt.git@6efbf29510fc93b3d0d1bef0d8d6b2a310b9690a"

If you get that working, I'd love your feedback on the new style!

@Rainymood
Copy link

Rainymood commented Jul 25, 2022

Yes! It works perfectly for me.

I'm wondering though: is there a toggle to turn this behavior on/off or is this behavior now enabled by default? For the queries that I tested so far it looks really promising but I'll try to find more edge cases :)


P.S. Would it be possible to change the examples in the original post to this? (I'll delete this message after you edited the OP). I tried to message you privately on twitter but that seems disabled. Let me know!

To Reproduce

select *
from historic_orders h
where
    1 = 1
    and order_type = 'clothing'
    and category = "pants"
    and dt_utc > date('2019-10-01')

Expected behavior

select *
from historic_orders h
where
    1 = 1
    and order_type = 'clothing'
    and category = "pants"
    and dt_utc > date('2019-10-01')

Actual behavior

select *
from historic_orders h
where
    1 = 1 and order_type = 'clothing' and category = "pants" and dt_utc > date(
        '2020-10-01'
    )

@tconbeer
Copy link
Owner Author

This is the default and only style from now on, although I opened a couple of issues for small tweaks I'd like to make to build on this style and improve it.

Code above updated.

@Rainymood
Copy link

There's also another reference in the comments around halfway this thread :).

The changes seem great already, this change will be huge in support of my team adopting this :).

@tconbeer
Copy link
Owner Author

Got it this time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants