Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some yaml lines are extremely long #1802

Closed
matkoniecz opened this issue Sep 1, 2015 · 29 comments
Closed

Some yaml lines are extremely long #1802

matkoniecz opened this issue Sep 1, 2015 · 29 comments

Comments

@matkoniecz
Copy link
Contributor

For example:

('railway_' || (CASE WHEN railway = 'preserved' AND service IN ('spur', 'siding', 'yard') THEN 'INT-preserved-ssy'::text WHEN (railway = 'rail' AND service IN ('spur', 'siding', 'yard')) THEN 'INT-spur-siding-yard' WHEN (railway = 'tram' AND service IN ('spur', 'siding', 'yard')) THEN 'tram-service' ELSE railway END)),

is one line. In my typical editing environment 'siding', 'yard')) THEN 'INT-spur-siding-yard' and later part requires scrolling to be visible - what is not making easier to read and edit this SQL.

I think that readability and editability would be improved with linebreaks. I already did it with surface SQL - see bf562c6?diff=unified

The question is - how long line should be? Or should it stay unlimited as it is now?

@HolgerJeromin
Copy link
Contributor

perhaps the correct answer is not line length, but count of strings. Perhaps something like:

  • three strings are ok.
  • four till ten should get an own line.
  • more than ten should...
    • every entry gets an own line (not really good)?
    • group of three to five gets an own line (if could be groups logically)?
    • another suggestion?

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 1, 2015

every entry gets an own line (not really good)?

I would like it for shops and amenities for example, because every time something is changing here, I (or @math1985, who is usually pulling my code) have to make the changes in YAML once again. It's error prone and tedious, because diff is not able to find that a single new 'value', string is inserted or moved (it just sees the whole line has changed).

I guess this could be the source of this regression.

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 1, 2015

group of three to five gets an own line (if could be groups logically)?

This would work only if we're adding new items at the end, but I like to have more or less alphabetical order and when I add special icon for already existing item, it means also removing the same string from another place in the same line.

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 4, 2015

We could maybe use some group variables? This would make this code:

CASE WHEN surface IN ('unpaved', 'compacted', 'dirt', 'earth', 'fine_gravel', 'grass', 
                  'grass_paver', 'gravel', 'ground', 'mud', 'pebblestone', 'salt', 'sand', 'woodchips', 'clay') THEN 'unpaved' ELSE 
                  CASE WHEN surface IN ('paved', 'asphalt', 'cobblestone', 'cobblestone:flattened', 'sett', 'concrete', 'concrete:lanes', 'concrete:plates', 'paving_stones', 'metal', 'wood') THEN 'paved' 
ELSE null END END AS int_surface,

look more clean:

CASE WHEN surface IN (@unpaved) THEN 'unpaved' ELSE 
                  CASE WHEN surface IN (@paved) THEN 'paved' 
ELSE null END END AS int_surface,

Then the variables could be defined using separate lines, so diff would have no problem catching the exact change. I don't know YAML syntax too well, but it should look like:

@unpaved: 
'unpaved', 
'compacted', 
'dirt', 
'earth', 
'fine_gravel', 
'grass',
'grass_paver', 
'gravel', 
'ground', 
'mud', 
'pebblestone', 
'salt', 
'sand', 
'woodchips', 
'clay';

@pnorman
Copy link
Collaborator

pnorman commented Sep 4, 2015

We could maybe use some group variables? This would make this code

There are no such thing in YAML.

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 4, 2015

So maybe another format converted to YAML (or directly to MML)?

Or just a bit of cheating (let's say "YAML+") - the script would have to expand variables in the first step, then convert as usual?

Of course we can also just make line breaks in the proper YAML, but separating data from the code would be more elegant.

@pnorman
Copy link
Collaborator

pnorman commented Sep 4, 2015

So maybe another format converted to YAML (or directly to MML)?

No.

Of course we can also just make line breaks in the proper YAML, but separating data from the code would be more elegant

All of the SQL is logic.

@pnorman
Copy link
Collaborator

pnorman commented Sep 4, 2015

because diff is not able to find that a single new 'value', string is inserted or moved (it just sees the whole line has changed).

Use --word-diff when showing it with git

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 5, 2015

OK, so what about node anchors (&) and references (*)? Could it help us to move the lists out of the SQL statements?

@pnorman
Copy link
Collaborator

pnorman commented Sep 6, 2015

OK, so what about node anchors (&) and references (*)? Could it help us to move the lists out of the SQL statements?

No, because each layer has different text for its SQL.

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 6, 2015

Why it could be a problem? We don't have to really reuse the data (the name can be modified with layer name or number for each case, like &landcover-aeroway), because we still gain nice readability this way.

I was just not able to use it inside the SQL statement - with:

name: *landcover-aeroway

the substitution for *landcover-aeroway is made, but with:

CASE WHEN aeroway IN *landcover-aeroway THEN aeroway ELSE NULL END,

it is just copied as it is.

If we can't solve this, I still prefer just one list item per line inside SQL.

@pnorman
Copy link
Collaborator

pnorman commented Sep 6, 2015

Why it could be a problem?

Because anchors and references would get us the same text for the SQL for multiple layers, and no two layers have the same SQL.

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 6, 2015

They won't, as the anchors would be named like layer_name + variable_name (landcover-aeroway in the example above).

Still the open question is if we can make the substitution happen inside the SQL statement or chop the SQL code into parts (with one of them being the reference to anchor) and join these parts into proper SQL command later (probably converting it to MML).

@pnorman
Copy link
Collaborator

pnorman commented Sep 6, 2015

They won't [be the same text]

If they don't have the same text then anchors and references are no use!

Word-wrapping lists at line-length based points doesn't help diffs, as they need to be rewrapped when editing the middle. Putting each list item on its own line inflates the SQL too much.

Examples like the initial one should probably have more line breaks in them, but this is not connected to long lists, as that example doesn't have any long lists.

@pnorman pnorman closed this as completed Sep 6, 2015
@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 6, 2015

If they don't have the same text then anchors and references are no use!

This is just their intended use (efficient notation for recurring data), we are not bound by that design.

Putting each list item on its own line inflates the SQL too much.

What problem do you see with it? File size is not the concern, I guess, but they are already hard to read in many places.

@HolgerJeromin
Copy link
Contributor

Why was this issue closed?
The main question

how long line should be? Or should it stay unlimited as it is now?

is not solved... Or did i missed something?
The discussion could lead to the answer:
Give every entry a new line for better diffs.

If this is the solution, we could decide if there is a one time conversion of all SQL strings, or if the conversion should be made, if someone is editing one SQL statement.

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 7, 2015

+1

And maybe instead of one YAML file we could have one YAML file per layer then, to make it easier to navigate/read.

@pnorman
Copy link
Collaborator

pnorman commented Sep 8, 2015

This is just their intended use (efficient notation for recurring data), we are not bound by that design.

We are bound by the design of anchors and references. Beyond what I've explained, about all I can do is refer to the YAML specification

And maybe instead of one YAML file we could have one YAML file per layer then, to make it easier to navigate/read.

No, the layers need to be defined in one file, in a layers sequence

how long line should be? Or should it stay unlimited as it is now?

is not solved... Or did i missed something?
The discussion could lead to the answer:

Unlimited, as it is for now. Putting one item per line would probably approximately double the linecount of project.yaml.

Just for reference, a diff adding one key to a layer looks like this for how we format SQL

image
(Screenshot for colour)

Or, with purely color
image

@matthijsmelissen
Copy link
Collaborator

There is no way to merge based on word-diff, right?

I also experienced that the long lines make merging is time-consuming and error-prone. Even doubling the number of lines might be worth it to me.

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 8, 2015

@pnorman

Just for reference, a diff adding one key to a layer looks like this for how we format SQL

The problem is this is not a standard diff behavior, but even if there is a way to configure it to make code merging sane again (wouldn't it be trouble with merging MML files in turn?), the GitHub still shows it in a nasty way (just the whole lines) and that's not the only problem.

No, the layers need to be defined in one file, in a layers sequence

Nothing prevents us from having it this way even with a layers split, we can have both. I think it would be quite easy to name the files like 1-layername.yaml, 2-layername.yaml ... n-layername.yaml etc and in the first step of the script just merge them in this sequence into a (temporary) project.yaml - and then proceed as before.

Putting one item per line would probably approximately double the linecount of project.yaml.

That's the question I was trying to ask you: how bad is having double the linecount (for me it's just not nice, but with the layers split it would be even better than currently) considering the real problems with (a) editing, (b) displaying, (c) merging and (d) debugging?

I have no doubt these problems are big enough to outweigh the not-niceness of having more lines. It is so hard for me, that I wanted to open such issue myself, I was just busy with some other things, so thanks, @matkoniecz, for doing it!

@matthijsmelissen
Copy link
Collaborator

Re-opened because for me the problem is severe enough that it is worth more research.

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 8, 2015

$ wc -l project.yaml 
2325 project.yaml

I guess the split is needed anyway - it's not possible for me to browse through the layers, so I basically use search here.

$ cat project.yaml | grep id: | wc -l
81

So probably special directory (layers? yaml?) would be good and file names could go from 00-header.yaml.

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 8, 2015

I'm not a programmer, so it's probably rather naive implementation, but I've just made working solution to test splitting "project.yaml" file to a set of YAML files - one per layer - in subdirectory "yaml" (script called "split-yaml.py") and merging them back into one file (script called "merge-yaml.py"). For testing purposes merging doesn't overwrite "project.yaml" and instead uses "0project.yaml" file as an output (just remove "0" if you want to actually go back and forth with conversion).

Usage from the main style directory (where the "project.yaml" file is):
./scripts/split-yaml.py
./scripts/merge-yaml.py

The code is in this branch: https://github.com/kocio-pl/openstreetmap-carto/tree/yaml

@pnorman
Copy link
Collaborator

pnorman commented Sep 8, 2015

Needing to preprocess YAML to MML is an annoying enough step, but it has some support in map design programs, and it's a standard file format. Doing our own format isn't an option.

Besides, there's an issue with what you're trying to do. As I've pointed out, layers are a sequence. You can't generate a merged file by simply concatenating the files. You would need to walk the YAML tree for each file and merge them

@gravitystorm
Copy link
Owner

I want to avoid adding any further complexity to editing the layers definitions. It's already complex enough, in that we have to compile the yaml to mml to work with both Tilemill and raw carto, but that was a tradeoff worth doing. (To explain: the SQL queries in .mml couldn't contain any line breaks at all, and .yaml is both a reasonable format and also chosen by mapbox-studio to replace .mml files for this very reason).

I don't want to add more levels of complexity and therefore make it even harder to use this style with standard tools - ideally, none of us would be editing these by hand. The old stylesheets were a mess of ad-hoc included files and I don't want us to make the same mistakes.

There are two problems mentioned here - individual line lengths, and the total length of the file. The second is not a big problem, jumping around the file using search is easier than trying to search across multiple files.

For the individual line lengths, they should be limited to around 160-200 characters, or 5-10 items in a "foo IN ('bar', 'baz', ...)" style list, but it's not worth having hard limit or being too exact. It's also not worth reflowing lists on every change, so I'd suggest adding line breaks as a one-off, and then letting them grow or contract for a while.

In summary, short lines are good, but not one-item-per-line, and don't be too fussy about keeping them the same length.

@HolgerJeromin
Copy link
Contributor

gravitystorm: this seems a very good solution. Thanks.
Perhaps a description of this philosophy could be added on top of the yaml file as a comment.

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 9, 2015

@gravitystorm This sounds like a sane solution given the constrains. For me it means for example we won't have alphabetical order there and we should just add new items to the last available list. Lack of hard length limits also makes removing sane - we don't have to refactor items in the lists every time something is deleted.

@HolgerJeromin +1 - comments/documentation would be good, indeed.

@pnorman OK, I understand that splitting it that way is just a simple tool and we don't have the proper YAML in the resulting files (I guess we should expand all the references to anchors and make references again when merging, which means finding recurring data segments - and so on).

Still I think this tool could be useful for kind of manipulations limited only to some layers: searching doesn't give you the clear idea what layer are you working on and if you're not messing with another one by accident, you also don't have the list of all the layers. Is it possible to add them to repository as a helper scripts? If this would be the case, I would change the splitted filenames from xx-name.yaml to just xxx-name (to not suggest they are still proper YAML, just a temporary, working files, and make 99+ layers possible), replace the merge script with @pnorman simpler shell version for example and document them a bit, but if not, I'm not going to make it more clean just for myself, because it takes a lot of time when you're not experienced with coding.

@gravitystorm
Copy link
Owner

For me it means for example we won't have alphabetical order there and we should just add new items to the last available list.

I think most lists have an order that should be preserved - not always alphabetical. For example, roads are listed starting with motorway, then trunk, etc. But if the list is already alphabetical then you can add new values into the middle of the list, perhaps making it 7 items on a row instead of 6. If we stick to a smallish number of items, diffs are still easy to read even if the whole line is marked as changed in the diff.

Only if the list is completely unsorted should you add to the end.

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 9, 2015

You're right, of course.

Checking what order should be used for given layer will be a minor task, yet worth taking - should we make it before the lines split or after that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants