-
Notifications
You must be signed in to change notification settings - Fork 826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some yaml lines are extremely long #1802
Comments
perhaps the correct answer is not line length, but count of strings. Perhaps something like:
|
I would like it for shops and amenities for example, because every time something is changing here, I (or @math1985, who is usually pulling my code) have to make the changes in YAML once again. It's error prone and tedious, because diff is not able to find that a single new I guess this could be the source of this regression. |
This would work only if we're adding new items at the end, but I like to have more or less alphabetical order and when I add special icon for already existing item, it means also removing the same string from another place in the same line. |
We could maybe use some group variables? This would make this code: CASE WHEN surface IN ('unpaved', 'compacted', 'dirt', 'earth', 'fine_gravel', 'grass',
'grass_paver', 'gravel', 'ground', 'mud', 'pebblestone', 'salt', 'sand', 'woodchips', 'clay') THEN 'unpaved' ELSE
CASE WHEN surface IN ('paved', 'asphalt', 'cobblestone', 'cobblestone:flattened', 'sett', 'concrete', 'concrete:lanes', 'concrete:plates', 'paving_stones', 'metal', 'wood') THEN 'paved'
ELSE null END END AS int_surface, look more clean: CASE WHEN surface IN (@unpaved) THEN 'unpaved' ELSE
CASE WHEN surface IN (@paved) THEN 'paved'
ELSE null END END AS int_surface, Then the variables could be defined using separate lines, so diff would have no problem catching the exact change. I don't know YAML syntax too well, but it should look like: @unpaved:
'unpaved',
'compacted',
'dirt',
'earth',
'fine_gravel',
'grass',
'grass_paver',
'gravel',
'ground',
'mud',
'pebblestone',
'salt',
'sand',
'woodchips',
'clay'; |
There are no such thing in YAML. |
So maybe another format converted to YAML (or directly to MML)? Or just a bit of cheating (let's say "YAML+") - the script would have to expand variables in the first step, then convert as usual? Of course we can also just make line breaks in the proper YAML, but separating data from the code would be more elegant. |
No.
All of the SQL is logic. |
Use |
OK, so what about node anchors (&) and references (*)? Could it help us to move the lists out of the SQL statements? |
No, because each layer has different text for its SQL. |
Why it could be a problem? We don't have to really reuse the data (the name can be modified with layer name or number for each case, like &landcover-aeroway), because we still gain nice readability this way. I was just not able to use it inside the SQL statement - with: name: *landcover-aeroway the substitution for *landcover-aeroway is made, but with: CASE WHEN aeroway IN *landcover-aeroway THEN aeroway ELSE NULL END, it is just copied as it is. If we can't solve this, I still prefer just one list item per line inside SQL. |
Because anchors and references would get us the same text for the SQL for multiple layers, and no two layers have the same SQL. |
They won't, as the anchors would be named like Still the open question is if we can make the substitution happen inside the SQL statement or chop the SQL code into parts (with one of them being the reference to anchor) and join these parts into proper SQL command later (probably converting it to MML). |
If they don't have the same text then anchors and references are no use! Word-wrapping lists at line-length based points doesn't help diffs, as they need to be rewrapped when editing the middle. Putting each list item on its own line inflates the SQL too much. Examples like the initial one should probably have more line breaks in them, but this is not connected to long lists, as that example doesn't have any long lists. |
This is just their intended use (efficient notation for recurring data), we are not bound by that design.
What problem do you see with it? File size is not the concern, I guess, but they are already hard to read in many places. |
Why was this issue closed?
is not solved... Or did i missed something? If this is the solution, we could decide if there is a one time conversion of all SQL strings, or if the conversion should be made, if someone is editing one SQL statement. |
+1 And maybe instead of one YAML file we could have one YAML file per layer then, to make it easier to navigate/read. |
We are bound by the design of anchors and references. Beyond what I've explained, about all I can do is refer to the YAML specification
No, the layers need to be defined in one file, in a layers sequence
Unlimited, as it is for now. Putting one item per line would probably approximately double the linecount of project.yaml. Just for reference, a diff adding one key to a layer looks like this for how we format SQL |
There is no way to merge based on word-diff, right? I also experienced that the long lines make merging is time-consuming and error-prone. Even doubling the number of lines might be worth it to me. |
The problem is this is not a standard diff behavior, but even if there is a way to configure it to make code merging sane again (wouldn't it be trouble with merging MML files in turn?), the GitHub still shows it in a nasty way (just the whole lines) and that's not the only problem.
Nothing prevents us from having it this way even with a layers split, we can have both. I think it would be quite easy to name the files like 1-layername.yaml, 2-layername.yaml ... n-layername.yaml etc and in the first step of the script just merge them in this sequence into a (temporary) project.yaml - and then proceed as before.
That's the question I was trying to ask you: how bad is having double the linecount (for me it's just not nice, but with the layers split it would be even better than currently) considering the real problems with (a) editing, (b) displaying, (c) merging and (d) debugging? I have no doubt these problems are big enough to outweigh the not-niceness of having more lines. It is so hard for me, that I wanted to open such issue myself, I was just busy with some other things, so thanks, @matkoniecz, for doing it! |
Re-opened because for me the problem is severe enough that it is worth more research. |
$ wc -l project.yaml
2325 project.yaml I guess the split is needed anyway - it's not possible for me to browse through the layers, so I basically use search here. $ cat project.yaml | grep id: | wc -l
81 So probably special directory (layers? yaml?) would be good and file names could go from 00-header.yaml. |
I'm not a programmer, so it's probably rather naive implementation, but I've just made working solution to test splitting "project.yaml" file to a set of YAML files - one per layer - in subdirectory "yaml" (script called "split-yaml.py") and merging them back into one file (script called "merge-yaml.py"). For testing purposes merging doesn't overwrite "project.yaml" and instead uses "0project.yaml" file as an output (just remove "0" if you want to actually go back and forth with conversion). Usage from the main style directory (where the "project.yaml" file is): The code is in this branch: https://github.com/kocio-pl/openstreetmap-carto/tree/yaml |
Needing to preprocess YAML to MML is an annoying enough step, but it has some support in map design programs, and it's a standard file format. Doing our own format isn't an option. Besides, there's an issue with what you're trying to do. As I've pointed out, layers are a sequence. You can't generate a merged file by simply concatenating the files. You would need to walk the YAML tree for each file and merge them |
I want to avoid adding any further complexity to editing the layers definitions. It's already complex enough, in that we have to compile the yaml to mml to work with both Tilemill and raw carto, but that was a tradeoff worth doing. (To explain: the SQL queries in .mml couldn't contain any line breaks at all, and .yaml is both a reasonable format and also chosen by mapbox-studio to replace .mml files for this very reason). I don't want to add more levels of complexity and therefore make it even harder to use this style with standard tools - ideally, none of us would be editing these by hand. The old stylesheets were a mess of ad-hoc included files and I don't want us to make the same mistakes. There are two problems mentioned here - individual line lengths, and the total length of the file. The second is not a big problem, jumping around the file using search is easier than trying to search across multiple files. For the individual line lengths, they should be limited to around 160-200 characters, or 5-10 items in a "foo IN ('bar', 'baz', ...)" style list, but it's not worth having hard limit or being too exact. It's also not worth reflowing lists on every change, so I'd suggest adding line breaks as a one-off, and then letting them grow or contract for a while. In summary, short lines are good, but not one-item-per-line, and don't be too fussy about keeping them the same length. |
gravitystorm: this seems a very good solution. Thanks. |
@gravitystorm This sounds like a sane solution given the constrains. For me it means for example we won't have alphabetical order there and we should just add new items to the last available list. Lack of hard length limits also makes removing sane - we don't have to refactor items in the lists every time something is deleted. @HolgerJeromin +1 - comments/documentation would be good, indeed. @pnorman OK, I understand that splitting it that way is just a simple tool and we don't have the proper YAML in the resulting files (I guess we should expand all the references to anchors and make references again when merging, which means finding recurring data segments - and so on). Still I think this tool could be useful for kind of manipulations limited only to some layers: searching doesn't give you the clear idea what layer are you working on and if you're not messing with another one by accident, you also don't have the list of all the layers. Is it possible to add them to repository as a helper scripts? If this would be the case, I would change the splitted filenames from xx-name.yaml to just xxx-name (to not suggest they are still proper YAML, just a temporary, working files, and make 99+ layers possible), replace the merge script with @pnorman simpler shell version for example and document them a bit, but if not, I'm not going to make it more clean just for myself, because it takes a lot of time when you're not experienced with coding. |
I think most lists have an order that should be preserved - not always alphabetical. For example, roads are listed starting with motorway, then trunk, etc. But if the list is already alphabetical then you can add new values into the middle of the list, perhaps making it 7 items on a row instead of 6. If we stick to a smallish number of items, diffs are still easy to read even if the whole line is marked as changed in the diff. Only if the list is completely unsorted should you add to the end. |
You're right, of course. Checking what order should be used for given layer will be a minor task, yet worth taking - should we make it before the lines split or after that? |
For example:
('railway_' || (CASE WHEN railway = 'preserved' AND service IN ('spur', 'siding', 'yard') THEN 'INT-preserved-ssy'::text WHEN (railway = 'rail' AND service IN ('spur', 'siding', 'yard')) THEN 'INT-spur-siding-yard' WHEN (railway = 'tram' AND service IN ('spur', 'siding', 'yard')) THEN 'tram-service' ELSE railway END)),
is one line. In my typical editing environment
'siding', 'yard')) THEN 'INT-spur-siding-yard'
and later part requires scrolling to be visible - what is not making easier to read and edit this SQL.I think that readability and editability would be improved with linebreaks. I already did it with surface SQL - see bf562c6?diff=unified
The question is - how long line should be? Or should it stay unlimited as it is now?
The text was updated successfully, but these errors were encountered: