Default nugget has potential clashes with json #69

Open
raulvejar opened this issue May 13, 2013 · 17 comments

@raulvejar
Contributor

The default nuggets "[[[" and "]]]" are a somewhat common combination when building a JSON structure that has arrays within arrays.
Since most people will not change the default, we should try to pick one that is fairly unique among the domains we are targeting (C#, Razor, HTML, JSON, CSS, XML and all other web-related techs for .NET-based websites).

I've opened this issue so we can track suggestions as well

I've been experimenting with '[||[' and ']||]', and they seem to work pretty well. '||' is generally used as a binary "or" operator, so it is not valid directly next to a bracket. That makes it very unlikely we'll run into a case where this sequence has a valid meaning in one of the targeted domains.

@turquoiseowl
Owner

If going for [||[message]||], any particular reason why not [|[message]|] ?

I haven't used JSON much myself and would just be interested to see a JSON structure that has [[[ or ]]] if you have one to hand.


@raulvejar
Contributor Author

You asked for it. This is a service response that returns a geo shape for Pennsylvania (I'm trimming the middle, otherwise it's too big for GitHub). Notice the coordinates array.

{"type":"FeatureCollection","features":[{"type":"Feature","geometry":{"type":"MultiPolygon","coordinates":[[[[-80.293756,39.721187],[-80.290936,39.721172],[-80.290809,39.721171],[-80.288876,39.721161],[-80.287441,39.721153],[-80.283903,39.721135],[-80.283508,39.721133],[-80.281164,39.721119],
....
,[-80.336281,39.721343],[-80.335811,39.721342],[-80.335759,39.721342],[-80.335672,39.721342],[-80.335567,39.721342],[-80.335482,39.721342],[-80.335469,39.721342],[-80.335432,39.721341],[-80.335391,39.721341],[-80.334352,39.72134],[-80.334314,39.72134],[-80.332044,39.72132],[-80.3319,39.721319],[-80.329893,39.721302],[-80.328396,39.721289],[-80.328353,39.721288],[-80.328334,39.721288],[-80.328262,39.721288],[-80.309478,39.721273],[-80.309228,39.721276],[-80.308651,39.721283],[-80.30768,39.721282],[-80.306398,39.721267],[-80.306066,39.721261],[-80.306019,39.72126],[-80.305899,39.721258],[-80.305878,39.721258],[-80.305808,39.721257],[-80.293889,39.721188],[-80.293756,39.721187]]]]},"properties":{"Name":"Pennsylvania"}}]}

@turquoiseowl
Owner

Hmm. Boiling that down I'm getting a structure like this:

{
    "type":"MultiPolygon",
    "coordinates":
    [[[
        [-80.293756,39.721187],
        ...
        [-80.293756,39.721187]
    ]]]
}

which can be further simplified to this:

{
    "coordinates":
    [[[[-80.293756,39.721187]]]]
}

That seems a slightly odd structure, but I suppose it is perfectly valid?

There is probably an issue with C-based languages as well with respect to the ending marker:

    rg1[rg2[rg3[0]]]

although I don't think that would be an issue with the present regex used to scan for nuggets as it is making 'ungreedy' matches going from [[[ forwards.
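To illustrate the point about ungreedy matching, here is a minimal Python sketch. The pattern is an assumption made up for this example, not the regex the library actually uses; `first_nugget` is a hypothetical helper.

```python
import re

# Assumed pattern for illustration only: a lazy (ungreedy) match
# that starts at "[[[" and stops at the first following "]]]".
NUGGET = re.compile(r"\[\[\[(.+?)\]\]\]", re.DOTALL)

c_style = "rg1[rg2[rg3[0]]]"
geo_json = '{"coordinates":[[[[-80.293756,39.721187]]]]}'


def first_nugget(text):
    """Return the body of the first nugget-like match, or None."""
    match = NUGGET.search(text)
    return match.group(1) if match else None


# The C-style indexing ends in "]]]" but never contains "[[[",
# so a scan that starts from the opening marker finds nothing.
print(first_nugget(c_style))   # None

# The nested JSON array contains both markers, so it is picked up.
print(first_nugget(geo_json))  # [-80.293756,39.721187
```

Under these assumptions the closing marker on its own is harmless, while nested JSON arrays do get matched.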

@raulvejar
Contributor Author

Because I had to take out a lot of the text to make it fit, you can't tell, but the structure is an array of arrays of arrays of coordinates. There is really no way to simplify it; if you saw the full content, it would be something like
[[[[1,2],[2,3]...[4,5]],[[1,2]],...[[1,2]]],[[[1,2]]]]
I thought it was an unlikely case, but I hadn't realized how much more common it is when dealing with geo shapes, so for my purposes it is a fairly common use case.

@turquoiseowl
Owner

Even though it is hard to type in on most keyboards and without a keyboard macro, I still like «««message»»».

@raulvejar
Contributor Author

Yeah, not being on most keyboards is a major drawback. I can feel the hate mail/GitHub issues flowing already.

@turquoiseowl
Owner

No doubt, but at the same time (playing devil's advocate) we're only talking about a default and the haters are free to override. I've got a VS extension that enters the chevrons on pressing a keyboard shortcut, which could also be wired up to a button next to the PostBuild button.

So:

[[[message]]] is easy to enter and looks neat, but will always clash with something.
[|[message]|] or [||[message]||] is easyish to enter but fiddly, but unlikely to clash.
«««message»»» is hard to enter without extension, but looks neat and is unlikely to clash.

@raulvejar
Contributor Author

Haters are free to override, but the default should work well and be easy to use out of the box for the majority of people.

Regarding the 'neat looking' factor... I don't know Martin, I guess it doesn't have a lot of value to me, especially compared with 'easiness to enter'. I can hear the screams of my designers already if I tell them they'll have to encode every single visible string of HTML with characters that are not on the keyboard...
As long as it's obvious from looking at the string that it has been marked for internationalization, it is fulfilling the purpose, and I feel all the options do a good job of that.

The plugin idea... not enough, I think. I do feel eventually we'll have to add a plugin that gives warnings on strings that have not been internationalized and potentially suggests applying the nuggets automatically, but a button is not a good solution. The shortcut might be a good idea regardless, but VS has its own way of defining shortcuts, and I've seen plugins (cough RESHARPER) that try to define new ones only to have them clash and make the user go through a conflict resolution process every time the shortcut is used, which sucks.

@turquoiseowl
Owner

Okay, I tend to agree on that. In the hope of eliciting more views:

  1. [||[message %0|||{0}///comment]||]
  2. [|[message %0|||{0}///comment]|]
  3. [#[message %0|||{0}///comment]#]

...
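For comparing the candidates above, a small Python sketch of how any begin/end pair could be scanned with the same code. It reads `|||` as a parameter separator and `///` as a comment separator, which is inferred from the examples in the list; `Nugget`, `make_nugget_pattern`, `parse_nugget_body` and `find_nuggets` are hypothetical names, not the library's API.

```python
import re
from typing import List, NamedTuple, Optional


class Nugget(NamedTuple):
    message: str
    params: List[str]
    comment: Optional[str]


def make_nugget_pattern(begin, end):
    """Build a lazy pattern for an arbitrary pair of delimiter tokens."""
    return re.compile(re.escape(begin) + r"(.+?)" + re.escape(end), re.DOTALL)


def parse_nugget_body(body, param_sep="|||", comment_sep="///"):
    """Split 'message|||param///comment' into its parts (assumed layout)."""
    text, _, comment = body.partition(comment_sep)
    parts = text.split(param_sep)
    return Nugget(parts[0], parts[1:], comment or None)


def find_nuggets(text, begin, end):
    """Return every nugget found in text for the given delimiters."""
    pattern = make_nugget_pattern(begin, end)
    return [parse_nugget_body(m.group(1)) for m in pattern.finditer(text)]


# The three candidate delimiter pairs from the list above.
candidates = [("[||[", "]||]"), ("[|[", "]|]"), ("[#[", "]#]")]
for begin, end in candidates:
    sample = begin + "message %0|||{0}///comment" + end
    print(find_nuggets(sample, begin, end))
```

Escaping the tokens with `re.escape` is what lets the delimiters stay configurable without hand-writing a pattern for each one.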

@rubydagr

So a general question is how this plays with user entered text. If I render a user generated field they could potentially match by chance any string that we pick.

A reasonable assumption is that user generated content will be escaped. Is there any reason to avoid something like "<<<" and ">>>" or "<<|" and "|>>"? That way it wouldn't be content that would be expected to be found in an HTML document.
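A quick Python sketch of why escaped user content could not collide with angle-bracket delimiters. It uses the standard library's `html.escape` as a stand-in for whatever escaping the site applies, which is an assumption; `render_user_text` is a hypothetical helper.

```python
import html


def render_user_text(user_text):
    """Escape user-supplied text the way an HTML view typically would."""
    return html.escape(user_text)


# A user types something that looks exactly like a nugget.
typed = "<<|License|>>"
rendered = render_user_text(typed)

print(rendered)           # &lt;&lt;|License|&gt;&gt;
print("<<|" in rendered)  # False
```

Once the angle brackets are turned into entities, the delimiter sequence no longer appears in the output, so only markers written by the developer would be scanned.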

@turquoiseowl
Owner

Yes, I see your insight there. But I can think of at least one problem which is entering nuggets into an HTML editor, specifically in my case the Visual Studio razor/cshtml editor. E.g.

          <th>
            [[[License]]]
          </th>

Another one might be that nuggets could break the HTML if/when they happen to slip through un-parsed, e.g. when testing.

My original reason for going with the [ and ] chars was to mimic the XML CDATA element in a way.

But you are correct that it will potentially clash with user generated input. Can we say, however, that < and > are the only chars we can assume will be escaped? I presume so, as & and apostrophes don't seem to strictly require escaping.

I think in recognition of the user-input clash we have had in mind a sequence that is very unlikely to clash. The downside of a clash is only manifest if the whole sequence matches a message for which there is a translation. If there is no translation, then the nugget is left in the output untouched and so no problem.
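The fallback described here can be sketched in Python as follows. This models only the behaviour stated above (an unmatched message is left untouched); the pattern, the `translate` function and the dictionary lookup are assumptions for illustration, not the library's code.

```python
import re

# Assumed pattern for illustration: lazy match between [[[ and ]]].
NUGGET = re.compile(r"\[\[\[(.+?)\]\]\]", re.DOTALL)


def translate(text, translations):
    """Replace nuggets that have a translation; leave all others as-is."""
    def replace(match):
        message = match.group(1)
        # No translation: return the whole match, delimiters included.
        return translations.get(message, match.group(0))
    return NUGGET.sub(replace, text)


german = {"License": "Lizenz"}

html_in = "<th>[[[License]]]</th>"
json_in = '{"coordinates":[[[[-80.293756,39.721187]]]]}'

print(translate(html_in, german))  # <th>Lizenz</th>
print(translate(json_in, german) == json_in)  # True
```

The JSON array is matched by the pattern, but since its "message" has no translation, the output is byte-for-byte the same as the input.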

@turquoiseowl
Owner

May I add to the previous list:

  1. [[message]]

The thinking here is that the ` char (top-left of most keyboards by the looks of it) is easy to type, not requiring SHIFT.

In that way it is preferable to |, which does require SHIFT on some keyboards, such as UK ones. It is also preferable to #, which requires SHIFT on a US keyboard (but not UK ...).

@turquoiseowl
Owner

LOL, that idea just tripped over markdown. I thought markdown used three of those ` chars?!

Second attempt:

  1. [`[message]`]

@turquoiseowl
Owner

@rubydagr I see that <<|message|>> in fact works okay with the VS2012 editor so scrub that objection in that instance.

@rubydagr

Just to come back, we've deployed in production with . It doesn't match any text users could possibly enter, the mark-up remains valid during development, and as a bonus, strings to be translated are color coded.

@turquoiseowl
Owner

That's very helpful, thank you. And I'm pleased to hear the library is working for you.

@Sshnyari
Contributor

Instead of searching for a universal nugget syntax, why not just add the possibility to override nugget tokens per content type?
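One way to picture a per-content-type override, as a rough Python sketch. Everything here is hypothetical: the mapping, the choice of tokens for JSON, and the `nugget_pattern_for` helper are invented for illustration and do not describe an existing setting.

```python
import re

# Hypothetical configuration: default tokens plus per-content-type overrides.
DEFAULT_TOKENS = ("[[[", "]]]")
TOKENS_BY_CONTENT_TYPE = {
    "application/json": ("[||[", "]||]"),
}


def nugget_pattern_for(content_type):
    """Build the nugget pattern that applies to a given content type."""
    begin, end = TOKENS_BY_CONTENT_TYPE.get(content_type, DEFAULT_TOKENS)
    return re.compile(re.escape(begin) + r"(.+?)" + re.escape(end), re.DOTALL)


html_body = "<th>[[[License]]]</th>"
json_body = '{"label":"[||[License]||]","coordinates":[[[[1,2]]]]}'

# HTML keeps the default tokens.
print(nugget_pattern_for("text/html").findall(html_body))         # ['License']
# JSON uses its own tokens, so nested arrays are no longer matched.
print(nugget_pattern_for("application/json").findall(json_body))  # ['License']
```

With a lookup like this, the easy-to-type default could stay in place for markup while JSON responses use a sequence that does not occur in nested arrays.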
