:" " syntax for non-standard quoted symbols? #9945

Open
alyst opened this issue Jan 28, 2015 · 18 comments
Labels
needs decision (A decision on this change is needed), parser (Language parsing and surface syntax)

@alyst
Contributor

alyst commented Jan 28, 2015

At the moment, to generate non-standard symbols in Julia (e.g. with whitespace), one has to write symbol("a b"). R uses backtick quotes for similar purposes, and that is much more concise. The typical use case is to refer to a dataframe column with non-standard characters in its name (e.g. imported from a CSV/XLS file).
It would be nice to use backticks to quote symbols in Julia as well, but backticks are currently reserved for command interpolation. What if, instead of backticks, commands were specified as sh"<cmd>" or cmd"<cmd>", and backticks were given to symbols (since that is a more generic and frequent use case)?
The use of backticks for commands was inspired by the Unix shell, but it is in fact a little misleading, because backquoting a command in Julia doesn't execute it as it does in the shell. The sh"<cmd>" alternative could be naturally extended to other external scripts, e.g. perl"...", lua"..." etc.

@johnmyleswhite
Member

The reason we made the choice to use symbols instead of strings in DataFrames was to discourage non-standard column names. It would be much better to reverse that decision than to redefine backticks in Base Julia.

@alyst
Contributor Author

alyst commented Jan 28, 2015

I also try to avoid non-standard column names whenever possible, but sometimes it's just much more straightforward to use the original data format. Do you propose to use something like df."top 5%" in the revised syntax? In dplyr one could also write mutate( diff = `top 5%`-`bottom 5%` ); would that also be possible in DataFrames.jl without additional syntactic burden?

@JeffBezanson
Member

There is a lot to be said for using cmd"..." for commands, but in that case I think there are probably several potential uses for backticks that would have higher priority. We could also add sym"a b" for symbols.
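
A minimal sketch of what a sym"..." literal could look like, assuming a current Julia 1.x release (where the constructor is spelled Symbol rather than the symbol used elsewhere in this thread). The macro name sym_str follows Julia's naming convention for string macros; it is a hypothetical definition, not something Base provides.

```julia
# Hypothetical string macro for non-standard symbols; not part of Base.
macro sym_str(s)
    # Wrap the symbol in a QuoteNode so the macro expands to the symbol
    # value itself rather than to a lookup of a variable with that name.
    return QuoteNode(Symbol(s))
end

# A column name that is not a valid identifier.
col = sym"top 5%"

# The literal produces the same value as the explicit constructor call.
same = col === Symbol("top 5%")
```

String macros receive their contents unprocessed, so escapes such as \n and $ interpolation are not handled by this sketch.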

@nalimilan
Member

@alyst I think DataFrames should support column labels for more expressive descriptions which could be used automatically e.g. for plotting. This has been discussed somewhere in DataFrames.jl. Column names would better remain very simple, like top5. Every time I've used names like "top 5%" in R I've come to regret it the second time I had to type the name with backticks.

@alyst
Contributor Author

alyst commented Jan 28, 2015

@nalimilan I am also a strong proponent of the "column names must be valid identifiers" policy, but in certain cases (data is imported and the input format is fixed; column names encode metadata; porting existing R scripts to Julia etc.) it would just be convenient to use non-standard ones instead of fixing the data and the scripts to comply with the policy.
Abstracting from the data frames, the generic question here is -- should Julia support [an easy way of expressing] non-standard identifiers/symbols? Someone might overuse the feature, but ATM Julia gives so much coding freedom that it would be just one minor thing.

@johnmyleswhite
Member

My personal sense is that Julia does support an easy way already: symbol("a b") is exactly how hard it should be to create a non-standard identifier. I think of that syntax as a tax on creating the externalities generated by non-standard identifiers.

My sense about this issue is that we should have part of the discussion in DataFrames.

@tonyhffong

I'm very much in agreement with what @johnmyleswhite said. It's a minor cognitive load, commensurate with the reminder that one cannot use that symbol as if it were a field or a variable.

I like sym"a b" (to @JeffBezanson's point), as it makes show( s::Symbol ) more compact/pleasing for non-standard identifiers.

So,
+1 for sym"a b"
-1 for "backtick a b backtick"

@prcastro
Contributor

If we changed commands to something like cmd"", we would free the backtick for something like infix operators (as Haskell does). But I don't know what the implications of that would be.

@StefanKarpinski
Member

I would propose making :"a b" the syntax for making a symbol from a string literal instead. This is a pretty safe syntax to use since currently it means the same thing as without the colon, so why would you write it?

Using cmd"..." for commands is not very appealing because then you need to screw around with escaping " inside of shell commands. Currently, you can just cut and paste any commands from the shell and single and double quotes work the exact same way. Using a triple-quote form could help with that too, but it's kind of ugly.

If we were going to use backticks for anything else, I would consider expression quoting, since that is how we quote expressions in Markdown and emails anyway.

@alyst
Contributor Author

alyst commented Feb 2, 2015

@StefanKarpinski +1 for :"a b"! I just wonder if, for the sake of generality, backquotes, instead of being bound to Cmd class, could become an alias to triple double quotes and any custom interpolation logic would be handled by Cmdstartme`` literals or inside run(cmd::AbstractString).

@tonyhffong

Hang on. What does :( "a b" ) parse to?

@alyst
Contributor Author

alyst commented Feb 2, 2015

@tonyhffong Recent 0.4-devel gives ASCIIString for typeof(:( "a b" )).

@JeffBezanson
Member

I think we'd have to keep parsing :(" ") as a string, and give a quoted symbol only for :" ".

@JeffBezanson changed the title from "Backtick quotes for symbols" to ":" " syntax for non-standard quoted symbols?" on Feb 2, 2015
@JeffBezanson added the parser (Language parsing and surface syntax), feature, and needs decision (A decision on this change is needed) labels on Feb 2, 2015
@tonyhffong

Conceptually, we have now

:( a ) == :a  # A quote of a variable equals to that variable's symbol.
:"a b" == :( "a b" ) == "a b" # in both 0.3 and 0.4-dev. A quote of a string literal is just that string literal.

which kind of makes sense to me. Maybe it's a hobgoblin of little minds, but I do think it's nice that way.

@StefanKarpinski
Member

It's consistent, but the former is useful while the latter is useless.

@toivoh
Contributor

toivoh commented Mar 7, 2015

Late to the party here, but I really wouldn't feel comfortable with punning the quoting : operator like this. I feel that quoting is enough to wrap your head around already, and I don't see how avoiding the minor inconvenience of typing two more characters with sym"a b" justifies this.

@ScottPJones
Contributor

+1 also for sym"a b"
+1 for cmd"a b" (I'd already thought of that independently... freeing up the backtick would be somewhat breaking, but it would be very nice to have it free for better uses)... wouldn't cmd"""I have "quotes" inside me""" work also?

@tkelman tkelman removed this from the 0.6.0 milestone Jan 5, 2017
@JeffBezanson JeffBezanson modified the milestones: 2.0+, 1.0 May 2, 2017
@c42f
Member

c42f commented Aug 17, 2019

While implementing #32408 I noticed the notation :"foo$x" already means something useful and isn't redundant with "foo$x" (in contrast to :"foo" vs "foo").

If we made :"foo" notation for Symbol("foo"), we'd presumably also want :"foo$x" to be Symbol("foo$x"). It's a useful pun on the quote operator, but a pun nonetheless.
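
The difference between the two forms can be inspected directly. This is a small sketch assuming a current Julia 1.x release; it only shows what the parser produces today and the existing Symbol("foo$x") route, not the proposed behaviour. The variable names are chosen for illustration.

```julia
# Quoting a plain string literal gives back the string itself.
plain = :"foo"

# Quoting an interpolated string gives an unevaluated expression;
# x does not need to be defined at this point.
interp = :"foo$x"

plain_is_string = plain isa String
interp_is_string_expr = interp isa Expr && interp.head === :string
interp_args = interp.args   # the literal part and the symbol x

# The existing way to build a symbol from an interpolated string.
x = 1
s = Symbol("foo$x")
```

Under the proposal, :"foo$x" would presumably evaluate to the same value as s here instead of producing the expression held in interp.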
