[RFC] Scopes for module/namespace access #1842
I completely agree with point 1. |
For method or static function calls on objects/classes, do we treat those as namespaces as well? There is no way to differentiate those from modules in most languages. |
I did not directly think of static class member access, but this is a very good question though. Basically a Furthermore everything most languages call I personally like the idea of using In general it might be a question of the languages concept of whether we use Examples:
I just tend to suggest to limit scope names to The question keeps open what to use, if no decision can be made. Scope names would look like:
|
Another question: when importing a module that is sourced from a file with the same name (and not explicitly specified), would the import statement be a usage or a definition of the namespace? |
I think this depends on the perspective. Importing a module means declaring it in order to be able to use it and its members. It can be compared to declaring/defining a variable or a function. From the global point of view, an import is a usage of an existing module. I tend to prefer the local perspective. See Python:
import os # os <- entity.name.namespace
# os <- variable.other.namespace
os.getcwd()
The usage of The interesting question here is - how about
from os import getcwd # <- os = usage or definition?
getcwd()
or
from os import path # <- os = usage, path = definition?
path.join() |
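To make the "local perspective" concrete, here is a minimal Python sketch that lists which names each import form actually binds in the importing namespace. The helper `names_bound_by` is a hypothetical name used only for this illustration; it says nothing about how a syntax definition should work, only about what the language itself treats as a newly declared name.

```python
def names_bound_by(statement):
    """Run one import statement in a fresh namespace and list the names it binds."""
    namespace = {}
    exec(statement, namespace)
    # exec() adds a __builtins__ entry on its own, so leave that one out.
    return sorted(name for name in namespace if name != "__builtins__")


print(names_bound_by("import os"))              # ['os']
print(names_bound_by("from os import getcwd"))  # ['getcwd']
print(names_bound_by("from os import path"))    # ['path']
print(names_bound_by("import os.path"))         # ['os']
```

Under this view, `os` in `from os import path` binds nothing locally, which supports reading it as a usage, while `path` is the name being declared.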
The topic of scoping qualifiers is not well addressed currently, so I'd like to come up with something to move forward. I currently run into this issue in lots of places (like #737, where I was just working). I think part of the issue is that sometimes we know looking at the code that a qualifier is a class name, or a namespace, etc. However, it isn't always clear when highlighting which we are currently dealing with, and for some it is impossible. Because of this, and the fact that we have many different syntaxes with different nuances, I think we need to come up with a somewhat generic scope that can be applied to the identifier qualifiers in a "path". This could be for a function call, a type name, an inherited class name, an XML namespace. Previously I think we had thrown around the idea of a new top-level scope, such as
Currently I would imagine that most users wouldn't want these too heavily colored. Additionally, a series of Thoughts? |
I think the most obvious alternative to
This keeps most such identifiers in syntaxes under |
It sounds like we're talking about the following four types of constructs:
Here, the whole construct should get a meta scope,
This deserves more examples, because import syntax varies greatly between languages. I think that we can come up with an answer generic enough for general use. Taking the broad view, import os
^^ FOO entity.name.something
from os import getcwd
^^ FOO
^^^^^^ entity.name.something
My biggest concern is with (3). In many languages, namespaces or modules are first-class values and a name representing a namespace is used in the same way as any other name. For instance, in Python, In such languages, the only way to try to highlight references to a namespace is to guess based on the name of the variable. This is a bad idea, because the guess would often be wrong, and users get annoyed when some identifiers are colored differently seemingly at random. In some languages, namespace references are used with special syntax. For instance, in C++, a namespace reference may be followed by the scope resolution operator We should keep in mind that we can't scope these things reliably. In
Most of the time in most languages, a dotted path like We can try to scope the entire path, but to be honest I've never liked doing this. It's fundamentally unreliable and annoying to implement, and even in real-world use a typical file would likely see a lot of misses. Plus, I don't really see the motivation: why have a special meta scope for Other thoughts:
|
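The point about namespaces being first-class values can be shown with a short Python sketch. The names `backend`, `helpers`, and `describe` are made up for this example; the only claim is that a module can be bound to any identifier, so nothing in the token stream marks it as a namespace.

```python
import os
import types

# A module is an ordinary object, so any name can end up referring to one.
backend = os                    # plain assignment, no import syntax involved
helpers = {"filesystem": os}    # modules can also live inside data structures


def describe(value):
    """Report whether a runtime value is a module."""
    return "module" if isinstance(value, types.ModuleType) else "other"


print(describe(backend))                # module
print(describe(helpers["filesystem"]))  # module
print(describe(os.path.join))           # other

# Lexically, `backend.getcwd()` and `os.getcwd()` are the same shape:
# identifier, accessor, identifier, call. Only runtime information
# reveals that both qualifiers are modules.
print(backend.getcwd() == os.getcwd())
```

A syntax definition sees only the text, so any attempt to scope `backend` as a namespace would have to rely on guessing from the name.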
The initial post doesn't say anything about metas. That said, the question is just: Which scope to use for the declaration of a namespace vs. the usage of it? The scoping guideline says: With regards to
This issue was not raised to propose adding much guesswork to syntaxes. It just tries to find answers for situations and languages that allow namespaces to be identified, because different scopes for the same things were found in existing syntax definitions. If something can't be identified reliably, a scope that is as general as possible should be applied. |
This topic is huge. I apologize for a wall of text, but this touches a very fundamental concept of how definitions and references are applied (so it very much relates to #1861). It seems the question is three-way.
Usually it cannot be decided for 3. whether a namespace, a class, some random object or whatever is being accessed, so I'll hold off on that for now. For 1., I believe For 2., I second @deathaxe's opinion in that an import that assigns something to an identifier becomes a declaration and should then be scoped as However, not all imports also assign to a specific identifier. Some imports work on a literal basis and behave as if the referenced file was imported verbatim into the current file (barring preprocessor checks to prevent re-importing), e.g. Now for the important part: How do we scope qualifiers and usages of variables or identifiers in general? First, I believe we should clarify on terminology.
Unless I am mistaken, we can use these concepts to represent most, if not all, currently used patterns in programming languages. Going forward, I conclude that we have to answer the following questions:
I'll proceed with answering these questions myself, but I'm interested in your opinions.
Currently, the Talking about declarations or definitions of identifiers, i.e. where they are usually used without the presence of qualifiers, is handled by The second part of the question considers identifiers that are not the last segment of a path and where we usually don't know much. Encountering them at any place in a language like Python could mean anything from a namespace (module/class?), a type (class), a function or any other object data reference. However, we may be able to guess at what the identifier references based on naming conventions for constants or (built-in) types. To conclude, yes, we should scope each identifier to the best of our abilities. If we cannot tell what an identifier represents (at its position in a qualifier), choose a generic scope. To make things easier, all identifiers should get this generic scope and only for those whose meaning we can decipher feasibly we add another scope.
For identifiers we currently have However, the scope for where we "don't know" is crucial. JavaScript currently scopes these as I think choosing
For Python, most of this is already finished and I just need to exchange scope names. For other languages, you'll most likely end up with a similar context layout where you need to scope path segments and accessors individually anyway. The largest effort, as always, falls down to establishing the updated scoping guidelines and color schemes together with user perception, although we hardly have any backwards-incompatible changes in here. I hope you've been following me until this point, although I hope I structured it decently enough despite writing on it for an hour or so. Did I miss a certain aspect? Does a language you know not fall into the raster/structure I imagined? Do you think non-first-level identifiers in a qualifier should get a |
Your thoughts parallel my own. In particular, I agree that we have to address the general problem in a general way. I do think that In some cases, it may make sense to further specialize these scopes in a purely additive fashion and on a best-effort basis. For example:
foo
# <- variable.other
foo()
# <- variable.other.function
foo.bar
# ^ variable.member
foo.bar()
# ^ variable.member.function
All that said, I'm not convinced that it's worthwhile to scope “paths” in the general case. An example in Python:
class Foo(collections.abc.Sequence):
    ...
There's an argument for scoping the path In this example, it might make sense to use a special additional scope for |
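The additive scheme from the `foo` / `foo.bar()` example can be sketched with Python's `ast` module. This is only an illustration under loud assumptions: a real syntax definition has no parser to lean on, the function name `classify` is hypothetical, and the scope names are simply the ones proposed in the example above.

```python
import ast


def classify(source):
    """Return (identifier, scope) pairs for names and attribute accesses."""
    tree = ast.parse(source)
    # Nodes that sit in the callee position of a call expression.
    callees = {id(node.func) for node in ast.walk(tree) if isinstance(node, ast.Call)}
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            name, scope = node.id, "variable.other"
        elif isinstance(node, ast.Attribute):
            name, scope = node.attr, "variable.member"
        else:
            continue
        if id(node) in callees:
            scope += ".function"  # purely additive specialization
        # An identifier always ends where its expression ends, so the end
        # offset gives left-to-right source order.
        found.append((node.lineno, node.end_col_offset, name, scope))
    return [(name, scope) for _, _, name, scope in sorted(found)]


for line in ["foo", "foo()", "foo.bar", "foo.bar()"]:
    print(line, classify(line))
```

Because the `.function` suffix is only appended, a selector written for `variable.other` or `variable.member` keeps matching the specialized variants.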
I wrote up a rough draft based on the discussion. Comments welcomed.
|
I agree with everything I don't comment on. Definitions
Yes and see below. Add
Keep as is, for now. Most recently we agreed on using
Yes. I wonder how to call it, though. Usually it's just a function-local variable, so func add(string x: String) {/* variable name is `x` inside body */}
func add(_ x: Int) {/* variable name is `x` inside body */}
add(2)
add(string: "12") In the string example, Thus I suggest that if a name is (also) the external name, use I like
Yes.
References
I prefer Differentiating between
func()
# <- variable.function
foo.func()
# <- variable.other
# ^ meta.member variable.function
foo.bar.func()
# <- variable.other
# ^ meta.member variable.member
# ^ meta.member variable.function
Everything that we would classify as More questions:
|
I've been using the model of In this model, “other” isn't just a catch-all, but a category in its own right, so the “other” in To find all lexical variables in a JS file, we could select
sublimehq/sublime_text#2619 would solve this (and a host of other issues), though (alas) we have to plan around the features we have.
The idea is that if a color scheme or tool selects (say)
I'm really starting to dislike
Nah, explicit declarations only. In (say) Python, I'd rather say that there are no variable declarations than that every assignment is a declaration. (Function formal parameters, imports, and other constructs would still explicitly declare names.)
No strong opinion on
Ugh. I don't like |
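Since much of this hinges on what a color scheme or tool ends up selecting, here is a simplified Python model of how a single selector segment matches a scope by dotted prefix. It is a sketch of the general rule only, not Sublime Text's actual implementation, and `selector_matches` is a hypothetical helper name.

```python
def selector_matches(selector, scope):
    """True if `selector` equals `scope` or is a dot-separated prefix of it."""
    selector_parts = selector.split(".")
    scope_parts = scope.split(".")
    return scope_parts[:len(selector_parts)] == selector_parts


# A more specific scope is still caught by a more general selector.
print(selector_matches("variable.other", "variable.other.namespace.python"))  # True
print(selector_matches("variable", "variable.member.function"))               # True
# Matching works on whole segments, not on raw string prefixes.
print(selector_matches("variable.other", "variable.otherwise"))               # False
# Definitions and usages stay separable at the top level.
print(selector_matches("entity.name", "variable.other.namespace"))            # False
```

This is why the position of a segment matters: anything placed after `variable.other` is reachable through `variable.other`, while a sibling such as `variable.member` is not.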
I can get behind this, but I also have a problem with specifically using the name I thought about reversing the relation so that we the suggested sub-scopes of We can use
Yeah, we can't really do that with your proposal but also don't need to. The
So, do we want to pretend they are different vocabulary when used in statically typed languages (or in Python type hints) or do we not? In my opinion, tokens affecting storage type and kind in languages like C or even JavaScript are significant enough to warrant their own treatment. In statically typed languages, they are a different token entirely and should thus not be
I'm undecided on whether Let's take a look at what this would imply. The following is a tree of "your suggestion" (I'll skip naming the concepts):
This is "my suggestion":
Downsides of your suggestion:
Downsides of my suggestion:
It fundamentally depends on how you weight these. I consider the first downside of my suggestion to be the most significant. Since your suggestion is more breaking than mine, I took a brief look at how breaking it would be. The following is how the variable scope is used currently (as suggested by PackageDev, collected empirically):
We didn't talk about
I found
Thus, I conclude that except for Some statistics on `variable` usage to work with
It's getting pretty late over here and I should've been doing something entirely different, but this problem in particular takes a lot of consideration and I always end up working on multiple parts of my post at the same time or in short succession. Hopefully I didn't mix things up too much. I would be very interested in other opinions. Besides us two, nobody has commented on the Big Picture Discussion so far. |
Agreed.
I think that, where types are concerned, languages generally fit into three categories.
In C, a For languages in category (2), An alternate approach would be to use val x: Int
// ^^^ storage.type.primitive And in Python: x: int
# ^^^ variable.other storage.type
v: sublime.View
# ^^^^^^^ variable.other
# ^^^^ variable.member storage.type
Generally agreed as to what the tradeoffs are. In my mind, the biggest advantage to my suggestion is unifying I hadn't considered using a meta scope. It almost seems like a hack to get around the lack of more powerful selectors, but we work with the system we have. Because these scopes should never cover more than one token, it should be safe to select
Also, I'm starting to become skeptical of
|
First and third look good. I don't have experience with VB or Scala, so I can't really comment on your assessment regarding the second category. My hunch would be to only use x: int
# ^^^ variable.other storage.type support.type
# (specifically not `variable.other.type` due to the naming being unconventional) The good part is that you can highlight built-in types used in a declaration or function annotation easily with Note that we decided to use
Yes, this is a hack because we'd mask the same scope with a meta to differentiate between the two syntactically different usages but with a simpler selector. It still requires specifically excluding the meta scope when you want to target the non-member variant, so it's not an improvement over In my suggestion, Either way, the more I think about it, the more I like your grouping suggestion, although I already preferred it yesterday. Get approximate matching into core and I'm entirely sold. just match for
In your or my suggestion? Assuming yours. Type (3) languages, or Python in particular, wouldn't use
A notable concern with all this talk is that we might be overwhelming color scheme authors, although we don't exactly use rocket science here. Most useful selectors don't exceed two stacked scopes while maintaining lexical accuracy for more complex selectors or tools to work with. Maybe a compilation of standard scope coverage or common colorization efforts using the proposed schema would be useful. Any other unanswered questions so far? |
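The trade-off of the meta-scope approach (having to exclude the meta scope to target the non-member variant) can be modelled in a few lines of Python. This is a deliberately simplified sketch: the helpers are hypothetical, the `source.python` base scope is an assumption, and real selectors support more operators than one include and one exclude part.

```python
def prefix_match(selector, scope):
    """Dotted-prefix match of one selector segment against one scope."""
    parts = selector.split(".")
    return scope.split(".")[:len(parts)] == parts


def stack_has(selector, scope_stack):
    """True if any scope on the stack is matched by the selector."""
    return any(prefix_match(selector, scope) for scope in scope_stack)


def matches(include, exclude, scope_stack):
    """Model of a selector written as `include - exclude` (exclude may be None)."""
    if not stack_has(include, scope_stack):
        return False
    return exclude is None or not stack_has(exclude, scope_stack)


plain_call = ["source.python", "variable.function"]                  # func()
member_call = ["source.python", "meta.member", "variable.function"]  # foo.func()

# Selecting every call needs only the simple selector.
print(matches("variable.function", None, member_call))            # True
# Targeting only the non-member variant requires the explicit exclusion.
print(matches("variable.function", "meta.member", plain_call))    # True
print(matches("variable.function", "meta.member", member_call))   # False
```

The simple selector stays at one scope, and the exclusion only appears when a color scheme author wants the finer distinction.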
I concur with the example.
Given the scope
I think I wrote confusingly. I'm skeptical of the scope that in my suggestion would be
In a category (3) language like Python, a type name in an annotation might be If we remove For comparison, By contrast, I'm not set against
Not that I can think of. |
Only in situations where they are used in a declaration, i.e. variable type hints and function annotations. Here's what I had in mind: x: typing.Option[Abc] = Abc(2.2)
# <- entity.name.variable?
#^^^^^^^^^^^^^ meta.* (probably)
# ^^^^^^ variable.*
# ^^^^^^ variable.member.type storage.type support.type
# ^^^ variable.*.type storage.type
# ^^^ variable.*.type - storage This translates to C++ just fine: Type *a = new Type();
// <- storage.type
// ^ entity.name.variable?
// ^^^^ variable.type - storage Here's another example where isinstance(x, MyClass)
# ^^^^^^^ variable.other.type
Tl;dr: use While that seems redundant, since I doubt you'll colorize one of these differently than the other, it stays true to the grouping in For inheritance in Python, a simple look-ahead to check for a simple type to be scoped as Edit: Actually, I just noticed a problem. What if the type in C++ is defined as a member, e.g. |
I think I'm convinced. For one thing, if we didn't scope
Wouldn't we use
I think I'm a little confused as to the meaning of
|
Not always, as with my suggestion we wouldn't be using
Yes, this assessment is correct. It's in a weird compatibility limbo with being used extensively historically for A potentially less awkward solution would be to always use |
I think I get it now: It may not be completely redundant though. In C#, In fact, many C# "type names" are actually keywords: |
Interesting case. Java also has primitive types like I initially wanted to say if they behave exactly like a type but are in fact an alias, that still qualifies as an identifier being used as a type (and not a keyword being replaced with a type). But they aren't in user space because they are reserved keywords and may never be used as the name for a custom type. I suppose in that situation, Do you have an opinion on |
Not final due to sublimehq#1842 being unresolved, but still an improvement.
This is late to the game, but as a user of scoping definitions, any time I run into My own corner of the world is VHDL, a strongly and statically typed language. There are no types in the LRM reserved words, however the standard library (which might as well be considered part of the language as it doesn't even really need to be declared) defines boolean, integer, bit, character, real, and so forth. As a result, I end up scoping these as Due to the concurrency of hardware, the language also has multiple things similar to "variables", the difference being when a value assigned to them takes effect (immediately lexically, or driven at a later resolution point). I find great value in using
And so forth. Here though, that Later on I might declare a signal with that type:
Again, that Anyhow, I'm not sure how this factors into the discussion other than to try to throw in one of the stranger tributaries of language scoping, and maybe that'll aid definition. (I also wish that |
@Remillard Check out #1861 It might help. |
The I personally find
Basically IIRC, I did so for several data types or functions in Perl. I just scoped them as
I wouldn't recommend
This is basically the situation we face with Even though a
Same here:
It is currently not clear what the best choice for scoping user defined complex datatypes like your I'd prefer the solution suggested in the two examples above |
Well I think a single selector As for Feel like this is attempting to define a generic meta-language for writing languages, what concepts they embody, and how they are used! Very tricky to cover all sorts of languages with equal notions. |
* [Haskell] Rewrite operator matching - Use variables - Highlight '*' (and combinations) as operator - Add punctuation scope to infix notation - Scope non-infix notation as `keyword.operator` * [Haskell] Update keyword matching - Add proper scopes to control keywords - Add proper scopes to declarations - Use proper scope names for entities in declarations * [Haskell] Restructure with contexts * [Haskell] Remove usages of double quoted scalars * [Haskell] Simplify string matches Also highlight superfluous characters. * [Haskell] Reduce max line length * [Haskell] Match groups * [Haskell] Adjust scopes for imports Not final due to #1842 being unresolved, but still an improvement. * [Haskell] Match lists * [Haskell] Correctly match idents with trailing ' * [Haskell] More gracious infix operator matching * [Haskell] Adjust keyword scopes to recent standards * [Haskell] match OPTIONS_HADDOCK Same as https://github.com/sublimehq/Packages/pull/2270/files * [Haskell] match deriving (..) via (..) Same as https://github.com/sublimehq/Packages/pull/2271/files * [Haskell] match @ and # in keyword.operator.haskell Same as - https://github.com/sublimehq/Packages/pull/2272/files - https://github.com/sublimehq/Packages/pull/2273/files Co-Authored-By: Nikos Baxevanis <[email protected]> * [Haskell] match deriving instance (..) * [Haskell] Match functions from the prelude Based on https://github.com/atom-haskell/language-haskell/blob/e036e449909816e616b880157e2703e70fc9b5df/grammars/haskell.cson#L1306-L1307 Co-Authored-By: Nikos Baxevanis <[email protected]> * [Haskell] Add tests for `via` derives * [Haskell] match deriving instance (..) without breaking data deriving This fixes a bug introduced via 0d36dd1 Co-authored-by: Nikos Baxevanis <[email protected]>
I'd like to step away from specific scopes and classify the problem. Apologies for the infodump. From a compiler's perspective, identifiers can be divided into:
Types belong to the open set, because most languages let you define them. Built-in types, constants, and functions belong to the open set of identifiers. In some languages, like Go, they're merely predeclared, not reserved, and can be redefined. Scoping built-ins as built-in is optional. It's worth extending the definition of closed-set identifiers to special symbols like Closed-set keywords usually have special syntax. Open-set identifiers usually don't, with the exception of custom operators and macros. See below. Another, orthogonal, classification:
In languages with custom operators, such as Haskell, user-defined operators like Closed-set sub-classification:
Open-set sub-classification:
Open-set "name being used" sub-classification:
Sidenote. As far as I can tell, in C, C++, and other languages where functions are defined with

Whew! I hope this makes sense. The above was objective. Now for my subjective conclusions. For me personally, the most important information is the role of the identifier in the current context.

Important role 1: whether it controls syntax. Special keywords, operators, and punctuation define a syntax structure with "holes" where you can plug the non-special words from the "open set". For this reason, scoping these two roles differently is most important. The simplest approach is

As noted earlier, some languages have custom operators which belong to the "open set" of identifiers, yet involve special syntax such as prefix or infix, distinct from normal function calls. I believe these should be treated as keywords, since syntactic structure is more important than whether something is "well known".

Important role 2: declaration or merely usage. Declarations are used for symbol navigation. From my perspective, declarations of root-level functions, types, variables, and constants are all equally important, and symbol search for all of them is useful in practice. For this reason, there should be one scope for declared names that should be indexed (currently

Important role 2.1: declaration keyword or regular keyword. Traditionally, declaration keywords have been scoped as

Important role 3: storage properties of a value. Its memory layout, numeric or structured, available fields, constant or mutable, reference or value. Traditionally this has been

Traditionally, some syntaxes scope certain types as "classes", and many color schemes give them special colors. This never made sense to me. In languages with classes, for all intents and purposes they're types, and should receive no special treatment. Syntactically, it's usually impossible to distinguish. A type can be used:
A type's role as a storage modifier is entirely unrelated to its role as a value or namespace. I believe they should be scoped and colored differently. In many languages it's already impossible to detect whether part of a namespace is a type or a package name. The same applies to using them as values. For this reason, I believe we should scope types as

Important role 4: call or value. Identifiers "called" as functions, methods, or macros have a special semantic role, and need a generic scope. The current standard is

It should be noted that given the same function name, calling it and passing it as a value are entirely different roles. Even if the syntax could unambuguously (pun intended) detect that the given value is a function, I want calls and values scoped and colored differently.

There are more conclusions to draw, but I ran out of steam and must return to work. This is already much to absorb. I apologize and hope that this is useful to the discussion. |
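The closed-set / open-set split, and the remark that built-ins are often merely predeclared rather than reserved, can be checked in Python (used here purely for illustration; the comment above mentions Go as its example). The function `identifier_set` and its labels are hypothetical names for this sketch.

```python
import builtins
import keyword


def identifier_set(name):
    """Classify a name as closed-set (reserved) or open-set."""
    if keyword.iskeyword(name):
        return "closed"               # reserved by the grammar
    if hasattr(builtins, name):
        return "open (predeclared)"   # built-in, but an ordinary identifier
    return "open"


print(identifier_set("class"))   # closed
print(identifier_set("int"))     # open (predeclared)
print(identifier_set("my_var"))  # open

# A predeclared name can be shadowed like any other identifier...
namespace = {}
exec("int = 'shadowed'", namespace)
print(namespace["int"])          # shadowed

# ...whereas rebinding a reserved word is rejected by the parser.
try:
    compile("class = 1", "<example>", "exec")
except SyntaxError:
    print("SyntaxError")
```

This matches the view that scoping built-ins as built-in is optional: syntactically, `int` behaves exactly like a user-defined name.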
It might be helpful to note that we aren't working in a vacuum. We aren't going to break backwards compatibility of syntaxes and themes. How we scope keywords isn't going to change. Part of the reason there has been no movement on this issue is:
We have requests that run the gamut from "every identifier should be scoped as a Overall if 50%+ of tokens in a source file are the same color, does it matter if they are the foreground, or another color? Or in other words, if everything is special, is nothing special? Unfortunately I don't have time at the moment to devote to getting this unstuck, but I am hoping to during the next dev cycle. |
Of course. I'm all for compatibility. There isn't much to gain, and much to lose, by breaking the existing conventions. But I feel it would be useful to rebuild our mental model for this, figure out the consensus on how it "should" be in a vacuum, then see how existing syntaxes and color schemes can be nudged there with least blood. |
@mitranim this is a very good breakdown, thanks for that.
By this you are referring to the type of a variable declaration, correct? |
Was referring to the difference in the role of
And this:
|
With another roundtrip looking for meaningful common name qualifiers, I came up with two solutions, which take into account predefined and user defined namespace variables.
C#
C# has a predefined
C++
It may even make sense to scope variable
PHP
PHP knows about special namespace variables for late bindings such as
Python
Same applies to
Thoughts? |
Hi, I came here from an issue/suggestion that I opened, #3676. After reading this RFC, I agree with others that designing a generic meta-language that serves the ideas of multiple languages and communities seems complex. Some will want to highlight all functions in the same color, while others may want different colors depending on role. There is also the question of categorizing as keyword or storage, as well as performance, and it is not only about syntax coloring. I will speak more about my experience trying to color syntax to get like:
It is an approach that I see mostly on GitHub and in docs, and I tend to prefer it today. But it seems difficult to achieve without customizing the syntax, mostly for classes and the first member of a path. Usually they are in more generic scopes, or it is only possible to color the whole path. The default color schemes take a different approach, which I respect and understand. Celeste is different and seems to color some things in a random way based on two defined colors. I feel that if the ideas discussed here were implemented, they would help in this case and maybe support more approaches. I saw the Python examples in this RFC and deathaxe's initial post where he mentions I am posting a few examples that illustrate what I try to achieve in ST but could not without customizing the syntax.
class Foo
Foo()
from test import Foo, bar
import { Foo, bar } from './test.js'
process.stdout.write()
/// builder() is `meta.function meta.block`, while the others functions are `variable.function`
let req = Request::builder()
.method(Method::POST)
.uri(URL)
.header(header::CONTENT_TYPE, "application/json")
.body(POST_DATA.into())
.unwrap(); |
Intro

In general, a trend/principle can be found in syntax definitions: the declaration/definition of a construct is scoped with `entity.name.<construct>`. When calling/using such a construct, something like `variable.<construct>` or `variable.other.<construct>` is used.

The most popular example is `entity.name.function` vs. `variable.function`.

The goal is clear - distinguish definition and usage of the same object.

Question

How should `namespaces` or `modules` be handled in that manner?

A couple of syntaxes including C, C++, Python, PHP, Java, JavaScript, Erlang, ... support such concepts. Most of them use `entity.name.namespace` to scope the identifier in the definition/declaration statement, as described in the ST3 documentation at https://www.sublimetext.com/docs/3/scope_naming.html.

But I can't find a common solution for how to scope a namespace/module upon usage. Existing syntaxes use:

- `support.namespace.cs` and `variable.other.namespace.cs`
- `entity.other.namespace-prefix.css` or `entity.name.namespace.wildcard.css`
- `entity.name.type.class.module.erlang`
- `meta.generic-name` upon usage or `meta.import-name` in import statements.

Can we find a common scope for that usage?

From my point of view, anything starting with `entity.` is a no-go when we talk about usage.

The most pleasant approach with respect to existing scoping guidelines and implementations seems to be `variable.other.namespace`. So we'd end up with `entity.name.namespace` vs. `variable.other.namespace`.

To keep up with the concept of function declaration and usage, I could also imagine using `variable.namespace`. So we'd end up with `entity.name.namespace` vs. `variable.namespace`.

Thoughts?