rust-analyzer reader-mode #8696

matklad · 2021-04-30T12:35:38Z

TL;DR: https://github.com/rust-dev-tools/cargo-src, but on top of rust-analyzer

Add a command, rust-analyzer reader-mode path/to/Cargo.toml, which produces a directory with a bunch of html files, which contain source for the project, a-la rustdoc output, but also have goto definition, hover, etc.

Interface Constraints:

Static site
Assume modern desktop browser
No external CSS, or CSS pre-processors, minimal hand-written CSS to get the job done.
No or minimal JavaScript
No build-in HTTP serever (but it's OK to try to auto-discover python3 -m http.server and such on usres' machine)
(Maybe) an option to compress everything down to a single html file which you can just open in the browser.

Implementation Constraints:

Implementatin lives in ide-reader-mode and is exposed via ide crate.
There's a transparent intermediate representation. The result of reader mode is not directly an HTML string, but something else. That something else is a set of annotated files, struct ReaderView. This view is independent of the hir or ide types, and is just a large chunk of data. The data includes two things, a list of files and a list of defs. Each file is a text plus a number of auxilary maps that describe ranges in the file. There should be a map for syntax hightlighting, a map for references and definitons, maybe a map for folding ranges (fold method bodies would be useful feature to have). Spans in files refer to defs by ID. Defs are untyped, and store their defining spans and a list of refs. Not sure if we need a separate ref list or just spans would be enough.
ReaderView might want to live in a separate crate, just to make sure it doesn't depend on rust-analyzer types
ReaderView has fn to_html(&self) -> HashMap<PathBuf, String> method, which renders it as a directory with htmls and csses. This should be completely independent of the rust-analyzer.

The text was updated successfully, but these errors were encountered:

edwin0cheng · 2021-04-30T12:46:54Z

There's a transparent intermediate representation ..

Should it store in LSIF ?

matklad · 2021-04-30T13:03:37Z

I think it’s better not to, it seems simpler to have a thing taylored specifically for rust and code browsing

DJMcNab · 2021-04-30T13:41:57Z

We probably should also have LSIF support, because my guess is the GitHub's code analysis engine which they use for providing support for go to definition in other languages uses.

matklad · 2021-04-30T13:42:55Z

I do think we should have LSIF, but I think that's orthogonal feature .

DJMcNab · 2021-04-30T13:44:24Z

Yeah, I think we're in agreement there. In theory, the LSIF dump would probably use a lot of similar infrastructure.

jonas-schievink · 2021-04-30T14:11:05Z

GitHub's code navigation uses https://github.com/github/semantic

flodiebold · 2021-04-30T14:58:16Z

Seems a bit of a weird choice for them to try to implement language analysis for a bunch of languages themselves 😕

DJMcNab · 2021-04-30T15:00:13Z

That would at least explain why it can be so inconsistent though, I guess.

matklad · 2021-04-30T16:15:19Z

This drifting oftopic, but I actually see reasons for them to do their own thing. They have a boatload of code, and they can't build it (both in terms of resources, and in terms of each project requiring a specific environment to build). In this scenario, building a distributed version on IntelliJ architecture makes sense. It would not be precise, but it would be robust.

If you have a boatload of code, but you can build it (because, for example, you are Google, so there's unified Bazel build system), then language-specific Kythe approach makes sense.

flodiebold · 2021-04-30T16:23:13Z

Spans in files refer to defs by ID. Defs are untyped, and store their defining spans and a list of refs.

Why have def IDs at all, and not just refer to everything by span?

flodiebold · 2021-04-30T16:26:51Z

a list of files

I imagine this might include virtual "macro expansion" files so we can navigate into those 🤔

matklad · 2021-04-30T16:33:25Z

Yeah, maaybe spans would be enough. I just thought that, as we are going to store lists of usages somewhere, storing them in defs would be cleaner. It might actually save some memory -- for span, you need to save file id and range, while id is u32. ids give you span interning in a sense, which is nice, because every def is used several times typically.

I guess specific details of intermediate repr are not importatn, as log as:

It's abstraction layer are text documents (not hir types)
it knows (almost) nothing about rust-analyzer (it's tempting to re-use minor things like hightlight tags)
it knows nothing about html

matklad · 2021-05-07T20:18:54Z

Has anyone wrote some code here? I feel like I might want to do this on the weekend, but I don't want to spoil the fun if someone is already working on it ) (not: I might totally drop the ball here myself)

benmkw · 2021-05-07T21:37:07Z

this seems very similar to rust-lang/rust#84176 and also related to #6570

matklad · 2021-05-10T10:40:01Z

@benmkw thanks for that link, that's exciting! I think rustdoc gaining support for goto def changes motivation here a bit. Mainly, it will relieve the pressure to ship "anything" asap, so instead we can concentrate more on exploration and experimentation, unencumbered by maintenance and backward compatibility concerns. I think the end goal we want is two tools (which might be surfaced via the same top-level command), one of which renders the docs (so it rearranges the items to make docs more readable), while the other renders source code. On the doc front, rustdoc is perferct. On the source code rendereng, I think we need a lot more exploration before we are ready to ship something stable, and I think exploration on top of rust-analyzer would be the easiest.

benmkw · 2021-05-10T10:57:35Z

while the other renders source code

to me this sounds kind of like vscode/ an editor

concentrate more on exploration

as I tried to describe in #6570 I think it would be nice to have easier access to rust source code for e.g. creating graphs of the flow of execution (callgraph) or maybe custom lints (clippy plugins are deprecated https://blog.rust-lang.org/inside-rust/2019/11/04/Clippy-removes-plugin-interface.html , I'm not sure what would be the current best practice to have per project lints, thats something Jai wants to push where people enforce certain things in their project by looking at the AST) ... maybe such an API could generalise to this extend. I'm thinking of very project specific rules which are not possible to be upstreamed. Something like clang tidy/ libtooling https://www.youtube.com/watch?v=rbbCgVQrjWs

matklad · 2021-05-10T11:05:52Z

Ok, I didn't went farther than outlining the general design in my head, here are some points:

The struct RearedView and to_html should actually live in a separate repository (rm crate), which does not depend on rust-analyzer. This should allow folks to iterate on the front-end parts without suffering compile times.
rm crate shouldn't be rust centric -- I imagine someone would like to re-use the front-end bits to build similar tools for other languages.
rm crate shouldn't use LSIF from the start (to improve iteration speed), but using LSIF as a source of data should be kept in mind.
rust-analyzer should depend on rm crate

The niche we are targeting is "casual source code browsing for medium-sized projects". If you want to really deeply understand the source code, you need to open it in an IDE. If you have a monorepo with loads of code, you probably want to have more integrated tooling (source browser + vcs + ide + surfacing coverage results). If you are on your couch with a tablet, got bored with Tolstoy and want to read some burntSushi instead, that's where reader mode will be useful.

matklad · 2021-05-10T11:12:55Z

Regarging the more general point of providing the base API which makes building such tooling easy outside of rust-analyzer, that's a laudable goal, but I don't see it coming in the next few years. We are still figuring this out. Like, a week ago I realised that we probably can do significantly better then our current approach: #8713.

Our meta-strategy here is thus:

do not attempt to build any kind of public API
do ship "end-user experience / tools"
use the experience from shipping features to shape the in-progress API better.

benmkw · 2021-05-10T12:07:52Z

shouldn't use LSIF from the start (to improve iteration speed), but using LSIF as a source of data should be kept in mind

I briefly looked at this https://github.com/sourcegraph/sourcegraph and it seems pretty close to what you have in mind but I see you already interacted with them so probably no news here

matklad · 2021-05-10T12:15:32Z

Yeah, SG is similar, but different. They target "monorepo with loads of code", hence:

Git
Go (1.15 or later)
Docker
PostgreSQL (v11 or higher)
Node.js (version 8 or 10)
Redis
Yarn

For this tool, I imagine a singe static binary which you can invoke from CI pipilene to publish a set of static documents.

HKalbasi · 2021-09-04T22:47:21Z

I made Sourcetree which is basically the equivalent of rm trait mentioned here. It can show source code of itself. It is buggy and not perfect (just two days old), but it works and has hover, go to definition and finding references.

Theoretically, it should work out of the box with any LSIF generator, but it is only tested against lsif-tsc. So if RA can generate a LSIF dump, problem of this issue would be solved.

@matklad Are you still against starting with LSIF? LSIF actually helped me in prototyping and I now have a working demo without writing a single line of LSP related things. And Sourcegraph can take benefit from a rust LSIF generator in addition to Sourcetree/RA-reader-mode. And LSIF is pretty close to LSP so it should be easy to emit LSIF in RA. And it is easily extendable so LSIF won't limit us, I'm ok to support non-standard LSIF features from RA, though we should prioritize supported features over non-standard ones.

matklad · 2021-09-06T14:28:46Z

I'd love the end-experience to be rust-analyzer reader-mode /path/to/project/Cargo.toml, that is, I think there's a value in this working out of the box in rust-analyzer. If this is a single binary, I don't think we should have IPC format (LSIF) involved here.

Ultimately, we want to binaries: rust-analyzer that can produce html, and reader-mode-standalone, which can consume LSIF and produce html:

We can glue rust-analyzer and reader-mode-standalone via LSIF, but that adds an extra, unnecessary intermediate language.

We should have LSIF output from rust-analyzer some day, but that's for making it interoperable, not for rendering source code in a nice way.

That said, I won't be against going through ra -> lsif -> rm route, I just think that architectualy it's suboptimal.

matklad · 2021-09-06T14:30:20Z

To clarify -- I imagine rm to be written in Rust and usable as a library -- we should stay within a single language ecosystem.

HKalbasi · 2021-09-06T18:46:36Z

I see reader-mode project out of scope of rust analyzer. It has some signs of it:

rm crate shouldn't be rust centric
rm crate does not depend on rust-analyzer (for compile times, but anyway)

RA doesn't have it's own code editor, but expose LSP so existing code editors like vscode and vim can use it. Like code editors and LSP, there is a class of code consumers which I call code viewers, and rm crate is a member of this class, (other examples are Sourcegraph, Github, rustdoc, ...) and as RA doesn't need it's own code editor (thought it is possible and it may have some benefits), it also doesn't need it's own code viewer.

If we decouple reader mode project from rust analyzer, staying within a single language ecosystem would not be necessary. Vscode is in TS, vim is in C, Sourcegraph is in go, and there is no problem. That being said, it maybe make sense to RIIR. If I want to compare rust and TS for reader-mode/sourcetree:

benefits of rust
- It can be a library. TS version can also be a library in npm, but rust version can be a library for any language, even npm.
- It can be a single static binary. It may harm portability, but it has benefit for common targets.
benefits of TS
- This code isn't performance sensitive because generating LSIF dumps would be bottleneck, so with TS there is no need for fighting with borrow checker, Rc, and this kind of overhead.
- Some amount of JS/TS is necessary for such a project, by choosing TS, it would become a single language and simpler codebase
- It can make instant feedback, watching and reloading in browser, but rust needs some time for compiling.

If you agree to decouple reader-mode from RA, this kind of discussion can be in the sourcetree repository. We can open #3098 and close this one.

Anyway, If you want a rust analyzer specific code viewer or IDE, no one can prevent you. RM and RA should be two separate projects is just my opinion.

matklad · 2021-09-06T19:42:01Z

To clarify, I don’t think any coupling between ra and rm is needed in either setup, I’d say such coupling would be harmful. In “rm as a library”, it would be rust-analyzer who depends on rm, and not the other way around.

To me personally, single static binary, and minimal JS/CSS feel defining for the domain I had in mind: simple out-of-the box experience which is just good enough, but also just works.

Naturally, we can go the more general way of “emit LSIF and let the user plug that into whatever”. That’ll work for more use-cases, but will sacrifice the “just works” bit.

HKalbasi · 2021-09-06T21:05:30Z

That out of the box experience can be achieved using a third tool (closer to RM than RA) that isn't tied to rust. That tool can be a bash script / static binary that do:

Detect language of the codebase
Find or install a lsif generator for that language
Find or install RM
Run RM against the lsif dump

In addition to multiple languages, it can support multiple tools on the other end as well. For example --static for using RM and --source-graph <endpoint> for uploading to sourcegraph.

Aside from such a tool, there are other ways to improve user experience, for example sourcegraph provides some github actions that makes using it even easier. RM can provide a similar action to sourcegraph-upload-action and use existing ones from sourcegraph for language. Relying on this is github-centeric but there can be other tools at the same time. To summarize, tooling can be improved to infinity on demand.

And if I understand correctly, desktop users won't use RM/sourcetree normally because they can fire up an IDE. So bundling RM with RA would just harm majority of users (small harm, but anyway).

Milo123459 · 2021-09-21T20:33:03Z

Maybe this could work in the browser? With github.dev? Not sure

HKalbasi · 2021-09-22T08:09:29Z

I don't know about github.dev but github1s.com is based on sourcegraph so it can use lsif.

bjorn3 · 2021-09-22T08:36:45Z

For github.dev using rust-analyzer compiled to wasm makes more sense as github.dev allows you to edit the code, not just look at it.

eminence · 2021-09-22T14:28:57Z

How much of rust-analyzer's analysis could be done purely in the browser, and how much requires using cargo to download dependencies?

10181: Begining of lsif r=HKalbasi a=HKalbasi This PR adds a `lsif` command to cli, which can be used as `rust-analyzer lsif /path/to/project > dump.lsif`. It now generates a valid, but pretty useless lsif (only supports folding ranges). The propose of this PR is to discussing about the structure of lsif generator, before starting anything serious. cc `@matklad` #8696 #3098 Co-authored-by: hamidreza kalbasi <[email protected]>

jyn514 · 2021-12-27T02:31:25Z

I think rustdoc gaining support for goto def changes motivation here a bit.

FYI this is not a stable feature and I would actually much rather see it in RA than in rustdoc. I've discussed this with @GuillaumeGomez and we're not planning to make further changes to the rustdoc side before there's an RFC for the feature.

GuillaumeGomez · 2021-12-27T11:13:46Z

To be noted that even if the RFC is accepted, it's quite limited. RA could provide locations where an item is used for example, which we don't plan to add.

HKalbasi · 2021-12-27T14:27:04Z

An option for changing rustdoc src links (for example to github repository) can be an alternative RFC. Among other benefits (like viewing git history) they have some code intelligence features (probably more powerful than rustdoc). Gitlab and sourcegraph consume LSIF, so they can use RA, and github has its own. And if there becomes a RA specific thing, it can be integrated with rustdoc with that configuration as well.

bjorn3 · 2021-12-27T14:31:25Z

An option for changing rustdoc src links (for example to github repository) can be an alternative RFC.

How would you make sure that the same commit is used? How will it handle crates that have been uploaded with changes? What if the original repo or the specific commit gets deleted?

HKalbasi · 2021-12-27T17:13:08Z

In terms of security? It allows the malicious crate owner to fake src links in docs. Crate owner already can do bad things and there is other ways to see the actual source of crate, but I understand if it is not ok.

In terms of usability, deleted commits and moved repositories can lead to dead links, so crate owners should be able to change them for past versions if they care, so such an option should be a crate-wide setting and/or cli parameters and not for example attributes in source code.

bjorn3 · 2021-12-27T17:37:09Z

In terms of security? It allows the malicious crate owner to fake src links in docs. Crate owner already can do bad things and there is other ways to see the actual source of crate, but I understand if it is not ok.

When I click on the src button I expect to be presented a rustdoc generated source page even if it doesn't contain the correct source code. I don't expect to end up on an attacker controlled website that may exploit a browser bug. If the original documentation page contains malicious code, it would be auditable on docs.rs and be removable by the people operating docs.rs. An attacker controlled website may only present the malicious code for some people and can't be taken down by docs.rs. Also while docs.rs currently allows documentation to contain arbitrary javascript, this may be restricted in the future. Because of this I think this makes this existing security issue worse.

In terms of usability, deleted commits and moved repositories can lead to dead links, so crate owners should be able to change them for past versions if they care, so such an option should be a crate-wide setting and/or cli parameters and not for example attributes in source code.

On docs.rs it wouldn't be possible to retroactively change this option, so broken links can't be fixed again.

jyn514 · 2021-12-27T17:41:53Z

If the original documentation page contains malicious code, it would be auditable on docs.rs and be removable by the people operating docs.rs. An attacker controlled website may only present the malicious code for some people and can't be taken down by docs.rs.

If documentation contains links to a malicious site, we can also take down the documentation, it's not limited to the source HTML. Admittedly there's a risk that links will linkrot and start pointing to a malicious site, but that's not specific to source files.

matklad added good first issue E-hard fun A technically challenging issue with high impact S-actionable Someone could pick this issue up and work on it right now labels Apr 30, 2021

benmkw mentioned this issue Aug 31, 2021

RFC: syntax tree patterns rust-lang/rust-clippy#3875

Merged

HKalbasi mentioned this issue Sep 8, 2021

feat: Begining of lsif #10181

Merged

olafurpg mentioned this issue Oct 14, 2021

Add auto-inference for Rust sourcegraph/sourcegraph-public-snapshot#26086

Closed

rust-analyzer reader-mode #8696

rust-analyzer reader-mode #8696

Comments

matklad commented Apr 30, 2021

edwin0cheng commented Apr 30, 2021 • edited Loading

matklad commented Apr 30, 2021

DJMcNab commented Apr 30, 2021

matklad commented Apr 30, 2021

DJMcNab commented Apr 30, 2021

jonas-schievink commented Apr 30, 2021

flodiebold commented Apr 30, 2021

DJMcNab commented Apr 30, 2021

matklad commented Apr 30, 2021 • edited Loading

flodiebold commented Apr 30, 2021

flodiebold commented Apr 30, 2021

matklad commented Apr 30, 2021

matklad commented May 7, 2021

benmkw commented May 7, 2021

matklad commented May 10, 2021

benmkw commented May 10, 2021

matklad commented May 10, 2021

matklad commented May 10, 2021

benmkw commented May 10, 2021

matklad commented May 10, 2021

HKalbasi commented Sep 4, 2021

matklad commented Sep 6, 2021

matklad commented Sep 6, 2021

HKalbasi commented Sep 6, 2021

matklad commented Sep 6, 2021

HKalbasi commented Sep 6, 2021

Milo123459 commented Sep 21, 2021

HKalbasi commented Sep 22, 2021

bjorn3 commented Sep 22, 2021

eminence commented Sep 22, 2021

jyn514 commented Dec 27, 2021 • edited Loading

GuillaumeGomez commented Dec 27, 2021

HKalbasi commented Dec 27, 2021

bjorn3 commented Dec 27, 2021

HKalbasi commented Dec 27, 2021

bjorn3 commented Dec 27, 2021

jyn514 commented Dec 27, 2021

edwin0cheng commented Apr 30, 2021 •

edited

Loading

matklad commented Apr 30, 2021 •

edited

Loading

jyn514 commented Dec 27, 2021 •

edited

Loading