Skip to content
This repository has been archived by the owner on Jul 27, 2023. It is now read-only.

Lex Jupyter line magic with Mode::Jupyter #23

Merged
merged 18 commits into from
Jul 18, 2023

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Jul 5, 2023

Lex Jupyter line magic with Mode::Jupyter

This PR adds a new token MagicCommand1 which the lexer will recognize when in Mode::Jupyter. The rules for the lexer is as follows:

  1. Given that we are at the start of line, skip the indentation and look for characters that represent the start of a magic command, determine the magic kind and capture all the characters following it as the command string.

  2. If the command extends multiple lines, the lexer will skip the line continuation character (\) but only if it's followed by a newline (\n or \r). The reason to skip this only in case of newline is because they can occur in the command string which we should not skip:

    //        Skip this backslash
    //        v
    //   !pwd \
    //      && ls -a | sed 's/^/\\    /'
    //                          ^^
    //                          Don't skip these backslashes
  3. The parser, when in Mode::Jupyter, will filter these tokens before the parsing begins. There is a small caveat when the magic command is indented. In the following example, when the parser filters out magic command, it'll throw an indentation error:

    for i in range(5):
    	!ls
    
    # What the parser will see
    for i in range(5):

Footnotes

  1. I would prefer to have some other name as this not only represent a line magic (%) but also shell command (!), help command (?) and others. In original implementation, it's named as "IPython Syntax"

@dhruvmanila
Copy link
Member Author

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@dhruvmanila dhruvmanila force-pushed the dhruv/jupyter-magic-syntax branch from 7509cd4 to 94fafb5 Compare July 10, 2023 04:58
@dhruvmanila dhruvmanila marked this pull request as ready for review July 14, 2023 05:56
@dhruvmanila dhruvmanila requested a review from MichaReiser July 14, 2023 05:56
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

I'm a bit concerned about how we want to disambiguate magics from modulo operators that split over multiple lines (something our formatter will create). Let's add some test cases for module operators to see if we handle them correctly

Same for / and even comma

# This is a tuple
(
	a
	,
)

# This is a division operation
(
	a
	/
	b
)

core/src/mode.rs Outdated Show resolved Hide resolved
parser/src/lexer.rs Outdated Show resolved Hide resolved
parser/src/lexer.rs Show resolved Hide resolved
parser/src/lexer.rs Outdated Show resolved Hide resolved
parser/src/lexer.rs Outdated Show resolved Hide resolved
parser/src/lexer.rs Outdated Show resolved Hide resolved
parser/src/lexer.rs Outdated Show resolved Hide resolved
parser/src/lexer.rs Outdated Show resolved Hide resolved
# Indented magic
for a in range(5):
%ls
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some cases that look like magics but are none:

a % 5

(
	a
	% 
	5
)

%

a \
	% 5

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a % 5

(
	a
	% 
	5
)

%  # [*]

a \
	% 5

They should all work assuming that for the marked ([*]) line, the % sign is a standalone (not part of another expression). It isn't as Python throws a Syntax error on that line and the tokenizer gives the MagicCommand token only for that line.

Copy link
Member

@MichaReiser MichaReiser Jul 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, cool! Does this work because we only call into lex_indentation when the lexr is not in a parenthesized expression?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's correct.

Comment on lines +1499 to +1500
#[cfg(feature = "full-lexer")]
Tok::NonLogicalNewline,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the correct way to do this. It seems the tests in CI are not run using --all-features flag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the easiest is to remove the feature flag. We always use full-lexer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the CI doesn't use it:

CARGO_ARGS: --no-default-features --features stdlib,zlib,importlib,encodings,ssl,jit

Or, it uses it only for certain steps:

run: cargo clippy --all --features malachite-bigint,full-lexer,serde -- -Dwarnings

@dhruvmanila dhruvmanila requested a review from MichaReiser July 15, 2023 15:07
/// [line magics]: https://ipython.readthedocs.io/en/stable/interactive/magics.html#line-magics
/// [Dynamic object information]: https://ipython.readthedocs.io/en/stable/interactive/reference.html#dynamic-object-information
/// [System shell access]: https://ipython.readthedocs.io/en/stable/interactive/reference.html#system-shell-access
/// [Automatic parentheses and quotes]: https://ipython.readthedocs.io/en/stable/interactive/reference.html#automatic-parentheses-and-quotes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This comment is excellent!

parser/src/lexer.rs Show resolved Hide resolved
parser/src/lexer.rs Outdated Show resolved Hide resolved
Comment on lines +1499 to +1500
#[cfg(feature = "full-lexer")]
Tok::NonLogicalNewline,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the easiest is to remove the feature flag. We always use full-lexer.

kind: MagicKind::Magic,
},
#[cfg(feature = "full-lexer")]
Tok::NonLogicalNewline,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning that this is a non logical newline. Isn't it a logical newline, because it terminates a statement?

Can we add a test where a magic command uses an invalid indent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning that this is a non logical newline. Isn't it a logical newline, because it terminates a statement?

My main reason being that as these tokens are filtered out before parsing, they should end with a NonLogicalNewline as otherwise there'll be multiple newline tokens i.e., multiple blank lines. This is similar to the Comment token.

Although, now that I think of it, without the full-lexer feature (but with Mode::Jupyter), the NonLogicalNewline won't be filtered out while the MagicCommand token will be.

Can we add a test where a magic command uses an invalid indent.

Do you mean something like the following?

for i in range(10):
    print('hello')
   !pwd

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean something like the following?

Yes, exactly. Sorry, I should have provided an example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've decided to move ahead with the current implementation but will have to consider indentation once the parser is updated to account for any indentation error.

Comment on lines +412 to +414
/// When in [`Mode::Jupyter`], this will filter out all the Jupyter magic commands
/// before parsing the tokens.
///
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this temporary or the final solution? I assumed that our plan is to introduce a new MagicCommand ast node.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we will be adding a new AST node for this (probably 2).

@MichaReiser
Copy link
Member

Very well done!

@dhruvmanila dhruvmanila merged commit 3b4c8ff into main Jul 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants