Support 'shebang' ( #!/usr/bin/boa ) #1555

Closed
ZainlessBrombie opened this issue Sep 5, 2021 · 12 comments · Fixed by #1631
Labels
E-Easy (Easy) · enhancement (New feature or request) · good first issue (Good for newcomers) · Hacktoberfest (Hacktoberfest 2021 - https://hacktoberfest.digitalocean.com) · parser (Issues surrounding the parser)
Milestone
v0.14.0

Comments

@ZainlessBrombie

Feature
Support (that is, ignore) the Unix shebang, e.g. #!/usr/bin/boa.
Node.js and Deno both do this: they simply ignore the Unix-style shebang, and ignoring it is sometimes required to execute scripts written for either.

Example code

#!/bin/whatever
console.log("I still work");
ZainlessBrombie added the enhancement label Sep 5, 2021
jedel1043 added the E-Easy, good first issue and parser labels Sep 5, 2021
@jedel1043
Member

jedel1043 commented Sep 5, 2021

For anyone who wants to tackle this, the pertinent code is in:

pub fn new(reader: R) -> Self
where
    R: Read,
{
    Self {
        cursor: Cursor::new(reader),
        goal_symbol: Default::default(),
    }
}

Ideally we would eagerly try to match "#!" in the lexer and, if it does match, jump to the next line. You can see an example of how to do this in the SingleLineComment lexer:

pub(super) struct SingleLineComment;

impl<R> Tokenizer<R> for SingleLineComment {
    fn lex(&mut self, cursor: &mut Cursor<R>, start_pos: Position) -> Result<Token, Error>
    where
        R: Read,
    {
        let _timer = BoaProfiler::global().start_event("SingleLineComment", "Lexing");

        // Skip either to the end of the line or to the end of the input
        while let Some(ch) = cursor.peek()? {
            if ch == b'\n' || ch == b'\r' {
                break;
            } else {
                // Consume char.
                cursor.next_byte()?.expect("Comment character vanished");
            }
        }
        Ok(Token::new(
            TokenKind::Comment,
            Span::new(start_pos, cursor.pos()),
        ))
    }
}
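A minimal sketch of what such an eager hashbang skip could look like, modeled directly on SingleLineComment above (the HashbangComment name and the way it would be wired into the lexer are assumptions for illustration, not settled design):

// Sketch only: a hypothetical tokenizer mirroring `SingleLineComment`.
pub(super) struct HashbangComment;

impl<R> Tokenizer<R> for HashbangComment {
    fn lex(&mut self, cursor: &mut Cursor<R>, start_pos: Position) -> Result<Token, Error>
    where
        R: Read,
    {
        // Skip to the end of the line or the end of the input, exactly like a
        // single-line comment; the "#!" itself has already been consumed.
        while let Some(ch) = cursor.peek()? {
            if ch == b'\n' || ch == b'\r' {
                break;
            } else {
                cursor.next_byte()?.expect("Hashbang character vanished");
            }
        }
        // Emitting a comment token lets the existing comment-skipping logic in
        // `next()` discard it.
        Ok(Token::new(
            TokenKind::Comment,
            Span::new(start_pos, cursor.pos()),
        ))
    }
}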

However, if implementing the logic inside new is too complex or too unreadable, then we can lex it in:

pub fn next(&mut self) -> Result<Option<Token>, Error>
where
    R: Read,
{
    let _timer = BoaProfiler::global().start_event("next()", "Lexing");

    let (start, next_ch) = loop {
        let start = self.cursor.pos();
        if let Some(next_ch) = self.cursor.next_char()? {
            // Ignore whitespace
            if !Self::is_whitespace(next_ch) {
                break (start, next_ch);
            }
        } else {
            return Ok(None);
        }
    };

    if let Ok(c) = char::try_from(next_ch) {
        let token = match c {
            '\r' | '\n' | '\u{2028}' | '\u{2029}' => Ok(Token::new(
                TokenKind::LineTerminator,
                Span::new(start, self.cursor.pos()),
            )),
            '"' | '\'' => StringLiteral::new(c).lex(&mut self.cursor, start),
            '`' => TemplateLiteral.lex(&mut self.cursor, start),
            ';' => Ok(Token::new(
                Punctuator::Semicolon.into(),
                Span::new(start, self.cursor.pos()),
            )),
            ':' => Ok(Token::new(
                Punctuator::Colon.into(),
                Span::new(start, self.cursor.pos()),
            )),
            '.' => {
                if self.cursor.peek()?.map(|c| (b'0'..=b'9').contains(&c)) == Some(true) {
                    NumberLiteral::new(next_ch as u8).lex(&mut self.cursor, start)
                } else {
                    SpreadLiteral::new().lex(&mut self.cursor, start)
                }
            }
            '(' => Ok(Token::new(
                Punctuator::OpenParen.into(),
                Span::new(start, self.cursor.pos()),
            )),
            ')' => Ok(Token::new(
                Punctuator::CloseParen.into(),
                Span::new(start, self.cursor.pos()),
            )),
            ',' => Ok(Token::new(
                Punctuator::Comma.into(),
                Span::new(start, self.cursor.pos()),
            )),
            '{' => Ok(Token::new(
                Punctuator::OpenBlock.into(),
                Span::new(start, self.cursor.pos()),
            )),
            '}' => Ok(Token::new(
                Punctuator::CloseBlock.into(),
                Span::new(start, self.cursor.pos()),
            )),
            '[' => Ok(Token::new(
                Punctuator::OpenBracket.into(),
                Span::new(start, self.cursor.pos()),
            )),
            ']' => Ok(Token::new(
                Punctuator::CloseBracket.into(),
                Span::new(start, self.cursor.pos()),
            )),
            '/' => self.lex_slash_token(start),
            '=' | '*' | '+' | '-' | '%' | '|' | '&' | '^' | '<' | '>' | '!' | '~' | '?' => {
                Operator::new(next_ch as u8).lex(&mut self.cursor, start)
            }
            '\\' if self.cursor.peek()? == Some(b'u') => {
                Identifier::new(c).lex(&mut self.cursor, start)
            }
            _ if Identifier::is_identifier_start(c as u32) => {
                Identifier::new(c).lex(&mut self.cursor, start)
            }
            _ if c.is_digit(10) => {
                NumberLiteral::new(next_ch as u8).lex(&mut self.cursor, start)
            }
            _ => {
                let details = format!(
                    "unexpected '{}' at line {}, column {}",
                    c,
                    start.line_number(),
                    start.column_number()
                );
                Err(Error::syntax(details, start))
            }
        }?;

        if token.kind() == &TokenKind::Comment {
            // Skip comment
            self.next()
        } else {
            Ok(Some(token))
        }
    } else {
        Err(Error::syntax(
            format!(
                "unexpected utf-8 char '\\u{}' at line {}, column {}",
                next_ch,
                start.line_number(),
                start.column_number()
            ),
            start,
        ))
    }
}

matching "#!", but only if self.cursor.pos() is at line 1, column 1. Checking that condition every time we call next has a performance penalty, but it should be so tiny that we can consider it nonexistent.
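As a rough sketch of that second option (an assumption about the shape of the change, not the final code), the match in next could gain an extra arm that reuses SingleLineComment to skip the rest of the line:

// Sketch only: a possible extra arm in the `match c { ... }` above.
// The position check restricts "#!" to the very start of the input.
'#' if start.line_number() == 1
    && start.column_number() == 1
    && self.cursor.peek()? == Some(b'!') =>
{
    // Lex the rest of the line as a comment; the existing
    // `TokenKind::Comment` handling below then skips it.
    SingleLineComment.lex(&mut self.cursor, start)
}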

@Razican
Member

Razican commented Sep 5, 2021

I don't think we should change the parser or the lexer. A "shebang" is not valid JS, and I don't think our engine should treat it as a "comment". Our engine could be executing JS from any input, and we must be spec compliant.

But, I see the benefit of having this in the Boa executable when providing a file as the input source. I would just change the way this executable works by checking if the first line is a shebang. If it is, then remove that first line from the input sent to the engine.

This would make the executable compatible with this, but the engine would only accept valid JS.
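In code, this could look roughly like the following in boa_cli (a sketch under the assumption that the CLI reads the script into a String before evaluating it; strip_shebang_line is an invented helper name, not existing Boa API):

// Hypothetical pre-processing step for boa_cli: drop a leading "#!..." line
// before passing the source to the engine.
fn strip_shebang_line(source: &str) -> &str {
    if source.starts_with("#!") {
        // Keep the line break itself so error line numbers reported by the
        // engine stay accurate.
        match source.find('\n') {
            Some(idx) => &source[idx..],
            None => "",
        }
    } else {
        source
    }
}

// Usage (sketch): the engine itself then only ever sees valid JS.
// let result = context.eval(strip_shebang_line(&file_contents));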

@jedel1043
Member

jedel1043 commented Sep 5, 2021

> I don't think we should change the parser or the lexer. A "shebang" is not valid JS, and I don't think our engine should treat it as a "comment". Our engine could be executing JS from any input, and we must be spec compliant.
>
> But, I see the benefit of having this in the Boa executable when providing a file as the input source. I would just change the way this executable works by checking if the first line is a shebang. If it is, then remove that first line from the input sent to the engine.
>
> This would make the executable compatible with this, but the engine would only accept valid JS.

Ah, then just skipping the #! directly in eval and parse? That would be much easier, then. Here:

boa/boa/src/lib.rs

Lines 94 to 97 in 26e1497

pub fn parse<T: AsRef<[u8]>>(src: T, strict_mode: bool) -> StdResult<StatementList, ParseError> {
    let src_bytes: &[u8] = src.as_ref();
    Parser::new(src_bytes, strict_mode).parse_all()
}

here:

boa/boa/src/context.rs

Lines 818 to 842 in f9a82b4

pub fn eval<T: AsRef<[u8]>>(&mut self, src: T) -> JsResult<JsValue> {
    let main_timer = BoaProfiler::global().start_event("Main", "Main");

    let src_bytes: &[u8] = src.as_ref();
    let parsing_result = Parser::new(src_bytes, false)
        .parse_all()
        .map_err(|e| e.to_string());

    let statement_list = match parsing_result {
        Ok(statement_list) => statement_list,
        Err(e) => return self.throw_syntax_error(e),
    };

    let mut compiler = crate::bytecompiler::ByteCompiler::default();
    compiler.compile_statement_list(&statement_list, true);
    let code_block = compiler.finish();

    let mut vm = Vm::new(code_block, self);
    let result = vm.run();

    // The main_timer needs to be dropped before the BoaProfiler is.
    drop(main_timer);
    BoaProfiler::global().drop();

    result
}

and here:

boa/boa/src/context.rs

Lines 784 to 802 in f9a82b4

pub fn eval<T: AsRef<[u8]>>(&mut self, src: T) -> JsResult<JsValue> {
    let main_timer = BoaProfiler::global().start_event("Main", "Main");

    let src_bytes: &[u8] = src.as_ref();
    let parsing_result = Parser::new(src_bytes, false)
        .parse_all()
        .map_err(|e| e.to_string());

    let execution_result = match parsing_result {
        Ok(statement_list) => statement_list.run(self),
        Err(e) => self.throw_syntax_error(e),
    };

    // The main_timer needs to be dropped before the BoaProfiler is.
    drop(main_timer);
    BoaProfiler::global().drop();

    execution_result
}

we would need to skip the first bytes of src if it begins with #!, up until we find an EOL. The downside I see is that we would now be integrating parsing logic into Context, but the ease of implementation would compensate for that, I think.
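A rough sketch of that byte-level skip (an assumption about how it might look, not the merged implementation; skip_hashbang is an invented helper name):

// Hypothetical helper: skip a leading "#!..." line in the raw source bytes
// before handing them to `Parser::new`.
fn skip_hashbang(src: &[u8]) -> &[u8] {
    if src.starts_with(b"#!") {
        // Stop at the first line feed and keep it, so line numbering in
        // diagnostics is unchanged.
        match src.iter().position(|&b| b == b'\n') {
            Some(idx) => &src[idx..],
            None => &[],
        }
    } else {
        src
    }
}

parse and eval would then call Parser::new(skip_hashbang(src_bytes), ...) instead of passing src_bytes directly.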

@Razican
Member

Razican commented Sep 6, 2021

I think it's even better if we do the change only in boa_cli.

@jedel1043
Member

jedel1043 commented Sep 6, 2021

> I think it's even better if we do the change only in boa_cli.

I thought of that, but then a user downloading a js script from some page and using boa as an interpreter in some Rust application would fail, wouldn't it? I think that's not ideal, and I wouldn't want to force the user to manage the shebang manually when making a simple check and skip should be easy for us.

@Razican
Member

Razican commented Sep 6, 2021

> I think it's even better if we do the change only in boa_cli.
>
> I thought of that, but then a user downloading a js script from some page and using boa as an interpreter in some Rust application would fail, wouldn't it? I think that's not ideal, and I wouldn't want to force the user to manage the shebang manually when making a simple check and skip should be easy for us.

I think that is out of the scope of our engine. We have a JS engine that must only accept spec compliant strings.

Then, we provide a CLI tool that we can customize to allow for a particular use case that makes sense. If someone else wants to use our engine in their tool and allow for this, I guess they should add the specific support they need.

In the case we want to really change the engine, this should be an optional, opt-in feature.

@jedel1043
Member

> I think that is out of the scope of our engine. We have a JS engine that must only accept spec compliant strings.

I'd agree if it weren't for the fact that Firefox and Chrome support the shebang in their engines. Also, I think importing a module that has a shebang at the beginning would break, even if the user is just using boa_cli.

@Razican
Member

Razican commented Sep 6, 2021

> I think that is out of the scope of our engine. We have a JS engine that must only accept spec compliant strings.
>
> I'd agree if it weren't for the fact that Firefox and Chrome support the shebang in their engines. Also, I think importing a module that has a shebang at the beginning would break, even if the user is just using boa_cli.

OK, then let's add it as an optional, opt-in feature :)

jedel1043 added the Hacktoberfest label Oct 3, 2021
@jedel1043
Member

Uhm, change of plans.
Hashbang support is a stage 3 proposal (https://github.com/tc39/proposal-hashbang), so we do need to lex it and parse it.
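For reference (quoting the proposal from memory, so double-check against its text), the grammar it adds is roughly:

HashbangComment ::
    #! SingleLineCommentChars_opt

and it is only allowed at the very start of a Script or Module, which lines up with the "line 1, column 1" restriction discussed above.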

@nekevss
Member

nekevss commented Oct 3, 2021

I can try to take a crack at this one if it still needs to be done. Since there seemed to be some discussion, is there a preferred place to do this (boa_cli vs. the lexer/parser)?

@jedel1043
Member

Originally we were going to implement it in boa_cli, but since the spec will change to support it in the language grammar, it must be done in the lexer/parser.

@nekevss
Member

nekevss commented Oct 4, 2021

I made a pull request for the attempt I put together. It does catch the hashbang in my example file, from what I can tell. Not sure if there's any more functionality that needs to be built for it, though. Currently, it treats the hashbang like a TokenKind::Comment and moves the lexer on to the next token.

Razican added this to the v0.14.0 milestone Feb 23, 2022