Support 'shebang' ( #!/usr/bin/boa ) #1555

Closed
ZainlessBrombie opened this issue Sep 5, 2021 · 12 comments · Fixed by #1631
Labels
E-Easy (Easy) · enhancement (New feature or request) · good first issue (Good for newcomers) · Hacktoberfest (Hacktoberfest 2021 - https://hacktoberfest.digitalocean.com) · parser (Issues surrounding the parser)
Milestone
v0.14.0

Comments

@ZainlessBrombie

Feature
Support (that is, ignore) the Unix shebang, e.g. #!/usr/bin/boa.
Node.js and Deno both do this: they simply ignore the Unix-style shebang, and ignoring it is sometimes required to execute scripts written for either.

Example code

#!/bin/whatever
console.log("I still work");
ZainlessBrombie added the enhancement label Sep 5, 2021
jedel1043 added the E-Easy, good first issue and parser labels Sep 5, 2021
@jedel1043
Member

jedel1043 commented Sep 5, 2021

For anyone who wants to tackle this, the pertinent code is in:

pub fn new(reader: R) -> Self
where
    R: Read,
{
    Self {
        cursor: Cursor::new(reader),
        goal_symbol: Default::default(),
    }
}

Ideally we would eagerly try to match "#!" in the lexer and, if it does match, jump to the next line. You can see an example of how to do this in the SingleLineComment lexer:

pub(super) struct SingleLineComment;

impl<R> Tokenizer<R> for SingleLineComment {
    fn lex(&mut self, cursor: &mut Cursor<R>, start_pos: Position) -> Result<Token, Error>
    where
        R: Read,
    {
        let _timer = BoaProfiler::global().start_event("SingleLineComment", "Lexing");

        // Skip either to the end of the line or to the end of the input
        while let Some(ch) = cursor.peek()? {
            if ch == b'\n' || ch == b'\r' {
                break;
            } else {
                // Consume char.
                cursor.next_byte()?.expect("Comment character vanished");
            }
        }
        Ok(Token::new(
            TokenKind::Comment,
            Span::new(start_pos, cursor.pos()),
        ))
    }
}
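A minimal sketch of what such an eager hashbang skip could look like, modeled directly on SingleLineComment above (the HashbangComment name and the way it would be wired into the lexer are assumptions for illustration, not settled design):

// Sketch only: a hypothetical tokenizer mirroring `SingleLineComment`.
pub(super) struct HashbangComment;

impl<R> Tokenizer<R> for HashbangComment {
    fn lex(&mut self, cursor: &mut Cursor<R>, start_pos: Position) -> Result<Token, Error>
    where
        R: Read,
    {
        // Skip to the end of the line or the end of the input, exactly like a
        // single-line comment; the "#!" itself has already been consumed.
        while let Some(ch) = cursor.peek()? {
            if ch == b'\n' || ch == b'\r' {
                break;
            } else {
                cursor.next_byte()?.expect("Hashbang character vanished");
            }
        }
        // Emitting a comment token lets the existing comment-skipping logic in
        // `next()` discard it.
        Ok(Token::new(
            TokenKind::Comment,
            Span::new(start_pos, cursor.pos()),
        ))
    }
}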

However, if implementing the logic inside new is too complex or too unreadable, then we can lex it in:

pub fn next(&mut self) -> Result<Option<Token>, Error>
where
    R: Read,
{
    let _timer = BoaProfiler::global().start_event("next()", "Lexing");

    let (start, next_ch) = loop {
        let start = self.cursor.pos();
        if let Some(next_ch) = self.cursor.next_char()? {
            // Ignore whitespace
            if !Self::is_whitespace(next_ch) {
                break (start, next_ch);
            }
        } else {
            return Ok(None);
        }
    };

    if let Ok(c) = char::try_from(next_ch) {
        let token = match c {
            '\r' | '\n' | '\u{2028}' | '\u{2029}' => Ok(Token::new(
                TokenKind::LineTerminator,
                Span::new(start, self.cursor.pos()),
            )),
            '"' | '\'' => StringLiteral::new(c).lex(&mut self.cursor, start),
            '`' => TemplateLiteral.lex(&mut self.cursor, start),
            ';' => Ok(Token::new(
                Punctuator::Semicolon.into(),
                Span::new(start, self.cursor.pos()),
            )),
            ':' => Ok(Token::new(
                Punctuator::Colon.into(),
                Span::new(start, self.cursor.pos()),
            )),
            '.' => {
                if self.cursor.peek()?.map(|c| (b'0'..=b'9').contains(&c)) == Some(true) {
                    NumberLiteral::new(next_ch as u8).lex(&mut self.cursor, start)
                } else {
                    SpreadLiteral::new().lex(&mut self.cursor, start)
                }
            }
            '(' => Ok(Token::new(
                Punctuator::OpenParen.into(),
                Span::new(start, self.cursor.pos()),
            )),
            ')' => Ok(Token::new(
                Punctuator::CloseParen.into(),
                Span::new(start, self.cursor.pos()),
            )),
            ',' => Ok(Token::new(
                Punctuator::Comma.into(),
                Span::new(start, self.cursor.pos()),
            )),
            '{' => Ok(Token::new(
                Punctuator::OpenBlock.into(),
                Span::new(start, self.cursor.pos()),
            )),
            '}' => Ok(Token::new(
                Punctuator::CloseBlock.into(),
                Span::new(start, self.cursor.pos()),
            )),
            '[' => Ok(Token::new(
                Punctuator::OpenBracket.into(),
                Span::new(start, self.cursor.pos()),
            )),
            ']' => Ok(Token::new(
                Punctuator::CloseBracket.into(),
                Span::new(start, self.cursor.pos()),
            )),
            '/' => self.lex_slash_token(start),
            '=' | '*' | '+' | '-' | '%' | '|' | '&' | '^' | '<' | '>' | '!' | '~' | '?' => {
                Operator::new(next_ch as u8).lex(&mut self.cursor, start)
            }
            '\\' if self.cursor.peek()? == Some(b'u') => {
                Identifier::new(c).lex(&mut self.cursor, start)
            }
            _ if Identifier::is_identifier_start(c as u32) => {
                Identifier::new(c).lex(&mut self.cursor, start)
            }
            _ if c.is_digit(10) => {
                NumberLiteral::new(next_ch as u8).lex(&mut self.cursor, start)
            }
            _ => {
                let details = format!(
                    "unexpected '{}' at line {}, column {}",
                    c,
                    start.line_number(),
                    start.column_number()
                );
                Err(Error::syntax(details, start))
            }
        }?;

        if token.kind() == &TokenKind::Comment {
            // Skip comment
            self.next()
        } else {
            Ok(Some(token))
        }
    } else {
        Err(Error::syntax(
            format!(
                "unexpected utf-8 char '\\u{}' at line {}, column {}",
                next_ch,
                start.line_number(),
                start.column_number()
            ),
            start,
        ))
    }
}

matching "#!", but only if self.cursor.pos() is at line 1, column 1. Checking that condition every time we call next has a performance penalty, but it should be so tiny that we can consider it nonexistent.
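As a rough sketch of that second option (an assumption about the shape of the change, not the final code), the match in next could gain an extra arm that reuses SingleLineComment to skip the rest of the line:

// Sketch only: a possible extra arm in the `match c { ... }` above.
// The position check restricts "#!" to the very start of the input.
'#' if start.line_number() == 1
    && start.column_number() == 1
    && self.cursor.peek()? == Some(b'!') =>
{
    // Lex the rest of the line as a comment; the existing
    // `TokenKind::Comment` handling below then skips it.
    SingleLineComment.lex(&mut self.cursor, start)
}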

@Razican
Member

Razican commented Sep 5, 2021

I don't think we should change the parser or the lexer. A "shebang" is not valid JS, and I don't think our engine should treat it as a "comment". Our engine could be executing JS from any input, and we must be spec compliant.

But, I see the benefit of having this in the Boa executable when providing a file as the input source. I would just change the way this executable works by checking if the first line is a shebang. If it is, then remove that first line from the input sent to the engine.

This would make the executable compatible with this, but the engine would only accept valid JS.
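In code, this could look roughly like the following in boa_cli (a sketch under the assumption that the CLI reads the script into a String before evaluating it; strip_shebang_line is an invented helper name, not existing Boa API):

// Hypothetical pre-processing step for boa_cli: drop a leading "#!..." line
// before passing the source to the engine.
fn strip_shebang_line(source: &str) -> &str {
    if source.starts_with("#!") {
        // Keep the line break itself so error line numbers reported by the
        // engine stay accurate.
        match source.find('\n') {
            Some(idx) => &source[idx..],
            None => "",
        }
    } else {
        source
    }
}

// Usage (sketch): the engine itself then only ever sees valid JS.
// let result = context.eval(strip_shebang_line(&file_contents));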

@jedel1043
Member

jedel1043 commented Sep 5, 2021

> I don't think we should change the parser or the lexer. A "shebang" is not valid JS, and I don't think our engine should treat it as a "comment". Our engine could be executing JS from any input, and we must be spec compliant.
>
> But, I see the benefit of having this in the Boa executable when providing a file as the input source. I would just change the way this executable works by checking if the first line is a shebang. If it is, then remove that first line from the input sent to the engine.
>
> This would make the executable compatible with this, but the engine would only accept valid JS.

Ah, then just skipping the #! directly in eval and parse? That would be much easier, then. Here:

boa/boa/src/lib.rs

Lines 94 to 97 in 26e1497

pub fn parse<T: AsRef<[u8]>>(src: T, strict_mode: bool) -> StdResult<StatementList, ParseError> {
    let src_bytes: &[u8] = src.as_ref();
    Parser::new(src_bytes, strict_mode).parse_all()
}

here:

boa/boa/src/context.rs

Lines 818 to 842 in f9a82b4

pub fn eval<T: AsRef<[u8]>>(&mut self, src: T) -> JsResult<JsValue> {
    let main_timer = BoaProfiler::global().start_event("Main", "Main");

    let src_bytes: &[u8] = src.as_ref();
    let parsing_result = Parser::new(src_bytes, false)
        .parse_all()
        .map_err(|e| e.to_string());

    let statement_list = match parsing_result {
        Ok(statement_list) => statement_list,
        Err(e) => return self.throw_syntax_error(e),
    };

    let mut compiler = crate::bytecompiler::ByteCompiler::default();
    compiler.compile_statement_list(&statement_list, true);
    let code_block = compiler.finish();

    let mut vm = Vm::new(code_block, self);
    let result = vm.run();

    // The main_timer needs to be dropped before the BoaProfiler is.
    drop(main_timer);
    BoaProfiler::global().drop();

    result
}

and here:

boa/boa/src/context.rs

Lines 784 to 802 in f9a82b4

pub fn eval<T: AsRef<[u8]>>(&mut self, src: T) -> JsResult<JsValue> {
    let main_timer = BoaProfiler::global().start_event("Main", "Main");

    let src_bytes: &[u8] = src.as_ref();
    let parsing_result = Parser::new(src_bytes, false)
        .parse_all()
        .map_err(|e| e.to_string());

    let execution_result = match parsing_result {
        Ok(statement_list) => statement_list.run(self),
        Err(e) => self.throw_syntax_error(e),
    };

    // The main_timer needs to be dropped before the BoaProfiler is.
    drop(main_timer);
    BoaProfiler::global().drop();

    execution_result
}

we would need to skip the first bytes of src if it begins with #!, up until we find an EOL. The downside I see is that we would now be integrating parsing logic into Context, but the ease of implementation would compensate for that, I think.
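A rough sketch of that byte-level skip (an assumption about how it might look, not the merged implementation; skip_hashbang is an invented helper name):

// Hypothetical helper: skip a leading "#!..." line in the raw source bytes
// before handing them to `Parser::new`.
fn skip_hashbang(src: &[u8]) -> &[u8] {
    if src.starts_with(b"#!") {
        // Stop at the first line feed and keep it, so line numbering in
        // diagnostics is unchanged.
        match src.iter().position(|&b| b == b'\n') {
            Some(idx) => &src[idx..],
            None => &[],
        }
    } else {
        src
    }
}

parse and eval would then call Parser::new(skip_hashbang(src_bytes), ...) instead of passing src_bytes directly.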

@Razican
Member

Razican commented Sep 6, 2021

I think it's even better if we do the change only in boa_cli.

@jedel1043
Member

jedel1043 commented Sep 6, 2021

> I think it's even better if we do the change only in boa_cli.

I thought of that, but then a user downloading a js script from some page and using boa as an interpreter in some Rust application would fail, wouldn't it? I think that's not ideal, and I wouldn't want to force the user to manage the shebang manually when making a simple check and skip should be easy for us.

@Razican
Member

Razican commented Sep 6, 2021

> I think it's even better if we do the change only in boa_cli.
>
> I thought of that, but then a user downloading a js script from some page and using boa as an interpreter in some Rust application would fail, wouldn't it? I think that's not ideal, and I wouldn't want to force the user to manage the shebang manually when making a simple check and skip should be easy for us.

I think that is out of the scope of our engine. We have a JS engine that must only accept spec compliant strings.

Then, we provide a CLI tool that we can customize to allow for a particular use case that makes sense. If someone else wants to use our engine in their tool and allow for this, I guess they should add the specific support they need.

In the case we want to really change the engine, this should be an optional, opt-in feature.

@jedel1043
Member

> I think that is out of the scope of our engine. We have a JS engine that must only accept spec compliant strings.

I'd agree if it weren't for the fact that Firefox and Chrome support the shebang in their engines. Also, I think importing a module that has a shebang at the beginning would break, even if the user is just using boa_cli.

@Razican
Member

Razican commented Sep 6, 2021

> I think that is out of the scope of our engine. We have a JS engine that must only accept spec compliant strings.
>
> I'd agree if it weren't for the fact that Firefox and Chrome support the shebang in their engines. Also, I think importing a module that has a shebang at the beginning would break, even if the user is just using boa_cli.

OK, then let's add it as an optional, opt-in feature :)

jedel1043 added the Hacktoberfest label Oct 3, 2021
@jedel1043
Member

Uhm, change of plans.
Hashbang support is a stage 3 proposal (https://github.com/tc39/proposal-hashbang), so we do need to lex it and parse it.
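For reference (quoting the proposal from memory, so double-check against its text), the grammar it adds is roughly:

HashbangComment ::
    #! SingleLineCommentChars_opt

and it is only allowed at the very start of a Script or Module, which lines up with the "line 1, column 1" restriction discussed above.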

@nekevss
Member

nekevss commented Oct 3, 2021

I can try to take a crack at this one if it still needs to be done. Since there seemed to be some discussion, is there a preferred place to do this (boa_cli vs. the lexer/parser)?

@jedel1043
Member

Originally we were going to implement it in boa_cli, but since the spec will change to support it in the language grammar, it must be done in the lexer/parser.

@nekevss
Member

nekevss commented Oct 4, 2021

I made a pull request for the attempt I put together. It does catch the hashbang in my example file, from what I can tell. Not sure if there's any more functionality that needs to be built for it, though. Currently, it treats the hashbang like a TokenKind::Comment and moves the lexer on to the next token.

Razican added this to the v0.14.0 milestone Feb 23, 2022