This repository has been archived by the owner on Jun 3, 2021. It is now read-only.

Separate semantic analysis and parsing #370

Merged
merged 193 commits into from
May 1, 2020

Conversation

jyn514
Owner

@jyn514 jyn514 commented Apr 12, 2020

This is basically a rewrite of the entire parser. It is very much a work in progress and all the tests are failing. I also think I lost some of the unit tests during the rewrite, so I need to find them in the git history. That said, most of the work for the rewrite is done at this point and everything left is just cleanup.

Changes

Most of these could be broken up into separate PRs, except for the main 'separate parsing and analysis' change.

  • Instead of having ExprType::{Add(left, right), Sub(left, right), ...}, have ExprType::Binary(BinaryOp, left, right), which makes life much easier for the constant folder and backend. This also makes the code much easier to understand since BinaryOp is now a struct instead of a function.
  • Clean up declaration_specifiers a lot. This is now sane and won't randomly miss specifiers if they occur in the wrong order. Cannot be in a separate PR.
  • Add benchmark for nested parentheses. rcc now reliably handles 3000+ parentheses in a row when before it had trouble even with 300.
  • Implement limited span merging. This merges spans for expressions, statements, and declarations but is not very good at retrieving the original subspans. For example, this will show the whole function declarator as an error:
int f(int i, int j, void);
<stdin>:1:6 error: invalid program: void must be the first and only parameter if specified
int f(int i, int j, void);
     ^^^^^^^^^^^^^^^^^^^^
  • Correctly parse qualifiers for pointers and variables (Parse qualifiers correctly #347). Cannot be in a separate PR.
  • Remove the unused Type::Bitfield
  • Rename the AssignmentToken variants to not look dumb (AddAssign instead of PlusAssign)
  • Move most of the scaffolding for typedefs earlier in the parser to avoid having enormous match statements everywhere. This also makes is_decl_specifier much more reliable. The cost of course is that it's super hacky, but it works very reliably.
  • The backend is no longer responsible for desugaring complex assignment (good riddance!). Cannot be in a separate PR.
  • Variables are now given a Metadata when they are declared which is reused across scopes. This means the backend no longer has to have any idea of what a scope is, making it easier to do codegen. In particular, there are no more bugs where the frontend's scope is different from the backend's scope.
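
To illustrate the first bullet, here is a minimal sketch of why a single Binary variant helps the constant folder. The Expr, BinaryOp, apply, and fold names below are hypothetical, simplified stand-ins and not rcc's actual definitions; BinaryOp is modeled as a plain enum purely as an assumption for the example.

```rust
// Hypothetical, simplified types for illustration; not rcc's actual definitions.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum BinaryOp {
    Add,
    Sub,
    Mul,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Var(String),
    Binary(BinaryOp, Box<Expr>, Box<Expr>),
}

/// Evaluate one operator on two constants; `None` means overflow.
fn apply(op: BinaryOp, a: i64, b: i64) -> Option<i64> {
    match op {
        BinaryOp::Add => a.checked_add(b),
        BinaryOp::Sub => a.checked_sub(b),
        BinaryOp::Mul => a.checked_mul(b),
    }
}

/// Fold constants bottom-up. One match arm covers every binary operator,
/// instead of one arm per `Add(left, right)`, `Sub(left, right)`, ...
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Binary(op, left, right) => {
            let (left, right) = (fold(*left), fold(*right));
            if let (Expr::Literal(a), Expr::Literal(b)) = (&left, &right) {
                if let Some(value) = apply(op, *a, *b) {
                    return Expr::Literal(value);
                }
            }
            // Not constant (or it overflowed): keep the node as-is.
            Expr::Binary(op, Box::new(left), Box::new(right))
        }
        other => other,
    }
}
```

With per-operator variants, the recursion and the "both sides are literals" check would have to be repeated in every arm; here only apply knows about individual operators.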

Action Items

These should be reverted or fixed before merging.

  • The lexer now turns identifiers into keywords again. This broke the preprocessor and needs to be reverted; it was mostly for testing.
  • I tried making the Location a trait instead of a type. This failed miserably.
  • Added derive_more for displays. I only used it in one trivial place, it should either take the place of impl Display for Expr or be removed altogether before this is merged.
  • codespan is used only for storing the Files table. This is kind of a waste and I should either switch to codespan-reporting once and for all or write my own Files real quick to remove the unnecessary dependency. I'll make a follow-up PR for this; this one is big enough.
  • There is still a lot of commented-out code that needs to be removed.
  • The Lexer needs a design decision. Right now it implements Iterator<Item = Result<Token, LexError>>, which I like because it reflects its semantic purpose. However, the parser only accepts Iterator<Item = Result<Token, CompileError>>. Either the lexer needs to yield CompileError or there needs to be a wrapper that turns all the LexErrors into the more generic CompileError. I went with a third option: the parser accepts any iterator over Result<Token, E> where E: Into<CompileError>.
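
The third option in the last item can be sketched roughly as follows. Token, LexError, CompileError, and Parser here are hypothetical, stripped-down stand-ins (the real rcc types carry locations and far more variants); only the shape of the generic bound is the point.

```rust
// Hypothetical stand-ins for the real types, for illustration only.
#[derive(Debug, PartialEq)]
struct Token(String);

#[derive(Debug, PartialEq)]
struct LexError(String);

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum CompileError {
    Lex(LexError),
    Syntax(String),
}

// The lexer keeps its own error type; the conversion lives in one place.
impl From<LexError> for CompileError {
    fn from(err: LexError) -> Self {
        CompileError::Lex(err)
    }
}

struct Parser<I> {
    tokens: I,
}

impl<I> Parser<I> {
    fn new(tokens: I) -> Self {
        Parser { tokens }
    }

    /// Pull the next token, converting whatever error type the
    /// underlying iterator yields into `CompileError`.
    fn next_token<E>(&mut self) -> Result<Option<Token>, CompileError>
    where
        I: Iterator<Item = Result<Token, E>>,
        E: Into<CompileError>,
    {
        self.tokens.next().transpose().map_err(Into::into)
    }
}
```

Under these assumptions, a lexer yielding Result<Token, LexError> and a test harness yielding Result<Token, CompileError> can both feed the same parser, with no wrapper iterator in between.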

Issues

Closes

#139 is fixed in the parser but now crashes because cranelift hasn't implemented boolean ops (bytecodealliance/wasmtime#1133)

Addresses #59, but only as a misfeature - it ignores all of the keywords listed there.

Makes a great deal of progress towards #266.

cc @pythondude325

@pythongirl325 pythongirl325 self-requested a review April 26, 2020 17:23
self.declarations.typedefs.insert(id, ());
} else if ctype == Type::Void {
// TODO: catch this error for types besides void?
self.err(SemanticError::VoidType, location);
Owner Author

Not quite sure what I was thinking with the comment here. The error is for void i; and things like that.

Collaborator

@pythongirl325 pythongirl325 left a comment

Some comments

self.parse_typename(ctype, location)
}
// TODO: I don't think this is a very good abstraction
fn parse_typename(&mut self, ctype: ast::TypeName, location: Location) -> Type {
Collaborator

Should this (and the following) parsing function even be in the analysis module?

Owner Author

Yes, why wouldn't they be? These are helper functions for other functions in analyze, but I don't see why they would go somewhere else.

},
};
let mut storage_class = None;
for (spec, sc) in &[
Collaborator

I think this loop can be rewritten to be a little clearer, maybe even with iterators.
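
One possible shape for an iterator-based version, as a rough sketch only: the loop body is not visible in this thread, so the lookup table, the string-keyed specifiers, and the storage_class function below are all hypothetical and do not reflect the actual code in src/analyze/mod.rs.

```rust
// Hypothetical storage classes and keyword table, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StorageClass {
    Static,
    Extern,
    Auto,
    Register,
    Typedef,
}

const TABLE: [(&str, StorageClass); 5] = [
    ("static", StorageClass::Static),
    ("extern", StorageClass::Extern),
    ("auto", StorageClass::Auto),
    ("register", StorageClass::Register),
    ("typedef", StorageClass::Typedef),
];

/// Find the single storage class among `specifiers`, if there is one.
/// A second storage class (including a repeat) is reported as an error.
fn storage_class(specifiers: &[&str]) -> Result<Option<StorageClass>, String> {
    // Keep only the specifiers that name a storage class.
    let mut found = specifiers.iter().filter_map(|spec| {
        TABLE
            .iter()
            .find(|(name, _)| name == spec)
            .map(|&(_, class)| class)
    });
    match (found.next(), found.next()) {
        (first, None) => Ok(first),
        (Some(a), Some(b)) => Err(format!(
            "multiple storage classes: {:?} and {:?}",
            a, b
        )),
        // Slice iterators never yield an item after returning `None`.
        (None, Some(_)) => unreachable!(),
    }
}
```

The filter_map replaces the nested "for each table entry, check whether the specifier is present" bookkeeping, and pulling at most two items makes the "more than one storage class" check explicit.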

None => ctype = Some(Type::Int(signed)),
}
}
// i;
Collaborator

Can you give this comment some context?

Owner Author

i; (in a scope where i has not previously been declared) declares a new variable called i with type int. This 'feature' was removed in C99 but is still common in real-world code.

http://port70.net/~nsz/c/c99/n1256.html#Forewordp5

Owner Author

The bit that allows this in C89 is 3.5.2:

  • int, signed, signed int, or no type specifiers

See https://stackoverflow.com/questions/26488502/which-section-in-c89-standard-allows-the-implicit-int-rule
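
As a minimal sketch of that rule, assuming a much-simplified specifier list: the Specifier enum and resolve function are hypothetical, and only Type::Int(signed) mirrors the snippet above. The sketch does no validation of conflicting specifiers.

```rust
// Hypothetical, simplified specifier and type enums, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Specifier {
    Int,
    Signed,
    Unsigned,
    Double,
}

#[derive(Debug, PartialEq)]
enum Type {
    /// `true` means signed, as in `Type::Int(signed)` above.
    Int(bool),
    Double,
}

/// Resolve a list of type specifiers to a type.
/// An empty list falls through to `int`, which is the C89 implicit-int rule.
fn resolve(specifiers: &[Specifier]) -> Type {
    let signed = !specifiers.contains(&Specifier::Unsigned);
    if specifiers.contains(&Specifier::Double) {
        Type::Double
    } else {
        // Covers `int`, `signed`, `signed int`, `unsigned`, and no specifiers at all.
        Type::Int(signed)
    }
}
```

So a bare `i;` reaches the analyzer with an empty specifier list and comes out as a signed int, the same as `int i;` would.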

@jyn514 jyn514 merged commit baeceb2 into master May 1, 2020
@jyn514 jyn514 deleted the pratt-parsing branch May 1, 2020 20:00