Skip to content
This repository has been archived by the owner on Jan 6, 2021. It is now read-only.

Streaming parser #37

Merged
merged 22 commits into from
May 28, 2017
Merged

Streaming parser #37

merged 22 commits into from
May 28, 2017

Conversation

daveyarwood
Copy link
Member

@daveyarwood daveyarwood commented May 10, 2017

This is a rewrite of the Alda parser from the ground-up, replacing the current parser-generator (Instaparse) implementation with an asynchronous pipeline using core.async.

The motivation for the rewrite is described in detail in #20. Parsing is currently the bottleneck when parsing large scores -- it can often take seconds, depending on the size of the score. This rewrite speeds up parsing significantly for larger scores, while remaining about the same (just slightly faster) for scores under about 100 lines, like most of our example scores -- see the benchmarks below.

The idea is that we can start to construct a score in another thread as soon as the first token is parsed. There are actually several threads going, each performing a separate stage of the parsing process in parallel. The result is much faster parsing times (depending on the size of the score) and a foundation for doing more interesting things with streaming in the future.

Benchmarks (parsing the example scores):

The metric here is the time it takes to parse an input file as events and then use the events to build a score. The "before" numbers are on the master branch, and the "after" numbers are on this branch.

across_the_sea.alda
  before: 42 ms
  after: 43 ms

awobmolg.alda
  before: 296 ms
  after: 180 ms

bach_cello_suite_no_1.alda
  before: 593 ms
  after: 368 ms

clapping_music.alda
  before: 186 ms
  after: 191 ms

debussy_quartet.alda
  before: 316 ms
  after: 267 ms

entropy.alda
  before: 105 ms
  after: 87 ms

gau.alda
  before: 66 ms
  after: 90 ms

hello_world.alda
  before: 4 ms
  after: 9 ms

key_signature.alda
  before: 37 ms
  after: 21 ms

multi-poly.alda
  before: 21 ms
  after: 12 ms

nesting.alda
  before: 29 ms
  after: 34 ms

overriding-a-global-attribute.alda
  before: 31 ms
  after: 20 ms

panning.alda
  before: 42 ms
  after: 49 ms

percussion.alda
  before: 42 ms
  after: 134 ms

phase.alda
  before: 34 ms
  after: 42 ms

poly.alda
  before: 291 ms
  after: 192 ms

printing.alda
  before: 21 ms
  after: 34 ms

variables.alda
  before: 96 ms
  after: 78 ms

variables-2.alda
  before: 12 ms
  after: 21 ms

As a final benchmark to test a very long score, I edited bach_cello_suite_no_1.alda by copying the entire file and pasting it to the end of the file about 15 times, the result being a score that is 15x the length of that example (just under 1000 lines of code):

Before: 8746 ms
After: 4709 ms

Note that this is only implements the "parse -> eval" piece of a fully streaming parse/play pipeline. I found that just by rewriting the parser by hand instead of using a parser generator library, parsing is significantly faster, but we still must wait for the score to be fully evaluated before we start to play it.

There is room for more improvement in the future -- we could start to play notes as soon as they are generated during the score-building phase. In the case of the 1000 line example above, this would take the 4709 ms wait before you start to hear any notes down to almost 0 ms! The score would still take almost 5 seconds to fully parse, but that would be happening in the background as you're listening to the score. I'm really excited about that potential, and I think this PR paves the way for us to get there in the near future.

Simplifying this a little bit: I don't think we really need to be able to
"parse with context"; we only need to be able to parse input as a sequence of
events, instead of a score map.

Ramifications:

- We are no longer in the business of generating alda.lisp code. We could add
  this back in the future, but I think it's extraneous since that's no longer an
  intermediate step in the parsing pipeline.

  (It actually hasn't been a step in the parsing process for some time now, and
  I don't consider it an essential feature of Alda.)

- Not generating alda.lisp code makes testing the parser a little trickier,
  since not all events have good equality semantics out-of-the-box.

  For example, when a note event is generated by the parser, it includes an
  anonymous "pitch-fn" which is not considered equal (by Clojure standards) to
  the pitch-fn of another note, even if the note has the same pitch.

  I am taking this as an opportunity to re-implement certain events in ways that
  will give them proper equality semantics.

  Then testing the parser is just a matter of parsing input and comparing the
  actual sequence of events vs. an expected sequence of events that can be
  obtained by writing alda.lisp code in the test namespaces. This is nice
  because the tests still read just as well, but we're testing the equality of
  events instead of the equality of generated Clojure code which can always
  change arbitrarily.

- The Alda REPL will need to be tweaked to use parse-input instead of
  parse-to-*-with-context, and also we won't know the context anymore. I think
  this is OK, because the only context we've ever needed so far in the REPL has
  been the current instruments, and those are easily obtained from the score
  map.

  My original idea (and the inspiration behind parse-to-*-with-context) was for
  the parser to give a lot of detail about where exactly we are in a score, e.g.
  are we in the middle of a voice? inside of an event sequence? in an instrument
  part? etc. But in practice, we haven't needed to use any of that context at
  all so far, and interestingly, we started adding more context to the score map
  itself, so I think even if we DID start to need that context in the REPL, we
  could just get it from the score map.

  So, this is a nice opportunity to clean up the code by removing complicated
  WIP features that we aren't using.
Conceptually simpler, and allows for notes to be equal in Clojure's eyes.
- dissoc :chord? from notes so that they can be equal to notes created via
  alda.lisp

- don't flush the buffer on attribute changes
The parser now emits "empty" parts and voices, which have the same effect as
before because they "set" the part/voice to use for the subsequent events.
@daveyarwood
Copy link
Member Author

An added benefit of rewriting the parser from the ground up is that it fixes issue #12. Line and column numbers are propagated through correctly this way, and error messages are clearer and more precise.

@daveyarwood
Copy link
Member Author

Trying this locally. Unfortunately, I'm hearing a lot of issues where audio playback will intermittently hang onto a note for a couple seconds or so before moving on. All the notes are in sync, it's just like the score pauses for a second before playback continues.

It seems like this is a weird interaction between JSyn and core.async.

Otherwise (e.g. if one of the events is an error), I think the channel hangs
around in memory forever waiting to continue being consumed.
@daveyarwood
Copy link
Member Author

(Note: still a problem after that last commit.)

@daveyarwood daveyarwood added this to the 2.0.0 milestone May 14, 2017
@daveyarwood
Copy link
Member Author

Good news! The problem only seems to happen when running in development mode via Boot. I think Boot imposes a good deal of overhead, so I think adding core.async to the mix might have been the straw that broke the camel's back.

I did a build of Alda using this branch, though, and ran it as an executable jar file, and the problem doesn't exist. Also, parsing is crazy fast. Around 100ms to parse the Bach cello suite example!

I feel good about including this in 1.0.0.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant