Switch to Pandoc (commonmark-hs) #137

srid · 2020-04-25T01:41:50Z

Requests for Pandoc support have come up a few times.

Let's switch to ~~Pandoc~~ commonmark-hs (which Pandoc will eventually be using as its markdown parser).

In order to migrate away from MMark to Pandoc, we will have to rewrite the replaceLink extension, which I've refactored to be general and small enough:

neuron/src/Text/MMark/Extension/ReplaceLink.hs

Lines 16 to 25 in fdf18dd

    
    -- | MMark extension to replace links with some HTML.
    replaceLink :: Map MarkdownLink (Html ()) -> Extension
    replaceLink linkMap =
      Ext.inlineRender $ \f -> \case
        inline@(Ext.Link inner uri _title) ->
          MarkdownLink (Ext.asPlainText inner) uri
            & flip lookup linkMap
            & fromMaybe (f inline)
        inline ->
          f inline

Essentially, the extension takes a Map of links, and for each link found in that Map it renders the associated HTML in the final output (the Map is computed ahead of time by neuron). Links not in the Map are rendered as usual.
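A minimal sketch of that lookup-with-fallback idea, independent of any Markdown library. The `Inline` and `Html` types below are toy stand-ins invented for illustration (they are not MMark's, commonmark-hs's, or Pandoc's real types), and `renderDefault` and `renderWith` are hypothetical names:

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (fromMaybe)

-- Toy inline AST (hypothetical; real parsers have far richer types).
data Inline
  = Plain String
  | Link String String -- link text, URI
  deriving (Eq, Show)

-- Lookup key for a link: its plain text and its URI.
data MarkdownLink = MarkdownLink String String
  deriving (Eq, Ord, Show)

-- HTML is modelled as a plain String to keep the sketch self-contained.
type Html = String

-- Default rendering, used for anything without an entry in the map.
renderDefault :: Inline -> Html
renderDefault (Plain s) = s
renderDefault (Link txt uri) = "<a href=\"" ++ uri ++ "\">" ++ txt ++ "</a>"

-- Render an inline, preferring the precomputed HTML for known links.
renderWith :: Map.Map MarkdownLink Html -> Inline -> Html
renderWith linkMap inline@(Link txt uri) =
  fromMaybe (renderDefault inline) (Map.lookup (MarkdownLink txt uri) linkMap)
renderWith _ inline = renderDefault inline
```

Whatever parser replaces MMark, the port mostly needs a hook where a function shaped like `renderWith` can intercept links before the default renderer runs.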

The following should continue to work:

- z:, zquery:, <ID>, etc. links
- Markdown YAML metadata (for title, date and tags)


srid · 2020-04-27T00:48:05Z

According to jgm/commonmark-hs#1, Pandoc will eventually use commonmark-hs for parsing Markdown, and will thus be less buggy and more performant. That it is a pure Haskell parser is another advantage I find relevant (for GHCJS support).

I'm inclined towards going with commonmark-hs at this point.

Nadrieril · 2020-04-28T19:18:52Z

What's the goal in making the switch? I understood we wanted to open the possibility of using Pandoc features like citations or image properties, in which case we'd want to use Pandoc itself rather than commonmark-hs, right?

srid · 2020-04-28T20:35:33Z

> What's the goal in making the switch? I understood we wanted to open the possibility of using Pandoc features like citations or image properties,

That's correct; however that cannot be at the expense of dropping GHCJS support (I have another project in mind that will need this) or adding a buggy and less performant parser (see the link above). Fortunately, commonmark-hs will eventually be used as the parser in Pandoc; so by porting to commonmark-hs we will eventually be supporting Pandoc.

As for things like citations or image properties, I imagine they will eventually get the corresponding commonmark extension ported (there are already some extensions here, of which fenced_divs seems relevant to image properties today); or we can write one ourselves, as commonmark's extension mechanism is more powerful than mmark's (the latter can only customize HTML output, whereas with the former you can write custom inline/block parsers).

Nadrieril · 2020-04-29T15:24:38Z

Ok, if I understand correctly then, the move to commonmark-hs is mainly to have a more accepting parser and a more powerful extension mechanism. On top of that, we would be able to build some new features. Using the actual Pandoc AST and rendering code isn't planned for now.

In fact I don't think we care about Pandoc at all: as you say, commonmark-hs has a powerful extension mechanism and already supports quite a few extensions. If we ever want some Pandoc feature, commonmark-hs knows how to produce a Pandoc AST, so we don't even care whether Pandoc ever uses commonmark-hs as its main markdown parser.

Potential issue: it looks like commonmark-hs does not support YAML headers yet (jgm/commonmark-hs#17).
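One possible workaround, sketched here purely as an assumption (the thread does not say how neuron handles this), is to split the leading YAML block off before handing the rest to the Markdown parser. `splitFrontMatter` is a hypothetical helper; it only recognises a block that starts on the first line and is delimited by bare `---` lines:

```haskell
-- Split a leading YAML block (delimited by "---" lines) from the body.
-- Returns the raw YAML text, if any, and the remaining Markdown.
splitFrontMatter :: String -> (Maybe String, String)
splitFrontMatter input =
  case lines input of
    ("---" : rest) ->
      case break (== "---") rest of
        -- Found the closing delimiter: everything before it is YAML.
        (yaml, _closing : body) -> (Just (unlines yaml), unlines body)
        -- No closing delimiter: treat the whole input as body.
        _ -> (Nothing, input)
    _ -> (Nothing, input)
```

The YAML text would still need a real YAML parser to extract title, date and tags; this sketch only separates the two parts.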

Note that fenced_divs is not what we want for images; we want https://github.com/jgm/commonmark-hs/blob/master/commonmark-extensions/test/attributes.md.

michelk · 2020-05-05T07:54:28Z

But on the other hand, if we used pandoc directly (+ lua-filters), we could also support some other popular file formats like org-mode or rst, which would come in handy in some situations.

For example, I like being able to evaluate code blocks in Emacs org-mode:

#+TITLE: bla

#+NAME: someTable
#+BEGIN_SRC R :exports results :colnames yes
  data.frame(uno = c(1,2,3,4), dos = c('a', 'b', 'c', 'd'))
#+END_SRC

#+RESULTS: someTable
| uno | dos |
|-----+-----|
|   1 | a   |
|   2 | b   |
|   3 | c   |
|   4 | d   |

maralorn · 2020-05-07T15:16:43Z

I am also strongly in favor of using pandoc for the markdown.

pandoc is a very viable option for writing complete papers, theses or even books. Not being able to reuse my multi-line formulas because they are wrapped as mathjax code blocks is kind of a killer for using neuron.

srid · 2020-05-07T20:38:03Z

@maralorn @michelk See here; the current Pandoc parser is infeasible because I need the neuron Markdown parser to work in GHCJS for other projects of mine.

Fortunately though, as @Nadrieril expressed here, commonmark-hs does provide a way to parse directly into the Pandoc AST. So any code that operates on the AST could immediately be supported, with extensions written for code operating on pre-parsed text. Pandoc's author explains the overall migration plan here:

> The first step would be to use this library, instead of cmark-gfm, for
> pandoc's 'commonmark' input, and gradually add extensions until
> most of pandoc's major markdown extensions are supported. At that point
> we might consider making 'commonmark' the default input format for
> pandoc, instead of 'markdown'.

Nadrieril · 2020-05-08T00:26:09Z

A possible solution for people who want to write their zettels in org-mode or use citations or whatnot would be to make neuron more library-like, so that people can reuse the cool bits but e.g. provide their own code for getting input data. If neuron uses the Pandoc AST internally, it would be quite flexible. Somewhat like a more opinionated rib, maybe.

michelk · 2020-05-08T07:28:47Z

@srid you mentioned GHCJS. I also have a related project in mind:

What about a flashcard system, similar to Anki, where the title goes on the front and the body on the back?

Just an idea...

michelk · 2020-05-08T07:30:33Z

I read something similar here.

srid · 2020-05-08T15:47:45Z

@Nadrieril That's an interesting idea, one that is worth exploring I think; but we can do this in the neuron executable (without opening up the library-based hydra). I'm currently playing with commonmark-hs and am keeping this in mind in the background.

Discussing with @felko over at zulip chat ... my thoughts are: if we support multiple markup formats, then commonmark would be one of them, to be used by *.md files. Zettels written in *.org would use Pandoc's org-mode parser, *.pmd could use Pandoc's native markdown parser, and so forth. What's common between them is the Pandoc AST, which means all of neuron's link/query processing would operate directly on the Pandoc AST, without requiring the user to use a particular source markup (they could use whatever format, as long as it can be converted to Pandoc's AST). There may be unforeseen issues; for example, not all Pandoc source readers support the YAML block, which we use for title, date and tags.
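The extension-based dispatch described above could look roughly like this. It is only a sketch: the `Markup` type and `markupFor` are hypothetical names, and the actual readers (commonmark-hs, Pandoc's org and markdown readers) are deliberately left out, since each would simply be a function from source text to the shared Pandoc AST:

```haskell
import Data.List (isSuffixOf)

-- Source formats mentioned in the discussion (hypothetical type).
data Markup
  = CommonMark     -- *.md, parsed with commonmark-hs
  | OrgMode        -- *.org, parsed with Pandoc's org-mode reader
  | PandocMarkdown -- *.pmd, parsed with Pandoc's native markdown reader
  deriving (Eq, Show)

-- Pick the markup for a zettel file from its extension.
-- Returns Nothing for files neuron would not treat as zettels.
markupFor :: FilePath -> Maybe Markup
markupFor path
  | ".md" `isSuffixOf` path = Just CommonMark
  | ".org" `isSuffixOf` path = Just OrgMode
  | ".pmd" `isSuffixOf` path = Just PandocMarkdown
  | otherwise = Nothing
```

For example, `markupFor "2011401.org"` yields `Just OrgMode`, after which only the choice of reader differs; link and query processing would stay the same for every format.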

@michelk Yup, a flashcard app is something I wanted to write myself, using reflex.

srid · 2020-05-10T23:49:21Z

The Pandoc AST feature branch (#166) is ready to use if anybody is interested in testing. I'll merge it to master soonish.

It parses text using commonmark, but uses the pandoc AST (thus neuron can potentially support other formats).

michelk · 2020-05-12T10:50:39Z

Thank you. I only had to change math-blocks from

    ```mathjax
    tt
    ```

to

    $$
    tt
    $$

and from

`$tt$` to $tt$

and we don't need to escape [ and ] anymore.
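For anyone with many zettels to migrate, the block-level change could be automated along these lines. This is a hedged sketch, not a neuron tool: `convertMathFences` is a hypothetical helper that works on a file's lines, rewrites only fences opened with exactly `mathjax`, drops any indentation on the fence lines themselves, and does not touch the inline `$tt$` case:

```haskell
import Data.Char (isSpace)

-- Rewrite ```mathjax ... ``` fenced blocks as $$ ... $$ display math.
-- Other fenced blocks (e.g. ```haskell) are left untouched.
convertMathFences :: [String] -> [String]
convertMathFences = go False
  where
    -- The Bool tracks whether we are inside a mathjax fence.
    go _ [] = []
    go inMath (l : ls)
      | not inMath && trim l == "```mathjax" = "$$" : go True ls
      | inMath && trim l == "```" = "$$" : go False ls
      | otherwise = l : go inMath ls
    -- Strip leading and trailing whitespace.
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse
```

It can be applied to a whole file with something like `unlines . convertMathFences . lines`.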

Thanks a lot.

maralorn · 2020-05-12T11:04:13Z

That is exactly how it should be! So great!

srid · 2020-05-12T16:38:32Z

Cool!

I'm gonna finish #172 (which switches to reflex-dom; but I'll make sure to test that mathjax/etc. continue to render as before) before merging all of this to master.

srid · 2020-05-13T14:45:13Z

This is now merged to master.

Note that as of #172 neuron uses reflex-dom (not pandoc's native HTML writer) to render the HTML from the AST. This README section contains instructions on how to hack on the renderer, for those interested in improving it.

srid · 2020-05-13T14:48:43Z

Oh, also note that installation instructions have changed. In particular, you would need to run this command:

    nix-env -if https://github.com/srid/neuron/archive/master.tar.gz

You can simply run this again, in order to upgrade your current install. Just make sure that you are still using the cache.

srid added the "help wanted" label Apr 25, 2020

srid mentioned this issue Apr 25, 2020

Feature Request: Support for citations #134

Closed

srid pinned this issue Apr 25, 2020

felko mentioned this issue Apr 26, 2020

org-roam export felko/neuron-mode#18

Closed

srid removed the "help wanted" label Apr 27, 2020

srid self-assigned this Apr 27, 2020

srid changed the title ~~Switch to Pandoc~~ Switch to Pandoc (commonmark-hs) Apr 27, 2020

srid removed their assignment May 1, 2020

srid mentioned this issue May 9, 2020

MMark -> Pandoc AST #166

Merged

11 tasks

srid closed this as completed in #166 May 13, 2020

srid unpinned this issue May 13, 2020
