Better engine than pandoc/gfm? #5

Open
fluffy-critter opened this issue Apr 28, 2019 · 4 comments

@fluffy-critter
Contributor

Pandoc's gfm backend produces markdown like:

> For folks who were following me on Patreon and don’t have an RSS
> reader, here are some alternate ways of following me:
> 
>   - All my stuff gets automatically posted to
>     [Twitter](http://beesbuzz.biz/twitter),
>     [Tumblr](http://beesbuzz.biz/tumblr), and
>     [Mastodon](http://beesbuzz.biz/mastodon), although that’s not
>     ideal because updates are really easy to miss on those places
>   - You can use [IFTTT](http://ifttt.com) or
>     [Blogtrottr](https://blogtrottr.com) to get posts delivered by
>     email (here’s [a tutorial on
>     IFTTT](https://www.chronicle.com/blogs/profhacker/send-an-rss-feed-to-your-email-account/50319))
>   - There’s also the `#site-updates` channel on [my
>     Discord](http://beesbuzz.biz/discord) (which is also a fun place
>     to hang out anyway)

which formats like

For folks who were following me on Patreon and don’t have an RSS
reader, here are some alternate ways of following me:

  • All my stuff gets automatically posted to
    Twitter,
    Tumblr, and
    Mastodon, although that’s not
    ideal because updates are really easy to miss on those places
  • You can use IFTTT or
    Blogtrottr to get posts delivered by
    email (here’s a tutorial on
    IFTTT
    )
  • There’s also the #site-updates channel on my
    Discord
    (which is also a fun place
    to hang out anyway)

(from this entry).

html2text might be better, but that loses the ability to support other output formats. There might also be some better Pandoc configurations that could be used.

@fluffy-critter
Contributor Author

html2text's output isn't great either:

[fluffy](http://beesbuzz.biz/):
[Reblob!](http://beesbuzz.biz/blog/5385-Reblob):

> [Reblob!](http://publ.beesbuzz.biz/blog/179-Reblob):

>

>> It’s been a while since I’ve worked on IndieWeb stuff, but I finally got
around to releasing an _extremely preliminary_ version of
[reblob](http://publ.beesbuzz.biz/tools/1423-reblob), a little commandline
thingus to make this stuff easier. Eventually I’ll also have a server-based
version here, at least as an example.

>

> Of course this is the first entry I’ve written actually _using_ it. Lots of
rough edges but whatever!

which renders as:

fluffy:
Reblob!:

Reblob!:

It’s been a while since I’ve worked on IndieWeb stuff, but I finally got
around to releasing an extremely preliminary version of
reblob, a little commandline
thingus to make this stuff easier. Eventually I’ll also have a server-based
version here, at least as an example.

Of course this is the first entry I’ve written actually using it. Lots of
rough edges but whatever!

@tarleb

tarleb commented May 2, 2019

Found this through your tweet. There might be a way to use one of pandoc's many customization options to fix this. E.g., you could try to remove soft line-breaks by using a pandoc filter:

function SoftBreak ()
  return pandoc.Space() -- replace soft linebreak with a space
end

To use it, call pandoc with pandoc --lua-filter=path/to/that/filter-file.lua …, or check whether the --wrap=none option does what you want. Does this help?
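
For anyone driving pandoc from Python, both suggestions can be passed through as extra command-line arguments. This is only a minimal sketch, assuming pypandoc is the wrapper in use; `pandoc_extra_args` and `html_to_gfm` are hypothetical helper names for illustration, not anything that exists in reblob:

```python
from typing import List, Optional


def pandoc_extra_args(wrap: str = "none", lua_filter: Optional[str] = None) -> List[str]:
    """Build the extra pandoc CLI arguments suggested in this comment."""
    # --wrap=none asks pandoc not to hard-wrap the text it writes out.
    args = [f"--wrap={wrap}"]
    if lua_filter is not None:
        # Optional Lua filter, e.g. the SoftBreak replacement shown above.
        args.append(f"--lua-filter={lua_filter}")
    return args


def html_to_gfm(html: str, lua_filter: Optional[str] = None) -> str:
    """Convert HTML to GitHub-flavored Markdown (requires pandoc and pypandoc)."""
    import pypandoc  # imported lazily so the argument builder works without it

    return pypandoc.convert_text(
        html,
        "gfm",
        format="html",
        extra_args=pandoc_extra_args(lua_filter=lua_filter),
    )
```

Keeping the argument list in its own function means it can be inspected or logged without pandoc being installed.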

@fluffy-critter
Contributor Author

@tarleb Not particularly; the way that pandoc works through pypandoc makes that incredibly unwieldy. There's also no reason to do that in a pandoc filter: see the branch https://github.com/PlaidWeb/reblob/tree/feature/5-trim-end-whitespace for a simple fix on the Python side.

But even with that, there's a lot of stuff pypandoc does poorly that can't be easily addressed by setting markdown plugins either. The Mastodon version of the thread goes into more detail about that.
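
For illustration, a Python-side cleanup of the kind the branch name suggests might look like the following. This is a guess at the general idea, not the actual code on that branch, and `trim_end_whitespace` is a hypothetical name:

```python
def trim_end_whitespace(markdown: str) -> str:
    """Strip trailing whitespace from every line and drop trailing blank lines."""
    lines = [line.rstrip() for line in markdown.splitlines()]
    # Remove empty lines left over at the very end of the document.
    while lines and not lines[-1]:
        lines.pop()
    # Re-join with a single terminating newline (or nothing for empty input).
    return "\n".join(lines) + "\n" if lines else ""
```

One caveat: in Markdown, two or more trailing spaces mark a hard line break, so this also removes any hard breaks written that way. That may be exactly what's wanted for converter output, but it is a lossy step.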

@fluffy-critter
Contributor Author

There are also a bunch of other reasons I want to get off pandoc: the Python bindings to it make a lot of assumptions about the environment that won't work for one of my intended future use cases, and it's just, like, not very well-controlled in general.

I can also think of a fairly straightforward way to convert HTML to Markdown in a way that will also allow me to put in Publ-markdown extensions. I was hoping reblob would also be able to support things like reStructuredText for folks who use that on their blog engine, though.
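
As a rough idea of what a hand-rolled converter could look like, here is a sketch built on the standard library's `html.parser`. It is an assumption about the general approach, not the design described above; `MarkdownSketch` and `html_to_markdown` are hypothetical names, and only a handful of tags are handled:

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Tiny HTML-to-Markdown converter covering a handful of tags."""

    INLINE = {"em": "*", "i": "*", "strong": "**", "b": "**", "code": "`"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []      # finished blocks (paragraphs, list items)
        self.current = []     # inline pieces of the block being built
        self.quote_depth = 0  # nesting level of <blockquote>
        self.hrefs = []       # link targets for currently open <a> tags

    def flush(self, prefix=""):
        # Collapse whitespace runs so source line breaks never leak through.
        text = " ".join("".join(self.current).split())
        self.current = []
        if text:
            self.blocks.append("> " * self.quote_depth + prefix + text)

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.current.append(self.INLINE[tag])
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.current.append("[")
        elif tag in ("p", "li", "blockquote"):
            self.flush()
            if tag == "blockquote":
                self.quote_depth += 1

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.current.append(self.INLINE[tag])
        elif tag == "a" and self.hrefs:
            self.current.append(f"]({self.hrefs.pop()})")
        elif tag == "li":
            self.flush(prefix="- ")
        elif tag in ("p", "blockquote"):
            self.flush()
            if tag == "blockquote":
                self.quote_depth = max(0, self.quote_depth - 1)

    def handle_data(self, data):
        self.current.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any text that was not inside a block element
    return "\n\n".join(parser.blocks) + "\n"
```

Because each block ends up on a single line, the soft-wrap problem from the pandoc output can't occur. The sketch has known gaps: every block is separated by a blank line, so lists come out loose and multi-paragraph blockquotes are split into separate quotes. Engine-specific extensions, or a different output format, would go in the tag handlers.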
