Skip to content
This repository has been archived by the owner on Oct 18, 2024. It is now read-only.

Use joplin-turndown for HTML->MD conversion #38

Merged
merged 8 commits into from
Feb 20, 2021

Conversation

nathanlesage
Copy link
Contributor

@nathanlesage nathanlesage commented Nov 15, 2020

Hey there!

First and foremost: thank you for the initiative of this plugin. I've been pondering the idea to do so myself for quite some time, but never actually realised it. However, now I'm pretty happy for your work already with this plugin, which I'll be using quite often!

In your contributing Guidelines you mention to first open an issue to discuss possible changes but, alas, as always I didn't read the manual and already implemented the changes – sorry for that :/

I decided to create this PR either way and accomodate it with a corresponding issue, but I am absolutely not sad if you don't want to be put in front of already done work (I'm thinking of "vor vollendete Tatsachen stellen", but I can't quite get it right in English) – so see it as an issue that you can close at any time if this doesn't suit you!

This PR addresses two things, but there are also some open questions (see below), which is why I have created a draft PR first. I wanted to discuss these beforehand, and only convert it to a final PR once you agree!

Changes

This PR addresses two main issues:

  1. You noted in your comments that you need a sanitization for filenames. I have added such a function. However, there might even be a better way (see Open Questions below)
  2. I switched the raw HTML-conversion to Joplin Turndown, which is a package I've been using for years now in Zettlr, and it's working great.

The reason I switched to Turndown is because I also use Zotfile to extract my attachments, but have amended the preferences of Zotfile so that it actually spits out well-formed HTML code instead of the paragraphs with quotes. So when I first used the plugin, this gave me weird results. By using Turndown, we have both a solid engine for converting HTML to Markdown as well as something complete, so that users can actually create standalone notes and the plugin will correctly export anything that can be expressed in Markdown.

This initiative brings some important changes to how the plugin is built:

  1. For simplicity reasons I added a new Object, Zotero.MarkdownUtils, because I did not want to mess with the Mdnotes-object. But I would like to add the utility functions to the Mdnotes object itself, however this would require to have the mdnotes.js-file in the webpack toolchain (see Open Questions).
  2. Now, there is a ./src-directory in which the entry point of the Markdown utilities resides
  3. These are built by running webpack via yarn and compiled into a target file in the ./content-directory
  4. I have already adapted the GitHub Actions workflow so that this should run perfectly out of the box and compile the files correctly. (The Makefile contains an additional yarn build-command)

Caveats

I saw that your original implementation converted paragraphs that begin with a quote into blockquotes. This pretty much looks like a conversion of Zotfile output. However, this has some assumptions, and I'm personally wary of making them with regard to notes. I would rather not add this. But I can totally see the reason for adding this in the first place, as it takes away the responsibility of the users to mess with the hidden config options, which I can totally understand.

If you would like to retain this functionality, it is easier to apply this to the rendered Markdown (because we could simply run a search for lines beginning with " and prepend > in front of them). Nevertheless, I would then make this an option (default set to yes) whether or not the user wants this additional conversion applied.

Open Questions

I have some open questions, and as I would like to contribute to this plugin, I would like to discuss them with you, as this is first and foremost your plugin, so I'll go into your direction if you don't like certain approaches!

  1. I have realized that having webpack produce a single-file bundle is actually a pretty decent approach, because this enables us to enhance the plugin by quite a margin:
    • We could split all the functions up, making the code much more maintainable
    • Additionally, we could only export the MDNotes-object in the entry file, and have the logic reside in other files/folders.
    • We could use almost any NPM-package available, which'll help to minimise efforts and increase maintainability.
  2. With regard to sanitization I wrote a small function that simply replaces disallowed characters, but there's also a good NPM-package for this which I'm using myself, and it does a very good job at creating sane filenames which work across all platforms. But for this to work, we would need to move the mdnotes.js into the webpack toolchain. However, this shouldn't prove too difficult and I would volunteer to do the migratory work!
  3. If you're happy so far with my changes, one last thing would be the question of yarn or NPM. I'm personally a huge fan of yarn as it's much more simple and offers certain functions which make development more easier. But that's just my opinion and I know a lot of folks who prefer NPM. As this is your project, I can switch to NPM if you want this instead of yarn.

And who will maintain all of this?

The big question I'm facing with my projects as well: Who is going to maintain it? I would volunteer to do all the initial work in such a way that the code documentation is decent and everyone has an easy chance of getting into it, especially given the fact that the Zotero plugin documentation is pretty awful (which is one of the reasons I did not so far attempt a plugin, and am now bandwagoning on this one). I would also vow to be responsible to maintain those parts I wrote where you don't understand or don't have the time to read the source.


EDIT:

P.S.: Tested on macOS 10.15.7 and works like a charm :) (including the changes from #35 which you mentioned in the other issue)

P.P.S.: Greetings to the Hochschule Bonn/Rhein-Sieg and my hometown! ;)

@argenos
Copy link
Owner

argenos commented Nov 15, 2020

Hi @nathanlesage! First of all, let me say thanks for taking the time to contribute to this plugin, I really appreciate your help with this!

This pretty much looks like a conversion of Zotfile output. However, this has some assumptions, and I'm personally wary of making them with regard to notes. I would rather not add this.

I agree with this! I actually have been meaning to improve this but haven't had much time to refactor it. I really appreciate this and love the idea of making this more maintainable. As a "fix", in v0.1.0 I added a hidden option to let users choose/modify how to translate those elements, the main reason (also based on the zotero-roam-plugin) was that you could arbitrarily use certain types of highlights to add some structure for the extracted notes (e.g. underlining something would turn into H4 headings). I'm not familiar with joplin-turndown, but from your description, it sounds like this sort of customization would still be possible?

As for your open questions:

TL;DR: I'd be happy to accept all the help I can get here.

This is the first anything I do with Javascript, and it's been quite the learning curve. I've been trying to find ways to split the code into multiple files in a way Zotero could handle it, but haven't had much success with what I had tried so far (I was hoping to include this in v0.1.0 originally). The next thing to try on my todo list was migrating to typescript and using the generator written by retorquere which also uses webpack from what I could gather. Given that I'm still learning all these new tools, I was still doing research and testing out before I could book some time to work on it, so your PR couldn't have come at a better moment.

  1. +1000 to make this more maintainable and easier to contribute!
  2. I think this should be fine, I'm not aware of any limitations on Zotero related to this, but I'm definitely no expert.
  3. I'm relatively new to javascript/typescript, so I don't really have a preference yet. I did a little reading, and both seem to be almost equally good, with small differences here and there, so I'm ok with either.
  4. As for maintaining, since this is a bit of a side project that started more as a script for myself, I'm also starting to think about it. I would happily accept some help now and again for the parts you wrote, mostly for parts that I don't understand/can't fix (especially because I don't want to drive your attention away from the awesome work you're doing at Zettlr).

Once again, thank you! And greetings back to wherever you are in Germany!

@nathanlesage
Copy link
Contributor Author

Hey @argenos, thanks for the elaborate reply!

I'm not familiar with joplin-turndown, but from your description, it sounds like this sort of customization would still be possible?

Yes, there are three kinds of customizations we could easily add:

  1. Use options to control what markup is being generated (for instance, I forgot to implement the unordered list bullet option, which is implemented by the plugin and can be easily controlled with the bulletListMarker-option)
  2. New filters for Turndown itself (e.g. control what HTML tags are rendered how)
  3. Post-processing on the Markdown source (which, due to it not having tags, is much simpler and is mostly a matter of replace()ing using RegExps)

This is the first anything I do with Javascript, and it's been quite the learning curve.

I know exactly what you are talking about; I've had that feeling for the past three years, and it has only abated a few weeks ago, which is why I'm much more confident proposing that large of changes at all!

As to the answers to my questions: Splendid! And with regard to Typescript: I was pondering the thought myself, but didn't want to add too many things right now! But now that I know that in principle you'd be willing to migrate to typescript yourself, this is even better.

So here's what I would suggest:

  1. I'll be sitting down over the course of the next week migrating all the source into a webpack/Typescript chain, commenting everything and trying to use a good way of splitting up everything into reasonable portions which make sense (e.g. creating a util-directory for all utility functions) — I'm extremely fast with this kind of refactoring, because migrating JavaScript to TypeScript is what I have been doing the past two months non-stop with the Zettlr source
  2. Whenever I'm unsure with regard to how to proceed, I'll ask and we can decide jointly
  3. I will always pay maximum attention to maintainability so that it's not some black magic, especially with regard to some quirks of JavaScript which I have come to know in great detail but are obviously opaque to people who've not done so much in that regard.

And greetings back to wherever you are in Germany!

I'm in Sweden right now, but I'll be home for Christmas!

Thanks again for your work, and I'm looking forward to making this plugin awesome :)

@argenos
Copy link
Owner

argenos commented Nov 15, 2020

Sounds like a plan! I'll set things up to test your branch locally too! Thank you!

@nathanlesage
Copy link
Contributor Author

Perfect 👍

Quick update: I just fiddled around with it and TypeScript works like a charm. But I'll do it properly in time :)

@argenos
Copy link
Owner

argenos commented Nov 16, 2020

Hi @nathanlesage! I had to push a few small bug fixes today, but I'm doing a code freeze now. Let me know if I can help with anything!

@nathanlesage
Copy link
Contributor Author

Easy, I'll just pull the current state as I'm not touching the mdnotes.js file either way, so you can continue amending this file, and I'll deal with any merge conflicts :) I'll be updating this PR in a few days I think when I have a first draft of the changes – this way you can follow the developments if you want!

@argenos
Copy link
Owner

argenos commented Feb 18, 2021

Hey @nathanlesage! Just a small update: I already had this working locally some weeks ago, but hadn't had time to play around with this further. I've just pushed your changes to feature/joplin-turndown (and solved the merge conflicts with the past few releases) and added a few rules for the missing Zotero note exports. What do you think about merging what we have so far? Let me know in case you want to take a look!

@argenos argenos added this to the 0.2.0 milestone Feb 18, 2021
@nathanlesage
Copy link
Contributor Author

Hey, sounds good! I would also like to apologise for not being able to further this even though I promised that, but in the End of January a course at my university started and it's taking almost all of my time :( Do I need to do something for this …? I'm sorry I don't have an overview right now …

@argenos
Copy link
Owner

argenos commented Feb 20, 2021

Hey, no worries at all! I also hadn't had much time to further work on this, and you already did a lot with the setup and your branch set me off on the right track! No need to do anything for now I think, I can just merge the branch and that would be it! We can tackle typescript on a separate PR on a semester break later on.

Good luck with your course!

@argenos argenos marked this pull request as ready for review February 20, 2021 19:33
@argenos argenos changed the base branch from master to develop February 20, 2021 19:34
@argenos argenos merged commit c7dd540 into argenos:develop Feb 20, 2021
@nathanlesage
Copy link
Contributor Author

Perfect! :) Thanks a lot! I hope I can get in here again and help out soon!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants