[Bug] Conversion of HTML manual pages to markdown fails for HTML figure code #4864
Comments
Or, if we want to keep things moving, add an exclusion for now. Is there a pattern that could be used, or would that be impossible? It's OK if they are not perfect on the first try. |
Test submission of conversion of all HTML manual pages to markdown using the `pandoc` based converter script (see OSGeo#4620). For figure code conversion issues, see OSGeo#4864
For easier inspection, converted MD files submitted in #4865. |
Maybe this python library by Microsoft could be worth a try: https://github.com/microsoft/markitdown ? |
I didn't know about this one :) |
I just tried the markitdown tool on v.fill.holes.html, and the result looks quite OK. Images are bigger compared to the pandoc conversion. However, pymarkdownlnt and markdownlint-cli, for example, complain about line length and missing blank lines (among others). Also, code blocks are not automatically marked as shell. So some post-processing would be needed there too. |
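To make the post-processing idea concrete, here is a minimal Python sketch. It assumes the converter emits fenced code blocks without a language tag and without surrounding blank lines, which is roughly what the markdownlint rules MD040 (fenced code language) and MD031 (blank lines around fences) complain about. The function name `postprocess` and the default language `sh` are placeholders chosen for illustration, not part of any existing script. Line length (MD013) and indented fences are deliberately not handled.

```python
import re

# An opening or closing fence: three or more backticks or tildes at line start.
FENCE = re.compile(r"^(`{3,}|~{3,})(.*)$")


def is_closing(match, opener):
    """A closing fence uses the opener's character, is at least as long, and has no info string."""
    marker, info = match.group(1), match.group(2).strip()
    return marker[0] == opener[0] and len(marker) >= len(opener) and not info


def postprocess(markdown, default_lang="sh"):
    """Label bare opening fences and pad fenced blocks with blank lines."""
    lines = markdown.splitlines()
    out = []
    opener = None  # fence marker of the block we are currently inside, if any
    for i, line in enumerate(lines):
        match = FENCE.match(line)
        if match and opener is None:
            marker, info = match.group(1), match.group(2).strip()
            if out and out[-1].strip():
                out.append("")  # blank line before the opening fence
            out.append(marker + (info or default_lang))
            opener = marker
        elif match and opener is not None and is_closing(match, opener):
            out.append(line)
            opener = None
            if i + 1 < len(lines) and lines[i + 1].strip():
                out.append("")  # blank line after the closing fence
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Blocks that already carry a language tag are left as they are, so the function could be run repeatedly over the same files without changing them further.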
I tried it as well, but no success with e.g. this file:
What's the trick, @ninsbl ? |
@ninsbl would you mind sharing the command you used? |
Describe the bug
I am working on the mass conversion of all HTML manual pages to markdown. To convert all HTML files to markdown I have written a `pandoc`-based converter script (see #4620) which already does most of the job.
A showstopper in the conversion of HTML manual pages to markdown are the figures: the related HTML snippets vary from manual page to manual page, even though there is a style recommendation.
For an easier discussion, I have moved the figure issue here to separate it out from #4748.
Many figures look ugly after MD conversion (the resulting MD code is partially garbage):
grass/vector/v.fill.holes/v.fill.holes.html
Line 13 in fc94e29
mkdocs/site/raster3dintro.html
I have written a Lua filter for `pandoc` (not yet submitted), but it can only convert that specific HTML code. With so many HTML variants I have no idea how to cover them all.
To reproduce
1. Run the `utils/grass_html2md.sh` converter script (see docs: script to convert HTML manual pages to markdown #4620)
2. Run `markdownlint` on the MD files
I tried to submit the converted MD files for community review but I get stuck in the `pre-commit` stage:
From my terminal:
Expected behavior
I wonder if we have to touch the ~170 HTML files manually to streamline the HTML figure code therein in order to eventually develop a single `pandoc` Lua filter.
Support welcome!
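Before deciding between manual edits and a single filter, it may help to count how many distinct figure markup variants actually exist. The following is only a sketch under assumptions: it uses Python's standard `html.parser` to record the chain of enclosing tags for every `<img>` element and tallies those chains. The names `FigureContexts` and `figure_variants` are hypothetical, and limiting the signature to the three innermost ancestors is an arbitrary choice.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not go on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class FigureContexts(HTMLParser):
    """Record the chain of enclosing tags for every <img> element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open (non-void) tags
        self.contexts = []   # one signature string per <img> found

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # Keep only the three innermost ancestors so signatures stay short.
            self.contexts.append(" > ".join(self.stack[-3:] + ["img"]))
        elif tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate unclosed tags (e.g. <p>) by popping back to the match.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def figure_variants(html_documents):
    """Count <img> context signatures across an iterable of HTML strings."""
    counts = Counter()
    for html in html_documents:
        parser = FigureContexts()
        parser.feed(html)
        parser.close()
        counts.update(parser.contexts)
    return counts
```

Feeding the contents of all manual pages to `figure_variants` and looking at `most_common()` would show whether a handful of patterns covers most pages, in which case only the outliers might need manual streamlining.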