-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Add helper to factor out newline + padding pattern #98
Conversation
3e18243
to
4b6b9a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this PR with one of the most considerate and mindful descriptions I have seen in a while!
Hey 👋, big fan and first-time contributor to this project🙂
🙏
I understand that round-tripping of Markdown is likely quite difficult in general, but I'm hoping that, over time, a few adjustments and additional configuration knobs on
pulldown-cmark-to-cmark
can make it tractable for certain use cases.
You have probably seen the markdown spec tests already which also use roundtripping as indicator of success (I think). Getting to 100% there would probably be meaningful to you.
Besides that, I noticed that pulldown-cmark
tends to degenerate information, so true round-tripping would only be possible if the whole input buffer is used along with the version of the function here that also provides indices into the input buffer (so skipped characters can be taken into consideration).
I'm also hoping to eventually refactor my
clap-markdown
crate to usepulldown-cmark-to-cmark
instead of bespoke (error prone), manual generation of a Markdown string.
It will be interesting to see if the typed
version of this will be better, in the sense that it prevents certain bugs that would have been invisible before.
The reason I am cautious here is that Events are assumed to be valid because pulldown-cmark
generated them, which is not true anymore if they are generated 'by hand'. Probably I am missing something though 😁.
src/text_modifications.rs
Outdated
/// | ||
/// Concretely, a call to [`write_padded_newline()`] after the first line in the | ||
/// paragraph of the list item would write `">···"`. | ||
pub fn write_padded_newline<F>(formatter: &mut F, state: &State<'_>) -> Result<(), fmt::Error> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite the nice docs, I'd lean towards making this private, unless you want to be its first user.
In any case, I think it's easier to read when going with formatter: &mut impl fmt::Write
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good; I erred on the side of matching the pub
of the other functions in this file, but I'll change this to pub(crate)
since this is merely a helper function :)
I've updated to use impl
trait for the argument 👍
If you'd like, I can make a small follow-up PR that changes the rest of the functions in text_modifications.rs
to use pub(crate)
as well, since the text_modifications
module itself is private, and these functions are never re-exported publicly (that I can tell). I also found it a bit unexpected that they were all pub
but not actually externally public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see: It wouldn't have been exported publicly despite the pub
. Something that isn't obvious in my armchair review, and that is nicely fixed with pub(crate)
- thanks!
When writing a newline into the generated Markdown content to start a new line of output, it is (almost) always necessary to output the "padding" characters used to indent the content at the current location in the document, based on the hierarchy of block-level elements the output cursor is "inside" of. Since writing a newline and writing the current padding are always paired, factoring them out into a function should help with readability and consistency in performing this minor two-step dance correctly.
4b6b9a3
to
2252ba1
Compare
Thank you for the speedy and engaged maintenance of this project, and keeping it working nicely all these years! 🙂
I'd noticed them, but not run them myself yet. For discussions sake, it might help to clarify that I'm interested in "byte-for-byte" round-tripping, whereas I understand many tests to only require "semantic" round tripping. To be concrete, these two Markdown programs are "semantically" the same (parse to the same > Foo
> Bar >
> Foo
> Bar You could correct for this perhaps by keeping track of offsets, but I'm aiming at a simpler solution: make
This matches my experience as well. As mentioned above, for my use case, I fortunately don't quite need to go that far :)
Ah I was a bit unclear: I won't be directly calling pulldown-cmark-to-cmark APIs and constructing the let block = Block::Paragraph(Inlines(vec![
Inline::Text("Hello! This is a paragraph ".to_owned()),
Inline::Strong(Inlines(vec![
Inline::Text("with bold text".to_owned()),
])),
Inline::Text(".".to_owned())
]));
println!("Markdown:\n\n {}", ast_to_markdown(&[block])); Internally, So the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could correct for this perhaps by keeping track of offsets, but I'm aiming at a simpler solution: make
pulldown-cmark-to-cmark
a little bit more configurable around "quirks" like the above, so that users can choose to always get Markdown back out in their preferred format (and matching any Markdown linting tool they / their company might be using).
I see, that makes sense and be much easier to implement.
Ah I was a bit unclear: I won't be directly calling pulldown-cmark-to-cmark APIs and constructing the
Event
s manually; I'll refactorclap-markdown
to usemarkdown-ast
, which will look something like:
That's more like it - nice!
So the
Event
stream thatpulldown-cmark-to-cmark
will be "correct by construction" becausemarkdown-ast
enforces that
Thanks for sharing, it's nice to see how this crate seems to fit in nicely at the end of that conversion pipeline!
If you'd like, I can make a small follow-up PR that changes the rest of the functions [..]
Thanks for offering. If you happen to keep contributing something else you could U-Boot that, but no need for a PR just for that.
Hey 👋, big fan and first-time contributor to this project🙂
I'm building a Markdown editor that incorporates
pulldown-cmark-to-cmark
, and thought it might be time that I try to contribute some effort back to this project (and perhaps improve on a few issues I've run into). I've found that a great way to learn a new codebase is to start with making some small improvements that don't try to change behavior, so that's what I'm starting with here 🙂Please let me know if you'd like anything changed with this patch, or if you'd prefer I start with some other kind of change.
If you happen to be interested in how I'm putting this project to use, click here to read a bit about that.
I use
pulldown-cmark-to-cmark
as part ofmarkdown-ast
, which supports convertingMarkdown → Events → an AST
(and back again). I've developing that project as part of a quest to build a "WYSISYG"-style editor for Markdown within Wolfram notebooks.Overall
pulldown-cmark-to-cmark
works great for that use case, but since my flow isFoo.md → <edit file> → Foo.md
, I care especially that the Markdown "round-trips" as well as possible.I understand that round-tripping of Markdown is likely quite difficult in general, but I'm hoping that, over time, a few adjustments and additional configuration knobs on
pulldown-cmark-to-cmark
can make it tractable for certain use cases.For example, if the
Foo.md
being round-tripped is restricted by Markdown linting that runs in CI (enforcing things like consistent use of '*' vs '-' for bulletted lists, exact numbers of blank lines between paragraphs, etc.), then it is much easier to imaginepulldown-cmark-to-cmark
generating output that conforms to particular, fixed rules.I'm also hoping to eventually refactor my
clap-markdown
crate to usepulldown-cmark-to-cmark
instead of bespoke (error prone), manual generation of a Markdown string.After writing a newline into the generated Markdown output to start a new line of output, it is (almost) always necessary to output the "padding" characters used to indent the content at the current location in the document.
Since writing a newline and writing the padding are always paired, factoring them out into a function should help with readability and consistency in performing this minor two-step dance correctly.