Documentation should cover access to raw AST #215
Comments
I see that the AST in this case is an XML string. Whatever; it would still be of immense use to the outside world. Let's fix that so that the XML AST can be accessed as easily as the final HTML result.
Yes, fix that please!
Yes, you're right, this should be documented. In fact, there are a few ways it could be done. You could create a subclass of the Markdown class and reimplement the set_output_format method. An even easier approach would be to create an extension which monkeypatches the class and assigns a new serializer (the Extension API gives you access to the entire class instance, so you can do things like this).

Actually, the AST is not an XML string but an ElementTree object. The serializers (Markdown ships with two) convert that to a string. The issue is that postprocessors do additional processing after the serializer runs, and those postprocessors assume HTML. Until a better way of parsing raw HTML is implemented, the postprocessors are necessary, but then again, raw HTML would need to be handled differently if the output format weren't HTML. That is why I suggest using an extension to monkeypatch the class: the same extension could replace the postprocessors to match the new output format.

Given the above complications, I haven't really documented it yet. It's not just a matter of passing in a new serializer. That said, I've just updated the docs to at least list the relevant class attributes, so anyone interested knows what part of the code to start looking at. Of course, that's not enough; I need to document everything in that list. I'll leave this issue open until that's done. Patches (pull requests) are welcome.
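For readers wondering what "a serializer converts the ElementTree object to a string" looks like in practice, here is a minimal sketch using only the standard library. It does not touch Python-Markdown itself: the tree is built by hand as a stand-in for a parsed document, and `build_sample_tree` and `to_nested` are hypothetical names invented for illustration. It only shows that a serializer is, in essence, a function that takes the root element and returns some other representation. It does not deal with the postprocessor and raw HTML complications described above.

```python
import xml.etree.ElementTree as ET


def build_sample_tree():
    """Hand-built stand-in for a parsed document (an assumption, not parser output)."""
    root = ET.Element("div")
    h1 = ET.SubElement(root, "h1")
    h1.text = "Title"
    p = ET.SubElement(root, "p")
    p.text = "Some "
    em = ET.SubElement(p, "em")
    em.text = "text"
    em.tail = "."  # text that follows </em> inside the paragraph
    return root


def to_nested(element):
    """Hypothetical non-HTML 'serializer': nested (tag, children) tuples.

    Text nodes stay plain Python strings, so a templating system could
    apply its own escaping when it renders them.
    """
    children = []
    if element.text:
        children.append(element.text)
    for child in element:
        children.append(to_nested(child))
        if child.tail:
            children.append(child.tail)
    return (element.tag, children)


tree = build_sample_tree()

# The usual route: serialize the tree to an HTML string.
html = ET.tostring(tree, encoding="unicode", method="html")

# The alternative route: hand the structure over without building a string.
nested = to_nested(tree)

print(html)
print(nested)
```

The nested form keeps element structure and text separate, which is roughly what a templating system that refuses raw HTML strings would need.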
@MostAwesomeDude, I just reread your request. I seem to have missed the part about other templating systems accessing the AST directly. While I suppose a serializer could do this, the way things are implemented, it wouldn't be ideal. In fact, because the postprocessors (which work on the serialized string) are so tightly coupled with the rest of the parser, it's not very practical, or at least you potentially lose access to some significant features of Markdown's syntax. Actually, my overhaul of the serializer code already involved quite a bit of decoupling of the various parts of the parser. Prior to that, there was absolutely no way to access anything but the final string.

To address your issue directly: in my experience, most HTML templating systems offer a way to mark a string as "safe" so that it can be inserted into the document unescaped (see Django for an example). If your system doesn't allow this, I'd suggest that that is an issue with your templating system. Of course, safety is important, and if you are accepting Markdown text from untrusted sources you should be scrubbing it anyway. I recommend a tool like Bleach for such a case: pass the output of Markdown into Bleach, and pass the output of Bleach as a "safe" string to your template.
Well, I would not say that Chloe (http://docs.factorcode.org/content/article-html.templates.chloe.html), Hamlet (http://hackage.haskell.org/package/hamlet), or twisted.web.template (http://twistedmatrix.com/documents/current/web/howto/twisted-templates.html) are broken simply because they don't allow arbitrary safe strings.

Thanks for your time. I had apparently not dug deeply enough into the API documentation to realize what is offered in terms of extensibility. I'd say that the documentation I wanted to see has already been written!
Just for the record, because I needed this and found a fast way: What I tried first:
But then issues arise because the end of I subclassed
Hi,
Some templating systems do not allow interpolation of textual HTML tags, for security reasons. It should be trivial for those systems to still use Python-Markdown, by taking the raw AST and turning it into HTML on their own. This would be a useful stepping stone for supporting non-HTML formats as well.