Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inlinePatterns should have access to their parent/ancestors. #596

Closed
waylan opened this issue Nov 15, 2017 · 21 comments
Closed

inlinePatterns should have access to their parent/ancestors. #596

waylan opened this issue Nov 15, 2017 · 21 comments

Comments

@waylan
Copy link
Member

waylan commented Nov 15, 2017

It is reasonable to expect that some inlinePatterns should not apply if an ancestor is of a specific set of elements. For example. an @mention should not be converted to a "mention link" if the text is in a link label (like this: [@waylan](https://example.com/waylan)). Also consider that ancestors are potentially relevant ([**@waylan**](https://example.com/waylan)). In fact, see Python-Markdown/github-links#5 for that very example.

Perhaps markdown.inlinepatterns.Pattern.handleMatch should have an ancestors keyword passed to it which would include a list of ancestor elements. I doubt we need to actually pass the elements themselves. Presumably, strings of their tag names would be sufficient. Do we only pass the inline elements, or include the block-level elements all the way up to the root? Should we pass a set (with no repetitions) or a list (with all elements in order)?

I'm thinking perhaps it could be used something like this:

class MentionPattern(Pattern):

    def handleMatch(self, m, ancestors):
        if 'a' in ancestors:
            # A false match. Return None to indicate no change.
            return None
        # build and return element here...

As a reminder, it would not be a good idea to pass the parent element as the element could include text before and after the match that would need to wrap the created element. While the user (extension dev) could access and rebuilt the content correctly, that just opens up more ways for them to break things. Therefore, I'm inclined to limit this to a list of tags (as string names).

@waylan
Copy link
Member Author

waylan commented Nov 15, 2017

Another option would be to hardcode the exception into Markdown itself. For example, a Pattern class could define a list of ancestors to "exclude" itself from:

class MentionPattern(Pattern):
    exclude = ['a']
    # Implement Pattern here...

Then the pattern would never even be run if an ancestor element was in the exclude list.

@facelessuser
Copy link
Collaborator

I agree that having access to ancestors would be very cool. I've run into a similar issue targeting raw, bare links (GitHub style auto-linking). In your case, it seems escaping the mention would solve your problem, but it would be more intuitive if you didn't have to.

@facelessuser
Copy link
Collaborator

Are we just passing tag names? Or do we get attributes of tags as well?

@waylan
Copy link
Member Author

waylan commented Nov 15, 2017

I'm thinking just tag names. I'm not keen on passing the element instances for the reasons mentioned and I don't really see any sense in passing attributes without the instances of the objects. I actually like my second suggestion better (exclude), but I also understand that that is less flexible for extension devs.

Another reason to not pass the element instances is that if 'a' in ancestors doesn't work. Instead you need to do if 'a' in [t.name for t in ancestors]. Obviously still doable, but less than ideal. Of course, if we were using some custom element object (see #420), we could include methods/properties to make this easier. But that is both a bigger refactor and a backward-incompatible change.

@facelessuser
Copy link
Collaborator

Yeah, I wasn't necessarily implying to pass the element. I was more thinking just a list of tags, or a list of of tuples that contains the tag name and attributes, but even just the exclude name is fine.

To be honest, links is the only time I have ever had this issue and wished I had access to parents. But maybe that is simply because I'm so use to the current constraints I haven't thought about how we could leverage ancestry to improve existing syntax behavior.

@waylan
Copy link
Member Author

waylan commented Nov 15, 2017

To be honest, links is the only time I have ever had this issue and wished I had access to parents. But maybe that is simply because I'm so use to the current constraints I haven't thought about how we could leverage ancestry to improve existing syntax behavior.

Same here. That's why I like the "exclude" solution. It meets the needs I've actually come across. But a more flexible solution might open up possibilities I haven't even thought of.

@waylan
Copy link
Member Author

waylan commented Nov 15, 2017

Groan. I just looked at the inline pattern calling code for the first time in a long time. There are so many levels of recursion there (all for good reason) that the location were an element's tag is known and where the pattern is run are so far removed that it is non-trivial to implement either of my proposed solutions. The existing code doesn't even check for code tags as the BacktickPattern simply returns the text as an AtomicString to avoid further processing. IIRC, AtomicString was first introduced as an easier way to implement that behavior. I recall being resistant to it at first, but did see its value. However, it doesn't help us here as in this case we're not trying to stop all processing, only a select few patterns. As an aside, this all has to do with the underlying reasons why I had considered the refactor mentioned in this comment. But that refactor would be a backward-incompatible change which would break all existing inline patterns in all extensions everywhere. Sigh.

@facelessuser
Copy link
Collaborator

That's the inline pattern code I remember; full of recursion and confusion.

@facelessuser
Copy link
Collaborator

Looking over this code I think this may be possible. I'll play with it maybe tonight if I have time.

@facelessuser
Copy link
Collaborator

I've posted a simple backwards compatible experiment that uses ancestry to avoid processing inline code blocks. Obviously this is not a practical case and code blocks should use AtomicString as that is better suited for handling that case, but it is used to illustrate how this could work for others. This is just a prototype: https://github.com/Python-Markdown/markdown/tree/experiment-ancestory.

@facelessuser
Copy link
Collaborator

And yes I spelled ancestry wrong in the branch name...but it's too late for that 🙂 .

@waylan
Copy link
Member Author

waylan commented Nov 17, 2017

Cool! For easy reference, compare the changes here.

@facelessuser
Copy link
Collaborator

It was much easier than I thought. Theoretically you could later add inclusion in addition to the exclusion, but I can't think of a reason to right now.

If we want to go in this direction, I can work towards a final solution with unit tests and such.

@waylan
Copy link
Member Author

waylan commented Nov 17, 2017

Yep, I think this direction makes sense.

@facelessuser
Copy link
Collaborator

I may have something by tomorrow. As I go through tests, I see more places we need to account for ancestors in the inline treeprocessor, but it is coming together.

@facelessuser
Copy link
Collaborator

By the way, do we want to add python 3.5 and 3.6 to Travis?

@facelessuser
Copy link
Collaborator

I've got some other things to do tonight, so here is the WIP (compare). I'll pick this up tomorrow.

@facelessuser
Copy link
Collaborator

I think this working now. The only thing it can't handle is exclusion in a raw HTML link <a href="#">@mention</a>. I don't think raw inline HTML ever makes it into the tree as the HTML syntax is just capture as pieces of HTML instead of as a whole tag. Block HTML definitely doesn't, but I am less concerned about that. Not handling inline raw HTML isn't a deal breaker for me, but there might be ways to handle this (none that are trivial), just not sure if we care at this point.

@waylan
Copy link
Member Author

waylan commented Nov 17, 2017

You make a good point. Markdown is processed inside raw inline HTML. To accomplish that, only the tags themselves are stashed, not their contents. And we do nothing to keep track of the tags used in the raw HTML.

However, as this is raw HTML, I don't know that I care so much. Users already need to be careful with raw HTML as Markdown is not very intelligent about it and if your not careful, you can easily end up with invalid HTML. Given that an extra level of complexity already exists for document authors when working with raw HTML, I'm okay requiring then to also escape Markdown content inside raw HTML when necessary (for example: <a href="#">\@mention</a>).

@facelessuser
Copy link
Collaborator

I agree. Raw not working isn't a big deal for me. For a long time we've dealt with raw HTML not being in the tree as well.

In order to handle raw tags as actual tags there would have to be a big refactor. I'd prefer to take this as it is as right now.

@facelessuser
Copy link
Collaborator

I think I'm done. Feel free to review and make suggestions in #598.

@waylan waylan closed this as completed in de5c696 Nov 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants