Dialogue highlighting works incorrectly with Chinese grammar #2131

Closed
ZEROIC opened this issue Dec 5, 2024 · 8 comments · Fixed by #2138
Labels
editor Component: Editor enhancement Request: New feature or improvement

Comments

@ZEROIC

ZEROIC commented Dec 5, 2024

I write novels in Chinese, and this tool fits my workflow well. However, I've found that dialogue highlighting works incorrectly with Chinese grammar.
Correct Chinese dialogue looks like this:
QQ20241206-040452
Of the three typical dialogue scenes above, only the first is highlighted properly. The difference from English may be the lack of a separator. Don't worry, almost every related tool or plugin I've tried gets this wrong, haha.

My personal solution is to remove the second \B in the dialogStyle regular expression; then it works as expected.
I've made other local changes for this as well, such as adding a switch (a Chinese highlighting mode, or something similar) to control the feature.
I don't yet know how this would affect other languages I'm unfamiliar with. Perhaps you can decide how to implement it on your own.

@ZEROIC ZEROIC added the enhancement Request: New feature or improvement label Dec 5, 2024
@vkbo
Owner

vkbo commented Dec 5, 2024

The dialogue highlighting has been rewritten for 2.6, so perhaps you can test with 2.6 Beta 1?

There are more settings now, so maybe one of the others works. If there is no word boundary, the settings for quote symbols will not work, as you point out, because the rule relies on that word boundary to know where to end.

I can certainly add a Chinese mode switch. I just need to know exactly what the rules are.

@vkbo
Owner

vkbo commented Dec 5, 2024

Also, if you could provide some example text that I can use for testing, that would make it a lot easier. A screenshot, while helpful for illustration, isn't something I can reproduce in the editor.

@ZEROIC
Author

ZEROIC commented Dec 6, 2024

Yes, I pulled the source and am running 2.6b1 right now. I've seen the changes to this feature.
In Chinese, there should be no separator between the first piece of dialogue and the narration, so I can't customize the separator to make it work.
QQ20241206-131154

I forgot to provide examples in my hurry. You can test with examples like these:
他说:『我能吞下玻璃而不伤身体。』
『我能吞下玻璃而不伤身体』他说。
『我能吞下玻璃而不伤身体』他说,『我能吞下玻璃而不伤身体』
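
The three sample lines can be checked outside the editor with plain Python. This is a minimal sketch, not the application's code: `strict` is an assumed stand-in for a rule with `\B` on both sides of the quote symbols, `relaxed` drops the trailing `\B` as described earlier in the thread, and both names are illustrative.

```python
import re

# Assumed stand-ins for the highlighting rule, using 『 and 』 as the quote symbols.
strict = re.compile(r"\B『.*?』\B")   # \B required on both sides
relaxed = re.compile(r"\B『.*?』")    # trailing \B removed

samples = [
    "他说:『我能吞下玻璃而不伤身体。』",
    "『我能吞下玻璃而不伤身体』他说。",
    "『我能吞下玻璃而不伤身体』他说,『我能吞下玻璃而不伤身体』",
]

for text in samples:
    # Print what each rule would highlight in the sample line.
    print("strict :", strict.findall(text))
    print("relaxed:", relaxed.findall(text))
```

In Python's `re`, a CJK ideograph such as 他 counts as a word character while 』 does not, so the position between them is a word boundary and `\B` fails there. With `strict`, the second sample yields no match and the third is matched as one span covering the whole line; with `relaxed`, each quoted passage is matched separately.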

@vkbo
Owner

vkbo commented Dec 6, 2024

I think this can be achieved with a setting that doesn't require the word boundary then.

Technically, the word boundary is only required if the open and close symbols are identical, or if the closing symbol doubles as an apostrophe. I tried a quick change to drop the \B when the open and close symbols are different and not ambiguous, and it seems to work, but I need to do some more testing to make sure.

    @property
    def dialogStyle(self) -> re.Pattern | None:
        """Dialogue detection rule based on user settings."""
        # Closing symbols that can also be read as an apostrophe
        ambiguous = (nwUnicode.U_APOS, nwUnicode.U_RSQUO)
        if CONFIG.dialogStyle > 0:
            # Optionally accept dialogue that is still open at the end of the text
            end = "|$" if CONFIG.allowOpenDial else ""
            rx = []
            if CONFIG.dialogStyle in (1, 3):  # Single quote symbols
                qO = CONFIG.fmtSQuoteOpen.strip()[:1]
                qC = CONFIG.fmtSQuoteClose.strip()[:1]
                # Only require \B when the closing symbol would otherwise be ambiguous
                qB = r"\B" if (qO == qC or qC in ambiguous) else ""
                rx.append(f"(?:{qO}.*?(?:{qC}{qB}{end}))")
                # rx.append(f"(?:\\B{qO}.*?(?:{qC}\\B{end}))")
            if CONFIG.dialogStyle in (2, 3):  # Double quote symbols
                qO = CONFIG.fmtDQuoteOpen.strip()[:1]
                qC = CONFIG.fmtDQuoteClose.strip()[:1]
                qB = r"\B" if (qO == qC or qC in ambiguous) else ""
                rx.append(f"(?:{qO}.*?(?:{qC}{qB}{end}))")
                # rx.append(f"(?:\\B{qO}.*?(?:{qC}\\B{end}))")
            return re.compile("|".join(rx), re.UNICODE)
        return None
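
For trying the same idea without the application's `CONFIG` and `nwUnicode` objects, here is a standalone sketch. The function name `build_dialogue_pattern`, its parameters, and the `AMBIGUOUS` tuple are hypothetical. The use of `re.escape` is an addition that is not in the snippet above, and the tuple's contents (the straight apostrophe and the right single quote) are assumed from the constant names.

```python
import re

# Assumed equivalents of the `ambiguous` symbols: ' and the right single quote.
AMBIGUOUS = ("'", "\u2019")


def build_dialogue_pattern(open_q: str, close_q: str, allow_open: bool = False) -> re.Pattern:
    """Build a dialogue rule that only requires \\B when the closing symbol is ambiguous."""
    end = "|$" if allow_open else ""
    boundary = r"\B" if (open_q == close_q or close_q in AMBIGUOUS) else ""
    # re.escape guards against quote symbols that are regex metacharacters.
    return re.compile(f"(?:{re.escape(open_q)}.*?(?:{re.escape(close_q)}{boundary}{end}))")


cjk = build_dialogue_pattern("『", "』")
print(cjk.pattern)
print(cjk.findall("『我能吞下玻璃而不伤身体』他说,『我能吞下玻璃而不伤身体』"))

curly = build_dialogue_pattern("\u2018", "\u2019")
print(curly.pattern)
print(curly.findall("\u2018It\u2019s late,\u2019 she said."))
```

Because 『 and 』 differ and neither is in the ambiguous set, the first pattern carries no `\B` and finds both quoted passages. The curly single quote case keeps the trailing `\B`, so the apostrophe in "It’s" does not end the match early.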

@ZEROIC
Author

ZEROIC commented Dec 6, 2024

Good idea, Chinese opening and closing quotes are not the same. I fixed this in a similar but more complicated way, because I didn't know whether someone might like writing English novels with Chinese quotes. (Just like ordering fried rice in a bar, oops!)
Never mind, now I realize that would be ungrammatical!

@vkbo
Owner

vkbo commented Dec 7, 2024

Basically, the boundary condition is only there to catch cases like these:

image

image

As you can see, without it, it doesn't know where the quote ends.

But I think I can turn off this in most cases, so I'll give that a try first.
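
The screenshots are not reproduced here, so the following is only an assumed illustration of the kind of case the boundary condition guards against: straight single quotes used for dialogue that also contains an apostrophe. The sample sentence and variable names are invented for this sketch.

```python
import re

# Dialogue in straight single quotes, with an apostrophe inside it.
text = "'It's late,' she said."

no_boundary = re.compile(r"'.*?'")      # closes at the first ' it sees
with_boundary = re.compile(r"'.*?'\B")  # closing ' must not be followed by a word character

print(no_boundary.findall(text))    # ["'It'"]
print(with_boundary.findall(text))  # ["'It's late,'"]
```

Without `\B`, the apostrophe in "It's" is taken as the closing quote. With it, the match only ends where the quote symbol is not directly followed by a word character.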

@vkbo vkbo added the editor Component: Editor label Dec 7, 2024
@vkbo vkbo added this to the Release 2.6 Beta 2 milestone Dec 7, 2024
@vkbo
Owner

vkbo commented Dec 7, 2024

Oh, and the new implementation seems to handle your example text correctly too:

image

@ZEROIC
Author

ZEROIC commented Dec 7, 2024

Yep, it looks right!
