FR: Remove Space Prior to English Punctuation and Full Width Characters #257

Closed
user30535 opened this issue Jul 4, 2022 · 15 comments · Fixed by #717
Labels
markdown General Markdown or Markdown related issue or feature resolution/update-made A change has been made that should resolve this issue or request rule suggestion Suggestion to add or edit a rule

Comments

@user30535

user30535 commented Jul 4, 2022

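When following the Chinese Copywriting Guidelines (中文文案排版指北), it is easy to type extra spaces. Could Linter remove the spaces before and after Chinese punctuation, as in the example below?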

Before:

english , english。

Expected:

english,english。

Thanks

Edit:

Here is the English translation according to Google Translate.

When practicing Chinese copywriting, it is easy to type extra spaces. Can I use Linter to delete the spaces before and after the Chinese punctuation in the following example?

Before:

english , english.

Expected:

english, english.

thanks

@pjkaufman
Collaborator

Currently there is no rule that removes spaces before or after punctuation. There is a rule that collapses multiple spaces into one space, as well as a rule that adds a space between Chinese and English characters: remove-multiple-spaces and space-between-chinese-and-english-or-numbers.

Would you like to request the addition of a rule to remove spaces around English punctuation (I am not familiar with punctuation in other languages, so contributions or an explanation of how punctuation works in other languages would be very much appreciated)?

The comment then repeated the two paragraphs above in Chinese, translated with Google Translate, linking the two rules as [remove-multiple-spaces](https://github.com/platers/obsidian-linter/blob/master/docs/rules.md#remove-multiple-spaces) and [space-between-chinese-and-english-or-numbers](https://github.com/platers/obsidian-linter/blob/master/docs/rules.md#space-between-chinese-and-english-or-numbers).

Sorry if the Chinese is poor; I am using Google Translate to answer and translate your question.

@pjkaufman pjkaufman added the question Further information is requested label Jul 4, 2022
@user30535
Author

user30535 commented Jul 4, 2022

Hi Peter, so sorry that I thought you could speak Chinese.

Yes I would like to request the addition of a rule to remove spaces around (before AND after) Chinese punctuation. Chinese punctuation marks (technically called fullwidth forms) are bigger than English punctuation (technically called halfwidth forms), so for example 。,()“”:; are the Chinese version (fullwidth forms) of .,()"":;.

It is best practice to add a space between Chinese characters and English letters or numbers (for example, space-between-chinese-and-english-or-numbers correctly turns 中文English中文 into 中文 English 中文), but Chinese punctuation should not be surrounded by any spaces. space-between-chinese-and-english-or-numbers correctly leaves English,English (correct) alone instead of turning it into English , English (incorrect). It would be nice to also have a rule that removes spaces around Chinese punctuation, turning English , English (incorrect) into English,English (correct).
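The requested behavior could be sketched roughly as follows. This is only an illustration, not the Linter's actual rule: the function name `removeSpaceAroundFullwidth` and the small character set are assumptions chosen for the example, and a real rule would also need to skip code blocks, links, and other Markdown constructs.

```typescript
// Hypothetical sketch, not the Linter's implementation.
// Assumed character set: fullwidth comma, ideographic full stop,
// fullwidth parentheses, fullwidth colon, fullwidth semicolon.
const FULLWIDTH_PUNCTUATION = "\uff0c\u3002\uff08\uff09\uff1a\uff1b";

function removeSpaceAroundFullwidth(text: string): string {
  // Match optional spaces or tabs on both sides of one fullwidth mark
  // and keep only the mark itself.
  const pattern = new RegExp(`[ \\t]*([${FULLWIDTH_PUNCTUATION}])[ \\t]*`, "g");
  return text.replace(pattern, "$1");
}

// "English , English。" becomes "English,English。"
console.log(removeSpaceAroundFullwidth("English \uff0c English\u3002"));
```

Text without fullwidth punctuation, such as 中文 English 中文, passes through unchanged, so a rule like this would not undo the spacing added by space-between-chinese-and-english-or-numbers.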

Could you please help to add a rule to remove spaces around (before AND after) Chinese punctuation? You can refer to space-between-chinese-and-english-or-numbers to see how "Chinese punctuation" translates to coding language.

Btw, it would also be nice to add a rule to remove the space before an English punctuation mark, so that Linter can turn English , English into English, English.

Thank you so much!

@pjkaufman
Collaborator

No problem. Thank you for letting us know that there is a difference, as I was unaware. It seems that adding the fullwidth forms to a regex that removes the whitespace around them should be simple. As for doing that for English or halfwidth forms, I would have to think a little more on it, since it is not as simple as just removing whitespace: the right behavior depends on the punctuation mark in question.
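To illustrate why the halfwidth case depends on the mark, here is a rough sketch that strips whitespace only before marks that normally attach to the preceding word. The function name `removeSpaceBeforeHalfwidth` and the chosen set `, . ; : ! ? )` are assumptions for the example, not the behavior of any existing Linter rule.

```typescript
// Hypothetical sketch: remove spaces or tabs before closing-style English punctuation.
// Opening marks such as "(" are deliberately excluded because they
// normally keep the space before them.
function removeSpaceBeforeHalfwidth(text: string): string {
  // Require a non-space character first so leading indentation is never touched.
  return text.replace(/(\S)[ \t]+([,.;:!?)])/g, "$1$2");
}

// "English , English ." becomes "English, English."
console.log(removeSpaceBeforeHalfwidth("English , English ."));
```

Even this narrow version has edge cases a real rule would have to handle, for example emoji shortcodes such as ` :smile:` or text inside code spans, which is part of why the halfwidth case needs more thought than the fullwidth one.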

@pjkaufman pjkaufman added rule suggestion Suggestion to add or edit a rule markdown General Markdown or Markdown related issue or feature and removed question Further information is requested labels Jul 4, 2022
@pjkaufman
Collaborator

@user30535, I have created a PR for the issue with fullwidth punctuation: #260. It covers the characters listed in your comment above. Does the example here look right to you?

@user30535
Author

It was just brought to my attention that “ and ” (albeit commonly used in Chinese) are actually halfwidth punctuation, and there is no fullwidth version of them. Please remove them from the rule. Sorry for the mistake!

In addition, here is a complete list of fullwidth punctuation/characters:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz,.:;!?"'`^~ ̄_&@#%+-*=<>()[]{}⦅⦆|¦/\¬$£¢₩¥。、「」『』〔〕【】—…–《》〈〉

\uff10\uff11\uff12\uff13\uff14\uff15\uff16\uff17\uff18\uff19\uff21\uff22\uff23\uff24\uff25\uff26\uff27\uff28\uff29\uff2a\uff2b\uff2c\uff2d\uff2e\uff2f\uff30\uff31\uff32\uff33\uff34\uff35\uff36\uff37\uff38\uff39\uff3a\uff41\uff42\uff43\uff44\uff45\uff46\uff47\uff48\uff49\uff4a\uff4b\uff4c\uff4d\uff4e\uff4f\uff50\uff51\uff52\uff53\uff54\uff55\uff56\uff57\uff58\uff59\uff5a\uff0c\uff0e\uff1a\uff1b\uff01\uff1f\uff02\uff07\uff40\uff3e\uff5e\uffe3\uff3f\uff06\uff20\uff03\uff05\uff0b\uff0d\uff0a\uff1d\uff1c\uff1e\uff08\uff09\uff3b\uff3d\uff5b\uff5d\uff5f\uff60\uff5c\uffe4\uff0f\uff3c\uffe2\uff04\uffe1\uffe0\uffe6\uffe5\u3002\u3001\u300c\u300d\u300e\u300f\u3014\u3015\u3010\u3011\u2014\u2026\u2013\u300a\u300b\u3008\u3009

Could you please add them all to the rule? Thank you so much!

@pjkaufman
Collaborator

I can definitely see about adding them and removing those two quotes.

@user30535
Author

Good to know that! Thanks a lot!

@pjkaufman
Collaborator

I have gone ahead and merged the changes with the unicode characters mentioned above. If there are unexpected changes to spacing or any missing characters, feel free to let me know. The changes should be on master and slated for the next release.

@pjkaufman pjkaufman reopened this Jul 7, 2022
@pjkaufman
Collaborator

Sorry, I closed this in relation to the Chinese character PR, but realized there was still the mention of the English punctuation.

@pjkaufman pjkaufman changed the title 删除中文标点和英文字母之间的空格 ("Remove spaces between Chinese punctuation and English letters") FR: Remove Space Prior to English Punctuation and Full Width Charactera Jul 17, 2022
@pjkaufman pjkaufman changed the title FR: Remove Space Prior to English Punctuation and Full Width Charactera FR: Remove Space Prior to English Punctuation and Full Width Characters Jul 17, 2022
@pjkaufman pjkaufman moved this to Backlog in Obsidian Linter Sep 17, 2022
@mnaoumov
Contributor

@user30535 can you please explain where you got your list from?

According to the links below, your list includes some characters that are not considered Fullwidth according to the Unicode spec, such as \u2013 (–).

https://en.wikipedia.org/wiki/Halfwidth_and_Fullwidth_Forms_(Unicode_block)

http://www.alanwood.net/unicode/halfwidth_and_fullwidth_forms.html
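A quick way to check which Unicode block a character from the list falls into is sketched below. The function name `unicodeBlockOf` is made up for this example; the three block ranges are the standard ones for Halfwidth and Fullwidth Forms (U+FF00 to U+FFEF), CJK Symbols and Punctuation (U+3000 to U+303F), and General Punctuation (U+2000 to U+206F).

```typescript
// Hypothetical helper for illustration only.
type Block =
  | "Halfwidth and Fullwidth Forms"
  | "CJK Symbols and Punctuation"
  | "General Punctuation"
  | "Other";

function unicodeBlockOf(ch: string): Block {
  const cp = ch.codePointAt(0);
  if (cp === undefined) return "Other"; // empty string
  // Note: this block also contains halfwidth forms, not only fullwidth ones.
  if (cp >= 0xff00 && cp <= 0xffef) return "Halfwidth and Fullwidth Forms";
  if (cp >= 0x3000 && cp <= 0x303f) return "CJK Symbols and Punctuation";
  if (cp >= 0x2000 && cp <= 0x206f) return "General Punctuation";
  return "Other";
}

console.log(unicodeBlockOf("\uff0c")); // fullwidth comma
console.log(unicodeBlockOf("\u3002")); // ideographic full stop
console.log(unicodeBlockOf("\u2013")); // en dash
```

Under these ranges, \uff0c lands in Halfwidth and Fullwidth Forms, \u3002 in CJK Symbols and Punctuation, and \u2013 in General Punctuation, so the list spans at least three blocks.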

@user30535
Author

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz,.:;!?"'`^~ ̄_&@#%+-*=<>()[]{}⦅⦆|¦/\¬$£¢₩¥ were copied from http://xahlee.info/comp/unicode_full-width_chars.html

I considered the list above incomplete, so I manually added 。、「」『』〔〕【】—…–《》〈〉 to the list. I agree that —…– may be interpreted not as fullwidth, but the others are pretty standard Chinese characters that should always be interpreted as fullwidth characters.

@mnaoumov
Contributor

mnaoumov commented Nov 13, 2022

I think the "Fullwidth" name in the linter rule is misleading. The characters you are referring to are CJK Symbols and Punctuation.

I think we should consider renaming the rule to clarify the intent.

@pjkaufman
Collaborator

Could you explain how these are not Fullwidth Characters? I don't believe I fully follow the discussion (especially since I am just an English speaker and rarely deal with Fullwidth characters).

@mnaoumov
Contributor

@pjkaufman here is the definition of the Fullwidth Unicode symbols:
https://en.m.wikipedia.org/wiki/Halfwidth_and_Fullwidth_Forms_(Unicode_block)

The list @user30535 provided contains symbols that don't belong to that list. That's why I suggested aligning the rule name more closely with the Unicode spec.

@github-project-automation github-project-automation bot moved this from In Progress to Releasing in Obsidian Linter May 14, 2023
@pjkaufman
Collaborator

pjkaufman commented May 14, 2023

Sorry about the delay. The rule for removing space before and/or after certain characters has now been added to master and should go out in the next release. I have turned it on for my own vault and ironed out a few kinks, so hopefully several of the edge cases have been covered. Please let us know if there are any issues in the next release.

@pjkaufman pjkaufman added the resolution/update-made A change has been made that should resolve this issue or request label May 21, 2023
@pjkaufman pjkaufman moved this from Releasing to Done in Obsidian Linter May 21, 2023
3 participants