Add a regular-expression-based conversion step #1

Open
IvanaGyro opened this issue Oct 8, 2017 · 1 comment

@IvanaGyro
Collaborator

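Occasionally a subtitle contains a Simplified Chinese character that maps to more than one Traditional Chinese character. The dictionary file handles most of these cases, but a few still come out wrong.
For example:
Original Simplified text: -喂 請問在家嗎? (rendered here in Traditional characters)
After conversion: -餵 請問在家嗎?
With regular expressions, cases like this could be corrected after the dictionary pass.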
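To make the idea concrete, here is a minimal sketch, assuming Python 3 (where `\W` is Unicode-aware for `str` patterns). The names `LONE_WEI` and `fix_interjection` are made up for this illustration and are not part of the project.

```python
import re

# Hypothetical pattern for illustration: a 餵 that is bounded by non-word
# characters (or by the start/end of the string) is treated as the
# interjection 喂 rather than the verb "to feed".
LONE_WEI = re.compile(r"(^|\W)餵(\W|$)")

def fix_interjection(text):
    """Turn a standalone 餵 back into 喂; leave words such as 餵食 alone."""
    return LONE_WEI.sub(r"\g<1>喂\g<2>", text)

print(fix_interjection("-餵 請問在家嗎?"))  # the subtitle line from the example
print(fix_interjection("記得餵食小貓"))      # 餵 inside a word stays unchanged
```

One limitation of this style of pattern: the neighbouring characters are consumed by the match, so two standalone 餵 separated by a single space would only have the first one corrected. Lookarounds such as `(?<!\w)餵(?!\w)` are one possible way around that.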

Changes

I changed convertVocabulary() to:

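import re

def convertVocabulary(string_in, dic, re_dic):
	# Pass 1: greedy longest-match replacement using the vocabulary dictionary.
	i = 0
	while i < len(string_in):
		for j in range(len(string_in) - i, 0, -1):
			key = string_in[i:i + j]
			if key in dic:
				t = dic[key]
				string_in = string_in[:i] + t + string_in[i + j:]
				i += len(t) - 1  # skip past the text that was just inserted
				break
		i += 1
	# Pass 2: regex corrections for cases the dictionary gets wrong.
	# Keys of re_dic may be pattern strings or compiled patterns.
	for pattern, repl in re_dic.items():  # items() works on Python 2 and 3
		string_in = re.sub(pattern, repl, string_in)
	return string_in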

And I added the following to dic_tw.py:

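import re

dic_re_tw = {
	# A standalone 餵 (not part of a longer word) goes back to the interjection 喂.
	re.compile(u"(^|\\W)(餵)(\\W|$)", re.UNICODE): u"\\g<1>喂\\g<3>",
}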
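For anyone who wants to try the two-stage flow (dictionary pass first, regex corrections second) in isolation, here is a rough, self-contained sketch. It assumes Python 3, and everything in it is hypothetical: the function name `convert` and the tiny `dic` / `re_dic` tables are stand-ins for illustration, not the project's real data.

```python
import re

# Hypothetical miniature tables, for illustration only.
dic = {"喂": "餵", "喂食": "餵食", "请问": "請問", "吗": "嗎"}
re_dic = {re.compile(r"(^|\W)(餵)(\W|$)"): r"\g<1>喂\g<3>"}

def convert(text, dic, re_dic):
    """Greedy longest-match dictionary pass, followed by regex corrections."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(len(text) - i, 0, -1):
            chunk = text[i:i + j]
            if chunk in dic:
                out.append(dic[chunk])
                i += j
                break
        else:
            # No dictionary entry starts here; copy the character unchanged.
            out.append(text[i])
            i += 1
    result = "".join(out)
    # Regex pass: fix one-to-many cases the dictionary cannot decide.
    for pattern, repl in re_dic.items():
        result = pattern.sub(repl, result)
    return result

print(convert("-喂 请问在家吗?", dic, re_dic))  # -喂 請問在家嗎?
print(convert("喂食", dic, re_dic))             # 餵食
```

A caveat if this is run on Python 2 instead: there, `\W` only treats CJK characters as word characters when the pattern is compiled with `re.UNICODE`. Without that flag, 餵 inside a word such as 餵食 would also match and be rewritten.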
@tenyi
Owner

tenyi commented Oct 11, 2017

Thanks, iven00000000. Please join this project so we can improve it together.
