Integrate MyStem (Russian stemmer) #1308
Comments
@reckart The stemmer is essentially complete, but I will definitely not replace tabs with spaces. Can this Checkstyle check be turned off?
Our style guidelines have required spaces instead of tabs for ages. Checkstyle just enforces that for a better experience. Tabs have the problem that different viewers use different tab widths (2, 4, 8, whatever), which makes the code look very different in different viewers. Spaces do not have this problem. Please install the DKPro Core style file for Eclipse or configure whatever IDE you are using accordingly.
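To make the tab-width argument concrete, here is a minimal Java sketch of what a viewer does with a tab: it pads the line to the next tab stop, so the same source line shifts by a different number of columns depending on the width the viewer assumes. `TabExpansion` and `expandTabs` are hypothetical names for illustration only; they are not part of DKPro Core or Checkstyle.

```java
import java.util.List;

/** Hypothetical helper showing why tab-indented code renders differently per viewer. */
public class TabExpansion {

    /** Expands each tab in a single line to the next multiple of tabWidth columns. */
    public static String expandTabs(String line, int tabWidth) {
        StringBuilder out = new StringBuilder();
        int column = 0;
        for (char c : line.toCharArray()) {
            if (c == '\t') {
                // Pad with spaces up to the next tab stop.
                int spaces = tabWidth - (column % tabWidth);
                out.append(" ".repeat(spaces));
                column += spaces;
            } else {
                out.append(c);
                column++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "\tif (x) {";
        // Same input, three different visual indents.
        for (int width : List.of(2, 4, 8)) {
            System.out.println(width + ": [" + expandTabs(line, width) + "]");
        }
    }
}
```

Running `main` prints the same line indented by 2, 4, and 8 spaces. A file indented with spaces has no such ambiguity, which is what the Checkstyle rule protects.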
Mind that the style file does not cover XML files, but they should also be formatted with spaces, using 2 spaces for indentation.
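If an IDE is not at hand, XML can also be re-indented with 2 spaces from plain Java. This is a minimal sketch using the JDK's built-in identity transformer; `XmlIndent` is a hypothetical name, and `{http://xml.apache.org/xslt}indent-amount` is a vendor output property honoured by the JDK's bundled transformer (other JAXP implementations may ignore it). The sketch omits the XML declaration, so it is an illustration of the 2-space convention rather than a drop-in formatter for files like `pom.xml`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper that re-serializes XML with 2-space indentation. */
public class XmlIndent {

    public static String indentTwoSpaces(String xml) throws TransformerException {
        // newTransformer() without a stylesheet is an identity transform.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Vendor-specific: indentation width for the JDK's built-in serializer.
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.print(indentTwoSpaces("<project><build><plugins/></build></project>"));
    }
}
```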
ok, thx. |
* master:
  * #1322 - Upgrade to OpenNLP 1.9.1
  * #1308 - integrate mystem
  * #1327 - Update LIF support
  * #1329 - Span annotations with slot features may disappear from WebAnno TSV
  * #1323 - File extension generated by BinaryCasWriter does not contain dot
  * #858 - Out-of-tagset tags should map to the generic type
  * #1239 - Rename NYTCollectionReader to NitfReader
  * #1317 - Standard parameter to disable type mapping
  * No issue. If a DKProTextContext is available, then TestRunner generates an XMI file from the processed data and stores it in the test output folder.
  * No issue - Log names of files with license issues to the console.
  * #1160 - Better support for CoNLL-U v2 (1.11.0)

  Conflicts:
  * dkpro-core-asl/pom.xml
* master:
  * dkpro#1325 - Avoid datasets being extracted outside their target directory
  * dkpro#1338 - Factor CAS <-> brat conversion code into Pojos
  * dkpro#1322 - Upgrade to OpenNLP 1.9.1
  * dkpro#1308 - integrate mystem
  * dkpro#1327 - Update LIF support
  * dkpro#1329 - Span annotations with slot features may disappear from WebAnno TSV
  * dkpro#1323 - File extension generated by BinaryCasWriter does not contain dot
  * dkpro#858 - Out-of-tagset tags should map to the generic type
* master: (21 commits)
  * #1305 - Update TreeTagger models in build.xml
  * #1325 - Avoid datasets being extracted outside their target directory
  * #1338 - Factor CAS <-> brat conversion code into Pojos
  * #1322 - Upgrade to OpenNLP 1.9.1
  * #1308 - integrate mystem
  * #1327 - Update LIF support
  * #1329 - Span annotations with slot features may disappear from WebAnno TSV
  * #1323 - File extension generated by BinaryCasWriter does not contain dot
  * #858 - Out-of-tagset tags should map to the generic type
  * #1239 - Rename NYTCollectionReader to NitfReader
  * ...

  Conflicts:
  * dkpro-core-asl/pom.xml
  * dkpro-core-io-lif-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/io/lif/LifReaderWriterTest.java
  * dkpro-core-io-lif-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/io/lif/LifWriterTest.java
Integration of a Russian stemmer https://tech.yandex.ru/mystem/
Closed-source, distributed as pre-compiled fat binaries that seem to include the model.
Non-profit/research use is permitted; commercial use is restricted. The website is in Russian only.
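Since MyStem ships only as a pre-compiled binary, an integration has to drive it as an external process. The following is a minimal sketch of that pattern under stated assumptions, not the actual DKPro Core component: `MyStemSketch`, `Analysis`, `parseLine`, and `run` are hypothetical names; the `-n` flag (one word per output line) and the output shape `word{lemma1|lemma2}` (ambiguous lemmas separated by `|`, e.g. Russian `стали{сталь|стать}`) are assumptions to verify against the binary's own help text; UTF-8 input and output is also assumed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper around a pre-compiled MyStem binary (illustration only). */
public class MyStemSketch {

    /** A token and its candidate lemmas. */
    public static final class Analysis {
        public final String surface;
        public final List<String> lemmas;

        Analysis(String surface, List<String> lemmas) {
            this.surface = surface;
            this.lemmas = lemmas;
        }
    }

    /** Parses one output line of the assumed form "word{lemma1|lemma2}". */
    public static Analysis parseLine(String line) {
        int open = line.indexOf('{');
        int close = line.lastIndexOf('}');
        if (open < 0 || close < open) {
            // No braces: treat the line as a token without lemmas.
            return new Analysis(line.trim(), List.of());
        }
        List<String> lemmas = new ArrayList<>();
        for (String candidate : line.substring(open + 1, close).split("\\|")) {
            if (!candidate.isEmpty()) {
                lemmas.add(candidate); // kept verbatim, including any markers the binary adds
            }
        }
        return new Analysis(line.substring(0, open).trim(), lemmas);
    }

    /** Runs the binary on text; the "-n" flag is an assumption. */
    public static List<Analysis> run(Path binary, String text)
            throws IOException, InterruptedException {
        // Feeding stdin from a file avoids the deadlock where we block writing stdin
        // while the child blocks on a full stdout pipe.
        Path input = Files.createTempFile("mystem-in", ".txt");
        try {
            Files.write(input, text.getBytes(StandardCharsets.UTF_8));
            Process process = new ProcessBuilder(binary.toString(), "-n")
                    .redirectInput(input.toFile())
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            List<Analysis> result = new ArrayList<>();
            // Assumes the binary writes UTF-8 to stdout.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (!line.trim().isEmpty()) {
                        result.add(parseLine(line));
                    }
                }
            }
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("mystem exited with code " + exit);
            }
            return result;
        } finally {
            Files.deleteIfExists(input);
        }
    }
}
```

Keeping all candidate lemmas, rather than picking one, leaves disambiguation to a later step; whether the binary can disambiguate itself is not established here.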