TENTE extracts the negated terms that appear in a text. Its name refers to the Spanish block construction game (similar to LEGO) that was created in 1972 and discontinued in 2007. Like that game, the TENTE system is built from "building blocks", here CUTEXT and NegEx-MES, so both must be installed beforehand.
The only input TENTE needs is the text, which must be split into sentences. For example, you can use the sentence splitter included in SPACCC_POS-TAGGER. In addition, each sentence in the input file must have the following format:
sentence_identifier | sentence
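As an illustration of this input format, the following minimal sketch turns a list of already-split sentences into `id|sentence` lines. The class `TenteInputWriter` and its method are hypothetical helpers, not part of TENTE; the sketch assumes sequential integer identifiers and no spaces around the `|` separator, as in the example used in this document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of TENTE): builds input lines of the
// form "id|sentence" from sentences that have already been split.
final class TenteInputWriter {

    // Assumption: identifiers are sequential integers starting at 1.
    static List<String> toInputLines(List<String> sentences) {
        List<String> lines = new ArrayList<>();
        int id = 1;
        for (String sentence : sentences) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank entries so every line carries a sentence
            }
            lines.add(id + "|" + trimmed);
            id++;
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("El paciente no tiene anemia ni lupus");
        for (String line : toInputLines(sentences)) {
            System.out.println(line);
        }
    }
}
```

The resulting lines can be written to the text input file that TENTE reads.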
The output contains those terms that appear negated in each of the sentences. The format is as follows:
sentence_identifier TAB term_negated TAB "sentence" TAB Negated TAB negation_type
For example, if the input file contains only the following sentence:
1|El paciente no tiene anemia ni lupus
Then the output file will have the following two rows:
1 TAB lupus TAB "El paciente no tiene anemia ni lupus" TAB Negated TAB negPhrases
1 TAB anemia TAB "El paciente no tiene anemia ni lupus" TAB Negated TAB negPhrases
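To consume this output from another program, each row can be split on the tab character. The sketch below is only an illustration: the class `NegatedRow` and its field names are hypothetical, and it assumes every row has exactly the five tab-separated fields described above, with the sentence wrapped in double quotes.

```java
// Hypothetical reader (not part of TENTE) for one row of the output file.
final class NegatedRow {
    final String sentenceId;
    final String term;
    final String sentence;
    final String status;       // e.g. "Negated"
    final String negationType; // e.g. "negPhrases"

    private NegatedRow(String sentenceId, String term, String sentence,
                       String status, String negationType) {
        this.sentenceId = sentenceId;
        this.term = term;
        this.sentence = sentence;
        this.status = status;
        this.negationType = negationType;
    }

    // Assumption: fields are separated by a single tab character.
    static NegatedRow parse(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException(
                "Expected 5 tab-separated fields, got " + fields.length);
        }
        String sentence = fields[2];
        // Remove the surrounding double quotes, if present.
        if (sentence.length() >= 2 && sentence.startsWith("\"") && sentence.endsWith("\"")) {
            sentence = sentence.substring(1, sentence.length() - 1);
        }
        return new NegatedRow(fields[0], fields[1], sentence, fields[3], fields[4]);
    }

    public static void main(String[] args) {
        NegatedRow row = parse(
            "1\tlupus\t\"El paciente no tiene anemia ni lupus\"\tNegated\tnegPhrases");
        System.out.println(row.term + " -> " + row.status + " (" + row.negationType + ")");
    }
}
```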
The directory structure follows the package naming, with tente as the root package. All packages therefore live within that structure:
- tente/config/: includes the properties files for CUTEXT and NegEx-MES.
- tente/in/: contains the text input file (text.txt) and the terms input file (terms.txt). It is only mandatory to include the text file.
- tente/main/: includes the main class Main.java and the execution JAR file tente.jar.
- tente/out/: contains the output file, named negTerms.txt by default.
- tente/temp/: a temporary folder. When TENTE finishes its execution, it contains the CUTEXT input file (toCutext.txt), the NegEx-MES input file (toModifier.txt), and the output file generated by NegEx-MES (outModifier.txt). All files in this folder are deleted at the beginning of each execution.
As previously mentioned, CUTEXT and NegEx-MES must be installed and compiled before installing and compiling TENTE. You can consult the file 'Intallation.md' included in both CUTEXT and NegEx-MES. This section assumes that both have been installed and compiled correctly, and only shows some execution examples.
TENTE execution, from tente/main/, is as follows:
java tente.main.Main [options]
Options:
-help : Show this message
-displayon : Show the messages at the standard output. Default TRUE (show)
-language : SPANISH or ENGLISH. Default SPANISH
-execCutext : Extract terms with cutext (true) or not (false). Default TRUE
-text : Input text file with this format: id|sentence. Default at ../in/text.txt
-terms : Name of the input file with terms or empty (depends on parameter execCutext is false or true). Default: at ../in/terms.txt
-temporary : Temporary folder. Default at ../temp/
-outputFolder : Name of the output folder. Default: at ../out/
-outputFile : Name of the output file. Default: 'negTerms.txt'
CUTEXT will not run if the parameter -execCutext is set to FALSE. In this case, the user must provide a file with the desired terms. If the parameter is set to TRUE (the default), CUTEXT extracts the terms automatically. You can see these terms when TENTE finishes its execution: a file with the extracted terms will have been generated in the input directory, by default at tente/in/terms.txt.
Let's assume an input file "text.txt" in the directory "in" that includes the following line:
1|El paciente no tiene anemia ni lupus
If TENTE is executed with the default parameters, that is:
java tente.main.Main
Then the output file will have the following two rows:
1 lupus "El paciente no tiene anemia ni lupus" Negated negPhrases
1 anemia "El paciente no tiene anemia ni lupus" Negated negPhrases
The terms file, at tente/in/terms.txt, will contain the following terms extracted by CUTEXT:
lupus
anemia
paciente
In addition, the following three files will have been generated in the temporary folder tente/temp/:
- toCutext.txt: which will contain:
El paciente no tiene anemia ni lupus
- toModifier.txt: which will contain:
1 lupus "El paciente no tiene anemia ni lupus"
1 anemia "El paciente no tiene anemia ni lupus"
1 paciente "El paciente no tiene anemia ni lupus"
- outModifier.txt: which will contain:
1 lupus "El paciente no tiene anemia ni lupus" Negated negPhrases
1 anemia "El paciente no tiene anemia ni lupus" Negated negPhrases
1 paciente "El paciente no tiene anemia ni lupus" Affirmed NONE
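Comparing outModifier.txt with the final output suggests that only the rows marked Negated end up in negTerms.txt. The following sketch reproduces that selection step under that assumption; it is not TENTE's actual code. The class `NegatedFilter` is hypothetical, and the rows are assumed to be tab-separated with the status in the fourth field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical filter (not TENTE's own code): keeps the rows whose
// fourth tab-separated field is "Negated".
final class NegatedFilter {

    static List<String> keepNegated(List<String> rows) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            String[] fields = row.split("\t", -1);
            // Assumption: the status is the fourth field of each row.
            if (fields.length >= 4 && "Negated".equals(fields[3])) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String sentence = "\"El paciente no tiene anemia ni lupus\"";
        List<String> rows = Arrays.asList(
            "1\tlupus\t" + sentence + "\tNegated\tnegPhrases",
            "1\tanemia\t" + sentence + "\tNegated\tnegPhrases",
            "1\tpaciente\t" + sentence + "\tAffirmed\tNONE");
        for (String row : keepNegated(rows)) {
            System.out.println(row);
        }
    }
}
```

With the three rows of the example, the row for paciente is dropped because it is marked Affirmed.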
The tente.jar file lets you run TENTE directly from a terminal such as cmd, Terminator, etc. To do so, run the following command from the directory where tente.jar is located (tente/main/ by default):
java -jar tente.jar [options]
Here, the options are those shown in the 'Usage' section. For example, if we run the JAR file without options:
java -jar tente.jar
TENTE will then produce the same results as the example in the previous section, provided that the input file (at tente/in/text.txt) contains the same information.
Jesús Santamaría
MIT License
Copyright (c) 2018 Secretaría de Estado para el Avance Digital (SEAD)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.