SE-WRL

The code for "Improved Word Representation Learning with Sememes" (ACL 2017).

How to Run

Use the following commands to train word-sense-sememe embeddings.

# Pick one model file: SSA.c, MST.c, SAC.c, or SAT.c (SSA.c shown here)
cp SSA.c word2vec/word2vec.c
cd word2vec
make
./word2vec -train TrainFile -output vectors.bin -cbow 0 -size 200 -window 8 -negative 25 -hs 0 -sample 1e-4 -threads 30 -binary 1 -iter 1 -read-vocab VocabFile -read-meaning SememeFile -read-sense Word_Sense_Sememe_File -min-count 1 -alpha 0.025

TrainFile is the training corpus. The other three input files can be found in the datasets directory: VocabFile is the word vocabulary file, SememeFile is the sememe vocabulary file, and Word_Sense_Sememe_File records the word-sense-sememe grouping information.

Before training, replace word2vec/word2vec.c with one of the four model files: SSA.c, MST.c, SAC.c, or SAT.c. This is what the cp step above does.

Data Set

HowNet.txt is a Chinese knowledge base with annotated word-sense-sememe information.

Sogou-T(sample).txt is a sample dataset extracted from Sogou-T.

The complete training dataset, Clean-SogouT, is available at https://pan.baidu.com/s/1kXgkyJ9 (password: f2ul).

Evaluation Set

wordsim-240.txt and wordsim-297.txt are used to evaluate the quality of word representations on the word similarity task.
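The word similarity evaluation can be sketched as follows. This is a minimal illustration, not the authors' evaluation script. It assumes the usual protocol of ranking word pairs by cosine similarity and comparing against human scores with Spearman correlation, which this README does not spell out. The function names, the in-memory `toy_vectors` dictionary, and the `(word1, word2, score)` tuple layout are all hypothetical; loading vectors.bin and parsing the wordsim files are left out because their formats are not described here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def evaluate_similarity(vectors, scored_pairs):
    """Return Spearman correlation x 100, skipping out-of-vocabulary pairs.

    Skipping OOV pairs and scaling by 100 are assumptions of this sketch.
    """
    gold, predicted = [], []
    for w1, w2, score in scored_pairs:
        if w1 in vectors and w2 in vectors:
            gold.append(score)
            predicted.append(cosine(vectors[w1], vectors[w2]))
    return 100.0 * spearman(gold, predicted)

# Toy data for illustration only; not taken from the real datasets.
toy_vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
toy_pairs = [("a", "b", 9.0), ("a", "c", 5.0), ("a", "d", 1.0)]
toy_score = evaluate_similarity(toy_vectors, toy_pairs)
print(toy_score)
```

With real data, `vectors` would hold the trained word embeddings and `scored_pairs` the human-rated pairs from one of the wordsim files.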

analogy.txt is used to evaluate the models' capability for word analogy inference.
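One common way to score analogy questions of the form "a is to b as c is to ?" is vector offset search, sketched below. This is an illustration under stated assumptions rather than the authors' evaluation code: the README does not say which search rule or metric is used, so the sketch assumes the standard b - a + c offset with accuracy as the metric. The function names, the question tuple layout, and the English toy words are hypothetical, and parsing analogy.txt is omitted because its format is not described here.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def solve_analogy(vectors, a, b, c):
    """Return the word closest to b - a + c, excluding a, b, and c.

    The target is left unnormalized: scaling it by a positive constant
    does not change which candidate scores highest.
    """
    ua, ub, uc = (normalize(vectors[w]) for w in (a, b, c))
    target = [y - x + z for x, y, z in zip(ua, ub, uc)]
    best_word, best_score = None, float("-inf")
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue
        score = sum(p * q for p, q in zip(target, normalize(vec)))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

def analogy_accuracy(vectors, questions):
    """Percentage of questions answered correctly.

    Questions containing an out-of-vocabulary word are skipped,
    which is an assumption of this sketch.
    """
    correct = total = 0
    for a, b, c, d in questions:
        if all(w in vectors for w in (a, b, c, d)):
            total += 1
            correct += solve_analogy(vectors, a, b, c) == d
    return 100.0 * correct / total if total else 0.0

# Toy data for illustration only; the real dataset is Chinese.
toy_embeddings = {
    "china": [1.0, 0.0, 0.0],
    "beijing": [1.0, 1.0, 0.0],
    "japan": [0.0, 0.0, 1.0],
    "tokyo": [0.0, 1.0, 1.0],
    "river": [0.5, -0.5, 0.5],
}
toy_questions = [
    ("china", "beijing", "japan", "tokyo"),
    ("china", "beijing", "france", "paris"),  # skipped: words not in toy_embeddings
]
toy_accuracy = analogy_accuracy(toy_embeddings, toy_questions)
print(toy_accuracy)
```

Excluding the three question words from the candidates follows common practice, since the nearest vector to b - a + c is often b or c itself.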

Annotation Information

The code comments cover the four files SSA.c, MST.c, SAC.c, and SAT.c. Code shared by all four models is commented only in SSA.c.

Revise

We found bugs in the original programs and have fixed them. We apologize for the inconvenience. The updated experimental results are released on GitHub and listed below, and a new version of the paper is available.

Word Similarity

Model	Wordsim-240	Wordsim-297
CBOW	57.7	61.1
GloVe	59.8	58.7
Skip-gram	58.5	63.3
SSA	58.9	64.0
MST	59.2	62.8
SAC	59.1	61.0
SAT	61.2	63.3

Word Analogy

Model	Capital	City	Relationship	All
CBOW	49.8	85.7	86.0	64.2
GloVe	57.3	74.3	81.6	65.8
Skip-gram	66.8	93.7	76.8	73.4
SSA	62.3	93.7	81.6	71.9
MST	65.7	95.4	82.7	74.5
SAC	79.2	97.7	75.0	81.0
SAT	82.6	98.9	80.1	84.5

BWendy1/SE-WRL
