
Hyperbolic Embedding

An attempt to reproduce the results of Poincaré Embeddings for Learning Hierarchical Representations (Nickel & Kiela, 2017). Still a work in progress.

Getting started

Environment

  • python 3.9
  • nltk==3.5
  • numpy==1.21.2
  • pandas==1.3.4
  • scikit-learn==1.0.1
  • scipy==1.7.1
  • torch==1.10.0
  • tqdm==4.62.3
  • wandb==0.12.1
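
For example, the pinned versions above can be installed with pip (a conda environment works just as well):

```console
$ pip install nltk==3.5 numpy==1.21.2 pandas==1.3.4 scikit-learn==1.0.1 \
    scipy==1.7.1 torch==1.10.0 tqdm==4.62.3 wandb==0.12.1
```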

Datasets

Use the script in the original repo to construct mammal_closure.csv, the transitive closure of the subtree rooted at mammal in the WordNet noun hierarchy. A rough sketch of what that construction does is given below the directory layout.

Dataset directory structure:

```
datasets
└── wordnet
    └── mammal_closure.csv
```
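
For reference, the construction amounts to taking the transitive closure of the hyponym subtree under mammal.n.01. Below is a minimal sketch with nltk, for illustration only; the original repo's script is the source of truth, and the CSV column names here are an assumption:

```python
import csv

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

# All synsets in the subtree rooted at mammal.n.01, including the root.
root = wn.synset('mammal.n.01')
nodes = set(root.closure(lambda s: s.hyponyms())) | {root}

# Transitive closure: one edge (node, ancestor) for every ancestor of a
# node that also lies inside the mammal subtree.
edges = set()
for syn in nodes:
    for ancestor in syn.closure(lambda s: s.hypernyms()):
        if ancestor in nodes:
            edges.add((syn.name(), ancestor.name()))

with open('datasets/wordnet/mammal_closure.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id1', 'id2'])  # column names are an assumption
    writer.writerows(sorted(edges))
```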

Usage

Train

Train an embedding of the mammal tree in the Poincaré disk.

```console
$ python main.py
$ python main.py --help
usage: main.py [-h] [--space-type {euclid,poincare}] [--datapath DATAPATH] [--n-neg N_NEG]
               [--batch-size BATCH_SIZE] [--emb-dim EMB_DIM] [--epochs EPOCHS] [--eval-every EVAL_EVERY]
               [--exp-root EXP_ROOT] [--wlog] [--wlog-name WLOG_NAME] [--use-gpu] [--init-lr INIT_LR]
               [--burn-in-epochs BURN_IN_EPOCHS]

optional arguments:
  -h, --help            show this help message and exit
  --space-type {euclid,poincare}
                        embedding space type (default: poincare)
  --datapath DATAPATH   Data csv filepath. (default: ./datasets/wordnet/mammal_closure.csv)
  --n-neg N_NEG         Number of negative samples for each positive pair. (default: 10)
  --batch-size BATCH_SIZE
                        Batch size for training. (default: 32)
  --emb-dim EMB_DIM     Embedding dimension. (default: 2)
  --epochs EPOCHS       Number of training epochs. (default: 10)
  --eval-every EVAL_EVERY
                        Evaluate every `eval_every` epochs. (default: 5)
  --exp-root EXP_ROOT   Root dir for saving checkpoints. (default: exp/temp)
  --wlog                Use the wandb logger (default: False)
  --wlog-name WLOG_NAME
                        Run ID in the wandb logger. (default: )
  --use-gpu             Use GPU (default: False)
  --init-lr INIT_LR, -lr INIT_LR
                        Initial learning rate (default: 0.01)
  --burn-in-epochs BURN_IN_EPOCHS, -be BURN_IN_EPOCHS
                        Number of `burn-in` epochs; a smaller learning rate is used. (default: 10)
```
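
For orientation, the two ingredients that distinguish the poincare space type from euclid are the Poincaré distance and the Riemannian SGD update from the paper. Below is a minimal sketch in PyTorch with illustrative names, not this repo's actual API:

```python
import torch

EPS = 1e-5  # keep embeddings strictly inside the unit ball

def poincare_distance(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))."""
    sq_u = u.pow(2).sum(dim=-1).clamp(max=1 - EPS)
    sq_v = v.pow(2).sum(dim=-1).clamp(max=1 - EPS)
    sq_diff = (u - v).pow(2).sum(dim=-1)
    x = 1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v))
    return torch.acosh(x.clamp(min=1 + EPS))  # clamp guards acosh's domain

def riemannian_sgd_step(theta: torch.Tensor, lr: float) -> None:
    """One RSGD step: rescale the Euclidean gradient by the inverse metric
    (1 - ||theta||^2)^2 / 4, then project back into the open unit ball."""
    with torch.no_grad():
        sq_norm = theta.pow(2).sum(dim=-1, keepdim=True)
        theta -= lr * ((1 - sq_norm) ** 2 / 4) * theta.grad
        # Projection: renormalize any point that escaped the unit ball.
        norm = theta.norm(dim=-1, keepdim=True)
        theta *= torch.where(norm >= 1, (1 - EPS) / norm, torch.ones_like(norm))
```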

Issues and TODO list

  • Revisit the negative sampling
  • Revisit the numerical instability issue, which is currently patched in an ad-hoc way
  • Implement the baseline model (i.e., Euclidean embedding)
  • Implement Mean Rank evaluation (a sketch of the metric follows this list)
  • Visualize the result
  • Reproduce the results in the paper
  • Fix the speed issue with the num_workers > 0 dataloader setting
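
On the Mean Rank item: the paper's metric averages, over all ground-truth edges (u, v), the rank of d(u, v) among u's distances to the other nodes. Below is a minimal sketch reusing the hypothetical `poincare_distance` above; for brevity it omits the paper's filtering of u's other true neighbors from the ranking:

```python
def mean_rank(emb: torch.Tensor, edges: list[tuple[int, int]]) -> float:
    """Average rank of d(u, v) among the distances from u to all nodes."""
    emb = emb.detach()  # evaluation only; no gradients needed
    ranks = []
    for u, v in edges:
        dists = poincare_distance(emb[u].unsqueeze(0), emb)  # (N,) distances
        dists[u] = float('inf')  # ignore the self-distance
        # rank = 1 + number of nodes strictly closer to u than v is
        ranks.append((dists < dists[v]).sum().item() + 1)
    return sum(ranks) / len(ranks)
```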

References

  • Maximilian Nickel and Douwe Kiela. "Poincaré Embeddings for Learning Hierarchical Representations." Advances in Neural Information Processing Systems (NeurIPS), 2017. https://arxiv.org/abs/1705.08039
