Merge branch 'features/documentation'
bpiwowar committed Aug 25, 2023
2 parents cfe6895 + 1870fa3 commit 15e2de2
Showing 18 changed files with 323 additions and 37 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pytest.yml
@@ -25,7 +25,7 @@ jobs:
- name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    pip install flake8 pytest pytest-timeout pytest-dependency
    pip install flake8 pytest pytest-timeout pytest-dependency sphinx
    if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    pip install faiss-cpu numba tensorboard
    pip install '.'
21 changes: 19 additions & 2 deletions docs/source/data/adapters.rst
@@ -6,12 +6,29 @@ subsampling document and/or queries.

.. currentmodule:: xpmir.datasets.adapters

Adhoc datasets
--------------

.. autoxpmconfig:: RandomFold
   :members: folds

.. autoxpmconfig:: ConcatFold

Documents
---------

.. autoxpmconfig:: RetrieverBasedCollection

.. autoxpmconfig:: TopicFold
.. autoxpmconfig:: AdhocAssessmentFold
.. autoxpmconfig:: DocumentSubset

Assessments
-----------

.. autoxpmconfig:: AdhocAssessmentFold

Topics
------

.. autoxpmconfig:: TopicFold
.. autoxpmconfig:: MemoryTopicStore
.. autoxpmconfig:: TextStore
3 changes: 3 additions & 0 deletions docs/source/evaluation.rst
@@ -6,6 +6,9 @@ Evaluation
Evaluation
----------

.. autoxpmconfig:: xpmir.evaluation.BaseEvaluation
.. autoxpmconfig:: xpmir.evaluation.RunEvaluation

.. autoxpmconfig:: xpmir.evaluation.Evaluate

.. autoclass:: xpmir.evaluation.Evaluations
2 changes: 2 additions & 0 deletions docs/source/letor/index.rst
@@ -40,7 +40,9 @@ scorers, some have learnable parameters.
.. autoxpmconfig:: xpmir.rankers.Scorer
   :members: initialize, rsv, to, eval, getRetriever
.. autoxpmconfig:: xpmir.rankers.RandomScorer
.. autoxpmconfig:: xpmir.rankers.AbstractLearnableScorer
.. autoxpmconfig:: xpmir.rankers.LearnableScorer
.. autoxpmconfig:: xpmir.neural.TorchLearnableScorer

.. autofunction:: xpmir.rankers.scorer_retriever

14 changes: 11 additions & 3 deletions docs/source/letor/optimization.rst
@@ -25,7 +25,7 @@ Optimizers

.. autoxpmconfig:: xpmir.learning.optim.ParameterOptimizer
.. autoxpmconfig:: xpmir.learning.optim.ParameterFilter

.. autoxpmconfig:: xpmir.learning.optim.RegexParameterFilter


Batching
@@ -48,5 +48,13 @@ the way to use it (i.e. multi-gpu settings).
Schedulers
----------

.. automodule:: xpmir.learning.schedulers
   :members:
.. autoxpmconfig:: xpmir.learning.schedulers.Scheduler
.. autoxpmconfig:: xpmir.learning.schedulers.CosineWithWarmup
.. autoxpmconfig:: xpmir.learning.schedulers.LinearWithWarmup

Base classes
------------

.. autoxpmconfig:: xpmir.learning.base.Random
.. autoxpmconfig:: xpmir.learning.base.Sampler
.. autoxpmconfig:: xpmir.learning.trainers.Trainer
8 changes: 7 additions & 1 deletion docs/source/letor/samplers.rst
@@ -9,6 +9,9 @@ Samplers provide samples in the form of *records*. They all inherit from:
.. autoclass:: SerializableIterator


.. autoxpmconfig:: ModelBasedSampler


Pointwise
=========

@@ -27,6 +30,9 @@ Pairwise

.. autoxpmconfig:: TripletBasedSampler
.. autoxpmconfig:: PairwiseDatasetTripletBasedSampler
.. autoxpmconfig:: PairwiseInBatchNegativesSampler
.. autoxpmconfig:: PairwiseSampleDatasetFromTSV
.. autoxpmconfig:: PairwiseSamplerFromTSV

Hard Negatives Sampling (Tasks)
===============================
@@ -58,4 +64,4 @@ Useful for pre-training or when learning index parameters (e.g. for FAISS).
.. currentmodule:: xpmir.documents.samplers
.. autoxpmconfig:: DocumentSampler
.. autoxpmconfig:: HeadDocumentSampler
.. autoxpmconfig:: RandomDocumentSampler
.. autoxpmconfig:: RandomDocumentSampler
38 changes: 37 additions & 1 deletion docs/source/letor/trainers.rst
@@ -43,12 +43,41 @@ Losses
.. autoxpmconfig:: xpmir.letor.trainers.pairwise.PairwiseLoss
   :members: compute


.. autoxpmconfig:: xpmir.letor.trainers.pairwise.CrossEntropyLoss
.. autoxpmconfig:: xpmir.letor.trainers.pairwise.HingeLoss
.. autoxpmconfig:: xpmir.letor.trainers.pairwise.PointwiseCrossEntropyLoss


Pairwise (duo)
**************

Trainer
-------

.. autoxpmconfig:: xpmir.letor.trainers.pairwise.DuoPairwiseTrainer

Losses
------

.. autoxpmconfig:: xpmir.letor.trainers.pairwise.PairwiseLossWithTarget
   :members: compute


Batchwise
*********

Trainer
-------

.. autoxpmconfig:: xpmir.letor.trainers.batchwise.BatchwiseTrainer

Losses
------

.. autoxpmconfig:: xpmir.letor.trainers.batchwise.BatchwiseLoss
.. autoxpmconfig:: xpmir.letor.trainers.batchwise.CrossEntropyLoss
.. autoxpmconfig:: xpmir.letor.trainers.batchwise.SoftmaxCrossEntropy

Other
*****

@@ -57,6 +86,13 @@ Other
Distillation: Pairwise
**********************


Sampler
-------

.. autoxpmconfig:: xpmir.letor.distillation.samplers.DistillationPairwiseSampler
.. autoxpmconfig:: xpmir.letor.distillation.samplers.PairwiseHydrator

Trainer
-------

20 changes: 20 additions & 0 deletions docs/source/neural.rst
@@ -8,6 +8,8 @@ Cross-Encoder
Models that rely on a joint representation of the query and the document.

.. autoxpmconfig:: xpmir.neural.cross.CrossScorer
.. autoxpmconfig:: xpmir.neural.jointclassifier.JointClassifier

.. autoxpmconfig:: xpmir.neural.cross.DuoCrossScorer


@@ -61,6 +63,24 @@ Interaction models

.. autoxpmconfig:: xpmir.neural.colbert.Colbert

DRMM
****

.. autoxpmconfig:: xpmir.neural.interaction.drmm.Combination
.. autoxpmconfig:: xpmir.neural.interaction.drmm.CountHistogram
.. autoxpmconfig:: xpmir.neural.interaction.drmm.IdfCombination
.. autoxpmconfig:: xpmir.neural.interaction.drmm.LogCountHistogram
.. autoxpmconfig:: xpmir.neural.interaction.drmm.NormalizedHistogram
.. autoxpmconfig:: xpmir.neural.interaction.drmm.SumCombination

Similarity
==========

.. autoxpmconfig:: xpmir.neural.common.Similarity
.. autoxpmconfig:: xpmir.neural.common.L2Distance
.. autoxpmconfig:: xpmir.neural.common.CosineSimilarity


Sparse Models
=============

27 changes: 27 additions & 0 deletions docs/source/retrieval.rst
@@ -39,9 +39,25 @@ In a re-ranking setting, one can use a two-stage retriever to perform
retrieval by first using a fully fledged retriever and then
re-ranking the results.

.. autoxpmconfig:: xpmir.rankers.AbstractTwoStageRetriever
.. autoxpmconfig:: xpmir.rankers.TwoStageRetriever

Duo-retrievers
--------------

Duo-retrievers only predict whether a document is "more relevant" than
another one.

.. autoxpmconfig:: xpmir.rankers.DuoTwoStageRetriever
.. autoxpmconfig:: xpmir.rankers.DuoLearnableScorer

Misc
----

.. autoxpmconfig:: xpmir.rankers.full.FullRetriever
.. autoxpmconfig:: xpmir.rankers.full.FullRetrieverRescorer
.. autoxpmconfig:: xpmir.rankers.RetrieverHydrator
.. autoxpmconfig:: xpmir.rankers.mergers.SumRetriever

Collection-dependent
--------------------
@@ -52,12 +68,23 @@ Collection-dependent
Anserini
--------

.. autoxpmconfig:: xpmir.index.anserini.Index
.. autoxpmconfig:: xpmir.interfaces.anserini.Index
.. autoxpmconfig:: xpmir.interfaces.anserini.AnseriniRetriever
.. autoxpmconfig:: xpmir.interfaces.anserini.IndexCollection
.. autoxpmconfig:: xpmir.interfaces.anserini.SearchCollection

FAISS
-----

.. autoxpmconfig:: xpmir.index.faiss.FaissIndex
.. autoxpmconfig:: xpmir.index.faiss.IndexBackedFaiss
.. autoxpmconfig:: xpmir.index.faiss.FaissRetriever


Sparse
------

.. autoxpmconfig:: xpmir.index.sparse.SparseRetriever
.. autoxpmconfig:: xpmir.index.sparse.SparseRetrieverIndex
.. autoxpmconfig:: xpmir.index.sparse.SparseRetrieverIndexBuilder
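The retrieval.rst changes above describe two-stage retrieval (a full retriever followed by re-ranking) and duo-retrievers, which only predict whether one document is more relevant than another. The Python sketch below illustrates both ideas on toy data. All names in it (`two_stage`, `duo_rerank`, `overlap`, `density`, `prefer`) are hypothetical and unrelated to the xpmir API. In particular, turning pairwise preferences into a ranking by summing each document's preference scores against every other candidate is one common aggregation choice, assumed here; it is not necessarily what `DuoTwoStageRetriever` does.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Ranking = List[Tuple[str, float]]


def two_stage(query: str, documents: Dict[str, str],
              first_stage: Callable[[str, str], float],
              reranker: Callable[[str, str], float], top_k: int) -> Ranking:
    """Keep the top_k documents of the first-stage scorer, then re-score them."""
    candidates = sorted(documents, key=lambda d: first_stage(query, documents[d]),
                        reverse=True)[:top_k]
    rescored = [(d, reranker(query, documents[d])) for d in candidates]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


def duo_rerank(query: str, documents: Dict[str, str], candidates: List[str],
               prefer: Callable[[str, str, str], float]) -> Ranking:
    """Assumed aggregation: score(d) = sum over e != d of prefer(query, d, e)."""
    scores = {d: 0.0 for d in candidates}
    for d, e in permutations(candidates, 2):  # every ordered pair (d, e), d != e
        scores[d] += prefer(query, documents[d], documents[e])
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy scorers: raw term overlap, and overlap divided by length.
def overlap(query: str, text: str) -> float:
    return float(len(set(query.split()) & set(text.split())))


def density(query: str, text: str) -> float:
    return overlap(query, text) / len(text.split())


def prefer(query: str, a: str, b: str) -> float:
    """1.0 if a looks more relevant than b, 0.5 on ties, 0.0 otherwise."""
    sa, sb = density(query, a), density(query, b)
    return 0.5 if sa == sb else float(sa > sb)


docs = {
    "d1": "neural ranking models",
    "d2": "cooking recipes",
    "d3": "neural networks for ranking documents",
}
ranked = two_stage("neural ranking", docs, overlap, density, top_k=2)
duo = duo_rerank("neural ranking", docs, ["d1", "d2", "d3"], prefer)
```

Because the pairwise step scores every ordered pair of candidates, its cost grows quadratically with the number of candidates, which is one reason such a step is usually applied only to a short list produced by an earlier stage.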
6 changes: 6 additions & 0 deletions docs/source/text/huggingface.rst
@@ -3,15 +3,21 @@ Huggingface Transformers

.. currentmodule:: xpmir.text.huggingface

.. autoxpmconfig:: BaseTransformer

Encoders
========

.. autoxpmconfig:: TransformerEncoder
.. autoxpmconfig:: TransformerTokensEncoder
.. autoxpmconfig:: TransformerTextEncoderAdapter

.. autoxpmconfig:: DualTransformerEncoder
.. autoxpmconfig:: SentenceTransformerTextEncoder
.. autoxpmconfig:: OneHotHuggingFaceEncoder
.. autoxpmconfig:: DualDuoBertTransformerEncoder

.. autoxpmconfig:: TransformerVocab

Tokenizers
==========
5 changes: 5 additions & 0 deletions docs/source/text/index.rst
@@ -5,6 +5,7 @@ Text Representation
   :maxdepth: 2

   huggingface
   wordvec


The `text` module groups classes and configurations that compute
@@ -17,6 +18,10 @@ as contextual word embeddings and document embeddings.
.. autoxpmconfig:: xpmir.text.encoders.TokensEncoder
   :members: forward

.. autoxpmconfig:: xpmir.text.encoders.Encoder
.. autoxpmconfig:: xpmir.text.encoders.MeanTextEncoder
.. autoxpmconfig:: xpmir.text.encoders.TripletTextEncoder

.. autoxpmconfig:: xpmir.text.encoders.TextEncoder
   :members: forward

6 changes: 6 additions & 0 deletions docs/source/text/wordvec.rst
@@ -0,0 +1,6 @@
Word vectors
============

.. autoxpmconfig:: xpmir.text.wordvec_vocab.WordvecVocab
.. autoxpmconfig:: xpmir.text.wordvec_vocab.WordvecHashVocab
.. autoxpmconfig:: xpmir.text.wordvec_vocab.WordvecUnkVocab
2 changes: 1 addition & 1 deletion setup.cfg
@@ -10,7 +10,7 @@ platform = any
zip_safe = false
include_package_data = true
python_requires = >= 3.8
test_suite = xpmir.tests
test_suite = xpmir.test

[options.packages.find]
exclude =
2 changes: 2 additions & 0 deletions src/xpmir/datasets/adapters.py
@@ -45,6 +45,8 @@ def iter(self):


class AdhocAssessmentFold(AdhocAssessments):
    """Filter assessments by topic ID"""

    ids: Param[List[str]]
    """The topic IDs whose assessments are kept"""

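The docstring added to `AdhocAssessmentFold` says that it filters assessments by topic ID, using the `ids` parameter. As a rough illustration of that behaviour only, the plain-Python sketch below keeps the relevance judgments whose topic ID appears in a given list. The `Assessment` record and the `assessment_fold` function are hypothetical stand-ins, not the xpmir implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Assessment:
    """Hypothetical relevance judgment: topic ID, document ID, relevance."""

    topic_id: str
    doc_id: str
    relevance: int


def assessment_fold(assessments: Iterable[Assessment],
                    ids: List[str]) -> Iterator[Assessment]:
    """Lazily yield only the assessments whose topic ID is listed in ``ids``."""
    keep = set(ids)  # set gives constant-time membership checks
    return (a for a in assessments if a.topic_id in keep)


qrels = [
    Assessment("q1", "d1", 1),
    Assessment("q2", "d2", 0),
    Assessment("q1", "d3", 2),
]
fold = list(assessment_fold(qrels, ["q1"]))
```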
24 changes: 2 additions & 22 deletions src/xpmir/neural/colbert.py
@@ -5,32 +5,12 @@

from typing import List
from experimaestro import Config, Constant, Param, default, Annotated
import torch
from torch import nn
import torch.nn.functional as F
from xpmir.learning.context import TrainerContext
from xpmir.letor.records import BaseRecords
from xpmir.neural.interaction import InteractionScorer


class Similarity(Config):
    def __call__(self, queries, documents) -> torch.Tensor:
        raise NotImplementedError()


class L2Distance(Similarity):
    def __call__(self, queries, documents):
        return (
            (-1.0 * ((queries.unsqueeze(2) - documents.unsqueeze(1)) ** 2).sum(-1))
            .max(-1)
            .values.sum(-1)
        )


class CosineDistance(Similarity):
    def __call__(self, queries, documents):
        return (queries @ documents.permute(0, 2, 1)).max(2).values.sum(1)

from .common import Similarity, CosineSimilarity

class Colbert(InteractionScorer):
    """ColBERT model
@@ -56,7 +36,7 @@ class Colbert(InteractionScorer):
    doctoken: Param[bool] = True
    """Whether a specific document token should be used as a prefix to the document"""

    similarity: Annotated[Similarity, default(CosineDistance())]
    similarity: Annotated[Similarity, default(CosineSimilarity())]
    """Which similarity to use"""

    linear_dim: Param[int] = 128
21 changes: 21 additions & 0 deletions src/xpmir/neural/common.py
@@ -0,0 +1,21 @@
import torch
from experimaestro import Config

class Similarity(Config):
    """A similarity between vector representations"""
    def __call__(self, queries: torch.Tensor, documents: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError()


class L2Distance(Similarity):
    def __call__(self, queries, documents):
        return (
            (-1.0 * ((queries.unsqueeze(2) - documents.unsqueeze(1)) ** 2).sum(-1))
            .max(-1)
            .values.sum(-1)
        )


class CosineSimilarity(Similarity):
    def __call__(self, queries, documents):
        return (queries @ documents.permute(0, 2, 1)).max(2).values.sum(1)
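The new `xpmir/neural/common.py` works on batched tensors. Judging from the indexing (`unsqueeze(2)`, `unsqueeze(1)`, `permute(0, 2, 1)`), `queries` is assumed to have shape (batch, query tokens, dim) and `documents` shape (batch, document tokens, dim), with one score returned per batch element. The pure-Python sketch below restates the per-pair computation without tensors, to make the late-interaction ("MaxSim") reduction explicit. The function names are hypothetical. Note that `CosineSimilarity` computes plain dot products, which equal cosine similarities only if the token vectors are already unit-normalized; the class does not normalize them itself. Likewise, despite its name, `L2Distance` returns negated squared distances, so higher still means more similar.

```python
from typing import List

Vectors = List[List[float]]  # one embedding per token


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim_dot(query: Vectors, document: Vectors) -> float:
    """Per-pair analogue of CosineSimilarity: for each query token, keep the
    largest dot product over document tokens, then sum over query tokens."""
    return sum(max(dot(q, d) for d in document) for q in query)


def maxsim_neg_l2(query: Vectors, document: Vectors) -> float:
    """Per-pair analogue of L2Distance: for each query token, keep the largest
    negated squared L2 distance (the closest document token), then sum."""
    return sum(
        max(-sum((a - b) ** 2 for a, b in zip(q, d)) for d in document)
        for q in query
    )


query = [[1.0, 0.0], [0.0, 1.0]]
document = [[1.0, 0.0], [0.6, 0.8]]  # unit-norm rows, so dot product == cosine
score_dot = maxsim_dot(query, document)
score_l2 = maxsim_neg_l2(query, document)
```

In the tensor version, `queries @ documents.permute(0, 2, 1)` builds the full query-token by document-token score matrix for each batch element, `.max(2)` keeps the best document token per query token, and `.sum(1)` adds these maxima: the same reduction `maxsim_dot` performs for a single pair.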
6 changes: 0 additions & 6 deletions src/xpmir/rankers/__init__.py
@@ -75,12 +75,6 @@ class ScorerOutputType(Enum):
    """A probability, in ]0,1["""


class LearnableModel(Config):
    """All learnable models"""

    pass


class Scorer(Config, Initializable, EasyLogger):
    """Query-document scorer