Refactor to work with matchms 0.6.0 #35

Merged: 28 commits, Sep 16, 2020
Changes from 19 commits

Commits
dfc5f20
add support for Python 3.8
florian-huber Sep 10, 2020
aa3abe5
add support for Python 3.8
florian-huber Sep 10, 2020
5bbb16f
update changelog
florian-huber Sep 10, 2020
bb04299
fix typo
florian-huber Sep 14, 2020
b1e723e
refactor according to matchms 0.6.0
florian-huber Sep 14, 2020
b5ff6c8
Merge branch 'refactor_environment' of https://github.com/iomega/spec…
florian-huber Sep 14, 2020
bcaceb4
linting
florian-huber Sep 14, 2020
9943714
add matchms >=0.6.0 to environment
florian-huber Sep 14, 2020
cc81313
add tests for spec2vec methods
florian-huber Sep 14, 2020
abfb5b5
fix test
florian-huber Sep 14, 2020
b1aeaaf
isort linting
florian-huber Sep 14, 2020
c430fb4
Update CHANGELOG.md
florian-huber Sep 14, 2020
bc85f0f
remove matchms from meta.yaml
florian-huber Sep 14, 2020
1233031
Merge branch 'refactor_environment' of https://github.com/iomega/spec…
florian-huber Sep 14, 2020
0651b79
add nlesc channel
florian-huber Sep 14, 2020
fe27213
try adding matchms to meta.yaml build
florian-huber Sep 14, 2020
7283c90
try adding nlesc channel to conda_build
florian-huber Sep 14, 2020
a75740f
add channel nlesc to verify and publish as well
florian-huber Sep 14, 2020
97d7621
remove nlesc channel from verify
florian-huber Sep 14, 2020
41912fb
add conda update to (hopefully) use locally build spec2vec
florian-huber Sep 15, 2020
6d06320
force matchms install first and local conda install second
florian-huber Sep 15, 2020
ff7628c
Update .github/workflows/conda_build.yml
florian-huber Sep 15, 2020
370839c
try fixing conda build
florian-huber Sep 15, 2020
36dbd70
Merge branch 'refactor_environment' of https://github.com/iomega/spec…
florian-huber Sep 15, 2020
6164a36
add listing of build folder (noarch)
florian-huber Sep 15, 2020
0486cd7
rollback to conda install matchms, then conda install spec2vec
florian-huber Sep 16, 2020
70921e6
Update .github/workflows/conda_build.yml
florian-huber Sep 16, 2020
bbac633
Update .github/workflows/conda_build.yml
florian-huber Sep 16, 2020
16 changes: 11 additions & 5 deletions .github/workflows/conda_build.yml
@@ -1,6 +1,9 @@
 name: Anaconda Build

-on: [push, pull_request]
+on:
+  push:
+  pull_request:
+    types: [opened, reopened]

 jobs:

@@ -11,7 +14,10 @@ jobs:
       fail-fast: false
       matrix:
         os: ['ubuntu-latest', 'macos-latest', 'windows-latest']
-        python-version: ["3.7"]
+        python-version: ["3.8"]
+        include:
+          - os: 'ubuntu-latest'
+            python-version: "3.7"
     steps:
       - uses: actions/checkout@v2
         with:
@@ -73,7 +79,7 @@ jobs:
         if: matrix.os == 'ubuntu-latest'
         run: sed -i "s+$PWD/++g" coverage.xml
       - name: SonarCloud Scan
-        if: matrix.os == 'ubuntu-latest'
+        if: ${{ matrix.os == 'ubuntu-latest' && matrix.python-version == '3.7' }}
         uses: sonarsource/sonarcloud-github-action@master
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -123,7 +129,7 @@ jobs:
           [ "$RUNNING_OS" = "Windows" ] && export BUILDDIR=$RUNNER_TEMP\\spec2vec\\_build\\
           conda config --set anaconda_upload no
           conda build --numpy 1.18.1 --no-include-recipe \
-            --channel bioconda --channel conda-forge \
+            --channel nlesc --channel bioconda --channel conda-forge \
             --croot ${BUILDDIR} \
             ./conda
       - name: Upload package artifact from build
@@ -139,7 +145,7 @@ jobs:
       fail-fast: false
       matrix:
         os: ['ubuntu-latest', 'macos-latest', 'windows-latest']
-        python-version: ['3.7']
+        python-version: ['3.7', '3.8']
     runs-on: ${{ matrix.os }}
     needs: build
     steps:
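The build matrix in conda_build.yml now lists Python 3.8 for all three operating systems and uses `include` to add one more Ubuntu job on Python 3.7. The sketch below is a simplified Python model of that expansion, not GitHub's implementation; `expand_matrix` and `runs_sonarcloud` are hypothetical helper names introduced only for illustration. It suggests why the SonarCloud condition was tightened: with two Ubuntu jobs, the old condition would trigger the scan twice.

```python
from itertools import product


def expand_matrix(matrix, include):
    """Simplified model of matrix + include expansion (hypothetical helper)."""
    keys = list(matrix)
    combos = [dict(zip(keys, values))
              for values in product(*(matrix[key] for key in keys))]
    extra = []
    for entry in include:
        # An include entry extends an existing combination only if it does
        # not overwrite any of that combination's original matrix values.
        matches = [combo for combo in combos
                   if all(combo[key] == value
                          for key, value in entry.items() if key in matrix)]
        if matches:
            for combo in matches:
                combo.update(entry)
        else:
            # Otherwise the entry becomes an additional job of its own.
            extra.append(dict(entry))
    return combos + extra


def runs_sonarcloud(job):
    """Mirror of the new `if:` condition on the SonarCloud step."""
    return job["os"] == "ubuntu-latest" and job["python-version"] == "3.7"


jobs = expand_matrix(
    {"os": ["ubuntu-latest", "macos-latest", "windows-latest"],
     "python-version": ["3.8"]},
    [{"os": "ubuntu-latest", "python-version": "3.7"}],
)
sonar_jobs = [job for job in jobs if runs_sonarcloud(job)]
old_condition_jobs = [job for job in jobs if job["os"] == "ubuntu-latest"]
```

Under this model the build job expands to four combinations, the new condition selects exactly one of them, and the old condition would have selected two.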
4 changes: 3 additions & 1 deletion .github/workflows/conda_publish.yml
@@ -36,7 +36,9 @@ jobs:
           export BUILD_FOLDER=/tmp/spec2vec/_build
           mkdir -p $BUILD_FOLDER
           conda build --numpy 1.18.1 --no-include-recipe \
-            --channel bioconda --channel conda-forge \
+            --channel nlesc \
+            --channel bioconda \
+            --channel conda-forge \
             --croot $BUILD_FOLDER \
             ./conda
       - name: Push the package to anaconda cloud
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+### Added
+
+- Support for Python 3.8 [#35](https://github.com/iomega/spec2vec/pull/35)
+
+### Changed
+
+- Refactored Spec2Vec class to provide .pair() and .matrix() methods [#35](https://github.com/iomega/spec2vec/pull/35)
+
+### Removed
+
+- Spec2VecParallel (is now included as Spec2Vec.matrix()) [#35](https://github.com/iomega/spec2vec/pull/35)
+
 ## [0.2.0] - 2020-06-18

 ### Added
4 changes: 2 additions & 2 deletions README.rst
@@ -99,14 +99,14 @@ Installation

 Prerequisites:

-- Python 3.7
+- Python 3.7 or 3.8
 - Anaconda

 Install spec2vec from Anaconda Cloud with

 .. code-block:: console

-  conda env create --name spec2vec python=3.7
+  conda env create --name spec2vec python=3.8
   conda activate spec2vec
   conda install --channel nlesc --channel bioconda --channel conda-forge spec2vec

2 changes: 1 addition & 1 deletion conda/environment-build.yml
@@ -5,4 +5,4 @@ dependencies:
   - anaconda-client
   - conda-build
   - conda-verify
-  - python >=3.7,<3.8
+  - python >=3.7,<3.9
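
The pin change from `<3.8` to `<3.9` admits every 3.7.x and 3.8.x release while still excluding 3.9. A minimal sketch of that range check, using plain tuple comparison rather than conda's own version matcher (`parse_version` and `satisfies_pin` are hypothetical helpers, and only purely numeric versions are assumed):

```python
def parse_version(version):
    """Turn a purely numeric version such as '3.8.5' into (3, 8, 5)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pin(version, lower="3.7", upper="3.9"):
    """Return True when lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


old_pin_accepts_38 = satisfies_pin("3.8.5", upper="3.8")
new_pin_accepts_38 = satisfies_pin("3.8.5")
new_pin_accepts_39 = satisfies_pin("3.9.0")
```

With these assumptions, Python 3.8.5 is rejected by the old pin and accepted by the new one, and 3.9.0 is rejected by both.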
4 changes: 2 additions & 2 deletions conda/environment-dev.yml
@@ -6,10 +6,10 @@ channels:
   - nlesc
 dependencies:
   - gensim >=3.8.0
-  - matchms>=0.5.0
+  - matchms>=0.6.0
   - numpy
   - pip
-  - python >=3.7,<3.8
+  - python >=3.7,<3.9
   - scipy
   - pip:
     - bump2version
3 changes: 2 additions & 1 deletion conda/environment.yml
@@ -5,6 +5,7 @@ channels:
   - defaults
 dependencies:
   - gensim >=3.8.0
+  - matchms >=0.6.0
   - numpy
-  - python >=3.7,<3.8
+  - python >=3.7,<3.9
   - scipy
7 changes: 5 additions & 2 deletions conda/meta.yaml
@@ -10,6 +10,7 @@ source:

 extra:
   channels:
+    - nlesc
     - conda-forge
     - bioconda

@@ -26,18 +27,20 @@ requirements:
     - conda-verify
     - pytest-runner
     - python
+    - matchms >=0.6.0
     - numpy {{ numpy }}
     - setuptools
   host:
-    - python >=3.7,<3.8
+    - python >=3.7,<3.9
     - pip
     - pytest-runner
     - setuptools
   run:
     - gensim >=3.8.0
+    - matchms >=0.6.0
     - numpy
     - pip
-    - python >=3.7,<3.8
+    - python >=3.7,<3.9
     - scipy

 test:
86 changes: 0 additions & 86 deletions integration-tests/test_user_workflow_spec2vec_parallel.py

This file was deleted.

4 changes: 2 additions & 2 deletions readthedocs/index.rst
@@ -19,15 +19,15 @@ Installation

 Prerequisites:

-- Python 3.7
+- Python 3.7 or 3.8
 - Anaconda

 Install spec2vec from Anaconda Cloud with

 .. code-block:: console

   # install spec2vec in a new virtual environment to avoid dependency clashes
-  conda create --name spec2vec python=3.7
+  conda create --name spec2vec python=3.8
   conda activate spec2vec
   conda install --channel nlesc --channel bioconda --channel conda-forge spec2vec

3 changes: 2 additions & 1 deletion setup.py
@@ -39,7 +39,8 @@
         "License :: OSI Approved :: Apache Software License",
         "Natural Language :: English",
         "Programming Language :: Python :: 3",
-        "Programming Language :: Python :: 3.7"
+        "Programming Language :: Python :: 3.7",
+        "Programming Language :: Python :: 3.8"
     ],
     test_suite="tests",
     install_requires=[
42 changes: 40 additions & 2 deletions spec2vec/Spec2Vec.py
@@ -1,11 +1,14 @@
+from typing import List
 from typing import Union
+import numpy
 import scipy
 from gensim.models.basemodel import BaseTopicModel
+from matchms.similarity.BaseSimilarity import BaseSimilarity
 from spec2vec.SpectrumDocument import SpectrumDocument
 from .calc_vector import calc_vector


-class Spec2Vec:
+class Spec2Vec(BaseSimilarity):
     """Calculate spec2vec similarity scores between a reference and a query.

     Using a trained model, spectrum documents will be converted into spectrum
@@ -64,7 +67,7 @@ def __init__(self, model: BaseTopicModel, intensity_weighting_power: Union[float
         self.allowed_missing_percentage = allowed_missing_percentage
         self.vector_size = model.wv.vector_size

-    def __call__(self, reference: SpectrumDocument, query: SpectrumDocument) -> float:
+    def pair(self, reference: SpectrumDocument, query: SpectrumDocument) -> float:
         """Calculate the spec2vec similaritiy between a reference and a query.

         Parameters
@@ -86,3 +89,38 @@ def __call__(self, reference: SpectrumDocument, query: SpectrumDocument) -> floa
         cdist = scipy.spatial.distance.cosine(reference_vector, query_vector)

         return 1 - cdist
+
+    def matrix(self, references: List[SpectrumDocument], queries: List[SpectrumDocument],
+               is_symmetric: bool = False) -> numpy.ndarray:
+        """Calculate the spec2vec similarities between all references and queries.
+
+        Parameters
+        ----------
+        references:
+            Reference spectrum documents.
+        queries:
+            Query spectrum documents.
+
+        Returns
+        -------
+        spec2vec_similarity
+            Array of spec2vec similarity scores.
+        """
+        n_rows = len(references)
+        reference_vectors = numpy.empty((n_rows, self.vector_size), dtype="float")
+        for index_reference, reference in enumerate(references):
+            reference_vectors[index_reference, 0:self.vector_size] = calc_vector(self.model,
+                                                                                 reference,
+                                                                                 self.intensity_weighting_power,
+                                                                                 self.allowed_missing_percentage)
+        n_cols = len(queries)
+        query_vectors = numpy.empty((n_cols, self.vector_size), dtype="float")
+        for index_query, query in enumerate(queries):
+            query_vectors[index_query, 0:self.vector_size] = calc_vector(self.model,
+                                                                         query,
+                                                                         self.intensity_weighting_power,
+                                                                         self.allowed_missing_percentage)
+
+        spec2vec_similarity = 1 - scipy.spatial.distance.cdist(reference_vectors, query_vectors, "cosine")
+
+        return spec2vec_similarity
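
The new `matrix()` method computes all reference/query scores with a single `cdist` call, while `pair()` scores one pair using `cosine`. The sketch below checks that the two routes agree. It uses random stand-in vectors in place of the `calc_vector` output (an assumption made so the example runs without a trained model), and the array sizes are arbitrary.

```python
import numpy
from scipy.spatial.distance import cdist, cosine

# Stand-in embeddings: 3 reference vectors and 2 query vectors of size 5.
rng = numpy.random.default_rng(0)
reference_vectors = rng.normal(size=(3, 5))
query_vectors = rng.normal(size=(2, 5))

# All-pairs route, as in matrix(): one cdist call, then 1 - distance.
similarity_matrix = 1 - cdist(reference_vectors, query_vectors, "cosine")

# Pair-by-pair route, as in pair(): 1 - cosine distance for each combination.
pairwise = numpy.array([[1 - cosine(reference, query) for query in query_vectors]
                        for reference in reference_vectors])

scores_agree = numpy.allclose(similarity_matrix, pairwise)
```

The result has one row per reference and one column per query, so `similarity_matrix[i, j]` corresponds to what `pair(references[i], queries[j])` would return for the same vectors.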