Merge branch 'release-3.2.0'
menshikh-iv committed Dec 9, 2017
2 parents b6234e7 + 25014fc commit 6dd8ae7
Showing 174 changed files with 131,793 additions and 3,073 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -40,6 +40,8 @@ Thumbs.db

# Other #
#########
.tox/
.cache/
.project
.pydevproject
.ropeproject
20 changes: 13 additions & 7 deletions .travis.yml
@@ -5,18 +5,24 @@ cache:
directories:
- $HOME/.cache/pip
- $HOME/.ccache

- $HOME/.pip-cache
dist: trusty
language: python


matrix:
include:
- env: PYTHON_VERSION="2.7" NUMPY_VERSION="1.11.3" SCIPY_VERSION="0.18.1" ONLY_CODESTYLE="yes"
- env: PYTHON_VERSION="2.7" NUMPY_VERSION="1.11.3" SCIPY_VERSION="0.18.1" ONLY_CODESTYLE="no"
- env: PYTHON_VERSION="3.5" NUMPY_VERSION="1.11.3" SCIPY_VERSION="0.18.1" ONLY_CODESTYLE="no"
- env: PYTHON_VERSION="3.6" NUMPY_VERSION="1.11.3" SCIPY_VERSION="0.18.1" ONLY_CODESTYLE="no"
- python: '2.7'
env: TOXENV="flake8, docs"

- python: '2.7'
env: TOXENV="py27-linux"

- python: '3.5'
env: TOXENV="py35-linux"

- python: '3.6'
env: TOXENV="py36-linux"

install: source continuous_integration/travis/install.sh
script: bash continuous_integration/travis/run.sh
install: pip install tox
script: tox -vv
146 changes: 146 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,151 @@
Changes
===========
## 3.2.0, 2017-12-09

:star2: New features:

* New download API for corpora and pre-trained models (__[@chaitaliSaini](https://github.com/chaitaliSaini)__ & __[@menshikh-iv](https://github.com/menshikh-iv)__, [#1705](https://github.com/RaRe-Technologies/gensim/pull/1705) & [#1632](https://github.com/RaRe-Technologies/gensim/pull/1632) & [#1492](https://github.com/RaRe-Technologies/gensim/pull/1492))
- Download large NLP datasets in one line of Python, then use with memory-efficient data streaming:
```python
import gensim.downloader as api

for article in api.load("wiki-english-20171001"):
    pass

```
- Don’t waste time searching for good word embeddings; use the curated ones we included:
```python
import gensim.downloader as api

model = api.load("glove-twitter-25")
model.most_similar("engineer")

# [('specialist', 0.957542896270752),
# ('developer', 0.9548177123069763),
# ('administrator', 0.9432312846183777),
# ('consultant', 0.93915855884552),
# ('technician', 0.9368376135826111),
# ('analyst', 0.9342101216316223),
# ('architect', 0.9257484674453735),
# ('engineering', 0.9159940481185913),
# ('systems', 0.9123805165290833),
# ('consulting', 0.9112802147865295)]
```
- [Blog post](https://rare-technologies.com/new-api-for-pretrained-nlp-models-and-datasets-in-gensim/) introducing the API and design decisions.
- [Notebook with examples](https://github.com/RaRe-Technologies/gensim/blob/be4500e4f0616ec2864c2ce70cb5d4db4b46512d/docs/notebooks/downloader_api_tutorial.ipynb)

* New model: Poincaré embeddings (__[@jayantj](https://github.com/jayantj)__, [#1696](https://github.com/RaRe-Technologies/gensim/pull/1696) & [#1700](https://github.com/RaRe-Technologies/gensim/pull/1700) & [#1757](https://github.com/RaRe-Technologies/gensim/pull/1757) & [#1734](https://github.com/RaRe-Technologies/gensim/pull/1734))
- Embed a graph (taxonomy) in the same way as word2vec embeds words:
```python
from gensim.models.poincare import PoincareRelations, PoincareModel
from gensim.test.utils import datapath

data = PoincareRelations(datapath('poincare_hypernyms.tsv'))
model = PoincareModel(data)
model.kv.most_similar("cat.n.01")

# [('kangaroo.n.01', 0.010581353439700418),
# ('gib.n.02', 0.011171531439892076),
# ('striped_skunk.n.01', 0.012025106076442395),
# ('metatherian.n.01', 0.01246679759214648),
# ('mammal.n.01', 0.013281303506525968),
# ('marsupial.n.01', 0.013941330203709653)]
```
- [Tutorial notebook on Poincaré embeddings](https://github.com/RaRe-Technologies/gensim/blob/920c029ca97f961c8df264672c34936607876694/docs/notebooks/Poincare%20Tutorial.ipynb)
- [Model introduction and the journey of its implementation](https://rare-technologies.com/implementing-poincare-embeddings/)
- [Original paper](https://arxiv.org/abs/1705.08039) on arXiv

* Optimized FastText (__[@manneshiva](https://github.com/manneshiva)__, [#1742](https://github.com/RaRe-Technologies/gensim/pull/1742))
- New fast multithreaded implementation of FastText, natively in Python/Cython. Deprecates the existing wrapper for Facebook’s C++ implementation.
```python
import gensim.downloader as api
from gensim.models import FastText

model = FastText(api.load("text8"))
model.most_similar("cat")

# [('catnip', 0.8538144826889038),
# ('catwalk', 0.8136177062988281),
# ('catchy', 0.7828493118286133),
# ('caf', 0.7826495170593262),
# ('bobcat', 0.7745151519775391),
# ('tomcat', 0.7732658386230469),
# ('moat', 0.7728310823440552),
# ('caye', 0.7666271328926086),
# ('catv', 0.7651021480560303),
# ('caveat', 0.7643581628799438)]


```

* Binary pre-compiled wheels for Windows, OSX and Linux (__[@menshikh-iv](https://github.com/menshikh-iv)__, [MacPython/gensim-wheels/#7](https://github.com/MacPython/gensim-wheels/pull/7))
- Users no longer need to have a C compiler for using the fast (Cythonized) version of word2vec, doc2vec, etc.
- Faster Gensim pip installation

* Added `DeprecationWarnings` to deprecated methods and parameters, with a clear schedule for removal.
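
A minimal sketch of how a deprecated function can emit a `DeprecationWarning` that carries a removal note. It uses only the standard library; the `deprecated` decorator and `old_method` are hypothetical names chosen for illustration, not necessarily how Gensim implements its warnings:

```python
import functools
import warnings


def deprecated(reason):
    """Hypothetical helper: wrap a function so each call emits a DeprecationWarning."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                "{} is deprecated: {}".format(func.__name__, reason),
                category=DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller, not the wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("will be removed in the next major release")
def old_method(x):
    # Illustrative function only; it simply doubles its input.
    return x * 2
```

Python hides `DeprecationWarning` by default in many contexts; running with `python -W default` makes such warnings visible.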

:+1: Improvements:
* Add Montemurro and Zanette's entropy based keyword extraction algorithm. Fix #665 (__[@PeteBleackley](https://github.com/PeteBleackley)__, [#1738](https://github.com/RaRe-Technologies/gensim/pull/1738))
* Fix flake8 E731, E402, refactor tests & sklearn API code. Partial fix #1644 (__[@horpto](https://github.com/horpto)__, [#1689](https://github.com/RaRe-Technologies/gensim/pull/1689))
* Reduce distribution size. Fix #1698 (__[@menshikh-iv](https://github.com/menshikh-iv)__, [#1699](https://github.com/RaRe-Technologies/gensim/pull/1699))
* Improve `scan_vocab` speed, `build_vocab_from_freq` method (__[@jodevak](https://github.com/jodevak)__, [#1695](https://github.com/RaRe-Technologies/gensim/pull/1695))
* Improve `segment_wiki` script (__[@piskvorky](https://github.com/piskvorky)__, [#1707](https://github.com/RaRe-Technologies/gensim/pull/1707))
* Add custom `dtype` support for `LdaModel`. Partially fix #1576 (__[@xelez](https://github.com/xelez)__, [#1656](https://github.com/RaRe-Technologies/gensim/pull/1656))
* Add `doc2idx` method for `gensim.corpora.Dictionary`. Fix #1634 (__[@roopalgarg](https://github.com/roopalgarg)__, [#1720](https://github.com/RaRe-Technologies/gensim/pull/1720))
* Add tox and pytest to gensim, integration with Travis and Appveyor. Fix #1613, #1644 (__[@menshikh-iv](https://github.com/menshikh-iv)__, [#1721](https://github.com/RaRe-Technologies/gensim/pull/1721))
* Add flag for hiding outdated data for `gensim.downloader.info` (__[@menshikh-iv](https://github.com/menshikh-iv)__, [#1736](https://github.com/RaRe-Technologies/gensim/pull/1736))
* Add reproducible order between python versions for `gensim.corpora.Dictionary` (__[@formi23](https://github.com/formi23)__, [#1715](https://github.com/RaRe-Technologies/gensim/pull/1715))
* Update `tox.ini`, `setup.cfg`, `README.md` (__[@menshikh-iv](https://github.com/menshikh-iv)__, [#1741](https://github.com/RaRe-Technologies/gensim/pull/1741))
* Add custom `logsumexp` for `LdaModel` (__[@arlenk](https://github.com/arlenk)__, [#1745](https://github.com/RaRe-Technologies/gensim/pull/1745))
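
To illustrate the idea behind the `doc2idx` entry above, here is a pure-Python stand-in that maps each token of a document to its integer id. This is a sketch under assumptions, not Gensim's code: the free function, the `token2id` mapping, and the `-1` placeholder for unknown tokens are illustrative choices, since the changelog only names the method.

```python
def doc2idx(token2id, document, unknown_word_index=-1):
    """Illustrative stand-in: replace each token with its id, or a placeholder if unseen."""
    return [token2id.get(token, unknown_word_index) for token in document]


# Made-up vocabulary for the example.
token2id = {"human": 0, "computer": 1, "interface": 2}

ids = doc2idx(token2id, ["computer", "graph", "human"])
print(ids)  # "graph" is not in the vocabulary, so it maps to the placeholder
```

Unlike a bag-of-words conversion, this keeps token order and repetitions, which is useful when a downstream model consumes sequences of ids.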

:red_circle: Bug fixes:
* Fix ranking formula in `gensim.summarization.bm25`. Fix #1718 (__[@souravsingh](https://github.com/souravsingh)__, [#1726](https://github.com/RaRe-Technologies/gensim/pull/1726))
* Fixed incompatibility in persistence for `FastText` wrapper. Fix #1642 (__[@chinmayapancholi13](https://github.com/chinmayapancholi13)__, [#1723](https://github.com/RaRe-Technologies/gensim/pull/1723))
* Fix `gensim.sklearn_api` bug with `documents_columns` parameter. Fix #1676 (__[@chinmayapancholi13](https://github.com/chinmayapancholi13)__, [#1704](https://github.com/RaRe-Technologies/gensim/pull/1704))
* Fix slowdown of CI, remove pytest-cov (__[@menshikh-iv](https://github.com/menshikh-iv)__, [#1728](https://github.com/RaRe-Technologies/gensim/pull/1728))
* Replace outdated packages in Dockerfile (__[@rbahumi](https://github.com/rbahumi)__, [#1730](https://github.com/RaRe-Technologies/gensim/pull/1730))
* Replace `num_words` with `topn` in `LdaMallet.show_topics`. Fix #1747 (__[@apoorvaeternity](https://github.com/apoorvaeternity)__, [#1749](https://github.com/RaRe-Technologies/gensim/pull/1749))
* Fix `os.rename` in `gensim.downloader` when `src` and `dst` are on different partitions (__[@anotherbugmaster](https://github.com/anotherbugmaster)__, [#1733](https://github.com/RaRe-Technologies/gensim/pull/1733))
* Fix `DeprecationWarning` from `logsumexp` (__[@dreamgonfly](https://github.com/dreamgonfly)__, [#1703](https://github.com/RaRe-Technologies/gensim/pull/1703))
* Fix backward compatibility problem in `Phrases.load`. Fix #1751 (__[@alexgarel](https://github.com/alexgarel)__, [#1758](https://github.com/RaRe-Technologies/gensim/pull/1758))
* Fix `load_word2vec_format` from `FastText`. Fix #1743 (__[@manneshiva](https://github.com/manneshiva)__, [#1755](https://github.com/RaRe-Technologies/gensim/pull/1755))
* Fix ipython kernel version in `Dockerfile`. Fix #1762 (__[@rbahumi](https://github.com/rbahumi)__, [#1764](https://github.com/RaRe-Technologies/gensim/pull/1764))
* Fix writing in `segment_wiki` (__[@horpto](https://github.com/horpto)__, [#1763](https://github.com/RaRe-Technologies/gensim/pull/1763))
* Fix error in `segment_wiki` where the file `write` method requires a bytes-like object (__[@horpto](https://github.com/horpto)__, [#1750](https://github.com/RaRe-Technologies/gensim/pull/1750))
* Fix incorrect vectors learned during online training for `FastText`. Fix #1752 (__[@manneshiva](https://github.com/manneshiva)__, [#1756](https://github.com/RaRe-Technologies/gensim/pull/1756))
* Fix `dtype` of `model.wv.syn0_vocab` on updating `vocab` for `FastText`. Fix #1759 (__[@manneshiva](https://github.com/manneshiva)__, [#1760](https://github.com/RaRe-Technologies/gensim/pull/1760))
* Fix hashing-trick from `FastText.build_vocab`. Fix #1765 (__[@manneshiva](https://github.com/manneshiva)__, [#1768](https://github.com/RaRe-Technologies/gensim/pull/1768))
* Add explicit `DeprecationWarning` for all outdated stuff. Fix #1753 (__[@menshikh-iv](https://github.com/menshikh-iv)__, [#1769](https://github.com/RaRe-Technologies/gensim/pull/1769))
* Fix epsilon according to `dtype` in `LdaModel` (__[@menshikh-iv](https://github.com/menshikh-iv)__, [#1770](https://github.com/RaRe-Technologies/gensim/pull/1770))
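
The `os.rename` entry above concerns a general pitfall: `os.rename` can raise `OSError` when source and destination sit on different filesystems. One common remedy, sketched below with the standard library, is `shutil.move`, which falls back to copy-then-delete in that case. The changelog does not say how the fix was implemented, so treat this as an illustration only; the file name is hypothetical.

```python
import os
import shutil
import tempfile

# Two temporary directories stand in for a download cache and its final location.
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()
src = os.path.join(src_dir, "model.bin")  # hypothetical file name
dst = os.path.join(dst_dir, "model.bin")

with open(src, "wb") as fout:
    fout.write(b"payload")

# shutil.move uses os.rename when possible and otherwise copies the file
# to the destination and removes the source.
shutil.move(src, dst)
```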

:books: Tutorial and doc improvements:
* Update perf numbers of `segment_wiki` (__[@piskvorky](https://github.com/piskvorky)__, [#1708](https://github.com/RaRe-Technologies/gensim/pull/1708))
* Update docstring for `gensim.summarization.summarize`. Fix #1575 (__[@fbarrios](https://github.com/fbarrios)__, [#1702](https://github.com/RaRe-Technologies/gensim/pull/1702))
* Refactor API Reference for `gensim.parsing`. Fix #1664 (__[@CLearERR](https://github.com/CLearERR)__, [#1684](https://github.com/RaRe-Technologies/gensim/pull/1684))
* Fix typos in doc2vec-wikipedia notebook (__[@youqad](https://github.com/youqad)__, [#1727](https://github.com/RaRe-Technologies/gensim/pull/1727))
* Fix PyPI long description rendering (__[@edigaryev](https://github.com/edigaryev)__, [#1739](https://github.com/RaRe-Technologies/gensim/pull/1739))
* Fix twitter badge src (__[@menshikh-iv](https://github.com/menshikh-iv)__)
* Fix maillist badge color (__[@menshikh-iv](https://github.com/menshikh-iv)__)

:warning: Deprecations (will be removed in the next major release)
* Remove
- `gensim.examples`
- `gensim.nosy`
- `gensim.scripts.word2vec_standalone`
- `gensim.scripts.make_wiki_lemma`
- `gensim.scripts.make_wiki_online`
- `gensim.scripts.make_wiki_online_lemma`
- `gensim.scripts.make_wiki_online_nodebug`
- `gensim.scripts.make_wiki`

* Move
- `gensim.scripts.make_wikicorpus` ➡ `gensim.scripts.make_wiki.py`
- `gensim.summarization` ➡ `gensim.models.summarization`
- `gensim.topic_coherence` ➡ `gensim.models._coherence`
- `gensim.utils` ➡ `gensim.utils.utils` (old imports will continue to work)
- `gensim.parsing.*` ➡ `gensim.utils.text_utils`


## 3.1.0, 2017-11-06


6 changes: 2 additions & 4 deletions MANIFEST.in
@@ -1,8 +1,4 @@
recursive-include docs *
recursive-include gensim/test/test_data *
recursive-include . *.sh
prune docs/src*
prune docs/notebooks/datasets
include README.md
include CHANGELOG.md
include COPYING
@@ -14,3 +10,5 @@ include gensim/models/word2vec_inner.pyx
include gensim/models/word2vec_inner.pxd
include gensim/models/doc2vec_inner.c
include gensim/models/doc2vec_inner.pyx
include gensim/models/fasttext_inner.c
include gensim/models/fasttext_inner.pyx
4 changes: 2 additions & 2 deletions README.md
@@ -5,9 +5,9 @@ gensim – Topic Modelling in Python
[![GitHub release](https://img.shields.io/github/release/rare-technologies/gensim.svg?maxAge=3600)](https://github.com/RaRe-Technologies/gensim/releases)
[![Wheel](https://img.shields.io/pypi/wheel/gensim.svg)](https://pypi.python.org/pypi/gensim)
[![DOI](https://zenodo.org/badge/DOI/10.13140/2.1.2393.1847.svg)](https://doi.org/10.13140/2.1.2393.1847)
[![Mailing List](https://img.shields.io/badge/-Mailing%20List-lightgrey.svg)](https://groups.google.com/forum/#!forum/gensim)
[![Mailing List](https://img.shields.io/badge/-Mailing%20List-brightgreen.svg)](https://groups.google.com/forum/#!forum/gensim)
[![Gitter](https://img.shields.io/badge/gitter-join%20chat%20%E2%86%92-09a3d5.svg)](https://gitter.im/RaRe-Technologies/gensim)
[![Follow](https://img.shields.io/twitter/follow/spacy_io.svg?style=social&label=Follow)](https://twitter.com/gensim_py)
[![Follow](https://img.shields.io/twitter/follow/gensim_py.svg?style=social&label=Follow)](https://twitter.com/gensim_py)



51 changes: 5 additions & 46 deletions appveyor.yml
@@ -13,29 +13,20 @@ environment:
secure: qXqY3dFmLOqvxa3Om2gQi/BjotTOK+EP2IPLolBNo0c61yDtNWxbmE4wH3up72Be

matrix:
# - PYTHON: "C:\\Python27"
# PYTHON_VERSION: "2.7.12"
# PYTHON_ARCH: "32"

- PYTHON: "C:\\Python27-x64"
PYTHON_VERSION: "2.7.12"
PYTHON_ARCH: "64"

# - PYTHON: "C:\\Python35"
# PYTHON_VERSION: "3.5.2"
# PYTHON_ARCH: "32"
TOXENV: "py27-win"

- PYTHON: "C:\\Python35-x64"
PYTHON_VERSION: "3.5.2"
PYTHON_ARCH: "64"

# - PYTHON: "C:\\Python36"
# PYTHON_VERSION: "3.6.0"
# PYTHON_ARCH: "32"
TOXENV: "py35-win"

- PYTHON: "C:\\Python36-x64"
PYTHON_VERSION: "3.6.0"
PYTHON_ARCH: "64"
TOXENV: "py36-win"

init:
- "ECHO %PYTHON% %PYTHON_VERSION% %PYTHON_ARCH%"
@@ -57,48 +48,16 @@ install:
# not already installed.
- "powershell ./continuous_integration/appveyor/install.ps1"
- "SET PATH=%PYTHON%;%PYTHON%\\Scripts;%PATH%"
- "python -m pip install -U pip"
- "python -m pip install -U pip tox"

# Check that we have the expected version and architecture for Python
- "python --version"
- "python -c \"import struct; print(struct.calcsize('P') * 8)\""

# Install the build and runtime dependencies of the project.
- "%CMD_IN_ENV% pip install --timeout=60 --trusted-host 28daf2247a33ed269873-7b1aad3fab3cc330e1fd9d109892382a.r6.cf2.rackcdn.com -r continuous_integration/appveyor/requirements.txt"
- "%CMD_IN_ENV% python setup.py bdist_wheel bdist_wininst"
- ps: "ls dist"

# Install the genreated wheel package to test it
- "pip install --pre --no-index --find-links dist/ gensim"

# Not a .NET project, we build scikit-learn in the install step instead
build: false

test_script:
# Change to a non-source folder to make sure we run the tests on the
# installed library.
- "mkdir empty_folder"
- "cd empty_folder"
- "pip install pyemd testfixtures sklearn Morfessor==2.0.2a4"
- "pip freeze"
- "python -c \"import nose; nose.main()\" -s -v gensim"
# Move back to the project folder
- "cd .."

artifacts:
# Archive the generated wheel package in the ci.appveyor.com build report.
- path: dist\*
on_success:
# Upload the generated wheel package to Rackspace
# On Windows, Apache Libcloud cannot find a standard CA cert bundle so we
# disable the ssl checks.
- "python -m wheelhouse_uploader upload --no-ssl-check --local-folder=dist gensim-windows-wheels"

notifications:
- provider: Webhook
url: https://webhooks.gitter.im/e/62c44ad26933cd7ed7e8
on_build_success: false
on_build_failure: True
- tox -vv

cache:
# Use the appveyor cache to avoid re-downloading large archives such