Removed scipy dependency in extractFormants
Hello to whoever is maintaining this these days. I see the rpy2
dependency has been removed, yay. However, there is no need to go all
the way to scipy, as the scipy functionality the system uses is all
available in numpy. For instance, `scipy.linalg.inv` is just
`numpy.linalg.inv`. And Mahalanobis distance is a trivial one-line
function (here done with two lines for optimization reasons). My tests
(inside the new `mahalanobis.py` file) show that you get _exactly the
same result_ as the master branch, for any randomly generated data.
Using an IHELP wordlist, I also confirmed that my branch gives you the
exact same results, and runs somewhat faster as well (probably due to
not having to import scipy).
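
To illustrate the claim above, here is a minimal numpy-only sketch. It is not the code added in this commit: the helper name `mahalanobis_np` and the randomly generated sample are assumptions made purely for illustration.

```python
import numpy as np


def mahalanobis_np(u, v, ic):
    """Mahalanobis distance between 1d vectors u and v, using only numpy.

    ic is the inverse covariance matrix. The name mahalanobis_np is
    hypothetical and chosen for this sketch.
    """
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    # (u - v) * ic * (u - v)^T, written with plain ndarray dot products
    return float(np.sqrt(diff.dot(np.asarray(ic)).dot(diff)))


# Illustrative data: 5 variables, 25 observations (rows are variables,
# which is the layout np.cov expects by default).
rng = np.random.RandomState(0)
sample = rng.random_sample((5, 25))

# numpy.linalg.inv stands in for scipy.linalg.inv
ic = np.linalg.inv(np.cov(sample))

# Distance of the first observation from the sample mean
d = mahalanobis_np(sample[:, 0], sample.mean(axis=1), ic)
```

With an identity matrix for `ic`, the function reduces to the Euclidean distance, which is a quick sanity check.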

So, why remove a dependency on scipy in favor of one on numpy? Well,
everyone who can run scipy can run numpy (the former depends on the
latter), but not the other way around. _Many_ people (OS X users in
particular) experience trouble installing scipy, myself included. Also,
you get faster load times this way.

I had to make four other changes on this branch:

* The way Praat is located on the system is incompatible with (all?)
  case-sensitive file systems. For whatever reason, when I installed
  Praat the binary was called `Praat`, and the system failed to find or
  use it. Users could get around this using a config file, except the
  system currently ignores the `Praat`/`praat` distinction, merging into
  the latter. I changed it so case information is preserved.
* Almost everything in this system was marked executable, though only
  `extractFormants.py` and `remeasure.py` need to be executable; fixed.
* Binary package management is a disaster for Python on all platforms
  known to me. I don't think you should tell users to use it. On _all_
  platforms, `pip` is the only effective option that is unlikely to ruin
  your system Python install.
* I ran `autopep8`.
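
The Praat point in the list above can be sketched as follows. This is a hypothetical illustration of case-preserving lookup, not the code changed in this commit: the function name `find_praat`, its parameters, and the candidate names are all assumptions, and it relies on `shutil.which` from the Python 3 standard library.

```python
import shutil


def find_praat(configured=None, candidates=("Praat", "praat"),
               search_path=None):
    """Return the path to a Praat binary, or None if none is found.

    Hypothetical helper: the configured name is tried first and is never
    lowercased, so `Praat` and `praat` stay distinct on case-sensitive
    file systems.
    """
    names = ([configured] if configured else []) + list(candidates)
    for name in names:
        # shutil.which searches search_path (or $PATH when it is None)
        found = shutil.which(name, path=search_path)
        if found:
            return found
    return None
```

A caller would pass the `speechSoftware` value from the config file as `configured`, unchanged, and fall back to the default candidates only when that lookup fails.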

PS: the readme says to send pull requests to the `dev` branch, but that
branch doesn't exist.
kylebgorman committed Nov 10, 2013
1 parent 03eb9b3 commit 9c44a91
Showing 6 changed files with 50 additions and 1 deletion.
Empty file modified FAVE-extract/README.md
100755 → 100644
Empty file.
48 changes: 48 additions & 0 deletions FAVE-extract/bin/mahalanobis.py
@@ -0,0 +1,48 @@
#!/usr/bin/env python
# Mahalanobis distance function for extractFormants.py
# Kyle Gorman <[email protected]>

import numpy as np


def mahalanobis(u, v, ic):
    """
    Compute Mahalanobis distance between two 1d vectors _u_, _v_ with
    sample inverse covariance matrix _ic_; a ValueError will be thrown
    if dimensions are incorrect.

    Mahalanobis distance is defined as

    \sqrt{(u - v) \sum^{-1} (u - v)^T}

    where \sum^{-1} is the sample inverse covariance matrix. A particularly
    useful case is when _u_ is an observation, _v_ is the mean of some
    sample, and _ic_ is the inverse covariance matrix of the same sample.

    # if _ic_ is an identity matrix, this becomes the Euclidean distance
    >>> N = 5
    >>> ic = np.eye(N)
    >>> u = np.array([1 for _ in xrange(N)])
    >>> v = np.array([0 for _ in xrange(N)])
    >>> mahalanobis(u, v, ic) == np.sqrt(N)
    True

    # check against scipy; obviously this depends on scipy
    >>> u = np.random.random(N)
    >>> v = np.random.random(N)
    >>> ic = np.linalg.inv(np.cov(np.random.random((N, N * N))))
    >>> from scipy.spatial.distance import mahalanobis as mahalanobis_scipy
    >>> mahalanobis(u, v, ic) == mahalanobis_scipy(u, v, ic)
    True
    """
    # these coercions are free if u and v are already matrices
    diff = np.asmatrix(np.asarray(u) - np.asarray(v))
    # ic will be coerced to type matrix if it is not already
    return float(np.sqrt(diff * ic * diff.T))


if __name__ == '__main__':
    import doctest
    doctest.testmod()
Empty file modified FAVE-extract/cmu_phoneset.txt
100755 → 100644
Empty file.
3 changes: 2 additions & 1 deletion FAVE-extract/config.txt
100755 → 100644
@@ -1,6 +1,7 @@
outputFormat=both
speechSoftware=Praat
formantPredictionMethod=mahalanobis
measurementPointMethod=faav
nSmoothing=12
remeasurement=T
vowelSystem=phila
vowelSystem=phila
Empty file modified FAVE-extract/covs.txt
100755 → 100644
Empty file.
Empty file modified FAVE-extract/means.txt
100755 → 100644
Empty file.