
Frequently asked questions about the toolkit

On this page, we list several frequently asked questions about the toolkit (these are mostly excerpts from emails):

What do I do about java.lang.OutOfMemoryError?

Getting java.lang.OutOfMemoryError errors means that the heap space allocated to your JVM instance is not large enough for your program.

If you are running Java directly, you can increase the maximum heap size by adding an argument such as -Xmx512M to the java command, to request 512 MB of heap space for example.
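For instance, assuming your analysis is in a (hypothetical) class MyAnalysisProgram compiled against infodynamics.jar, you might run:

java -Xmx512M -cp "infodynamics.jar:." MyAnalysisProgram

(use ; instead of : as the classpath separator on Windows).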

When running Java from other environments, e.g. in an IDE, Matlab, GNU Octave, Python, etc., you will need to specify this parameter in a manner that depends on that environment:

  • Matlab often gives this error because the heap space is usually set to a rather low value. Increasing the heap space allocated by Matlab to the JVM is described e.g. here.
  • For Python via JPype, you can specify this parameter when calling the startJVM() function; see comments in the PythonExamples.
  • For Julia, you can specify this parameter when calling the JavaCall.init() method; see comments in the JuliaExamples.

How can I build JIDT from the latest SVN sources?

If you need one of the latest features that are only available via SVN (or you want to grab the source code and make some changes), you can grab the source and build infodynamics.jar as follows:

  1. Make an SVN checkout, e.g.:
svn checkout http://information-dynamics-toolkit.googlecode.com/svn/trunk/ information-dynamics-toolkit-read-only

or update an existing SVN checkout:

svn update
  2. Compile the code and build your own infodynamics.jar:
ant jar

How fast is the toolkit, e.g. for transfer entropy estimation?

I've had some enquiries regarding how long the toolkit will take to compute transfer entropy on large data sets, say 100 000 samples. The answer depends mostly on the type of data that you are calculating transfer entropy on:

  1. Is it discrete or discretized data? infodynamics.discrete.ApparentTransferEntropyCalculator is very fast, on the order of 1 second or less for data of this size.
  2. Is it continuous data (that you would prefer not to discretize)? This is generally slower, using techniques such as the Kraskov-Stoegbauer-Grassberger (KSG) estimator (the best of breed for continuous-valued data), i.e. infodynamics.measures.continuous.kraskov.TransferEntropyCalculatorKraskov, but it is not too far behind. At v1.0, a 100 000 time-step calculation would take almost an hour with this estimator; with the fast nearest-neighbour search algorithm implemented from v1.1 onwards, this has dropped to seconds or less. If you need the calculation to run faster than that, you could discretize your data (at the expense of accuracy), or subsample (say into 10 data sets of 10 000 time steps each) and average over these; this is a little less accurate, but it also gives you a standard error measurement. You can test how long the code takes to run, e.g. with example 4 in the simple java demos (mirrored in the simple octave/matlab demos here and the python demos here), by altering the length of the random data that is used; a minimal timing sketch is also shown after this list.
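As a rough guide, a minimal timing sketch in Java, following the call pattern of example 4 (the data length, property values and class name TimeTeKraskov here are just for illustration), might look like:

// Timing sketch: KSG transfer entropy on random continuous data
import infodynamics.measures.continuous.kraskov.TransferEntropyCalculatorKraskov;
import java.util.Random;

public class TimeTeKraskov {
	public static void main(String[] args) throws Exception {
		int numSamples = 100000; // alter this to test different data lengths
		Random rand = new Random();
		double[] source = new double[numSamples];
		double[] dest = new double[numSamples];
		for (int t = 0; t < numSamples; t++) {
			source[t] = rand.nextGaussian();
			dest[t] = rand.nextGaussian();
		}
		TransferEntropyCalculatorKraskov teCalc = new TransferEntropyCalculatorKraskov();
		teCalc.setProperty("k", "4"); // number of nearest neighbours for the KSG estimator
		teCalc.initialise(1);         // destination history length k = 1
		long startTime = System.currentTimeMillis();
		teCalc.setObservations(source, dest);
		double teValue = teCalc.computeAverageLocalOfObservations();
		long endTime = System.currentTimeMillis();
		System.out.printf("TE = %.4f nats, computed in %d ms%n",
				teValue, endTime - startTime);
	}
}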

What does it mean if I get negative results from a Kraskov-Stoegbauer-Grassberger estimator?

You need to think about the output of the KSG algorithm (for mutual information, conditional MI or transfer entropy) as an estimation from a finite amount of empirical data. All estimators have a bias and variance with respect to the true value from the underlying generating PDF. The KSG estimator seeks to eliminate the bias (under some broad assumptions), but will still have variance. So, when the true value from the underlying generating PDF would have been zero (or close to zero) MI or TE, the bias correction in the KSG estimator should deliver an expected result around zero, and the variance around that value means that you will see a lot of negative results.

So, getting a negative value simply means that the relationship you've measured is less than the expected value if the variables were not related. Taking the variance into account, it would be likely that the result was consistent with no relationship between the variables.

It's also possible that your results are more strongly skewed negative if: (a) your variables are at some extreme of no relationship, or don't satisfy the estimator's broad assumptions, e.g. one variable has little variance (local to the sample point) compared to the other; or (b) for conditional MI, there is no conditional independence and the two variables both have strong relationships to the target. To account for a negative skew which may be due to the bias correction being slightly off, you can always subtract the mean of surrogates (a.k.a. an effective transfer entropy, a la Marschinski and Kantz, EPJB, 2002) to tailor the bias removal properly to your variables; a sketch of this is shown below.
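As a sketch of that correction (assuming a calculator teCalc that has already been supplied with observations; the number of surrogates is just illustrative):

// Sketch: subtract the mean of a surrogate (permutation) distribution
// from the measured TE to obtain an effective TE tailored to your data.
// EmpiricalMeasurementDistribution is in infodynamics.utils.
double measuredTe = teCalc.computeAverageLocalOfObservations();
EmpiricalMeasurementDistribution surrogates =
		teCalc.computeSignificance(1000); // 1000 surrogates (illustrative)
double effectiveTe = measuredTe - surrogates.getMeanOfDistribution();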

Can the Kraskov-Stoegbauer-Grassberger estimator add noise to the data?

The original publication of the KSG estimator recommends the addition of a very small amount of noise to the data (e.g. 1e-8) to address the situation where multiple data points share the same value in (at least) one dimension.

This can be done for each KSG estimator in JIDT (for MI-based calculators from v1.0, for conditional-MI-based calculators from v1.1) as shown below for the transfer entropy estimator:

// The following is Java code; change it to your own language if required
TransferEntropyCalculatorKraskov teCalc =
		new TransferEntropyCalculatorKraskov();
teCalc.setProperty("NORMALISE", "true"); // Normalise the individual variables (default)
// Set up to add noise with standard deviation of 0.00000001 normalised units
teCalc.setProperty("NOISE_LEVEL_TO_ADD", "0.00000001");
// Then use the calculator as normal ...

Please note: as of release v1.3 the addition of noise using the KSG estimator is switched on by default.

Importantly -- the addition of noise means that the results of the KSG estimators become (slightly) stochastic. If you need to produce repeatable results, then you can simply turn off the noise addition by setting the property NOISE_LEVEL_TO_ADD to 0, as shown below. (If you still need noise added, but want this to be repeatable, then I suggest you do this yourself before calling JIDT, using a fixed random number seed.)
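For example, to make the results repeatable:

// Turn off the noise addition (on by default from v1.3) for repeatable results
teCalc.setProperty("NOISE_LEVEL_TO_ADD", "0");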

More details are shown in the Javadocs for the setProperty(String, String) method for each relevant calculator.

Why are my results from a Kraskov-Stoegbauer-Grassberger estimator stochastic?

See the FAQ Can the Kraskov-Stoegbauer-Grassberger estimator add noise to the data?, including the suggestion on how to turn this feature off if you need to.

How do I set the Kraskov-Stoegbauer-Grassberger estimator for transfer entropy to use algorithm 2?

The MI and conditional MI estimators using the KSG algorithms have separate classes for each of the two KSG algorithms (1 and 2), e.g. MutualInfoCalculatorMultiVariateKraskov1 and MutualInfoCalculatorMultiVariateKraskov2. However, for transfer entropy there is only one available class, TransferEntropyCalculatorKraskov, which uses algorithm 1 by default.

To use algorithm 2 here, one should set the property "ALG_NUM" to have value "2", i.e. have a statement:

teCalcKSG2.setProperty("ALG_NUM", "2");

before the calculator is initialised. One can switch the calculator back to algorithm 1 by setting this property to have value "1".
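A minimal sketch of the call ordering (the history length passed to initialise() here is just illustrative):

// Select KSG algorithm 2 for the transfer entropy estimator;
// the property must be set before initialise() is called.
TransferEntropyCalculatorKraskov teCalcKSG2 =
		new TransferEntropyCalculatorKraskov();
teCalcKSG2.setProperty("ALG_NUM", "2");
teCalcKSG2.initialise(1); // e.g. destination history length k = 1
// ... then setObservations(...) and compute as usual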

Can I normalise my transfer entropy values somehow?

There are some potential normalisations you can make to TE values in order to try to make them comparable between different time series pairs. Your options are:

  1. If you want to know what proportion of the total uncertainty in the target variable was accounted for by the TE, then normalise by the entropy of the target variable;
  2. if you want to know what proportion of the remaining uncertainty -- after accounting for the target variable's own past (i.e. the active information storage) -- was accounted for by the TE, then normalise by the entropy rate of the target variable.

Which of these options you choose depends on which of these questions you want to answer; the latter is more common because the entropy rate is a tighter bound on the TE, but both address slightly different questions and carry their own meaning.

If you want to read further on how the TE is a component of the entropy of the next value of the target/receiver, and within that of the entropy rate of that target/receiver (this being the tighter bound, as noted above), I can suggest sections 3.2.2 and 4.1.4 of my thesis/book (at Springer and preprint). There's also some commentary on these as normalisation techniques in section 4.5.2 of our book on TE (Springer). There we also describe why making sure bias correction is incorporated in the estimator is an important ingredient.

There is one very important caveat to mention here: The above only applies for Shannon entropies (i.e. for discrete or discretised variables). You cannot do this for continuous valued variables. This is because the entropy and entropy rate are differential entropies, which are not strictly non-negative and are not by definition upper limits to the TE (or any other mutual information involving the variable). So if you've estimated TE say via the KSG estimator, you're not going to be able to use the above. Indeed, I'm not sure there's a good answer for what you could do. You might try to compare the pairwise TE against the whole collective TE from all of the parent variables identified to the target, but this will necessarily leave out any intrinsic uncertainty from being included in the denominator, and I think that would introduce quite some variability.
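For discrete data, a sketch of these normalisations is below; it assumes int[] source and int[] dest arrays of discretised values, uses the discrete Entropy, Active Information Storage and Transfer Entropy calculators (class names as in infodynamics.measures.discrete), and computes the entropy rate as the target's entropy minus its active information storage:

// Sketch (discrete data only): normalise TE by the target's entropy or entropy rate,
// using entropy rate = H(target) - AIS(target). Base and history length are illustrative.
int base = 2, k = 1; // alphabet size and destination history length
TransferEntropyCalculatorDiscrete teCalc =
		new TransferEntropyCalculatorDiscrete(base, k);
teCalc.initialise();
teCalc.addObservations(source, dest);
double te = teCalc.computeAverageLocalOfObservations();

EntropyCalculatorDiscrete hCalc = new EntropyCalculatorDiscrete(base);
hCalc.initialise();
hCalc.addObservations(dest);
double targetEntropy = hCalc.computeAverageLocalOfObservations();

ActiveInformationCalculatorDiscrete aisCalc =
		new ActiveInformationCalculatorDiscrete(base, k);
aisCalc.initialise();
aisCalc.addObservations(dest);
double ais = aisCalc.computeAverageLocalOfObservations();

double entropyRate = targetEntropy - ais; // remaining uncertainty after the target's own past
double normalisedTe = te / entropyRate;   // option 2; use te / targetEntropy for option 1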

How can I make a conditional entropy calculation?

As of Feb 2024, there are no implementations for conditional entropy in JIDT -- mainly because I never really had reason for it, and there has been little demand. Most of my empirical work is on MI / CMI based measures rather than entropies.

If you just want to use discrete data, you can use two Entropy estimators and take the difference between them (H(X|Y) = H(X,Y) - H(Y)); you'll just need to combine the X,Y into a re-encoded joint variable (doable with the MatrixUtils.computeCombinedValues() method).
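A sketch for discrete data follows; it assumes int[] x and int[] y arrays of values in [0, base), and uses EntropyCalculatorDiscrete and MatrixUtils.computeCombinedValues() (from infodynamics.utils) as described above:

// Sketch (discrete data): H(X|Y) = H(X,Y) - H(Y), with X,Y re-encoded
// into a single joint variable via MatrixUtils.computeCombinedValues().
// Run this inside a method declared to throw Exception.
int n = x.length;
int[][] xyPairs = new int[n][2];
for (int t = 0; t < n; t++) {
	xyPairs[t][0] = x[t];
	xyPairs[t][1] = y[t];
}
int[] xyJoint = MatrixUtils.computeCombinedValues(xyPairs, base); // joint alphabet size base*base

EntropyCalculatorDiscrete jointCalc = new EntropyCalculatorDiscrete(base * base);
jointCalc.initialise();
jointCalc.addObservations(xyJoint);
double hXY = jointCalc.computeAverageLocalOfObservations();

EntropyCalculatorDiscrete yCalc = new EntropyCalculatorDiscrete(base);
yCalc.initialise();
yCalc.addObservations(y);
double hY = yCalc.computeAverageLocalOfObservations();

double hXgivenY = hXY - hY; // conditional entropy H(X|Y)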

Ditto for continuous-valued data, though here, to make a joint variable, you just put the variables as separate columns in a 2D array and pass this to the corresponding MultiVariate class for each continuous entropy estimator. You can do that for the Gaussian, Kernel and Kozachenko variants. Combining two Kozachenko estimators is the closest you'll get to a KSG-style implementation, though the biases won't cancel out as nicely (this can sort of be done with some extra maths if you look into how KSG works internally, but it may or may not be worth doing).
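A corresponding sketch for continuous data, assuming double[] x and double[] y and using the multivariate Kozachenko entropy estimator (EntropyCalculatorMultiVariateKozachenko, from infodynamics.measures.continuous.kozachenko) twice:

// Sketch (continuous data): H(X|Y) = H(X,Y) - H(Y) via two Kozachenko estimators.
int n = x.length;
double[][] xyJoint = new double[n][2];
double[][] yOnly = new double[n][1];
for (int t = 0; t < n; t++) {
	xyJoint[t][0] = x[t];
	xyJoint[t][1] = y[t];
	yOnly[t][0] = y[t];
}
EntropyCalculatorMultiVariateKozachenko jointCalc =
		new EntropyCalculatorMultiVariateKozachenko();
jointCalc.initialise(2); // 2 dimensions for the joint (X,Y)
jointCalc.setObservations(xyJoint);
double hXY = jointCalc.computeAverageLocalOfObservations();

EntropyCalculatorMultiVariateKozachenko yCalc =
		new EntropyCalculatorMultiVariateKozachenko();
yCalc.initialise(1); // 1 dimension for Y
yCalc.setObservations(yOnly);
double hY = yCalc.computeAverageLocalOfObservations();

double hXgivenY = hXY - hY; // differential conditional entropy (in nats)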

The call sequences are basically the same as you will see for the Entropy estimators in the AutoAnalyser, though as of Feb 2024 the AutoAnalyser only shows this for univariates. The only differences for multivariates should be that the initialisation gives the number of dimensions and that 2D arrays are passed to setObservations().
