v7.5.0
Updates for 7.5.0
General comment:
- For version 7.5.0, we will do separate releases for JDK 1.8 and JDK > 1.8 (compiled under JDK 17). It has come to our attention that many users cannot update to a recent version of Java because their machines are not self-administered, and we wish to support these users. The Tetrad codebase uses language level 8, so this is feasible.
Algorithm highlights:
- Added an implementation of DirectLiNGAM.
- Added an implementation of DAGMA.
- Added an additive nonlinear simulation method.
- Added Integration of Overlapping Datasets (IOD), a pooled independence test that can be used in algorithms such as FCI.
- Improved interfaces and code for PC, CPC, FAS, and other related classes.
- Generalized the Markov blanket checker to be compatible across graph types.
- Added the Restricted BOSS algorithm, which selects the variables anterior to a set of target variables and then runs BOSS over that set of variables.
- Updated FGES-MB with a more up-to-date algorithm and new options.
- Made several updates and revisions to CStaR and added BOSS and Restricted BOSS as options for the CPDAG search.
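The IOD test pools p-values across datasets using Fisher's method. Under the joint null hypothesis, the statistic -2 * sum(ln p_i) over k independent p-values follows a chi-square distribution with 2k degrees of freedom, and for even degrees of freedom the survival function has a closed form. The following is a minimal sketch of that pooling step, assuming the per-dataset p-values are independent; the class FisherPooling and its pool method are hypothetical names for illustration, not Tetrad's IOD code. It sticks to Java 8 features to match the codebase's language level.

```java
/**
 * Minimal sketch of Fisher's method for combining independent p-values.
 * FisherPooling is a hypothetical class used for illustration only.
 */
public final class FisherPooling {

    private FisherPooling() {
    }

    /**
     * Pools k independent p-values. Under the joint null hypothesis the statistic
     * -2 * sum(ln p_i) is chi-square distributed with 2k degrees of freedom.
     */
    public static double pool(double... pValues) {
        if (pValues.length == 0) {
            throw new IllegalArgumentException("Need at least one p-value");
        }
        double halfStat = 0.0; // (-2 * sum(ln p_i)) / 2 = -sum(ln p_i)
        for (double p : pValues) {
            if (p < 0.0 || p > 1.0) {
                throw new IllegalArgumentException("p-value out of range: " + p);
            }
            if (p == 0.0) {
                return 0.0; // the Fisher statistic is infinite, so the pooled p-value is 0
            }
            halfStat -= Math.log(p);
        }
        int k = pValues.length;
        // Chi-square survival function for even df = 2k:
        // P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
        double term = 1.0;
        double sum = 1.0;
        for (int j = 1; j < k; j++) {
            term *= halfStat / j;
            sum += term;
        }
        return Math.min(1.0, Math.exp(-halfStat) * sum);
    }

    public static void main(String[] args) {
        // Two datasets that both contain X, Y, and all of Z each report p = 0.5 for X || Y | Z.
        double pooled = pool(0.5, 0.5);
        System.out.println("Pooled p-value: " + pooled);
    }
}
```

Pooling two p-values of 0.5 gives 0.25 * (1 + ln 4), about 0.597, so at alpha = 0.05 the pooled test would not reject independence. A single p-value passes through unchanged, which is a quick sanity check on the formula.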
Interface highlights:
- Added a menu item to the interface to show people how to make suggestions.
- Added a new Markov checker accessible from both Python and the Java interface.
- Added a new dialog showing descriptive statistics for entire datasets.
- The descriptive statistics dialog also identifies constant columns in the data and, if one exists, gives an example of a singular subset of 2 or 3 variables.
- Added an interactive plot matrix tool.
- Added a new cross-platform popup that shows when processes are running in the interface.
- Added a graph manipulation option to strip null edges from bootstrapping graphs.
- Added informative exceptions when singularities are encountered in test and scoring operations.
- Moved methods to compare two graphs to the algcomparison package.
- Added a new parameter to turn the saving of bootstrapped graphs on or off.
- Added 'copy coefficient matrix' and 'copy error covariance matrix' tools to the SEM IM Editor and the SEM Estimator Editor.
- Wrote a paper for the CAWS conference introducing our Python and R interfaces.
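The Markov checker's Kolmogorov-Smirnov check compares the empirical distribution of the p-values from the local Markov tests to Uniform(0, 1); if the model is Markov, those p-values should look uniform. The following is a minimal sketch of such a check. The class UniformityCheck and its methods are hypothetical, and the p-value uses the standard asymptotic Kolmogorov distribution, P(K > t) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 t^2) with t = sqrt(n) * D; Tetrad's own implementation may use an exact or small-sample-corrected distribution instead.

```java
import java.util.Arrays;

/** Hypothetical sketch of a one-sample KS test of p-values against Uniform(0, 1). */
public final class UniformityCheck {

    private UniformityCheck() {
    }

    /** One-sample KS statistic D_n of the sample against the Uniform(0, 1) CDF. */
    public static double ksStatistic(double[] pValues) {
        if (pValues.length == 0) {
            throw new IllegalArgumentException("Need at least one p-value");
        }
        double[] sorted = pValues.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            // The uniform CDF at p is p itself; compare it to the empirical CDF just
            // after ((i + 1) / n) and just before (i / n) the i-th order statistic.
            double above = (i + 1.0) / n - sorted[i];
            double below = sorted[i] - (double) i / n;
            d = Math.max(d, Math.max(above, below));
        }
        return d;
    }

    /** Asymptotic KS p-value for statistic d computed from n observations. */
    public static double ksPValue(double d, int n) {
        double t = Math.sqrt(n) * d;
        if (t < 0.2) {
            return 1.0; // the series converges slowly here and its value is essentially 1
        }
        double sum = 0.0;
        for (int j = 1; j <= 100; j++) {
            double term = Math.exp(-2.0 * j * j * t * t);
            sum += (j % 2 == 1) ? term : -term;
            if (term < 1e-12) {
                break;
            }
        }
        return Math.max(0.0, Math.min(1.0, 2.0 * sum));
    }

    public static void main(String[] args) {
        // p-values from hypothetical local Markov tests X || Y | parents(X).
        double[] p = {0.12, 0.47, 0.81, 0.33, 0.65, 0.05, 0.92, 0.28};
        double alpha = 0.05;
        double d = ksStatistic(p);
        double ksP = ksPValue(d, p.length);
        System.out.println("D = " + d + ", KS p = " + ksP
                + (ksP > alpha ? " (consistent with Uniform)" : " (not Uniform)"));
    }
}
```

Reading "Uniform" as KS p > alpha mirrors the rule of thumb for the Check Local Markov tab: a well-fitting graph yields p-values spread evenly over [0, 1], while p-values piled up near 0 signal violated independence facts.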
Debugging highlights:
- Turned the serialization unit test back on so that saving and loading sessions will be more stable.
- Fixed a node equality bug causing discrepancies between Causal Command and the Tetrad GUI.
- Fixed bugs preventing simple time series from being simulated.
- Fixed and updated knowledge layouts.
- Fixed an issue with large bootstrapping operations where saved sessions ballooned in size because all bootstrapped graphs were saved.
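The node equality fix revised equals and hashCode for the node classes to compare node names rather than object identity, so that two separately constructed nodes with the same name compare equal and hash to the same bucket. The following is a minimal sketch of that pattern, using a hypothetical NamedNode class rather than any of Tetrad's actual node classes.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical node whose identity is its name, not its object reference. */
public final class NamedNode {

    private final String name;

    public NamedNode(String name) {
        this.name = Objects.requireNonNull(name, "name");
    }

    public String getName() {
        return name;
    }

    @Override
    public boolean equals(Object o) {
        // No object-identity test: equality is decided by name alone.
        if (!(o instanceof NamedNode)) {
            return false;
        }
        return name.equals(((NamedNode) o).name);
    }

    @Override
    public int hashCode() {
        // Consistent with equals: equal names always give equal hash codes.
        return name.hashCode();
    }

    @Override
    public String toString() {
        return name;
    }

    public static void main(String[] args) {
        Set<NamedNode> nodes = new HashSet<>();
        nodes.add(new NamedNode("X1"));
        // A node rebuilt elsewhere (for example, after loading a file) is still found.
        System.out.println(nodes.contains(new NamedNode("X1")));
    }
}
```

Keeping hashCode in step with equals matters as much as the equals change itself: if equality used names but hashing used identity, hash-based collections would still fail to find the rebuilt node.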
====DETAILS====
- Decreased the minimum Java version needed for Tetrad to 1.8, provided that Tetrad is compiled under Corretto 1.8.
- Added a parameter to BOSS to permit optional inclusion of the BES step.
- Fixed some unit tests for the data loader that were failing on Windows; now, all tests pass on all platforms.
- Added BES option to BFCI.
- Added a method in Paths to calculate the maximal cliques in a graph using the Bron-Kerbosch algorithm.
- Cleaned up PC, CPC, FAS, PcCommon, and other related classes in the search and algcomparison packages. Their parameters are slightly different now.
- Removed PcMax and added a parameter in PC to use the Max-P heuristic to orient colliders.
- Adjusted hyperparameters for PC, CPC, FAS, and their defaults. Please see the code and interfaces.
- Changed 'aggressively prevent cycles' to 'meekPreventCycles' throughout the code.
- Coded up a new Integration of Overlapping Datasets (IOD) pooled independence test and an FCI-IOD wrapper for it. The wrapper takes a list of datasets with overlapping variables and generates an FCI graph for them, pooling each test of X || Y | Z over all datasets that contain X, Y, and all of Z using the Fisher p-value pooling method.
- Moved the IOD independence test from the work_in_progress to the search package.
- Linked to the search Javadocs from the Java interface manual.
- Coded up a new method for running processes with a popup "Stop" button that doesn't call the deprecated Thread.stop(). The "Stop" button was broken on Windows but now works on all platforms.
- Added a graph tool to strip null edges from graphs produced by the bootstrapping API, so that bootstrapped graphs can be used for estimation.
- Added 'copy coefficient matrix' and 'copy error covariance matrix' menu items to the SEM IM Editor and the SEM Estimator Editor.
- Changed the sets of "should be independent" (Markov check) and "should be dependent" (Faithfulness check) facts in the Markov Checker. For the Markov check, X || Y | parents(X) is listed whenever dsep(X, Y | parents(X)) holds; for the Faithfulness check, X ~|| Y | parents(X) is listed whenever dconn(X, Y | parents(X)) holds.
- Relabeled tabs in Markov check to "Check Local Markov" and "Check Local Faithfulness."
- Extracted a Python-accessible model for the Markov check in the search package and refactored the interface to use it.
- Added a Help button to the Markov check that gives instructions on how to use the tool.
- Added a Kolmogorov-Smirnov check of uniformity for p-values in the Markov checker. On the "Check Local Markov" tab, if the test yields p-values, good results should register as Uniform (p > alpha).
- Added the Markov Adequacy Score to the Markov Check and to the Markov Check Editor. The score is zero if the model is non-Markov; otherwise it is the fraction of dependent tests under the m-connected condition (to give a sense of how faithful the graph is to the distribution).
- Changed "d-separation" and "d-connection" to "m-separation" and "m-connection" throughout the codebase to emphasize the fact that d-separation is m-separation restricted to the case of DAGs.
- Fixed problem preventing GRaSP-FCI from using m-separation to analyze an input graph.
- Put the legal PAG check in the interface inside a watch block so that it can be canceled for large graphs.
- Changed name of IndTestMsep to MsepTest.
- Added parameters to GRaSP and BOSS to disallow randomness inside the algorithm. Allowing randomness helps these algorithms avoid local optima, but the user may wish to prioritize consistent answers for the algorithms.
- Replaced Markov blanket calculations in GraphUtils with a calculation that works for DAGs, CPDAGs, MAGs, and PAGs.
- Added an option in the Markov Checker to display independence results for Ind(X, Y | MB(X)).
- Generalized the markovBlanketDag method in GraphUtils to markovBlanketGraph, so that it gives the subgraph over {X} U MB(X) for any DAG, CPDAG, MAG, or PAG.
- Fixed the displayed error for the case where you erroneously try to do bootstrapping on a covariance matrix.
- Cleaned up the graph comparison API. (There were two such methods; collapsed these into one.)
- Fixed a null pointer bug in the Edgewise Comparison for the case where you compared a PAG to the true DAG with latents.
- Fixed bugs preventing simple time series from being simulated.
- Added the Bollen and Ting reference for the Wishart and Delta tetrad tests to the manual (for BPC, FOFC, and FTFC). I also added the original Delta test reference to the manual (Bollen 1990).
- Added a knowledge layout by knowledge indices to the Layout menus. Tiers must be indicated in the variable names: e.g., "X" for the last tier, "X:1" for the next-to-last tier, and so on.
- Fixed some issues with knowledge for GFCI, GRaSP-FCI, and BFCI.
- Turned stable FAS back on.
- Allowed control-click to substitute for right-click in the graph workbench to accommodate the Mac Magic Mouse.
- Fixed the code so that if a search uses knowledge with more than one tier, a knowledge layout is used based on the indices of variables ("X", "X:1", "X:2", etc.).
- Made the knowledge layout (the one not based on knowledge indices) work in the Search box.
- Refactored GFCI, BFCI, and GRaSP-FCI to make the code more modular.
- Revised the hashCode and equals methods for the node classes to forgo testing object identity and instead always test for equality of node names.
- Fixed bug in Knowledge editor where if you loaded knowledge with more than 3 tiers, only 3 would be displayed.
- Added a hashcode method for DiscreteVariable to return the hashcode of the variable's name (as with the other variable types).
- Cleaned up the equals method for Edge so that it uses only node-name equality and node hashcodes.
- Removed the NodeEquality class and all references to it in the project.
- Added a menu to the Java Tetrad interface to allow users to submit suggestions. It contains a clickable link that takes the user to the Tetrad Issues List.
- Moved the dangling IndependenceTest interface into the search package.
- Removed the structure prior parameter from Degenerate Gaussian as it was not used.
- Turned the test back on to deserialize saved serialization archives to address saving and loading issues with Tetrad sessions.
- Added new serialization archives for tetrad-lib and tetrad-gui for 2023-06-27.
- Placed Save... and Save As... actions for sessions in the Tetrad app inside watch blocks to reduce the possibility of session corruption due to prematurely typing control-Q (which Java can't catch).
- Added informative exceptions in places where singularity exceptions are thrown, indicating which variables are involved in the singularities.
- Added CompareTwoGraphs in algcomparison to compare two graphs on a list of statistics.
- Added a printout in the Descriptive Statistics popup to pre-screen data for constant columns and singularities for 2 or 3 variables taken at a time.
- Added a new algorithm, DirectLingam. This algorithm, like IcaLingam and the pairwise orientation rules, addresses the linear, non-Gaussian case. Reference: Shimizu, S., Inazumi, T., Sogawa, Y., Hyvarinen, A., Kawahara, Y., Washio, T., ... & Hoyer, P. (2011). DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12(Apr), 1225-1248.
- Added a Plot Matrix tool to the Data box "Tools" menu, replacing the separate Scatter Plot and Histogram tools.
- Added an "all sets" option to the Markov checker that can be used for models with up to 12 variables.
- Revised the descriptive statistics tool in the Data Editor to show descriptive statistics for the entire dataset in a table. Constant columns and an example of a singular subset of 2 or 3 columns are shown at the bottom of the tool if available.
- Added a parameter, saveBootstrapGraphs, defaulting to false. If this parameter is set to true, individual bootstrapped graphs will be saved. This addresses a problem where, if a large bootstrapping operation was performed and the Tetrad session was then saved, the result was a huge file.
- Added an additive nonlinear simulation.
- Fixed an issue with Knowledge where, if you tried to make very large knowledge tiers (for 20,000 variables), FGES would grind to a halt.
- Fixed a bug in Knowledge for 20,000 variables where if you set knowledge only in tiers 1 and 2 some variables would come out in tier 0.
- Reset knowledge to use tiers numbered 0, 1, 2,... to fix a problem with setting tiered knowledge from R.
- Added Restricted BOSS, useful for datasets with large numbers of variables. Restricts BOSS to variables anterior to a given list of target variables.
- Added an option to calculate covariances on the fly throughout the code. This is useful for datasets with a large number of variables and comparatively small sample sizes.
- Updated FGES-MB with a new interface. You can now select how far concentrically the algorithm searches from the target variables and specify how the graph should be trimmed: no trimming, trimming to the adjacents of the targets, trimming to the graph over the targets and their Markov blankets, or trimming to the nodes that have semidirected paths to the targets.
- Updated CStaR to use the algcomparison API. Added BOSS and Restricted BOSS as options and revised the parameter lists.
- Updated the manual entry for CStaR.
- Fixed a bug in the covariance/correlation display where if you deleted a column in the data and tried to display the correlation matrix, an exception would be thrown.
- Fixed an issue where singularities encountered by scores stopped execution.
- Fixed an issue with CStaR not stopping when requested in the interface.
- Fixed an issue where knowledge variables containing periods would be removed from tiers.
- For this version (7.5.0) we will do separate releases for JDK 1.8 and JDK > 1.8 (compiled under JDK 17). It has come to our attention that many users are unable to update to a recent version of Java because their machines are not self-administered, and we wish to support these users. The Tetrad codebase uses language level 8, so this is feasible.
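For reference, the Bron-Kerbosch algorithm named in the Paths entry above finds all maximal cliques by growing a candidate clique R from a set P of vertices that could extend it, while a set X of already-processed vertices prevents reporting non-maximal or duplicate cliques. The following is a minimal sketch of the pivoting variant over an undirected adjacency map. The class MaximalCliques and its methods are hypothetical and do not reproduce the signature of the Paths method; the sketch assumes symmetric adjacency sets without self-loops and that every vertex appears as a key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of Bron-Kerbosch with pivoting for maximal cliques. */
public final class MaximalCliques {

    private MaximalCliques() {
    }

    /** Returns every maximal clique of an undirected graph given as adjacency sets. */
    public static List<Set<String>> find(Map<String, Set<String>> adj) {
        List<Set<String>> cliques = new ArrayList<>();
        bronKerbosch(new HashSet<String>(), new HashSet<>(adj.keySet()), new HashSet<String>(), adj, cliques);
        return cliques;
    }

    // r: current clique, p: candidates that can extend r, x: vertices already processed.
    private static void bronKerbosch(Set<String> r, Set<String> p, Set<String> x,
                                     Map<String, Set<String>> adj, List<Set<String>> out) {
        if (p.isEmpty() && x.isEmpty()) {
            out.add(new HashSet<>(r)); // r cannot be extended, so it is maximal
            return;
        }
        // Pivot on the vertex of (p union x) with the most neighbors in p to prune branches.
        Set<String> pUnionX = new HashSet<>(p);
        pUnionX.addAll(x);
        String pivot = null;
        int best = -1;
        for (String u : pUnionX) {
            int count = 0;
            for (String v : neighbors(adj, u)) {
                if (p.contains(v)) count++;
            }
            if (count > best) {
                best = count;
                pivot = u;
            }
        }
        Set<String> candidates = new HashSet<>(p);
        candidates.removeAll(neighbors(adj, pivot));
        for (String v : candidates) {
            Set<String> nv = neighbors(adj, v);
            Set<String> r2 = new HashSet<>(r);
            r2.add(v);
            Set<String> p2 = new HashSet<>(p);
            p2.retainAll(nv);
            Set<String> x2 = new HashSet<>(x);
            x2.retainAll(nv);
            bronKerbosch(r2, p2, x2, adj, out);
            p.remove(v); // v is done: move it from the candidates to the processed set
            x.add(v);
        }
    }

    private static Set<String> neighbors(Map<String, Set<String>> adj, String v) {
        Set<String> n = adj.get(v);
        return n == null ? new HashSet<String>() : n;
    }

    /** Adds an undirected edge a - b to the adjacency map. */
    public static void addEdge(Map<String, Set<String>> adj, String a, String b) {
        adj.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    public static void main(String[] args) {
        // Triangle A-B-C plus a pendant edge C-D.
        Map<String, Set<String>> adj = new HashMap<>();
        addEdge(adj, "A", "B");
        addEdge(adj, "B", "C");
        addEdge(adj, "A", "C");
        addEdge(adj, "C", "D");
        System.out.println(find(adj));
    }
}
```

For the triangle A-B-C with the extra edge C-D, the sketch reports two maximal cliques, {A, B, C} and {C, D}, in an order that depends on hash-set iteration. Pivoting does not change the result; it only skips branches that could not produce a new maximal clique.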