Made a version of MlBayesIm that doesn't store NaNs in the tables so that huge models can be estimated. #1750

Merged: 24 commits, Mar 31, 2024


jdramsey
Collaborator

This was a wish list item.

Added a new implementation of the BayesIm interface, MlBayesImOld.java, for representing directed acyclic graphs in Bayes nets. This class also supports operations for manipulating the node tables, including setting and retrieving probabilities, normalizing node tables, and checking whether table rows are incomplete.
Implemented a new class, ProbMap.java, in the bayes package as a map from a unique integer index for a particular node to its probability. This mapping omits NaN values and is used in the MlBayesIm class. MlBayesIm uses either traditional probability matrices or these new probability maps, depending on a flag setting.
Introduced the ProbMap class, representing an efficient mapping between a unique index and a node probability that omits NaN values. Integrated this into MlBayesIm in place of the old matrix-based approach, governed by the useProbMatrices flag.
Updated ProbMap.java and MlBayesIm.java with detailed Javadoc comments for better understanding of the code. Also, refined the copyDataToProbMatrices() method in MlBayesIm.java to consider the 'useProbMatrices' flag before copying data from 'probs' array to 'probMatrices' array.
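For readers who want a concrete picture of the idea described above, here is a minimal sketch of a sparse table that stores only non-NaN cells. It is not the ProbMap/CptMap code from this PR: the class name `SparseCptSketch`, its methods, and the choice of a `long` key (row * numColumns + column, to avoid int overflow on very large tables) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a sparse conditional probability table.
 * Cells that were never set (or were set to NaN) take no storage.
 * This is NOT the CptMap class from the PR; names are illustrative.
 */
final class SparseCptSketch {
    private final int numRows;
    private final int numColumns;
    // Only cells holding a real (non-NaN) value get an entry.
    private final Map<Long, Double> cells = new HashMap<>();

    SparseCptSketch(int numRows, int numColumns) {
        if (numRows < 0 || numColumns < 0) {
            throw new IllegalArgumentException("Dimensions must be non-negative.");
        }
        this.numRows = numRows;
        this.numColumns = numColumns;
    }

    /** Stores a value; storing NaN removes the cell instead of keeping it. */
    void set(int row, int column, double value) {
        long key = key(row, column);
        if (Double.isNaN(value)) {
            cells.remove(key);
        } else {
            cells.put(key, value);
        }
    }

    /** Returns the stored value, or NaN if the cell has no entry. */
    double get(int row, int column) {
        return cells.getOrDefault(key(row, column), Double.NaN);
    }

    /** Number of cells that actually occupy memory. */
    int storedCells() {
        return cells.size();
    }

    // Maps (row, column) to a unique index; long arithmetic avoids overflow.
    private long key(int row, int column) {
        if (row < 0 || row >= numRows || column < 0 || column >= numColumns) {
            throw new IndexOutOfBoundsException("row=" + row + ", column=" + column);
        }
        return (long) row * numColumns + column;
    }
}
```

With this layout, a table declared with millions of rows costs memory only for the rows that were actually estimated, which is the space saving the description refers to.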
@jdramsey jdramsey requested a review from kvb2univpitt March 28, 2024 07:13
jdramsey added 19 commits March 28, 2024 03:21
The ProbMap class and its instances have been renamed to CptMap to better reflect their purpose of handling Conditional Probability Tables (CPTs). The MlBayesIm class was updated to reflect this change. Added more detailed Javadoc comments for improved readability and understanding.
Fixed grammar in the explanatory comments of MlBayesIm.java, mainly focusing on the description of the division of labour among different classes and the purposes of the different dimensions in the data arrays. Also, improved sentence flow in a few places to enhance readability.
Updated the MlBayesIm storage strategy from a three-dimensional array to an array of CptMap objects, which offer a more efficient representation for large conditional probability tables. Enhanced code readability by removing unnecessary checks and revising comments for clarity. The new method does not store NaNs, which saves space for sparse tables.
Transitioned to an efficient storage method for large conditional probability tables in the CptMap class. This change
Removed duplicative comments and unnecessary explanations about unique integer index calculation in CptMap class. Refocused on the key functionality: storage of large conditional probability tables in a compact form excluding NaN values.
Modified the explanation of MlBayesIm's probability storage method from a three-dimensional array to a sparse method that does not store NaNs. This change highlights the method's efficiency when working with large Bayesian probabilistic models. The old storage method's description remains for backward compatibility.
Removed an embedded class called SimulationTask along with related code lines in MlBayesIm.java file. The modification simplifies the constructSample function, as the simulation now utilizes a sequential approach rather than a fork/join parallel computation, making it more straightforward.
Refactored the MlBayesIm.java file to replace the use of probability matrices with CptMaps for storing probabilities. This change introduces more efficient data structures and methods for maintaining and accessing probabilities. Backward compatibility is maintained through the fallback to a "probs" array if CptMaps are not used.
Modified MlBayesIm.java to implement CptMaps instead of probability matrices for better memory management. While this update optimizes probability storage and access methods, it also ensures backward compatibility by keeping the old method and introducing a flag to select the method.
This commit simplifies the getVariableNames method in the MlBayesIm class. It replaces the for loop that used indices with a more straightforward enhanced for loop that directly iterates over the nodes. Consequently, this makes the code cleaner and easier to understand without affecting functionality.
Null check has been added in the existsParameterizedConstructor method in SessionNode class. If the model class is null, then the method will return false indicating that no constructor exists. This added validation is intended to prevent potential issues related to null object handling.
Refactored the conditional check in `RandomGraph.java` to use `isEmpty()` function instead of `size() == 0`. Adjusted parameter names in `BayesPmWrapper.java` for improved clarity. Corrected a typographical error in `RandomGraphEditor.java` from "graphRandomFoward" to "graphRandomForward". These changes improve code readability and maintainability.
Corrected a typographical error in a comment within the `Statistic.java` file. This edit improves the readability of the codebase for future developers by ensuring comments accurately describe the intended functionality and behavior of the code.
Updated the usage of initialization methods in various Bayes related classes, specifically changing from "MlBayesIm.RANDOM" and "MlBayesIm.MANUAL" to "MlBayesIm.InitializationMethod.RANDOM" and "MlBayesIm.InitializationMethod.MANUAL". Created new files for CptMapProbs and CptMapCounts as part of the refactoring. This update helps make the code more readable and consistent across different parts of the application.
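The qualified names in the last commit message (`MlBayesIm.InitializationMethod.RANDOM`, `MlBayesIm.InitializationMethod.MANUAL`) suggest a nested enum. The sketch below shows that pattern under that assumption; `BayesImSketch` and its members are hypothetical stand-ins, not the actual MlBayesIm code.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in for a class that selects its initialization
 * strategy through a nested enum. Illustrative only.
 */
final class BayesImSketch {
    /** The two constant names come from the commit message; the enum itself is assumed. */
    enum InitializationMethod { MANUAL, RANDOM }

    private final InitializationMethod method;

    BayesImSketch(InitializationMethod method) {
        // A typed parameter rules out arbitrary values; only null needs a check.
        this.method = Objects.requireNonNull(method, "method");
    }

    InitializationMethod getMethod() {
        return method;
    }
}
```

A call site would then read `new BayesImSketch(BayesImSketch.InitializationMethod.RANDOM)`, so the compiler rejects any value outside the enum.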
Collaborator

@kvb2univpitt kvb2univpitt left a comment

Looks good!

@jdramsey jdramsey merged commit 6ebe356 into development Mar 31, 2024
@jdramsey jdramsey deleted the joe_bayes_im branch March 31, 2024 19:59