Fixed shuffle function and combine all shuffled independence test p vals for a target node into a flat list to feed into Anderson Darling Test #1781

vbcwonderland · 2024-05-28T19:23:43Z

The shuffle function is now fixed.

When shuffleThreshold = 1.0, the number of independence-test p-values equals the total number of nodes in the whole graph.
When shuffleThreshold = 0.5, the number of independence-test p-values is twice the total number of nodes in the whole graph.

The combined flat list of all shuffled independence-test p-values is then sent to the Anderson-Darling test (ADTest) to obtain one AD p-value for the target node.
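A minimal sketch of the flattening step, assuming the per-shuffle p-values arrive as a `List<List<Double>>` (one inner list per shuffle). The class `FlatAdDecision`, its methods, and the `adPValue` function are hypothetical placeholders, not Tetrad's API; `adPValue` stands in for whatever computes the AD p-value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class FlatAdDecision {

    // Hypothetical helper: concatenate the p-values from every shuffle
    // into one flat list, preserving shuffle order.
    static List<Double> flatten(List<List<Double>> pValsList) {
        List<Double> flat = new ArrayList<>();
        for (List<Double> shufflePVals : pValsList) {
            flat.addAll(shufflePVals);
        }
        return flat;
    }

    // One AD p-value per target node: the node is rejected when the AD p-value
    // of its flat list is at or below the threshold.
    // adPValue is a placeholder for the actual AD p-value computation.
    static boolean rejects(List<List<Double>> pValsList,
                           ToDoubleFunction<List<Double>> adPValue,
                           double threshold) {
        return adPValue.applyAsDouble(flatten(pValsList)) <= threshold;
    }
}
```

With this layout the AD test sees every shuffled p-value for a node at once, so each node gets a single accept/reject decision instead of one per shuffle.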
...

jdramsey · 2024-05-29T20:01:22Z

Can you explain a little more? Are you ending up with one AD p-value per node?

vbcwonderland · 2024-05-30T02:05:17Z

Yes. Each target node now has one AD test p-value for each shuffle.

shuffleThreshold: a double giving the fraction of the data selected for each shuffle.
shuffleTimes: the total number of shuffles, chosen so that, taken together, the shuffles approximately cover the full data.

For example, a shuffleThreshold of 0.2 leads to shuffling the data 5 times, each time taking 20% of the data.
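The examples in this thread (1.0 gives 1 shuffle, 0.5 gives 2, 0.2 gives 5) fit shuffleTimes = 1 / shuffleThreshold. A small sketch of that arithmetic follows; the thread does not say how thresholds such as 0.3 (whose reciprocal is not an integer) are handled, so rounding to the nearest integer is an assumption, and `ShuffleMath` is a hypothetical name. `flatListSize` reproduces the counts stated at the top of the PR: with n p-values per shuffle, the flat list holds shuffleTimes × n values.

```java
final class ShuffleMath {

    // Hypothetical helper: number of shuffles so that
    // shuffleTimes * shuffleThreshold is roughly 1 (full coverage).
    // Rounding to the nearest integer is an assumption.
    static int shuffleTimes(double shuffleThreshold) {
        if (shuffleThreshold <= 0.0 || shuffleThreshold > 1.0) {
            throw new IllegalArgumentException("shuffleThreshold must be in (0, 1]");
        }
        return (int) Math.max(1, Math.round(1.0 / shuffleThreshold));
    }

    // Size of the flat p-value list: factsPerShuffle p-values for each shuffle.
    static int flatListSize(double shuffleThreshold, int factsPerShuffle) {
        return shuffleTimes(shuffleThreshold) * factsPerShuffle;
    }
}
```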

List<List<Double>> pVals_list: a list of lists of doubles, where each sublist holds the p-values calculated for one shuffle of the data, i.e., the independence-test (e.g., FisherZ) p-values between the node and each of its local nodes.

The loop iterates shuffleTimes times, each time performing the following steps:

Data Subsampling:
getSubsampleRows(shuffleThreshold): returns a list of row indices based on the shuffle threshold, indicating which rows of data to include in the test.
((RowsSettable) independenceTest).setRows(rows): This sets the rows that the test should consider.
Calculating P-Values:
A new list pVals is initialized to store this iteration's p-values. The inner loop then goes through each IndependenceFact f in facts and, depending on the type of independenceTest, calculates the p-value for f:

Fisher Z-test (IndTestFisherZ): The p-value is calculated and directly added to pVals.
Chi-square test (IndTestChiSquare): The p-value is calculated and added only if it's non-null.
After all facts for this shuffle are processed, pVals is added to pVals_list.
pVals_list therefore holds one list of p-values per shuffle.
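The loop described above can be sketched without Tetrad as follows. This is an illustration under stated assumptions, not the PR's code: `RowRestrictedTest` is a hypothetical stand-in for a `RowsSettable` independence test, facts are a generic type `F`, and `getSubsampleRows` here draws round(threshold × numRows) distinct rows uniformly at random without replacement. The thread does not say whether shuffles are random draws or disjoint chunks, so that sampling rule is an assumption. Null p-values are skipped, as in the chi-square branch; for a test that never returns null (the Fisher Z branch) this is the same as adding every value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class LocalPValuesSketch {

    // Hypothetical stand-in for a row-restrictable independence test.
    interface RowRestrictedTest<F> {
        void setRows(List<Integer> rows);
        Double pValue(F fact); // may return null, as the chi-square branch allows
    }

    // Assumed subsampling rule: round(threshold * numRows) distinct row
    // indices drawn uniformly at random without replacement, returned sorted.
    static List<Integer> getSubsampleRows(int numRows, double threshold, Random rng) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < numRows; i++) all.add(i);
        Collections.shuffle(all, rng);
        int k = (int) Math.round(threshold * numRows);
        List<Integer> rows = new ArrayList<>(all.subList(0, k));
        Collections.sort(rows);
        return rows;
    }

    static <F> List<List<Double>> getLocalPValues(RowRestrictedTest<F> test, List<F> facts,
                                                  int numRows, double threshold,
                                                  int shuffleTimes, Random rng) {
        List<List<Double>> pValsList = new ArrayList<>();
        for (int s = 0; s < shuffleTimes; s++) {
            // 1. Data subsampling: restrict the test to a random subset of rows.
            test.setRows(getSubsampleRows(numRows, threshold, rng));
            // 2. One p-value per independence fact; null results are skipped.
            List<Double> pVals = new ArrayList<>();
            for (F f : facts) {
                Double p = test.pValue(f);
                if (p != null) pVals.add(p);
            }
            pValsList.add(pVals);
        }
        return pValsList;
    }
}
```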

The getLocalPValues method is then used by other methods, e.g. getAndersonDarlingTestAcceptsRejectsNodesForAllNodes, as follows:

```java
// All local nodes' p-values for node x, one inner list per shuffle
List<List<Double>> shuffledLocalPValues =
        getLocalPValues(independenceTest, localIndependenceFacts, shuffleThreshold);

for (List<Double> localPValues : shuffledLocalPValues) {
    // P-value obtained from the AD test on this shuffle's local p-values
    double adPValue = checkAgainstAndersonDarlingTest(localPValues);
    if (adPValue <= threshold) {
        rejects.add(x);
    } else {
        accepts.add(x);
    }
}
```

Each inner list of independence-test p-values is fed into the AD test, which produces one AD p-value for the target node per shuffle.
So with shuffleThreshold = 1.0 there is 1 shuffle (shuffleTimes = 1) and 1 AD p-value as the result.
When we want to generate more data by shuffling and set shuffleThreshold to 0.2, there are 5 shuffles and 5 AD p-values as the result, which is exactly what we want.
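The thread does not show how checkAgainstAndersonDarlingTest turns a list of p-values into an AD p-value. One standard approach, sketched here as an assumption and not as Tetrad's implementation, tests the p-values against Uniform(0, 1), their distribution when every tested independence actually holds, using the Anderson-Darling statistic A², and estimates the p-value by Monte Carlo simulation instead of a tabulated approximation. `AdUniformSketch`, the clamping constant, and the replicate count `b` are hypothetical choices.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

final class AdUniformSketch {

    // Anderson-Darling statistic of the sample against Uniform(0, 1):
    // A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * [ln u_(i) + ln(1 - u_(n+1-i))]
    // Values are clamped away from 0 and 1 (an assumption) so the logs stay finite.
    static double statistic(double[] sample) {
        int n = sample.length;
        double[] u = new double[n];
        for (int i = 0; i < n; i++) {
            u[i] = Math.min(Math.max(sample[i], 1e-12), 1 - 1e-12);
        }
        Arrays.sort(u);
        double sum = 0.0;
        for (int i = 1; i <= n; i++) {
            sum += (2 * i - 1) * (Math.log(u[i - 1]) + Math.log(1 - u[n - i]));
        }
        return -n - sum / n;
    }

    // Monte Carlo p-value: share of b uniform samples of the same size whose
    // A^2 is at least the observed one, with the usual +1 correction.
    static double monteCarloPValue(List<Double> pValues, int b, Random rng) {
        double[] x = pValues.stream().mapToDouble(Double::doubleValue).toArray();
        double observed = statistic(x);
        int atLeast = 0;
        double[] sim = new double[x.length];
        for (int r = 0; r < b; r++) {
            for (int i = 0; i < sim.length; i++) sim[i] = rng.nextDouble();
            if (statistic(sim) >= observed) atLeast++;
        }
        return (atLeast + 1.0) / (b + 1.0);
    }
}
```

Under this sketch, a node would be rejected when `monteCarloPValue(localPValues, b, rng) <= threshold`, matching the comparison in the loop above. Simulation avoids hard-coding asymptotic constants, at the cost of `b` extra statistic evaluations per call.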

jdramsey

Looks good.

Fixed shuffle function

18c244e

vbcwonderland assigned jdramsey May 28, 2024

convert to shuffled P Vals into one whole flat list

e9441e7

vbcwonderland changed the title ~~Fixed shuffle function~~ Fixed shuffle function and combine all shuffled independence test p vals for a target node into a flat list to feed into Anderson Darling Test May 31, 2024

jdramsey approved these changes May 31, 2024

jdramsey merged commit e7a4b28 into development May 31, 2024

jdramsey deleted the vbc-05-28 branch May 31, 2024 21:03
