Make test/score that will work for algebraically defined nonlinear models. #1669
Comments
Actually your file didn't come through; you may need to zip it before attaching it (I've found)... |
One second I found your link... |
Hold on, sorry, you didn't actually say it was a covariance matrix. But huh, it loads for me..... can you tell me more about how you're trying to load it? |
@uvnikgupta What version of Java are you using? |
Thanks for the update. Sorry, I was multitasking yesterday. This is a bug we know about (thanks @kvb2univpitt). The issue (if you want to know) is that the ByteBuffer class changed after Java 1.8: starting with Java 9, methods such as clear() and position() are overridden in ByteBuffer and return ByteBuffer rather than Buffer, so code compiled against a newer JDK can fail with NoSuchMethodError on 1.8. It's this bug: https://www.morling.dev/blog/bytebuffer-and-the-dreaded-nosuchmethoderror/ except in your case it's the clear() method that's the problem and not the position() method. You're using OpenJDK 1.8, I'm guessing on a Linux box? (Actually can you confirm that?) What I'll do (sorry, just trying different things here) is try the casting they suggest in the article to see if it will work in OpenJDK 1.8 for me. (It needs to work both for 1.8 and for > 1.8, unfortunately, which is the issue.) Unfortunately I'm on a Mac at the moment and the only JDK 1.8 I can get anymore is Amazon's, and it's not a problem there. When I get back home today I'll try installing OpenJDK 1.8 on my Windows laptop (I think I can still do that, though I can no longer get it from M$) and test it there. But really what I need to do is test it on Linux, using OpenJDK 1.8, and I don't have a Linux box currently. If I made you a version (or maybe two versions) to test, would you be willing to try them out on your machine? That would help a lot. |
@jdramsey, Thanks a lot for explaining the issue. |
Maybe you could get OpenJDK 8 from here: https://www.openlogic.com/openjdk-downloads?field_java_parent_version_target_id=416&field_operating_system_target_id=436&field_architecture_target_id=391&field_java_package_target_id=396 |
Awesome--Let me grab the Mac version now and test it, and then I can download the Windows one later and test it there. Fingers crossed! We (well @kvb2univpitt) were thinking of rewriting that section of code without using ByteBuffer, but hopefully this fixes it without that effort. |
Actually they're not providing any Mac options--it's in their selector but you only get Windows options in the list. I'm at the office right now but can do this later when I get home; my Windows laptop is there. I just tested it using Amazon's Corretto 1.8 on Mac and it works there, though I suspect Amazon may have gone in and fixed the issue internally. |
Oh hold on, they did have it! It's just that their dropdown was broken; I had to select "all" and then the Mac options showed up. I tested it--it works! That gives me some confidence that it will work on Windows as well using a Windows 1.8 download from this site, but I can test it later. |
The problem goes away if you use Java 11 and above. |
@kvb2univpitt I am motivated to figure it out because we have users who are not in a position to grab a newer version of Java. I may have figured it out though--I'll let you know! I'm going to test it now on Windows. |
I am one of those in that group :) |
@jdramsey We definitely need to get rid of the ByteBuffer. By "we" I mean "me". |
@uvnikgupta @kvb2univpitt Could you both try to break this version? I.e., launch it, try to load a dataset... If it works I will tell you what I did. |
Sure. On it :) Tried different datasets and it seems to work pretty fine now 👍 |
Tried a few more and data loading + Search works flawlessly. The only issue now is that the resulting graph is nowhere close to the actual graph :( I guess that is the state of the existing discovery algorithms due to the nature of the problem. |
I'm very curious what experience Kevin has. I compiled this under Corretto 1.8 and have no trouble running under 1.8 or 11 on my Mac, so if you have no trouble on Windows, I'll try under 11 under Windows. Not sure what to say about the content. Maybe if you tell me the general nature of the problem and what you've tried I could comment? |
I am loading the data and connecting it to the search box, then executing search using different algorithms, and finally comparing the result with the actual DAG. The data and the actual DAG are attached for your reference. BTW, I encountered a null pointer issue when I tried to use the "Regression"
Are these Gaussian variables? With what sample size?
…On Thu, Jul 27, 2023 at 9:28 PM kelearin ***@***.***> wrote:
I am loading the data and connecting to the search box. Then executing
search using different algorithms. Finally comparing the result with the
actual DAG. The data and the actual DAG is attached for your reference
20_nodes_normal.csv
<https://github.com/cmu-phil/tetrad/files/12189230/20_nodes_normal.csv>
[image: image]
<https://user-images.githubusercontent.com/20485662/256699118-c585c8fe-048a-4e90-bbe4-969c12ddf0b8.png>
BTW, I encountered a Null pointer issue when I tried to use the
"Regression"
[image: image]
<https://user-images.githubusercontent.com/20485662/256699297-6ae9f1fb-2b24-46e3-ae23-fff1296432f0.png>
|
I'm not able to attach my data generator .py file, so the formulae are below: |
They are not terribly Gaussian. By the way @uvnikgupta if you'd like to switch to email I'm happy. @cg09 if you load up the data that was sent in the version of Tetrad given above and use the Plot Matrix tool you can see the distributions of the variables. |
yes, I can share my data generation python code then. Please DM me at |
That's what I thought--nonlinear algebraic functions generated them...You know we were just thinking of how to incorporate this sort of nonlinear additivity into a fast score... |
What sort of "non-linear algebraic" functions?
|
The formula simplifies to: 2*D2 + D3 - E1*A6 + 8*A7/A8
I hope you received the python scripts I shared. You can generate data of
any size with that script. Just modify the size parameter in the
instantiations of the DataGenerator class under if __name__ == "__main__",
create a data folder and run "python data_generator.py". Of course you have
to pip install pandas, numpy and scipy.
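For anyone puzzling over the F1 expression quoted in this thread: the multiplication signs were most likely swallowed by Markdown rendering, which is why it reads like concatenation. Below is a minimal sketch of the simplified formula in plain Python. The function name `f1` and the input values are made up for illustration and are not taken from the data generator script.

```python
def f1(d2, d3, e1, a6, a7, a8):
    """Evaluate F1 = 2*D2 + D3 - E1*A6 + 8*A7/A8 for one row of values."""
    return 2 * d2 + d3 - e1 * a6 + 8 * a7 / a8


# Hypothetical input values, chosen only so the arithmetic is easy to check by hand.
value = f1(d2=1.0, d3=2.0, e1=3.0, a6=4.0, a7=5.0, a8=10.0)
print(value)
```

Note that the A7/A8 term blows up when A8 is close to zero, so F1 can have very heavy tails depending on how A8 is distributed.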
…On Fri, Jul 28, 2023 at 12:31 AM Joseph Ramsey ***@***.***> wrote:
b1 = a2^2
==> ln(b1) = 2 ln(a2)
b2 = a1
Singularity, you'll need to remove one of two columns or teach your
algorithm to deal with it. But you can't use regression here in any form.
(This is why the regression check is failing, above, BTW).
c2 = sqrt(abs(b1))
==> Hmmm... you need to check a symmetric function here of b1 to find the
dependency.
c3 = b1 * b2
==>ln(c3) = ln(b1) + ln(b2)
c2^2 + c3 - a2^2
==> Logging won't help here for the entire function! But logging c2 and
logging b2 separately would help if you knew to do that! Hmmm...
c4 = b2^3
==> ln(c4) = 3 * ln(b2)...no problem.
sqrt(|c4|)
==> Another symmetric function.
b3 = a4^2 + a5
==> Logging a4 separately would have helped.
c1 = b3^2
==> Logging solves this.
"D1": 'np.round(np.mod(1000data_2["C1"], 10), 3)',
Not sure how to describe this one in words yet, I'll come back to it.
==> NO HELP HERE! You need to resort to a generalized score I think!!!
Ugh, slow!!!
"E1": 'np.abs(data_2["A3"])**2/(data_2["D1"] + .001)',
abs(a3)^2 / (d1 + 0.001).
==> Heuristically I would still log this :-) 2 * ln(abs(a3)) - ln(d1 +
0.001)
"F1": '2data_2["D2"] + data_2["D3"] - data_2["E1"]data_2["A6"] +
8data_2["A7"]/data_2["A8"]'
2 * d2 + d3 - .... what is that? e1 a6?? + 8 a7 / a8? I have to check what
concatenating variables in Python does... string concatenation?????!
==> I still have no idea what this even means yet, lol!!! :-D
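The "logging" idea in the notes above can be sketched for the product case c3 = b1 * b2: after taking logs the relationship becomes additive and linear. This is only an illustration under stated assumptions. The uniform positive inputs are hypothetical stand-ins (the real variables can be negative, in which case abs or a shift would be needed first), and `pearson` is a helper written for this sketch, not part of Tetrad or the data generator.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical stand-ins for b1 and b2; kept positive so that logs are defined.
b1 = [rng.uniform(0.5, 5.0) for _ in range(n)]
b2 = [rng.uniform(0.5, 5.0) for _ in range(n)]
c3 = [x * y for x, y in zip(b1, b2)]  # c3 = b1 * b2

# Compare ln(c3) with ln(b1) + ln(b2): they agree up to rounding error.
log_sum = [math.log(x) + math.log(y) for x, y in zip(b1, b2)]
log_c3 = [math.log(z) for z in c3]
max_gap = max(abs(u - v) for u, v in zip(log_sum, log_c3))

print("max |ln(c3) - (ln(b1) + ln(b2))| =", max_gap)
print("corr(ln c3, ln b1 + ln b2) =", pearson(log_c3, log_sum))
```

The catch, as the notes say, is that you have to know which variables to log; for mixed expressions such as c2^2 + c3 - a2^2, logging the whole function does not linearize it.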
|
Sorry I haven't gotten back to you--we're all at the UAI conference here in Pittsburgh. I thought about the 1.8 issue and think the thing to do is to publish a separate version compiled under 1.8. I'm going to try to get this done today. |
Yes, I was starting to wonder :)
The agenda for the UAI conference sounds really cool. I have never attended
any of its conferences but I can imagine the energy in that environment. I
hope I am able to attend some day.
Coming back to the topic, I already have your working version for 1.8 so I
am not really waiting for an official release. I am now more interested in
figuring out why the algorithms are not performing well and how to tweak
the data or the algorithm parameters to reproduce most of the DAG, if not
fully.
Regards
Uvnik
|
Sorry for the delay--we had a couple of dissertation defenses in the last week. Getting back to this. I need to look at your Python code more carefully to see what assumptions are being honored. It wasn't clear to me on my first gander. We had made a nonlinear simulator using Gaussian processes (and additive simulation) and GRaSP/BOSS did pretty well on that, but when we looked at the distributions, all of the functions had linear trends. It's been noticed in the past (I can get you a reference) that linear Gaussian scores like LG BIC tend to do OK whenever there are linear trends, and besides this, GRaSP/BOSS tend to do OK under a rather significant weakening of the faithfulness assumption, so some "sins" can be forgiven by the procedure. What I know will give the procedure difficulty are the square and absolute value functions you use, which give dependencies but not because of linear trends. I'm wondering if you took those out how well the algorithms would do? |
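To illustrate the point about square and absolute value functions, here is a small sketch, assuming a zero-mean Gaussian parent (a hypothetical choice, not taken from the data generator). The child is a deterministic function of the parent, yet the linear correlation is close to zero, which is what a linear Gaussian score keys on. The `pearson` helper is written for this sketch only.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical parent variable, symmetric about zero.
a = [rng.gauss(0.0, 1.0) for _ in range(20000)]
squared = [x * x for x in a]    # child = a^2
absolute = [abs(x) for x in a]  # child = |a|

r_square = pearson(a, squared)
r_abs = pearson(a, absolute)
# A symmetric transform of the parent exposes the dependence again.
r_sym = pearson(absolute, squared)

print("corr(a, a^2)   =", round(r_square, 3))
print("corr(a, |a|)   =", round(r_abs, 3))
print("corr(|a|, a^2) =", round(r_sym, 3))
```

The first two correlations come out near zero even though the dependence is perfect, while the third is large. If the parent's distribution were not symmetric about zero, some linear trend would remain and a linear score would have more to work with.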
@uvnikgupta Wondering, have you had a chance to look at this? |
You are beset with new Tetrad problems. Sorry.
|
Oh, I'm just trying to review outstanding issues and see what needs to be done. This particular issue involves trying to generalize to more algebraic functional forms for larger models, something I'm interested in and thinking of how to do. |
I mean we do have the KCI general independence test, but it won't scale far enough for the problems suggested here. Also, it would be good to have a general score, and we've never implemented Biwei's general score in Tetrad, but Biwei's score won't handle these problems; there are too many variables, and the sample sizes are too large. I've been thinking about scores that are more general than LG but perhaps not completely general, which could handle a variety of distributions (but perhaps not all) and might be fast. I ask everyone I talk to whether they can think of such scores but no takers so far. I agree though it would be nice to have and a contribution to the literature. |
I am at a conference on ecology and causality. They are all about
identifying unmeasured intermediate variables between input and output, but
have no clue how to do it. I have data on turtles and soon to have data on
penguins, I think.
Clark
|
Interesting.... |
@uvnikgupta Sorry, I ended up with so many things to do at the beginning of term that I was losing track of them in my head. Let me write this one down so I can work on it some. (I made a long to-do list recently and ordered it in terms of priorities. I think this is going to help.) |
@uvnikgupta Let me characterize the problem this way. Is there a test/score that could be used that would recover at least approximately the correct DAG when the data are generated with simple combinations of functions? What combinations can work and which can't? Is that fair? |
@uvnikgupta Perhaps one of us should look to see if there's any literature on this already. |
@jdramsey Sorry, I am not sure if I understand your question completely. Are we trying to find a score that would compare a set of equations to the generated DAG? If yes, then I do not understand why. The reason being that if I know the equations, I can already create the original DAG and then use scores like SHD to compare the generated vs the original graph. |
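For reference, the SHD comparison mentioned here can be sketched as follows. This is a minimal version under stated assumptions: both graphs are fully directed and given as sets of (parent, child) pairs, and a missing edge, an extra edge, and a reversed edge each cost 1 (other conventions exist). The function name `shd` and the toy graphs are made up for illustration.

```python
def shd(true_edges, est_edges):
    """Structural Hamming distance between two directed graphs.

    Assumed convention: a missing adjacency, an extra adjacency, and a
    reversed edge each count as one error.
    """
    true_edges = set(true_edges)
    est_edges = set(est_edges)
    true_skel = {frozenset(e) for e in true_edges}
    est_skel = {frozenset(e) for e in est_edges}

    # Adjacencies present in one graph but not in the other.
    missing_or_extra = len(true_skel ^ est_skel)

    # Adjacencies shared by both graphs but oriented differently.
    reversed_count = sum(
        1
        for (u, v) in true_edges
        if frozenset((u, v)) in est_skel and (u, v) not in est_edges
    )
    return missing_or_extra + reversed_count


# Hypothetical toy graphs: one edge reversed, one missing, one extra.
true_dag = {("A", "B"), ("B", "C"), ("A", "D")}
estimated = {("A", "B"), ("C", "B"), ("D", "E")}
print(shd(true_dag, estimated))
```

Search output is often a CPDAG with some undirected edges, so a real comparison would need a rule for those; this sketch does not handle them.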
@uvnikgupta That is, does anyone have a strategy for searching a dataset with > 20 variables where the variables are generated by an SEM with the kinds of functions you're using? Also, with the sample sizes you have in mind? You could use a general test like KCI, but it won't scale that far. |
Loading the attached csv throws the following exception:
Infer demiliter for file: 20_nodes_normal.csv
Exception in thread "AWT-EventQueue-0" java.lang.NoSuchMethodError: java.nio.ByteBuffer.clear()Ljava/nio/ByteBuffer;
    at edu.pitt.dbmi.data.reader.util.TextFileUtils.inferDelimiter(TextFileUtils.java:135)
    at edu.cmu.tetradapp.editor.LoadDataSettings.getInferredDelimiter(LoadDataSettings.java:882)
    at edu.cmu.tetradapp.editor.LoadDataSettings.basicSettings(LoadDataSettings.java:503)
    at edu.cmu.tetradapp.editor.LoadDataDialog.showDataLoaderDialog(LoadDataDialog.java:165)
    at edu.cmu.tetradapp.editor.LoadDataAction.actionPerformed(LoadDataAction.java:91)
    at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
    at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
    at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
    at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
    at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
    at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:842)
    at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:886)
    at java.awt.Component.processMouseEvent(Component.java:6539)
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
    at java.awt.Component.processEvent(Component.java:6304)
    at java.awt.Container.processEvent(Container.java:2239)
    at java.awt.Component.dispatchEventImpl(Component.java:4889)
    at java.awt.Container.dispatchEventImpl(Container.java:2297)
    at java.awt.Component.dispatchEvent(Component.java:4711)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4904)
    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4535)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4476)
    at java.awt.Container.dispatchEventImpl(Container.java:2283)
    at java.awt.Window.dispatchEventImpl(Window.java:2746)
    at java.awt.Component.dispatchEvent(Component.java:4711)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:760)
    at java.awt.EventQueue.access$500(EventQueue.java:97)
    at java.awt.EventQueue$3.run(EventQueue.java:709)
    at java.awt.EventQueue$3.run(EventQueue.java:703)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:84)
    at java.awt.EventQueue$4.run(EventQueue.java:733)
    at java.awt.EventQueue$4.run(EventQueue.java:731)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:730)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:205)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
[20_nodes_normal.csv](https://github.com/cmu-phil/tetrad/files/12176485/20_nodes_normal.csv)