[BUG]: Model extraction cases #407

Open
liunelson opened this issue Dec 10, 2024 · 10 comments

@liunelson

I found some edge cases for template_model_from_sympy_odes.

Case 1

Fixed birth (l) and proportional death (m X) processes are interpreted correctly as natural production and natural degradation:

\frac{d S}{d t} = -b S I + l - m S
\frac{d I}{d t} = b S I - g I - m I
\frac{d R}{d t} = g I - m R

However, I had expected the m S terms on d I/d t and d R/d t in the following variation to be interpreted as controlled degradation templates with rate law m * S:

\frac{d S}{d t} = -b S I + l - m S
\frac{d I}{d t} = b S I - g I - m S
\frac{d R}{d t} = g I - m S

Doing it directly with template_model_from_sympy_odes gives a model that completely ignores all the m S terms:

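import sympy

t, b, g, l, m = sympy.symbols("t b g l m")
S, I, R = sympy.Function("S"), sympy.Function("I"), sympy.Function("R")

odes = [
    sympy.Eq(S(t).diff(t), - b * S(t) * I(t) + l - m * S(t)),
    sympy.Eq(I(t).diff(t), b * S(t) * I(t) - g * I(t) - m * S(t)),
    sympy.Eq(R(t).diff(t), g * I(t) - m * S(t)),
]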

Doing it within Terarium (which styles the above LaTeX and converts it to SymPy before sending it to MIRA) gives a strange model whose rate laws are "-I*S*b + l", "I*S*b - I*g", and "I*g" (possibly unrelated to MIRA).

Case 2

This is a case with "branching ratios":

\frac{d S}{d t} = -b S I
\frac{d I}{d t} = b S I - g I
\frac{d R}{d t} = k g I
\frac{d V}{d t} = (1 - k) g I

This gives a model where I has a natural degradation g I and two controlled productions of R, V, instead of two natural conversions from I into R, V.
[Screenshot: 2024-12-10, 5:18:50 PM]

If we were to rewrite the 2nd equation as \frac{d I}{d t} = b S I - k g I - (1 - k) g I, then MIRA returns the correct model (where I branches into R, V with ratio k).
[Screenshot: 2024-12-10, 5:19:35 PM]

Such a branching case was actually involved in a paper from which the UCSD team wanted to extract a model and it required careful reading of the text to realize a rewrite of the equations.

Do you have a forthcoming solution to this problem or do you expect users/Terarium to rewrite the equations?

@liunelson liunelson changed the title [BUG]: [BUG]: Model extraction cases Dec 10, 2024
@liunelson
Author

liunelson commented Dec 10, 2024

I have more cases described here:
DARPA-ASKEM/terarium#5805

The issues there (Cases 1, 3) appear to be related to SymPy's parse_latex adding unnecessary parentheses around groups of terms, which causes MIRA to treat each parenthesized group as a single template.

Can simplify_rate_law or MIRA fix this problem, or do we need to ditch the SymPy parser and hope that an LLM agent can convert the LaTeX to SymPy better?

Case 3a

odes_latex = [
    r"\frac{d S(t)}{d t} = -b * S(t) * I(t) + l - m * S(t)",
    r"\frac{d I(t)}{d t} = b * S(t) * I(t) - g * I(t) - m * I(t)", 
    r"\frac{d R(t)}{d t} = g * I(t) - m * R(t)",
]

odes_sympy = [
    Eq(Derivative(S(t), t), -m*S(t) + (l + ((-b)*S(t))*I(t))),
    Eq(Derivative(I(t), t), -m*I(t) + (-g*I(t) + (b*S(t))*I(t))),
    Eq(Derivative(R(t), t), g*I(t) - m*R(t))
]

Case 3b

odes_latex = [
    r"\frac{d S(t)}{d t} = -b * S(t) * I(t)",
    r"\frac{d I(t)}{d t} = b * S(t) * I(t) - k * g * I(t) - (1 - k) * g * I(t)", 
    r"\frac{d R(t)}{d t} = k * g * I(t)",
    r"\frac{d V(t)}{d t} = (1 - k) * g * I(t)"
]

odes_sympy = [
    Eq(Derivative(S(t), t), -b*I(t)*S(t)),
    Eq(Derivative(I(t), t), -g*(1 - k)*I(t) + ((b*S(t))*I(t) - g*k*I(t))),
    Eq(Derivative(R(t), t), (g*k)*I(t)),
    Eq(Derivative(V(t), t), (g*(1 - k))*I(t))
]
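
A possible workaround sketch for the parentheses in Cases 3a/3b. The nesting in the printed output is consistent with parse_latex building its tree from unevaluated Add/Mul nodes. The snippet below rebuilds the Case 3a right-hand side of dS/dt by hand with evaluate=False (so it does not need the LaTeX parser) and shows that doit() flattens it back into separate terms. Whether this changes what MIRA extracts is an assumption that has not been checked here.

```python
import sympy

t = sympy.Symbol("t")
b, l, m = sympy.symbols("b l m")
S, I = sympy.Function("S"), sympy.Function("I")

# Hand-built stand-in for the parser output -m*S(t) + (l + ((-b)*S(t))*I(t)).
inner = sympy.Add(
    l,
    sympy.Mul(sympy.Mul(-b, S(t), evaluate=False), I(t), evaluate=False),
    evaluate=False,
)
nested = sympy.Add(sympy.Mul(-m, S(t), evaluate=False), inner, evaluate=False)

# The unevaluated tree has two top-level terms, one of which is itself a sum.
n_terms_nested = len(sympy.Add.make_args(nested))

# doit() re-evaluates every node, which flattens the sum into separate terms.
flat = nested.doit()
n_terms_flat = len(sympy.Add.make_args(flat))

print(n_terms_nested, n_terms_flat)
print(flat)
```

If this holds for the real parser output, applying doit() to both sides of each parsed equation before handing it to MIRA would be the smallest change to try.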

@bgyori
Member

bgyori commented Dec 11, 2024

For Case 1, I think the term \frac{d R}{d t} = - m S is not physically plausible and doesn't really fit a canonical pattern that could be recognized. A physically plausible controlled degradation would be something like \frac{d R}{d t} = - m R S. Does this come up in practice or is it a hypothetical example?
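
One way to read the plausibility argument above, sketched in SymPy: a loss term on dR/dt should vanish when R itself is zero, otherwise R could be driven negative. The helper name vanishes_when_empty is hypothetical and this check is only an illustration of that reading, not something MIRA is known to implement.

```python
import sympy

t, m = sympy.symbols("t m")
S, R = sympy.Function("S")(t), sympy.Function("R")(t)


def vanishes_when_empty(term, state):
    """Return True if a loss term on d(state)/dt is zero when state == 0."""
    return sympy.simplify(term.subs(state, 0)) == 0


hypothetical_term = -m * S      # the term on dR/dt from Case 1
plausible_term = -m * R * S     # controlled degradation of R by S

print(vanishes_when_empty(hypothetical_term, R))
print(vanishes_when_empty(plausible_term, R))
```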

@bgyori
Member

bgyori commented Dec 11, 2024

For Case 2, I agree this is an ambiguous case (both models produce correct ODEs) and it would be nice to recognize the natural conversions, though not trivial. This requires some further thinking and algorithmic improvement.
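
The ambiguity can be checked with a short sketch: summing the signed contributions of each reading reproduces the same right-hand sides. The (rate, consumed, produced) tuples are a simplified stand-in invented for this illustration, not MIRA's template classes.

```python
import sympy

t, b, g, k = sympy.symbols("t b g k")
S, I, R, V = (sympy.Function(name)(t) for name in "SIRV")


def odes_from_fluxes(fluxes):
    """Sum signed flux contributions into one right-hand side per variable."""
    rhs = {var: sympy.Integer(0) for var in (S, I, R, V)}
    for rate, consumed, produced in fluxes:
        for var in consumed:
            rhs[var] -= rate
        for var in produced:
            rhs[var] += rate
    return {var: sympy.expand(expr) for var, expr in rhs.items()}


# Reading A: I degrades at g*I, while R and V are produced under control of I.
reading_a = [
    (b * S * I, [S], [I]),
    (g * I, [I], []),
    (k * g * I, [], [R]),
    ((1 - k) * g * I, [], [V]),
]

# Reading B: I converts into R and V with branching ratio k.
reading_b = [
    (b * S * I, [S], [I]),
    (k * g * I, [I], [R]),
    ((1 - k) * g * I, [I], [V]),
]

same_odes = odes_from_fluxes(reading_a) == odes_from_fluxes(reading_b)
print(same_odes)
```

Because both readings expand to identical equations, the ODEs alone cannot tell them apart; preferring the conversion reading would need an extra rule, such as matching a loss term on one variable against gain terms that sum to it.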

@liunelson
Author

@bgyori
Case 1 was meant to be purely hypothetical - I was adding the usual natural death processes (dX/dt = ... - m * X) and tried out this admittedly unphysical variation. Is ignoring unphysical patterns such as this one an intended feature?

I just updated our LaTeX style guide (which an LLM agent is instructed to follow when cleaning up LaTeX provided by users or an upstream service). * will now be used to explicitly denote multiplication, to prevent SymPy's parse_latex(...) from converting the LaTeX a b (1 - g) I(t) to the SymPy expression a * b(t = 1 - g) * I(t).
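
To make the difference concrete without invoking the LaTeX parser, here is a small sketch that builds both readings by hand. The function-call form b(1 - g) is a simplified stand-in for the misparse described above, and the variable names are chosen for this illustration only.

```python
import sympy
from sympy.core.function import AppliedUndef

t, a, g = sympy.symbols("t a g")
I = sympy.Function("I")

# Intended reading: b is a parameter multiplying (1 - g).
b_param = sympy.Symbol("b")
intended = a * b_param * (1 - g) * I(t)

# Misreading: b is a function applied to the argument (1 - g).
b_func = sympy.Function("b")
misread = a * b_func(1 - g) * I(t)

# Collect the names of all applied (undefined) functions in each expression.
functions_intended = {f.func.__name__ for f in intended.atoms(AppliedUndef)}
functions_misread = {f.func.__name__ for f in misread.atoms(AppliedUndef)}

print(functions_intended, functions_misread)
```

In the misread form b shows up as a time-dependent function rather than a parameter, which is the kind of spurious variable that explicit * is meant to rule out.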

@liunelson
Author

I imagine Case 2 is quite nontrivial to tackle automatically, despite how commonly I see it in model-extraction scenarios. I'm split on whether to try to instruct the equation-styling LLM agent to recognize and expand branching terms. I'll experiment.

@liunelson
Author

Could you comment on Cases 3a/b? We're trying to figure out how to pass SymPy strings (as opposed to sympy.core.relational.Equality objects) to the MIRA function.

Previously, we simply did:

model = template_model_from_sympy_odes([sympy.parsing.latex.parse_latex(ode) for ode in odes_latex])

If MIRA didn't get tripped up by the extra (), we wouldn't have to switch to an LLM solution.
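
For the string route, one option is SymPy's own parse_expr, sketched below under the assumption that the strings are already valid SymPy syntax. The local_dict entries are the important part: without them, names such as S and I resolve to SymPy built-ins (the singleton registry and the imaginary unit) instead of functions of t. Note that parse_expr relies on eval, so it should only be given trusted strings.

```python
import sympy
from sympy.parsing.sympy_parser import parse_expr

# Declare every name explicitly so nothing falls back to a SymPy built-in.
t, b, g, l, m = sympy.symbols("t b g l m")
state_functions = {name: sympy.Function(name) for name in ("S", "I", "R")}
local_dict = {"t": t, "b": b, "g": g, "l": l, "m": m, **state_functions}

ode_strings = [
    "Eq(Derivative(S(t), t), -b*S(t)*I(t) + l - m*S(t))",
    "Eq(Derivative(I(t), t), b*S(t)*I(t) - g*I(t) - m*I(t))",
    "Eq(Derivative(R(t), t), g*I(t) - m*R(t))",
]

# parse_expr evaluates by default, so each side comes back as a flat sum.
odes = [parse_expr(s, local_dict=local_dict) for s in ode_strings]

for ode in odes:
    print(ode)
```

The resulting list holds ordinary Equality objects, so it has the same shape as the list built with sympy.Eq earlier in the thread; how MIRA handles it has not been verified here.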

@bgyori
Member

bgyori commented Dec 16, 2024

For Case 3a, I believe we are getting the expected result, despite the parentheses.

ControlledConversion 		 I*S*b
NaturalProduction 		 l
NaturalDegradation 		 S*m
NaturalConversion 		 I*g
NaturalDegradation 		 I*m
NaturalDegradation 		 R*m

In particular, the first two templates look correct in terms of a separate production and conversion template.

@bgyori
Member

bgyori commented Dec 16, 2024

Case 3b appears to be working correctly as well. I get these templates:

ControlledConversion 		 I*S*b
NaturalConversion 		 I*g*k
NaturalConversion 		 I*g*(1 - k)

which look correct (just printing some basic details, the actual subjects/objects/controllers are also correct)

@liunelson
Author

That's quite weird, I get different results from you:

I'm also on the latest MIRA.
[Screenshot: 2024-12-16, 2:47:05 PM]

@liunelson
Author

liunelson commented Dec 17, 2024

Here are the code snippets that I used:

from sympy.parsing.latex import parse_latex
# Import path assumed; adjust it if your MIRA installation differs.
from mira.sources.sympy_ode import template_model_from_sympy_odes
# generate_summary_table is a helper that is not shown in this thread.

# Case 3a
odes_latex = [
    r"\frac{d S(t)}{d t} = -b * S(t) * I(t) + l - m * S(t)",
    r"\frac{d I(t)}{d t} = b * S(t) * I(t) - g * I(t) - m * I(t)",
    r"\frac{d R(t)}{d t} = g * I(t) - m * R(t)",
]

odes_sympy = [parse_latex(ode) for ode in odes_latex]

# Print the parsed equations to inspect how the terms were grouped.
for ode in odes_sympy:
    print(ode)

model = template_model_from_sympy_odes(odes_sympy)
generate_summary_table(model)

# Case 3b
odes_latex = [
    r"\frac{d S(t)}{d t} = -b * S(t) * I(t)",
    r"\frac{d I(t)}{d t} = b * S(t) * I(t) - k * g * I(t) - (1 - k) * g * I(t)",
    r"\frac{d R(t)}{d t} = k * g * I(t)",
    r"\frac{d V(t)}{d t} = (1 - k) * g * I(t)"
]

odes_sympy = [parse_latex(ode) for ode in odes_latex]

for ode in odes_sympy:
    print(ode)

model = template_model_from_sympy_odes(odes_sympy)
generate_summary_table(model)
