Mermaid Crashes If trying to draw a large pipeline #8089

Closed
CarlosFerLo opened this issue Jul 25, 2024 · 10 comments · Fixed by #8767
Labels
P3 Low priority, leave it in the backlog

Comments

@CarlosFerLo
Contributor

CarlosFerLo commented Jul 25, 2024

Thanks in advance for your help :)

Describe the bug
I was building a large pipeline (30 components and 35 connections) and, for debugging purposes, wanted to display the diagram, but both the .draw() and .show() methods failed. Both still work with small pipelines.

Error message

Failed to draw the pipeline: https://mermaid.ink/img/ returned status 400
No pipeline diagram will be saved.
Failed to draw the pipeline: could not connect to https://mermaid.ink/img/ (400 Client Error: Bad Request for url: https://mermaid.ink/img/{place holder for 2km long data}

No pipeline diagram will be saved.
Traceback (most recent call last):
  File "/Users/carlosfernandezloran/Desktop/babyagi-classic-haystack/.venv/lib/python3.10/site-packages/haystack/core/pipeline/draw.py", line 87, in _to_mermaid_image
    resp.raise_for_status()
  File "/Users/carlosfernandezloran/Desktop/babyagi-classic-haystack/.venv/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://mermaid.ink/img/{another placeholder}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/carlosfernandezloran/Desktop/babyagi-classic-haystack/babyagi.py", line 188, in <module>
    pipe.draw(path=Path("pipe"))
  File "/Users/carlosfernandezloran/Desktop/babyagi-classic-haystack/.venv/lib/python3.10/site-packages/haystack/core/pipeline/base.py", line 649, in draw
    image_data = _to_mermaid_image(self.graph)
  File "/Users/carlosfernandezloran/Desktop/babyagi-classic-haystack/.venv/lib/python3.10/site-packages/haystack/core/pipeline/draw.py", line 95, in _to_mermaid_image
    raise PipelineDrawingError(
haystack.core.errors.PipelineDrawingError: There was an issue with https://mermaid.ink/, see the stacktrace for details.

Expected behavior
I expect the .show() and .draw() methods to work for all pipelines, no matter the size.
This might be a Mermaid problem and not strictly Haystack's, but fixing it would mean implementing a local diagram generator, as discussed in #7896.

To Reproduce
I will not add all the 200 lines of add_component, connect statements, but you can imagine how it goes.

System:

  • OS: macOS
  • GPU/CPU: M1
  • Haystack version (commit or version number): 2.3.0
@julian-risch julian-risch added the P2 Medium priority, add to the next sprint if no P1 available label Jul 29, 2024
@vblagoje
Member

vblagoje commented Sep 4, 2024

hey @CarlosFerLo what do you suspect is the issue here? The payload we send to Mermaid is roughly speaking too long, gets truncated somehow and graph generation fails? Or perhaps something else? I'd love to see what's up here but would love to hear your reasoning about the root cause as well.

@CarlosFerLo
Contributor Author

hey @vblagoje I believe it is a truncation issue. I am not really experienced with Mermaid, but I believe that is the case.

@vblagoje
Member

vblagoje commented Sep 4, 2024

Yes @CarlosFerLo, I investigated this a bit. GET requests are subject to URL length limits imposed by clients and servers; a commonly cited maximum is around 2,000 to 2,048 characters. Most likely smaller graphs fit within the limit, so the GET request works up to a certain graph size and then fails. All the code for this is in haystack/core/pipeline/draw.py. We need to see whether we can send a POST request to Mermaid instead and make this work. I'll self-assign this issue unless you want to take it, lmk! 🙏
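
To make the size argument concrete, here is a minimal sketch of how the URL grows with the graph text when the whole graph is carried as Base64 in the path. The endpoint is the one from the error message above; `build_mermaid_url`, `make_chain`, and the simplified labels are hypothetical names used only for illustration, and no request is sent.

```python
import base64

MERMAID_IMG_ENDPOINT = "https://mermaid.ink/img/"  # endpoint taken from the error message above


def build_mermaid_url(graph_text: str) -> str:
    """Return a GET URL that carries the whole graph as Base64 in the path."""
    encoded = base64.b64encode(graph_text.encode("utf-8")).decode("ascii")
    return MERMAID_IMG_ENDPOINT + encoded


def make_chain(n_nodes: int) -> str:
    """Build a hypothetical linear Mermaid graph with n_nodes components."""
    lines = ["graph LR;"]
    for i in range(1, n_nodes):
        lines.append(
            f'comp{i}["comp{i}"] -- "value -> value" --> comp{i + 1}["comp{i + 1}"];'
        )
    return "\n".join(lines)


# Base64 adds roughly one third on top of the raw text, so the URL length
# grows linearly with the number of components and connections.
for n in (5, 30, 100):
    print(n, "nodes ->", len(build_mermaid_url(make_chain(n))), "URL characters")
```

Real Haystack labels carry extra HTML (types, optional inputs), so an actual pipeline will reach any given limit with fewer components than this simplified chain suggests.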

@CarlosFerLo
Contributor Author

@vblagoje I am currently involved in a project that takes up a lot of my time, so if you do not mind, please assign yourself to this issue.

@vblagoje vblagoje self-assigned this Sep 5, 2024
@vblagoje
Member

vblagoje commented Sep 5, 2024

Our intuition was right, @CarlosFerLo. This is a known limitation that I have now confirmed. Here is the script:

import base64
import requests
import io
from PIL import Image
import matplotlib.pyplot as plt

graph = """
    graph LR;
        comp1["<b>comp1</b><br><small><i>AddFixedValue<br><br>Optional inputs:<ul style='text-align:left;'><li>add (Optional[int])</li></ul></i></small>"]:::component 
        -- "result -> value<br><small><i>int</i></small>" --> comp2["<b>comp2</b><br><small><i>Double</i></small>"]:::component;
        comp2["<b>comp2</b><br><small><i>Double</i></small>"]:::component 
        -- "value -> value<br><small><i>int</i></small>" --> comp3["<b>comp3</b><br><small><i>Square</i><br><br>Outputs:<ul style='text-align:left;'><li>result (int)</li></ul></small>"]:::component;
        comp3["<b>comp3</b><br><small><i>Square</i></small>"]:::component 
        -- "output -> next<br><small><i>int</i></small>" --> comp4["<b>comp4</b><br><small><i>MultiplyByTwo</i></small>"]:::component;
        comp4["<b>comp4</b><br><small><i>MultiplyByTwo</i></small>"]:::component 
        -- "next -> result<br><small><i>int</i></small>" --> comp5["<b>comp5</b><br><small><i>SubtractFixedValue<br><br>Optional inputs:<ul style='text-align:left;'><li>sub (Optional[int])</li></ul></i></small>"]:::component;
        comp5["<b>comp5</b><br><small><i>SubtractFixedValue</i></small>"]:::component 
        -- "result -> value<br><small><i>int</i></small>" --> comp6["<b>comp6</b><br><small><i>Divide</i><br><br>Outputs:<ul style='text-align:left;'><li>result (int)</li></ul></small>"]:::component;
        
        classDef component text-align:center;
    
        %% Repeat pattern with arbitrary connections up to 100 nodes
        comp6["<b>comp6</b><br><small><i>Divide</i></small>"]:::component 
        -- "output -> value<br><small><i>int</i></small>" --> comp7["<b>comp7</b><br><small><i>Modulo</i></small>"]:::component;
        comp7["<b>comp7</b><br><small><i>Modulo</i></small>"]:::component 
        -- "next -> result<br><small><i>int</i></small>" --> comp8["<b>comp8</b><br><small><i>Power</i></small>"]:::component;
        comp8["<b>comp8</b><br><small><i>Power</i></small>"]:::component 
        -- "result -> output<br><small><i>int</i></small>" --> comp9["<b>comp9</b><br><small><i>Absolute</i></small>"]:::component;
        comp9["<b>comp9</b><br><small><i>Absolute</i></small>"]:::component 
        -- "value -> next<br><small><i>int</i></small>" --> comp10["<b>comp10</b><br><small><i>Inverse</i></small>"]:::component;
    
            comp10["<b>comp10</b><br><small><i>Inverse</i></small>"]:::component 
        -- "inverse -> value<br><small><i>int</i></small>" --> comp11["<b>comp11</b><br><small><i>Logarithm</i></small>"]:::component;
        comp11["<b>comp11</b><br><small><i>Logarithm</i></small>"]:::component 
        -- "log -> result<br><small><i>float</i></small>" --> comp12["<b>comp12</b><br><small><i>Exponential</i></small>"]:::component;
        comp12["<b>comp12</b><br><small><i>Exponential</i></small>"]:::component 
        -- "exp -> value<br><small><i>float</i></small>" --> comp13["<b>comp13</b><br><small><i>Cosine</i></small>"]:::component;
        comp13["<b>comp13</b><br><small><i>Cosine</i></small>"]:::component 
        -- "cos -> result<br><small><i>float</i></small>" --> comp14["<b>comp14</b><br><small><i>Sine</i></small>"]:::component;
        comp14["<b>comp14</b><br><small><i>Sine</i></small>"]:::component 
        -- "sin -> next<br><small><i>float</i></small>" --> comp15["<b>comp15</b><br><small><i>Tangent</i></small>"]:::component;
        comp15["<b>comp15</b><br><small><i>Tangent</i></small>"]:::component 
        -- "tan -> result<br><small><i>float</i></small>" --> comp16["<b>comp16</b><br><small><i>ArcSine</i></small>"]:::component;
        comp16["<b>comp16</b><br><small><i>ArcSine</i></small>"]:::component 
        -- "asin -> value<br><small><i>float</i></small>" --> comp17["<b>comp17</b><br><small><i>ArcCosine</i></small>"]:::component;
        comp17["<b>comp17</b><br><small><i>ArcCosine</i></small>"]:::component 
        -- "acos -> result<br><small><i>float</i></small>" --> comp18["<b>comp18</b><br><small><i>ArcTangent</i></small>"]:::component;
        comp18["<b>comp18</b><br><small><i>ArcTangent</i></small>"]:::component 
        -- "atan -> next<br><small><i>float</i></small>" --> comp19["<b>comp19</b><br><small><i>SquareRoot</i></small>"]:::component;
        comp19["<b>comp19</b><br><small><i>SquareRoot</i></small>"]:::component 
        -- "sqrt -> result<br><small><i>float</i></small>" --> comp20["<b>comp20</b><br><small><i>CubeRoot</i></small>"]:::component;
         
            comp20["<b>comp20</b><br><small><i>CubeRoot</i></small>"]:::component 
        -- "cbrt -> value<br><small><i>float</i></small>" --> comp21["<b>comp21</b><br><small><i>Factorial</i></small>"]:::component;
        comp21["<b>comp21</b><br><small><i>Factorial</i></small>"]:::component 
        -- "fact -> result<br><small><i>int</i></small>" --> comp22["<b>comp22</b><br><small><i>Permutation</i></small>"]:::component;
        comp22["<b>comp22</b><br><small><i>Permutation</i></small>"]:::component 
        -- "perm -> value<br><small><i>int</i></small>" --> comp23["<b>comp23</b><br><small><i>Combination</i></small>"]:::component;
        comp23["<b>comp23</b><br><small><i>Combination</i></small>"]:::component 
        -- "comb -> result<br><small><i>int</i></small>" --> comp24["<b>comp24</b><br><small><i>GCD</i></small>"]:::component;
        comp24["<b>comp24</b><br><small><i>GCD</i></small>"]:::component 
        -- "gcd -> value<br><small><i>int</i></small>" --> comp25["<b>comp25</b><br><small><i>LCM</i></small>"]:::component;
        comp25["<b>comp25</b><br><small><i>LCM</i></small>"]:::component 
        -- "lcm -> result<br><small><i>int</i></small>" --> comp26["<b>comp26</b><br><small><i>PrimeCheck</i></small>"]:::component;
        comp26["<b>comp26</b><br><small><i>PrimeCheck</i></small>"]:::component 
        -- "prime -> value<br><small><i>boolean</i></small>" --> comp27["<b>comp27</b><br><small><i>Fibonacci</i></small>"]:::component;
        comp27["<b>comp27</b><br><small><i>Fibonacci</i></small>"]:::component 
        -- "fib -> result<br><small><i>int</i></small>" --> comp28["<b>comp28</b><br><small><i>Lucas</i></small>"]:::component;
        comp28["<b>comp28</b><br><small><i>Lucas</i></small>"]:::component 
        -- "lucas -> next<br><small><i>int</i></small>" --> comp29["<b>comp29</b><br><small><i>PascalTriangle</i></small>"]:::component;
        comp29["<b>comp29</b><br><small><i>PascalTriangle</i></small>"]:::component 
        -- "pascal -> result<br><small><i>array</i></small>" --> comp30["<b>comp30</b><br><small><i>BinomialCoefficient</i></small>"]:::component;
     
            comp30["<b>comp30</b><br><small><i>BinomialCoefficient</i></small>"]:::component 
        -- "binom -> value<br><small><i>int</i></small>" --> comp31["<b>comp31</b><br><small><i>QuadraticRoot</i></small>"]:::component;
        comp31["<b>comp31</b><br><small><i>QuadraticRoot</i></small>"]:::component 
        -- "root -> result<br><small><i>float</i></small>" --> comp32["<b>comp32</b><br><small><i>LinearEquation</i></small>"]:::component;
        comp32["<b>comp32</b><br><small><i>LinearEquation</i></small>"]:::component 
        -- "linear -> value<br><small><i>float</i></small>" --> comp33["<b>comp33</b><br><small><i>Polynomial</i></small>"]:::component;
        comp33["<b>comp33</b><br><small><i>Polynomial</i></small>"]:::component 
        -- "poly -> result<br><small><i>float</i></small>" --> comp34["<b>comp34</b><br><small><i>Differential</i></small>"]:::component;
        comp34["<b>comp34</b><br><small><i>Differential</i></small>"]:::component 
        -- "diff -> value<br><small><i>float</i></small>" --> comp35["<b>comp35</b><br><small><i>Integral</i></small>"]:::component;
        comp35["<b>comp35</b><br><small><i>Integral</i></small>"]:::component 
        -- "integral -> result<br><small><i>float</i></small>" --> comp36["<b>comp36</b><br><small><i>FourierTransform</i></small>"]:::component;
        comp36["<b>comp36</b><br><small><i>FourierTransform</i></small>"]:::component 
        -- "fourier -> value<br><small><i>complex</i></small>" --> comp37["<b>comp37</b><br><small><i>LaplaceTransform</i></small>"]:::component;
        comp37["<b>comp37</b><br><small><i>LaplaceTransform</i></small>"]:::component 
        -- "laplace -> result<br><small><i>complex</i></small>" --> comp38["<b>comp38</b><br><small><i>MatrixMultiplication</i></small>"]:::component;
        comp38["<b>comp38</b><br><small><i>MatrixMultiplication</i></small>"]:::component 
        -- "matrix -> value<br><small><i>array</i></small>" --> comp39["<b>comp39</b><br><small><i>VectorAddition</i></small>"]:::component;
        comp39["<b>comp39</b><br><small><i>VectorAddition</i></small>"]:::component 
        -- "vector -> result<br><small><i>array</i></small>" --> comp40["<b>comp40</b><br><small><i>DotProduct</i></small>"]:::component;
    """
breaking_chunk = """
          comp40["<b>comp40</b><br><small><i>DotProduct</i></small>"]:::component 
        -- "dot -> value<br><small><i>float</i></small>" --> comp41["<b>comp41</b><br><small><i>CrossProduct</i></small>"]:::component;
        comp41["<b>comp41</b><br><small><i>CrossProduct</i></small>"]:::component 
        -- "cross -> result<br><small><i>array</i></small>" --> comp42["<b>comp42</b><br><small><i>EigenValue</i></small>"]:::component;
        comp42["<b>comp42</b><br><small><i>EigenValue</i></small>"]:::component 
        -- "eigen -> value<br><small><i>float</i></small>" --> comp43["<b>comp43</b><br><small><i>EigenVector</i></small>"]:::component;
        comp43["<b>comp43</b><br><small><i>EigenVector</i></small>"]:::component 
        -- "vector -> result<br><small><i>array</i></small>" --> comp44["<b>comp44</b><br><small><i>SingularValueDecomposition</i></small>"]:::component;
        comp44["<b>comp44</b><br><small><i>SingularValueDecomposition</i></small>"]:::component 
        -- "svd -> value<br><small><i>matrix</i></small>" --> comp45["<b>comp45</b><br><small><i>CholeskyDecomposition</i></small>"]:::component;
        comp45["<b>comp45</b><br><small><i>CholeskyDecomposition</i></small>"]:::component 
        -- "cholesky -> result<br><small><i>matrix</i></small>" --> comp46["<b>comp46</b><br><small><i>LUDecomposition</i></small>"]:::component;
        comp46["<b>comp46</b><br><small><i>LUDecomposition</i></small>"]:::component 
        -- "lu -> value<br><small><i>matrix</i></small>" --> comp47["<b>comp47</b><br><small><i>QRDecomposition</i></small>"]:::component;
        comp47["<b>comp47</b><br><small><i>QRDecomposition</i></small>"]:::component 
        -- "qr -> result<br><small><i>matrix</i></small>" --> comp48["<b>comp48</b><br><small><i>GramSchmidtProcess</i></small>"]:::component;
        comp48["<b>comp48</b><br><small><i>GramSchmidtProcess</i></small>"]:::component 
        -- "gram -> value<br><small><i>matrix</i></small>" --> comp49["<b>comp49</b><br><small><i>MoorePenroseInverse</i></small>"]:::component;
        comp49["<b>comp49</b><br><small><i>MoorePenroseInverse</i></small>"]:::component 
        -- "inverse -> result<br><small><i>matrix</i></small>" --> comp50["<b>comp50</b><br><small><i>MatrixDeterminant</i></small>"]:::component;
    """

# Encode the graph to Base64 (use graph + breaking_chunk here to reproduce the failure)
graphbytes = graph.encode("ascii")
base64_bytes = base64.b64encode(graphbytes)
base64_string = base64_bytes.decode("ascii")

print(f"Encoded string: {base64_string}")
print(f"Length chars: {len(base64_string)}")

# Fetch
response = requests.get('https://mermaid.ink/img/' + base64_string)
print(response.headers)

# Display
img = Image.open(io.BytesIO(response.content))
plt.imshow(img)
plt.show()

If you run this script, you'll get an image for this arbitrary ChatGPT-generated graph. However, if you append breaking_chunk to the graph, the request fails and the script raises an exception. I've inspected the response headers and the server runs behind Cloudflare. I'm not sure what type of server it is; Cloudflare most likely runs its own custom server software rather than a standard off-the-shelf web server like Apache or Nginx. Either way, there is a limit on the encoded URL of around 12,000 characters.
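
Given that observation, one option is to check the encoded size before sending anything, so the caller gets a descriptive error instead of an opaque 400. The following is only a sketch: the 12,000 figure is the approximate value observed above rather than a documented limit, and `encoded_length` and `check_before_request` are hypothetical helpers, not part of Haystack.

```python
import base64

ASSUMED_LIMIT = 12_000  # approximate figure observed above, not a documented limit


def encoded_length(graph_text: str) -> int:
    """Length of the Base64 string that would be appended to the URL."""
    return len(base64.b64encode(graph_text.encode("utf-8")))


def check_before_request(graph_text: str, limit: int = ASSUMED_LIMIT) -> None:
    """Raise a descriptive error instead of sending a request that is likely to fail."""
    size = encoded_length(graph_text)
    if size > limit:
        raise ValueError(
            f"Encoded graph is {size} characters, above the assumed limit of {limit}; "
            "the rendering service will probably reject it."
        )
```

With the script above, `check_before_request(graph)` would be called right before `requests.get(...)`; calling it on `graph + breaking_chunk` shows whether that combination crosses the assumed threshold.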

So how do we mitigate this?

  1. We (Haystack) run our own Mermaid server with a custom URL size limit. Possible, but not likely.
  2. Optionally limit the labels on graphs. These labels contribute quite a lot to the encoded graph size and thus cause issues when rendering large graphs. We had this before, but looking at the code now, these optional settings seem to be gone. I'll consult with @silvanocerza about this and we'll decide what to do next.
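
As a rough illustration of the second option, the sketch below strips the HTML detail from node and edge labels and keeps only the names. The regular expressions are tailored to the label shape used in the test script above; they are an assumption about that format, not the optional setting Haystack used to have, and `simplify_labels` is a hypothetical helper.

```python
import re

# Node labels look like ["<b>name</b><br><small>...</small>"]; keep only the name.
NODE_LABEL = re.compile(r'\["<b>(.*?)</b>.*?"\]')
# Edge labels look like -- "out -> in<br><small>...</small>" -->; keep only the socket names.
EDGE_LABEL = re.compile(r'-- "(.*?)<br>.*?" -->')


def simplify_labels(mermaid_text: str) -> str:
    """Return the Mermaid text with type and input/output details removed from labels."""
    text = NODE_LABEL.sub(r'["\1"]', mermaid_text)
    return EDGE_LABEL.sub(r'-- "\1" -->', text)


sample = (
    'comp1["<b>comp1</b><br><small><i>AddFixedValue<br><br>Optional inputs:'
    "<ul style='text-align:left;'><li>add (Optional[int])</li></ul></i></small>\"]:::component "
    '-- "result -> value<br><small><i>int</i></small>" --> '
    'comp2["<b>comp2</b><br><small><i>Double</i></small>"]:::component;'
)
print(len(sample), "->", len(simplify_labels(sample)), "characters")
```

The trade-off is that the diagram loses the type information, so this would fit best as an opt-in setting for large pipelines.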

Perhaps there are some other options? I'll talk internally about this and we'll come up with some game plan. Thanks for raising this @CarlosFerLo 🙏
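
One further option that could be explored is compressing the graph text before encoding it, since generated Mermaid text is highly repetitive. The sketch below only compares payload sizes locally. Whether the rendering service accepts a compressed payload (for example the deflate-plus-Base64 form that Mermaid Live Editor share links carry behind a `pako:` prefix) is an assumption that this thread does not confirm, and `plain_payload` and `compressed_payload` are hypothetical helpers.

```python
import base64
import json
import zlib


def plain_payload(graph_text: str) -> str:
    """URL-safe Base64 of the raw graph text, with no compression."""
    return base64.urlsafe_b64encode(graph_text.encode("utf-8")).decode("ascii")


def compressed_payload(graph_text: str) -> str:
    """Deflate a JSON wrapper around the graph, then Base64-encode it (URL-safe alphabet)."""
    state = json.dumps({"code": graph_text})  # wrapper shape is an assumption
    compressed = zlib.compress(state.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(compressed).decode("ascii")


# A repetitive, hypothetical graph similar in spirit to the test script above.
demo_graph = "graph LR;\n" + "\n".join(
    f'comp{i}["<b>comp{i}</b>"]:::component --> comp{i + 1}["<b>comp{i + 1}</b>"]:::component;'
    for i in range(1, 50)
)
print("plain:", len(plain_payload(demo_graph)), "compressed:", len(compressed_payload(demo_graph)))
```

Compression only postpones the problem: a sufficiently large pipeline would still exceed any fixed URL limit, which is why a local renderer remains relevant.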

@vblagoje
Member

vblagoje commented Sep 5, 2024

cc @julian-risch moving this one to backlog again as a known limitation. Will consult on possible mitigation routes internally.

@julian-risch julian-risch added P3 Low priority, leave it in the backlog and removed P2 Medium priority, add to the next sprint if no P1 available labels Sep 9, 2024
@lbux
Contributor

lbux commented Nov 6, 2024

Just ran across the issue as well. Should have probably Googled the error message sooner because I spent quite a while trying to draw the pipeline on a whiteboard to see if the issue was with how I designed my pipeline. Since there doesn't seem to be a workaround, I'll probably have to do some hack to make it work with mermaid-cli or something.

@lbux
Contributor

lbux commented Nov 7, 2024

It's ugly but if someone wants to mess with npm inside their project, here is how I did it:

add an offline function to draw.py

import subprocess

import networkx


def _to_mermaid_image_offline(graph: networkx.MultiDiGraph) -> bytes:
    # _to_mermaid_text is the existing helper in draw.py
    graph_styled = _to_mermaid_text(graph.copy())

    # Save the Mermaid code to a file
    with open("graph.mmd", "w") as f:
        f.write(graph_styled)

    # Call mermaid-cli to generate the image
    subprocess.run(
        ["mmdc", "-i", "graph.mmd", "-s", "5", "-o", "graph.png"],
        check=True,
        timeout=10,
    )
    with open("graph.png", "rb") as img_file:
        return img_file.read()

replace the _to_mermaid_image call in draw (base.py)

def draw(self, path: Path) -> None:
    """
    Save an image representing this `Pipeline` to `path`.

    :param path:
        The path to save the image to.
    """
    # Before drawing we edit a bit the graph, to avoid modifying the original that is
    # used for running the pipeline we copy it.
    image_data = _to_mermaid_image_offline(self.graph)
    Path(path).write_bytes(image_data)
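
A slightly more defensive variant of the same idea, sketched under the assumption that mermaid-cli (`mmdc`, from the `@mermaid-js/mermaid-cli` npm package) is installed locally. It uses a temporary directory instead of writing `graph.mmd` and `graph.png` into the working directory, and fails early when `mmdc` is missing. `build_mmdc_command` and `render_mermaid_offline` are hypothetical names, and the 30-second timeout is an arbitrary choice.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List


def build_mmdc_command(source: Path, target: Path, scale: int = 5) -> List[str]:
    """Assemble the mermaid-cli call used above: input file, scale factor, output file."""
    return ["mmdc", "-i", str(source), "-s", str(scale), "-o", str(target)]


def render_mermaid_offline(mermaid_text: str, timeout: float = 30.0) -> bytes:
    """Render Mermaid text to PNG bytes with a locally installed mermaid-cli."""
    if shutil.which("mmdc") is None:
        raise RuntimeError("mmdc not found on PATH; install @mermaid-js/mermaid-cli first")
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "graph.mmd"
        target = Path(tmp) / "graph.png"
        source.write_text(mermaid_text, encoding="utf-8")
        # check=True raises CalledProcessError if mmdc exits with a non-zero status.
        subprocess.run(build_mmdc_command(source, target), check=True, timeout=timeout)
        return target.read_bytes()
```

Inside draw.py, `render_mermaid_offline(_to_mermaid_text(graph.copy()))` would play the role of `_to_mermaid_image_offline(graph)` above.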

+1 for a true offline replacement to be integrated into Haystack somehow

@PaulBFB

PaulBFB commented Jan 3, 2025

I just ran into the same issue - is there another way to solve this?

@lbux
Contributor

lbux commented Jan 14, 2025

> I just ran into the same issue - is there another way to solve this?

Do you have code that I can run to replicate your issue? I am testing an alternative fix using mermaid-py and want to test more before making a PR.

5 participants