Image2latex #751

mmatera · 2023-01-13T10:55:32Z

Work in progress. The aim of this PRis to be able to show images in the PDF documentation.

setup.py

rocky · 2023-01-13T13:59:56Z

There may be a conceptual problem here in any approach where any formatting routine writes in the name of a file: at the time that LaTeX is run, the files in the filesystem may not exist. In fact LaTeX may be run on a completely separate machine.

As a result, it is more reliable to have something closer to the LaTeX run create the files. In particular we use doc2latex just before running LaTeX, and internally it uses the mathics.doc.latex_doc. module.

The second thing I'd like to mention regarding formatting is something I saw in producing SVGs. We wrap in

<svg width= ... svg="http://www.w3.org/2000/svg"
                xmlns="http://www.w3.org/2000/svg"
                version="1.1"
                viewBox="%s">

and

</svg>

in the output while Mathematica does not. And not surprisingly I think Mathematica is right and has thought about this possibly more than we have.

The front-end should be adding that kind of text. We have the same <asy> wrapper for asymptote output.

So corresponding thing here would be about adding \includegraphics[width=3cm]{. I am not totally sure though.

rocky · 2023-01-13T14:01:57Z

mathics/builtin/box/image.py

@@ -27,4 +30,12 @@ def boxes_to_mathml(self, elements=None, **options):
        )

    def boxes_to_tex(self, elements=None, **options):
-        return "-Image-"
+        fp = tempfile.NamedTemporaryFile(delete=True, suffix=".png")


Consider using pillow routines for writing PNG.

The conversion was done when Image is converted in ImageBox. imageBox stores a B64 encoded version of the PNG file. Here we just decode it and store it in a file.

If ImageBox can keep the pillow structure, that may be a win. A more general problem we have is that in digesting things for M-Expressions we lose the efficient and sometimes more flexible properties of whatever the object was before. And we spend a lot of time in conversion.

This is done now.

mmatera · 2023-01-13T18:33:56Z

There may be a conceptual problem here in any

I am aware of that. The problem is that , as far as I know, LaTeX does not have a native way to embed raster images. Then, we have the option of inventing a new homebrew syntax, but I guess that you would agree about that we want to avoid it.

rocky · 2023-01-13T19:26:48Z

There may be a conceptual problem here in any

I am aware of that. The problem is that , as far as I know, LaTeX does not have a native way to embed raster images. Then, we have the option of inventing a new homebrew syntax, but I guess that you would agree about that we want to avoid it.

What I have been finding so far, is the that later we can delay stuff like rasterization or "materializing" an object the better of or more flexibly we can handle results. I think there was a mistake made early on and propagated that had the idea that mathics-core will do everything. A front-end just gives it an Mathics expression and suggests some preferences and then off mathics core goes and takes it from there. No more intervention by the front end is needed.

This downsides we have seen is that mathics core becomes much larger - a huge monolith in fact, there is a lot of code that duplicates work better done in specialized libraries like Pillow or SymPy, but isn't as good, and isn't as well maintained. And then on top of that, everything is slow too, partly because of the conversions going on or because it is inefficiently represented and isn't as cleverly implemented.

Instead, what seems like a good alternative, is to make sure the evaluation and formatting are very well separate phases. And then have the front-end involved in dictating the formatting based on the evaluation results looking at it from a high-level.

Right now the front-ends are a little messy at doing this, but I expect after we have better Boxing and Form processing, exposing Symbols better (which we've started), and going over the APIs between the front-end and mathics core, (e.g. this Result class), we can do much better.

It is okay to come up with a protocol like Graphics3D that supports the kind 3D Graphics that Mathics should support. It would be nice if there is an existing 3D Graphics protocol that is powerful enough to express the kinds of things WMA supports. There might be. Ditto for 2D, and for images. I would imagine that something like Pillow can handle images. (But we would want some sort of JSON-friendly way to represent things when the front end is not Python-based). But using our own solution wrapped around a base-64 encoded PNG as the intermediate image language is a bad idea. It has already been tried in Mathics, and at least for me, doesn't work.

rocky · 2023-01-13T20:11:09Z

It occurs to me that WMA probably as the same problem of how does it communicate an image over a wire protocol. So that should be considered too.

mmatera · 2023-01-13T20:43:59Z

It occurs to me that WMA probably as the same problem of how does it communicate an image over a wire protocol. So that should be considered too.

Actually, it does not seem to do it well. Just try Import["Hedy.tif"]//TeXForm in the WMA REPL

rocky · 2023-01-14T00:27:48Z

It occurs to me that WMA probably as the same problem of how does it communicate an image over a wire protocol. So that should be considered too.

Actually, it does not seem to do it well. Just try Import["Hedy.tif"]//TeXForm in the WMA REPL

Actually, the box has been down the last couple of days. But I will try it when it is available. Ok, if there isn't good precedent to follow we will use our own.

mmatera · 2023-01-14T01:02:06Z

I am going to be away from the computer until the next Tuesday. After that, I will try to reformulate this.

mmatera · 2023-01-16T22:54:24Z

Now this works well enough to produce the documentation. I removed the code in mathics.format.latex in favor of the boxes_to_* methods. Also, I deferred the base64 encoding to these methods instead of the constructor of ImageBox, which now stores the Pillow encoded image.

The part of embedding the images in the latex code would come for another round because it requires some extra discussion. Some possibilities would be

mathics/doc/latex_doc.py

rocky · 2023-01-17T02:23:04Z

mathics/builtin/box/image.py

 from mathics.builtin.box.expression import BoxExpression
+from mathics.eval.image import pixels_as_ubyte
+
+try:


No try please. PIL is now required .

(And in the far future when image processing is moved out of Mathics-core PIL will still be required in that package. Let's not complicate.)

rocky · 2023-01-17T02:26:17Z

mathics/builtin/box/image.py

@@ -13,18 +32,80 @@ class ImageBox(BoxExpression):
    an Image object.
    """

+    def boxes_to_b64text(self, elements=None, **options):


Type annotation and a docstring would be nice here.

rocky · 2023-01-17T02:33:58Z

mathics/builtin/box/image.py

@@ -1,6 +1,25 @@
 # -*- coding: utf-8 -*-

+import base64


Without a docstring at the top, this doesn't appear in any printed documentation or in Django docs. If that is intended, then the homegrown tagging such as ImageBox (on line 26) is not used.

We should decide way we want to go and either add a docstring at the top or remove the homegrown tagging.

rocky · 2023-01-17T02:38:28Z

mathics/builtin/box/image.py

+        encoded = b"data:image/png;base64," + encoded
+        return encoded, size
+
+    def boxes_to_png(self, elements=None, **options) -> Tuple:


The return type is more than just a generic tuple it is something like Tuple[<type of contents>, Tuple[int, int]]

Please add and fill in "type of contents".

mathics/builtin/box/image.py

rocky · 2023-01-17T03:16:42Z

Now this works well enough to produce the documentation.

A big thanks! I just tried this on the LaTeX documentation and it looks great!

I am concerned about the proliferation of not-well-thought-out code do we do have to follow up on this after release and address this in a less hacky and more robust way.

mmatera · 2023-01-17T03:21:38Z

Now this works well enough to produce the documentation.

A big thanks! I just tried this on the LaTeX documentation and it looks great!

I am concerned about the proliferation of not-well-thought-out code do we do have to follow up on this after release and address this in a less hacky and more robust way.

I guess that the next step would be to use markdown instead of LaTeX for building the documentation. This would allow to avoid most of the regexp hacks. Then, we could use pandoc to build the PDF.

rocky · 2023-01-17T03:23:27Z

The part of embedding the images in the latex code would come for another round because it requires some extra discussion. Some possibilities would be

https://ctan.org/tex-archive/macros/latex/contrib/inline-images

https://tex.stackexchange.com/questions/208819/embedding-images-in-tex-file-as-base64-strings

https://github.com/zerotoc/pdfinlimg

Of these, first option https://ctan.org/tex-archive/macros/latex/contrib/inline-images seems like the most established and is likely to be the most robust.

The front end should have a way to indicate whether the result should be put in a file or put inline. Mathicsscript for example decide in certain situations that having this in the filesystem is better than an inline PDF.

rocky · 2023-01-17T03:26:53Z

Now this works well enough to produce the documentation.

A big thanks! I just tried this on the LaTeX documentation and it looks great!
I am concerned about the proliferation of not-well-thought-out code do we do have to follow up on this after release and address this in a less hacky and more robust way.

I guess that the next step would be to use markdown instead of LaTeX for building the documentation. This would allow to avoid most of the regexp hacks. Then, we could use pandoc to build the PDF.

Sphinx has a markdown plugin. But it also supports RsT as well. I don't think we'll need to use pandoc directly. But this stuff is so far out on the horizon that it doesn't make sense to me to speculate. Who knows? By the time we get around to it, things will have improved or new ways added.

This I think has happened with PIL, some of the cool features that fix some of the problems we have such as Big-Endian Big TIFFs (and whatever it was that broke MadTeaParty.jpg) fall into this category.

rocky · 2023-01-17T13:34:01Z

mathics/builtin/image/base.py

+import numpy
+import PIL
+import PIL.Image
+import PIL.ImageEnhance


I am not seeing PIL.ImageFilter or PIL.ImageOps used anywhere (nor a plain PIL for that matter).

Probably left over cut and paste from older code that had this.

rocky

LGTM - see small comment though ~~and consider using f-strings with size[0], size[1] instead of *size~~.

mmatera added 5 commits January 12, 2023 14:02

improving Grid

9be6dc5

merge

670dc14

merge

50420fd

fix grid to latex

e079ec8

still not working

52f88c7

rocky reviewed Jan 13, 2023

View reviewed changes

setup.py Outdated Show resolved Hide resolved

rocky reviewed Jan 13, 2023

View reviewed changes

mmatera added 5 commits January 14, 2023 08:57

merge

20e0497

reformulate imagebox

0f6c8bd

Merge branch 'Image2latex' into HEAD

e042b5c

Merge remote-tracking branch 'origin/master' into Image2latex

fd4c54c

working

e217cfb

mmatera marked this pull request as ready for review January 16, 2023 22:54

mmatera commented Jan 16, 2023

View reviewed changes

mathics/doc/latex_doc.py Show resolved Hide resolved

rocky reviewed Jan 17, 2023

View reviewed changes

mathics/builtin/box/image.py Outdated Show resolved Hide resolved

last rocky´s suggestions

4f834c9

rocky reviewed Jan 17, 2023

View reviewed changes

rocky approved these changes Jan 17, 2023

View reviewed changes

removing trailing imports

f2bfb3c

mmatera merged commit b6798af into master Jan 17, 2023

mmatera deleted the Image2latex branch January 17, 2023 14:30

rocky mentioned this pull request Jan 18, 2023

Remove Doc Boxing (for now) and replace some autogenerated summaries #757

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Image2latex #751

Image2latex #751

mmatera commented Jan 13, 2023

rocky commented Jan 13, 2023 •

edited

Loading

rocky Jan 13, 2023

mmatera Jan 13, 2023

rocky Jan 13, 2023

mmatera Jan 16, 2023

mmatera commented Jan 13, 2023

rocky commented Jan 13, 2023 •

edited

Loading

rocky commented Jan 13, 2023

mmatera commented Jan 13, 2023

rocky commented Jan 14, 2023

mmatera commented Jan 14, 2023

mmatera commented Jan 16, 2023

rocky Jan 17, 2023 •

edited

Loading

rocky Jan 17, 2023

rocky Jan 17, 2023

rocky Jan 17, 2023

rocky commented Jan 17, 2023

mmatera commented Jan 17, 2023

rocky commented Jan 17, 2023

rocky commented Jan 17, 2023 •

edited

Loading

rocky Jan 17, 2023

rocky left a comment •

edited

Loading

Image2latex #751

Image2latex #751

Conversation

mmatera commented Jan 13, 2023

rocky commented Jan 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mmatera commented Jan 13, 2023

rocky commented Jan 13, 2023 • edited Loading

rocky commented Jan 13, 2023

mmatera commented Jan 13, 2023

rocky commented Jan 14, 2023

mmatera commented Jan 14, 2023

mmatera commented Jan 16, 2023

rocky Jan 17, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rocky commented Jan 17, 2023

mmatera commented Jan 17, 2023

rocky commented Jan 17, 2023

rocky commented Jan 17, 2023 • edited Loading

Choose a reason for hiding this comment

rocky left a comment • edited Loading

Choose a reason for hiding this comment

rocky commented Jan 13, 2023 •

edited

Loading

rocky commented Jan 13, 2023 •

edited

Loading

rocky Jan 17, 2023 •

edited

Loading

rocky commented Jan 17, 2023 •

edited

Loading

rocky left a comment •

edited

Loading