-
-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Image2latex #751
Image2latex #751
Conversation
There may be a conceptual problem here in any approach where any formatting routine writes in the name of a file: at the time that LaTeX is run, the files in the filesystem may not exist. In fact LaTeX may be run on a completely separate machine. As a result, it is more reliable to have something closer to the LaTeX run create the files. In particular we use The second thing I'd like to mention regarding formatting is something I saw in producing SVGs. We wrap in
and
in the output while Mathematica does not. And not surprisingly I think Mathematica is right and has thought about this possibly more than we have. The front-end should be adding that kind of text. We have the same So corresponding thing here would be about adding |
@@ -27,4 +30,12 @@ def boxes_to_mathml(self, elements=None, **options): | |||
) | |||
|
|||
def boxes_to_tex(self, elements=None, **options): | |||
return "-Image-" | |||
fp = tempfile.NamedTemporaryFile(delete=True, suffix=".png") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using pillow routines for writing PNG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conversion was done when Image is converted in ImageBox. imageBox stores a B64 encoded version of the PNG file. Here we just decode it and store it in a file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ImageBox can keep the pillow structure, that may be a win. A more general problem we have is that in digesting things for M-Expressions we lose the efficient and sometimes more flexible properties of whatever the object was before. And we spend a lot of time in conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done now.
I am aware of that. The problem is that , as far as I know, LaTeX does not have a native way to embed raster images. Then, we have the option of inventing a new homebrew syntax, but I guess that you would agree about that we want to avoid it. |
What I have been finding so far, is the that later we can delay stuff like rasterization or "materializing" an object the better of or more flexibly we can handle results. I think there was a mistake made early on and propagated that had the idea that mathics-core will do everything. A front-end just gives it an Mathics expression and suggests some preferences and then off mathics core goes and takes it from there. No more intervention by the front end is needed. This downsides we have seen is that mathics core becomes much larger - a huge monolith in fact, there is a lot of code that duplicates work better done in specialized libraries like Pillow or SymPy, but isn't as good, and isn't as well maintained. And then on top of that, everything is slow too, partly because of the conversions going on or because it is inefficiently represented and isn't as cleverly implemented. Instead, what seems like a good alternative, is to make sure the evaluation and formatting are very well separate phases. And then have the front-end involved in dictating the formatting based on the evaluation results looking at it from a high-level. Right now the front-ends are a little messy at doing this, but I expect after we have better Boxing and Form processing, exposing Symbols better (which we've started), and going over the APIs between the front-end and mathics core, (e.g. this Result class), we can do much better. It is okay to come up with a protocol like Graphics3D that supports the kind 3D Graphics that Mathics should support. It would be nice if there is an existing 3D Graphics protocol that is powerful enough to express the kinds of things WMA supports. There might be. Ditto for 2D, and for images. I would imagine that something like Pillow can handle images. (But we would want some sort of JSON-friendly way to represent things when the front end is not Python-based). But using our own solution wrapped around a base-64 encoded PNG as the intermediate image language is a bad idea. It has already been tried in Mathics, and at least for me, doesn't work. |
It occurs to me that WMA probably as the same problem of how does it communicate an image over a wire protocol. So that should be considered too. |
Actually, it does not seem to do it well. Just try |
Actually, the box has been down the last couple of days. But I will try it when it is available. Ok, if there isn't good precedent to follow we will use our own. |
I am going to be away from the computer until the next Tuesday. After that, I will try to reformulate this. |
Now this works well enough to produce the documentation. I removed the code in The part of embedding the images in the latex code would come for another round because it requires some extra discussion. Some possibilities would be |
mathics/builtin/box/image.py
Outdated
from mathics.builtin.box.expression import BoxExpression | ||
from mathics.eval.image import pixels_as_ubyte | ||
|
||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No try
please. PIL is now required .
(And in the far future when image processing is moved out of Mathics-core PIL will still be required in that package. Let's not complicate.)
mathics/builtin/box/image.py
Outdated
@@ -13,18 +32,80 @@ class ImageBox(BoxExpression): | |||
an Image object. | |||
""" | |||
|
|||
def boxes_to_b64text(self, elements=None, **options): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type annotation and a docstring would be nice here.
@@ -1,6 +1,25 @@ | |||
# -*- coding: utf-8 -*- | |||
|
|||
import base64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a docstring at the top, this doesn't appear in any printed documentation or in Django docs. If that is intended, then the homegrown tagging such as ImageBox
(on line 26) is not used.
We should decide way we want to go and either add a docstring at the top or remove the homegrown tagging.
mathics/builtin/box/image.py
Outdated
encoded = b"data:image/png;base64," + encoded | ||
return encoded, size | ||
|
||
def boxes_to_png(self, elements=None, **options) -> Tuple: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type is more than just a generic tuple it is something like Tuple[<type of contents>, Tuple[int, int]]
Please add and fill in "type of contents".
A big thanks! I just tried this on the LaTeX documentation and it looks great! I am concerned about the proliferation of not-well-thought-out code do we do have to follow up on this after release and address this in a less hacky and more robust way. |
I guess that the next step would be to use markdown instead of LaTeX for building the documentation. This would allow to avoid most of the regexp hacks. Then, we could use pandoc to build the PDF. |
Of these, first option https://ctan.org/tex-archive/macros/latex/contrib/inline-images seems like the most established and is likely to be the most robust. The front end should have a way to indicate whether the result should be put in a file or put inline. Mathicsscript for example decide in certain situations that having this in the filesystem is better than an inline PDF. |
Sphinx has a markdown plugin. But it also supports RsT as well. I don't think we'll need to use pandoc directly. But this stuff is so far out on the horizon that it doesn't make sense to me to speculate. Who knows? By the time we get around to it, things will have improved or new ways added. This I think has happened with PIL, some of the cool features that fix some of the problems we have such as Big-Endian Big TIFFs (and whatever it was that broke MadTeaParty.jpg) fall into this category. |
mathics/builtin/image/base.py
Outdated
import numpy | ||
import PIL | ||
import PIL.Image | ||
import PIL.ImageEnhance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not seeing PIL.ImageFilter or PIL.ImageOps used anywhere (nor a plain PIL for that matter).
Probably left over cut and paste from older code that had this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - see small comment though and consider using f-strings with .size[0]
, size[1]
instead of *size
Work in progress. The aim of this PRis to be able to show images in the PDF documentation.