Hook to include comments #312

Open
dmadisetti opened this issue Oct 24, 2024 · 7 comments
Labels
On hold Waiting for more input from reporter

Comments

@dmadisetti

This is likely a won't-fix as it is particular to certain review software.

However, some collaborative editors like Overleaf allow for "external comments". Packages like comment allow for annotations viewable in most readers.

latexdiff seems great for isolating text across the markup. Is there any reasonable way to hook in and automatically add these comments? I'm flexible on the input, but here's an example of a JSON comment dump from Overleaf:

{
  "comments": [
    {
      "id": "6484839c1d760c36e6000001",
      "op": {
        "c": "The text that is highlighted, with first character at 15233",
        "p": 15233,
        "t": "6484839c1d760c36e6000001"
      },
      "metadata": {
        "user_id": "539f53e5eef2f999414858da",
        "ts": "2023-06-22T17:23:44.167Z"
      }
    }
  ],
  "threads": {
    "6484839c1d760c36e6000001": {
      "messages": [
        {
          "id": "539f53e5eef2f999414858da",
          "content": "Message content of the thread",
          "timestamp": 1687454622182,
          "user_id": "539f53e5eef2f999414858da",
          "user": {
            "id": "539f53e5eef2f999414858da",
            "email": "user"
          }
        }
      ],
      "resolved": true,
      "resolved_at": "2023-06-22T17:23:45.067Z",
      "resolved_by_user_id": "539f53e5eef2f999414858da",
      "resolved_by_user": {
        "id": "539f53e5eef2f999414858da",
        "email": "user"
      }
    }
  }
}
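
For anyone who wants to experiment with a dump like this, here is a minimal Python sketch of reading it. It assumes, going only by the sample above, that `op.p` is a character offset into the document, that `op.c` is the highlighted text, and that `op.t` is the key of the matching entry in `threads`. The helper name `comment_spans` is hypothetical, and the embedded dump is a trimmed-down copy of the sample.

```python
import json

# Trimmed copy of the sample dump, reduced to the fields used below.
DUMP = """
{
  "comments": [
    {"id": "6484839c1d760c36e6000001",
     "op": {"c": "highlighted text", "p": 15233, "t": "6484839c1d760c36e6000001"}}
  ],
  "threads": {
    "6484839c1d760c36e6000001": {
      "messages": [{"content": "Message content of the thread"}],
      "resolved": true
    }
  }
}
"""


def comment_spans(dump):
    """Return (start, end, messages, resolved) tuples sorted by start offset."""
    threads = dump.get("threads", {})
    spans = []
    for comment in dump.get("comments", []):
        op = comment["op"]
        thread = threads.get(op["t"], {})
        messages = [m["content"] for m in thread.get("messages", [])]
        start = op["p"]
        # Assumption: "c" is the highlighted text, so its length gives the span end.
        end = start + len(op["c"])
        spans.append((start, end, messages, thread.get("resolved", False)))
    return sorted(spans, key=lambda s: s[0])


spans = comment_spans(json.loads(DUMP))
print(spans)
```

Whether the offsets count characters or bytes is not stated in the dump, so that would need checking against a real project before relying on it.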
@dmadisetti
Author

Related: #49

@ftilmann
Owner

I also use Overleaf quite a lot, so this could be interesting, but I am not sure what exactly you are proposing. I am also not sure whether it is really something latexdiff can help with, but it will probably become clearer if you explain.

Actually, I was not aware of how to get the JSON dump you pasted. The fact that it is hard to keep a history of the comments and that they are not visible in the PDF has kept me from using the commenting feature of Overleaf. So I would be curious to know how one gets this JSON dump.

@dmadisetti
Author

dmadisetti commented Oct 25, 2024

I forked and extended a web socket API reversal of Overleaf for vim:

https://github.com/dmadisetti/AirLatex.vim

This is the data I use to do comment highlighting in my plugin. I should probably document it a bit more, but the JSON I pasted is pulled straight from an old log.

In my current workflow, I just have a script that dumps the list of comments.


I'm proposing using latexdiff to add the comment tags provided in some format around the relevant text, in addition to the diff.

I think this could be done as is, by preprocessing the documents and adding the comments at the relevant positions, and then using comment mode, but maybe there's a better way to do this?


I did take a very quick, naive stab at this just for comments a while back, and it seemed like parsing across the AST was going to be the hardest part (consider a highlight that only captures half a tag and some text: you need some understanding of the semantic structure to properly insert the annotation without breaking the document). latexdiff seems to already handle this to some extent? I haven't dug too deeply, but the parsing seems like a solid base.

@dmadisetti
Author

... I also considered writing a GitHub bot that would auto add comments, but I have to do real work sometimes (:

and decided it was too much of a time suck

@ftilmann
Owner

Indeed, it would be very cool to be able to do a git pull from overleaf and have the comments added to the tex in a way that they are rendered in the pdf.

I see what you mean. I don't see a fully straightforward way to implement this in latexdiff, although many latexdiff subroutines could be useful for what you are proposing. The approach taken by latexdiff is quite simplistic in some ways. It uses Perl pattern matching to split the text into tokens and words (admittedly, the pattern is very complicated in order to deal with nested parentheses etc.). Normally, command arguments are treated as opaque, so that a command and its arguments form one atomistic token, but for a set of commands (so-called textcommands) the argument can be opened up to scrutiny in a second pass. The markup just marks up simple sequences, making sure every scoping token (e.g. { or }) is outside any markup, except for command arguments (closed scopes), which can sometimes be fully contained. This works well most of the time, but over time many situations have arisen where this simple approach breaks, resulting in a by now quite baroque set of pre- and post-processing steps to make it work as intended.
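
To make the tokenisation idea concrete, here is a rough Python sketch of the general approach described above: words become tokens, a command together with its brace arguments becomes one opaque token, and bare scope braces stay separate tokens. This is an illustration only and not latexdiff's actual Perl pattern; the names `tokenize` and `read_group` are hypothetical, and escaped braces, optional arguments, comments and math are deliberately ignored.

```python
import re

# Assumption: a command is a backslash plus letters (optionally starred).
COMMAND = re.compile(r"\\[A-Za-z]+\*?")
# A word is any run of characters that is not whitespace, a backslash or a brace.
WORD = re.compile(r"[^\s\\{}]+")


def read_group(text, i):
    """Return the index just past the brace group that opens at text[i]."""
    depth = 0
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return j + 1
    raise ValueError("unbalanced braces")


def tokenize(text):
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        m = COMMAND.match(text, i)
        if m:
            end = m.end()
            # Swallow directly following {...} arguments so the command stays opaque.
            while end < len(text) and text[end] == "{":
                end = read_group(text, end)
            tokens.append(text[i:end])
            i = end
            continue
        if text[i] in "{}":
            # Bare scope delimiters are kept as separate tokens.
            tokens.append(text[i])
            i += 1
            continue
        m = WORD.match(text, i)
        if m:
            tokens.append(m.group())
            i = m.end()
        else:
            # Anything else (e.g. a lone backslash) becomes a one-character token.
            tokens.append(text[i])
            i += 1
    return tokens


tokens = tokenize(r"See \cite{a{b}c} and {\bf bold} text")
print(tokens)
```

In this sketch `\cite{a{b}c}` comes out as a single token while the braces around `\bf bold` are separate scope tokens, which is the distinction that matters when deciding where markup may start and stop.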

I am thinking of implementing a feature to add or suppress markup with special latexdiff comment tags inside the new file, which would make it easier to implement what you are proposing. I can't give a timeline for that, as the 'real job' does not leave me a lot of time for latexdiff.

I don't use vim, so it would be easier for me to play around and try things out if your plugin could be refactored into a command-line tool, so that I could say pull-overleaf-comments <url-of-my-overleaf-project> and get the JSON you pasted. But I have to admit that I can't really see an easier way to tackle this than first implementing the feature idea mentioned above.

@ftilmann
Owner

ftilmann commented Nov 5, 2024

> I am thinking of implementing a feature to add or suppress markup with special latexdiff comment tags inside the new file, which would make it easier to implement what you are proposing. I can't give a timeline for that, as the 'real job' does not leave me a lot of time for latexdiff.

OK, in the end I was much faster than expected and implemented the promised feature (last commit for this feature, at least for now: 33c99ee), though it has not been tested thoroughly. Have a look at the description, e.g. directly in the source code:

latexdiff/latexdiff

Lines 5149 to 5232 in 33c99ee

=head1 DIRECTIVES
Sometimes, the output C<latexdiff> produces is not satisfactory or
some complicated constructions even lead to a difference tex file that
causes errors. It is possible to give
latexdiff some hints to control the markup by placing some special
comments, termed I<directives> into the tex file. Directives mark
blocks by paired C<BEGIN> and C<END> directives. It is important that
the directives are written exactly as specified below, i.e., all
letters need to be capitalised and there has to be exactly one space
between BEGIN/END and the block type. However, after the directive
arbitrary comments can be added. Nesting of blocks or overlapping
blocks are not parsed correctly and will cause undefined behaviour.
Blocks can be spanning across scope boundaries; they can also be used in the last argument of text commands.
If they appear in the arguments of other commands, then latexdiff will assume they were placed before or after
the command; it is best to avoid this.
=over 10
=item C<DIFADD> block
...
%BEGIN DIFADD
...
%END DIFADD
...
Everything enclosed between the C<%BEGIN DIFADD> and C<%END DIFADD> directives will be treated as atomistic addition to the
text. The interior will be marked up as added text following the
normal rules for what is marked up. A use case for this directive is
when a paragraph has been changed substantially but retains some of
the phrasing of the original paragraph. As latexdiff prefers to find
a minimal difference between two files, such a configuration will
usually lead to a fragmented markup, with several added and deleted
sentences or parts of sentences and a few remaining phrases marked as
unchanged. With the use of this directive it is possible to mark the
whole modified segment as new, which will then be marked-up `en bloc'
as new, and the old part as one block of deleted material, which is
usually clearer than the fragmented default markup.
C<DIFADD> block directives must be placed into the body of the new
file. Those directives are ignored in the preamble or in the old file.
=item C<DIFDEL> block
...
%BEGIN DIFDEL
...
%END DIFDEL
...
Everything enclosed between the C<%BEGIN DIFDEL> and C<%END DIFDEL>
directives will be treated as atomistic deleted text.
The interior will be marked up as deleted text following the
normal rules for what is marked up.
C<DIFDEL> block directives must be placed into the body of the old
file. Those directives are ignored in the preamble or in the new
file. The use case is similar to that of the C<DIFADD> blocks, but the
hint is placed in the old file. In most cases, it is sufficient to either
hint in the old file with a C<DIFDEL> block I<or> in the new file with
a C<DIFADD> block and latexdiff will take care of the rest.
=item C<DIFNOMARKUP> block
...
%BEGIN DIFNOMARKUP
...
%END DIFNOMARKUP
...
The text between the markers will be included in the diff algorithm
but no actual markup will be made in this part of the text. It
will show the new text only and suppress the old text. If the text
immediately above the DIFNOMARKUP block has been added a
C<\DIFaddend> will be placed directly above the C<%BEGIN DIFNOMARKUP>
line and any open C<\DIFadd> command terminated, equivalently for
deleted blocks and for text added or deleted immediately after the
C<%BEGIN DIFNOMARKUP>. The main purpose of this command is to salvage
the situation if latexdiff has produced invalid or visually
unacceptable output - markup in the offending passage can be
suppressed by surrounding it with C<DIFNOMARKUP> directives and
rerunning latexdiff, thus enabling markup of the rest of the document.
This pair of directives must be placed in the new file and will be
ignored in the old file (or the preambles of either file).

If you mark the beginning and end of your comment with the special comments (directives) %BEGIN DIFADD and %END DIFADD and then compare the file with itself

latexdiff --no-del file.tex file.tex > file-w-markup.tex

then all the commented regions are marked with the blue underline.
If one of the directives is in a textual argument (which has to be the last argument, and the command has to be in the configurable list of commands with text arguments), the markup should still be applied correctly at the word level. If the directives are used inside another type of command (which might not even produce visible output), the markup will begin before or end after that command, which should almost always be what you want.
You can place information on the actual content of the comment on the same line as the `%BEGIN DIFADD`, and then have a post-processor that adds the necessary LaTeX commands to display it.
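
As a starting point for such a pre-processing step, here is a minimal Python sketch that wraps character spans in `%BEGIN DIFADD` / `%END DIFADD` lines and puts the comment text on the `BEGIN` line. The function name `add_directives` and the `(start, end, note)` tuple layout are assumptions made for illustration. The sketch also assumes each directive should sit on its own line, and it refuses nested or overlapping spans, since those are documented as unsupported.

```python
def add_directives(tex, spans):
    """Wrap each (start, end, note) span in %BEGIN DIFADD / %END DIFADD lines."""
    ordered = sorted(spans, key=lambda s: s[0])
    for (_, prev_end, _), (next_start, _, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("nested or overlapping spans are not supported")
    # Work from the last span backwards so that earlier offsets stay valid.
    for start, end, note in reversed(ordered):
        # Collapse whitespace so the note stays on the directive line.
        note = " ".join(note.split())
        tex = (
            tex[:start]
            + "\n%BEGIN DIFADD " + note + "\n"
            + tex[start:end]
            + "\n%END DIFADD\n"
            + tex[end:]
        )
    return tex


source = "Intro text. A claim that needs work. Closing text."
start = source.index("A claim")
end = start + len("A claim that needs work.")
marked = add_directives(source, [(start, end, "Cite a source\nfor this")])
print(marked)
```

The result can then be compared with itself using the `latexdiff --no-del` call shown above. One caveat: the inserted line breaks act as spaces in LaTeX, so a span that starts or ends in the middle of a word would need its boundaries snapped to word edges first.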

One shortcoming for your use case is that currently nesting is not allowed, while I think Overleaf allows nesting and even partially overlapping comments. I estimate it would not be too difficult to implement the possibility for nesting and/or overlapping in latexdiff in a simple way, but there is some ambiguity how to best mark this up (I guess using multiple colours would be great, but also somewhat more complicated to implement). For now, I will not develop this further as for my envisaged original use case nesting is not needed but let me know if this seems promising to you. If you provide some examples/scripts I could definitely have another look.
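
Until nesting is supported, one possible workaround on the pre-processing side is to split the comment spans into layers that contain no overlaps, so that each layer could be handled in its own pass (or given its own colour). The Python sketch below shows a greedy partition of `(start, end)` spans; `split_into_layers` is a hypothetical helper and nothing like it exists in latexdiff.

```python
def split_into_layers(spans):
    """Greedily partition (start, end) spans so that no layer contains overlaps."""
    layers = []
    for span in sorted(spans):
        for layer in layers:
            # A span fits into a layer if it starts at or after the layer's last end.
            if layer[-1][1] <= span[0]:
                layer.append(span)
                break
        else:
            # No existing layer can take the span, so open a new one.
            layers.append([span])
    return layers


layers = split_into_layers([(0, 10), (5, 15), (12, 20), (2, 4)])
print(layers)
```

The number of layers equals the deepest level of overlap among the comments, which gives a rough idea of how many colours or passes a nested markup would need.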

@dmadisetti
Author

Sorry I didn't follow up. An export command means rewriting and extracting the auth and web socket logic

I'll give this a spin by incorporating it directly into my plugin, and then try to decouple after

But thanks! Will follow up

@ftilmann added the "On hold" label (waiting for more input from reporter) on Nov 16, 2024
2 participants