Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the comment watermark format #1076

Closed
0x2b3bfa0 opened this issue Jun 23, 2022 · 15 comments
Closed

Improve the comment watermark format #1076

0x2b3bfa0 opened this issue Jun 23, 2022 · 15 comments
Assignees
Labels
cml-comment Subcommand p2-nice-to-have Low priority

Comments

@0x2b3bfa0
Copy link
Member

Watermarks are invisible strings inserted on CML comments1 so they can be differentiated from comments created by users. Our current watermark format is all but uniquely identifiable, and should probably be updated.

cml/src/cml.js

Line 181 in 7bd9257

: '![CML watermark](https://raw.githubusercontent.com/iterative/cml/master/assets/watermark.svg)';

Note that watermarks use a blank image instead of an HTML comment, because not all forges support the latter in their respective flavours of Markdown.

Proposal

Use the image description to store arbitrary data encoded as Base64 (e.g. useful for live metrics) and the image link to point to an invisible, tiny image in a static host of our choice.

![eyJrZXkiOiAidmFsdWUifQ==](https://cml.dev/9b0e332381649d87112c181dae1568640ea055b3100e5253d86410d276db8af9/pixel.gif) 

Precursors

Footnotes

  1. E.g. created by running cml send-comment

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Jun 23, 2022

Parsing data in watermarks

Simple use case of embedding machine-readable metrics into reports; hacky1 but functional.

import base64
import commonmark
import json


WATERMARK = "https://cml.dev/9b0e332381649d87112c181dae1568640ea055b3100e5253d86410d276db8af9/pixel.gif"


def extract_metrics(report: str) -> any:
    for node, _ in commonmark.Parser().parse(report).walker():
        if node.t == "image" and node.destination == WATERMARK:
            return json.loads(base64.b64decode(node.first_child.literal))


if __name__ == "__main__":
    with open("report.md", "r") as report:
        print(extract_metrics(report.read()))

Footnotes

  1. Note that the commonmark package is deprecated; a production-ready example would look slightly different and use another library.

@casperdcl casperdcl added the cml-comment Subcommand label Jun 23, 2022
@casperdcl
Copy link
Contributor

Use the image description to store arbitrary data encoded as Base64

🤢

@0x2b3bfa0
Copy link
Member Author

We can also use Unicode zero-width watermarks: 🤔

@tasdomas
Copy link
Contributor

tasdomas commented Sep 26, 2022

Looks like there are several ways to do this:

  1. Add a base64 string to the image alt text: ![CML watermark <base64 snipped>](https://raw.githubusercontent.com/iterative/cml/master/assets/watermark.svg)

  2. Add a base64 string to the image title: ![CML watermark](https://raw.githubusercontent.com/iterative/cml/master/assets/watermark.svg "base64")z

  3. Add a CML logo with a link to https://cml.dev/ and the link title being base64 encoded data: [![CML](https://cml.dev/logo.png)](https://cml.dev "base64")

  4. There's also the option of using zero-width unicode characters, but this feels a tad hacky

Also, I think it looks a bit amateurish to include resources hosted by github (the watermark svg) - we should at least host it ourselves.

@tasdomas tasdomas self-assigned this Sep 26, 2022
@0x2b3bfa0
Copy link
Member Author

it looks a bit amateurish to include resources hosted by github (#1076) - we should at least host it ourselves

Playing devil's advocate: hosting it on GitHub is a good guarantee it can't be used for tracking. 🤷🏼‍♂️

@tasdomas
Copy link
Contributor

tasdomas commented Sep 26, 2022

Playing devil's advocate: hosting it on GitHub is a good guarantee it can't be used for tracking. 🤷🏼‍♂️

That's putting a lot of faith in github :)

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Sep 26, 2022

There's also the option of using zero-width unicode characters, but this feels a tad hacky

I proposed this because as David Ortega pointed out a few days ago, the current image-based watermarks are visible at the bottom of mail notifications. Yes, it's “a little bit” more hacky. 😄

Example

Captura de pantalla 2022-09-26 a las 13 46 31

@tasdomas
Copy link
Contributor

Option 1
CML watermark cnVuLWlkPTMwOTgzODIwNzQ=

Option 2
CML watermark

Option 3
CML (obviously the logo would need to be smaller)

@0x2b3bfa0
Copy link
Member Author

@0x2b3bfa0
Copy link
Member Author

How it looks in email notifications:

Captura de pantalla 2022-09-26 a las 13 56 53

@tasdomas
Copy link
Contributor

Option $e$ and 3 appear to be most viable

@casperdcl
Copy link
Contributor

casperdcl commented Sep 26, 2022

  1. poor rendering: could this be fixed by using PNG instead of SVG?
  2. poor URL: I'd be fine with replacing raw.githubusercontent.com/iterative/cml/master/assets -> static.iterative.ai/cml
  3. guaranteeing no hidden tracking: the above (static.iterative.ai) is open sourced at https://github.com/iterative/static to prove no nefarious activity
  4. base64: why? what is the use case? What metadata do we actually need to inject? If it's just a comment ID then plaintext | urllib.parse.quote is enough?

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Sep 26, 2022

urllib.parse.quote

hashlib 🤺

@tasdomas
Copy link
Contributor

4. base64: why? what is the use case? What metadata do we actually need to inject? If it's just a comment ID then plaintext | `urllib.parse.quote` is enough?

base64 is not strictly necessary - my current thinking is to store the watermark scope with the watermark.

@dacbd
Copy link
Contributor

dacbd commented Feb 17, 2023

completed
#1173
#1309

@dacbd dacbd closed this as completed Feb 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cml-comment Subcommand p2-nice-to-have Low priority
Projects
None yet
Development

No branches or pull requests

4 participants