MSC2529: Proposal to use existing events as captions for images #2529

Closed
wants to merge 3 commits into from
44 changes: 44 additions & 0 deletions proposals/2529-text-messages-as-captions.md
@@ -0,0 +1,44 @@
# Use existing m.room.message/m.text events as captions for images
uhoreg marked this conversation as resolved.

Contributor:

Also, splitting image and caption into separate events will make bridging very complicated. When to send the image? How long until the caption is sent, then?

It'll still be practically impossible to bridge an image+caption away from Matrix; the caption would basically have to be rendered as a separate message.

Member Author:

Would that be very bad? For example, bridging to Discord would result in two messages next to each other if they cannot be combined by debouncing.

Contributor:

Yes: since messages can be sent between the image and the caption, the entire caption context is lost.

Soru personally considers this a blocker, TBH.

Member:

Easier bridging (and handling captions in general) is one of the main benefits of #2530 btw

Commenter:

+1 to embed the caption in the image event content
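
The debouncing idea raised in this thread can be sketched for a hypothetical bridge, in Python. Events are assumed to be `(origin_server_ts, event_id, content)` tuples in timeline order. `window_ms` is an arbitrary wait time for a caption. The output tuples stand in for whatever the remote network accepts. The function name and output format are illustrative only and do not come from any existing bridge.

```python
def debounce_for_bridge(events, window_ms):
    """Hypothetical bridge helper: merge an image with a caption that
    arrives within window_ms of it; otherwise send the image alone.

    Returns ("image+caption", (image_id, caption)), ("image", image_id)
    or ("text", body) tuples for the remote side.
    """
    out = []
    pending = {}  # image event_id -> arrival timestamp, waiting for a caption

    def flush_expired(now):
        # Images whose caption window has closed are bridged on their own.
        for image_id, ts in list(pending.items()):
            if now - ts > window_ms:
                out.append(("image", image_id))
                del pending[image_id]

    for ts, event_id, content in events:
        flush_expired(ts)
        rel = content.get("m.relates_to") or {}
        target = rel.get("event_id") if rel.get("rel_type") == "m.caption" else None
        if target in pending:
            # Caption arrived in time: send one combined message.
            del pending[target]
            out.append(("image+caption", (target, content["body"])))
        elif content.get("msgtype") == "m.image":
            pending[event_id] = ts
        else:
            # Ordinary text, or a caption whose image was already sent.
            out.append(("text", content["body"]))

    # End of input: nothing else can arrive, so send remaining images alone.
    for image_id in pending:
        out.append(("image", image_id))
    return out
```

This shows the trade-off the reviewers describe. A text message sent between an image and its caption is forwarded before the combined image+caption. A caption that arrives after the window is bridged as plain text.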


## Background

There is a demand to be able to apply a text caption to an image, as is
possible in other chat platforms. In Matrix this is not possible, so people
will generally send two events: one `m.image`, then an `m.text` event
immediately afterward to simulate a caption.

It would be better to be able to explicitly mark an event as a caption.

MurzNN (Contributor), Jun 14, 2022:

There are already several MSCs about adding text to images (this one, MSC2529, plus MSC2530, MSC2881, MSC2674, and maybe more; please point to them), but their ideas can be split into two types:

1. Classic behavior, media + caption: adding a text description that comments on the media. This suits articles well, where there is already a long text and the media (image, video, file) is just one item of the whole story that needs a short description. In such cases it can be called a "media caption" or "media description".

2. Chat behavior, text + media: the common use of media in chat-style conversations, where there are short text messages and a single media item (or a series of several media items) that adds some information to that short text. So the text is actually the main entity, and the media is just an addition to it. Examples:

   - Here are my photos from last weekend [image1, image2]
   - My payment was rejected; here is the file with the rejection message: [file.pdf]
   - I've got the "Access denied" error, screenshot is attached. How can I solve this problem?

Also, very often there is not a single media item but several (two or more photos, several consecutive screenshots, etc.), and the "media + caption" behavior doesn't suit such cases at all!

So I want to discuss the idea that the classic media caption implementation does not suit the Matrix ecosystem (or other chat systems) well, and that the text + media behavior suits chat-style conversations much better. I agree that sometimes people really want to post media and add a caption to it (for people with disabilities, or to comment on what is displayed in the image), but in most cases the behavior is totally backwards!

Please share your own thoughts about this idea!

Maybe this should be discussed in a better place; could anyone suggest one?

Contributor:

It seems the better place to discuss this is element-hq/roadmap#13.

## Proposal

Allow an optional `m.relates_to` field in the `content` field of a text message
event.

turt2live marked this conversation as resolved.

Example:

```
...
"content": {
  "body": "Caption text",
  "msgtype": "m.text",
  "m.relates_to": {
    "event_id": "$(some image event)",
    "rel_type": "m.caption"
  }
},
```

Member:

IIRC this is a further metadata leak in encrypted rooms.

Member:

Not sure this is the right type under MSC1849: there was a fairly long discussion at some point about how we shouldn't need to define new relationship types very often, if at all.
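
As an illustration, the caption event content from the proposal's example can be built with a small helper. This is a minimal Python sketch. The function name `make_caption_content` and the placeholder event ID are hypothetical. `m.caption` is only the relation type this MSC proposes and is not part of the Matrix spec.

```python
import json


def make_caption_content(caption: str, image_event_id: str) -> dict:
    """Hypothetical helper: build m.text content that marks itself as a
    caption of an image event, using the rel_type proposed in this MSC."""
    return {
        "msgtype": "m.text",
        # Clients that ignore the relation still display this text inline.
        "body": caption,
        "m.relates_to": {
            "rel_type": "m.caption",  # proposed here, not in the spec
            "event_id": image_event_id,
        },
    }


if __name__ == "__main__":
    # "$image_event_id" is a placeholder, not a real event ID.
    content = make_caption_content("Sunset over the bay", "$image_event_id")
    print(json.dumps(content, indent=2))
```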

If a client recognises the `rel_type`, it can render the caption with the
image rather than as a separate message in the timeline.

Member:

What should a client do if someone tries to caption a message that isn't `m.image`?

Member Author:

Maybe they should be allowed to, but at render time there is no change, since the rendering client will ignore the `rel_type` if the target isn't an image. Not sure.

The benefit of this is that if a client doesn't support or recognise the
`m.caption` relation, it can ignore the relation and just render the message inline.
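
Here is a client-side sketch of this rendering and fallback behaviour, in Python. It assumes events arrive as dicts with `event_id` and `content` keys, in timeline order. `TimelineItem`, `caption_target` and `build_timeline` are hypothetical names. Following the author's reply in the review thread, a caption whose target is not a known `m.image` event is rendered inline as an ordinary message. The pairing happens entirely on the client.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TimelineItem:
    """One rendered row: a message plus any captions attached to it."""
    event_id: str
    content: dict
    captions: list = field(default_factory=list)


def caption_target(content: dict) -> Optional[str]:
    """Return the event ID this event captions, or None if it is not a caption."""
    rel = content.get("m.relates_to") or {}
    if content.get("msgtype") == "m.text" and rel.get("rel_type") == "m.caption":
        return rel.get("event_id")
    return None


def build_timeline(events: list) -> list:
    """Attach caption events to earlier m.image events; render the rest inline."""
    rows = []
    images = {}  # event_id -> TimelineItem for m.image events seen so far
    for ev in events:
        content = ev["content"]
        target = caption_target(content)
        if target is not None and target in images:
            images[target].captions.append(content["body"])
            continue
        # Not a caption, non-image target, or target not seen yet: plain row.
        row = TimelineItem(ev["event_id"], content)
        rows.append(row)
        if content.get("msgtype") == "m.image":
            images[ev["event_id"]] = row
    return rows
```

A real client would also have to handle a caption that is paginated in before its image. This sketch simply shows that case as an inline message.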

This would not require aggregation from the server, since the caption is always
sent as a separate event anyway.

## Potential issues

* Not sure how this relates to the broader questions discussed in MSC1849.
* This caters to a narrow use case; there may be a more general solution available.
* Would MSC1767 (extensible events) obsolete this?

Member:

Extensible events would definitely include captions, but IMO basic features like captions shouldn't be blocked on MSCs that aren't going to happen any time soon.