-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Refactor MediaRepository to separate out storage #2767
Conversation
44025b4
to
d402d0f
Compare
d402d0f
to
6daa0d1
Compare
6daa0d1
to
8f03aa9
Compare
This is a long long PR, though a lot of it is just adding new classes. I can split this up into separate PRs, but that would break the backup media repo stuff (as that's not added back in until the last commit) |
with media_storage.store_into_file(info) as (f, fname, finish_cb): | ||
# .. write into f ... | ||
yield finish_cb() | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really not ecstatic by this API shape TBH, but I'm somewhat keen to ensure that the open file gets cleaned up correctly.
@@ -1,5 +1,6 @@ | |||
# -*- coding: utf-8 -*- | |||
# Copyright 2014-2016 OpenMarket Ltd | |||
# Copyright 2018 New Vecotr Ltd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vecotr
@defer.inlineCallbacks | ||
def get_local_media(self, request, media_id, name): | ||
"""Responds to reqests for local media, if exists, or returns 404. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs for param types and return type please
else: | ||
yield self._respond_remote_file( | ||
request, server_name, media_id, name | ||
) | ||
|
||
@defer.inlineCallbacks | ||
def _respond_local_file(self, request, media_id, name): | ||
media_info = yield self.store.get_local_media(media_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.store
is now redundant in the constructor (as are self.clock
and self.version_string
while you are there)
else: | ||
file_path = self.filepaths.local_media_filepath(media_id) | ||
|
||
yield respond_with_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would have been helpful if you could have put the switch from respond_with_file
to respond_with_responder
in a separate commit to the switch from _respond_local_file
to get_local_media
, ftr.
logger = logging.getLogger(__name__) | ||
|
||
|
||
class MediaStorage(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for future reference: it would also have been helpful if you could have built this class up over separate commits showing how it is used, rather than presenting the whole class as a fait accomplit that I then need to cross-reference to the commits where it is actually used. For example, in the first commit, add fetch_media
and use it in get_local_media
. Then add store_file
in the next commit.
filesystem_id = media_info["filesystem_id"] | ||
upload_name = name if name else media_info["upload_name"] | ||
|
||
file_path = self.filepaths.remote_media_filepath( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filepaths
now also redundant in the constructor
self.dynamic_thumbnails = hs.config.dynamic_thumbnails | ||
self.thumbnail_requirements = hs.config.thumbnail_requirements | ||
|
||
self.remote_media_linearizer = Linearizer(name="media_remote") | ||
|
||
self.recently_accessed_remotes = set() | ||
|
||
self.media_storage = MediaStorage(self.primary_base_path, self.filepaths) | ||
# List of StorageProvider's where we should search for media and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StorageProviders
self.media_storage = MediaStorage(self.primary_base_path, self.filepaths) | ||
# List of StorageProvider's where we should search for media and | ||
# potentially upload to. | ||
self.storage_providers = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this a class field rather than a local var?
@@ -32,9 +32,10 @@ class MediaStorage(object): | |||
"""Responsible for storing/fetching files from local sources. | |||
""" | |||
|
|||
def __init__(self, local_media_directory, filepaths): | |||
def __init__(self, local_media_directory, filepaths, storage_providers): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please doc the param types, especially the new one.
pass | ||
|
||
|
||
class StorageProviderWrapper(StorageProvider): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel a bit like this could just go into FileStorageProviderBackend, but perhaps you have plans or reasons that won't work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are options that will probably be used across any future storage providers, so it felt a bit silly to implement the same options across multiple ones (I have a S3 one lurking about).
For example, I can imagine in the future we might end up with config like:
storage_providers:
- name: backup_fs
store_thmbnails: false
type: FileStorageProvider
params:
directory: /foo/bar
or something
997a539
to
227c491
Compare
I've made |
Nor can I really. I think it's probably an improvement. |
synapse/rest/media/v1/_base.py
Outdated
media_type (str): The media/content type. | ||
file_size (int): Size in bytes of the media, if known. | ||
file_size (int): Size in bytes of the media. If not known it should be None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so int|None
I guess.
name (str|None): Optional name that, if specified, will be used as | ||
the filename in the Content-Disposition header of the response. | ||
|
||
Retruns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retruns
name (str|None): Optional name that, if specified, will be used as | ||
the filename in the Content-Disposition header of the response. | ||
|
||
Retruns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retruns
|
||
Args: | ||
request(twisted.web.http.Request) | ||
media_id (str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, but please could you doc media_id, here and all the other places it's used? it would help me be less confused about the difference between it and file_id.
server_name, media_id) | ||
raise SynapseError(502, "Failed to fetch remote media") | ||
|
||
yield finish() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I realised that belatedly. sorry.
os.remove(fname) | ||
except Exception: | ||
pass | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true. In that case please can youdo the fiddling with sys.exc_info (raise exc_info[0], exc_info[1], exc_info[2]
)?
I've docced the media_id params, though whether they're useful is another question. I struggled a bit to come up with something better than "media_id is a media ID" :/ Suggestions welcome |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thank you :) |
The end intention here is:
In the future we should make these behaviours configurable through config options, as well as keeping track of last access time for local as well as remote media.
We should also look into making the MediaStorage layer aware of the deletion of old content, as currently that is still done by MediaRepository/PreviewUrlResource. Otherwise, only the MediaStorage and StorageProvider classes should care where things are on disk.