Combine RawContent and TextContent into Content #149
Okay, thinking on this a little more: the `FileField` is pretty much entirely redundant, and it's also one of the biggest parts of the table. The storage location is based entirely on the LearningPackage UUID + the …
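The redundancy argument is that a path which can be recomputed from columns the row already has doesn't need its own column. Here is a minimal sketch, assuming a hypothetical `content_file_path` helper and a hypothetical `content_key` second component. The comment above is cut off before saying what the second component actually is, so treat the layout as illustrative only.

```python
import uuid
from pathlib import PurePosixPath


def content_file_path(learning_package_uuid: uuid.UUID, content_key: str) -> str:
    """Derive a storage path from values the row already has.

    Hypothetical layout: "<learning package uuid>/<content_key>". Because the
    result is a pure function of existing data, persisting it in a FileField
    column would only duplicate information.
    """
    return str(PurePosixPath(str(learning_package_uuid)) / content_key)


pkg = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(content_file_path(pkg, "example-key"))
```

With Django's low-level storage API, a string like this could be passed directly to `default_storage.open()` rather than going through a `FileField`.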
Other self notes before I forget:
The compression thing came to mind because I was looking through some example course data, and there are a handful of Capa problems that weigh in at ~13-14 KB. But when compressed with …

There are HTML blocks that are dramatically larger than this, but that's because they're encoding images into the raw HTML using base64-encoded data URLs (…)
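One way to sanity-check sizes like these is to compare a block's UTF-8 size before and after compression. This sketch uses Python's standard `zlib` module (the compressor named in the write-time plan later in this thread). The sample strings are synthetic stand-ins, not the actual course data, and random bytes serve as a proxy for an already-compressed PNG or JPEG payload.

```python
import base64
import random
import zlib


def compression_ratio(text: str) -> float:
    """Return compressed size / raw size for the UTF-8 encoding of `text`."""
    raw = text.encode("utf-8")
    if not raw:
        return 1.0  # nothing to compress; report "no savings"
    return len(zlib.compress(raw)) / len(raw)


# Repetitive markup (a stand-in for problem XML) compresses very well.
xml_like = '<choice correct="false">Option</choice>\n' * 300

# Random bytes approximate an already-compressed image payload. Base64 maps
# every 3 bytes to 4 characters, so deflate can mostly only undo that
# encoding overhead, which caps the savings near 25%.
rng = random.Random(0)
image_bytes = bytes(rng.getrandbits(8) for _ in range(12_000))
data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode("ascii")

print(f"markup:   {compression_ratio(xml_like):.2f}")
print(f"data URL: {compression_ratio(data_url):.2f}")
```

Already-compressed image formats leave little redundancy for deflate to find, so base64 data URLs shrink far less than repetitive problem markup does.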
Okay, I poked into the compression thing just a little bit further. I'm going to stop now because it's not critical to land in the short term, but I have a general plan for it:
At write time, we run zlib compression on the text and decide whether to store this row's text in the compressed or the uncompressed field; the other field is left null. When we first introduce this feature, we could compress existing rows with a data migration, though that wouldn't be a requirement. Pruning is still the more important feature for controlling growth in content size.
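The write-time choice can be sketched in plain Python. The `StoredText` record and its `text`/`compressed_text` fields are hypothetical stand-ins for the two nullable columns, and the "keep whichever is smaller" rule is an assumption; the plan above doesn't specify the exact criterion.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredText:
    """Hypothetical stand-in for one row: at most one field is non-null."""
    text: Optional[str] = None
    compressed_text: Optional[bytes] = None


def pack_text(text: str) -> StoredText:
    """At write time, keep whichever representation is smaller (assumed rule)."""
    raw = text.encode("utf-8")
    compressed = zlib.compress(raw)
    if len(compressed) < len(raw):
        return StoredText(compressed_text=compressed)
    return StoredText(text=text)


def unpack_text(row: StoredText) -> str:
    """Read path: transparently handle either field."""
    if row.compressed_text is not None:
        return zlib.decompress(row.compressed_text).decode("utf-8")
    if row.text is None:
        raise ValueError("row has neither text nor compressed_text")
    return row.text
```

A real rule might also require a minimum saving before accepting the extra decompression cost on reads.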
f2cd20a to 6c1f89b
@bradenmacdonald, @kdmccormick: A little later than I had hoped, but it's ready for real review.
@bradenmacdonald, @kdmccormick: Do you folks know what this mypy error is about by any chance?
@ormsbee It's telling you that when constructing a … You can ignore the …
It's fundamentally different in this case. But it's a nullable field, so it should be permitted. Does the type-checker just not accept nullable text fields?
dd74059 to ffad90c
That would surprise me, so I tested it out by changing the definition to a plain `TextField`:

```python
# text = MultiCollationTextField(
text = models.TextField(
    blank=True,
    null=True,
    max_length=MAX_TEXT_LENGTH,
    # We don't really expect to ever sort by the text column, but we may
    # want to do case-insensitive searches, so it's useful to have a case
    # and accent insensitive collation.
    # db_collations={
    #     "sqlite": "NOCASE",
    #     "mysql": "utf8mb4_unicode_ci",
    # }
)
```

That type-checked fine, except for a new error that pops up at `openedx_learning/core/components/admin.py:167`, where I think you need to change …

So, I think the issue is that the field-value type argument (specifically, …)

If that ends up blocking this PR, you could hack around the error for now by adding a type annotation directly to …:

```python
# In django-stubs, field type args are [TypeForSetting, TypeForGetting]
text: models.TextField[str | None, str | None] = MultiCollationTextField(
    blank=True,
    null=True,
    max_length=MAX_TEXT_LENGTH,
    # We don't really expect to ever sort by the text column, but we may
    # want to do case-insensitive searches, so it's useful to have a case
    # and accent insensitive collation.
    db_collations={
        "sqlite": "NOCASE",
        "mysql": "utf8mb4_unicode_ci",
    },
)
```
@ormsbee if you want to just work around this for now using … The only reason I balk at just using …
@kdmccormick: I respect and appreciate your inner type-checking nerd. I'll use the workaround you suggested. Thank you.
The general shape of the refactoring looks great.
I'm about 2/3rds through; I'll leave the rest of my review after lunch.
Alrighty, all I have is a bunch of docstring nits.
I'm sure my next review will be a ✅ , so feel free to merge if someone beats me to it.
1e17b5d to fdc11ee
@kdmccormick: Incorporated all your suggestions except this one on …
🚀
This is mostly pulling together `RawContent` and `TextContent` into a unified `Content` model in order to reduce confusion and not force text to always go to a file-based storage backend.

Other things that may also be a part of this PR, as issues I've noticed along the way:

- … `FileField` to save space (and use the low-level storages API instead)
- … `lru` … in rollback situations.