[WIP] [Data rearchitecture] Implement article status manager for timeslices #6083
+827
−102
What this PR does
This PR implements a new ArticleStatusManagerTimeslice class that "imitates" the process previously done by the ArticleStatusManager class.
Analysis of the ArticleStatusManager class (revisions version)
The article status manager has several steps. Conceptually, ArticleStatusManager modifies articles and revisions. Both entities are course agnostic, i.e., they are not associated with a particular course. This is a fundamental difference from the new system: although the article entity was not modified, the revision entity no longer exists. Instead, we have article course timeslices, which are entities associated with a specific course. This needs special attention, because if several courses share the same article, article course timeslices should be updated for all courses that use that article.
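To illustrate the shared-article concern, here is a minimal plain-Ruby sketch. It is not the code in this PR: the ArticleCourseTimeslice struct, its needs_update field, and mark_timeslices_for_article are hypothetical stand-ins for the real records, used only to show that a change to one article has to reach the timeslices of every course that uses it.

```ruby
# Hypothetical in-memory stand-in for article course timeslice records.
# The needs_update field is an assumption made for this sketch.
ArticleCourseTimeslice = Struct.new(:article_id, :course_id, :needs_update,
                                    keyword_init: true)

# Flag every timeslice that references the article, whatever course it
# belongs to, and return the ids of the affected courses.
def mark_timeslices_for_article(timeslices, article_id)
  affected = timeslices.select { |t| t.article_id == article_id }
  affected.each { |t| t.needs_update = true }
  affected.map(&:course_id).uniq
end

timeslices = [
  ArticleCourseTimeslice.new(article_id: 1, course_id: 10, needs_update: false),
  ArticleCourseTimeslice.new(article_id: 1, course_id: 20, needs_update: false),
  ArticleCourseTimeslice.new(article_id: 2, course_id: 10, needs_update: false)
]

# Article 1 is shared by courses 10 and 20, so both are affected.
affected_courses = mark_timeslices_for_article(timeslices, 1)
```

The timeslice for article 2 is left untouched, since only records that reference the changed article are flagged.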
New implementation
Due to the fundamental differences between the two systems, a direct translation from one implementation to the other is not possible. The behavior of the class has to be redesigned around the properties of the timeslice system.
As part of this PR, we modified the logic for article course timeslice creation. Before this PR, timeslices were created only for articles courses (articles edited by the course users in the tracked namespaces). Now, timeslices for all articles with at least one revision are created, but we only create article course records for those that are pertinent to the course. This is to have data for articles that, for example, are created in draft but then move to the main namespace.
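The creation rule described above can be sketched in plain Ruby as follows. This is an illustration under assumptions, not the PR's implementation: the Revision struct, plan_records, and the TRACKED_NAMESPACES constant are hypothetical names, and the course is assumed to track only the main namespace (0).

```ruby
# Hypothetical stand-in for the revision data fetched during an update.
Revision = Struct.new(:article_id, :namespace, keyword_init: true)

# Assumption for this sketch: the course tracks the main namespace only.
TRACKED_NAMESPACES = [0].freeze

# Decide which articles get timeslices and which get articles courses
# records: timeslices for every article with at least one revision,
# articles courses only for articles edited in a tracked namespace.
def plan_records(revisions, tracked_namespaces)
  timeslice_ids = revisions.map(&:article_id).uniq
  articles_course_ids = revisions
                        .select { |r| tracked_namespaces.include?(r.namespace) }
                        .map(&:article_id)
                        .uniq
  { timeslices: timeslice_ids, articles_courses: articles_course_ids }
end

revisions = [
  Revision.new(article_id: 1, namespace: 0),
  Revision.new(article_id: 2, namespace: 118), # 118 is Draft on English Wikipedia
  Revision.new(article_id: 1, namespace: 0)
]

plan = plan_records(revisions, TRACKED_NAMESPACES)
```

In this example the draft article (id 2) gets a timeslice but no articles courses record, so its data is already available if it is later moved to the main namespace.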
The new ArticleStatusManagerTimeslice class does the following:
namespace, deleted and mw_page_id fields.
Open questions and concerns
It looks like there are two main use cases for the ArticleStatusManager class. The usual one is through the periodic course update. The other one is through the cleanup scripts: docs/cleanup_scripts/duplicate_articles.rb and docs/cleanup_scripts/duplicate_mw_page_id_handling.rb. For the latter case, update_status is invoked with a single article, which is something of a special case. I'm not entirely sure if those cleanup scripts are still used, as it seems that we added a db restriction in the Articles table (see #4381). If the scripts are no longer used, then we can simplify the new implementation.