Discovery: tracking upstream-downstream content links #242

Open
Tracked by #1097
navinkarkera opened this issue Dec 10, 2024 · 15 comments

@navinkarkera

  • Designs
  • Courses should display the list of libraries whose content they use. (Not yet finalized)
  • Courses should display the list of library content blocks that were updated in their libraries after being imported.
  • Course outlines should display a notification when any library content under them has been updated in the original library.
  • Additional notes:
    • Update or ignore all updates for components in a given course.
    • Ability to sort components by library?
@bradenmacdonald
Contributor

@navinkarkera We also need to use this data for the opposite case: within libraries, showing all the courses in which a given component (or Unit etc.) is used.

@bradenmacdonald
Contributor

@kdmccormick and/or @ormsbee will probably be interested in reviewing this plan.

@kdmccormick
Member

Something to consider: This needs to be backwards-compatible with instances that are upgraded from Sumac to Teak, and thus may have some V2 Library References that are not persisted in these new tables. There are a couple ways to handle this. One would be to mandate a backfill migration on the Sumac->Teak upgrade. Another would be to accept that the Library Sync page may be missing some of these existing downstream-upstream connections, and provide a "Refresh" button which would check for them.
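The "Refresh" option could be sketched as a comparison between the links found by re-scanning a course and the links already persisted. This is only an illustration: `missing_links` and the sample keys below are hypothetical and are not existing edx-platform APIs or real content keys.

```python
# Hypothetical helper: each link is a (downstream_key, upstream_key) pair.
def missing_links(
    scanned: set[tuple[str, str]],
    stored: set[tuple[str, str]],
) -> set[tuple[str, str]]:
    """Return links present in the course content but not yet persisted."""
    return scanned - stored


# Made-up keys, for illustration only.
scanned = {
    ("course-A/problem-1", "lib-1/problem-1"),
    ("course-A/video-1", "lib-1/video-1"),
}
stored = {("course-A/problem-1", "lib-1/problem-1")}

to_backfill = missing_links(scanned, stored)
```

A mandatory backfill migration would run the same comparison once for every course, whereas the button would run it on demand for a single course.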

@navinkarkera
Author

@kdmccormick Thanks! I am actually thinking of tracking links in the Meilisearch index documents of course blocks instead of in new database tables. @pomegranited has yet to review the idea, but re-indexing courses should automatically fill in upstream links for all old blocks, and new ones are handled by signals/events.

Let me know if it is a terrible idea 😅

@bradenmacdonald
Contributor

@navinkarkera What's the advantage of using Meilisearch in that case? One hesitation I have is that so far all the core libraries, component, and upstream/downstream APIs don't have any dependency on Meilisearch (only the UI and the search APIs do), and I think it's better to stick with that until everyone is comfortable with using Meilisearch. For example, MIT and 2U still have some unanswered questions about its reliability and scalability so they aren't using it yet.

@pomegranited

@navinkarkera @bradenmacdonald Ya.. I don't have a problem with using Meilisearch as a source for this info in the frontend, but I don't think it should be the authoritative source. We should always be able to re-create the search data from the database.

@bradenmacdonald
Contributor

@pomegranited Well I believe in this case it's not an authoritative source in any case. We can re-create the links at any time by scanning all the OLX (in modulestore+learning core). It's just very slow to do that.

My concern is more that this upstream-downstream tracking is going to be a pretty core functionality, and so I'm not sure it should depend on Meilisearch. But I'm open to the idea. I guess the upstream-downstream links themselves will continue to work just fine, so this is more about discovering them. I just want to know if there's really any advantage to using Meilisearch for this use case; if not, we might as well use MySQL.
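Re-creating the links by scanning OLX might look roughly like the sketch below. It assumes the upstream reference is stored as an `upstream` attribute on each block's OLX element; that attribute name, the `find_upstream_links` helper, and the sample keys are assumptions made for illustration, not confirmed details of the platform.

```python
import xml.etree.ElementTree as ET


def find_upstream_links(olx: str) -> list[tuple[str, str]]:
    """Return (downstream url_name, upstream key) pairs found in an OLX string."""
    root = ET.fromstring(olx.strip())
    links = []
    # iter() walks the root and all descendants in document order.
    for element in root.iter():
        upstream = element.get("upstream")  # assumed attribute name
        if upstream:
            links.append((element.get("url_name", ""), upstream))
    return links


# Made-up OLX, for illustration only.
SAMPLE_OLX = """
<vertical url_name="unit1">
  <problem url_name="p1" upstream="lb:DemoOrg:DemoLib:problem:p1"/>
  <html url_name="h1"/>
  <video url_name="v1" upstream="lb:DemoOrg:DemoLib:video:v1"/>
</vertical>
"""

links = find_upstream_links(SAMPLE_OLX)
```

Doing this for every block in every course is what makes the full scan slow, which is the motivation for persisting the result somewhere faster to query.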

@navinkarkera
Author

@pomegranited As @bradenmacdonald pointed out, Meilisearch won't be the authoritative source for this data; the upstream_ref in the course block OLX will remain authoritative, as it is now, so we can recreate the index at any time.

> Well I believe in this case it's not an authoritative source in any case. We can re-create the links at any time by scanning all the OLX (in modulestore+learning core). It's just very slow to do that.

@bradenmacdonald Yes, but the index will be updated as part of the current reindex process, which already goes through each XBlock's OLX and indexes it.

> My concern is more that this upstream-downstream tracking is going to be a pretty core functionality, and so I'm not sure it should depend on Meilisearch. But I'm open to the idea.

My understanding is that you need Meilisearch set up for libraries v2 to work, which in turn means that you need it for importing/linking content from libraries into courses.

> I just want to know if there's really any advantage to using Meilisearch for this use case; if not, we might as well use MySQL.

The biggest advantage would be read speed from the index, plus support for searching, filtering, and sorting the links.

I am not opposed to creating a database table, but to support search, filtering, and sorting it would be better to store the links in the index as well. Also, everything except the upstream ref (library usage key) can be derived, so storing it again is not really necessary (unless there are cases where we cannot use the index and need the information at decent speed).

@bradenmacdonald
Contributor

> My understanding is that you need Meilisearch set up for libraries v2 to work, which in turn means that you need it for importing/linking content from libraries into courses.

@navinkarkera No - you need Meilisearch for the UI/frontend to work, but all the libraries v2 REST/python APIs (other than search, obviously) will work just fine without Meilisearch - including the upstream reference tracking. For example, someone could build an alternate UI for content libraries that doesn't have any search/filtering features, and just uses the "regular" libraries REST APIs and it would work perfectly fine.

> support for searching, filtering, and sorting the links.

Do we need that? I thought our only use case was looking up by either "downstream course ID", "upstream component ID", or "downstream component ID", which MySQL will support just fine.
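The three lookups listed above can be sketched against a single indexed table. In this sketch `sqlite3` stands in for MySQL so the example is self-contained, and the table name, column names, and sample keys are all hypothetical rather than the schema that was eventually chosen.

```python
import sqlite3
from typing import Optional

# sqlite3 stands in for MySQL; the schema below is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE component_link (
        downstream_usage_key TEXT PRIMARY KEY,
        downstream_context_key TEXT NOT NULL,
        upstream_usage_key TEXT NOT NULL
    )
    """
)
# One index per lookup that is not already covered by the primary key.
conn.execute("CREATE INDEX idx_downstream_context ON component_link (downstream_context_key)")
conn.execute("CREATE INDEX idx_upstream ON component_link (upstream_usage_key)")

conn.executemany(
    "INSERT INTO component_link VALUES (?, ?, ?)",
    [
        ("block-a1", "course-A", "lib-problem-1"),
        ("block-a2", "course-A", "lib-video-1"),
        ("block-b1", "course-B", "lib-problem-1"),
    ],
)


def links_in_course(course_key: str) -> list[tuple[str, str]]:
    """Look up by downstream course ID."""
    rows = conn.execute(
        "SELECT downstream_usage_key, upstream_usage_key FROM component_link "
        "WHERE downstream_context_key = ? ORDER BY downstream_usage_key",
        (course_key,),
    )
    return rows.fetchall()


def courses_using(upstream_key: str) -> list[str]:
    """Look up by upstream component ID."""
    rows = conn.execute(
        "SELECT DISTINCT downstream_context_key FROM component_link "
        "WHERE upstream_usage_key = ? ORDER BY downstream_context_key",
        (upstream_key,),
    )
    return [row[0] for row in rows]


def upstream_of(downstream_key: str) -> Optional[str]:
    """Look up by downstream component ID."""
    row = conn.execute(
        "SELECT upstream_usage_key FROM component_link WHERE downstream_usage_key = ?",
        (downstream_key,),
    ).fetchone()
    return row[0] if row else None
```

Each query is an equality match on an indexed column, which is the kind of access pattern a relational database handles without a search engine. Free-text search over the links would be a different matter.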

@kdmccormick
Member

Hm, interesting idea @navinkarkera. Thinking out loud:

  • Persisting the content links to MySQL would allow us to make foreign keys against upstream-downstream links. I always saw this as something we would need, but now that we're challenging the idea, I can't actually think of anything we'd want to hang off the link table. Can anyone else? @ormsbee ?
  • As has been pointed out, the OLX will always be the authoritative source, so regardless of which route we take, we'll want to be careful to update the link table/index whenever OLX is imported or a course is edited.
  • Backfilling will also be important.
    • With MySQL, it'd probably be a custom migration.
    • How would this work with Meilisearch? Would we need to tell operators to rebuild indexes upon migrating to Teak?
  • I see us discussing "do operators need meili for X", which I appreciate us taking into consideration. At the same time, does anyone know where we stand on making meili a platform-wide dependency to replace ES? If that's already on the Teak roadmap, then we do not need to factor it into this decision.
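Whichever store is chosen, keeping it in step with the OLX comes down to one idempotent "sync" operation that both edit handlers and a backfill can call. The sketch below uses a plain dictionary as the store; `sync_link`, `backfill`, and the keys are hypothetical names chosen for illustration.

```python
from typing import Iterable, Optional

# Hypothetical store: downstream usage key -> upstream usage key.
LinkStore = dict[str, str]


def sync_link(store: LinkStore, downstream_key: str, upstream_key: Optional[str]) -> None:
    """Make the store match what the block's OLX currently says."""
    if upstream_key is None:
        # The block has no upstream (or no longer has one): drop any stale link.
        store.pop(downstream_key, None)
    else:
        store[downstream_key] = upstream_key


def backfill(store: LinkStore, blocks: Iterable[tuple[str, Optional[str]]]) -> None:
    """Apply sync_link to every (downstream_key, upstream_key) pair found in the content."""
    for downstream_key, upstream_key in blocks:
        sync_link(store, downstream_key, upstream_key)


store: LinkStore = {}
course_blocks = [("block-a1", "lib-problem-1"), ("block-a2", None)]

backfill(store, course_blocks)
first_pass = dict(store)
backfill(store, course_blocks)  # running it again changes nothing
```

Because the operation is idempotent, it does not matter whether a link is first recorded by an edit, an import, or a one-time backfill on upgrade.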

@bradenmacdonald
Contributor

> I see us discussing "do operators need meili for X", which I appreciate us taking into consideration. At the same time, does anyone know where we stand on making meili a platform-wide dependency to replace ES? If that's already on the Teak roadmap, then we do not need to factor it into this decision.

I had hoped that people would test it out with Redwood and we'd be able to make that decision now. That hasn't happened. But with Sumac it's now default in Tutor and we're going to get a lot more feedback about using Meilisearch in production, so we'll hopefully know soon. However, some stakeholders are definitely uncomfortable with making Meilisearch a core dependency, so for now it's best to plan as if it's going to be optional/swappable where we can.

@ormsbee

ormsbee commented Dec 17, 2024

I would also favor having the link relationship expressed in a Django model to start, for all the reasons @bradenmacdonald mentioned.

@navinkarkera
Author

navinkarkera commented Dec 18, 2024

> build an alternate UI for content libraries that doesn't have any search/filtering features, and just uses the "regular" libraries REST APIs and it would work perfectly fine.

@bradenmacdonald Ohh, makes sense.

> Do we need that? I thought our only use case was looking up by either "downstream course ID", "upstream component ID", or "downstream component ID", which MySQL will support just fine.

The designs indicate a search bar and sort options for both libraries and library blocks used in course pages.

> Persisting the content links to MySQL would allow us to make foreign keys against upstream-downstream links. I always saw this as something we would need, but now that we're challenging the idea, I can't actually think of anything we'd want to hang off the link table.

@kdmccormick On this note, @pomegranited suggested that if we don't add foreign key links to this new table, it can be defined in Learning Core, independent of edx-platform. Is this something that we need? I'll come back to this.

@kdmccormick
Member

Persisting the links in Learning Core is an interesting idea. It would force us not to assume that upstreams are always content libraries and that downstreams are always courses--and I think that'd be positive.

I don't see why that would preclude us from creating foreign keys to the new table, though? Anyway, curious what you come up with here.

@navinkarkera
Author

@kdmccormick We are planning to link the upstream (i.e. PublishableEntity) to the new table, but we still need to store usage keys for both upstream and downstream, as that will help us store link information for course or library pairs that are not yet imported.
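One way to picture this shape is a record that always carries both usage keys and only optionally points at an upstream entity. The sketch below uses a dataclass rather than a Django model so it runs on its own; `ComponentLink`, its field names, and `resolve` are hypothetical, and the integer ID is just a stand-in for a foreign key to PublishableEntity.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ComponentLink:
    upstream_usage_key: str
    downstream_usage_key: str
    downstream_context_key: str
    # Stand-in for a nullable foreign key to PublishableEntity (assumption):
    # None means the upstream library content has not been imported yet.
    upstream_entity_id: Optional[int] = None

    @property
    def upstream_is_resolved(self) -> bool:
        return self.upstream_entity_id is not None


def resolve(link: ComponentLink, entity_ids_by_key: dict[str, int]) -> ComponentLink:
    """Attach the upstream entity once it exists; otherwise return the link unchanged."""
    entity_id = entity_ids_by_key.get(link.upstream_usage_key)
    if entity_id is None:
        return link
    return replace(link, upstream_entity_id=entity_id)


# Made-up keys: the course was imported before its library.
pending = ComponentLink("lib-problem-1", "block-a1", "course-A")
resolved = resolve(pending, {"lib-problem-1": 42})
```

Storing the usage keys as plain strings is what lets the link be recorded before the upstream exists; the entity reference can be filled in later without losing the relationship.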

Projects
Status: In Progress

5 participants