Problem: packages don't have a status #1081
Comments
@fiver-watson can you start thinking about how to incorporate this status in the UI? |
Additional thoughts on this issue
Existing enums and entities
While we agree that the current statuses used on package tables are in fact statuses about the workflow running and not the package itself, the developers have pointed out that until recently, we did not have an entity for "Preservation Actions" - and as such, in the code the current enum is labeled as package status - see:
Currently these two enums mostly duplicate each other, minus the unused statuses. I am not sure where each is used, but I am hoping that with this work, we can better distinguish the two.
Another possible package status needing opinions: Depending on when in a workflow we create database entries etc, we may want to have a On the proposed I can see how this will be useful, much like my proposed
Example: let's just imagine that some stubborn archivist is trying to learn the validation rules through trial and error. Every time the package fails, they make changes based on the first validation error reported, and no more. In the end, they attempt to ingest the same package 10 times, before it finally succeeds on the 11th. Each package attempt will have an identical name, with nearly identical details on its contents (depending how far it gets), though I think it will be assigned a different UUID by Enduro (as a separate unique attempt). If we maintain that information indefinitely, then every time this user wants to search for their package, they will have to wade through 10 other incorrect results to find the desired "STORED" package. Now multiply this by thousands of packages over the course of a year. Over time, I suspect that the user experience of the system will degrade quickly when there is so much noise. How can we keep useful information about failed packages and workflows for enough time, but not so long that they simply become extra noise impeding the use of the system?
UI Design proposal
Ideally, I would like to tie this into the work proposed in Enduro issue #955, when we reorganize the package details page. I think that at a glance, package status will be less important to determine than workflow status, as operators will be mainly using Enduro as an ingest engine and not a digital repository for long-term storage management, etc.
Consequently, I don't want to add another "badge" that will draw the eye in the same way as the current workflow statuses, and potentially lead to momentary confusion as users try to determine which status is which. My suggestion is that we use bolding and color (with a high enough contrast ratio to ensure accessibility - if needed, I can propose hex codes for some of the statuses), but no badge for the status, and simply add it to the package details. We can also take this opportunity in the current UI to remove the duplicate workflow status badge being shown in the package details area, and replace it with a package status instead, like so: Once Issue #955 is implemented, the redesigned package details could look like so: Thoughts and feedback welcome! |
An additional thought for consideration: I don't know how much of a headache this might be for the devs and I'm happy to think of alternatives, but even with a proposed new status, I kind of liked Sara's original idea of, on the package browse page, reusing the location column to just show "DELETED" (or "REMOVED" or similar) instead of a location. I do think it's helpful to be able to tell in SOME way from the package browse page (without having to click through to the package details) that the package is deleted, and the current table already feels overloaded - I don't personally think adding an additional column is a good idea. Meaning: until we decide to redesign the package browse page, this might be a decent workaround, if it doesn't make the devs 🤯 😡 (╯°□°)╯︵ ┻━┻ Super simple mockup done via browser: |
Thanks for putting this together and sharing your thoughts @sallain and @fiver-watson. This raises again the discussion we had about Enduro being more an orchestrator than an ingest application as it's mixing again concepts from the storage domain. Based on the decisions we make we may go even more in that direction, which may be okay but it should be clear for everybody.
Ingest vs orchestrator
I consider what we call ingest more like an orchestrator already. For example, this is what the ingest/processing workflow does:
I think it's okay to orchestrate all that in an ingest workflow/application but ...
Move preservation task/workflows
A better example is the move operation. Right now it is not clear that we already have two services because we serve both APIs together and have a single user interface. However, internally these services are called "Package service" (what we call ingest that I think we should consider an orchestrator) and "Storage service". In the API spec they are separated by the If you consider the "package" service an ingest application, I think the current statuses are okay. Once processing/ingest is done, the package should be stored because it didn't fail. However, when we added the move operation, instead of doing it only in the storage domain, we wanted to keep track of the location where the AIP is stored and a preservation action for the move in the ingest domain. I think that's where the separation between services starts to blur and the "package" service becomes more an orchestrator than an ingest application. If you separate both services, you could show where the AIP was sent in the ingest application and "relate" the package in both services by the UUID (what we already do). But everything after that should only happen in the storage domain, and because we share the UI and API it should be easy to redirect to a However, we wanted to keep track of the move operation and the location UUID in the "package" service, so we added the operation to the
Ingest separation
If we want to separate the concerns a bit better, we could still use the statuses we have on the ingest domain, adding new statuses only in the storage domain and moving the move and delete operations there, without worrying about them on ingest. We could add package list and package view pages in that domain, having two different sets of package pages, one for the ingest domain and another for the storage domain, keeping the information for each service there, having different search filters and so on. It would also help with clearing old data from the ingest domain when needed.
Orchestrator approach
Keeping the current functionality as it is will require adding new statuses as suggested above. It makes the package service kind of the source of truth where you know all operations performed for a package over time. Like the move operation, deleting should also happen on the package service to keep track of that operation, adding an extra layer on top of the actual operation in the storage service.
UI mix
Similar to what we have, we could have a better separation in the backend but keep the UI with a single page for the package, where information from the package and storage services is mixed to provide different actions or calculate an overall status. But this complicates searching and will make it harder to split the applications later if needed.
Conclusion
In my opinion, we should already have a better separation of concerns in these services. As I commented above, the move and delete operations should only be part of the storage service. Instead of mixing the information from both services in the same page, we could have two different package list and view pages based on the domain, allowing us to show the information, actions, filters, etc. related to that domain only. This would make it easier to clear old ingest packages while keeping the storage information, and also easier to separate the services in the future. Please, let me know your thoughts!
|
I think you have outlined the issue wonderfully, @jraddaoui - thanks. And I agree that we need to make some decisions around this, as it will become increasingly challenging to manage if we are not clear on how we plan to extend the functionality of the application into other bounded contexts. Please forgive me in advance if this is a massive misunderstanding of the underlying technical issues... I wonder if there are not some other options that compromise between these? For example...
Common source of truth, choreographed services
Something like this-ish? It is possible that sharing a common db might make data modeling difficult and inefficient. Perhaps another option would be:
Choreographed services with a new front-end service
Or, finally, the third consideration I have is that if we want an orchestrator approach, that's fine - but perhaps it should be a separate new thing from the ingest service? I guess I am trying to avoid the old pre-processing approach that happened, where it felt like one application that should have been a bounded context ended up holding all the domain logic and state of truth for the others.
New orchestrator and API gateway?
Perhaps it is time to consider some variation of what we originally tried, i.e.
I suspect there are likely many things I have suggested that aren't practical, don't make sense, or even just repeat what you are already suggesting and I just didn't understand - apologies in advance. Just wanted to make sure we are considering all options! |
Thanks for looking into this @fiver-watson, and for going even further. I think there are some good points there but most of those ideas change the principle we discussed about using public APIs to communicate between bounded contexts. I think sharing the persistence layer is not a good idea. I may not be understanding your suggestion, but would each service need to know about the schema from the others to access their data? Who takes care of versioning/migrations? Having an event store was something we also discussed, and it could be a good option to communicate between services. For example, if a package is deleted in storage it can submit an event and the systems that care about that can listen and act. Instead of doing that, we decided to trigger the operation in ingest and use workflows to keep track of the operation and public APIs to communicate between services. My suggestion now is that we don't need to know about the move/deletion operation done in storage from the ingest application. But it could be a good solution for other communications in the future. An API gateway is something that I always have a hard time seeing. In the end it is similar to the UI mix, but doing the mix in an intermediary service instead of in the UI. I think it depends on how much we want a single UI for all services or individual ones. My idea, if we separate the ingest application, is to use the same UI but still have the separation of concerns we kind of have right now. I'll follow up about this in another comment. |
Having multiple package list and view pages based on the domain (orchestration vs ingest/"service") makes sense to me as well (a unified interface would be nice, but maybe harder to keep coherent long-term). The orchestration package view page could potentially show view data relating to ingest/"service" when appropriate. Maybe the Creating a generalized framework for "services" - making them akin to "plugins" (and possibly something that third parties could contribute in the future) - definitely seems appealing, but could be more time consuming to get right versus conventional application subsections. |
I personally think that we should definitely try to aim for a single common UI as much as possible, regardless of how we choose to separate contexts / tech stacks / responsibilities in the back. From a marketing and usability perspective, it will be a lot harder to convince potential clients / users that they need 4 or more different applications in the future (e.g. Enduro Ingest, Preserve, Store, some kind of metadata manager, some kind of public access system, some kind of reporting tool, etc...) if we can't abstract that away from their actual daily experience. If each application has its own UI, there is also more likely to be drift in terms of user experience across the UIs, adding cognitive load to the end user (e.g. Right, the edit button is up here in this view, but if I go to this view, it's down here now, and it's buried in the context menu on this app, etc...). If I use Spotify, I don't know I am installing 10 different apps with different bounded contexts and managed by different teams - I get the experience of one seamless app. Given the common "Enduro" branding, I personally think that is a goal we should keep in mind. That said, perhaps PLT might have thoughts on this as a high-level long term goal. And yeah, I snuck an event store into my diagram, because I do really think there is a lot of good overlap for us to explore, in terms of the archival domain's focus on chain of custody, authenticity, and capturing EVENTS for every step along the way, and the way that event-based systems work. It might be too soon for us to go deep on that, but when we start thinking about versioning, reingest, etc. I think that they could be invaluable. In terms of recovery too, being able to replay events to return to a previous state could possibly help us in many ways. That said.... 
if we are going this far in thinking more broadly about the long term aims of the project and its architecture, I would also love us to talk with @jhsimpson more about some of the ideas he has been exploring, like some of the patterns in this miro board, etc...
I trust your thoughts on this - I was mainly trying to avoid the search issues that a common UI might introduce. That said, I think that a shared event store with local services persisting their own data could still work fine - in the UI design, we could put search boxes on specific pages (much like AtoM has a dedicated actors search on the actor browse page, etc) rather than a global one. In cases where one service needs data from the others (e.g. being able to show the package location when in the package browse or view pages), then listening to the events and just persisting what is needed in the local context could help with that. There is some minor data duplication, but in cases of conflicts it would be pretty easy to determine which service is the source of truth for which datum. Mostly I just wanted to make sure that we weren't considering only 2 options - I think there are more variations possible, and I hope that smarter folks than me can help to consider them! |
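To make the "listen to events and persist only what is needed locally" idea a bit more concrete, here is a rough sketch in Python. It is not Enduro code and nothing in it is taken from the codebase: the `EventBus` class, the `package.moved` event name, and the `ingest_locations` table are all hypothetical, and the in-memory bus is only a stand-in for whatever event store might be chosen. It shows storage announcing a move and ingest keeping a local copy of the location so that its own browse page can display it.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventBus:
    """Tiny in-memory stand-in for an event store or message broker (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> None:
        # Deliver the event to every handler registered for this type.
        for handler in self._handlers[event_type]:
            handler(event)


# Ingest-side local persistence: only the datum it needs (package UUID -> location).
ingest_locations: Dict[str, str] = {"uuid-1": "location-a"}


def on_package_moved(event: Event) -> None:
    # Storage remains the source of truth; ingest just mirrors the location.
    ingest_locations[event["package_uuid"]] = event["location"]


bus = EventBus()
bus.subscribe("package.moved", on_package_moved)

# Storage performs the move in its own domain, then announces it.
bus.publish("package.moved", {"package_uuid": "uuid-1", "location": "location-b"})
print(ingest_locations)
```

The duplicated datum here is the location string; in a conflict, the storage side would win, which matches the "source of truth per datum" point above.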
Also, just to bring all of this back to the original issue.... Regardless of how we choose to separate the concerns, adding a Package Status shouldn't necessarily touch on this boundary. Sure, one of the proposed statuses is "STORED", but that is still a status of the package - we don't need to know with that status where it is stored, just that it successfully was stored. The original issue here was separating the status of the workflow (where there will eventually be many different workflows- e.g. delete AIP, move AIP, reingest package, generate DIP, etc...) from the status of the package. Just so we don't lose sight of the immediate work under discussion here! |
I think one possible future architecture we should consider is eliminating the Enduro Storage Service altogether in favour of using the Archivematica Storage Service. This is a radical change to Enduro, but I think there is a way to get there from here using a facade pattern approach. I also think this has the possibility of allowing Enduro to use the AMSS or Enduro SS. Here's my very simplified (and probably slightly incorrect) attempt at an architecture diagram of Enduro + AM right now: I hope @jraddaoui will correct any errors I've made when he's back. And here's a proposal for an Enduro + AM software architecture with no Enduro SS: In the proposed architecture I think you could potentially use an adapter to replace the AMSS with an Enduro SS, or another storage service. My thinking is to treat the Storage Service as a fully separate app and service, which makes it much easier to replace independently of the rest of Enduro. I think working against the AMSS has the advantage of the AMSS already being a completely separate app, and supporting multiple backend storage systems. Of course using the AMSS also has all the disadvantages of the AMSS: an unusual and often unpredictable API, possible performance limitations, and the technical debt of the AMSS. I think that getting to the proposed architecture would first involve the same steps that @jraddaoui suggested for Ingest separation:
Then we would have to disentangle the Enduro SS and AMSS by removing the way we proxy AIP downloads through the Enduro SS. After that we could develop an adapter or "shim" to convert Enduro SS API calls to AMSS API calls. We would also probably need to do some development in the AMSS API to make it more predictable and to add any missing functionality. Here's a link to the original Miro board where I drew the Architecture Diagrams: [Edited the diagrams several times to correct errors] |
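As a rough illustration of the adapter or "shim" idea, here is a minimal Python sketch. Everything in it is an assumption made for discussion: the `StorageService` interface, the `FakeAMSSClient` class, and its method and field names are invented and do not describe the real Enduro SS or AMSS APIs. It only shows the shape of the pattern, where the rest of the application depends on one interface and the adapter translates calls to whichever storage backend sits behind it.

```python
from typing import Dict, Protocol


class StorageService(Protocol):
    """Interface the rest of the application would depend on (hypothetical)."""

    def location_of(self, aip_uuid: str) -> str: ...

    def delete(self, aip_uuid: str) -> None: ...


class FakeAMSSClient:
    """In-memory stand-in for an AMSS client; NOT the real AMSS API."""

    def __init__(self) -> None:
        self.packages: Dict[str, Dict[str, str]] = {
            "aip-1": {"current_location": "loc-a"}
        }

    def get_package(self, uuid: str) -> Dict[str, str]:
        return self.packages[uuid]

    def request_deletion(self, uuid: str) -> None:
        del self.packages[uuid]


class AMSSAdapter:
    """Shim translating StorageService calls into calls on the backend client."""

    def __init__(self, client: FakeAMSSClient) -> None:
        self._client = client

    def location_of(self, aip_uuid: str) -> str:
        return self._client.get_package(aip_uuid)["current_location"]

    def delete(self, aip_uuid: str) -> None:
        self._client.request_deletion(aip_uuid)


def describe(storage: StorageService, aip_uuid: str) -> str:
    # Callers only see the StorageService interface, never the backend.
    return f"{aip_uuid} is stored at {storage.location_of(aip_uuid)}"


storage = AMSSAdapter(FakeAMSSClient())
print(describe(storage, "aip-1"))
```

Swapping in an Enduro SS (or any other storage service) would then mean writing another class with the same two methods, leaving `describe` and other callers untouched.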
Is your feature request related to a problem? Please describe.
Enduro provides a status for workflows but not for packages. This hasn't really been a problem, but as we complete the analysis for an AIP deletion workflow (#1076) it seems desirable to include a DELETED status for an AIP. As it is, if a package is deleted then the package's workflow status would be updated to DONE, but a user would have to click into the package detail page and review the deletion workflow to see that it has in fact been deleted.
In order to be a one-stop source of information about an institution's preserved holdings, Enduro should maintain a record of all SIPs that have been processed, including those that are deleted. By glancing at the packages table, a user should be able to see if a package is stored, in progress/processing, or deleted.
Describe the solution you'd like
Add a package attribute for status, with the following:
PROCESSING - the package is being used in an active workflow
STORED - the package has been successfully processed and sent to the final storage location
DELETED - the package has been removed from the system
FAILED - the workflow errored out and the package could not be stored/deleted <-- this status needs opinions
Describe alternatives you've considered
We considered whether or not we could repurpose the locations column to indicate that the package has been deleted, since the deleted package's location would now be blank. However, this seems a little messy.
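For discussion, the status attribute described above could look roughly like the sketch below. This is only an illustration in Python, not Enduro code: the `PackageStatus` and `Package` names, the fields, and the `summary` helper are all hypothetical. The point it shows is the package status being kept separately from the workflow status, so that a deleted package reads as DELETED even though its deletion workflow finished as DONE.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PackageStatus(Enum):
    """Proposed package statuses (names and descriptions taken from this issue)."""

    PROCESSING = "the package is being used in an active workflow"
    STORED = "the package has been processed and sent to the final storage location"
    DELETED = "the package has been removed from the system"
    FAILED = "the workflow errored out and the package could not be stored/deleted"


@dataclass
class Package:
    # Hypothetical shape; the real Enduro entities may differ.
    name: str
    status: PackageStatus
    workflow_status: str  # e.g. "DONE"; tracked separately from the package status
    location: Optional[str] = None  # blank once the package is deleted


def summary(pkg: Package) -> str:
    """One-line summary as it might appear in the packages table."""
    return f"{pkg.name}: package={pkg.status.name}, workflow={pkg.workflow_status}"


deleted = Package("transfer-001", PackageStatus.DELETED, workflow_status="DONE")
print(summary(deleted))
```

With a separate attribute like this, the packages table would not need to infer deletion from a blank location.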