Add PREMIS events #24

Merged
jraddaoui merged 1 commit into main from dev/premis-events on Jun 28, 2024

mcantelon
Contributor

No description provided.

mcantelon force-pushed the dev/premis-events branch from a96694c to 3ccb2d0 on June 2, 2024 18:58

codecov bot commented Jun 2, 2024

Codecov Report

Attention: Patch coverage is 74.30939% with 93 lines in your changes missing coverage. Please review.

Project coverage is 66.77%. Comparing base (f976317) to head (84e8e73).

| Files | Patch % | Lines |
|---|---|---|
| internal/premis/premis.go | 77.29% | 34 Missing and 13 partials ⚠️ |
| cmd/worker/workercmd/cmd.go | 0.00% | 12 Missing ⚠️ |
| internal/workflow/preprocessing.go | 84.37% | 5 Missing and 5 partials ⚠️ |
| internal/activities/add_premis_objects.go | 66.66% | 4 Missing and 4 partials ⚠️ |
| internal/activities/add_premis_event.go | 62.50% | 3 Missing and 3 partials ⚠️ |
| internal/activities/validate_file_formats.go | 68.42% | 3 Missing and 3 partials ⚠️ |
| internal/activities/add_premis_agent.go | 69.23% | 2 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #24      +/-   ##
==========================================
+ Coverage   62.79%   66.77%   +3.97%     
==========================================
  Files          14       19       +5     
  Lines         586      936     +350     
==========================================
+ Hits          368      625     +257     
- Misses        198      261      +63     
- Partials       20       50      +30     

mcantelon marked this pull request as draft on June 2, 2024 19:07
mcantelon force-pushed the dev/premis-events branch 10 times, most recently from ade9c0c to 6374c23 on June 13, 2024 05:11
Contributor

jraddaoui left a comment

Nice work @mcantelon!

I like the way you structured the premis package and how you are building the XML in general. I added some comments below and I think you are/will be working with David, so I hope he can provide the missing pieces from this feedback.

In general, and this is a note to myself too (David and other devs here do great work on it), we should try to:

  • Give more visibility to the work being done. I'd appreciate meaningful commit messages, even if it's a WIP PR.
  • Focus more on test driven development (TDD). With this stack it's not easy to test your work by running the workflow each time; TDD can help you with that for the majority of the development cycle, instead of testing at the end (I didn't mention it yet, but there is a big lack of tests in this PR).

Comment on lines 1 to 3
<?xml version="1.0" encoding="UTF-8"?>
<premis:premis xmlns:premis="http://www.loc.gov/premis/v3" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/premis/v3 https://www.loc.gov/standards/premis/premis.xsd" version="3.0">
</premis:premis>
Contributor

Using a local file could be a deployment issue. Since we are working with XML we could create this structure pretty easily; we could also embed the file into a variable like we do in the version package: https://github.com/artefactual-sdps/preprocessing-sfa/blob/main/internal/version/version.go#L11-L12.
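
A minimal sketch of keeping the skeleton inside the binary instead of reading it from a local file. It uses a plain string constant so it runs on its own; the `embed` package with a `//go:embed` directive is the other route when the XML should stay a separate file in the repository. The names `emptyPREMIS` and `rootName` are hypothetical and not taken from the PR.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// emptyPREMIS is the skeleton document from the diff above, compiled into the
// binary as a constant so nothing has to be read from disk at deploy time.
// The constant name is hypothetical.
const emptyPREMIS = `<?xml version="1.0" encoding="UTF-8"?>
<premis:premis xmlns:premis="http://www.loc.gov/premis/v3" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/premis/v3 https://www.loc.gov/standards/premis/premis.xsd" version="3.0">
</premis:premis>
`

// rootName returns the local name of the first element in doc, which is a
// cheap check that the built-in skeleton is parseable XML.
func rootName(doc string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err != nil {
			return "", fmt.Errorf("find root element: %w", err)
		}
		if start, ok := tok.(xml.StartElement); ok {
			return start.Name.Local, nil
		}
	}
}

func main() {
	name, err := rootName(emptyPREMIS)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("root element:", name)
}
```

Either way the skeleton travels with the executable, so the worker does not depend on a file being present at a particular path.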

Contributor Author

Fixed!

ctx context.Context,
params *AddPREMISAgentParams,
) (*AddPREMISAgentResult, error) {
err := premis.AppendPREMISAgentXML(filepath.Join(params.Path, "/metadata/premis.xml"), params.Agent)
Contributor

Minor thing, feel free to ignore. The location of the PREMIS XML could change and this activity won't know about it; I'd join and pass the path to the XML file directly from the workflow.

Contributor Author

Fixed!

Comment on lines 10 to 24
const AddPREMISAgentName = "add-premis-agent"

type AddPREMISAgentActivity struct{}

func NewAddPREMISAgent() *AddPREMISAgentActivity {
	return &AddPREMISAgentActivity{}
}

type AddPREMISAgentParams struct {
	Path  string
	Agent premis.PREMISAgent
}

type AddPREMISAgentResult struct{}

func (md *AddPREMISAgentActivity) Execute(
Contributor

Just cosmetic: I like having the types together (with the activity last) and then the functions that relate to the activity:

const AddPREMISAgentName = "add-premis-agent"

type AddPREMISAgentParams struct {
	Path  string
	Agent premis.PREMISAgent
}

type AddPREMISAgentResult struct{}

type AddPREMISAgentActivity struct{}

func NewAddPREMISAgent() *AddPREMISAgentActivity {
	return &AddPREMISAgentActivity{}
}

func (md *AddPREMISAgentActivity) Execute(...

Contributor Author

Should be fixed!

outcome := "valid"

if params.Error != nil {
detail = ""
Contributor

This shouldn't be needed.

Contributor Author

Fixed!

event := premis.PREMISEventSummary{
	Type: params.Type,
	Detail: detail,
	Outcome: outcome}
Contributor

Bracket should go in a new line.

childEl := element.FindElement(fmt.Sprintf(".//%s[text()='%s']", path, value))

if childEl == nil {
foundDifference = true
Contributor

You could break here.

Contributor Author

Fixed!

if d.IsDir() {
return nil
}
subpaths = append(subpaths, string([]rune(p)[subpathStart:]))
Contributor

I find this solution a little hard to understand. Could we use https://pkg.go.dev/path/filepath#Rel?
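
A rough sketch of what the `filepath.Rel` version could look like, assuming the goal is to collect every file path relative to the directory being walked. The function name `fileSubpaths` and the `demo` helper are hypothetical; only `filepath.WalkDir` and `filepath.Rel` come from the standard library.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// fileSubpaths returns the path of every non-directory entry under root,
// expressed relative to root.
func fileSubpaths(root string) ([]string, error) {
	var subpaths []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		// filepath.Rel replaces the manual rune slicing on the walked path.
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return fmt.Errorf("relative path for %q: %w", p, err)
		}
		subpaths = append(subpaths, rel)
		return nil
	})
	if err != nil {
		return nil, fmt.Errorf("walk %q: %w", root, err)
	}
	return subpaths, nil
}

// demo builds a throwaway directory tree and lists it, converting separators
// to forward slashes so the result reads the same on every platform.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "sip-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	if err := os.MkdirAll(filepath.Join(root, "content"), 0o755); err != nil {
		return nil, err
	}
	for _, name := range []string{"content/file1.txt", "readme.txt"} {
		if err := os.WriteFile(filepath.Join(root, filepath.FromSlash(name)), []byte("x"), 0o644); err != nil {
			return nil, err
		}
	}

	subpaths, err := fileSubpaths(root)
	if err != nil {
		return nil, err
	}
	for i, p := range subpaths {
		subpaths[i] = filepath.ToSlash(p)
	}
	return subpaths, nil
}

func main() {
	subpaths, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(subpaths)
}
```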

Contributor Author

Fixed!

activities.AddPREMISObjectsName,
&activities.AddPREMISObjectsParams{
	Path: localPath,
	ContentPath: identifySIP.SIP.ContentPath},
Contributor

Bracket. I'll stop flagging them, I hope they get fixed by linting, otherwise you get the point ;)

Comment on lines +82 to +147
if e != nil {
return nil, e
}
Contributor

This will need considerable refactoring after you rebase the latest work from David. I think you were mobbing with him today, so I hope he provides the necessary feedback to you. Otherwise, let me know if you need to know a little more about what changed and why.

Contributor Author

mcantelon Jun 13, 2024

Sounds good!

Comment on lines 109 to 215
// Add PREMIS event noting validate file formats result.
e = temporalsdk_workflow.ExecuteActivity(
	withLocalActOpts(ctx),
	activities.AddPREMISEventName,
	&activities.AddPREMISEventParams{
		Path: localPath,
		Agent: premis.PREMISAgentDefault(),
		Type: "validateFileFormats",
		Error: validateFileFormatsErr},
).Get(ctx, &addPREMISEvent)
if e != nil {
	return nil, e
}

Contributor

I think we may want to do some of these updates to the PREMIS files from inside the existing activities. That will give you access to the actual file formats and how the validation went for each file while it's happening.

Contributor Author

I've reworked the validate file formats activity to add individual per-file events.

mcantelon force-pushed the dev/premis-events branch from 6374c23 to cdc1f18 on June 13, 2024 19:51
Contributor Author

mcantelon commented Jun 14, 2024

Thanks @jraddaoui ! I've addressed most of the feedback, but will fix the rest, add more tests, and add PREMIS event appending from within activities.

return doc, PREMISEl, nil
}

func AppendPREMISObjectXML(PREMISfilepath string, object PREMISObject) error {
Contributor

@mcantelon I realized after pair programming yesterday that the standard Go pattern for file I/O would be to pass an io.Writer param to a function like this, and write to that stream instead of writing directly to a file. Using io.Writer is more flexible than direct file I/O and makes it easier to test the output without dealing with opening and closing actual files.
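
To illustrate the pattern, here is a small sketch that writes an agent to any `io.Writer`. It uses `encoding/xml` from the standard library rather than the etree library the PR uses, and the `agent` type, its simplified un-namespaced element names, and the helper functions are all hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"os"
	"strings"
)

// agent is a simplified, hypothetical stand-in for the PR's PREMIS agent type.
type agent struct {
	XMLName xml.Name `xml:"agent"`
	Name    string   `xml:"agentName"`
	Type    string   `xml:"agentType"`
}

// writeAgent serializes a to w. The caller decides whether w is a file,
// a network connection, or an in-memory buffer.
func writeAgent(w io.Writer, a agent) error {
	if err := xml.NewEncoder(w).Encode(a); err != nil {
		return fmt.Errorf("encode agent: %w", err)
	}
	return nil
}

// renderAgent shows the testing benefit: the output is captured in memory
// without creating or cleaning up a file.
func renderAgent(a agent) (string, error) {
	var sb strings.Builder
	if err := writeAgent(&sb, a); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	a := agent{Name: "Enduro", Type: "software"}
	if err := writeAgent(os.Stdout, a); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
	}
}
```

A test can call `renderAgent` and compare the returned string, while production code passes an `*os.File` to the same `writeAgent` function.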

Contributor Author

Nice! Makes sense.

Contributor Author

I've implemented this! Let me know if it needs tweaking.

}

// Add PREMIS event noting validate structure result.
var addPREMISEvent activities.AddPREMISEventResult
Contributor

One of the things I learned from @sevein's work on adding telemetry to Enduro is that there is a significant run time cost to scheduling and then running Temporal activities. Because of this cost I think it would be better to combine all of the "addPREMIS" activities into a single activity, unless there's some reason to keep them separate.

Contributor Author

mcantelon Jun 14, 2024

Sounds good! I was having them be separate so the PREMIS file state would reflect where the workflow failed, if an error interrupted its execution, but that's probably not necessary.

mcantelon changed the title from "WIP: PREMIS events" to "Add PREMIS events" on Jun 21, 2024
mcantelon marked this pull request as ready for review on June 21, 2024 07:46
Contributor

djjuhasz left a comment

Hi @mcantelon. Thanks for implementing io.Writer where you have, but I think this will be improved even more if you can take the principle further. I think the idiomatic Go design would be to handle all the file I/O in the PREMIS activities, and exclusively use io.Reader and io.Writer in the premis package. I think in the context of an XML document you can pass around the *etree.Document representing the PREMIS XML tree instead of a reader or a writer, so you don't have to keep parsing and serializing the document struct.

I've made some inline comments to try and provide some concrete examples of how I think the implementation could be improved. The comments are not meant to be comprehensive, just to provide a template that you can repeat for the other premis package functions.

@djjuhasz
Contributor

@mcantelon I just realized I may not have been clear that passing around io.Reader, io.Writer, and *etree.Document should only be done inside of an activity. We can't return large data structures like *etree.Document from an activity, so each PREMIS activity should read the existing premis.xml from disk (if it exists) at the beginning of the activity, and write a premis.xml file to disk at the end of the activity.
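
The read-at-the-start, write-at-the-end shape described here might look roughly like the sketch below. It works on raw bytes with the standard library only so that it runs on its own; in the PR the in-memory value would be an `*etree.Document`. The names `updatePREMISFile`, `addEvent`, `skeleton`, and `demo` are hypothetical, and the XML is deliberately simplified.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// skeleton is a simplified placeholder for the empty PREMIS document.
const skeleton = "<premis></premis>"

// updatePREMISFile does all file I/O in one place: it loads the document
// (or starts from the skeleton if the file does not exist yet), hands it to
// update, and writes the result back.
func updatePREMISFile(path string, update func(doc []byte) ([]byte, error)) error {
	doc, err := os.ReadFile(path)
	switch {
	case errors.Is(err, fs.ErrNotExist):
		doc = []byte(skeleton)
	case err != nil:
		return fmt.Errorf("read %q: %w", path, err)
	}

	out, err := update(doc)
	if err != nil {
		return fmt.Errorf("update document: %w", err)
	}
	if err := os.WriteFile(path, out, 0o644); err != nil {
		return fmt.Errorf("write %q: %w", path, err)
	}
	return nil
}

// addEvent is a pure function with no file access, so it is easy to test.
func addEvent(doc []byte) ([]byte, error) {
	closing := []byte("</premis>")
	if !bytes.Contains(doc, closing) {
		return nil, errors.New("closing premis tag not found")
	}
	return bytes.Replace(doc, closing, []byte("<event/></premis>"), 1), nil
}

// demo runs the update twice against a temporary file and returns its content.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "premis-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "premis.xml")
	for i := 0; i < 2; i++ {
		if err := updatePREMISFile(path, addEvent); err != nil {
			return "", err
		}
	}
	out, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Only the file path crosses the activity boundary; the parsed document never has to be returned from an activity.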

@mcantelon
Contributor Author

Thanks @djjuhasz... should be good for another review!

Contributor

djjuhasz left a comment

@mcantelon I'll finish my code review tomorrow, but here are a few issues to work on in the meantime. Overall I think it's looking pretty good and the test coverage is great. 💪

PREMISFilePath: PREMISFilePathNormal,
Agent: premis.PREMISAgentDefault(),
},
result: activities.AddPREMISAgentResult{},
Contributor

djjuhasz Jun 25, 2024

Because result is always going to be an empty struct, I think you should test the contents of the written premis.xml file. The current tests won't catch a bug in the activity that results in an empty or incorrect premis.xml file.
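
A contrived sketch of the failure mode, using only the standard library: the stand-in activity below has a bug that writes an empty file, yet a comparison of the empty result struct still passes. All names here (`result`, `buggyActivity`, `checkFile`, `demo`) are hypothetical, and the real tests use the project's own test helpers rather than these functions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// result mirrors an activity result that carries no data.
type result struct{}

// buggyActivity reports success but writes an empty file.
func buggyActivity(path string) (result, error) {
	if err := os.WriteFile(path, nil, 0o644); err != nil {
		return result{}, fmt.Errorf("write %q: %w", path, err)
	}
	return result{}, nil
}

// checkFile compares the file at path with the expected content.
func checkFile(path, want string) error {
	got, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("read %q: %w", path, err)
	}
	if string(got) != want {
		return fmt.Errorf("unexpected content: got %q, want %q", got, want)
	}
	return nil
}

// demo reports whether the result-only check passes, and the error (if any)
// produced by checking the written file.
func demo() (resultOK bool, contentErr error, err error) {
	dir, err := os.MkdirTemp("", "premis-")
	if err != nil {
		return false, nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "premis.xml")
	res, err := buggyActivity(path)
	if err != nil {
		return false, nil, err
	}
	return res == (result{}), checkFile(path, "<premis></premis>"), nil
}

func main() {
	resultOK, contentErr, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("result check passed:", resultOK)
	fmt.Println("content check error:", contentErr)
}
```

The result comparison cannot distinguish a correct activity from the buggy one; only the content check does.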

Contributor Author

Fixed!

future.Get(&res)
assert.NilError(t, err)
assert.DeepEqual(t, res, tt.result)
Contributor

Again, I think you should test that the file contents meet expectations here.

Contributor Author

Fixed!


future.Get(&res)
assert.NilError(t, err)
assert.DeepEqual(t, res, tt.result)
Contributor

Test the file contents please. :)

@@ -24,6 +25,8 @@ func TestValidateFileFormats(t *testing.T) {
fs.WithFile("file2.png", pngContent),
).Path()

PREMISFilePath := fs.NewFile(t, "premis.xml", fs.WithContent(premis.EmptyPremis)).Path()
Contributor

I think it would be good to test the premis.xml file contents in these tests too.

Contributor Author

Fixed!

Contributor

djjuhasz left a comment

@mcantelon I've added some more comments, but it's getting confusing doing code review while you are making changes so I'm going to pause for now.

Your changes look good so far! :)

err := doc.ReadFromFile(filePath)
if err != nil {
return nil, err
Contributor

It would be helpful to add a bit of context to the error message here (and in general) to help a developer encountering the error to find where it originated.

e.g.

return nil, fmt.Errorf("parse XML: %v", err)
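
A slightly fuller sketch of adding context to an error. It uses `%w` instead of `%v`, which is a choice made for this example so that callers can still match the underlying error with `errors.Is`; the function names and the message text are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// readPREMIS wraps any failure with a short prefix that says which step
// failed, while keeping the original error in the chain.
func readPREMIS(path string) ([]byte, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("read PREMIS file: %w", err)
	}
	return b, nil
}

// demo tries to read a file that does not exist and reports the resulting
// message and whether the wrapped error still matches fs.ErrNotExist.
func demo() (msg string, notExist bool, err error) {
	dir, err := os.MkdirTemp("", "premis-")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)

	_, readErr := readPREMIS(filepath.Join(dir, "missing.xml"))
	if readErr == nil {
		return "", false, errors.New("expected an error for a missing file")
	}
	return readErr.Error(), errors.Is(readErr, fs.ErrNotExist), nil
}

func main() {
	msg, notExist, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(msg)
	fmt.Println("is not-exist error:", notExist)
}
```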

return doc, nil
}

func GetRoot(doc *etree.Document) (*etree.Element, error) {
Contributor

Does this need to be exported?

Contributor Author

Fixed!

jraddaoui force-pushed the dev/premis-events branch from 3539fd5 to 84e8e73 on June 28, 2024 11:44
Contributor

jraddaoui left a comment

Thanks @mcantelon and @djjuhasz!

I have not looked at the latest feedback and commits, just rebased, squashed and tested. I can see the premis.xml file in the final AIPs, it fails to be imported by Archivematica but it doesn't break the workflow. I'll merge what we have and we can follow up with the remaining feedback in another PR.

jraddaoui requested a review from djjuhasz on June 28, 2024 11:58
jraddaoui dismissed djjuhasz’s stale review on June 28, 2024 11:59

We need to release what we have, we can follow up in another PR.

jraddaoui merged commit 84e8e73 into main on Jun 28, 2024
9 checks passed
jraddaoui deleted the dev/premis-events branch on June 28, 2024 12:00