-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PREMIS events #24
Conversation
a96694c
to
3ccb2d0
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #24 +/- ##
==========================================
+ Coverage 62.79% 66.77% +3.97%
==========================================
Files 14 19 +5
Lines 586 936 +350
==========================================
+ Hits 368 625 +257
- Misses 198 261 +63
- Partials 20 50 +30 ☔ View full report in Codecov by Sentry. |
ade9c0c
to
6374c23
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @mcantelon!
I like the way you structured the premis
package and how you are building the XML in general. I added some comments below and I think you are/will be working with David, so I hope he can provide the missing pieces from this feedback.
In general, and this is a note to myself too (David and other devs here do a great work on it), we should try to:
- Give more visibility to the work being done. I'd appreciate meaningful commit messages, even if it's a WIP PR.
- Focus more on test driven development (TDD). With this stack is not easy to test your work running the workflow each time, TDD can help you with that for the majority of the development cycle, instead of doing it at the end (I didn't mention it yet, but there is a big lack of tests in this PR).
hack/sampledata/xsd/empty_premis.xml
Outdated
<?xml version="1.0" encoding="UTF-8"?> | ||
<premis:premis xmlns:premis="http://www.loc.gov/premis/v3" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/premis/v3 https://www.loc.gov/standards/premis/premis.xsd" version="3.0"> | ||
</premis:premis> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a local file could be a deployment issue. Since we are working with XML we could create this structure pretty easily, we could also embed the file into a variable like we do in the version package: https://github.com/artefactual-sdps/preprocessing-sfa/blob/main/internal/version/version.go#L11-L12.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
ctx context.Context, | ||
params *AddPREMISAgentParams, | ||
) (*AddPREMISAgentResult, error) { | ||
err := premis.AppendPREMISAgentXML(filepath.Join(params.Path, "/metadata/premis.xml"), params.Agent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing, feel free to ignore. The location of the PREMIS XML could change and this activity won't know about it, I'd join and pass the path to the XML file directly from workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
const AddPREMISAgentName = "add-premis-agent" | ||
|
||
type AddPREMISAgentActivity struct{} | ||
|
||
func NewAddPREMISAgent() *AddPREMISAgentActivity { | ||
return &AddPREMISAgentActivity{} | ||
} | ||
|
||
type AddPREMISAgentParams struct { | ||
Path string | ||
Agent premis.PREMISAgent | ||
} | ||
|
||
type AddPREMISAgentResult struct{} | ||
|
||
func (md *AddPREMISAgentActivity) Execute( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just cosmetic, I like having the types together (with the activity the latest) and then the functions that relate to the activity:
const AddPREMISAgentName = "add-premis-agent"
type AddPREMISAgentParams struct {
Path string
Agent premis.PREMISAgent
}
type AddPREMISAgentResult struct{}
type AddPREMISAgentActivity struct{}
func NewAddPREMISAgent() *AddPREMISAgentActivity {
return &AddPREMISAgentActivity{}
}
func (md *AddPREMISAgentActivity) Execute(...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fixed!
outcome := "valid" | ||
|
||
if params.Error != nil { | ||
detail = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
event := premis.PREMISEventSummary{ | ||
Type: params.Type, | ||
Detail: detail, | ||
Outcome: outcome} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bracket should go in a new line.
childEl := element.FindElement(fmt.Sprintf(".//%s[text()='%s']", path, value)) | ||
|
||
if childEl == nil { | ||
foundDifference = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could break
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
internal/premis/premis.go
Outdated
if d.IsDir() { | ||
return nil | ||
} | ||
subpaths = append(subpaths, string([]rune(p)[subpathStart:])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this solution a little hard to understand. Could we use https://pkg.go.dev/path/filepath#Rel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
internal/workflow/preprocessing.go
Outdated
activities.AddPREMISObjectsName, | ||
&activities.AddPREMISObjectsParams{ | ||
Path: localPath, | ||
ContentPath: identifySIP.SIP.ContentPath}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bracket. I'll stop flagging them, I hope they get fixed by linting, otherwise you get the point ;)
if e != nil { | ||
return nil, e | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need considerable refactoring after you rebase the latest work from David. I think you where mobbing with him today, so I hope he provides the necessary feedback to you. Otherwise, let me know if you need to know a little more about what and why it changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
internal/workflow/preprocessing.go
Outdated
// Add PREMIS event noting validate file formats result. | ||
e = temporalsdk_workflow.ExecuteActivity( | ||
withLocalActOpts(ctx), | ||
activities.AddPREMISEventName, | ||
&activities.AddPREMISEventParams{ | ||
Path: localPath, | ||
Agent: premis.PREMISAgentDefault(), | ||
Type: "validateFileFormats", | ||
Error: validateFileFormatsErr}, | ||
).Get(ctx, &addPREMISEvent) | ||
if e != nil { | ||
return nil, e | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may want to do some of these updates to the PREMIS files from inside the existing activities. That will give you access to the actual file formats and how the validation went for each file while it's happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reworked the validate file formats activity to add individual per-file events.
6374c23
to
cdc1f18
Compare
Thanks @jraddaoui ! I've addressed most of the feedback, but will fix the rest, add more tests, and add PREMIS event appending from within activities. |
internal/premis/premis.go
Outdated
return doc, PREMISEl, nil | ||
} | ||
|
||
func AppendPREMISObjectXML(PREMISfilepath string, object PREMISObject) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mcantelon I realized after pair programming yesterday that the standard Go pattern for file I/O would be to pass an io.Writer
param to a funciton like this, and write to that stream instead of writing directly to a file. Using io.Writer
is more flexible then direct file I/O and makes it easier to test the output without dealing with opening and closing actual files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented this! Let me know if it needs tweaking.
} | ||
|
||
// Add PREMIS event noting validate structure result. | ||
var addPREMISEvent activities.AddPREMISEventResult |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the things I learned from @sevein's work on adding telemetry to Enduro is that there is a significant run time cost to scheduling and then running Temporal activities. Because of this cost I think it would be better to combine all of the "addPREMIS" activities into a single activity, unless there's some reason to keep them separate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I was having them be separate so the PREMIS file state would reflect where the workflow failed, if an error interrupted its execution, but that's probably not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mcantelon. Thanks for implementing io.Writer
where you have, but I think this will be improved even more if you can take the principle further. I think the idiomatic Go design would be to handle all the file I/O in the PREMIS activities, and exclusively use io.Reader
and io.Writer
in the premis
package. I think in the context of an XML document you can pass around the *etree.Document
representing the PREMIS XML tree instead of a reader or a writer, so you don't have to keep parsing and serializing the document struct.
I've made some inline comments to try and provide some concrete examples of how I think the implementation could be improved. The comments are not meant to be comprehensive, just to provide a template that you can repeat for the other premis
package functions.
@mcantelon I just realized I may not have been clear that passing around |
Thanks @djjuhasz... should be good for another review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mcantelon I'll finish my code review tomorrow, but here's a few issues to work on in the meantime. Overall I think it's looking pretty good and the test coverage is great. 💪
PREMISFilePath: PREMISFilePathNormal, | ||
Agent: premis.PREMISAgentDefault(), | ||
}, | ||
result: activities.AddPREMISAgentResult{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because result
is always going to be an empty struct, I think you should test the contents of the written premis.xml
file. The current tests won't catch a bug in the activity that results in an empty or incorrect premis.xml file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
|
||
future.Get(&res) | ||
assert.NilError(t, err) | ||
assert.DeepEqual(t, res, tt.result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think you should test that the file contents meet expectations here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
|
||
future.Get(&res) | ||
assert.NilError(t, err) | ||
assert.DeepEqual(t, res, tt.result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test the file contents please. :)
@@ -24,6 +25,8 @@ func TestValidateFileFormats(t *testing.T) { | |||
fs.WithFile("file2.png", pngContent), | |||
).Path() | |||
|
|||
PREMISFilePath := fs.NewFile(t, "premis.xml", fs.WithContent(premis.EmptyPremis)).Path() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would good to test the premis.xml file contents in these tests too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mcantelon I've added some more comments, but it's getting confusing doing code review while you are making changes so I'm going to pause for now.
Your changes look good so far! :)
internal/premis/premis.go
Outdated
|
||
err := doc.ReadFromFile(filePath) | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to add a bit of context to the error message here (and in general) to help a developer encountering the error to find where it originated.
e.g.
return nil, fmt.Errorf("parse XML: %v")
internal/premis/premis.go
Outdated
return doc, nil | ||
} | ||
|
||
func GetRoot(doc *etree.Document) (*etree.Element, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
3539fd5
to
84e8e73
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mcantelon and @djjuhasz!
I have not looked at the latest feedback and commits, just rebased, squashed and tested. I can see the premis.xml file in the final AIPs, it fails to be imported by Archivematica but it doesn't break the workflow. I'll merge what we have and we can follow up with the remaining feedback in another PR.
We need to release what we have, we can follow up in another PR.
No description provided.