improve procedures for authority management? #6

Open
holfordm opened this issue Dec 7, 2017 · 3 comments

Comments

@holfordm
Collaborator

holfordm commented Dec 7, 2017

Currently authority control is done through three large files for persons, places, and works respectively. This has worked well for the medieval project which has had a centralized cataloguing structure. As we move to a more decentralized structure, first with Fihrist, later with expanding medieval to the Oxford colleges and possibly Cambridge, the use of single large files is likely to be problematic. It is likely that multiple editors will make changes to the files simultaneously, resulting in complex conflicts and general frustration.

If we agree that this is an issue, I can think of two potential solutions.

  1. (the easiest). split the files into individual files, one for each entity, retaining the current identifiers. This would greatly reduce the likelihood of conflicts and would make any that did arise much easier to resolve. Changes to the existing indexing processes should not be that great?
  2. move to a dedicated authority management system. I have used EATS https://github.com/ajenhl/eats in the past which worked well but might need a lot of customisation to be suitable for our projects. It might also be possible to set something up using eXist?
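
A minimal sketch of option 1, assuming the authority file is a TEI `listPerson` whose `person` entries each carry an `xml:id`. The sample data, the ID values and the `<id>.xml` naming scheme are hypothetical, chosen only for illustration:

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample data; the real persons file is far larger.
SAMPLE = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="person_1"><persName>Example Person One</persName></person>
  <person xml:id="person_2"><persName>Example Person Two</persName></person>
</listPerson>"""


def split_authority(xml_text):
    """Return {filename: xml_string}, one entry per person, keeping each xml:id."""
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
    root = ET.fromstring(xml_text)
    files = {}
    for person in root.findall(f"{{{TEI_NS}}}person"):
        person_id = person.get(XML_ID)
        if person_id is None:
            raise ValueError("person element without xml:id")
        person.tail = None  # drop the whitespace that followed the element
        # Assumed naming scheme: one file per entity, named after its identifier.
        files[f"{person_id}.xml"] = ET.tostring(person, encoding="unicode")
    return files


individual_files = split_authority(SAMPLE)
for name in sorted(individual_files):
    print(name)
```

Because every identifier is carried over unchanged, existing references from manuscript descriptions would still resolve; only the indexing step would need to read many files instead of one.
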
@andrew-morrison
Contributor

As discussed this morning, I've been experimenting with using some XML technologies to get some of the benefits of a dedicated authority management system, without the delay that selecting and setting one up would cause.

What I've added is:

  • A new 'authority' folder with subfolders for managing multiple contributors.
  • A 'persons_master.xml' authority file that uses XInclude to pull in individual authority files.
  • A new Schematron rule so that ID clashes will be validation errors even when editing individual files.
  • An XSLT stylesheet for previewing persons_master.xml, to provide a user-friendly search interface for cataloguers to find the IDs of people mentioned in their manuscript descriptions, and authority file editors (if they're not the same person) to check before adding a new person.
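
The ID-clash check can be illustrated outside Schematron. The following is only a sketch in Python of the same idea (collect every `xml:id` across the individual files and report any that occur more than once), not the rule that was actually added. The file contents and ID values are hypothetical stand-ins:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical stand-ins for the individual authority files.
FILES = {
    "bodleian/persons1.xml": """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="person_1"/><person xml:id="person_2"/>
</listPerson>""",
    "bodleian/persons2.xml": """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="person_2"/>
</listPerson>""",
}


def find_id_clashes(files):
    """Map each xml:id used more than once to the files that define it."""
    seen = defaultdict(list)
    for name, text in files.items():
        root = ET.fromstring(text)
        for person in root.iter(f"{{{TEI_NS}}}person"):
            person_id = person.get(XML_ID)
            if person_id is not None:
                seen[person_id].append(name)
    return {pid: names for pid, names in seen.items() if len(names) > 1}


clashes = find_id_clashes(FILES)
for person_id, names in clashes.items():
    print(f"{person_id} is defined in: {', '.join(names)}")
```

A check like this has to read every file on each run, which is the same cost that makes validation across multiple files slower.
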

I've copied your persons.xml authority file into persons1.xml in a subfolder called 'bodleian' and created a new persons2.xml with a single entry - a deliberate duplicate for demonstration purposes - so you can open either file in Oxygen, validate, and it will take you to the duplicate. These files can be renamed to whatever you want, just update the href attributes in persons_master.xml file.

Limitations/issues:

  • Having to check across multiple files inevitably slows validation a bit.
  • Different XML parsers support different XInclude/XPointer syntax. The one used (in persons_master.xml) is what works with Saxon and therefore the Oxygen XML editor. But it means you can only import one listPerson element per file, and it is referenced by its order in the file, so if it is moved the link will break.
  • The in-browser preview uses external JavaScript libraries to create the nice user interface. It may be better to take copies of those (they're MIT license open source) so they don't have to be downloaded each time.
  • On older/slower machines, or if the number of entries gets very big, it might become unusably slow.
  • Google Chrome and Microsoft Edge don't allow in-browser XSL transformation of XML files on the local hard drive. It does work in Firefox, Safari and Internet Explorer 11. Or people could transform into an HTML file in Oxygen and open that.
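
The fragility of referencing a listPerson by its order in the file can be shown with a small Python sketch. It does not reproduce the XPointer syntax used in persons_master.xml; it only contrasts a lookup by position with a lookup by identifier. The `xml:id` values on the `listPerson` elements and all sample data are invented for the example:

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical file contents; the xml:id values are invented.
BEFORE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <listPerson xml:id="people"><person xml:id="person_1"/></listPerson>
</body>"""

# The same file after an editor adds another list ahead of the original one.
AFTER = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <listPerson xml:id="drafts"><person xml:id="person_99"/></listPerson>
  <listPerson xml:id="people"><person xml:id="person_1"/></listPerson>
</body>"""


def first_list_by_position(xml_text):
    """Positional lookup: the first listPerson child in document order."""
    root = ET.fromstring(xml_text)
    return root.find(f"{{{TEI_NS}}}listPerson")


def list_by_id(xml_text, wanted):
    """Identifier lookup: the listPerson whose xml:id matches, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter(f"{{{TEI_NS}}}listPerson"):
        if element.get(XML_ID) == wanted:
            return element
    return None


def person_ids(list_element):
    return [p.get(XML_ID) for p in list_element.findall(f"{{{TEI_NS}}}person")]


print(person_ids(first_list_by_position(BEFORE)))
print(person_ids(first_list_by_position(AFTER)))
print(person_ids(list_by_id(AFTER, "people")))
```

After the edit, the positional lookup silently returns the wrong list, while the identifier lookup still finds the intended one. Whether an ID-based reference can be used in practice depends on what the XInclude processor supports.
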

@holfordm: If you have time to try this out, let me know how well it works for you. I can easily change what is displayed in the preview (e.g. add another column for birth year?).

If you want to start using this, let me know because there are a few extra steps. It only works with people authority files at the moment, but it is trivial to set up analogous code for places, organisations and works. And the indexing scripts would need to be pointed to the new 'master' authority files.

andrew-morrison added a commit to fihristorg/fihrist-mss that referenced this issue Jan 9, 2018
@andrew-morrison
Contributor

@eifionjones: I probably should have tagged you in on this issue earlier, but it has taken me a while to get my head around the issues.

As Matthew says above, authority files for things like works and people are going to be a nightmare if lots of people need to update them, potentially all at the same time.

So I have developed an experimental system for building one authority list out of multiple individual files, helping people to avoid clashing IDs, and providing a user interface for viewing existing entries. I've now set this up in the fihrist-mss repository. It's just for demonstration purposes at the moment, and only works for person authority lists, but if you have time, it would be good to get your feedback.

If you update your local copy, then open authority/persons_master.xml in a web browser (Firefox, Safari or Internet Explorer) you can see 18 people I've copied out of Medieval's authority list. But none of them are actually in that file. Instead they are imported from three separate persons.xml files, each in a separate subfolder. Open either the Oxford or Cambridge file in Oxygen, validate it, and it will find the deliberate duplicate I have added. Switch back to the web browser and paste the ID into the search box, and it'll show you both.

Is this likely to be useful for Fihrist?

@eifionjones

This does look good! I can see this being useful, we'll just have to get our heads around how people will manage authorities in Fihrist. I had somehow envisaged the authority files being generated from the data set (with maybe a Fihrist lookup) with no need for manual editing/intervention. And a separate index of names and identifiers only for people not in Fihrist. Anyway, let me revisit this and get back to you.
