
Use case: Equivalent of "namespacing" Fedora to accommodate multisites #396

Open
rosiel opened this issue Oct 12, 2016 · 37 comments
Labels
Subject: Multi-tenancy (related to having content from multiple Drupal sites in one system); Type: use case (proposes a new feature or function for the software using user-first language)

Comments

@rosiel
Member

rosiel commented Oct 12, 2016

I was told to create an issue for this; apologies if it's a duplicate.

Title (Goal): Ability to partition Islandoras on the same Fedora
Primary Actor: Repository Admin, Sysadmin
Scope: Islandora Site Architecture
Level: Medium?
Story: In order to have less work to do maintaining separate Fedora stacks, I as a sysadmin/repository admin would like the ability to use a Drupal multisite with separate Islandoras on the same Fedora. As a repo admin who provides sites for others, I have some client sites that need permission to manage their own objects but not objects that belong to a different namespace/site. We sometimes need to present certain (select) objects as a whole 'site' of their own, with different associated themes or framing Drupal content. (This looks toward a related issue/use case: 'Can I set up an exhibit of select content?') I am worried about this in the context of two-way sync, which would push all Fedora content into all of my Islandora sites as Drupal content.
@rosiel changed the title from Equivalent of "namespacing" Fedora to accommodate multisites to Use case: Equivalent of "namespacing" Fedora to accommodate multisites on Oct 12, 2016
@DiegoPino
Contributor

@rosiel cool! This really needs to be discussed. My first guess would be: we could decide what to sync, and where to sync it, based on a given, arbitrary, and configurable predicate. But this needs to be explored further, mostly because if sync is happening from Fedora to Drupal 8 (e.g., the resource did not originate in Islandora), then that sync utility would need a reverse map for this, something like:
ns1:predicateA == value1 -> sync with URL1
And that map would not be stored in Drupal but in some config available to camel.
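A minimal sketch of such a reverse map, assuming it is loaded as a simple lookup table by whatever component performs the sync. The predicate, values, and site URLs below are hypothetical placeholders, not part of any existing Islandora or Camel configuration:

```python
from typing import Optional

# Hypothetical reverse map: (predicate, value) -> Drupal site to sync with.
# In the proposal this would live in config available to Camel, not in Drupal.
SYNC_MAP = {
    ("ns1:predicateA", "value1"): "https://site1.example.org",
    ("ns1:predicateA", "value2"): "https://site2.example.org",
}


def sync_target(triples: list[tuple[str, str]]) -> Optional[str]:
    """Return the site URL for the first matching (predicate, value) pair, or None."""
    for predicate, value in triples:
        url = SYNC_MAP.get((predicate, value))
        if url is not None:
            return url
    return None  # no rule matched: this resource is not synced to any site


print(sync_target([("dc:title", "Example"), ("ns1:predicateA", "value1")]))
# https://site1.example.org
```

First match wins here; a real map would also need a policy for resources that match rules for more than one site.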

Or we could define that URL as an RDF property?

There are many, many ways to define the same thing.

Thanks a lot!!

@dannylamb
Contributor

I think the easiest way to deal with this would be structure. From the repository root, you'd need to have separate containers for each multisite. Then you could re-index per container. That pattern could be applied to multitenancy with appropriate authz (authorization).

@DiegoPino
Contributor

@dannylamb you mean LDP-based? I can see some scalability issues with that, specifically if we use the default PID minter, which is handy for avoiding an unbalanced tree. It also makes filtering in a triple store kind of complex (e.g., "show me all objects that are descendants of..." What if that descendant is 5 steps away, via different predicates?). Good talk for the next CLAW call!

@dannylamb
Contributor

Have I mentioned how much I hate that semantics and storage are jumbled up in Fedora?

@rosiel
Member Author

rosiel commented Oct 14, 2016

"That pattern could be applied to multitenancy with appropriate authz." -> this sounds cool but is way over my head. What should I read to fill in my blanks?

@whikloj
Member

whikloj commented Oct 17, 2016

@rosiel I think this sounds more complex than it is. In my mind @dannylamb is proposing a Fedora 4 repo structure of

Fedora 4 root
|- /site1
|   |- /objects in site 1 
|
|- /site2
     |- /objects in site 2

Then you can set authorization based on the root-level elements, i.e., Bob is admin of site1 and Jane is admin of site2, but neither can access the other's repository contents.
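To make the authz part concrete, here is a minimal sketch of that rule, assuming access is decided purely by the top-level container a resource lives in. The admin table and paths are hypothetical, and this does not model Fedora's actual authorization module (WebAC); it only shows the decision logic:

```python
from pathlib import PurePosixPath

# Hypothetical ACL: which users administer which top-level site container.
SITE_ADMINS = {
    "site1": {"bob"},
    "site2": {"jane"},
}


def site_of(resource_path: str) -> str:
    """Return the top-level container of a path like '/site1/objects/abc'."""
    parts = PurePosixPath(resource_path).parts  # ('/', 'site1', 'objects', 'abc')
    if len(parts) < 2:
        raise ValueError(f"path is not inside a site container: {resource_path!r}")
    return parts[1]


def can_administer(user: str, resource_path: str) -> bool:
    """A user may administer a resource only if they administer its site container."""
    return user in SITE_ADMINS.get(site_of(resource_path), set())
```

Because the rule only looks at the first path segment, everything ingested under /site1 inherits the same permissions, which is the appeal of the container-per-site layout.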

But @DiegoPino is right that this might have issues of unbalanced trees. Perhaps we should pull @ruebot in and have him do one of his performance and scaling massive ingests to see how it works if you create 3-4 root level objects and ingest a ratio of objects into each.

Like

Fedora 4 root
|- /site1 (ingest objects)
|
|- /site2 (ingest 1/2 as many as site1)
|
|- /site3 (ingest 1/4 as many as site1)
|
|- /site4 (ingest 1/8 as many as site1)

and see how ingest and response times go? This test would be directly on Fedora and so could avoid any PHP/Drupal issues in its timing.
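The proposed ratios can be written down as a small plan generator. The base count of 80,000 is an arbitrary example, not a figure from this discussion:

```python
def ingest_plan(base_count: int, sites: int = 4) -> dict[str, int]:
    """Objects to ingest per top-level container: each site gets half of the previous one."""
    return {f"site{i + 1}": base_count // 2**i for i in range(sites)}


print(ingest_plan(80_000))
# {'site1': 80000, 'site2': 40000, 'site3': 20000, 'site4': 10000}
```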

@ajs6f

ajs6f commented Oct 19, 2016

There is no (performance) problem at all with an unbalanced tree, at least from the Fedora side. The problem is having too many children of a single node/resource.

@ajs6f

ajs6f commented Oct 19, 2016

That's what the pair-tree PID minter protects against.
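A rough sketch of the idea behind a pair-tree minter, assuming UUID-based identifiers split into short leading segments. Fedora 4's default minter works along these lines, but the segment length and count here are illustrative assumptions, not its configuration:

```python
import uuid


def pairtree_path(identifier: str, seg_len: int = 2, seg_count: int = 4) -> str:
    """Prefix an identifier with short path segments taken from its own leading characters.

    'abcdef12-...' -> 'ab/cd/ef/12/abcdef12-...'
    """
    compact = identifier.replace("-", "")
    segments = [compact[i * seg_len:(i + 1) * seg_len] for i in range(seg_count)]
    return "/".join(segments + [identifier])


def mint() -> str:
    """Mint a new path from a random UUID."""
    return pairtree_path(str(uuid.uuid4()))
```

With hex characters and two-character segments, each intermediate container can have at most 16 × 16 = 256 direct children, so no single parent accumulates a huge number of children even though the path is meaningless.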

@whikloj
Member

whikloj commented Oct 20, 2016

@ajs6f when you say "too many children of a single node", do you mean just having a tonne of children (at any depth) under a single node, or do the children have to be direct children of the single node?
i.e.:

<root node>
      |- <child 1>
             |- <sub 1>
             |     |- <sub sub 1>
             |- <sub 2>
             |     |- <sub sub 2>

versus

<root node>
      |- <child 1>
             |- <sub 1>
             |- <sub 2>
             |- <sub 3>
             |- <sub 4>

@DiegoPino
Contributor

@whikloj I guess @ajs6f means direct children. Otherwise any type of tree would end up being a disaster.

@ajs6f

ajs6f commented Oct 20, 2016

Yes, as @DiegoPino says, it's too many immediate/direct children that are a problem.
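Applied to the two example trees above, the distinction can be checked with a small helper that counts only immediate children per parent. The paths are made-up stand-ins for the diagrams, and where the threshold for "too many" lies is not settled in this thread:

```python
from collections import Counter
from pathlib import PurePosixPath


def widest_parent(paths: list[str]) -> tuple[str, int]:
    """Return the parent with the most direct children, and that child count.

    Grandchildren are counted under their own parent, so a deep tree scores low
    and a wide, flat one scores high.
    """
    if not paths:
        raise ValueError("no paths given")
    counts = Counter(str(PurePosixPath(p).parent) for p in paths)
    return counts.most_common(1)[0]


deep = ["/root/child1", "/root/child1/sub1", "/root/child1/sub1/subsub1",
        "/root/child1/sub2", "/root/child1/sub2/subsub2"]
flat = ["/root/child1"] + [f"/root/child1/sub{i}" for i in range(1, 5)]
print(widest_parent(deep))  # ('/root/child1', 2)
print(widest_parent(flat))  # ('/root/child1', 4)
```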

@ajs6f

ajs6f commented Oct 20, 2016

In fact, if you can guarantee by other means (particularly by controlling your own id minting) that you won't stick too many children under a single parent, then you shouldn't use the hierarchy builder minter. You should just use PUT and stick things wherever it makes sense.
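For illustration, creating a resource at a path you chose yourself is an HTTP PUT to that URL. This sketch only builds the request with Python's standard library; the base URL is an assumption (it varies by install) and the Turtle body is a placeholder:

```python
import urllib.request

FEDORA_BASE = "http://localhost:8080/rest"  # assumed endpoint; adjust for your install


def put_request(path: str, turtle: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that creates a resource at an exact, caller-chosen path."""
    return urllib.request.Request(
        f"{FEDORA_BASE}/{path.lstrip('/')}",
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle"},
    )


req = put_request("/site1/abc123", '<> <http://purl.org/dc/terms/title> "Example" .')
# urllib.request.urlopen(req) would actually send it.
```

The point of the design is that the caller, not the server-side minter, owns the path, so keeping parents from growing too wide becomes the caller's responsibility.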

@whikloj
Member

whikloj commented Oct 20, 2016

So I have two concerns here:

  1. How many children is too many?
  2. As Islandora CLAW is attempting a lower barrier to entry, it might be a lot of work to create an id minting strategy for all use cases.

@ajs6f

ajs6f commented Oct 20, 2016

  1. I do not know. It's a good question. I'd take it to #fcrepo or the email list.
  2. Yes, but I think you probably can, actually, if you are controlling the Fedora IDs (as opposed to Drupal IDs). But I don't claim to fully understand the ID management in the current architecture. Maybe this is a good topic for a CLAW call? (I would be happy to join.)

@rosiel
Member Author

rosiel commented Oct 20, 2016

It sounds like you are suggesting that the hierarchical structure built into Fedora 4 objects would sometimes be meaningful and sometimes arbitrary. Does this sound like a solid plan? (I am not being sarcastic; I actually don't know.) Would it be better to include an extra, hereditary predicate and let PID minters populate the hierarchy for optimal storage/retrieval?

@ajs6f

ajs6f commented Oct 20, 2016

No, what I am telling you is that the hierarchical structure that is built into Fedora 4 Objects is now sometimes meaningful and sometimes arbitrary if you use the hierarchical ID minter. I'm suggesting you decide whether you can avoid that. I don't know what the phrase "hereditary predicate" means.

@uconnjeustis

On the Islandora Metadata Interest Group, a discussion was started on OAI-PMH support. In addition to some wanted features, the idea of namespaces came up. Our use case is different from that of @rosiel, and I wanted to add it here.

Title (Goal): Ability to distinguish and/or assign content to multiple institutions
Primary Actor: Sysadmin, Repository Admin, Repository curators
Scope: Islandora Site Architecture
Level: Medium?
Story: Currently, the Connecticut Digital Archive (CTDA) works with over 40 institutions who add and manage content in the repository and in multiple sites. To distinguish one institution's content from another's, CTDA implements namespaces. Each institution has a namespace range. For example, 20002-29999 is the namespace range for UConn Archives & Special Collections (ASC). This lets UConn ASC have general content in the 20002 namespace, research data in 20003, and university records in 20004. Each institution has such a range, where the first one or two digits never change. We use namespaces not only to distinguish content from different institutions (and, within an institution, different types of content) but also across various sites. For example, we have a site for UConn ASC and CT State Library. For CTDA, we really need an easy way for institutions and users to quickly determine whether content is theirs. Namespaces allow us to do that, especially as they appear in the PID, in the URL, etc. Going forward, we need a way to ensure these institutional distinctions remain in place and can be continued in such a way that non-technical volunteers can easily assign content to a particular institution.
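The range scheme in the story amounts to a lookup from a PID's numeric namespace to an institution. A minimal sketch, assuming PIDs of the form namespace:id with a numeric namespace; the example PIDs and the second range are made up:

```python
from typing import Optional

# Only the UConn ASC range and its sub-namespaces come from the use case above;
# the second range is a hypothetical placeholder.
RANGES = [
    (20002, 29999, "UConn Archives & Special Collections"),
    (30000, 39999, "Example Institution (hypothetical)"),
]
CONTENT_TYPES = {20002: "general", 20003: "research data", 20004: "university records"}


def namespace_of(pid: str) -> int:
    """Numeric namespace of a PID such as '20003:1234' (raises ValueError otherwise)."""
    return int(pid.split(":", 1)[0])


def institution_for(pid: str) -> Optional[str]:
    """Institution whose namespace range contains the PID's namespace, if any."""
    ns = namespace_of(pid)
    for low, high, name in RANGES:
        if low <= ns <= high:
            return name
    return None


def content_type_for(pid: str) -> Optional[str]:
    """Content type for the sub-namespaces listed in the story, if known."""
    return CONTENT_TYPES.get(namespace_of(pid))
```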

@ruebot
Member

ruebot commented Jan 4, 2017

@uconnjeustis can you create a separate issue for this if this is a separate use case? Also, I think it would be a really good idea to talk this out on a future CLAW call, so please do not hesitate in adding it to the agenda, and attending the meeting.

@ajs6f

ajs6f commented Jan 4, 2017

Not a CLAW-specific issue, either. It might be worth bringing up on a Fedora call; some documentation of best practices would be good.

@uconnjeustis

uconnjeustis commented Jan 6, 2017

My use case, which is slightly different from but related to this issue, is now in Islandora-CLAW/CLAW-478. Please direct responses there. Thanks!

I just came back from vacation and think I missed the last CLAW meeting. I'll check the schedule and try to hop on the next one.

@ajs6f

ajs6f commented Jan 6, 2017

@uconnjeustis ++

@mjordan
Contributor

mjordan commented Aug 27, 2018

Should the current migration sprint account for how to make Fedora 3.x PID namespaces migrate over losslessly? Just askin'. Related issue: #822.

@ajs6f

ajs6f commented Aug 27, 2018

I would think mapping PID namespaces to LDP containers would be best.

@dannylamb
Contributor

I think organizing objects by putting them in a container per namespace would separate them out nicely if you really want to solidify the distinction. FWIW, so long as we store the PID in a field somewhere, we can then query on it to do things like "Get me all objects that were in namespace X".
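Both options can be sketched in a few lines, assuming 7.x PIDs of the form namespace:id. The container layout is a hypothetical mapping used for illustration, not an existing migration feature:

```python
def container_for(pid: str) -> str:
    """Map a 7.x PID like 'islandora:123' to a per-namespace container path."""
    namespace, local_id = pid.split(":", 1)
    return f"/{namespace}/{local_id}"


def in_namespace(stored_pids: list[str], namespace: str) -> list[str]:
    """'Get me all objects that were in namespace X', using the PIDs kept on a field."""
    return [pid for pid in stored_pids if pid.split(":", 1)[0] == namespace]
```

Note that container_for puts every object of a namespace directly under one container, which is exactly the many-direct-children pattern discussed above; the field-based query avoids that by not depending on the storage layout at all.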

@mjordan
Contributor

mjordan commented Aug 27, 2018

Do containers suffer from the many-direct-children scalability issue discussed above?

@ajs6f

ajs6f commented Aug 27, 2018

Fedora suffers from that problem. There's nothing inherent in LDP that causes that problem, but to the extent that you're committed to Fedora, you would have to deal with it.

@DiegoPino
Contributor

Worth mentioning here: it's unhealthy to think about multisites in a D8/CLAW context the way they were applied in Islandora 7.x. Multisites, by definition, imply different DB tables (not speaking about the domain access module here), meaning one site cannot access another site's entities. That makes splitting, or better said reusing, nodes/entities from one site to another extremely complex, not recommended, or even impossible without hacking (now speaking about the domain access module: https://www.drupal.org/project/domain).
For Islandora 7.x that was not an issue, since no DOs (digital objects) were ever stored in Drupal; everything was read live from ol' Fedora 3.
The opposite is the case here: the pipe goes in a single direction. So "namespacing", at least for that purpose, makes less sense. I would say that if UI-side "separation" is needed or desired, then simple taxonomy work, like a generic tag system (this object belongs to this group), plus awareness of that tag in each view that lists or displays objects and in Context module configuration, should suffice. The moment you expose storage/backend implications like LDP containment, Fedora paths, and minting, and depend on them in a system like CLAW that never accesses or controls them directly, you are opening Pandora's box or signing a contract you won't be able to keep in the long term.

FYI: There have been a lot of discussions about the whole multisite approach here: https://www.drupal.org/project/drupal/issues/2306013
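The tag-based alternative can be illustrated with a tiny model: one set of entities, each tagged with the groups it belongs to, and listings that filter on the tag. This is a plain-Python stand-in for a taxonomy field plus a view filter, with made-up data; it shows how a single entity can appear in several "sites" without being duplicated:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    nid: int
    title: str
    groups: set[str] = field(default_factory=set)  # hypothetical "belongs to group" tags


def listing_for(nodes: list[Node], group: str) -> list[int]:
    """What a group-aware view would list: only nodes tagged with that group."""
    return [n.nid for n in nodes if group in n.groups]


nodes = [
    Node(1, "Object A", {"site1"}),
    Node(2, "Object B", {"site1", "site2"}),  # one entity reused by two groups
    Node(3, "Object C", {"site2"}),
]
```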

@jpeak5

jpeak5 commented Aug 27, 2018

FWIW, we have been using namespace prefixes in D7 to accomplish multisite without actually using multisite. We serve a consortium of ~20 members, each with its own namespace prefix; using this scheme lets us support the idea of 'sub-institutions' (to arbitrary depth, in theory).

I'm glad to share more, and at the very least, we have plenty of data like this that we could use to test a migration along the lines proposed above.

We don't use a RELS-EXT to define the relationship, so every collection is really just a child of root.


root
- lsu-abc:collection
- lsu-sc-abc:collection
- latech-cmprt:collection

Effectively, however, this flat example represents two top-level institutions, lsu and latech, and one subinstitution of lsu, lsu-sc:

root
  - lsu-*
      - lsu-sc-*
  - latech-*
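One way to recover that implied hierarchy from a flat list of PIDs is prefix matching on the hyphen-separated namespace. A minimal sketch, assuming the set of institution prefixes is known in advance (the comment does not say how it is maintained):

```python
# Known institution prefixes from the example above (assumed to be maintained elsewhere).
INSTITUTIONS = {"lsu", "lsu-sc", "latech"}


def institution_chain(pid: str) -> list[str]:
    """'lsu-sc-abc:collection' -> ['lsu', 'lsu-sc'] (top-level first, most specific last)."""
    parts = pid.split(":", 1)[0].split("-")
    prefixes = ("-".join(parts[:i]) for i in range(1, len(parts) + 1))
    return [p for p in prefixes if p in INSTITUTIONS]
```

Because membership is computed from the namespace string, sub-institutions can nest to any depth without any RELS-EXT relationship, matching the flat structure described above.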

@mjordan
Contributor

mjordan commented Aug 29, 2018

If we're storing the 7.x PID as per #822, and we're creating taxonomies as described in #888, maybe we should provide an option to create and populate a taxonomy of PID namespaces and assign the relevant value to each new CLAW node on the fly during migration. That way, we get the ability immediately after migration to do some of the things in CLAW we were doing in the source 7.x with PID namespaces.

I'm not suggesting we do this during the migration sprint, but maybe after. Might be a good first issue for someone (like me but it doesn't necessarily have to be me) to take on.
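The suggested option boils down to two steps per migrated node: derive the namespace from the stored PID, and reference one term per distinct namespace. A minimal sketch with hypothetical node IDs; term IDs are assigned locally here, whereas a real migration would let Drupal create the terms and assign their IDs:

```python
def namespace_terms(field_pids: dict[int, str]) -> tuple[dict[str, int], dict[int, int]]:
    """Given node id -> 7.x PID, create one term per namespace and tag each node.

    Returns (namespace -> term id, node id -> term id).
    """
    namespaces = sorted({pid.split(":", 1)[0] for pid in field_pids.values()})
    terms = {ns: tid for tid, ns in enumerate(namespaces, start=1)}
    tags = {nid: terms[pid.split(":", 1)[0]] for nid, pid in field_pids.items()}
    return terms, tags
```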

@dannylamb
Contributor

Linking to #926

@mjordan
Contributor

mjordan commented Dec 22, 2018

Now that migrate_7x_claw migrates the 7.x object's PID to the corresponding D8 node's field_pid, we can get the 7.x object's namespace from that and do stuff with it. This could be handled with a Context Condition that parses out the namespace from the string stored in field_pid.

Related issue: #822.
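The decision such a condition has to make is small. Here is the logic in Python for illustration only; an actual Context condition would be a PHP plugin, and the function names and registered namespaces below are hypothetical:

```python
from typing import Optional


def pid_namespace(field_pid: str) -> str:
    """'islandora:123' -> 'islandora'."""
    namespace, sep, _ = field_pid.partition(":")
    if not sep or not namespace:
        raise ValueError(f"not a PID: {field_pid!r}")
    return namespace


def namespace_condition(field_pid: Optional[str], registered: set[str]) -> bool:
    """Evaluate to True only for nodes whose stored PID has a registered namespace."""
    if not field_pid:
        return False  # e.g. a node created directly in Drupal, with no 7.x PID
    return pid_namespace(field_pid) in registered
```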

@mjordan
Contributor

mjordan commented Dec 23, 2018

Following from my previous comment, I've written a Context condition plugin that will be useful for objects migrated from 7.x. It tests the namespace part of the PID in a D8 islandora_object node's field_pid field, which we now get in migrations using https://github.com/Islandora-Devops/migrate_7x_claw.

Here's the configuration form of a context that uses it, with a reaction (which is part of the core Context module) being to use the Bartik theme:

[Screenshot: configuration form of the context]

Here's a screenshot of a node that has one of the registered namespaces:

[Screenshot: node with one of the registered namespaces]

And a screenshot of a node that does not have one of the registered namespaces (i.e., reaction isn't executed):

[Screenshot: node without a registered namespace]

Currently, we don't have any Context reactions that would be useful in a "multisite" setup (just to bring this back to @rosiel's original use case), but it would be possible to write some reactions that replicate 7.x multisite behavior.

If people think this Context condition will be useful, I can open a PR against https://github.com/Islandora-CLAW/islandora to add it.
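A reaction that replicates part of 7.x multisite behavior could be as simple as "first context whose namespaces match picks the theme". A plain-Python sketch of that selection logic; Bartik echoes the example above, while the namespaces and other theme names are made up:

```python
from typing import Optional

# Hypothetical "multisite-like" contexts: each pairs a set of namespaces
# with the theme its reaction would switch to.
CONTEXTS = [
    ({"islandora"}, "bartik"),
    ({"lsu", "lsu-sc"}, "lsu_theme"),
]
DEFAULT_THEME = "default_theme"


def theme_for(field_pid: Optional[str]) -> str:
    """Pick the theme of the first context whose namespaces match the node's PID."""
    namespace = field_pid.split(":", 1)[0] if field_pid else None
    for namespaces, theme in CONTEXTS:
        if namespace in namespaces:
            return theme
    return DEFAULT_THEME
```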

@whikloj
Member

whikloj commented May 14, 2019

Just throwing these here in case they are of use later.

@rosiel
Member Author

rosiel commented Feb 13, 2020

Seeing @bondjimbond's awesome work on multitenancy, I would be happy to close this ticket as the multitenancy use case is more thoroughly expanded in #1300, and that sounds like a more advisable set up for multitenant systems. Namespacing was never really the issue; it was more about dividing up content.

To summarize the output from this thread:

Thank you @DiegoPino @dannylamb @whikloj @ajs6f @uconnjeustis @mjordan @jpeak5 @ruebot @Natkeeran for your work on this thread.

So... we good to close this thread?

@mjordan
Contributor

mjordan commented Feb 13, 2020

@rosiel awesome summary and relating of issues. One thing I'd like to offer though:

Context allows you to do a lot, but does not do access control

That's not necessarily true. A while back I put together https://github.com/mjordan/ip_range_access specifically to use Context for access control. I'd love to get some additional eyes on it. I wrote that module to replace a capability of the 7.x Islandora Context module that we use to control access to some licensed vendor content we host in our Islandora repo and make accessible from off campus via Ezproxy.
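The core of IP-based access is a range membership test. A minimal sketch using Python's ipaddress module; the ranges are placeholders, and this is not how ip_range_access itself is implemented or configured. For the Ezproxy case, requests arriving through the proxy carry the proxy's address, so the proxy's IP would need to be in one of the allowed ranges:

```python
import ipaddress

# Placeholder ranges (e.g. campus networks plus the proxy's address).
ALLOWED_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def ip_allowed(client_ip: str) -> bool:
    """True if the client address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_RANGES)
```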

@rosiel
Member Author

rosiel commented Feb 13, 2020

I think there's a reason that Context doesn't come with a "deny access" reaction. It works on the node's or media's page, but this does not carry through to Views, blocks, or other ways of exposing content. So if you're using this, be very careful.
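The pitfall can be shown with a toy model (plain Python, not Drupal code): an access check that lives only on the page route does nothing for other routes that read the same data. In Drupal, rules that should also hold in Views are usually expressed through entity-level access, such as the node access grants system, rather than a page-level reaction; that is general Drupal practice, not something tested in this thread:

```python
from dataclasses import dataclass


@dataclass
class Node:
    nid: int
    restricted: bool


def node_page(node: Node, visitor_allowed: bool) -> str:
    """Page route: a Context-style check runs here, so restricted nodes are blocked."""
    if node.restricted and not visitor_allowed:
        return "403 Forbidden"
    return f"node/{node.nid}"


def listing(nodes: list[Node]) -> list[int]:
    """A view/block route: no check runs here, so restricted nodes still show up."""
    return [n.nid for n in nodes]


def listing_checked(nodes: list[Node], visitor_allowed: bool) -> list[int]:
    """The same listing with the access rule applied where the data is queried."""
    return [n.nid for n in nodes if visitor_allowed or not n.restricted]
```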

@mjordan
Contributor

mjordan commented Feb 13, 2020

@rosiel thanks for the heads up. We haven't tested that module for those things yet but certainly will.

@kstapelfeldt added the labels "Subject: Multi-tenancy" (related to having content from multiple Drupal sites in one system) and "Type: use case" (proposes a new feature or function for the software using user-first language), and removed the "Multi-tenancy" label on Sep 25, 2021