Make ui.content package no longer block replication queue on AEMaaCS #2523

kwin
Contributor

kwin commented Feb 4, 2021

Give underlying system user write access to /var/acs-commons
Set other ACLs below /var only on Author

This closes #2341

kwin requested a review from davidjgonzalez February 4, 2021 16:31
kwin force-pushed the bugfix/install-var-package-in-aem-cloud branch from cd13fb2 to 009ab7d February 4, 2021 16:32
create path /var/acs-commons(nt:folder)

# AEM classic does not know this system user, but creating it below system/acs-commons shouldn't do any harm
create service user sling-distribution-importer with path system/acs-commons
Contributor

@kwin Does this service user already exist on AEM CS? If it does, I assume this "creation" will be a no-op, right? (Obviously, if it does exist OOTB, it won't exist at "/home/users/system/acs-commons".)

Contributor Author

kwin Feb 4, 2021

@davidjgonzalez
Contributor

@kwin awesome! Before merging/releasing, I just want to make sure it's released under the right semver version based on the current compatibility page [1].

Basically, AEM 6.3+ will support the repoinit of /var/acs-commons and the ACS Commons service-worker ACLs, right? So we can do a 4.x release.

[1] https://adobe-consulting-services.github.io/acs-aem-commons/pages/compatibility.html

@kwin
Contributor Author

kwin commented Feb 4, 2021

Basically, AEM 6.3+ will support the repoinit of /var/acs-commons and the ACS Commons service-worker ACLs, right? So we can do a 4.x release.

AEM 6.3.0 ships with org.apache.sling.repoinit.parser 1.1.0 and org.apache.sling.jcr.repoinit 1.1.2. The only thing the latter does not support is intermediate path support (apache/sling-org-apache-sling-jcr-repoinit@d039439, added with jcr.repoinit 1.1.8). I'm not sure in which SP that was added...
Either we require that SP, or we just create the system users in the default path. WDYT?

I would also very much appreciate it if you could test this in AEMaaCS prior to release (I only tested against the AEM Cloud SDK Quickstart).
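A minimal sketch of the version check discussed above. It assumes the 1.1.8 threshold from this comment is correct and that bundle versions follow the OSGi `major.minor.micro[.qualifier]` form. The helper names are hypothetical and are not part of ACS AEM Commons or Sling.

```python
from __future__ import annotations

# Assumption: intermediate path support arrived in org.apache.sling.jcr.repoinit 1.1.8,
# as stated in the comment above (not verified against official Sling docs here).
MIN_INTERMEDIATE_PATH_VERSION = (1, 1, 8)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn an OSGi-style version like '1.1.2' or '1.1.8.SNAPSHOT' into an int tuple.

    Only major/minor/micro are compared; an OSGi qualifier (4th part) is ignored.
    """
    numbers = [int(part) for part in version.split(".")[:3]]
    # Pad missing parts with zeros, e.g. "1.1" -> (1, 1, 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def supports_intermediate_paths(jcr_repoinit_version: str) -> bool:
    """True if 'create service user ... with path ...' should be usable."""
    return parse_version(jcr_repoinit_version) >= MIN_INTERMEDIATE_PATH_VERSION


# AEM 6.3.0 ships jcr.repoinit 1.1.2 (per the comment above)
print(supports_intermediate_paths("1.1.2"))   # False
print(supports_intermediate_paths("1.1.8"))   # True
print(supports_intermediate_paths("1.1.10"))  # True: numeric, not string, comparison
```

Comparing integer tuples rather than raw strings avoids the classic trap where "1.1.10" sorts before "1.1.8".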

@coveralls

coveralls commented Feb 4, 2021

Pull Request Test Coverage Report for Build 6276

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 55.645%

Totals Coverage Status
Change from base Build 6264: 0.0%
Covered Lines: 16039
Relevant Lines: 28824

💛 - Coveralls

@davidjgonzalez
Contributor

Agh, we should probably require the SP (though I can't see in the release notes what adds this support).

It feels too messy adding "acs-commons-foo-bar" outside of /home/users/system/acs-commons.

I imagine anyone on 6.3 should be on a later SP anyhow, so in practice it should be a non-breaking release (though I guess we should release it as 5.0 anyhow, which might be good to clearly call out AEM CS compatibility).

We can always release 6.x with the other big changes after more testing.

And yes, I'll deploy to AEM CS in the cloud to make sure it builds/deploys.

So I guess we're good with this PR; we just need to figure out which AEM 6.x + SP versions are supported :)

@davidjgonzalez
Contributor

Just FYI: I'm working through some unrelated issues getting this branch to build/deploy with Cloud Manager (CM) to AEM CS.

@davidjgonzalez
Contributor

The deploy step is failing on the Build Image step.

I did have to add a missing opening quote in the repoinit script and suppress some warnings around creating temp files to get this far.

It looks like there is still a problem with replicating it, though I haven't dug into it (I just checked the status and tossed the log here in case someone has some cycles to check it out).

deploy_step822705.log

Comment on lines 1 to 17
scripts=[
# these users and ACLs are only necessary on author
create service user acs-commons-workflow-remover-service with path system/acs-commons
set principal ACL for acs-commons-workflow-remover-service
allow jcr:read, rep:write on /var/workflow/instances
end

create service user acs-commons-workflowpackagemanager-service with path system/acs-commons
set principal ACL for acs-commons-workflowpackagemanager-service
allow jcr:read on /var/workflow/packages
end
"
]
Contributor

Suggested change (adds the missing opening quote after scripts=[):
scripts=["
# these users and ACLs are only necessary on author
create service user acs-commons-workflow-remover-service with path system/acs-commons
set principal ACL for acs-commons-workflow-remover-service
allow jcr:read, rep:write on /var/workflow/instances
end
create service user acs-commons-workflowpackagemanager-service with path system/acs-commons
set principal ACL for acs-commons-workflowpackagemanager-service
allow jcr:read on /var/workflow/packages
end
"
]

Contributor

Missing opening quote

Contributor Author

Fixed in 71adee8.

@kwin
Contributor Author

kwin commented Feb 7, 2021

It looks like there is still a problem with replicating it, though I haven't dug into it (I just checked the status and tossed the log here in case someone has some cycles to check it out).

Although the log does not contain details about why it is blocked (more should be visible in the author's log), I assume some other privileges are missing.
According to https://repo1.maven.org/maven2/com/adobe/aem/aem-sdk-api/2021.1.4830.20210128T075814Z-210128/aem-sdk-api-2021.1.4830.20210128T075814Z-210128-aem-publish-sdk.slingosgifeature (line 5763), the following privileges are granted by default:

"# GRANITE-23007 - [RTC] Configure service user mapping for Pipeline replication",
    "create service user sling-distribution-importer with path system/cq:services/internal",
    "set principal ACL for sling-distribution-importer",
    "  allow jcr:modifyAccessControl,jcr:readAccessControl on /content",
    "  allow jcr:modifyAccessControl,jcr:readAccessControl on /conf",
    "  allow jcr:modifyAccessControl,jcr:readAccessControl on /etc",
    "  allow jcr:nodeTypeDefinitionManagement,rep:privilegeManagement on :repository ",
    "end",

It seems that these filter rules conflict with that:

    ....
    <filter root="/home/groups/rep:policy"/>
    <filter root="/home/users/rep:policy"/>
    <filter root="/home/users/system/acs-commons"/>
    <filter root="/oak:index/rep:policy"/>

It would be good to have those limitations documented more clearly, though.
At least the policies could be moved to repoinit as well.
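To make the suspected conflict concrete, here is a minimal sketch that flags `rep:policy` filter roots whose protected node lies outside the subtrees where the importer holds access-control privileges. It assumes the privilege list in the SDK excerpt above is complete for the deployed version, and it only inspects filter roots ending in `/rep:policy` (policies nested inside a broader root such as `/home/users/system/acs-commons` are not detected). Later in this thread the /home policies were not observed to fail, so treat the output as candidates rather than a diagnosis. The function names are hypothetical.

```python
from __future__ import annotations

# Subtrees where the publish-side "sling-distribution-importer" holds
# jcr:readAccessControl/jcr:modifyAccessControl, copied from the aem-sdk-api
# excerpt above. Assumption: this matches the SDK version you deploy to.
IMPORTER_ACL_ROOTS = ("/content", "/conf", "/etc")
POLICY_SUFFIX = "/rep:policy"


def is_below(path: str, root: str) -> bool:
    """True if path equals root or lies in its subtree (segment-aware)."""
    return path == root or path.startswith(root + "/")


def policy_filters_blocking_replication(filter_roots: list[str]) -> list[str]:
    """Return filter roots whose rep:policy the importer may not be allowed to write."""
    suspicious = []
    for root in filter_roots:
        if not root.endswith(POLICY_SUFFIX):
            continue  # only explicit policy filter roots are checked
        # The ACL applies to the parent of rep:policy ("/" for the root node)
        protected = root[: -len(POLICY_SUFFIX)] or "/"
        if not any(is_below(protected, allowed) for allowed in IMPORTER_ACL_ROOTS):
            suspicious.append(root)
    return suspicious


filters = [
    "/home/groups/rep:policy",
    "/home/users/rep:policy",
    "/home/users/system/acs-commons",
    "/oak:index/rep:policy",
]
print(policy_filters_blocking_replication(filters))
# ['/home/groups/rep:policy', '/home/users/rep:policy', '/oak:index/rep:policy']
```

The segment-aware `is_below` check keeps a hypothetical `/contentx` from being treated as part of `/content`.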

@kwin
Contributor Author

kwin commented Feb 7, 2021

I manually executed cp2fm (https://github.com/apache/sling-org-apache-sling-feature-cpconverter) on the ui.content package and got:

$ ./bin/cp2sf ./acs-aem-commons-ui.content-4.11.3-SNAPSHOT.zip  -o features -a artifacts
[INFO] Apache Sling Content Package to Sling Feature converter
[INFO]
[INFO] Reading content-package './acs-aem-commons-ui.content-4.11.3-SNAPSHOT.zip'...
[INFO] content-package './acs-aem-commons-ui.content-4.11.3-SNAPSHOT.zip' successfully read!
[INFO] Ordering input content-package(s) [adobe/consulting:acs-aem-commons-ui.content:4.11.3-SNAPSHOT]...
[INFO] New content-package(s) order: [adobe/consulting:acs-aem-commons-ui.content:4.11.3-SNAPSHOT]
[INFO] Converting content-package 'adobe/consulting:acs-aem-commons-ui.content:4.11.3-SNAPSHOT'...
[INFO] Building zip: /var/folders/rm/vlg2h6m16mb0f65djmnb12xr0000gq/T/synthetic-content-packages/adobe-consulting-acs-aem-commons-ui.content-4.11.3-SNAPSHOT-acs-aem-commons-ui.content-4.11.3-SNAPSHOT.zip
[INFO] Dropping package of PackageType.CONTENT acs-aem-commons-ui.content (content-package id: adobe/consulting:acs-aem-commons-ui.content:4.11.3-SNAPSHOT)
[INFO] Adding/Appending RepoInitExtension for runMode: null
[INFO] Conversion complete!
[INFO] Writing resulting Feature Model 'adobe.consulting:acs-aem-commons-ui.content:slingosgifeature:4.11.3-SNAPSHOT' to file 'features/acs-aem-commons-ui.content.json'...
[INFO] 'features/acs-aem-commons-ui.content.json' Feature File successfully written!
[INFO] +-----------------------------------------------------+
[INFO] Apache Sling Content Package to Sling Feature converter SUCCESS
[INFO] Cleaning up tmp directories /var/folders/rm/vlg2h6m16mb0f65djmnb12xr0000gq/T/sub-content-packages, /var/folders/rm/vlg2h6m16mb0f65djmnb12xr0000gq/T/synthetic-content-packages
[INFO] Total time: 576 milliseconds
[INFO] Finished at: Sun Feb 07 10:05:06 UTC 2021
[INFO] Final Memory: 27M/512M
[INFO] +-----------------------------------------------------+

The following JSON feature model was emitted:

{
  "id":"adobe.consulting:acs-aem-commons-ui.content:slingosgifeature:4.11.3-SNAPSHOT",
  "repoinit:TEXT|true":[
    "create service user acs-commons-bulk-workflow-service with path /home/users/system/acs-commons",
    "create service user acs-commons-shared-component-props-service with path /home/users/system/acs-commons",
    "create service user acs-commons-package-replication-status-event-service with path /home/users/system/acs-commons",
    "create service user acs-commons-ensure-service-user-service with path /home/users/system/acs-commons",
    "create service user acs-commons-review-task-asset-mover-service with path /home/users/system/acs-commons",
    "create service user acs-commons-twitter-updater-service with path /home/users/system/acs-commons",
    "create service user acs-commons-remote-assets-service with path /home/users/system/acs-commons",
    "create service user acs-commons-form-helper-service with path /home/users/system/acs-commons",
    "create service user acs-commons-email-service with path /home/users/system/acs-commons",
    "create service user acs-commons-on-deploy-scripts-service with path /home/users/system/acs-commons",
    "create service user acs-commons-httpcache-jcr-storage-service with path /home/users/system/acs-commons",
    "create service user acs-commons-dispatcher-flush-service with path /home/users/system/acs-commons",
    "create service user acs-commons-component-error-handler-service with path /home/users/system/acs-commons",
    "create service user acs-commons-manage-controlled-processes-service with path /home/users/system/acs-commons",
    "create service user acs-commons-error-page-handler-service with path /home/users/system/acs-commons",
    "create service user acs-commons-system-notifications-service with path /home/users/system/acs-commons",
    "set ACL for acs-commons-email-service",
    "allow jcr:read on /etc/notification/email",
    "end",
    "set ACL for acs-commons-twitter-updater-service",
    "allow jcr:read,jcr:modifyProperties,crx:replicate on /content",
    "end",
    "set ACL for acs-commons-bulk-workflow-service",
    "allow jcr:read,jcr:modifyProperties on /etc/acs-commons/bulk-workflow-manager",
    "end",
    "set ACL for acs-commons-package-replication-status-event-service",
    "allow jcr:read,rep:write,jcr:readAccessControl,jcr:modifyAccessControl on /",
    "end",
    "set ACL for acs-commons-dispatcher-flush-service",
    "allow jcr:read,crx:replicate,jcr:removeNode on /",
    "end",
    "set ACL for acs-commons-manage-controlled-processes-service",
    "allow jcr:all on /var/acs-commons/mcp",
    "end",
    "set ACL for acs-commons-ensure-service-user-service",
    "allow rep:userManagement on /home/groups",
    "allow rep:userManagement on /home/users",
    "allow jcr:read,rep:write,jcr:readAccessControl,jcr:modifyAccessControl on /",
    "end",
    "set ACL for acs-commons-review-task-asset-mover-service",
    "allow jcr:read,jcr:versionManagement,rep:write on /content/dam",
    "end",
    "set ACL for acs-commons-component-error-handler-service",
    "allow jcr:read on /content",
    "end",
    "set ACL for acs-commons-error-page-handler-service",
    "allow jcr:read on /content",
    "end",
    "set ACL for acs-commons-httpcache-jcr-storage-service",
    "allow jcr:read,rep:write on /var/acs-commons/httpcache",
    "end",
    "set ACL for acs-commons-system-notifications-service",
    "allow jcr:read on /etc/acs-commons/notifications",
    "end",
    "set ACL for acs-commons-on-deploy-scripts-service",
    "allow jcr:versionManagement,jcr:read,rep:write,jcr:lockManagement on /etc",
    "allow jcr:versionManagement,jcr:read,rep:write,jcr:lockManagement on /var/acs-commons/on-deploy-scripts-status",
    "allow jcr:versionManagement,jcr:read,rep:write,jcr:lockManagement,crx:replicate on /content",
    "allow jcr:read on /",
    "end"
  ]
}
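When reviewing what the converter extracted, it can help to group the `set ACL for` blocks of such a feature file by principal. The sketch below is minimal and assumes one statement per array element, as in the output above. It only parses plain `allow <privileges> on <path>` lines: `deny`, restrictions and multiple paths per line are ignored. The function name is hypothetical.

```python
from __future__ import annotations

import json
from collections import defaultdict

# Extension key under which cp2fm emitted the repoinit statements above.
REPOINIT_KEY = "repoinit:TEXT|true"


def acl_summary(feature_json: str) -> dict[str, list[tuple[str, str]]]:
    """Map each principal of a 'set ACL for X' block to its (privileges, path) grants."""
    lines = json.loads(feature_json)[REPOINIT_KEY]
    grants: dict[str, list[tuple[str, str]]] = defaultdict(list)
    principal = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("set ACL for "):
            principal = line[len("set ACL for "):]
        elif line == "end":
            principal = None  # block closed
        elif principal and line.startswith("allow "):
            privileges, _, path = line[len("allow "):].partition(" on ")
            grants[principal].append((privileges, path))
    return dict(grants)


# Two blocks copied from the converter output above
sample = json.dumps({
    "id": "adobe.consulting:acs-aem-commons-ui.content:slingosgifeature:4.11.3-SNAPSHOT",
    REPOINIT_KEY: [
        "set ACL for acs-commons-email-service",
        "allow jcr:read on /etc/notification/email",
        "end",
        "set ACL for acs-commons-dispatcher-flush-service",
        "allow jcr:read,crx:replicate,jcr:removeNode on /",
        "end",
    ],
})
summary = acl_summary(sample)
for principal, entries in summary.items():
    print(principal, entries)
```

Against the real output you would pass the contents of `features/acs-aem-commons-ui.content.json` instead of the inline sample.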

The content package itself was not modified in any way, though (i.e. it still contains the ACLs that are no longer necessary).

I don't know exactly how the install-packages container uploads the packages (or whether it modifies them in any way).
I opened https://issues.apache.org/jira/browse/SLING-10127 to clarify why the converter simply disregards the modified content package.

@davidjgonzalez
Contributor

davidjgonzalez commented Feb 7, 2021

@kwin np, I'll see if I can find someone to poke around under the covers on Monday to help us understand exactly what's breaking (and hopefully get it documented, if it should be a public-facing detail).

I'll push the latest updates through the pipeline again as well to see if there's any new effect (good or bad :)).

@davidjgonzalez
Contributor

FYI: the latest changes deployed. But I'm a little concerned that this may be a race condition: on the first deploy repoinit was applied, but not soon enough, and then on the second execution of the pipeline repoinit had already affected the repo, so it worked.

I'll try to run it against a different env and see if it works on the first pass.

@kwin
Contributor Author

kwin commented Feb 8, 2021

We should try to revert all changes related to ACLs and system users in this PR if we are sure that the cp2fm converter correctly extracts them (and that they don't prevent deploying the content package later on). Then we would only need to maintain a repoinit for the creation of the actual /var/acs-commons node.
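Before dropping the ACL-related content, one way to check what a built package still carries is to list its `_rep_policy.xml` files and anything below `jcr_root/home/`. `_rep_policy.xml` is FileVault's on-disk name for `rep:policy` nodes, as seen in the publish log further down this thread. The sketch below is minimal and uses Python's `zipfile`. The demo archive is made up; in practice you would open the built `acs-aem-commons-ui.content` zip instead. The function name is hypothetical.

```python
from __future__ import annotations

import io
import zipfile

# FileVault serializes a node's ACL as "_rep_policy.xml" next to the node.
POLICY_FILE = "_rep_policy.xml"


def leftover_security_entries(package: zipfile.ZipFile) -> dict[str, list[str]]:
    """List ACL files and /home entries still present under jcr_root/."""
    acls, home = [], []
    for name in package.namelist():
        if not name.startswith("jcr_root/"):
            continue  # skip META-INF/ (package properties, filter.xml, ...)
        if name.endswith("/" + POLICY_FILE):
            acls.append(name)
        if name.startswith("jcr_root/home/"):
            home.append(name)
    return {"acls": acls, "home": home}


# Hypothetical in-memory package, only for demonstration
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("META-INF/vault/filter.xml", '<workspaceFilter version="1.0"/>')
    zf.writestr("jcr_root/_rep_policy.xml", "<jcr:root/>")
    zf.writestr("jcr_root/_oak_index/_rep_policy.xml", "<jcr:root/>")
    zf.writestr("jcr_root/home/users/system/acs-commons/.content.xml", "<jcr:root/>")
    zf.writestr("jcr_root/apps/acs-commons/.content.xml", "<jcr:root/>")

with zipfile.ZipFile(buffer) as package:
    report = leftover_security_entries(package)
print(report)
```

An empty `acls` and `home` list would suggest that only the repoinit-managed /var/acs-commons setup is left to maintain.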

@davidjgonzalez
Contributor

@kwin just FYI: your latest changes built and deployed successfully to a clean AEM CS environment (vs. on top of a prior failure). I can find some time tonight to make sure the repoinits actually did their job as well (nodes/ACLs/service users).

LMK if you want to run any more updates through a pipeline.

@kwin
Contributor Author

kwin commented Feb 8, 2021

It would be good to first deploy 4.11.2 (usually the first deployment runs fine despite the /var nodes) and then run the update with this PR on the mutable repository. Still, I am wondering whether commit f8dcc55 fixed the build or whether that was just a coincidence.

It would also be good to try manually deploying the acs-aem-commons-ui.content package from author to publish and afterwards check the distribution queue.
TBH it is still unclear to me how the sling-distribution-importer (used in https://github.com/apache/sling-org-apache-sling-distribution-core/blob/f99fe35004817dd8741e77b78245d60e46b9b254/src/main/java/org/apache/sling/distribution/serialization/impl/vlt/FileVaultContentSerializer.java#L130) can deal with the system users and ACLs below /home/...

@davidjgonzalez
Contributor

kk, I can try to recreate my envs and do fresh installs.

It seems positive that it succeeded on the previously failing dev pipeline, and that the same commit then built to Stage/Prod fine.

@davidjgonzalez
Contributor

deploy_step835758.log

@kwin shoot, I ran it again on a brand-new Dev env and it failed similarly, so I guess it was a coincidence that it passed before :(

@kwin
Contributor Author

kwin commented Feb 8, 2021

The log does not contain the relevant information. Any chance you can get the error.log from the publish instance, or at least the replication.log from the author?
We could try to remove the remaining occurrences of locations that are not accessible to the system user, but this is
a) just a shot in the dark, as there is no documentation on how exactly mutable content packages are deployed in AEMaaCS. I tried to get some more clarification in AdobeDocs/experience-manager-cloud-service.en#78.
b) probably a bug in the cp2fm converter, which should also be able to strip those parts out of mutable content packages (https://issues.apache.org/jira/browse/SLING-10127).

@davidjgonzalez Do you think you can somehow get further information about the limitations of mutable content packages?

@kwin
Contributor Author

kwin commented Feb 9, 2021

I ran several more tests in the Cloud with modified acs-aem-commons-ui.content packages (version 4.11.0).
I manually uploaded them to AEM Cloud DEV Author with Package Manager and then replicated them to Publish. Afterwards I checked the Distribution Queue and the publish log.

With a package containing the oak:index, /home and root ACLs, I see the following errors:

09.02.2021 09:03:17.939 [cm-p7802-e44859-aem-publish-847d648cdb-jnsz9] *ERROR* [Queue Processor for Subscriber agent publishSubscriber] org.apache.jackrabbit.vault.fs.impl.io.GenericArtifactHandler Error while parsing /jcr_root/_oak_index/_rep_policy.xml: {}
org.xml.sax.SAXException: javax.jcr.AccessDeniedException: Access denied.
	at org.apache.jackrabbit.vault.fs.impl.io.DocViewSAXImporter.endElement(DocViewSAXImporter.java:1245) [org.apache.jackrabbit.vault:3.4.0]
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:610)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1718)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2883)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
	at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
	at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
	at org.apache.jackrabbit.vault.fs.impl.io.GenericArtifactHandler.accept(GenericArtifactHandler.java:100) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.io.Importer.commit(Importer.java:896) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.io.Importer.commit(Importer.java:799) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.io.Importer.commit(Importer.java:839) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.io.Importer.commit(Importer.java:839) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.io.Importer.run(Importer.java:440) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.packaging.impl.ZipVaultPackage.extract(ZipVaultPackage.java:232) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.packaging.impl.JcrPackageImpl.extract(JcrPackageImpl.java:401) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.packaging.impl.JcrPackageImpl.extract(JcrPackageImpl.java:360) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.packaging.impl.JcrPackageImpl.extract(JcrPackageImpl.java:346) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.sling.distribution.journal.bookkeeper.ContentPackageExtractor.installPackage(ContentPackageExtractor.java:101) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.bookkeeper.ContentPackageExtractor.installPackage(ContentPackageExtractor.java:93) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.bookkeeper.ContentPackageExtractor.handle(ContentPackageExtractor.java:72) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.bookkeeper.PackageHandler.installAddPackage(PackageHandler.java:79) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.bookkeeper.PackageHandler.apply(PackageHandler.java:61) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.bookkeeper.BookKeeper.importPackage(BookKeeper.java:148) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber.processQueueItem(DistributionSubscriber.java:339) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber.fetchAndProcessQueueItem(DistributionSubscriber.java:296) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber.processQueue(DistributionSubscriber.java:275) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: javax.jcr.AccessDeniedException: Access denied.
	at org.apache.jackrabbit.oak.spi.security.authorization.accesscontrol.AbstractAccessControlManager.checkPermissions(AbstractAccessControlManager.java:208) [org.apache.jackrabbit.oak-security-spi:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.spi.security.authorization.accesscontrol.AbstractAccessControlManager.getTree(AbstractAccessControlManager.java:168) [org.apache.jackrabbit.oak-security-spi:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.spi.security.authorization.principalbased.impl.PrincipalBasedAccessControlManager.getPolicies(PrincipalBasedAccessControlManager.java:161) [org.apache.jackrabbit.oak-authorization-principalbased:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.security.authorization.composite.CompositeAccessControlManager.getPolicies(CompositeAccessControlManager.java:82) [org.apache.jackrabbit.oak-core:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.jcr.delegate.AccessControlManagerDelegator$5.perform(AccessControlManagerDelegator.java:97) [org.apache.jackrabbit.oak-jcr:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.jcr.delegate.AccessControlManagerDelegator$5.perform(AccessControlManagerDelegator.java:93) [org.apache.jackrabbit.oak-jcr:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:209) [org.apache.jackrabbit.oak-jcr:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.jcr.delegate.AccessControlManagerDelegator.getPolicies(AccessControlManagerDelegator.java:93) [org.apache.jackrabbit.oak-jcr:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.jcr.delegate.JackrabbitAccessControlManagerDelegator.getPolicies(JackrabbitAccessControlManagerDelegator.java:141) [org.apache.jackrabbit.oak-jcr:1.37.0.R1884613]
	at org.apache.jackrabbit.vault.fs.impl.io.JackrabbitACLImporter$ImportedPolicy.getPolicy(JackrabbitACLImporter.java:182) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.impl.io.JackrabbitACLImporter$ImportedAcList.apply(JackrabbitACLImporter.java:267) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.impl.io.JackrabbitACLImporter.close(JackrabbitACLImporter.java:154) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.impl.io.DocViewSAXImporter.endElement(DocViewSAXImporter.java:1189) [org.apache.jackrabbit.vault:3.4.0]
	... 32 common frames omitted
...
09.02.2021 09:07:07.408 [cm-p7802-e44859-aem-publish-847d648cdb-b9l6k] *ERROR* [Queue Processor for Subscriber agent publishSubscriber] org.apache.jackrabbit.vault.fs.impl.io.GenericArtifactHandler Error while parsing /jcr_root/_rep_policy.xml: {}
org.xml.sax.SAXException: javax.jcr.AccessDeniedException: Access denied.
	at org.apache.jackrabbit.vault.fs.impl.io.DocViewSAXImporter.endElement(DocViewSAXImporter.java:1245) [org.apache.jackrabbit.vault:3.4.0]
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:610)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1718)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2883)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
	at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
	at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
	at org.apache.jackrabbit.vault.fs.impl.io.GenericArtifactHandler.accept(GenericArtifactHandler.java:100) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.io.Importer.commit(Importer.java:896) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.io.Importer.commit(Importer.java:799) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.io.Importer.commit(Importer.java:839) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.io.Importer.run(Importer.java:440) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.packaging.impl.ZipVaultPackage.extract(ZipVaultPackage.java:232) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.packaging.impl.JcrPackageImpl.extract(JcrPackageImpl.java:401) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.packaging.impl.JcrPackageImpl.extract(JcrPackageImpl.java:360) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.packaging.impl.JcrPackageImpl.extract(JcrPackageImpl.java:346) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.sling.distribution.journal.bookkeeper.ContentPackageExtractor.installPackage(ContentPackageExtractor.java:101) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.bookkeeper.ContentPackageExtractor.installPackage(ContentPackageExtractor.java:93) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.bookkeeper.ContentPackageExtractor.handle(ContentPackageExtractor.java:72) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.bookkeeper.PackageHandler.installAddPackage(PackageHandler.java:79) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.bookkeeper.PackageHandler.apply(PackageHandler.java:61) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.bookkeeper.BookKeeper.importPackage(BookKeeper.java:148) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber.processQueueItem(DistributionSubscriber.java:339) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber.fetchAndProcessQueueItem(DistributionSubscriber.java:296) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber.processQueue(DistributionSubscriber.java:275) [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: javax.jcr.AccessDeniedException: Access denied.
	at org.apache.jackrabbit.oak.spi.security.authorization.accesscontrol.AbstractAccessControlManager.checkPermissions(AbstractAccessControlManager.java:208) [org.apache.jackrabbit.oak-security-spi:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.spi.security.authorization.accesscontrol.AbstractAccessControlManager.getTree(AbstractAccessControlManager.java:168) [org.apache.jackrabbit.oak-security-spi:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.spi.security.authorization.principalbased.impl.PrincipalBasedAccessControlManager.getPolicies(PrincipalBasedAccessControlManager.java:161) [org.apache.jackrabbit.oak-authorization-principalbased:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.security.authorization.composite.CompositeAccessControlManager.getPolicies(CompositeAccessControlManager.java:82) [org.apache.jackrabbit.oak-core:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.jcr.delegate.AccessControlManagerDelegator$5.perform(AccessControlManagerDelegator.java:97) [org.apache.jackrabbit.oak-jcr:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.jcr.delegate.AccessControlManagerDelegator$5.perform(AccessControlManagerDelegator.java:93) [org.apache.jackrabbit.oak-jcr:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:209) [org.apache.jackrabbit.oak-jcr:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.jcr.delegate.AccessControlManagerDelegator.getPolicies(AccessControlManagerDelegator.java:93) [org.apache.jackrabbit.oak-jcr:1.37.0.R1884613]
	at org.apache.jackrabbit.oak.jcr.delegate.JackrabbitAccessControlManagerDelegator.getPolicies(JackrabbitAccessControlManagerDelegator.java:141) [org.apache.jackrabbit.oak-jcr:1.37.0.R1884613]
	at org.apache.jackrabbit.vault.fs.impl.io.JackrabbitACLImporter$ImportedPolicy.getPolicy(JackrabbitACLImporter.java:182) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.impl.io.JackrabbitACLImporter$ImportedAcList.apply(JackrabbitACLImporter.java:267) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.impl.io.JackrabbitACLImporter.close(JackrabbitACLImporter.java:154) [org.apache.jackrabbit.vault:3.4.0]
	at org.apache.jackrabbit.vault.fs.impl.io.DocViewSAXImporter.endElement(DocViewSAXImporter.java:1189) [org.apache.jackrabbit.vault:3.4.0]
	... 31 common frames omitted

So it seems that the rep:policy on the root node also needs to be removed.
This matches the observation from #2523 (comment): at the root node the user is not allowed to set ACLs (only at repository level). I am still wondering why the ACLs and users below /home are not an issue, as the user should not have access there either.

@kwin
Contributor Author

kwin commented Feb 11, 2021

@davidjgonzalez Can you try again with the updated PR? If that still fails, we would need the publish log to see the underlying issue.

@kwin
Contributor Author

kwin commented Mar 2, 2021

@davidjgonzalez Any update? A lot of people are waiting for an AEMaaCS-compliant release...

@badvision
Contributor

> @davidjgonzalez Any update? A lot of people are waiting for an AEMaaCS-compliant release...

The Travis build is failing; any insight into the oakpal failure?

Failed to execute goal net.adamcin.oakpal:oakpal-maven-plugin:2.0.0:verify (oakpal-verify) on project acs-aem-commons-ui.content: ** Violations were reported at or above severity: MAJOR ** -> [Help 1]

@kwin
Contributor Author

kwin commented Mar 2, 2021

oakpal does not evaluate repoinit configurations. We probably need to fix oakpal or just disable those checks.

@badvision
Contributor

> oakpal does not evaluate repoinit configurations. We probably need to fix oakpal or just disable those checks.

I'm OK with either, but we can't push this PR through if it breaks the build. I'll leave the decision to you; I have no particular preference.

@davidjgonzalez
Contributor

I haven't found anyone who has been able to dig around the backend. There are issues with some service users (but I don't expect that should fail the build?)

There's also an issue with applying ACLs to /var/workflow/instances, as it considers the path non-existent. TBH, I'm not sure which of these errors in the log qualifies as a build-failing event.

buildImage_buildImage (3).log.zip

@kwin
Contributor Author

kwin commented Mar 3, 2021

@davidjgonzalez Thanks for the log. It seems that /var/workflow/instances is only created lazily (i.e. it does not exist yet when we try to apply ACLs). All errors from the repoinit script are fatal, i.e. they lead to a repository shutdown, but it could be that Cloud Manager does not (yet) detect that error state correctly. This should be fixed by 5bd46d6.
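A minimal repoinit sketch of one way to avoid that ordering problem, assuming the fix is to create the path before setting ACLs on it (this is not necessarily what 5bd46d6 does). The service user name `acs-commons-example-service` is hypothetical, and `sling:Folder` is an assumed node type for the created node:

```
# Create the path explicitly so the ACL statement below has a target;
# the node type in parentheses is an assumption for this sketch
create path /var/workflow/instances(sling:Folder)

# Hypothetical service user, for illustration only
create service user acs-commons-example-service with path system/acs-commons

# Without the create path statement above, this statement would abort
# the repoinit run if /var/workflow/instances did not exist yet
set ACL for acs-commons-example-service
    allow jcr:read on /var/workflow/instances
end
```

Since `create path` only adds missing nodes, the statement should be harmless on instances where the workflow engine has already created the path.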

@kwin
Contributor Author

kwin commented Mar 3, 2021

> I'm OK with either, but we can't push this PR through if it breaks the build. I'll leave the decision to you; I have no particular preference.

I have now removed the failing oakpal checks in 2eb7587.

@kwin kwin force-pushed the bugfix/install-var-package-in-aem-cloud branch from 2eb7587 to 9019eaf March 3, 2021 10:52
@davidjgonzalez
Contributor

@kwin FYI, I had to fix a number of ERRORs being thrown, mostly around repoinit trying to set ACLs on paths that didn't exist yet, and service users that didn't exist when OSGi components were starting (basically, I moved more stuff out to repoinit).

It looks like the ui.content package is still failing to publish:


Processing Package Metadata
Download Package - https://cm0pl0va80stor0prd.file.core.windows.net/ec53e565-9c49-454d-8a1f-2b0673d4fc48/build/acs-aem-commons-content-4.11.3-SNAPSHOT.zip?sig=zkkcydA%2Fp5lGBOxyG6nZnBCkIlzg78ay8FD7gF0yIb0%3D&se=2021-03-31T19%3A34%3A58Z&sv=2018-03-28&rsct=application%2Foctet-stream&rscd=attachment%3B%20filename%3Dbuild%2Facs-aem-commons-content-4.11.3-SNAPSHOT.zip&sp=r&sr=f
Extract Package - adobe/consulting:acs-aem-commons-ui.content:4.11.3-SNAPSHOT
adobe/consulting:acs-aem-commons-ui.content:4.11.3-SNAPSHOT maps to /tmp/packages/package_1614802808.zip
Extract Package - adobe/consulting:acs-aem-commons-content:4.11.3-SNAPSHOT
adobe/consulting:acs-aem-commons-content:4.11.3-SNAPSHOT maps to /tmp/packages/package_1614802809.zip
Waiting for the author
Checking for unblocked Pipeline
Push (Upload/Install/Replicate) processed packages
Install /etc/packages/adobe/consulting/acs-aem-commons-ui.content-4.11.3-SNAPSHOT.zip
Replicate /etc/packages/adobe/consulting/acs-aem-commons-ui.content-4.11.3-SNAPSHOT.zip
                <td>Message</td>
                <td><div id="Message">Replication started for /etc/packages/adobe/consulting/acs-aem-commons-ui.content-4.11.3-SNAPSHOT.zip
Waiting for Replication to succeed
Install /etc/packages/adobe/consulting/acs-aem-commons-content-4.11.3-SNAPSHOT.zip
Replicate /etc/packages/adobe/consulting/acs-aem-commons-content-4.11.3-SNAPSHOT.zip
                <td>Message</td>
                <td><div id="Message">Replication started for /etc/packages/adobe/consulting/acs-aem-commons-content-4.11.3-SNAPSHOT.zip
Waiting for Replication to succeed
Found blocked queue:  cbfee9e4-61bb-4d99-8c71-3e56379191ae-publishSubscriber
Failing due to blocked queue after starting job - cleaned package /etc/packages/adobe/consulting/acs-aem-commons-content-4.11.3-SNAPSHOT.zip from replication as it was blocking

Creating the kill switch /mnt/sandbox/kill-switch

I do see the following in the log:

03.03.2021 20:24:55.093 *INFO* [Apache Sling Repository Startup Thread #1] org.apache.sling.jcr.repoinit.impl.AclVisitor Adding principal-based access control entry for sling-distribution-importer
03.03.2021 20:24:55.101 *INFO* [Apache Sling Repository Startup Thread #1] org.apache.sling.jcr.repoinit.impl.AclUtil Equivalent principal-based entry already exists for principal sling-distribution-importer and effective path /content 
03.03.2021 20:24:55.101 *INFO* [Apache Sling Repository Startup Thread #1] org.apache.sling.jcr.repoinit.impl.AclUtil Equivalent principal-based entry already exists for principal sling-distribution-importer and effective path /conf 
03.03.2021 20:24:55.102 *INFO* [Apache Sling Repository Startup Thread #1] org.apache.sling.jcr.repoinit.impl.AclUtil Equivalent principal-based entry already exists for principal sling-distribution-importer and effective path /etc 
03.03.2021 20:24:55.102 *INFO* [Apache Sling Repository Startup Thread #1] org.apache.sling.jcr.repoinit.impl.AclUtil Equivalent principal-based entry already exists for principal sling-distribution-importer and effective path null 

I suspect that the last line, which lists the path as null, may actually be ours for /var/acs-commons, and thus the ACLs aren't being applied?

I'm trying to shuffle a few things around in repoinit to see if it has any effect. LMK if you see anything that stands out. I can also make a PR back to you with the other changes I had to make.

deploy_step905717.log

@kwin
Contributor Author

kwin commented Mar 4, 2021

The line

03.03.2021 20:24:55.102 INFO [Apache Sling Repository Startup Thread #1] org.apache.sling.jcr.repoinit.impl.AclUtil Equivalent principal-based entry already exists for principal sling-distribution-importer and effective path null

is
a) not an error and
b) not triggered by our repoinit. The path is only null if it is not set or is :repository (https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/760547d538b6ba3357a9baf25efe50c09b734bc7/src/main/java/org/apache/sling/jcr/repoinit/impl/AclUtil.java#L288).

Instead, it is triggered by the default repoinit config (https://repo1.maven.org/maven2/com/adobe/aem/aem-sdk-api/2021.2.4944.20210221T230729Z-210128/, line 5763ff):

    "# GRANITE-23007 - [RTC] Configure service user mapping for Pipeline replication",
    "create service user sling-distribution-importer with path system/cq:services/internal",
    "set principal ACL for sling-distribution-importer",
    "  allow jcr:modifyAccessControl,jcr:readAccessControl on /content",
    "  allow jcr:modifyAccessControl,jcr:readAccessControl on /conf",
    "  allow jcr:modifyAccessControl,jcr:readAccessControl on /etc",
    "  allow jcr:nodeTypeDefinitionManagement,rep:privilegeManagement on :repository ",
    "end",

If content distribution still does not work, there must be another reason, which unfortunately can only be seen in the publish log. Can you attach that as well? Please also commit the other changes.

@davidjgonzalez
Contributor

@kwin Let me try again with a fresh pull from your branch; I think you fixed the prior errors with missing service users and paths that I had patched into my branch. I'll grab the Publish logs as well.

@davidjgonzalez
Contributor

@kwin Shoot, it didn't work. I pushed some more fixes to your branch as a PR (some immediate OSGi components that required service users were throwing ERRORs; I figured I might as well fix them to rule them out as potential causes).

Attaching the deploy log and the publish logs. I'm not seeing any errors in the publish logs, and what is logged seems reasonable (though I'm not that in tune with the inner workings of the deployment process).

I've requested access to some backend systems to see if I can poke around myself; in the meantime, let me know if you see anything of interest. Keep in mind I've done a few deployments today, so you'll want to match up the timestamps with the deploy_step log:

2021-03-04T22:30:25+0000 Summary of events during the deployment step:
2021-03-04T22:30:25+0000 Begin deployment in program-3-dev [CorrelationId: 2063095]
2021-03-04T22:30:29+0000 Update author indexes job has started.
2021-03-04T22:33:41+0000 Update author indexes job has finished successfully.
2021-03-04T22:33:42+0000 Update publish indexes job has started.
2021-03-04T22:33:57+0000 Update publish indexes job has finished successfully.
2021-03-04T22:42:40+0000 Install mutable content job has started.
2021-03-04T23:15:43+0000 Install mutable content job has failed.
2021-03-04T23:15:43+0000 Failed deployment in program-3-dev
2021-03-04T23:15:43+0000 Detailed events during the deployment step:


deploy_step905717.log

author_aemerror_2021-03-04.log.gz

publish_aemerror_2021-03-04.log.gz

@kwin
Contributor Author

kwin commented Mar 5, 2021

The attached deploy_step log shows an error from the 3rd of March, while both the author and publish error logs are from the 4th of March. Can you attach the deploy_step log from the 4th of March?

@davidjgonzalez
Contributor

@kwin Oh wow, I just realized that ALL deploy_step logs for my pipeline executions have the same file name. I always assumed that the number postfix (905717) was unique per execution... That explains why I had ...906717 (1).log, ...906717 (2).log, ...906717 (N).log in my downloads. I typically post the file without the (N), since I assumed they were the same and the filename with the (N) is "cleaner"... 🤦 ... no good deed goes unpunished.

deploy_step905717.log

@@ -0,0 +1,29 @@
scripts=[
Contributor Author

@kwin kwin Mar 8, 2021

This should be moved back to regular ACLs set via the content package.
The access on root should be granted to a user group via repoinit. The service user can be a member of that group.
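A minimal repoinit sketch of that suggestion, assuming plain read access on the root node is what is needed. The group name `acs-commons-example-group` is hypothetical, and granting `jcr:read` on `/` is a placeholder rather than the privilege set the PR actually uses:

```
# Hypothetical group that carries the root-level entry
create group acs-commons-example-group

# Placeholder privilege; grant whatever the replicated content actually requires
set ACL for acs-commons-example-group
    allow jcr:read on /
end

# The service user picks up the group's entries through membership
add sling-distribution-importer to group acs-commons-example-group
```

In this sketch the content package no longer needs to carry a rep:policy on the root node, and which users receive the entry is controlled by group membership alone.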

Contributor

Do you mean this entire repoinit? This script is there to prevent ERRORs on bundle/component start. Ultimately they will be in repoinits anyhow?

Contributor Author

Not sure how rep:ACLs on mutable content are converted, as those must be set during the real pod execution. If they are converted, that would indeed change the semantics: with content packages, ACLs on non-existing content are silently ignored, while with repoinit they lead to an exception...

Contributor

I double-checked that it's fine (preferred, even) to use repoinit to manage the mutable space, so unless there are other concerns I vote to keep it as repoinit (since that's the ideal state anyhow).

Contributor Author

OK, agreed. @davidjgonzalez, will you remove the creation of the /etc/tags path then?

Contributor

Oh yes, sorry; I did a deployment to make sure it was OK. Let me check on it.

I don't think we can even put /etc/tags into the ui.content package, since the mere presence of that node causes the TagManager API to use it (/etc/tags) as the tag root.

@davidjgonzalez davidjgonzalez changed the base branch from master to acs-aem-commons-5.0.0 March 13, 2021 19:13
@davidjgonzalez davidjgonzalez merged commit 58f4f4f into Adobe-Consulting-Services:acs-aem-commons-5.0.0 Mar 13, 2021
id-keenan added a commit to id-keenan/acs-aem-commons that referenced this pull request Mar 15, 2021
* master:
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release acs-aem-commons-5.0.2
  v5.0.2 release
  Don't set ACLs on potentially non-existing path /etc/tags (Adobe-Consulting-Services#2547)
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release acs-aem-commons-5.0.0
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release acs-aem-commons-4.12.0
  4.12.0 changelog
  Sorter (Update) (Adobe-Consulting-Services#2544)
  Adobe-Consulting-Services#2542 - Fixed VanityUrlAdjuster package location (Adobe-Consulting-Services#2543)
  ACS Redirect Manager (Adobe-Consulting-Services#2513)
  Adding (backwards compatible) options to the i18nprovider / injector to: (Adobe-Consulting-Services#2518)
  Add append option to dataimporter (Adobe-Consulting-Services#2535)
  Make sure show hide of tabs also works when coral panel is of region (Adobe-Consulting-Services#2540)
  Make ui.content package no longer block replication queue on AEMaaCS (Adobe-Consulting-Services#2523)
  Introduce marker interface for ClusterLeader (Adobe-Consulting-Services#2499)

# Conflicts:
#	CHANGELOG.md
Development

Successfully merging this pull request may close these issues.

ACS Commons fails to deploy to AEM as a Cloud Service due to inclusion of /var nodes
4 participants