Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

7.0.6 takes longer to deploy (delaying start up and triggering a 60 seconds timeout) #24484

Closed
oxdrove opened this issue Jul 2, 2023 · 29 comments

Comments

@oxdrove
Copy link

oxdrove commented Jul 2, 2023

Environment Details

  • GlassFish Version (and build number): 7.0.6
  • JDK version: 11.0.19+7
  • OS: OmniOS 151038
  • Database: postgresql 15.3

Problem Description

Start takes significantly longer with 7.0.6 than 7.0.5

This would not be a problem, although a minor inconvenience, were is not for the 60 seconds timeout for start which is exceeded.

Glassfish 7.0.6

[ Jul  2 12:11:29 Executing start method ("/opt/XXXX/glassfish/bin/asadmin start-domain"). ]
Waiting for domain1 to start ............................................................
Waiting finished after 60,000 ms.
No response from the Domain Administration Server (domain1) after 60 seconds.
The command is either taking too long to complete or the server has failed.
Please see the server log files for command status.
Please start with the --verbose option in order to see early messages.
Command start-domain failed.

Glassfish 7.0.5

[ Jul  2 12:14:37 Executing start method ("/opt/XXXX/glassfish/bin/asadmin start-domain"). ]
Waiting for domain1 to start .....................................
Waiting finished after 36,895 ms.
Successfully started the domain : domain1
domain  Location: /XXXX/glassfish/domain1
Log File: /XXXX/glassfish/domain1/logs/server.log
Admin Port: 4,848
Command start-domain executed successfully. 

Steps to reproduce

Update to 7.0.6

Impact of Issue

Slow start triggers a 60 second limit which flags the start up as failed. Glassfish actually starts but it says it doesn't, return value from asadmin I presume. This causes the SMF status to go to maintenance.

Solution is working out why 7.0.6 takes twice as long to start however it also should be possible to extend the start time out - maybe it is, I can't see how.

@hs536
Copy link
Contributor

hs536 commented Jul 3, 2023

To extend the start timeout, try the following env (default 60_000L):
export AS_START_TIMEOUT=600_000L

@oxdrove
Copy link
Author

oxdrove commented Jul 3, 2023

export AS_START_TIMEOUT=600000
with no underscore nor L worked. Thank you very much for this.

Waiting for domain1 to start Environment property AS_START_TIMEOUT is set to a value 10_000L which cannot be parsed to long.

Set low so I can see it is controlling the time:

[ Jul  3 09:10:34 Executing start method ("/opt/XXXX/glassfish/bin/asadmin start-domain"). ]
Waiting for domain1 to start ..........
Waiting finished after 10,000 ms.
No response from the Domain Administration Server (domain1) after 10 seconds. 

Yes it works, now to extend and test 7.0.6

7.0.5:

[ Jul  3 09:09:31 Executing start method ("/opt/XXXX/glassfish/bin/asadmin start-domain"). ]
Waiting for domain1 to start .................................. 
Waiting finished after 31,529 ms.
Successfully started the domain : domain1
domain  Location: /XXXX/glassfish/domain1
Log File: /XXXX/glassfish/domain1/logs/server.log
Admin Port: 4,848
Command start-domain executed successfully.
[ Jul  3 09:10:05 Method "start" exited with status 0. ] 

7.0.6 with AS_START_TIMEOUT=600000

[ Jul  3 09:13:51 Executing start method ("/opt/XXXX/glassfish/bin/asadmin start-domain"). ]
Waiting for domain1 to start ..............................................................
Waiting finished after 61,293 ms.
Successfully started the domain : domain1
domain  Location: /XXXX/glassfish/domain1
Log File: /XXXX/glassfish/domain1/logs/server.log
Admin Port: 4,848
Command start-domain executed successfully.
[ Jul  3 09:14:55 Method "start" exited with status 0. ]

The timeout extension prevents the return value making the OS think it failed.

7.0.6 takes twice the time of 7.0.5. Otherwise identical. ears, wars, data, etc.

@oxdrove
Copy link
Author

oxdrove commented Jul 3, 2023

Now with the timeout fixed I can progressively undeploy and prune to find a cause. I find that that including the Square payment dependencies makes a difference. In pom.xml:

        <dependency>
            <groupId>com.squareup</groupId>
            <artifactId>square</artifactId>
            <version>30.0.1.20230608</version>
        </dependency> 

This increases the size of my WAR from 400k to 10MB and increases the deploy time . On GF 7.0.5 it increases from 2 to 5 seconds, on GF 7.0.6 this increases from 2 seconds to 30. So although Square adds a lot of dependency files (probably needlessly, I guess a lot are never used) 7.0.6 is taking 6 times longer to process.

Note I have removed all the Square code so to reproduce just add the dependency. I try another project and just add the dependency, Deploy times adding dependency. 6 times longer with 7.0.6:

7.0.5  1.7 -> 5.1
7.0.6  1.7 -> 30.0

@oxdrove oxdrove changed the title 7.0.6 takes longer to start (trigging a 60 seconds timeout) 7.0.6 takes longer to start (triggering a 60 seconds timeout) Jul 3, 2023
@OndroMih
Copy link
Contributor

OndroMih commented Jul 3, 2023

I have a small script to print the stack trace of the thread that's running the deploy command. I use it when I want to know what's happening during deployment when the deployment is slow. Maybe it will help you investigate what's happening:

jcmd `cat glassfish/domains/domain1/config/pid` Thread.print | tac | sed -n '/DeployCommand/,/^$/p' | tac

if you run it regularly during deployment, you'll see what is taking so long. I suspect it's CDI scanning (Weld), since the JAR file probably has a lot of classes to scan for bean-defining annotations.

@oxdrove
Copy link
Author

oxdrove commented Jul 3, 2023

Thank you for the suggestion. I have run the above command during deployment. In the phase it appears to take the time this is the output:

 "admin-listener(1)" #68 daemon prio=5 os_prio=64 cpu=28557.02ms elapsed=187.88s tid=0x0000000008029000 nid=0x5b runnable [0xfffffc7fbefc7000]
   java.lang.Thread.State: RUNNABLE
        at java.util.AbstractCollection.addAll([[email protected]/AbstractCollection.java:352](mailto:[email protected]/AbstractCollection.java:352))
        at org.glassfish.weld.connector.WeldUtils.isCDIEnablingAnnotation(WeldUtils.java:323)
        at org.glassfish.weld.connector.WeldUtils.isCDIEnablingAnnotation(WeldUtils.java:336)
        at org.glassfish.weld.connector.WeldUtils.isCDIEnablingAnnotation(WeldUtils.java:336)
        at org.glassfish.weld.connector.WeldUtils.isCDIEnablingAnnotation(WeldUtils.java:336)
        at org.glassfish.weld.connector.WeldUtils.isCDIEnablingAnnotation(WeldUtils.java:336)
        at org.glassfish.weld.connector.WeldUtils.isCDIEnablingAnnotation(WeldUtils.java:336)
        at org.glassfish.weld.connector.WeldUtils.isCDIEnablingAnnotation(WeldUtils.java:336)
        at org.glassfish.weld.connector.WeldUtils.isCDIEnablingAnnotation(WeldUtils.java:309)
        at org.glassfish.weld.connector.WeldUtils.hasCDIEnablingAnnotations(WeldUtils.java:206)
        at org.glassfish.weld.connector.WeldUtils.hasCDIEnablingAnnotations(WeldUtils.java:185)
        at org.glassfish.weld.connector.WeldUtils.isImplicitBeanArchive(WeldUtils.java:172)
        at org.glassfish.weld.connector.WeldUtils.isImplicitBeanArchive(WeldUtils.java:160)
        at org.glassfish.weld.BeanDeploymentArchiveImpl.populate(BeanDeploymentArchiveImpl.java:469)
        at org.glassfish.weld.BeanDeploymentArchiveImpl.<init>(BeanDeploymentArchiveImpl.java:146)
        at org.glassfish.weld.RootBeanDeploymentArchive.createModuleBda(RootBeanDeploymentArchive.java:68)
        at org.glassfish.weld.RootBeanDeploymentArchive.<init>(RootBeanDeploymentArchive.java:64)
        at org.glassfish.weld.RootBeanDeploymentArchive.<init>(RootBeanDeploymentArchive.java:56)
        at org.glassfish.weld.DeploymentImpl.createModuleBda(DeploymentImpl.java:721)
        at org.glassfish.weld.DeploymentImpl.<init>(DeploymentImpl.java:141)
        at org.glassfish.weld.WeldDeployer.load(WeldDeployer.java:228)
        at org.glassfish.weld.WeldDeployer.load(WeldDeployer.java:115)
        at org.glassfish.internal.data.ModuleInfo.load(ModuleInfo.java:182)
        at org.glassfish.internal.data.ApplicationInfo.load(ApplicationInfo.java:317)
        at com.sun.enterprise.v3.server.ApplicationLifecycle.deploy(ApplicationLifecycle.java:531)
        at com.sun.enterprise.v3.server.ApplicationLifecycle.deploy(ApplicationLifecycle.java:259)
        at org.glassfish.deployment.admin.DeployCommand.execute(DeployCommand.java:467)
        at com.sun.enterprise.v3.admin.CommandRunnerImpl$2$1.run(CommandRunnerImpl.java:570)
        at com.sun.enterprise.v3.admin.CommandRunnerImpl$2$1.run(CommandRunnerImpl.java:566)
        at java.security.AccessController.doPrivileged([[email protected]/Native](mailto:[email protected]/Native) Method)
        at javax.security.auth.Subject.doAs([[email protected]/Subject.java:361](mailto:[email protected]/Subject.java:361))
        at com.sun.enterprise.v3.admin.CommandRunnerImpl$2.execute(CommandRunnerImpl.java:565)
        at com.sun.enterprise.v3.admin.CommandRunnerImpl$3.run(CommandRunnerImpl.java:596)
        at com.sun.enterprise.v3.admin.CommandRunnerImpl$3.run(CommandRunnerImpl.java:588)
        at java.security.AccessController.doPrivileged([[email protected]/Native](mailto:[email protected]/Native) Method)
        at javax.security.auth.Subject.doAs([[email protected]/Subject.java:361](mailto:[email protected]/Subject.java:361))
        at com.sun.enterprise.v3.admin.CommandRunnerImpl.doCommand(CommandRunnerImpl.java:587)
        at com.sun.enterprise.v3.admin.CommandRunnerImpl.doCommand(CommandRunnerImpl.java:1478)
        at com.sun.enterprise.v3.admin.CommandRunnerImpl$ExecutionContext.execute(CommandRunnerImpl.java:1847)
        at com.sun.enterprise.v3.admin.CommandRunnerImpl$ExecutionContext.execute(CommandRunnerImpl.java:1723)
        at org.glassfish.deployment.admin.ReDeployCommand.execute(ReDeployCommand.java:107)

So yes, CDI and Weld is there.
We presume the Square dependencies give it a lot of work and this is on a 1W per core processor but neither accounts for why 7.0.6 takes so much longer.

@oxdrove
Copy link
Author

oxdrove commented Jul 3, 2023

Change for 7.0.6:
#24480

@dmatej
Copy link
Contributor

dmatej commented Jul 3, 2023

Just for the record, without any deployment on my machine (Xeon, 64GB RAM):

Eclipse GlassFish 7.0.5 (branch: master, commit: 7b49f4eaf669e9a3b4f4648254f9016872455236, timestamp: 2023-04-02T15:44:03Z) 
startup time: Embedded (6 462 ms), startup services (3 234 ms), total (9 696 ms)]]

Eclipse GlassFish 7.0.6 (commit: 9e82903b2afcab9478003fb5adb7bd6a466e4982) 
startup time: Embedded (6 347 ms), startup services (2 672 ms), total (9 019 ms)]]

It seems the issue is really in deployment.

@oxdrove oxdrove changed the title 7.0.6 takes longer to start (triggering a 60 seconds timeout) 7.0.6 takes longer to deploy (delaying start up and triggering a 60 seconds timeout) Jul 4, 2023
@oxdrove
Copy link
Author

oxdrove commented Jul 4, 2023

Deployment is a part of useful startup which was tripping a timeout causing failure, but yes, we have narrowed it to deployment and I have amended the title accordingly.

@arjantijms
Copy link
Contributor

You could try to downgrade Weld to 5.1.0 by copying https://mvnrepository.com/artifact/org.jboss.weld/weld-osgi-bundle/5.1.0.Final to [glassfish root]/glassfish/modules/weld-osgi-bundle.jar and see if that makes a difference.

I haven't tried this, but I believe the downgrade should be possible.

@arjantijms
Copy link
Contributor

@oxdrove any update?

@oxdrove
Copy link
Author

oxdrove commented Jul 12, 2023

I swapped the weld jars from 5 and 6, no change.

I have tried to build from source with a view to using Weld 5.1.0 not 5.1.1. I have read README.md specifically "## Building". I run "mvn clean install" and it complains:

$ mvn clean install
Error: Could not find or load main class org.codehaus.plexus.classworlds.launcher.Launcher
Caused by: java.lang.ClassNotFoundException: org.codehaus.plexus.classworlds.launcher.Launcher 

How to build?

@arjantijms
Copy link
Contributor

I swamped the weld jars from 5 and 6, no change.

Thanks for testing that.

In the latest commit I just updated Weld to 5.1.1.SP1. You could try that one too:

https://ci.eclipse.org/glassfish/job/glassfish_build-and-test-using-jenkinsfile/job/PR-24498/1/artifact/bundles/glassfish.zip

Error: Could not find or load main class org.codehaus.plexus.classworlds.launcher.Launcher

That looks like a Maven environment not correctly configured. Specifically the M2_HOME may not be correct. @dmatej, any idea?

@dmatej
Copy link
Contributor

dmatej commented Jul 12, 2023

What maven version and java do you use?
mvn -version
You should use the latest maven 3 and jdk17

@oxdrove
Copy link
Author

oxdrove commented Jul 12, 2023

$ java -version
openjdk version "11.0.19" 2023-04-18
OpenJDK Runtime Environment (build 11.0.19+7-omnios-151038)
OpenJDK 64-Bit Server VM (build 11.0.19+7-omnios-151038, mixed mode)

From README.md:

### Prerequisites

* JDK11+, we strongly recommend using JDK17 to build GlassFish with all 

mvn works for other projects, There appears to be something odd about maven on the build machine, I am investigating

@oxdrove
Copy link
Author

oxdrove commented Jul 12, 2023

Maven: My fault. The build server has to have an environment that uses the freshly built software before the installed packages, that is how a new package links to a library before the library is installed, if it were installed the removal of the old version would stop installed packages working.

mvn is working, the build now report failure, I will investigate before reporting here.

@oxdrove
Copy link
Author

oxdrove commented Aug 6, 2023

7.0.7 is as slow as 7.0.6.

Building gives errors,

$ mvn -X clean install -Pfastest
...
[ERROR] Failed to execute goal io.github.git-commit-id:git-commit-id-maven-plugin:5.0.0:revision (get-the-git-infos) on project glassfish-common: .git directory is not found! Please specify a valid [dotGitDirectory] in your pom.xml -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal io.github.git-commit-id:git-commit-id-maven-plugin:5.0.0:revision (get-the-git-infos) on project glassfish-common: .git directory is not found! Please specify a valid [dotGitDirectory] in your pom.xml
...

...no, I'm trying to build it not commit it.

$ mvn -X clean compile -Pfastest
...
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.6.0:unpack (unpack) on project glassfish: Artifact has not been packaged yet. When used on reactor artifact, unpack should be executed after packaging: see MDEP-98. -> [Help 1]
$ mvn -X clean package -Pfastest
...
[ERROR] Failed to execute goal io.github.git-commit-id:git-commit-id-maven-plugin:5.0.0:revision (get-the-git-infos) on project glassfish-common: .git directory is not found! Please specify a valid [dotGitDirectory] in your pom.xml -> [Help 1]

no, I just want to build it from source.

@oxdrove
Copy link
Author

oxdrove commented Aug 6, 2023

It would help if it was clearer on what "your" pom.xml file is.

$ find . -name pom.xml | wc
     562     562   27066

Plenty from which to choose and none mentions dotGitDirectory.

$ find . -name pom.xml | xargs grep -l dotGitDirectory | wc
       0       0       0

@dmatej
Copy link
Contributor

dmatej commented Aug 6, 2023

$ mvn -X clean package -Pfastest
...
[ERROR] Failed to execute goal io.github.git-commit-id:git-commit-id-maven-plugin:5.0.0:revision (get-the-git-infos) on project glassfish-common: .git directory is not found! Please specify a valid [dotGitDirectory] in your pom.xml -> [Help 1]

no, I just want to build it from source.

Maven tries to resolve the git commit id which is used in GlassFish internally.

@oxdrove
Copy link
Author

oxdrove commented Aug 6, 2023

...and fails. I either fudge it, get a git commit id or give up. I only know how to do the later.

@dmatej
Copy link
Contributor

dmatej commented Aug 6, 2023

...and fails. I either fudge it, get a git commit id or give up. I only know how to do the later.

git clone [email protected]:eclipse-ee4j/glassfish.git
cd glassfish
mvn clean install -Pfastest -pl 'org.glassfish.main.distributions:glassfish' -am -T4C

@oxdrove
Copy link
Author

oxdrove commented Aug 7, 2023

$ git clone [email protected]:eclipse-ee4j/glassfish.git
Cloning into 'glassfish'...
The authenticity of host 'github.com (140.82.121.4)' can't be established.
ED25519 key fingerprint is SHA256:+DiY3wvvV6TuJJhbpZisF/zLDA0zPMSvHdkr4UvCOqU.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'github.com' (ED25519) to the list of known hosts.
[email protected]: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Sorry I need to go and do something useful. Stepping back from build, the problem points to Weld, isn't Weld a dependent project? So the change is is Weld not Glassfish and we should be looking at Weld and not Glassfish?

@OndroMih
Copy link
Contributor

OndroMih commented Aug 7, 2023

No, it's not related to Weld. It's used by the GlassFish distribution artifact to retrieve the git commit ID and add it to the GlassFish ZIP.

You can fix this message and skip running the git plugin with this:

  • Open nucleus/parent/pom.xml in GF source code
  • Find "git-commit-id-maven-plugin" plugin
  • In the "configuration" section of the plugin, add the following line:
 <failOnNoGitDirectory>false</failOnNoGitDirectory>

Unfortunately, it's not possible to configure this via a system property on the command line, you need to modify the file.

@oxdrove
Copy link
Author

oxdrove commented Aug 13, 2023

I already tried the failOnNoGitDirectory but had another error and have been working on it.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run (do stuff) on project glassfish: An Ant BuildException has occured: The following error occurred while executing this line:
[ERROR] /scratch/build/glassfish/64/appserver/distributions/glassfish/target/antrun/build-main.xml:9: The archive microprofile-jwt-auth-api.jar doesn't exist
[ERROR] around Ant part ...<jarupdate basedir="/scratch/build/glassfish/64/appserver/distributions/glassfish/src/main/patches/microprofile-jwt-auth-api" destfile="/scratch/build/glassfish/64/appserver/distributions/glassfish/target/stage/glassfish7/glassfish/modules/microprofile-jwt-auth-api.jar" includes="META-INF/MANIFEST.MF" />... @ 17:310 in /scratch/build/glassfish/64/appserver/distributions/glassfish/target/antrun/build-main.xml

It is trying to make this file with a macro so it will not exist until it makes it itself. It does not complain it can not make the file. The line above, in the pom.xml, that makes xmlsec.jar does work. I can not see why.

@oxdrove
Copy link
Author

oxdrove commented Sep 3, 2023

Summery of progress:

  1. It appears not to be the Weld version
  2. workaround: install Payara

Longer version:
I have install java 17, this lets it build - documentation is in error. This wasted a not inconsiderable amount of time because glassfish required the non intuitive (from the reported error)
asadmin create-jvm-options "--add-opens jdk.naming.dns/com.sun.jndi.dns=ALL-UNNAMED

I compiled 7.0.6 with "Integrate Weld 5.1.1 #24480" in reverse. Same slow load.

Some change between 7.0.5 and 7.0.6 has caused this.

The logical this to do is compile 7.0.5 and incrementally add the changes that created 7.0.6. Sadly 7.0.5 compiled does not run as 7.0.5 binary version [for me]. It just hangs on start. I can take 7.0.5 and 7.0.6 and diff the two to play spot-the-difference but this is taking me too long.

The sensible thing to do it for someone that actually understands the build and the changes that took place to test this.

@dmatej
Copy link
Contributor

dmatej commented Sep 3, 2023

It just hangs on start.

Actually it usually doesn't hang, but the initialization failed for some reason, so it cannot log anything. The first log is logged by the start-domain command, not by the server, that confused me years ago too. To see what happened you can try to execute asadmin start-domain --verbose which allocates the console, so you can see the standard output. It should contain some stacktrace.

asadmin create-jvm-options "--add-opens jdk.naming.dns/com.sun.jndi.dns=ALL-UNNAMED

I did not find any usage of this module in GlassFish. Can you send us the stacktrace?

@oxdrove
Copy link
Author

oxdrove commented Sep 5, 2023

asadmin create-jvm-options "--add-opens jdk.naming.dns/com.sun.jndi.dns=ALL-UNNAMED

I did not find any usage of this module in GlassFish.

It's not. Expanding my "Glassfish requires" Glassfissh requires this option in order to pass to Java so it can run my applications. Glassfish being a means to and end and for non Glassfish developers not an end itself - I don't require Glassfish at all. Note too it's not java as such that needs it, java 17 runs my library methods perfectly well from the command line leading to my confusion as to what the opaque error message meant.

Copy link

github-actions bot commented Sep 5, 2024

This issue has been marked as inactive and old and will be closed in 7 days if there is no further activity. If you want the issue to remain open please add a comment

@github-actions github-actions bot added the Stale label Sep 5, 2024
@avpinchuk avpinchuk removed the Stale label Sep 11, 2024
@dmatej
Copy link
Contributor

dmatej commented Oct 17, 2024

I can't see what we could improve with this now until we receive the asadmin start-domain --verbose output at least. @oxdrove , I have doubts about your results too, because as I did maybe 200 tests of this in last 10 days, I noticed the influence of the laptop heat on these results. The worst result is when I use a script to unpack the zip file and start the domain immediately after that. The best result is when I close some browsers and I don't compile anything else and I just run asadmin start-domain --verbose.

(worst+best results)
7.0.5
Total startup time including CLI: 3 489 ms
Total startup time including CLI: 2 684 ms

7.0.6
Total startup time including CLI: 3 484 ms
Total startup time including CLI: 2 684 ms

7.0.18
Total startup time including CLI: 3 458 ms
Total startup time including CLI: 2 689 ms

7.0.19-SNAPSHOT with #25183
Total startup time including CLI: 3 454 ms
Total startup time including CLI: 2 677 ms

@dmatej dmatej closed this as not planned Won't fix, can't repro, duplicate, stale Oct 17, 2024
@oxdrove
Copy link
Author

oxdrove commented Oct 18, 2024

Your results are meaningless unless you indicate the presence of the square dependency.

You need to perform 4 tests: with w/o square times 7.0.5 or 7.0.6.

During investigation the question has narrowed from "7.0.6 takes longer to deploy" to the more specific "Why does 7.0.6 take longer to load Square than 7.0.5".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants