
performance on deployment? #105

Closed

afischer211 opened this issue Jun 8, 2015 · 26 comments

@afischer211

I created an empty channel and am filling it sequentially (in a loop) with a Groovy-based script that performs normal Maven deploy operations for many OSGi bundles. The first artifacts deploy quickly, but after roughly 30-40 deployments the process becomes slower and slower. Deploying the (Groovy-generated) pom file or other small files is especially slow (<5 KB/sec).
Is this a problem of the internal pdrone caches or of the PostgreSQL database?
With an empty channel I get upload rates of >500 KB/sec; with a filled channel the rate is 0.5-100 KB/sec (higher for bigger files).
What are your experiences with heavily filled channels (>200 bundles)? Or is this a problem of deferred processing in the background (queues)? What do you recommend for memory (JVM) and CPU cores?

@ctron
Owner

ctron commented Jun 8, 2015

I am not sure how you measure the upload rate. Normally the HTTP request only returns after the artifact has been analyzed, the BLOB has been stored, and the channel aggregation has been performed. So I guess the raw transfer rate is much higher, and the average is then lowered by the final wait period, during which the channel is being processed.

If the channel gets more and more content, aggregating the channel might take more time.

I do know of several spots that could be improved performance-wise. It is simply a matter of time/effort.

Most operations (like Maven and P2 access) are optimized so that reading is fast, while modifications might be slower.

Well, one recommendation: the more the database can cache, the faster access will be. 200 bundles should not be an issue. Modifications to a single channel are performed sequentially, so more CPU cores will only help with parallel operations on other channels or with other read operations (web UI, repository adapters).

@afischer211
Author

I deploy artifacts with maven-deploy via a Groovy script which, for every bundle in a folder, creates a temporary pom file and then deploys the artifact together with that generated pom. When I run the script on a folder with 20 jar files/bundles, I can watch the deployment get slower with every jar/pom file. Jenkins builds that run a single deployment for every Maven project (the project's target) in a multi-module build also take a large amount of time.
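In essence the loop looks like this (a trimmed sketch of my script; the URL, repository id, and pom coordinates are placeholders, not the real values):

```groovy
// Trimmed sketch of the per-bundle deploy loop.
// URL, repositoryId and the pom coordinates are placeholders.
File generatePom(File jar) {
    def artifactId = jar.name - '.jar'
    def pom = File.createTempFile(artifactId, '.pom')
    pom.text = """<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example.bundles</groupId>
  <artifactId>${artifactId}</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>jar</packaging>
</project>"""
    return pom
}

new File('bundles').eachFileMatch(~/.+\.jar/) { File jar ->
    def pom = generatePom(jar)
    def proc = ['mvn', 'deploy:deploy-file',
                '-Durl=http://pdrone.example.com/maven/my-channel',
                '-DrepositoryId=pdrone',
                "-Dfile=${jar.absolutePath}",
                "-DpomFile=${pom.absolutePath}"].execute()
    proc.waitFor()
    println "${jar.name}: exit code ${proc.exitValue()}"
}
```

Each iteration uploads one jar plus one generated pom, so every bundle results in two independent HTTP deploy requests.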
It is fine to optimize for read access first, but what can I do to get an acceptable deployment/upload speed? How do you upload multiple bundles into Package Drone at once? I have automated the conversion of jar files into OSGi bundles (with bnd tools) and want to upload the converted files to pdrone.
Is the PostgreSQL database the bottleneck, or the Java code itself (aspects)?
Reading (e.g. listing the content of a channel) is quick and fine.

@ctron
Owner

ctron commented Jun 8, 2015

Well, this highly depends on your channel configuration and the operations. Some operations require a full channel rebuild, which includes extracting metadata from the BLOBs. This is the slowest operation.

The normal "add" operation should be much quicker than this full channel rebuild though. In the optimal case it just processes the one artifact to add.

But it may happen that an addition triggers other operations as well. For example, adding an OSGi bundle will trigger the creation of two additional artifacts if the P2 metadata aspect is active, and a full channel aggregation if the P2 repository aspect is active.

This requires the storage manager to load all artifact information at the end of the operation. Since with Maven every file is uploaded independently, this can trigger a lot of operations, and each one has to wait until the work is complete.

To be honest, I don't have statistical data about performance. Right now there is no facility in Package Drone that measures and records performance. But this should be one of the next steps, in order to decide which areas need performance improvements.

I guess one performance "bug" is loading artifacts when processing the channel. For example, one step is to check generator artifacts for regeneration. Right now all artifacts are loaded from the database (without metadata, though) and checked, with irrelevant entries discarded in the Java code. This could be done with a proper JPA query.

So, guessing again at what helps most, I would say: configure PostgreSQL so it can cache read access to artifacts. Again, proper JPA queries are missing right now, so in most cases all channel artifacts are read.
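To illustrate the idea (only a sketch — the entity and property names here are hypothetical, not Package Drone's actual model): instead of loading every artifact and discarding most of them in Java, the filter can be pushed into the JPA query:

```groovy
import javax.persistence.EntityManager

// Hypothetical entity and property names, used only to illustrate the idea:
// let the database return just the generator artifacts of one channel,
// instead of loading every artifact and filtering in Java code.
List findGeneratorArtifacts(EntityManager em, String channelId) {
    em.createQuery(
            'SELECT a FROM ArtifactEntity a ' +
            'WHERE a.channel.id = :channelId AND a.generatorId IS NOT NULL')
      .setParameter('channelId', channelId)
      .resultList
}
```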

@afischer211
Author

OK, so we are getting closer to the cause of my performance problems. I use channels with P2 metadata generation and the OSGi and P2 repository aspects, so I think I run into a full channel aggregation after each deployment.
I will try to give PostgreSQL as much cache as possible to speed up Package Drone. Maybe it is better to reduce the heap size of Package Drone (currently 6 GB) and give PostgreSQL a larger cache...
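For reference, the usual first knobs in postgresql.conf would be something like this (only a rough starting point, assuming a few GB of RAM can be given to the database; the right values depend on the machine):

```
shared_buffers = 2GB          # PostgreSQL's own data cache
effective_cache_size = 4GB    # planner hint: shared_buffers plus OS file cache
work_mem = 32MB               # per-sort/per-hash working memory
```

Shrinking the Package Drone heap (e.g. from 6 GB down to 2-3 GB) would leave the remaining RAM for shared_buffers and the OS page cache.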

@ctron
Owner

ctron commented Jun 9, 2015

I am planning to cut milestone 2 today, including the fix for P2. After that I will start adding some functionality for performance measurement, so that we can actually measure and don't need to guess ;-)

@afischer211
Author

I'm eagerly awaiting the new release 0.10.0-m2 so I can test the fixes for #97 and #104 :-)

@ctron
Owner

ctron commented Jun 10, 2015

So I added a little bit of tracing and ran the stress tests again; they deploy bundles into channels which already contain some bundles.

About 99% of the time is consumed by running the channel aggregator, which is run twice for each Maven upload (jar + pom). And about 60% of that time is spent scanning for artifacts in order to aggregate.

I will dig into this a bit more; just to let you know that the first assumption (that the database operations might be the issue) seems correct.

@ctron ctron self-assigned this Jun 10, 2015
@ctron ctron added need info and removed question labels Jun 10, 2015
@afischer211
Author

Thanks for the information. So we can hope for some improvements (or better guidance on PostgreSQL configuration) in the future...

@ctron
Owner

ctron commented Jun 11, 2015

Yes, absolutely. However, I am not sure what the time frame for this will be.

The more I look at it, the more I think it might be a good idea to actually cache a few things in Package Drone itself. But that would require some deeper changes, and I don't want to make them in the 0.10.x line.

@afischer211
Author

I see that you are adding some profiling/monitoring functionality. With that I will be able to see the results of configuration changes in PostgreSQL, which will help with optimizing the PostgreSQL settings.

@maggu2810
Contributor

Just an idea, in case Alexander needs a workaround.
We could add an API (something similar to https://github.com/ctron/package-drone/wiki/Upload-API) that could be used to

  • disable automatic channel aggregation
  • enable automatic channel aggregation
  • trigger channel aggregation

So, as a workaround (only), someone who deploys multiple bundles could use this (see the sketch after the list):

  • disable automatic channel aggregation
  • do normal deployment of x bundles
  • enable automatic channel aggregation
  • trigger channel aggregation
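From the client side, that could look roughly like this (the endpoints are purely hypothetical — nothing like this exists yet; it just mirrors the style of the existing Upload API):

```groovy
// Purely hypothetical endpoints, sketched in the style of the Upload API.
def base = 'http://pdrone.example.com/api/channels/my-channel/aggregation'

def post = { String action ->
    HttpURLConnection conn = new URL("${base}/${action}").openConnection()
    conn.requestMethod = 'POST'
    assert conn.responseCode in 200..299
}

post('disable')   // 1. suspend automatic channel aggregation
// 2. ... normal maven deployment of all bundles happens here ...
post('enable')    // 3. resume automatic aggregation
post('trigger')   // 4. aggregate once for the whole batch
```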

@ctron
Owner

ctron commented Jun 11, 2015

Very good idea. I would like to see functionality like freeze and thaw, or suspend and resume as you described.

Actually, that second operation could be scheduled automatically after some time, or requested externally. Requesting it twice should result in a no-op and be cheap, so before starting a new build you could always request it.

However, during a build it might be problematic if someone re-downloads artifacts which were just uploaded. So it should be opt-in functionality, as you described!

@maggu2810
Contributor

However, during a build it might be problematic if someone re-downloads artifacts which were just uploaded. So it should be opt-in functionality, as you described!

Yes, we should keep it as an option that someone can use. We should not break stuff by adding a new option.

I thought about the same issue (the build process downloading something that was uploaded by that same build process). This could also be solved, but might result in a lot of work:

  • we could set the channel aggregation to a mode (disable / enable / ...) in which only "new" channel artifacts get aggregated
  • we could use an "overlay channel" whose usage could be enabled; all changes are done on this channel, and at the end we either drop the overlay or integrate it into the real channel.

I think for a first shot, disable / enable / trigger would be a good choice for the effort/benefit ratio.

@ctron
Owner

ctron commented Jun 11, 2015

However, what I actually want to do is fix those performance bugs :-)

I already got one. This will delay the 0.10.x release a little bit, but it brings enormous speed improvements to the system even without caching, so I think it is worth it. Another one is already being worked on as well. Hopefully done by Monday, since I am already heading out for the weekend today :-)

A third one, the last for my test case, might be a bit more tricky, since it involves not only the database but also the blob store. However, I have not looked into this one yet, so maybe there is a simple fix for it as well.

The most important step was adding profiling. It shows where the problems are and allows fixing them.

@maggu2810
Contributor

Whatever you want. ;-)

I already stated:

Just an idea, in case Alexander needs a workaround.

If you can solve the performance issues without that workaround: nice ;-)

@ctron
Owner

ctron commented Jun 11, 2015

Actually, I want both :-) A function like that always comes in handy at some point!

@ctron
Owner

ctron commented Jun 15, 2015

I just uploaded release 0.10.0-m3, which contains huge performance improvements. Since it also contains some other changes, it might be a good idea to keep a copy of 0.10.0-m2 😉 But you can switch between m2 and m3 as you like.

@ctron ctron added this to the v0.10.0-m3 milestone Jun 15, 2015
@afischer211
Author

I have been testing the new milestone for several hours now, so far without any new problems.
Let's see how much the performance has improved.

@afischer211
Author

Performance seems better, but I have some problems with the generated content.xml and artifacts.xml of a channel.
The artifacts listed in content.xml do not match the content of the channel (there are artifacts with the same base version but a newer snapshot/qualifier part).
When refreshing all aspects of the channel I get deadlock exceptions (see issue #108).

@ctron
Owner

ctron commented Jun 16, 2015

OK, I suggest you switch back. If you have any additional information, I would be glad to have it.

I will have a look at it tomorrow; hopefully I can wrap it up in a test case.

@afischer211
Author

I returned to 0.10.0-m2.
I see that the deployment of the "pom" and "tycho-p2-artifacts" files is very slow. The deployment performance of the jar files/bundles themselves is acceptable.

@ctron
Owner

ctron commented Jun 18, 2015

Yes, that can happen, since the jar files are only stored temporarily at first and are added to the channel later, once the full information from the Maven upload is present.

@afischer211
Author

On my Package Drone instance (0.10.0-m2) I deploy concurrently with Maven from multiple Jenkins jobs into the same channel.
Every job hangs while deploying a file like maven-metadata.xml. After many(!) minutes some jobs continue; others sometimes fail with a timeout.
Is this the same problem of generating metadata like content.xml and artifacts.xml after adding artifacts? Maybe the cleanup aspect running directly after the insert is also a problem (would it be better to run the cleanup asynchronously in a background task?).

@ctron
Owner

ctron commented Jun 19, 2015

How many artifacts are in this channel? Is there any error on the console or in the database log?

@afischer211
Author

The console log only contains exceptions about aborted connections (because the Maven deployment aborts):
WARN org.eclipse.jetty.server.HttpChannel [HttpChannel.java:481] Could not send response error 500: javax.servlet.ServletException: javax.servlet.ServletException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: org.eclipse.jetty.io.EofException
Another (interesting) log entry that sometimes occurs is:
WARN de.dentrassi.pm.p2.servlet.P2Servlet [P2Servlet.java:141] Download plugin: null
The PostgreSQL logs contain no relevant entries.

The database table "ARTIFACTS" contains 10,000 rows, spread across different channels.
I think the channel in question contains 500-1000 artifacts.

@afischer211
Author

With the new version 0.10.0-m6 (improved cleanup aspect), deploying 52 artifacts (growing one channel from 1733 to 1785 artifacts) takes 6:46 min. That is OK for me and a huge improvement over older versions. Thanks for the new statistics information!
