
Change naming for timeout settings #20

Closed
ihostage opened this issue Mar 18, 2020 · 17 comments · Fixed by #48

@ihostage

The naming is misleading.
RetryingConfig + maxRetries suggests the plugin would retry publication multiple times; however, in practice it means it would "just wait the specified number of times while the repository is in progress".

Do you think it would be better to use timeout-like configuration properties?

For instance connectToNexusTimeout, initializeRepositoryTimeout, singleArtifactPublishTimeout, closeRepositoryTimeout, etc, etc.

Note: users do not care what the number of "status check" attempts is. What they care about is the overall timeout value, and they might want to have something like an initial delay between status check attempts.

WDYT?

Original comment from @vlsi in closed #18 .

I think it's a really nice idea and I don't want it to be forgotten in a closed PR.
My current timeout settings look like:

nexusPublishing {
    ...
    clientTimeout.set(Duration.parse("PT10M")) // 10 minutes
}
nexusStaging {
    ...
    numberOfRetries = 360 // 1 hour if 10 seconds delay
    delayBetweenRetriesInMillis = 10000 // 10 seconds
}

But really I just want to configure that I'm ready to wait 1 hour for the staging repository to be published.
Something like a timeout for publishing artifacts to the staging repository in the nexusPublishing block.

@szpak

szpak commented Mar 18, 2020

Thanks, I also planned to create a separate issue as I did with #19.

Currently we have a global clientTimeout and connectTimeout and a per repository/server retrying {} with maxRetries and delayBetween. We could try to use callTimeout to (try to) interrupt initializeRepository, but it could be tricky (#17) and somewhat implementation-specific.
singleArtifactPublishTimeout, on the other hand, is handled by Gradle's publish task - I'm not sure if we should cover this.
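
For orientation, the current settings could be wired together roughly like this - a sketch in the Gradle Kotlin DSL; the sonatype() block and the exact nesting of retrying {} are assumptions, only the property names come from the description above:

import java.time.Duration

nexusPublishing {
    // global HTTP client settings
    connectTimeout.set(Duration.ofMinutes(1))
    clientTimeout.set(Duration.ofMinutes(10)) // used as the read/write timeout

    repositories {
        sonatype {
            // per-repository/server status-check settings
            retrying {
                maxRetries.set(360)                      // number of status checks
                delayBetween.set(Duration.ofSeconds(10)) // pause between checks
            }
        }
    }
}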

We could - just an outline for further discussion:

  1. Extract timeouts to a separate subconfiguration:
nexusPublishing {
    timeout {
        connectTimeout: Duration
        readWriteTimeout: Duration //currently clientTimeout
        initializeRepositoryTimeout: Duration
        transitRepositoryTimeout: Duration
    }
}

transitRepositoryTimeout would be one timeout for close and a separate timeout for release - they don't add up. The name isn't self-explanatory, though. Maybe we would like to have two separate closeRepositoryTimeout and releaseRepositoryTimeout properties just to make users' lives easier?

  2. The timeout block/closure would be at the top configuration level - global settings which could be overridden by per-repository/server settings (if needed or requested by some of the users - @vlsi? ;-) ).

It may even look feasible, but under the hood there is a need to override delayBetweenRetrying, e.g. for tests. People might also want to increase it so as not to "kill" their overloaded server. That complicates the configuration parameters (for users) and also the implementation itself. But maybe you have some good ideas how to overcome that?

@vlsi

vlsi commented Mar 18, 2020

transitRepositoryTimeout The name isn't self-explanatory

I'm afraid the meaning is not really clear from the name.

closeRepositoryTimeout and releaseRepositoryTimeout

Those names are clear. However, having a common one might help as well (e.g. defaultNexusOperationTimeout ?)

It may even look feasible, but under the hood there is a need to override delayBetweenRetrying, e.g. for tests

Technically speaking, there are libraries to augment time behavior. So you don't need to rely on sleep(100ms).
I've collected a couple of links here: pgjdbc/pgjdbc#1065

People might also want to increase it so as not to "kill" their overloaded server

Technically speaking, backoff should quickly climb to a sane maximum delay.
In practice, it is wrong to have a setting like delayBetweenRetrying because, well, what does it mean in the case of exponential backoff (which should be used anyway)?

I perfectly agree there might be initialDelay and probably maximumDelay. However, they might be configured by default to something like 1s and 30s which should not overload the server.
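
To make that concrete, a minimal sketch of such a backoff sequence - the 1s/30s values are the defaults suggested above, the function name is purely illustrative:

import java.time.Duration

// Doubles the delay after every attempt and caps it at the maximum.
// With initial = 1s and max = 30s this yields 1s, 2s, 4s, 8s, 16s, 30s, 30s, ...
fun backoffDelays(initial: Duration, max: Duration): Sequence<Duration> =
    generateSequence(initial) { previous -> minOf(previous.multipliedBy(2), max) }

val delays = backoffDelays(Duration.ofSeconds(1), Duration.ofSeconds(30)).take(10).toList()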

@szpak

szpak commented Mar 22, 2020

Those names are clear. However, having a common one might help as well (e.g. defaultNexusOperationTimeout ?)

That name isn't precise. It is only for transition operations, not all operations (currently just two of them).

Technically speaking, there are libraries to augment time behavior. So you don't need to rely on sleep(100ms).
I've collected a couple of links here: pgjdbc/pgjdbc#1065

Hmm, I'm not sure if I understood it correctly, but in the context of closing/releasing repositories in Nexus there are X tries, each of which can take Y seconds plus some delay between them, and I'm not sure if that is the main use case for - as an example - fluxcapacitor.

Currently we use failsafe, which is quite configurable. Nevertheless, I am not sure how the configuration of GNPP should look to keep it both simple, powerful, and fitting our needs :). Could you make a proposal?
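
For illustration only - a rough sketch of how such a status-check retry could be expressed with the Failsafe 2.x API mentioned above; isTransitionDone() is a hypothetical stand-in for the Nexus status call and the numbers are arbitrary:

import net.jodah.failsafe.Failsafe
import net.jodah.failsafe.RetryPolicy
import java.time.temporal.ChronoUnit

// Hypothetical check: true once Nexus reports the transition as finished.
fun isTransitionDone(): Boolean = TODO("GET the staging repository status")

val retryPolicy = RetryPolicy<Boolean>()
    .handleResult(false)                    // retry while the repo is still transitioning
    .withBackoff(1, 30, ChronoUnit.SECONDS) // 1s initial delay, growing up to 30s
    .withMaxAttempts(360)                   // upper bound on the number of status checks

val done: Boolean = Failsafe.with(retryPolicy).get { isTransitionDone() }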

@vlsi

vlsi commented Mar 22, 2020

It is only for transition operations

What do you mean by transition operations?

I'm not sure if that is the main use case for - as an example - fluxcapacitor.

If you want to ensure that GNPP can publish artifacts even if Nexus takes 20 minutes, then you could use something like fluxcapacitor to emulate 20 minutes without altering the timeout settings.

I am not sure how the configuration of GNPP should look to keep it both simple

See #20 (comment)

There should be an end-to-end timeout configuration.
Then you could expose various backoff strategies; however, exposing just one of them is probably enough (e.g. withBackoff(initial, maximum, unit)).

@szpak

szpak commented Mar 22, 2020

What do you mean by transition operations?

initializeStagingRepository is a simple call - request, waiting, response, created (or not). On the other hand, close and release are more complex - request, waiting (just a little bit), accepted (or not). If accepted, the repository enters the in transition (or transitioning) state on the Nexus side. It means it transits from open to closed or from closed to released (or dropped). The plugin needs to poll it regularly (with another GET request) to get to know when the transition has finished.

An e2e timeout could be useful in some cases, but some people (not to mention tests) would still like to be able to tweak e.g. the delay between GET requests used to check whether the transition has finished. And I am not sure how to express that in the plugin configuration in a simple, yet flexible way.
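
Schematically, the transition handling described above amounts to something like this - the state names follow the comment, while getRepositoryState() and the rest are illustrative, not the plugin's actual API:

import java.time.Duration

enum class RepoState { OPEN, TRANSITIONING, CLOSED, RELEASED, DROPPED }

// Hypothetical status call: another GET request against Nexus.
fun getRepositoryState(): RepoState = TODO("query the staging repository")

fun awaitTransitionEnd(delayBetweenChecks: Duration, maxChecks: Int): RepoState {
    repeat(maxChecks) {
        val state = getRepositoryState()
        if (state != RepoState.TRANSITIONING) return state // transition finished (or never started)
        Thread.sleep(delayBetweenChecks.toMillis())        // the delay people may want to tweak
    }
    error("Repository is still transitioning after $maxChecks status checks")
}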

@vlsi

vlsi commented Mar 22, 2020

And I am not sure how to express that in the plugin configuration in a simple, yet flexible way.

Would you please expose a total timeout first?

Every use case needs to configure it.
For instance, I know that for my personal needs I'm OK with waiting for 15-20 minutes.
I don't care about the low-level timeouts.

would still like to be able to tweak e.g. the delay between GET requests

Nobody really needs that. Really. Tweaking timeouts has no business value.

to check whether the transition has finished

If you mean "progress indication" feature, then you should not abuse "timeout between retries" as "progress indicator".

And I am not sure how to express that in the plugin configuration in a simple, yet flexible way.

Expose e2e timeouts to the end-user explicitly.
Use sane values for low-level timeout defaults (e.g. set connectTimeout to half of the e2e timeout).

@szpak

szpak commented Mar 22, 2020

Would you please expose a total timeout first?

There is a per task timeout available in Gradle itself. Could it be used?

I haven't found a global timeout for Gradle execution (all the tasks). @marcphilipp do you know if it can be simply achieved?
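
For reference, Gradle's per-task timeout can be set directly in the build script; a minimal sketch, where the task name is illustrative (the plugin's actual task names may differ):

import java.time.Duration

// Uses Gradle's built-in Task.timeout property (available since Gradle 5.0).
tasks.named("closeRepository") {
    timeout.set(Duration.ofMinutes(20))
}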

If you mean "progress indication" feature, then you should not abuse "timeout between retries" as "progress indicator".

Progress is important to get to know that you usually release on attempt 39/40, so that you can increase the limit. In the same way you can observe (with --info) that on attempt 40/40 your repo was still in the open and transitioning state (much more time is usually needed to finish), not even in closed and transitioning (when a shorter time is required). Without that it's hit and miss to set a proper timeout value with just a global timeout and no debug information like that. It's the same as with Awaitility - you can call sleep(60_000) in your test, or check every 500 ms and in most cases finish after 2 seconds, not 60.
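
Spelling out the Awaitility comparison - repositoryClosed() is a hypothetical condition; the point is polling frequently with an overall cap instead of one long sleep:

import org.awaitility.Awaitility.await
import java.util.concurrent.TimeUnit

// Hypothetical condition standing in for "the staging repository has finished closing".
fun repositoryClosed(): Boolean = TODO()

fun waitForClose() {
    // Instead of a blind sleep(60_000): poll every 500 ms, give up after 60 s.
    await()
        .atMost(60, TimeUnit.SECONDS)
        .pollInterval(500, TimeUnit.MILLISECONDS)
        .until { repositoryClosed() }
}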

Expose e2e timeouts to the end-user explicitly.

In e2e tests (or mocked ones, but run with Gradle TestKit), being able to set that delay value to near zero is quite important to verify that the plugin tries more than once to check the status of the closing staging repository.

@szpak

szpak commented Mar 22, 2020


@vlsi On Linux you can use the timeout utility:

timeout 1h ./gradlew releaseMyArtifacts

Wouldn't it be sufficient for you?

@vlsi

vlsi commented Mar 22, 2020

Progress is important to get to know

This is relevant: gradle/gradle#3654

There's an internal class, ProgressLogger, which is used to show unit test progress as the tests are executed.

you usually release on attempt 39/40, so that you can increase the limit

Is 39/40 any different from 399/400?
What I mean is "status check attempts" have zero business value on their own.
The business value comes from "waiting for 20min for the operation to complete". Nobody ever cares if the plugin performs 10 or 20 attempts. The important thing is that it tries at least for 20 minutes (or whatever is configured).

Wouldn't it be sufficient for you?

Do you mean you are going to remove all the timeouts from the plugin?
What if I use Windows or macOS?

@szpak

szpak commented Mar 22, 2020

Nobody ever cares if the plugin performs 10 or 20 attempts.

Some organizations do care whether it means having something released and "in production" 10 minutes or 20 minutes after the commit has been made (see my example with Awaitility and the senseless - too long - idle waiting).

What if I use Windows or macOS?

Most of the people using CI servers - where the timeout seems to be more important - release from Linux. You can still set a per-task timeout in Gradle. I don't know if there is an easy way to achieve a global timeout in Gradle itself.

@szpak

szpak commented Mar 22, 2020

@vlsi To summarize things:

  1. A global timeout - total Gradle execution time - is fine if there is a sensible way to implement it in Gradle. I don't think it would be good to handle it in the plugin - it would also affect all the tasks not related to the plugin.
  2. A per-task timeout (in Gradle) could be useful for cases where close or release shouldn't take more than X minutes.
  3. I don't have a good idea how to design the configuration to be simple and flexible enough to handle more specific timeouts.

I personally would still like to have (optional) control over the delay time. We could have a "total" release execution time and an (optional) delay time, set to some value by default. Retries would happen until that time is exceeded. However, it would be a soft limit - total execution could be slightly longer - a hard limit should be achieved with the aforementioned per-task timeout feature in Gradle. Would that be sufficient for you? If - by any chance - yes, please propose names and other configuration elements.
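
A minimal sketch of why such a time budget stays a soft limit (names are illustrative): new checks stop being started once the budget is used up, but a check that is already in flight - or the final sleep - can push the total execution slightly past it; a hard cut-off would come from Gradle's per-task timeout on top.

import java.time.Duration
import java.time.Instant

// Retries happen until the budget is exhausted; the last check/sleep may overshoot it.
fun awaitWithBudget(total: Duration, delayBetweenChecks: Duration, check: () -> Boolean): Boolean {
    val deadline = Instant.now().plus(total)
    while (Instant.now().isBefore(deadline)) {
        if (check()) return true                     // transition finished within the budget
        Thread.sleep(delayBetweenChecks.toMillis())  // optional, user-tunable delay
    }
    return false                                     // soft limit reached
}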

@vlsi

vlsi commented Mar 22, 2020

Global timeout - total Gradle execution time - is fine if there is a sensible way to implement it in Gradle

Just in case: I don't need GNPP to implement global timeouts across full Gradle execution.
What I mean by "global" is per operation.

Sorry if that was not clear initially, but I thought it was obvious that the plugin can't manage the timeout for the full Gradle build :-/

The operations here are like initializeStagingRepo, closeStagingRepo, releaseStagingRepo.
Of course, the total Gradle execution time might be very different depending on whatever reasons (e.g. parallelism)

Per task timeout (in Gradle) could be useful for cases where close or release shouldn't take no more than X minutes.

plus initialize

However, it would be soft limit

Hard limits are possible only in hard real-time systems.
Of course, all the timeouts here will be soft.

please propose names and other configuration elements.

WDYT of the following?

defaultNexusOperationTimeout. Default to 20min
initializeRepositoryTimeout. Default to defaultNexusOperationTimeout
closeRepositoryTimeout. Default to defaultNexusOperationTimeout
releaseRepositoryTimeout. Default to defaultNexusOperationTimeout

connectTimeout. Default to defaultNexusOperationTimeout/2. This is a TCP connect timeout for all the operations.
networkReadTimeout. Default to defaultNexusOperationTimeout/2. This is a TCP single read timeout for all the operations.

Status check intervals (e.g. passed to withBackoff(initial, maximum, unit)):
initialStatusCheckInterval. Default to 2sec.
maxStatusCheckInterval. Default to 60sec.
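
Expressed as a purely hypothetical DSL, the proposal could look roughly like this - none of these properties exist in the plugin, the names and defaults just mirror the list above:

import java.time.Duration

nexusPublishing {
    defaultNexusOperationTimeout.set(Duration.ofMinutes(20))
    closeRepositoryTimeout.set(Duration.ofMinutes(30))     // override the default for close only

    connectTimeout.set(Duration.ofMinutes(10))              // defaultNexusOperationTimeout / 2
    networkReadTimeout.set(Duration.ofMinutes(10))          // defaultNexusOperationTimeout / 2

    initialStatusCheckInterval.set(Duration.ofSeconds(2))
    maxStatusCheckInterval.set(Duration.ofSeconds(60))
}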

@vlsi

vlsi commented Mar 22, 2020

I personally still would like to have an (optional) control over delay time

It is fine if you add the control as an extra option.
It is not fine if RetryingConfig + maxRetries is the only possible way to configure close repository timeout.

@szpak

szpak commented Mar 25, 2020

initializeRepositoryTimeout. Default to defaultNexusOperationTimeout

There is no retrying with init. It's just a call to Nexus. Once it finishes, the repo should be created (or there was an error). initializeRepositoryTimeout would need to be applied to the task itself - using the Gradle task timeout mechanism. I don't know which value would take effect if both a timeout (on the task) set by the plugin and a timeout (on the task) set by the user in build.gradle were present.

Otherwise, we would need to rely on connectTimeout and networkReadTimeout in the underlying network client to achieve that. However, how should that value then be split between the connect and read timeouts?

connectTimeout. Default to defaultNexusOperationTimeout/2. This is a TCP connect timeout for all the operations.

For me that is very long - 10 minutes waiting for a connection? Probably even with Sonatype Nexus it would be too much. The read for init, on the other hand, can take some time (for Sonatype).

@vlsi

vlsi commented Mar 25, 2020

For me it is very long. 10 minutes waiting for connect?

Well, it could be max(2min, defaultNexusOperationTimeout/2)

@marcphilipp

Sorry for joining the discussion late.

defaultNexusOperationTimeout. Default to 20min
initializeRepositoryTimeout. Default to defaultNexusOperationTimeout
closeRepositoryTimeout. Default to defaultNexusOperationTimeout
releaseRepositoryTimeout. Default to defaultNexusOperationTimeout

Would these just set Gradle's task timeout?

@marcphilipp

I think RetryingConfig should not be a property of NexusRepository since it does not describe the repository but the config for an operation on the repo. I think we should rename it to CheckConfig and make it an (internal?) property of the close/release tasks. If we configure Gradle's timeout on all our tasks, I don't think we need to make the check interval/max attempts etc. configurable. I think that by default checking every few seconds (I don't see a use case for exponential backoff here) until the timeout is reached should be sufficient from a user's point of view. For that to work, we need to make sure we handle interrupts correctly (cf. https://jodah.net/failsafe/execution-cancellation/#handling-interruption). With that in place we could use unlimited "retries".

I think we should keep the current connectTimeout and just set it to a duration that works with Sonatype's staging repo (2min?). We currently have a clientTimeout that is used as both readTimeout and writeTimeout when configuring OkHttp. I think it would probably be clearer to split it and set both timeouts to the same value as connectTimeout.
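
On the OkHttp side, splitting the former clientTimeout would be a small builder change; a sketch using the 2-minute value suggested above:

import okhttp3.OkHttpClient
import java.util.concurrent.TimeUnit

// connect, read, and write timeouts set explicitly to the same value
val client = OkHttpClient.Builder()
    .connectTimeout(2, TimeUnit.MINUTES)
    .readTimeout(2, TimeUnit.MINUTES)
    .writeTimeout(2, TimeUnit.MINUTES)
    .build()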

@szpak szpak added the enhancement New feature or request label May 25, 2020
@marcphilipp marcphilipp added this to the 0.1.0 milestone May 31, 2020
@marcphilipp marcphilipp self-assigned this Dec 13, 2020
marcphilipp added a commit that referenced this issue Feb 6, 2021
To avoid confusion - since publishing is not retried, only the checking of the
repo's status when transitioning - `RetryingConfig` is moved from
`NexusRepository` to `NexusPublishExtension` and renamed to
`TransitionCheckOptions`. The options specified in the extension are by
default applied to all close and release tasks but can be overridden on
each task, if need be.

Resolves #20.
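
With that change in place, the options could presumably be configured at the extension level along these lines - a sketch that assumes the maxRetries/delayBetween properties carry over from the old RetryingConfig:

import java.time.Duration

nexusPublishing {
    transitionCheckOptions {
        maxRetries.set(360)
        delayBetween.set(Duration.ofSeconds(10))
    }
}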