I have encountered a problem with the Operators Marketplace in CRC. However, it's not CRC specific.
When the marketplace container "community-operators-XXX" is deployed, it starts by downloading a list of packages. This is over 50MB of data downloaded in small chunks and it takes some time to download even on a decent internet connection.
While this is happening, the readiness and/or liveness probes report failure.
What happens in my case is that the pods get killed repeatedly before they have collected the necessary data and can become ready. This happens several times before OpenShift gives up and the deployment stays failed.
This is a very well hidden problem because the list of operators still shows items. There is no indication of the problem except for the failed Pods in the operator-marketplace project and a lower than expected number of items in the .....
$ oc get packagemanifests -n openshift-marketplace | wc -l
124
vs
$ oc get packagemanifests -n openshift-marketplace | wc -l
250
In my case, I solved the problem by increasing the initialDelaySeconds to 300 and the failureThreshold to 100. That gave the container enough time to download the data before it would get killed and redeployed.
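To illustrate why these two settings help, here is a rough back-of-the-envelope sketch of how long a container gets before a failing probe leads to a restart. It assumes the Kubernetes default periodSeconds of 10 and, for the "before" case, the Kubernetes defaults initialDelaySeconds 0 and failureThreshold 3; the values actually shipped with the marketplace pods may differ, and the function names are made up for this example. The formula is an approximation that ignores probe timeouts and scheduling jitter.

```python
# Rough estimate of the time a container has before failing probes get it
# restarted. Approximation only: ignores timeoutSeconds and kubelet jitter.
# periodSeconds=10 is the Kubernetes default and an assumption here; the
# marketplace pods may use a different value.

def probe_time_budget(initial_delay_s, failure_threshold, period_s=10):
    """Approximate seconds available before the probe gives up."""
    return initial_delay_s + failure_threshold * period_s


def min_download_rate(payload_mb, budget_s):
    """Average MB/s needed to fetch payload_mb within budget_s seconds."""
    return payload_mb / budget_s


if __name__ == "__main__":
    payload_mb = 50  # "over 50MB" from the description above

    # Kubernetes defaults (assumed, not read from the actual pod spec).
    default_budget = probe_time_budget(0, 3)
    # Values from the workaround: initialDelaySeconds=300, failureThreshold=100.
    tuned_budget = probe_time_budget(300, 100)

    for label, budget in (("default", default_budget), ("tuned", tuned_budget)):
        rate = min_download_rate(payload_mb, budget)
        print(f"{label}: ~{budget}s budget, needs >= {rate:.3f} MB/s on average")
```

Under these assumptions the tuned values stretch the budget from roughly half a minute to over twenty minutes, which would explain why the download then has time to finish on a slow connection.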
However, I am not sure what the correct place to do that is.
I am surely not the only person having this issue, especially since people using CRC might be testing OpenShift on not-that-great internet connectivity. The issue is well hidden: in the console (web interface), there's no indication of a problem, just that operators are missing from the list. I noticed only because I was missing a particular operator that I wanted to work with. Also, the wording in the CRC docs suggests that some things might be degraded or reporting issues due to memory limitations, so it's easy to miss the problem of a failed deployment.
A temporary solution to the problem could be increasing the initialDelaySeconds etc. in the right place. Fixing it properly might involve reimplementing the initialization with the "Init Container" pattern, changing the readiness/liveness probes, or something else?
Thanks and regards!