-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DATAGO-57805: Add command line functionality #153
Conversation
…ocessing in failure scenarios
if (args.length > 2) { | ||
|
||
String type = args[0]; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, only handle the scan command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment for next ticket, let's change this var to commandType
or command
or something else.
log.error("Scan request [{}]: Scan did not complete successfully.", scanId); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It completed successfully if there are dataCollectionFileEntitites for each scan type, and none of them are marked as "FAILED"
AtomicBoolean scanCompleted = new AtomicBoolean(false); | ||
|
||
// The hard timeout is set to 30 minutes to prevent the scan from running forever | ||
long hardTimeoutMillis = System.currentTimeMillis() + 30 * 60 * 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the "paranoia" timeout just so the EMA doesn't run forever. It's currently set to 30 minutes since large configs can take a long time to scan.
|
||
// The initiated timout is set to 2 minutes to shortcut a scan that is stuck in the INITIATED state | ||
// This typically occurs if there is an error in the broker URL or credentials | ||
long initiatedTimeoutMillis = System.currentTimeMillis() + 2 * 60 * 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This timeout is triggered if all of the scans are in "INITIATED" state for more than 2 minutes. It's meant to catch cases where we had trouble creating the client or running the command but the exception was not captured.
String scanId = (String) exchange.getIn().getHeader(RouteConstants.SCAN_ID); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pulled this code into a shared abstract class called RouteStateProcessor
that's used by RouteCompleteProcessor
and RouteFailedProcessor
.
this.idGenerator = idGenerator; | ||
} | ||
|
||
@Transactional | ||
public ScanStatusEntity save(String name, String scanId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the ability to set any state instead of hardcoding to COMPLETE
.
@@ -39,17 +40,17 @@ public class KafkaClientManagerImpl implements MessagingServiceClientManager<Adm | |||
private final TimeUnit maxReconnectBackoffTimeUnit; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SaveActions plugin re-wrote this...
@@ -0,0 +1,19 @@ | |||
package com.solace.maas.ep.event.management.agent.plugin.exception; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new Generic Plugin Exception that Camel will look for to know when a scan has failed.
@@ -40,7 +41,8 @@ protected AbstractRouteBuilder(ScanTypeDescendentsProcessor scanTypeDescendentsP | |||
public void configure() throws Exception { | |||
|
|||
// NetworkClient Selectable network i/o exception + Kafka Future exceptions | |||
onException(IOException.class, InterruptedException.class, ExecutionException.class, TimeoutException.class) | |||
onException(IOException.class, InterruptedException.class, ExecutionException.class, TimeoutException.class, | |||
PluginClientException.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the PluginClientException
that any plugin can throw. This signals to camel that an error has occurred and Camel can set the scan type status to FAILED
.
@@ -59,6 +61,7 @@ public void configure() throws Exception { | |||
RouteConstants.TRACE_ID + "}]: The status of [${header." + | |||
RouteConstants.SCAN_TYPE + "}]" + " is: [${header." + RouteConstants.SCAN_STATUS + "}].") | |||
.to("direct:perRouteScanStatusPublisher?block=false&failIfNoConsumers=false") | |||
.to("direct:processScanStatusAsFailed?block=false") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New route to set the scan status to failed for this type.
.../application/src/main/java/com/solace/maas/ep/event/management/agent/cli/EmaCommandLine.java
Outdated
Show resolved
Hide resolved
long initiatedTimeoutMillis = System.currentTimeMillis() + 2 * 60 * 1000; | ||
|
||
while (System.currentTimeMillis() < hardTimeoutMillis && !scanCompleted.get()) { | ||
Thread.sleep(2000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this Thread.sleep increases CPU usage by a lot. Can we use Timer and TimerTask instead to run a scanComplete or hardTimeoutMillis check every 2 minutes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm guessing wrong on the cpu consumption, let's keep the code as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything I've read online says that the thread is released and does not consume CPU for a .sleep and .wait() so out of simplicity I'd like to keep it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved with non-blocking comments.
What is the purpose of this change?
The existing EMA user interface is too difficult. The ask is to add a command runner which takes arguments.
How was this change implemented?
Built on top of Julian's excellent POC. Added a new command runner that can be accessed by disabling tomcat using spring command line attributes.
Made the error handling more robust by introducing the new PluginClientException that camel can use to tell when the scan has failed and report an error.
How was this change tested?
Tested manually from the command line. Also regressed the existing standalone and cloud-connected scans to make sure nothing was broken.
Is there anything the reviewers should focus on/be aware of?
No