Merge pull request #532 from jsnoble/worker_disconnect_timeout
Worker disconnect timeout
godber authored Sep 9, 2017
2 parents 583fcb1 + 10d9bdc commit 8b0028f
Showing 6 changed files with 33 additions and 12 deletions.
3 changes: 2 additions & 1 deletion docs/configuration.md
@@ -53,6 +53,8 @@ The configuration file essentially has two main fields, configuration for terasl
|:---------: | :--------: | :------: | :------:
ops_directory | 'path/to/directory', to look for more readers and processors. Usually this is where you place your custom code not part of core, unless you want to leave your code in place. The directory should have a "readers" and "processors" folder mirroring teraslice| String | optional
assets_directory | 'path/to/directory', to look for more custom readers and processors. Usually this is where you place your custom code not part of core, unless you want to leave your code in place. | String | optional
+network_timeout | time in milliseconds to wait for a response when messaging node to node before throwing an error | Number | optional, defaults to 60000 ms
+worker_disconnect_timeout | time in milliseconds that the slicer will wait after all workers have disconnected before terminating the job | Number | optional, defaults to 300000 ms or 5 minutes
shutdown_timeout | time in milliseconds, to allow workers and slicers to finish operations before forcefully shutting down when a shutdown signal occurs| Number | optional, defaults to 60 seconds (60000 ms)
hostname | IP or hostname for server | String | required, this is used to identify your nodes
workers | This represents the maximum number of workers that is node is permitted to make, must be set to a number greater than zero. This is currently hard set, and to change this number it must require a reboot and configuration change | Number | optional, defaults to the amount of cpu cores your system is running on
@@ -61,7 +63,6 @@ master_hostname | hostname where the cluster_master resides, used to notify all
port | port for the cluster_master to listen on, this is the port that is exposed externally for the api | Number | optional, defaults to 5678
name | Name for the cluster itself, its used for naming log files/indices | String | defaults to 'teracluster',
state | Elasticsearch cluster where job state, analytics and logs are stored | Object | optional, defaults to {connection: 'default'},
-timeout | time in milliseconds to wait for a response when messaging node to node before throwing an error | Number | optional, defaults to 60000 ms
slicer_port_range | range of ports that slicers will use per node | String | optional, defaults to range: '45678:46678'

### terafoundation
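Because the code paths changed in this commit now read `network_timeout`, a configuration that still sets only `timeout` no longer feeds them. A minimal sketch of a one-off migration follows, assuming the settings sit under a `teraslice` key (as `context.sysconfig.teraslice` suggests). `migrateTimeoutKey` is a hypothetical helper for illustration and is not part of teraslice.

```javascript
'use strict';

// Hypothetical helper (not part of teraslice): returns a copy of a parsed
// configuration with the legacy `timeout` key renamed to `network_timeout`.
function migrateTimeoutKey(sysconfig) {
    const teraslice = Object.assign({}, sysconfig.teraslice);
    if (teraslice.timeout !== undefined) {
        // An explicitly set network_timeout wins over the legacy value.
        if (teraslice.network_timeout === undefined) {
            teraslice.network_timeout = teraslice.timeout;
        }
        delete teraslice.timeout;
    }
    return Object.assign({}, sysconfig, { teraslice: teraslice });
}

// Example: an old-style configuration written before this commit.
const migrated = migrateTimeoutKey({
    teraslice: { timeout: 120000, worker_disconnect_timeout: 300000 }
});
console.log(migrated.teraslice);
```

The helper copies rather than mutates its input, so it can be run over a loaded configuration without side effects on the original object.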
2 changes: 1 addition & 1 deletion lib/cluster/node_master.js
@@ -215,7 +215,7 @@ module.exports = function(context) {

     messaging.register('cluster:job:stop', function(data) {
         var slicerFound = false;
-        var stopTime = config.timeout;
+        var stopTime = config.network_timeout;
         var intervalTime = 200;

         messageWorkers(context, data, {message: 'worker:shutdown'}, function(worker) {
2 changes: 1 addition & 1 deletion lib/cluster/services/api.js
@@ -664,7 +664,7 @@ module.exports = function(context, app, services) {

             timer = setTimeout(function() {
                 reject({message: 'Timeout has occurred for query', code: 500})
-            }, context.sysconfig.teraslice.timeout)
+            }, context.sysconfig.teraslice.network_timeout)
         });
     }

2 changes: 1 addition & 1 deletion lib/cluster/services/cluster.js
@@ -11,7 +11,7 @@ module.exports = function(context, server) {
     var messaging = context.messaging;
     var events = context.foundation.getEventEmitter();
     var logger = context.foundation.makeLogger('cluster', 'cluster', {module: 'cluster_service'});
-    var configTimeout = context.sysconfig.teraslice.timeout;
+    var configTimeout = context.sysconfig.teraslice.network_timeout;
     var pendingWorkerRequests = new Queue();
     var moderator = null;
     var cluster_state = {};
16 changes: 11 additions & 5 deletions lib/cluster/slicer.js
@@ -226,11 +226,17 @@ module.exports = function(context) {
         //only call if workers have connected before, and there are none left
         if (!isShuttingDown && workerFound && messaging.getClientCounts() === 0) {
             //TODO this needs a refactor for when slicer controls ex state
-            messaging.send({
-                message: 'slicer:error:terminal',
-                error: `all workers from slicer #${ex_id} have disconnected`,
-                ex_id: ex_id
-            })
+            setTimeout(function() {
+                //if after a a set time there are still no workers, it will shutdown
+                if (messaging.getClientCounts() === 0) {
+                    messaging.send({
+                        message: 'slicer:error:terminal',
+                        error: `all workers from slicer #${ex_id} have disconnected`,
+                        ex_id: ex_id
+                    })
+                }
+            }, context.sysconfig.teraslice.worker_disconnect_timeout);
+
         }
     });

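The slicer.js change replaces immediate termination with a delayed re-check of the connected worker count. Below is a minimal standalone sketch of that pattern, with the messaging layer replaced by plain functions. `scheduleTerminationCheck`, `setTimer`, and the option names are hypothetical and exist only for this illustration; the message fields are copied from the hunk.

```javascript
'use strict';

// Hypothetical standalone version of the delayed re-check in slicer.js.
// getClientCounts and send stand in for the messaging layer; setTimer is
// injectable so the behaviour can be exercised without real waiting.
function scheduleTerminationCheck(options) {
    const setTimer = options.setTimer || setTimeout;
    return setTimer(function() {
        // Report a terminal error only if no worker has reconnected meanwhile.
        if (options.getClientCounts() === 0) {
            options.send({
                message: 'slicer:error:terminal',
                error: `all workers from slicer #${options.ex_id} have disconnected`,
                ex_id: options.ex_id
            });
        }
    }, options.timeout);
}

// Usage with a fake timer that fires on demand instead of after 5 minutes.
function runScenario(workersAfterWait) {
    const sent = [];
    let fire = null;
    scheduleTerminationCheck({
        ex_id: 'example-ex-id',
        timeout: 300000,
        getClientCounts: function() { return workersAfterWait; },
        send: function(msg) { sent.push(msg); },
        setTimer: function(fn) { fire = fn; }
    });
    fire();
    return sent;
}

console.log(runScenario(0).length); // no workers returned: one terminal message
console.log(runScenario(2).length); // workers reconnected: nothing is sent
```

As far as the visible hunk shows, a timer started by an earlier disconnect is not cancelled when workers reconnect. If all workers then disconnect a second time, that earlier timer can fire and find a count of zero before the full `worker_disconnect_timeout` has elapsed since the latest disconnect. Keeping the timer handle and clearing it on reconnect would be one way to tighten this.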
20 changes: 17 additions & 3 deletions lib/config/schemas/system.js
@@ -94,16 +94,16 @@ var schema = {
             }
         }
     },
-    timeout: {
+    network_timeout: {
         doc: 'time in milliseconds for waiting for a response when messaging node_master before throwing an error',
         default: 300000,
         format: function(val) {
             if (isNaN(val)) {
-                throw new Error('timeout parameter for teraslice must be a number')
+                throw new Error('network_timeout parameter for teraslice must be a number')
             }
             else {
                 if (val <= 0) {
-                    throw new Error('timeout parameter for teraslice must be greater than zero')
+                    throw new Error('network_timeout parameter for teraslice must be greater than zero')
                 }
             }
         }
@@ -122,6 +122,20 @@ var schema = {
             }
         }
     },
+    worker_disconnect_timeout: {
+        doc: 'time in milliseconds that the slicer will wait after all workers have disconnected before terminating the job',
+        default: 300000,
+        format: function(val) {
+            if (isNaN(val)) {
+                throw new Error('worker_disconnect_timeout parameter for teraslice must be a number')
+            }
+            else {
+                if (val <= 0) {
+                    throw new Error('worker_disconnect_timeout parameter for teraslice must be greater than zero')
+                }
+            }
+        }
+    },
     slicer_port_range: {
         doc: 'range of ports that slicers will use per node',
         default: '45679:46678',
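The `format` functions for `network_timeout` and `worker_disconnect_timeout` apply the same two checks and differ only in the name used in the error message. A possible way to share them is sketched below. `positiveNumber` and `schemaFragment` are hypothetical names and not part of this commit; the `doc` and `default` values are copied from the hunks above.

```javascript
'use strict';

// Hypothetical factory (not in the commit): builds a format function that
// performs the same two checks as the validators above, with the parameter
// name filled into the error messages.
function positiveNumber(name) {
    return function(val) {
        if (isNaN(val)) {
            throw new Error(`${name} parameter for teraslice must be a number`);
        }
        if (val <= 0) {
            throw new Error(`${name} parameter for teraslice must be greater than zero`);
        }
    };
}

// Illustrative fragment only; the real schema holds many more settings.
const schemaFragment = {
    network_timeout: {
        doc: 'time in milliseconds for waiting for a response when messaging node_master before throwing an error',
        default: 300000,
        format: positiveNumber('network_timeout')
    },
    worker_disconnect_timeout: {
        doc: 'time in milliseconds that the slicer will wait after all workers have disconnected before terminating the job',
        default: 300000,
        format: positiveNumber('worker_disconnect_timeout')
    }
};

// A valid value passes silently; an invalid one throws with the setting name.
schemaFragment.worker_disconnect_timeout.format(300000);
try {
    schemaFragment.network_timeout.format(0);
} catch (err) {
    console.log(err.message);
}
```

The early `throw` makes the original `else` branch unnecessary, so the behaviour is unchanged while the nesting is one level shallower.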
