delete pipeline in registry #12414
@@ -76,7 +76,6 @@ def put(pipeline_id, state)
   def remove(pipeline_id)
     @lock.synchronize do
       @states.delete(pipeline_id)
-      @locks.delete(pipeline_id)
Please review the locking mechanism.
The idea is to keep locks[pipeline_id] forever so that create, reload, stop and delete stay mutually exclusive.
IMO @locks.delete(pipeline_id) needs to stay there, because keeping @locks[pipeline_id] does not make sense without also keeping @states[pipeline_id]. Once a pipeline is removed, its pipeline_id should not exist in the registry at all anymore.
I am not sure I understand under which condition it would be useful to keep it around?
Threads A, B and C want to update @states of pipeline_id 1

|  | A | B | C |
|---|---|---|---|
| t1 | lock_1 = get_lock(pid: 1) | | |
| t2 | | lock_1 = get_lock(pid: 1) | |
| t3 | lock_1.lock | | |
| t4 | update @states | | |
| t5 | remove @locks[pid: 1] | | |
| t6 | lock_1.unlock | | |
| t7 | | lock_1.lock | |
| t8 | | update @states | lock_1_new = get_lock(pid: 1) |
| t9 | | | lock_1_new.lock |
| t10 | | | update @states |

A removes the lock from @locks while B still holds the old lock.
C gets a new lock for pipeline_id 1 because A removed the old one, so B and C both have the right to update @states of pipeline_id 1 at the same time.
I think the purpose of @locks is to ensure that only one thread at a time can edit the state of a given pipeline_id.
If @locks keeps the lock, the actions of A, B and C stay mutually exclusive.
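The scenario above can be reproduced without real threads by checking lock identity. The following is a minimal sketch, assuming a simplified registry; `LockRegistry`, `keep_locks:` and `same_lock_after_remove?` are hypothetical names for illustration and not the actual Logstash classes.

```ruby
# Hypothetical, simplified registry used only to illustrate the discussion.
class LockRegistry
  def initialize(keep_locks:)
    @keep_locks = keep_locks
    @lock = Mutex.new   # guards the two hashes below
    @locks = {}         # pipeline_id => per-pipeline Mutex
    @states = {}        # pipeline_id => state
  end

  # Returns the per-pipeline lock, creating it on first use.
  def get_lock(pipeline_id)
    @lock.synchronize { @locks[pipeline_id] ||= Mutex.new }
  end

  def remove(pipeline_id)
    @lock.synchronize do
      @states.delete(pipeline_id)
      # The line under review: dropping the lock lets a later caller get a new one.
      @locks.delete(pipeline_id) unless @keep_locks
    end
  end
end

# True when a lock fetched before the removal is the same object
# as the lock fetched after it.
def same_lock_after_remove?(registry)
  old_lock = registry.get_lock(:p1) # thread B fetched this before the removal
  registry.remove(:p1)              # thread A removes the pipeline
  new_lock = registry.get_lock(:p1) # thread C asks for the lock afterwards
  old_lock.equal?(new_lock)
end

puts same_lock_after_remove?(LockRegistry.new(keep_locks: false)) # false
puts same_lock_after_remove?(LockRegistry.new(keep_locks: true))  # true
```

When the lock is deleted, B and C end up synchronizing on two different Mutex objects, so neither blocks the other. Keeping the entry means every caller shares one Mutex per pipeline_id.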
Ha! 💡 I see what you mean. I think you are right here. Let me go over this a bit more.
So yes, +1 on this change per your reasoning above. The downside is a memory leak for removed pipelines that will never be recreated, but practically speaking the potential for this becoming a problem is extremely low.
I guess the alternative would be to rethink/refactor the locking logic, but that does not seem necessary.
I would probably add a comment about this, and maybe link it to this discussion, so that the reasoning is explicit in the code if it is revisited in the future.
@@ -41,14 +41,19 @@ def resolve(pipelines_registry, pipeline_configs)
       end
     end

-    configured_pipelines = pipeline_configs.map { |config| config.pipeline_id.to_sym }
+    configured_pipelines = pipeline_configs.map { |config| config.pipeline_id.to_sym }.to_set
I am assuming to_set here is to dedup values, but under which condition are there duplicates in this collection?
The set is for an O(1) configured_pipelines.include? check. configured_pipelines was an Array.
Right, ok. But TBH, given the small number of pipelines we are dealing with, this extra to_set conversion is quite possibly more costly in practice than the actual include? iterations.
To push that micro-optimization further, I would try to avoid the Array -> Array -> Set conversion and directly produce a Hash or Set using inject or each_with_object.
Something like this?
configured_pipelines = pipeline_configs.each_with_object(Set.new) { |config, set| set.add(config.pipeline_id.to_sym) }
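As a runnable illustration of the suggestion above, here is a small sketch that builds the Set in one pass and uses it for membership checks. The `configured_pipeline_ids` method and the Struct standing in for a pipeline config are assumptions made for this example, not part of the Logstash code.

```ruby
require "set"

# Builds the Set of configured pipeline ids in a single pass,
# without materializing an intermediate Array.
def configured_pipeline_ids(pipeline_configs)
  pipeline_configs.each_with_object(Set.new) do |config, set|
    set.add(config.pipeline_id.to_sym)
  end
end

# Hypothetical stand-in for a pipeline config object.
config_class = Struct.new(:pipeline_id)
configs = [config_class.new("main"), config_class.new("beats"), config_class.new("main")]

ids = configured_pipeline_ids(configs)
puts ids.include?(:main) # true
puts ids.size            # 2, the duplicate "main" collapses
```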
-    configured_pipelines = pipeline_configs.map { |config| config.pipeline_id.to_sym }
+    configured_pipelines = pipeline_configs.each_with_object(Set.new) { |config, set|
+      set.add(config.pipeline_id.to_sym)
+    }
Style: we try to use the braces notation { |...| } for one-liner blocks, and do |...| end for multiline blocks.
Great stuff @kaisecheng! 🚀
deletes the pipeline in the pipelines_registry if it is terminated and is removed in the source Fixed: elastic#12414
Users cannot delete a pipeline and recreate it with the same configuration string in Kibana. Logstash stores all pipelines in pipeline_registry and never removes them. Logstash decides which actions to take by comparing the source with the registry, which makes it difficult to distinguish whether a pipeline has finished or was terminated because it was removed from the source (Elasticsearch).
The flow of the issue
This PR deletes the pipeline in the pipelines_registry if it is terminated and is removed in the source.
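The rule described above can be sketched as a small selection function. This is only an illustration under stated assumptions: `pipelines_to_delete`, the `:terminated` symbol and the plain Hash used as a registry are hypothetical and do not reflect the actual Logstash implementation.

```ruby
require "set"

# Hypothetical helper: returns the ids of pipelines that are both
# terminated in the registry and no longer present in the source.
def pipelines_to_delete(registry_states, configured_ids)
  registry_states.each_with_object([]) do |(pipeline_id, state), to_delete|
    terminated = state == :terminated
    removed_from_source = !configured_ids.include?(pipeline_id)
    to_delete << pipeline_id if terminated && removed_from_source
  end
end

registry_states = { main: :running, old: :terminated, finished: :terminated }
configured_ids = Set[:main, :finished]

p pipelines_to_delete(registry_states, configured_ids) # prints [:old]
```

A pipeline that terminated on its own but is still configured (`finished` here) stays in the registry, which keeps the two cases distinguishable.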
related issue
upstream #12387