delete pipeline in registry #12414

Merged

kaisecheng
Contributor

@kaisecheng kaisecheng commented Nov 4, 2020

Users cannot delete a pipeline and recreate it with the same configuration string in Kibana. Logstash stores all pipelines in pipelines_registry and never removes them. Logstash decides which actions to take by comparing the source with the registry. This makes it difficult to distinguish whether a pipeline has finished or was terminated because it was removed from the source (Elasticsearch).

The flow of the issue

  1. A user deletes a pipeline in Kibana.
  2. Logstash stops the pipeline instead of deleting it.
  3. The user recreates the pipeline with the same settings in Kibana.
  4. Logstash compares the hash of the new pipeline with the hash of the old pipeline, which is still in the registry.
  5. The hashes are the same, so Logstash keeps the old pipeline in terminated status and the user never sees the pipeline running.

This PR deletes the pipeline in the pipelines_registry if it is terminated and is removed in the source.
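
The flow above can be reproduced with a minimal sketch. Everything here is a simplified, hypothetical stand-in (`Config`, `Entry`, `resolve_actions`, the `delete_removed:` switch and the action tuples are assumptions for illustration, not Logstash's real classes or API); it only shows why a stale terminated entry with an identical hash blocks recreation, and how deleting that entry unblocks it.

```ruby
require "set"
require "digest/sha1"

# Hypothetical stand-ins, NOT the real Logstash classes.
Config = Struct.new(:pipeline_id, :config_string) do
  def config_hash
    Digest::SHA1.hexdigest(config_string)
  end
end
Entry = Struct.new(:config_hash, :terminated)

# Compare the source (configs) with the registry and return the actions to take.
# delete_removed: false mimics the old behaviour, true mimics the idea of this PR.
def resolve_actions(registry, configs, delete_removed:)
  actions = []
  configs.each do |config|
    id = config.pipeline_id.to_sym
    entry = registry[id]
    if entry.nil?
      actions << [:create, id]
    elsif entry.config_hash != config.config_hash
      actions << [:reload, id]
    end
    # Same hash: no action, even if the registered pipeline is terminated.
  end

  configured = configs.map { |config| config.pipeline_id.to_sym }.to_set
  registry.each do |id, entry|
    next if configured.include?(id)
    if !entry.terminated
      actions << [:stop, id]
    elsif delete_removed
      actions << [:delete, id]
    end
  end
  actions
end

# Run three resolve cycles: pipeline removed, still removed, then recreated
# with the same configuration string. Returns the actions of the last cycle.
def recreate_after_delete(delete_removed:)
  config = Config.new("main", "input { stdin {} }")
  registry = { main: Entry.new(config.config_hash, false) }
  last = nil
  [[], [], [config]].each do |configs|
    last = resolve_actions(registry, configs, delete_removed: delete_removed)
    last.each do |action, id|
      case action
      when :stop   then registry[id].terminated = true
      when :delete then registry.delete(id)
      when :create then registry[id] = Entry.new(config.config_hash, false)
      end
    end
  end
  last
end
```

With `delete_removed: false` the last cycle yields no action, so the recreated pipeline never starts; with `delete_removed: true` the stale entry is gone by then and the last cycle yields a create action.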

related issue
upstream #12387

@@ -76,7 +76,6 @@ def put(pipeline_id, state)
     def remove(pipeline_id)
       @lock.synchronize do
         @states.delete(pipeline_id)
-        @locks.delete(pipeline_id)
Contributor Author

@kaisecheng kaisecheng Nov 5, 2020

Please review the locking mechanism.
The idea is to keep @locks[pipeline_id] forever so that create, reload, stop and delete stay mutually exclusive.

Contributor

IMO @locks.delete(pipeline_id) needs to stay there, because keeping @locks[pipeline_id] does not make sense without also keeping @states[pipeline_id]. Once a pipeline is removed, its pipeline_id should not exist in the registry at all anymore.
I am not sure I understand under which conditions it would be useful to keep it around?

Contributor Author

@kaisecheng kaisecheng Nov 5, 2020

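Threads A, B and C want to update @states of pipeline_id 1

| time | A                            | B                            | C                                |
|------|------------------------------|------------------------------|----------------------------------|
| t1   | lock_1 = get_lock(pid: 1)    |                              |                                  |
| t2   |                              | lock_1 = get_lock(pid: 1)    |                                  |
| t3   | lock_1.lock                  |                              |                                  |
| t4   | update @states               |                              |                                  |
| t5   | remove @locks[pid: 1]        |                              |                                  |
| t6   | lock_1.unlock                |                              |                                  |
| t7   |                              | lock_1.lock                  |                                  |
| t8   |                              | update @states               | lock_1_new = get_lock(pid: 1)    |
| t9   |                              |                              | lock_1_new.lock                  |
| t10  |                              |                              | update @states                   |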

A removes the lock from @locks while B still holds a reference to the old lock.
C gets a new lock for pipeline_id 1 because A removed the old one, so B and C can both update the state of pipeline_id 1 at the same time.

I think the purpose of @locks is to ensure that only one thread at a time can edit the state of a given pipeline_id.
If @locks keeps the lock, the actions of A, B and C stay mutually exclusive.
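
To make the timeline concrete, here is a minimal single-threaded sketch of the same sequence. `LockTable` and its `keep_locks:` switch are hypothetical names made up for this illustration, not the registry's real code; the point is only that once the entry is deleted, `get_lock` hands out a different Mutex object for the same pipeline_id, so the old and the new holder no longer exclude each other.

```ruby
# Hypothetical, simplified lock table; NOT the real registry implementation.
class LockTable
  def initialize(keep_locks:)
    @keep_locks = keep_locks
    @lock = Mutex.new   # guards the @locks hash itself
    @locks = {}         # pipeline_id => per-pipeline Mutex
  end

  # Return the per-pipeline lock, creating it on first use.
  def get_lock(pipeline_id)
    @lock.synchronize { @locks[pipeline_id] ||= Mutex.new }
  end

  # Remove a pipeline; optionally drop its lock as well.
  def remove(pipeline_id)
    @lock.synchronize { @locks.delete(pipeline_id) unless @keep_locks }
  end
end

# Replays t2, t5 and t8 from the timeline and reports whether B and C
# ended up with two different lock objects for the same pipeline_id.
def distinct_locks_after_remove?(table)
  lock_b = table.get_lock(1)   # t2: B fetches the lock
  table.remove(1)              # t5: A removes the pipeline
  lock_c = table.get_lock(1)   # t8: C fetches the lock
  !lock_b.equal?(lock_c)
end
```

`distinct_locks_after_remove?(LockTable.new(keep_locks: false))` is true (B and C hold different mutexes), while with `keep_locks: true` both get the same object and stay mutually exclusive.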

Contributor

Ha! 💡 I see what you mean. I think you are right here. Let me go over this a bit more.

Contributor

So yes, +1 on this change per your reasoning above. The downside is a memory leak for removed pipelines that will never be recreated, but practically speaking the potential for this to become a problem is extremely low.
I guess the alternative would be to rethink/refactor the locking logic, but that does not seem necessary.

I would probably add a code comment about this, and maybe link it to this discussion, so that the reasoning is explicit if this code is revisited in the future.

@kaisecheng kaisecheng changed the title from "[WIP] delete pipeline in registry" to "delete pipeline in registry" Nov 5, 2020
@@ -41,14 +41,19 @@ def resolve(pipelines_registry, pipeline_configs)
         end
       end

-      configured_pipelines = pipeline_configs.map { |config| config.pipeline_id.to_sym }
+      configured_pipelines = pipeline_configs.map { |config| config.pipeline_id.to_sym }.to_set
Contributor

I am assuming to_set here is to dedup values, but under which conditions would there be duplicates in this collection?

Contributor Author

The set is for O(1) configured_pipelines.include? lookups. configured_pipelines was an Array.

Contributor

Right, ok. But TBH, given the small number of pipelines we are dealing with, this extra to_set conversion is quite possibly more costly in practice than the actual include? iterations.

To push that micro-optimization further, I would try to avoid the Array -> Array -> Set conversion and directly produce a Hash or Set using inject or each_with_object?

Something like?

configured_pipelines = pipeline_configs.each_with_object(Set.new) { |config, set| set.add(config.pipeline_id.to_sym) }
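
As a quick sanity check of that suggestion, the sketch below builds the set both ways and shows they are interchangeable for the include? lookup. `PipelineConfig` is a hypothetical stand-in for the real config objects, and the two helper method names are made up for the comparison.

```ruby
require "set"

# Hypothetical stand-in for a pipeline config; only pipeline_id matters here.
PipelineConfig = Struct.new(:pipeline_id)

# Array -> Array -> Set: map builds an intermediate Array first.
def ids_via_map(pipeline_configs)
  pipeline_configs.map { |config| config.pipeline_id.to_sym }.to_set
end

# Fills the Set directly, without the intermediate Array.
def ids_direct(pipeline_configs)
  pipeline_configs.each_with_object(Set.new) { |config, set| set.add(config.pipeline_id.to_sym) }
end

def sample_configs
  [PipelineConfig.new("main"), PipelineConfig.new("metrics")]
end
```

Both helpers return equal sets for the same input, so `ids_direct(sample_configs).include?(:main)` behaves exactly like the `map ... to_set` version; the only difference is the skipped intermediate Array.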

-      configured_pipelines = pipeline_configs.map { |config| config.pipeline_id.to_sym }
+      configured_pipelines = pipeline_configs.each_with_object(Set.new) { |config, set|
+        set.add(config.pipeline_id.to_sym)
+      }
Contributor

Style: we try to use brace notation { |...| } for one-line blocks and do |...| end for multiline blocks.

@colinsurprenant
Contributor

Great stuff @kaisecheng! 🚀
Overall LGTM, left a minor style comment. Feel free to squash & merge.

@kaisecheng kaisecheng merged commit 244a9f4 into elastic:master Nov 6, 2020
kaisecheng added a commit to kaisecheng/logstash that referenced this pull request Nov 6, 2020
deletes the pipeline in the pipelines_registry if it is terminated and is removed in the source

Fixed: elastic#12414
kaisecheng added a commit to kaisecheng/logstash that referenced this pull request Nov 6, 2020
deletes the pipeline in the pipelines_registry if it is terminated and is removed in the source

Fixed: elastic#12414
kaisecheng added a commit that referenced this pull request Nov 9, 2020
deletes the pipeline in the pipelines_registry if it is terminated and is removed in the source

Fixed: #12414
kaisecheng added a commit to kaisecheng/logstash that referenced this pull request Nov 11, 2020
deletes the pipeline in the pipelines_registry if it is terminated and is removed in the source

Fixed: elastic#12414
elasticsearch-bot pushed a commit that referenced this pull request Nov 11, 2020
deletes the pipeline in the pipelines_registry if it is terminated and is removed in the source

Fixed: #12414
kares added a commit that referenced this pull request Dec 1, 2020
andsel pushed a commit that referenced this pull request Dec 7, 2020
2 participants