
Improved failure handling #361

Merged

frairon merged 1 commit into master from failure-handling on Nov 11, 2021

Conversation

@frairon (Contributor) commented Nov 7, 2021

Improve system tests

  • replace the flags-based trigger with an env variable (see the sketch after this list)
  • create a proxy to test kafka-connection errors
  • test processor disconnect
  • test view reconnect

Improve error handling

  • remove multierr.Error entirely
  • reduce complexity when returning errors (e.g. removed some errs.Collect calls in deferred functions)
  • bugfix issue #350: View.Get fails on reconnecting view
  • proper shutdown of the processor on connection issues
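As a rough illustration of the env-variable trigger mentioned in the first list (the variable name GOKA_SYSTEMTEST and the helper are assumptions for this sketch, not the PR's actual code), a system test could be gated like this:

```go
// A minimal sketch, assuming an env variable gates the slow
// kafka-backed system tests instead of the old test flag.
package systemtest

import (
	"os"
	"testing"
)

// skipIfNoSystemTest skips the test unless system tests are enabled
// via the environment.
func skipIfNoSystemTest(t *testing.T) {
	t.Helper()
	if os.Getenv("GOKA_SYSTEMTEST") == "" {
		t.Skip("set GOKA_SYSTEMTEST=1 to run system tests")
	}
}

func TestProcessorDisconnect(t *testing.T) {
	skipIfNoSystemTest(t)
	// start the proxy, cut the kafka connection, and assert that the
	// processor shuts down with an error instead of freezing (#334).
}
```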

@@ -425,6 +430,12 @@ func (v *View) Evict(key string) error {

// Recovered returns true when the view has caught up with events from kafka.
func (v *View) Recovered() bool {
	// no partitions --> never recover
	// Otherwise we might mask errors of initializing the view
	if len(v.partitions) == 0 {
@frairon (author) commented:
possible bugfix: if views were empty due to errors, they might have appeared to be recovered.
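A minimal sketch of the guarded method, assuming each partition table exposes an IsRecovered flag (the surrounding code is abbreviated and the loop body is an assumption, not the PR's exact diff):

```go
// Recovered returns true when the view has caught up with events from kafka.
func (v *View) Recovered() bool {
	// no partitions --> never recover.
	// Otherwise we might mask errors of initializing the view.
	if len(v.partitions) == 0 {
		return false
	}
	// the view is only recovered if every partition has caught up
	for _, part := range v.partitions {
		if !part.IsRecovered() {
			return false
		}
	}
	return true
}
```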

@@ -331,6 +331,11 @@ func (v *View) Topic() string {

// Get can be called by multiple goroutines concurrently.
// Get can only be called after Recovered returns true.
func (v *View) Get(key string) (interface{}, error) {

	if v.state.IsState(State(ViewStateIdle)) || v.state.IsState(State(ViewStateInitializing)) {
@frairon (author) commented Nov 8, 2021:

functional change: block attempts to read if the view isn't ready yet.
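A sketch of how that guard fits into Get, assuming the actual lookup lives in an internal helper (v.get is a hypothetical name, and the file's imports are omitted):

```go
func (v *View) Get(key string) (interface{}, error) {
	// functional change: refuse reads while the view is idle or still
	// initializing, instead of silently returning empty values.
	if v.state.IsState(State(ViewStateIdle)) || v.state.IsState(State(ViewStateInitializing)) {
		return nil, fmt.Errorf("view is not running or not correctly initialized")
	}
	// normal path: hash the key to its partition and read from that
	// partition's local storage (hypothetical helper).
	return v.get(key)
}
```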

@@ -711,7 +673,7 @@ func (p *PartitionTable) IteratorWithRange(start []byte, limit []byte) (storage.

func (p *PartitionTable) readyToRead() error {
	pstate := p.CurrentState()
-	if pstate != PartitionRunning {
+	if pstate < PartitionConnecting {
@frairon (author) commented:
This actually fixes bug #350
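The fix relies on the partition states being ordered constants, so that anything at or beyond "connecting" counts as readable. A sketch under that assumption (the constant names and their exact order are assumptions, not confirmed by the diff):

```go
// assumed ordering of partition states; iota makes them comparable
const (
	PartitionStopped State = iota
	PartitionInitializing
	PartitionConnecting
	PartitionRecovering
	PartitionPreparing
	PartitionRunning
)

func (p *PartitionTable) readyToRead() error {
	pstate := p.CurrentState()
	// a reconnecting partition (>= PartitionConnecting) stays readable,
	// so View.Get no longer fails while the view reconnects (#350)
	if pstate < PartitionConnecting {
		return fmt.Errorf("partition is not ready to read (state %d)", pstate)
	}
	return nil
}
```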

Further review threads on partition_table.go and processor.go were marked resolved.
select {
case <-ctx.Done():
	return
case <-sessionCtx.Done():
A reviewer commented:
Done on either context will break the loop. Is this correct, or do you need to wait for both?

@frairon (author) replied:
Yeah, because sessionCtx is recreated on each loop iteration and starts the error-reading goroutine again. But we need to stop this goroutine on rebalance (where there is no error), otherwise we end up with multiple of those routines. The parent context (ctx), however, will not be cancelled yet, because there has not been an error yet.
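A self-contained sketch of the pattern being discussed, with hypothetical names (runLoop, runSession, errCh, handle): the session context is recreated per rebalance so that only the current session's error-reading goroutine survives, while ctx cancellation still stops everything:

```go
package processor

import "context"

// runLoop restarts one session per rebalance. errCh delivers consumer
// errors; runSession blocks for the lifetime of one session.
func runLoop(ctx context.Context, errCh <-chan error,
	runSession func(context.Context) error, handle func(error)) error {
	for {
		// sessionCtx is recreated on each loop iteration
		sessionCtx, cancelSession := context.WithCancel(ctx)

		go func() {
			for {
				select {
				case <-ctx.Done(): // processor shutting down
					return
				case <-sessionCtx.Done(): // rebalance: this session ended
					return
				case err := <-errCh:
					handle(err)
				}
			}
		}()

		err := runSession(sessionCtx)
		// stop this session's error-reading goroutine before starting
		// the next session, otherwise the routines pile up.
		cancelSession()
		if err != nil {
			return err
		}
	}
}
```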

fully remove multierr.Errors
system test for testing connection errors
bugfix issue #350: View.Get fails on reconnecting view
bugfix processor freeze on error (#334)
@frairon frairon requested a review from ubunatic November 8, 2021 10:19
@frairon frairon merged commit 150be40 into master Nov 11, 2021
@frairon frairon deleted the failure-handling branch November 11, 2021 05:31