Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace engine.Unit with ComponentManager in Pusher Engine #6747

Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
3552f06
replace engine.Unit with ComponentManager in Pusher Engine
tim-barry Nov 21, 2024
6d3f462
pusher engine test: update positive test
tim-barry Nov 21, 2024
0aadc13
Pusher engine test: update negative test
tim-barry Nov 21, 2024
dec0d58
Start pusher engine in mocks
tim-barry Nov 21, 2024
b5418dd
Merge branch 'master' into tim/7018-pusher-engine-use-componentmanager
jordanschalm Nov 22, 2024
b7166f3
Refactor pusher engine: merge function with no callers
tim-barry Nov 27, 2024
e25cba7
Refactor pusher engine: error on non-local messages
tim-barry Nov 27, 2024
f2f53a8
Refactor pusher engine: rename and propagate error
tim-barry Nov 27, 2024
f66ba69
Refactor pusher engine
tim-barry Nov 27, 2024
1c80949
Revert "Pusher engine test: update negative test"
tim-barry Nov 27, 2024
fbed8e7
Refactor pusher engine: (lint) remove unused code
tim-barry Nov 27, 2024
051e6d4
Merge branch 'feature/pusher-engine-refactor' into tim/7018-pusher-en…
tim-barry Nov 27, 2024
ddb0cb7
Refactor pusher engine: queue length metrics
tim-barry Nov 29, 2024
17a9a2b
Update pusher engine doc comment
tim-barry Nov 29, 2024
aad332c
Apply suggestions from code review
tim-barry Nov 30, 2024
ad04f27
Pusher engine refactor: remove unused collection metrics
tim-barry Nov 30, 2024
88a7d45
Refactor pusher engine: add error return doc comments
tim-barry Dec 2, 2024
cebf100
Refactor pusher engine: add metrics for dropped messages
tim-barry Dec 3, 2024
2048b4a
Apply suggestions from code review
tim-barry Dec 3, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
117 changes: 89 additions & 28 deletions engine/collection/pusher/engine.go
Copy link
Member

@AlexHentschel AlexHentschel Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at the core processing logic in the Engine and lets try to remove layers of complexity and indirection where possible:

func (e *Engine) onSubmitCollectionGuarantee(originID flow.Identifier, req *messages.SubmitCollectionGuarantee) error {
if originID != e.me.NodeID() {
return fmt.Errorf("invalid remote request to submit collection guarantee (from=%x)", originID)
}
return e.SubmitCollectionGuarantee(&req.Guarantee)
}
// SubmitCollectionGuarantee submits the collection guarantee to all consensus nodes.
func (e *Engine) SubmitCollectionGuarantee(guarantee *flow.CollectionGuarantee) error {

Looking at this code raises a few questions:

  • we are only propagating collections that are originating from this node (👉 code). But the exported method SubmitCollectionGuarantee is not checking this? Why not?

    • If it is an important check, we should always check it. If we are fine with collections from other nodes being broadcast, then we don't need the check in onSubmitCollectionGuarantee either..
    • turns out, SubmitCollectionGuarantee is only ever called by the pusher.Engine itself (not even the tests call it).

    Lets simplify this: merge SubmitCollectionGuarantee and onSubmitCollectionGuarantee.

  • Further digging into the code, I found that the originID is just completely useless in this context:

    • the pusher.Engine is fed only from internal components (so originID is always filled with the node's own identifier)
    • the pusher.Engine drops messages that are not from itself. Just hypothetically, lets assume that there were collections with originID other than the node's own ID. ❗Note that those messages would be queued first, wasting resources and then we have a different thread discard the Collection after dequeueing. Checks like that should always be applied before queueing irrelevant messages.

    What you see here is a pattern we struggle with a lot: the code is very ambiguous and allows a whole bunch of stuff that is never intended to happen in practise. This is an unsuitable design for engineering complex systems such as Flow, as we then have to manually handle undesired cases.

    • Example 1: messages other than SubmitCollectionGuarantee. In comparison, if we had a method that only accepted SubmitCollectionGuarantee, the compiler would enforce this constraint for us as opposed to us having to write code. Often, people don't bother to manually disallow undesired inputs, resulting in software that is doing something incorrect in case of bugs as opposed to crashing (or the code not compiling in the first place)
    • Example 2: originID. We employ a generic patter which obfuscates that we don't need an originID in this particular example, because the Engine interface does not differentiate between node-external inputs (can be malicious) vs inputs from other components within its own node (always trusted) and mixes modes of processing (synchronous vs asynchronous).

    For those reasons, we deprecated the Engine interface. Lets try to update the pusher and remove the engine interface (for concrete steps towards this, please see my next comment).

For Flow, the APIs should generally be as descriptive and restrictive as possible and make it hard to feed in wrong / mismatching inputs. Writing extra lines of boilerplate code is not a challenge for Flow. In contrast, it is challenging if you have to dig through hundreds or thousands of lines of code because the APIs are too generic and you need to know implementation details to use them correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially adressed in b7166f3 and e25cba7: originID has been pretty much removed from the interface, now that Process and Submit (the functions in the Engine interface handling node-external inputs) simply error on any input.

tim-barry marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -4,15 +4,20 @@
package pusher

import (
"context"
"errors"
"fmt"

"github.com/rs/zerolog"

"github.com/onflow/flow-go/engine"
"github.com/onflow/flow-go/engine/common/fifoqueue"
"github.com/onflow/flow-go/model/flow"
"github.com/onflow/flow-go/model/flow/filter"
"github.com/onflow/flow-go/model/messages"
"github.com/onflow/flow-go/module"
"github.com/onflow/flow-go/module/component"
"github.com/onflow/flow-go/module/irrecoverable"
"github.com/onflow/flow-go/module/metrics"
"github.com/onflow/flow-go/network"
"github.com/onflow/flow-go/network/channels"
Expand All @@ -24,7 +29,6 @@ import (
// Engine is the collection pusher engine, which provides access to resources
// held by the collection node.
type Engine struct {
tim-barry marked this conversation as resolved.
Show resolved Hide resolved
unit *engine.Unit
log zerolog.Logger
engMetrics module.EngineMetrics
colMetrics module.CollectionMetrics
tim-barry marked this conversation as resolved.
Show resolved Hide resolved
Expand All @@ -33,18 +37,44 @@ type Engine struct {
state protocol.State
collections storage.Collections
transactions storage.Transactions

messageHandler *engine.MessageHandler
notifier engine.Notifier
inbound *fifoqueue.FifoQueue

component.Component
cm *component.ComponentManager
}

// TODO convert to network.MessageProcessor
var _ network.Engine = (*Engine)(nil)
var _ component.Component = (*Engine)(nil)
Comment on lines +47 to +49
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may also be a good exercise to make the change to MessageProcessor here, though likely as a separate PR.

I'll add some thoughts below; I suggest we spend some time discussing this live as well. There is no need to implement any of the comments below before we have a chance to discuss.

The pusher engine is unusual in a couple of ways:

  1. It is using an old framework for internal message-passing, where two components in the same process would pass messages between each other using standard methods (SubmitLocal and ProcessLocal). The meaning of the message was carried by its type (here: messages.SubmitCollectionGuarantee) and the logical structure was similar to that of messages received over the network. Message processing logic needed to potentially handle inputs originating both locally and remotely.
  2. It does not expect to receive any inbound messages from non-local sources. It is send-only from a networking perspective.

Replacing network.Engine with network.MessageProcessor there are a few opportunities for simplifying the implementation:

  • Currently the internal message-passing is done through SubmitLocal. We can remove SubmitLocal and ProcessLocal and replace them with a context-specific function (and corresponding interface type, for use in the caller).
    • A reference example is the Compliance interface, which is exposed by the compliance engine for local message-passing.
    • For this reason, engine.MessageHandler may not be needed in this context. It is better suited for cases where we are handling external messages.
    • We then do not need to check originIDs and do not need both onSubmitCollectionGuarantee and SubmitCollectionGuarantee methods
    • Now, this context-specific function is the only way for local messages to enter the pusher engine, which will simplify our network message processing logic 👇
  • In order to satisfy network.MessageProcessor, we need a Process method, which accepts messages from remote network sources. Since this engine does not expect any remote messages, we should reject all inputs to Process.
    • In general, when we observe an unexpected message from another node, we should flag it as a possible Byzantine message. In the future, these messages would result in a slashing challenge targeting the originator of the message. For now, we can log a warning with the relevant log tag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Had a similar thought 👉 my comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to network.MessageProcessor and the new interface for local messages will be part of a separate PR.


func New(log zerolog.Logger, net network.EngineRegistry, state protocol.State, engMetrics module.EngineMetrics, colMetrics module.CollectionMetrics, me module.Local, collections storage.Collections, transactions storage.Transactions) (*Engine, error) {
// TODO length observer metrics
inbound, err := fifoqueue.NewFifoQueue(1000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the queue capacity we should pick here. I think we should have a good argument (and document it!) of why the number we chose makes sense. If we don't want to go through the trouble of defining a sensible limit, I think we could make this clear by using an unbounded queue (implementation).

Anyway, I think a limit makes sense:

  • In a nutshell, a collection must reference a block and expires $\texttt{DefaultTransactionExpiry}$ = 600 blocks thereafter
    // Let E by the transaction expiry. If a transaction T specifies a reference
    // block R with height H, then T may be included in any block B where:
    // * R<-*B - meaning B has R as an ancestor, and
    // * R.height < B.height <= R.height+E
    const DefaultTransactionExpiry = 10 * 60
    (its a subtle mechanic, feel free to ignore)
  • consensus nodes produce about $\rho$ = 1.25 blocks per second
  • collector clusters produce up to $\varrho$ = 3 collections per second

So very very roughly, between a collection being first proposed (usually referencing the newest known block) to the collection expiring, the collector cluster will produce $\texttt{DefaultTransactionExpiry}\cdot \rho^{-1} \cdot \varrho = 1440$ collections. So if we queued 1500 collections max, then we would be relatively confident that collections have been broadcast before they expire.
The downside is that collections might sit in the queue for quite some time, if several hundreds are before them. That could be an argument of queuing only much fewer collections (e.g. 200), so collection that don't make it out fast are dropped.
Another argument we could make is actually using a LIFO queue. If the queue is nearly empty, there is little difference. However, if there is a large backlog of collections to be broadcast, I think its a valid argument of broadcasting the newest first, at least then, some collections make it in time as opposed to all of them being queued past their expiry. Just some thoughts.

@jordanschalm what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the choice of queue size here is unlikely to have significant impact:

  • All collection nodes can publish all finalized guarantees.
  • In the happy path, collection nodes are observing finalized collections at a very consistent rate over tiem. We can broadcast collection guarantees much more quickly than they are finalized, and expect the queue to be empty most of the time.
  • In the unhappy path where a collection node is finalizing collections more quickly than usual, it is behind and these are old collections -- we don't care about successfully broadcasting them anyway.

So if we queued 1500 collections max, then we would be relatively confident that collections have been broadcast before they expire.

If we actually had 1500 collections waiting in this queue, I would feel much more confident that there was a severe problem with that node's ability to publish messages, than that collections would be broadcasted before expiring. 😀

General thoughts:

  • I think we should prefer dropping from the queue than making sure we publish every message (choose a relatively small queue size). I'm happy with 200 (~1 minute of collections) or even a bit less.
  • I agree that a LIFO queue (stack) would be better suited. I don't think we should actually make that change now (premature optimization)

if err != nil {
return nil, fmt.Errorf("could not create inbound fifoqueue: %w", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add those length metrics now for completeness. You can use this engine as an example. We'll also need to add a parameter for the mempoolMetrics to the constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in ddb0cb7


notifier := engine.NewNotifier()
tim-barry marked this conversation as resolved.
Show resolved Hide resolved
messageHandler := engine.NewMessageHandler(log, notifier, engine.Pattern{
Match: engine.MatchType[*messages.SubmitCollectionGuarantee],
Store: &engine.FifoMessageStore{FifoQueue: inbound},
})
Copy link
Member

@AlexHentschel AlexHentschel Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend to not use the MessageHandler here for the following reasons:

  • I think it is good to apply the patter of the MessageHandler at least once by hand ("learning by doing")
  • The MessageHandler adds a layer of abstraction, if we can do without it that would be preferable in my opinion.
  • In this particular case, the Engine only processes a single input type SubmitCollectionGuarantee. We have tried it before and the message handler is actually more code compared to just doing the same logic by hand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it did end up being less complicated - applied in f66ba69


e := &Engine{
unit: engine.NewUnit(),
log: log.With().Str("engine", "pusher").Logger(),
engMetrics: engMetrics,
colMetrics: colMetrics,
me: me,
state: state,
collections: collections,
transactions: transactions,

messageHandler: messageHandler,
notifier: notifier,
inbound: inbound,
}

conduit, err := net.Register(channels.PushGuarantees, e)
Expand All @@ -53,55 +83,86 @@ func New(log zerolog.Logger, net network.EngineRegistry, state protocol.State, e
}
e.conduit = conduit

e.cm = component.NewComponentManagerBuilder().
AddWorker(e.inboundMessageWorker).
Build()
e.Component = e.cm

return e, nil
}

// Ready returns a ready channel that is closed once the engine has fully
// started.
func (e *Engine) Ready() <-chan struct{} {
return e.unit.Ready()
func (e *Engine) inboundMessageWorker(ctx irrecoverable.SignalerContext, ready component.ReadyFunc) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a short godoc for this method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

ready()

done := ctx.Done()
wake := e.notifier.Channel()
for {
select {
case <-done:
return
case <-wake:
e.processInboundMessages(ctx)
}
}
}

// Done returns a done channel that is closed once the engine has fully stopped.
func (e *Engine) Done() <-chan struct{} {
return e.unit.Done()
func (e *Engine) processInboundMessages(ctx context.Context) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably make sure functions have doc comments

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please add a short godoc comment to each function (especially public functions, but ideally private ones too).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added short doc comments in f2f53a8

for {
nextMessage, ok := e.inbound.Pop()
if !ok {
return
}

asEngineWrapper := nextMessage.(*engine.Message)
asSCGMsg := asEngineWrapper.Payload.(*messages.SubmitCollectionGuarantee)
originID := asEngineWrapper.OriginID

_ = e.process(originID, asSCGMsg)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems incorrect to fully discard any error, should likely be logged instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should be handling all possible error cases. In general, all functions which return an error should document all possible expected error cases in their documentation. Callers which observe an error return that they don't know explicitly how to handle safely, should propagate the error further up the stack. Eventually this results in either the component or entire process restarting.

Here is the part of our coding guidelines talking about error handling: https://github.com/onflow/flow-go/blob/master/CodingConventions.md#error-handling. In short, the rule is to prioritize safety over liveness.

Practically, the way we would do this here is using the irrecoverable.SignalerContext. That has a Throw method which will signal to restart the component or process.

I usually propagate errors all the way up the stack to where the SignalerContext is defined (here, that would be in inboundMessageWorker) then do ctx.Throw(err). You can also use irrecoverable.Throw to throw any context.Context.

Change Suggestions:

  • In general, make sure all functions in the file which return an error document their expected error returns (or the lack of expected errors). Here is an example of error documentation.
  • Propagate the error value to inboundMessageWorker and Throw unexpected errors at that frame.

Copy link
Member

@AlexHentschel AlexHentschel Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with your comment that discarding any error is bad. Please take a look at our Coding Conventions, if you haven't already. Tldr:

  • just discarding any error: very bad
  • logging error and continuing: still bad (this is the pattern of continuing on best effort basis, which is not a viable approach for high-assurance software systems)
  • desired:
    • explicitly check error type and only continue on errors that are confirmed to be benign.
    • all other errors (outside of the specified benign code path) are fatal

The pusher.Engine is a bit of a simpler case, because it only distributes information to other nodes. So swallowing errors does not undermine safety as it would for many other Engines. BUT, we already had cases where nodes were logging OutOfMemory errors for hours, but the code just dropped those and the node was dysfunctional without the software noticing it ... so even if it is not safety critical, we still generally expect that errors are not swallowed (and neither logged and then just discarded).

There are exceptions to this rule, e.g. if it would be a massive time investment to clear up technical debt and thoroughly documenting possible benign errors. But then, a detailed comment is absolutely required why it is always fine to log and discard any errors in that particular case.

I would suggest to leave error handling for a bit and come back to it after some iterations. After having implemented some of the cleanup and simplifications I suggest further below, I suspect that error handling will become a lot more straightforward.


select {
case <-ctx.Done():
return
default:
}
}
}

// SubmitLocal submits an event originating on the local node.
func (e *Engine) SubmitLocal(event interface{}) {
Comment on lines 146 to 147
Copy link
Member

@AlexHentschel AlexHentschel Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an exercise, try to figure out which other components are feeing inputs into the pusher.Engine. We are lucky here because we have a rarely used message type SubmitCollectionGuarantee. But just imagine you would see a log that says no matching processor for message of type flow.Header from origin. With the Engine interface it is way too time intensive to figure out which components inside a node are feeding which other components with data.

Turns out that the Finalizer is the only component that is feeding the pusher.Engine (other than messages arriving from the networking layer). Let's make our lives easier in the future by improving the code:

  • introduce a dedicated interface GuaranteedCollectionPublisher. It only has one type-safe method
    SubmitCollectionGuarantee(guarantee *flow.CollectionGuarantee)
    this method just takes the message and puts it into the queue. Then we know that the queue will only ever contain elements of type flow.CollectionGuarantee. (currently, the Finalizer creates the wrapper messages.SubmitCollectionGuarantee around CollectionGuarantee ... and the pusher.Engine unwraps it and throws the wrapper away 😑 ).
  • change the Finalizer and all layers in between to work with the GuaranteedCollectionPublisher interface. This improves code clarity: currently, engineers would need to know what functionality was backing the generic Engine interface, e.g. here: because the right type of inputs need to be fed into the pusher.Engine. When using the GuaranteedCollectionPublisher interface we have the following benefits:
    • the type itself carries information about what functionality is behind this interface. Huge improvement of clarity.
    • its easy to find all places that can feed the Pusher with data, because we work with concrete interfaces and context-specific methods (e.g. SubmitCollectionGuarantee).
    • the compiler will reject code that feeds inputs with incompatible type into the Pusher
  • In my opinion, it would improve clarity if we were to discard all inputs from the network.MessageProcessor - conceptually they are messages from different node, which they are already broadcasting. The Pusher is for broadcasting our own messages. But at the moment, inputs from node-internal components as well as messages from other nodes passing through the networking layer share the same code path in the pusher.Enging, which makes it non-trivial to figure out where messages are coming from.

To summarize: one of the main benefits of the Component interface is that it provides lifecycles methods, but no methods for ingesting data - on purpose. This is because all data from components within the node should be passed via context-specific methods. The only exception is the interface network.MessageProcessor, which would still call Process with a generic message type.

Sorry for the lengthy comment. I hope you see how the very ambiguous structure of the Engine makes it really hard to determine from the code what we are expecting and where to drop messages, because its all hidden behind layers of ambiguous interfaces in a huge code base.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed comment :)

I added the described SubmitCollectionGuarantee method in f66ba69 , but the interface & changing the Finalizer will be pushed to the next PR.

Inputs from the network.MessageProcessor (via the Process method) are being discarded (e25cba7).

e.unit.Launch(func() {
err := e.process(e.me.NodeID(), event)
if err != nil {
engine.LogError(e.log, err)
}
})
err := e.messageHandler.Process(e.me.NodeID(), event)
if err != nil {
engine.LogError(e.log, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
engine.LogError(e.log, err)
// TODO: do not swallow errors here, remove this function when transitioning to `network.MessageProcessor`
engine.LogError(e.log, err)

}
}

// Submit submits the given event from the node with the given origin ID
// for processing in a non-blocking manner. It returns instantly and logs
// a potential processing error internally when done.
func (e *Engine) Submit(channel channels.Channel, originID flow.Identifier, event interface{}) {
e.unit.Launch(func() {
err := e.process(originID, event)
if err != nil {
engine.LogError(e.log, err)
}
})
err := e.messageHandler.Process(originID, event)
if err != nil {
engine.LogError(e.log, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
engine.LogError(e.log, err)
// TODO: do not swallow errors here, remove this function when transitioning to `network.MessageProcessor`
engine.LogError(e.log, err)

}
}

// ProcessLocal processes an event originating on the local node.
jordanschalm marked this conversation as resolved.
Show resolved Hide resolved
func (e *Engine) ProcessLocal(event interface{}) error {
return e.unit.Do(func() error {
return e.process(e.me.NodeID(), event)
})
return e.messageHandler.Process(e.me.NodeID(), event)
}

// Process processes the given event from the node with the given origin ID in
// a blocking manner. It returns the potential processing error when done.
func (e *Engine) Process(channel channels.Channel, originID flow.Identifier, event interface{}) error {
return e.unit.Do(func() error {
return e.process(originID, event)
})
func (e *Engine) Process(channel channels.Channel, originID flow.Identifier, message any) error {
err := e.messageHandler.Process(originID, message)
if err != nil {
if errors.Is(err, engine.IncompatibleInputTypeError) {
e.log.Warn().Bool(logging.KeySuspicious, true).Msgf("%v delivered unsupported message %T through %v", originID, message, channel)
return nil
}
// TODO add comment about Process errors...
return fmt.Errorf("unexpected failure to process inbound pusher message")
}
return nil
}

// process processes events for the pusher engine on the collection node.
Expand Down
22 changes: 15 additions & 7 deletions engine/collection/pusher/engine_test.go
Original file line number Diff line number Diff line change
@@ -1,20 +1,21 @@
package pusher_test
package pusher

import (
"context"
"io"
"testing"
"time"

"github.com/rs/zerolog"
"github.com/stretchr/testify/mock"
"github.com/stretchr/testify/suite"

"github.com/onflow/flow-go/engine/collection/pusher"
"github.com/onflow/flow-go/model/flow"
"github.com/onflow/flow-go/model/flow/filter"
"github.com/onflow/flow-go/model/messages"
"github.com/onflow/flow-go/module/irrecoverable"
"github.com/onflow/flow-go/module/metrics"
module "github.com/onflow/flow-go/module/mock"
"github.com/onflow/flow-go/network/channels"
"github.com/onflow/flow-go/network/mocknetwork"
protocol "github.com/onflow/flow-go/state/protocol/mock"
storage "github.com/onflow/flow-go/storage/mock"
Expand All @@ -32,7 +33,7 @@ type Suite struct {
collections *storage.Collections
transactions *storage.Transactions

engine *pusher.Engine
engine *Engine
}

func (suite *Suite) SetupTest() {
Expand Down Expand Up @@ -63,7 +64,7 @@ func (suite *Suite) SetupTest() {
suite.collections = new(storage.Collections)
suite.transactions = new(storage.Transactions)

suite.engine, err = pusher.New(
suite.engine, err = New(
zerolog.New(io.Discard),
net,
suite.state,
Expand All @@ -82,19 +83,26 @@ func TestPusherEngine(t *testing.T) {

// should be able to submit collection guarantees to consensus nodes
func (suite *Suite) TestSubmitCollectionGuarantee() {
ctx, cancel := irrecoverable.NewMockSignalerContextWithCancel(suite.T(), context.Background())
suite.engine.Start(ctx)
defer cancel()
done := make(chan struct{})

guarantee := unittest.CollectionGuaranteeFixture()

// should submit the collection to consensus nodes
consensus := suite.identities.Filter(filter.HasRole[flow.Identity](flow.RoleConsensus))
suite.conduit.On("Publish", guarantee, consensus[0].NodeID).Return(nil)
suite.conduit.On("Publish", guarantee, consensus[0].NodeID).
Run(func(_ mock.Arguments) { close(done) }).Return(nil).Once()

msg := &messages.SubmitCollectionGuarantee{
Guarantee: *guarantee,
}
err := suite.engine.ProcessLocal(msg)
suite.Require().Nil(err)

unittest.RequireCloseBefore(suite.T(), done, time.Second, "message not sent")

suite.conduit.AssertExpectations(suite.T())
}

Expand All @@ -109,7 +117,7 @@ func (suite *Suite) TestSubmitCollectionGuaranteeNonLocal() {
msg := &messages.SubmitCollectionGuarantee{
Guarantee: *guarantee,
}
err := suite.engine.Process(channels.PushGuarantees, sender.NodeID, msg)
err := suite.engine.process(sender.NodeID, msg)
suite.Require().Error(err)

suite.conduit.AssertNumberOfCalls(suite.T(), "Multicast", 0)
Expand Down
1 change: 1 addition & 0 deletions engine/testutil/mock/nodes.go
Original file line number Diff line number Diff line change
Expand Up @@ -140,6 +140,7 @@ func (n CollectionNode) Start(t *testing.T) {
n.IngestionEngine.Start(n.Ctx)
n.EpochManagerEngine.Start(n.Ctx)
n.ProviderEngine.Start(n.Ctx)
n.PusherEngine.Start(n.Ctx)
}

func (n CollectionNode) Ready() <-chan struct{} {
Expand Down
Loading