Skip to content
This repository has been archived by the owner on Mar 16, 2022. It is now read-only.

Adds Java Controller and Metadata support #377

Merged
merged 2 commits into from
Sep 21, 2020

Conversation

jroper
Copy link
Member

@jroper jroper commented Jun 30, 2020

Depends on #366 being merged first.

I've only added metadata support inputs for event sourcing and crdts, I haven't added it for outputs yet.

Controller support is in full.

@sleipnir
Copy link

Hello @jroper I see that this PR implements Stateless Function, but I see that it was implemented in a less general context, specific to Eventing support. Wouldn't it be possible to use a Stateless Function instead of a new type created that it called Controller?

Anyway, I find it interesting for @ralphlaude to take a look and see what he can reuse in supporting Stateless Functions

@jroper
Copy link
Member Author

jroper commented Jul 1, 2020

@sleipnir It's not an eventing specific context, this can (and should) be used for anything, it's just using a different name. The CloudEvent stuff in there is all optional, any message, whether it comes from an event source or gRPC, could be a CloudEvent if it has the right metadata attached, what I've provided in here is a convenient mechanism to access that metadata if it exists. So, I probably should have raised this discussion earlier, but I've been thinking about the naming of stateless function.

  • Stateless function is a tautology - functions are stateless. So I don't think it makes sense to call it stateless function.
  • Function is a bit limiting - it's not really a function either. It can be just a function, but it's more than a function, it can also forward to other entities/controllers. We've talked about adding some form of ask support, where we the protocol would send messages back to the proxy to send to other entities and get a response back, or perhaps even to other services entirely. I think this is where that support would go. This I think makes it much more than a function, it becomes more like an actor, albeit a stateless actor.
  • Stateless entity is an oxymoron.
  • Stateless stateful service is an oxymoron.
  • Stateless anything is confusing and likely oxymoronic - I've deployed a StatefulService with a stateless X inside it.
  • Actor I think is introducing a concept that many users won't understand.

What these things do is they allow routing of messages between different stateful entities. The problem with the term routing though is that it is likely to conflict with network routing concepts - eg OpenShift allows you to deploy routes (roughly equivalent to a k8s ingress). So my thinking is the term controller, since it controls the flow of messages between different stateful entities. It's using the term controller in the same way as it's used in the MVC pattern, except that it would probably in our context be better described as MMC, model model controller, since it controls interactions between multiple models, rather than between the model and the view.

In future, we may add stateful/durable controllers, which I think would allow implementing the process controller pattern. These would achieve essentially the same thing as controllers, except they would be durable - if a failure occurred part way through, the process would be restarted. So then we'd have controllers and durable controllers, and both would have very similar APIs, the durable controllers would likely differ by allowing you to attach a durable context onto the asks, so that when the ask succeeded, if the controller wasn't still running due to a crash or whatever, it could be restarted with that context.

So this is my reason for calling it a controller. What do you think?

@jroper jroper force-pushed the java-controller-support branch from fa7bb50 to ac3e035 Compare July 1, 2020 06:02
@sleipnir
Copy link

sleipnir commented Jul 1, 2020

Hello @jroper , excellent explanation. I largely agree with everything you said.

But now that I have the context, thank you, I think that would be similar to what other frameworks like Dapr call Virtual Actor (although in Dapr the term Virtual is explained as the "Virtual representation of something real", I think that in our context it would be Virtual because is built on top a real Actors implementation and inherits from some concepts ) and I think this is the ideal name, because despite not maintaining its internal state as an Actor it is used for things where we usually use Actors. I think it's a widespread name, mainly due to the work that has been done by Microsoft with Dapr, Reliable Actors, among other projects and it would be a good point of conceptual synergy between ours and these other projects that are emerging today.

I understood the analogy with the MVC Controller and I think it has its appeal mainly within the community of developers used to web frameworks. That said, I have nothing against the name Controller, but if it were a vote I would vote for Virtual Actor.

What do you think ? Others have something to say? @viktorklang @pvlugter @ralphlaude @marcellanz Let us know your thoughts

@ralphlaude
Copy link
Contributor

@jroper, thanks for the great explanation which also helps me understand the context.
Calling it Controller is more accurate and the term is well understood comparing to Actor. So i agree with your proposals.

newAnySupport(additionalDescriptors),
persistenceId,
snapshotEvery));
service ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small typo. i think it should be system here instead of service

*
* @return The call level metadata.
*/
Metadata metadata();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it needed here because it is already defined in MetadataContext?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's defined here to override the docs, since for controllers, there are two levels of metadata, and this is documented here.

@@ -0,0 +1,286 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well done!

}
}

private trait StreamedInSupport {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice if we can support some kind of java.util.collection for streamedin. we had the disccusion here - #266. what do you think @jroper?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could definitely support this, though my question would be, what's the use case? If you want to handle messages that you know will fit in memory, why not use a unary call and put those messages in a repeated field in the protobuf? My view on streamed calls is that they are useful for the following cases:

  • Where you don't want to buffer all the elements in memory because that could cause you to run out of memory.
  • Where you want to handle elements as they arrive, eg, in a chat application, each message should be handled immediately.

In both of the above cases, using a collection would defeat the purpose of using a streamed call - in fact, a call that handles all its messages in a single collection is by definition not streamed - so why use a streamed call for something that isn't streamed? Supporting this seems to be promoting bad practices to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no special use case for supporting collection for streamed-in calls. it is an anti pattern to use streamed calls to buffer all the in memory. the unary call can be used and it is fine for me. So i agree with your explanation.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jroper There are use cases for batch message handling, imagine a stream processing system where some other system has added messages that belong to some business category and should be aggregated into lists, very common in ETL processing, Machine Learning, CEP, and etc. ... In these cases the intake system will send finite data in batches that can be processed as a collection. Bearing in mind that a StreamIn will always be finite because it is expected to have a unary output, so it makes sense to support Collections, even though the same use case can be achieved with reactive flow operators

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ralphlaude It's not an anti pattern, it's just something you didn't think of

Copy link

@sleipnir sleipnir Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remembering that I am not referring to "Streamed" (In/Out streams) calls as @ralphlaude mentioned, I am referring to the only and exclusively "StreamIn" (In stream) calls, that is, finite incoming flows. And yes, it may be that the memory is not enough, but on the other hand the user would obviously be aware of this when making the choice of the desired operator in his method signature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stream in is not necessarily finite. It might ordinarily be infinite, but the receiver may change state, to a state where it can no longer accept messages, and in that case the out message produced would indicate information about that, for example, it might indicate that a long lived entity has now been deleted, or has been moved and the message may indicate how to reconnect, etc.

It might also be finite, but messages might need to be processed in real time, for example, you might use a stream in to report to a server the progress of a user integration, and once done, the server sends a single acknowledgement message. The thing happening has a finite number of state changes, so you're going to send a finite number of messages, but you want them handled as they arrived, not batched then handled.

If I was sending finite data to be processed in batches, I'd still rather model that as a single message containing the batch of data to process - this forces the sender to think about how much data can reasonably be fit into a single batch. Of course, that single message could still be very large and may cause things downstream to run out of memory, but it makes the contract clear, if the sender sends 100mb in a single message, that's obviously bad, but it's not obviously bad that sending 100mb spread across a stream is bad. If I did decide to use a stream, and I wanted downstream to process in batches, then I'd rather model that in the receiver using a grouped streaming operator, so that the receiver does do batching, but limits the amount that it buffers, and may process big batches as multiple sub batches, now the sender doesn't have to have any knowledge of the resource constraints of the receiver.

The advantage of supporting the collection types is convenience for the simple case, but increased danger - if we support collection types, then we have to document them (otherwise they may as well not exist), and many users will copy and paste those code examples without considering the consequences. The advantage of not supporting them is that we force the user to think about the fact that this is a stream and could cause them to run out of resources - they can still buffer them all then process them, but they've had to explicitly do that in code, and then in our documentation, we won't have any examples of buffering the whole stream, if we want to document batch handling, we can show how to do it safely with bounded buffer groups, so the users that just copy and paste will build something that by default is safe.

commandName -> new StreamedCallInvoker(method, serviceMethod, mat)
}.toMap

ReflectionHelper.validateNoBadMethods(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a small change. perhaps it can be done first to fail fast after allMethods are processed.

@jroper jroper force-pushed the java-controller-support branch 7 times, most recently from cb58b9a to 41103e7 Compare July 2, 2020 06:48
@marcellanz
Copy link
Contributor

marcellanz commented Jul 3, 2020

@jroper I read the sections where you explain why to use the term Controller, and also extensions to which controllers can get attached durable contexts which is interesting. I also read "What these things do is they allow routing of messages between different stateful entities." which seems to be the very purpose of these controllers.
In terms of a Router, some use this term a lot in other contexts, other than routing packets in a network, like Apache Camel.

I come a bit to the impression, that Controller is a pretty generic term by itself. If I see it right, the Controller and ControllerImpl is even inside a package named controller. If Controller is generic, so is a Manager. But for what I understand, a Manager perhaps has a bit more of a longer context attached to its operation lifespan (like an EntityManager?), and a controller might have a pretty short lifespan in terms of its activity.

Now, I'm not english native speaking, so I might get this wrong. But I find Controller a bit too generic. A Controller of what? I'm curious if we could iterate around the description of what you wrote such a thing is doing: "What these things do is they allow routing of messages between different stateful entities." and validate the term.

As @sleipnir wrote, I'm also not against it, I'm just curious if it's the right term for what it does. Wdyt?

@sleipnir
Copy link

sleipnir commented Jul 3, 2020

I agree with @marcellanz , it is worth remembering that the term Controller in Go is completely discouraged, am I right, @marcellanz?

@marcellanz
Copy link
Contributor

marcellanz commented Jul 4, 2020

it is worth remembering that the term Controller in Go is completely discouraged

For itself alone, I think, yes. A Controller is often too generic and without context not easy to understand what it does. There are many generic types in Go, io.Reader and io.Writer for sure. Or context.Context, which has a very (very) unfortunate package name. Context should probably be in the io package. context.Context also has other issues, as it features multiple orthogonal concerns.

I didn't want to bring into the context of Go what I wrote above, because I think Go is different in terms of naming things.

One can find many Go projects with different style, same as for Java or any other language. Some of them are clearly written in a Java-ish style, take kubernetes as an example. On the other hand projects like https://github.com/upspin are very different. And there is https://github.com/go-kit/kit, with some "enterprise" oriented style. You would not find one controller in these two projects, not even in the latter one.

So having a Controller is not wrong or "completely discouraged", but I think I would not use it as a concept without some context. It would have a describing package name at least, controller.Controller would be too broad and non-descriptive I think.

I'm still for it to find a name for a thing that is described as: "What these things do is they allow routing of messages between different stateful entities." :)

* types when needed.
* @return This Cloudstate builder.
*/
public CloudState registerController(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So according to the controllerspec.proto one can register 'custom' controllers for message routing? I like that a lot as it treats the controller/router/thing on the same level as a user function. Is that how its intended to work?

@jroper
Copy link
Member Author

jroper commented Jul 6, 2020

Whatever we choose - controller, manager, service, actor - is going to be generic. I've seen terms like manager discouraged in Java for being too generic - though Java is a huge ecosystem with a lot of history with many different ways of doing things, Go tends to be more homogeneous in that regard in my experience.

One thing raised here is that each language brings its own context to naming. One option could be that we name things according to what makes sense in each language. I'm wary of this though because the more naming diverges between languages, the harder it becomes to offer cross language tooling that makes sense - for example, we could produce a topology report, but how would you name these if every language calls them something different?

@marcellanz
Copy link
Contributor

Thats a good point. I found CloudEvents using a "Intermediary" for a similar usage:

An "intermediary" receives a message containing an event for the purpose of forwarding it to the next receiver, which might be another intermediary or a Consumer. A typical task for an intermediary is to route the event to receivers based on the information in the Context.

I agree that whatever is choosen, it might be have been used before and would have a history of meaning. But, if Controller is choosen, what would you say it is a controller of? Is it an Entity Controller? or a Message Controller? a Routing Controller?

@sleipnir
Copy link

sleipnir commented Jul 6, 2020

Whatever we choose - controller, manager, service, actor - is going to be generic. I've seen terms like manager discouraged in Java for being too generic - though Java is a huge ecosystem with a lot of history with many different ways of doing things, Go tends to be more homogeneous in that regard in my experience.

One thing raised here is that each language brings its own context to naming. One option could be that we name things according to what makes sense in each language. I'm wary of this though because the more naming diverges between languages, the harder it becomes to offer cross language tooling that makes sense - for example, we could produce a topology report, but how would you name these if every language calls them something different?

I don't think that all terms mentioned are equally generic as Controller, Service for example would be perfectly understood as a function because many other FaaS frameworks use this name, that is, anyone who has already dealt with the FaaS context would be able to understand the concept. Actor on the other hand would be completely understood as a unit that receives input and does something with it (with or without state). Router would be closer to the concept that I think we are dealing with here, which is something in the middle of different entities and that provide the forwarding of messages between them (I would still like to be able to have something completely independent of state entities, that is, in the case where I just want to be able to handle an http request and return a response, I can't see Controller, Router, as something capable of doing just that).

I don't think we should encourage or name a concept differently between languages. This makes life difficult for users who work with multiple languages and does not allow people to understand the concept intended by the name defined at the protocol level.

* @param descriptor The descriptor for the service that this controller implements.
* @param additionalDescriptors Any additional descriptors that should be used to look up protobuf
* types when needed.
* @return This Cloudstate builder.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a Cloudstate CloudState builder? :)

@marcellanz
Copy link
Contributor

marcellanz commented Jul 8, 2020

@pvlugter @jroper @ralphlaude
The shortly discussed "Process Manager" in the 2020-07-07 Contributors Call today described in the CQRS Journey mentions the origin of the term:

For a time the team developing the Reference Implementation used the term coordinating workflow before settling on the term process manager. This pattern is described in the book "Enterprise Integration Patterns" by Gregor Hohpe and Bobby Woolf.

https://www.enterpriseintegrationpatterns.com/patterns/messaging/ProcessManager.html

The book describes aspects we discussed (p.312) about managing state between message exchanges, process instances, correlation as well as potential declarative definitions together with the motivation:

How do we route a message through multiple processing steps when the required steps may not be known at design time and may not be sequential?

the book explains also the request-reply pattern, initiated by a trigger message, similar to the "ask and answer" - support described in this PR. (I think I can't share the paragraphs here).

@jroper
Copy link
Member Author

jroper commented Jul 8, 2020

I'm warming to ProcessManager. It does mean most projections will have a ProcessManager, even if all that projection does is forward to another entity.

I think if we went down this path, the plan would be while right now ProcessManager's are stateless, in future, they may be stateful, and the API we offer now should feel consistent with the stateful API.

So it's worth thinking about what a stateful process manager API might look like. The next stage of the process would typically be triggered by an event from an event sourced entity. Here's the challenge, this event may not necessarily have a direct reference to the process itself, and if it does, it may not necessarily be feasible for Cloudstate to automatically know which process that event is part of. So, how do we associate this event with the process?

One possibility that is more automatic could be metadata - if a command can carry metadata that then gets attached to any events that it emits, that would be a straight forward way to correlate. Though this would only work for Cloudstate entities - what about parts of the process that went outside of the Cloudstate system? Eg, if you had a payment provider that you could submit things to and it would asynchronously call you back via webhook? Or similarly a build system that would allow you to trigger a new build that asynchronously called you back via a webhook once complete or failed? These may not have the necessary metadata mechanism.

Another possibility is to provide an API that allows you to attach to a process when you receive the event. This would then give you the durable context that you could then work with. How would you know the id of the process to attach to? If the id is part of the event, easy, but what if it's not? We might need to allow indexing processes by multiple different keys. But perhaps that's overthinking it a bit, we might just say the user needs to ensure that id is carried through, either as a first class value in the messages/events, or through some metadata mechanism. The attachment to the process could be done using a reinvoking continuation - when you receive the event, you ask for the process context with a given id, if it doesn't exist, you would return an instruction to attach to a particular process (in some languages this might be done automatically by throwing a control exception on attempt to access the process context). The proxy would then reroute the message to the node where that processes shard lives, load the process with its state/context, and reinvoke again with the same message. This time, the context is present, and so processing continues onwards. In the Java API provided, this API would probably be possible to bolt on to the current unary API with no problems, we could offer both styles of API - control flow using exceptions or explicit returning of suspend.

Initially, save/resume points would all be implicit, when the process went through message queues or event logs, so a failure would result in a lack of acknowledgement, which would trigger retry. In future we might support explicit save points, along with timeouts, timeouts would be necessary in situations where the process went through an external service that had a non reliable webhook or similar return mechanism, along with situations where the business rules called for timeouts.

Anyway, the point of me writing the above was just brainstorming whether if we called it ProcessManager today, we'd be forward compatible with the direction we'd want to take it, which I think we will be. The biggest issue I see with ProcessManager is that it feels like overkill for the simple cases, we'd be introducing a new term to users who all they need is a stateless function. I do think process manager can still be applied to that context - it would be a single step process, I'm comfortable with that, and in the context of projections, it's common to call them a "read side processor", which implies that handling the events is a process. We can always sugar a layer on top of the API with simple function terminology if it's that big of a problem.

Apart from renaming to ProcessManager, a few other things would need to be done:

  • We'd want to modify the protocol:
    • Obviously rename it from Function to ProcessManager
    • We'd also need to make unary calls be handled by streamed calls in the protocol to support asks.
    • We'd probably also want to make the in/out types polymorphic (ie, a wrapper type with a oneof), this gives us a way to add ask support, as well as add other features in future.

I think this though should be done in another PR.

There is one more thing, currently we have some sample apps that we're using at Lightbend that use stateless functions in JavaScript to serve a web interface. It's a bit awkward to do now, but it works. It will still be possible to do with ProcessManager, except it's just a really weird name to call something that serves a web interface. I'm ok with that, the purpose of these things is not to serve web interfaces, and our sample apps at Lightbend are only using them in that way because we wanted to have a nice UI to demonstrate the stateful services in action, and in our locked down hosted service, you can't just deploy an arbitrary deployment. I just want to state here that we're explicitly ignoring that use case. If we were to provide something that better matched that use case, we might allow deploying a stateless service that doesn't have the proxy injected. Or, we could support something like dapr, where we don't proxy, but we do offer a sidecar that these stateless functions can call into to interact with the rest of Cloudstate, along with support libraries that offered idiomatic APIs that exposed Cloudstate terminology.

@jroper
Copy link
Member Author

jroper commented Jul 8, 2020

Another thing I've just considered - very often in read side processors you actually need to store two representations of the state, or at least, two indexes. The first is indexed by entity, the second is indexed by the projection key. So, consider a users by email projection. When I receive an email changed event, that's going to contain the new email address, but not the old. But if my projection is using a key value store, I need to know the old email to remove the entry for the old email address from it. So, to do this, I need to store what the current email address of the user is, keyed by user, so when I receive the email changed notification, I can look up the current email address for the user, and then I can do the two updates on the email projection. Similarly, when a user is deleted, that event won't contain the email address, so I'll need to lookup the current email address for the user, so that I can remove that entry from my database.

The durable process manager described above I think would be perfect for that - the users current email address would be the state. When you receive the email changed event, you attach to the context for that user id, this then gives you the email address, you can then send two concurrent asks to the users by email store to remove the entry for the old address and add an entry for the new address, when both return, you update the state and return. So, the users by email read side processor is implemented using a process manager per user. I think that's fairly simple to describe and understand.

@hseeberger
Copy link
Contributor

Uh, ah! Process Manager sounds ways too heavy for this poor little thing.

No surprise at all that the term has been coined by "Enterprise Gregor”.

@sleipnir
Copy link

sleipnir commented Jul 8, 2020

@marcellanz @jroper @hseeberger I think that to quote EIP the most generic example for what we want to do would be the "Message Router pattern", it applies very well to the simplest stateless cases and can be suitable for cases with states or multiple routing channels.
This pattern also provides less coupling between the parts...
Having an @RouterEntity annotation would also be really cool: P

@RouterEntity 
public class BlaBlaBlaRouter {
}

:D

@marcellanz
Copy link
Contributor

marcellanz commented Jul 10, 2020

Uh, ah! Process Manager sounds ways too heavy for this poor little thing.

No surprise at all that the term has been coined by "Enterprise Gregor”.

@hseeberger
I agree on that the term might have got a lot of weight put on from heavy things like P8, TIBCO and many others, like modern (business) process managers over time, but I also have the impression that the term and its known and used definition quite good represents what was described so far what the "little thing" is able to do.

I think Gregor and Bobby very much knew that this term captures a lot, perhaps too much. I quote (and I left the "excerpt" note, with which I think I can paste it here):

The versatility of a Process Manager is at the same time its biggest strength and weakness. A Process Manager can execute any sequence of steps, sequential or in parallel. Therefore, almost any integration problem can be solved with a Process Manager. Likewise, most of the patterns introduced in this chapter could be implemented using a Process Manager. In fact, many EAI vendors make you believe that every integration problem is a process problem.

(Excerpt From: Gregor Hohpe. “Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions (Marcel Lanz's Library)”. Apple Books.)

James already described a few very interesting use cases a "process manager" can be used for. I also like the "sugar layer" approach for other uses, where we don't need a fully loaded process manager. When we talk about a term, we all have attached a cognitive narrative for what a thing stands for. Calling it solely a "Controller" is too generic as I argued; and asked: "a controller of what?".

The CNCF has a section on Workflow with its serverless working group: https://github.com/cncf/wg-serverless/blob/master/workflow/spec/spec.md. While I would not like to call the "little thing" a workflow manager, a thing that has even more weight on on my experience, it seems other have similar needs to describe non-discrete composites.

So if a process manager is too much, what is it then. Wdyt @hseeberger?

I think the thing is addressable, the thing can control or arbitrate message passing or routing of messages, it even can attach durable context used to have session like semantics. I'd like to have a name for it which makes it clear how it can compose a thing as little as a forward, as well as orchestrate interactions in a greater context.

I currently stroll on on the following areas:

https://en.wikipedia.org/wiki/Communicating_sequential_processes :

describing patterns of interaction in concurrent systems

https://en.wikipedia.org/wiki/Process_calculus :

Process calculi provide a tool for the high-level description of interactions, communications, and synchronizations between a collection of independent agents or processes.

https://en.wikipedia.org/wiki/Calculus_of_communicating_systems :

and I have the impression that composition plays a role of what we're looking for. So far in Cloudstate we have simple discrete step-like functions; not knowing much about other entities. A Forward is the best we can do, but how do we compose or manage multiple interactions with the programming model we envision? If we don't like to loose sight of interactions of entities/things, how do we describe that? Do we express discrete steps with an entity or thing completely loosely coupled? or do we define a "process" in whatever form? Processes and Workflows are all about that.

Last, I think we also could not name the thing so that it describes its full purpose, and give it a name that is so generic, that it applies to any additional specific discriminatory term, that then defines its purpose. It does not be to be named so broad so that it gets attached a non-concrete purpose by the user. Think about a Service or Component in Spring; it can do a lot of things, its concrete purpose is defined by its very name and compositional use defined by other components or services. In this regard, we would name the thing to mark that it's treated as being addressable and at the same time give it a concrete name that describes what it's doing. It is the greatest weakness of calling it just a Controller not knowing what is doing I think. So we could call it just by what it needs to be to fulfil acting as a Forwarder or being Durable which are properties of their function.

I still like the term "process manager" even if it is loaded for now and I like many of the uses James described.

Wdyt?

@pvlugter
Copy link
Member

Thinking about this some more since the contributor call, I still can't get away from a simple approach.

Going back to the original naming problem. We have entity and stateful service, and those names are considered fine. We then have a "stateless function entity" and a "stateless stateful service", and those names don't make any sense. I think this is not a problem so much with the terms used (function or stateless) but that they seem to be expected to build on entity and stateful service as the core concepts.

We could simply strip away the extraneous:

stateless function entity -> function
stateless stateful service -> service

In its current form, it's code triggered by events, that responds with actions or effects (that can be managed by the runtime). In most serverless solutions it's called a function. There are then more specific variants or uses of functions. In AWS, step functions implement the process manager pattern. In Azure, orchestrator functions (which are durable functions) are process managers, and there are entity functions for stateful entities. That a function can return effects is not problematic to me, that makes sense.

I agree about having the core concepts in Cloudstate as aligned as possible with existing terminology, so people know what things are more easily.

For me, it works fine to have functions and to deploy a service. A service has functions. A stateful service has entities. And these concepts can be layered. Entities use a specialised function to implement their behaviour; the entity concept builds on the core concept of a function, by remembering previous states and events and passing these back in as inputs (just like the event sourced behaviour for Akka Persistence Typed). The concept of an actor also builds on the core concept of a behaviour function, and includes other properties, like an addressable mailbox, and allows stateful patterns to be implemented (such as entities or process managers).

This seems aligned with serverless in general, and with Akka, and to me has the correct connotations around state. Functions are expected to be stateless by default — where stateless usually means that there is no dependency on state or preceding events at all; only the inputs are used, same output for same inputs; it doesn't remember any information. There's also the secondary use of stateless as meaning the persistent state is externalised (such as stored in a database), which is how a "stateless" function or mid-tier process is part of a stateful application or implements stateful entities.

So for the current thing, I think function is still the appropriate name. Or if we wanted a different generic term, then action could also work. These are common terms used in this space already.

I agree that supporting the process manager pattern is going to be a useful feature, and could also be called workflow, coordinator, or orchestrator, or more generically it could be an actor. Using a single-step stateless process manager to implement what is currently called a function is possible, but I'm not convinced that saying it that way is cleaner conceptually.

I see different questions at play here. There's the original discussion on naming what is currently called a stateless function, and which is strangely a type of entity. There's what the future actor-like or process-manager-like thing should be called, and what its characteristics are. And then there's whether there should be both function-likes (stateless event-driven things) and actor-likes (stateful message-driven things), or if Cloudstate should only default to stateful and addressable building blocks (which could still implement the stateless event-driven use cases, in the same ways that actors can) — it is called Cloudstate after all, and the foundation is Akka. But personally, I'm not so inclined to use Process Manager as the name for this fundamental stateful thing, and I'm not persuaded that it's a more approachable term than Actor.

@sleipnir
Copy link

@pvlugter Perfect. You translated my thoughts

@pvlugter
Copy link
Member

Renaming of stateless functions discussed some more on the contributors' call. The name we're settling on instead of function is action. And workflow as a coordinator of actions. This also aligns with terminology used by the CNCF Serverless Working Group (for example, see this glossary).

The handlers for actions may just be called handler (rather than command handler or call handler) — and applies to commands/requests or events from an event source.

We'll look to remove references to "user function" or "client" in the docs and code. We'll continue to use service for the deployable thing, and instance for actual created things. Should we use entity service and action service instead of stateful service and stateless service? We should also look at naming in the protocol, as we have client action there.

It would be useful to create our own glossary of terms to apply consistently. I'll create a separate issue.

@sleipnir
Copy link

Hello guys
How was the evolution of this item defined? We have a PR for Stateless in Python support underway and we want to know which way to go.

ping @pvlugter @jroper @viktorklang @marcellanz

@marcellanz
Copy link
Contributor

We probably should discuss that and align any further work also with the planned roadmap I think. There also is an implementation for Node. It would be good to know how both of these should be in line with this PRs open questions.

@viktorklang
Copy link
Contributor

@sleipnir @marcellanz @pvlugter @jroper I think the renaming should be handled separately from the metadata support. From what I recall, @jroper and I had both created a bit of a proof-of-concept for representing metadata in the Cloudstate protocol. Then of course it remains to see what hte best representation looks like from an user-language support API perspective.

@sleipnir
Copy link

@viktorklang It's all right. Share the proof of concept with us.

@viktorklang
Copy link
Contributor

@sleipnir #377

@sleipnir
Copy link

@viktorklang I thought it was something new. That's old :P

@viktorklang
Copy link
Contributor

@sleipnir LOL—I'M OLD!

@marcellanz
Copy link
Contributor

@sleipnir @marcellanz @pvlugter @jroper I think the renaming should be handled separately from the metadata support. From what I recall, @jroper and I had both created a bit of a proof-of-concept for representing metadata in the Cloudstate protocol. Then of course it remains to see what hte best representation looks like from an user-language support API perspective.

@viktorklang Yeah, let's separate it from this PR and discuss and decide it in a Discussion or Issue? Also @viktorklang, let us know what you and @jroper see that defines or guides the implementation of the stateless protocol and what requirements you expect to be met from a language support implementation that is perhaps not documented in the protocol so far, so that a language implementation can be guided around that.

@sleipnir
Copy link

@viktorklang @pvlugter @marcellanz I think that if the user api should have another semantics it could be declared explicitly in the protobuf file, as it is the natural contract that guides all implementations. So to begin with we could rename the protobuf according to the names chosen in the discussion that Peter mentioned. But it would be interesting to move this around quickly and improve what is needed incrementally. What we cannot do is give the impression that we are static by not going in any direction

@viktorklang
Copy link
Contributor

@sleipnir Makes sense to me, @pvlugter and @jroper can you take point on this?

@pvlugter
Copy link
Member

Discussed on the contributors' call. We've reconfirmed that we'll rename stateless function to action. I'll find some time to update this PR with rename of Controller to Action.

We also want to update the protocol, and this is a good opportunity to start trying protocol versioning and changes. So let's also make the name changes in the protocol (separate from this PR) and bump the protocol minor version. Language supports should start checking the version — it's too early to worry about supporting multiple protocol versions, something to start looking at with a stable 1.0 protocol — but we should have a compatibility check between the proxy and service implementation (user function) and try out the experience of mismatches.

@marcellanz marcellanz mentioned this pull request Sep 16, 2020
@pvlugter pvlugter force-pushed the java-controller-support branch 2 times, most recently from 238ab0f to b2d664c Compare September 20, 2020 23:32
jroper and others added 2 commits September 21, 2020 16:06
I've only added metadata support inputs for event sourcing and crdts, I
haven't added it for outputs yet.

Controller support is in full.
@pvlugter pvlugter force-pushed the java-controller-support branch from 38465bf to 34ae044 Compare September 21, 2020 04:06
Copy link
Member

@pvlugter pvlugter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rebased this PR, and then renamed Controller to Action. We can merge this now and then continue with updating the protocol and any other changes.

Something to note: this PR updates Java support to JDK 11. I'm assuming this only for the Flow types. We could also keep supporting JDK 8 and publish multi-release jars. @sleipnir, let us know if the new minimum Java version 11 works for the other JVM support libraries.

@pvlugter pvlugter merged commit 227edc6 into cloudstateio:master Sep 21, 2020
@sleipnir
Copy link

@pvlugter I will check again but it looks like it will work with Java 11. I need to take a look at the travis builds as they must be using Java 8

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants