-
Notifications
You must be signed in to change notification settings - Fork 265
Ability to add named nodes #712
Comments
We already have .name. Why can't we use that mechanism?
The challenge in general is that the logical graph does not correspond to
the physical graph in very obvious ways.
…On Fri, Jan 27, 2017 at 15:32 Pankaj Gupta ***@***.***> wrote:
Automatically generating good names is very hard. It would be great if
summingbird allowed users to have some control over how Heron tasks are
named. One way would be to allow using named functions i.e. accept
functions with names in the dsl.
source .flatMap("name") { x => ... }
Then while generating task names we could use "name" instead of "FlatMap"
that we use now. Similarly for other components.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#712>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEJdiAbsinMy1MT8-96lMXq4JAtn7Nhks5rWpqagaJpZM4LwYQY>
.
|
You're right if we were to simply replace "FlatMap" with "name" in above example for the entire topology then .name is the best choice. That was actually not what I had in mind I expressed incorrectly, let me elaborate. What users want is to be able to identify nodes with what's executing on them with a name that makes sense to them. .name applies to the entire subgraph. I'll be happy with any good way of doing this but here's one: We could keep the current naming scheme but wherever we find named functions we use them instead, overriding name of only the node where they are executed. e.g. let's say a node was called Tail-FlatMap-Summer and was created using .sumByKey("aggregator") then the node can be called aggregator. If we merge multiple methods together we can use a concatenated name. If it applies to multiple nodes like in the summer above then we do it based on the component, e.g. map side reduce component of sumByKey above can be called "mapsideAggregator". Having two ways of naming i.e. .name and this scheme could be confusing and implementing above could be pretty complicated. But having bad names is a big issue for our users. Sometimes names get insane like: To be honest applying settings by .name is unintuitive to users. What they usually want is for the setting to be applied to just one node and not the entire subtree. We could instead use the above names for applying settings and keep the scope localized. To avoid confusion with .name we can require that only one of the naming mechanisms is used and not both. |
I think we should be able to use .name as an API. We just need improve the
algorithm for mapping the name nodes into names for the physical nodes.
Alternatively, we could make a macro that captures line number and file for
function literals and use the toString method of the functions to identify
position.
We did something similar with scalding to add line numbers to the physical
nodes.
…On Sat, Jan 28, 2017 at 01:58 Pankaj Gupta ***@***.***> wrote:
You're right if we were to simply replace "FlatMap" with "name" in above
example for the entire topology then .name is the best choice. That was
actually not what I had in mind I expressed incorrectly, let me elaborate.
What users want is to be able to identify nodes with what's executing on
them with a name that makes sense to them. .name applies to the entire
subgraph. I'll be happy with any good way of doing this but here's one:
We could keep the current naming scheme but wherever we find named
functions we use them instead, overriding name of only the node where they
are executed.
e.g. let's say a node was called Tail-FlatMap-Summer and was created using
.sumByKey("aggregator")
then the node can be called aggregator. If we merge multiple methods
together we can use a concatenated name. If it applies to multiple nodes
like in the summer above then we do it based on the component, e.g. map
side reduce component of sumByKey above can be called "mapsideAggregator".
Having two ways of naming i.e. .name and this scheme could be confusing
and implementing above could be pretty complicated. But having bad names is
a big issue for our users. Sometimes names get insane like:
"Tail-FlatMap-Summer-FlatMap-Summer-FlatMap-Summer-FlatMap-Source"
It's impossible for most users to keep such complicated names in their
mind. Searching for viz graphs becomes hard. Unlike batch where tasks are
short lived, for realtime each node is like a service and operators have to
interact with it for debugging, frequently.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#712 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEJdoTHudvUtnSQBFt1MA1YCm3YgYmbks5rWy1igaJpZM4LwYQY>
.
|
Automatically generating good names is very hard. It would be great if summingbird allowed users to have some control over how Heron tasks are named. One way would be to allow using named functions i.e. accept functions with names in the dsl.
source .flatMap("name") { x => ... }
Then while generating task names we could use "name" instead of "FlatMap" that we use now. Similarly for other components.
The text was updated successfully, but these errors were encountered: