Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grammY conversations (Why Scenes Are Bad) #136

Closed
KnorpelSenf opened this issue Dec 16, 2021 · 38 comments
Closed

grammY conversations (Why Scenes Are Bad) #136

KnorpelSenf opened this issue Dec 16, 2021 · 38 comments

Comments

@KnorpelSenf
Copy link
Member

KnorpelSenf commented Dec 16, 2021

Admittedly, that title was clickbait. It seems like you are interested in scenes or wizards, and other programming patters that allow you to define conversational interfaces like they were a finite-state machine (FSM, wiki).

THIS IS A A LOT OF TEXT. Please still read everything before you comment. Let's try to keep the signal-to-noise ratio up high on this one :)

What Is This Issue

One of the features that get requested most often are scenes. This issue shall:

  1. Bring everyone on the same page what people mean when they say scenes.
  2. Why there is not going to be a traditional implementation of scenes for grammY, and how we're trying to do better.
  3. Update you about the current progress.
  4. Introduce two novel competing concepts that could both turn out as better than scenes.
  5. Serve as a forum to discuss where we want to take this library regarding grammY conversations/scenes.

1. What are scenes?

A chat is a conversational interface. This means that the chat between the user and the bot evolves over time. Old messages stay relevant when processing current ones, as they provide the context of the conversation that determines how to interpret messages.

< /start
>>> How old are you?
< 42
>>> Cool, how old is your mother?
< 70
>>> Alright, she was 28 when you were born!

Note how the user sends two messages, and both are numbers. We only know that those two numbers mean two different things because we can follow the flow of the conversation. The two age numbers are following up two different questions. Hence, in order to provide a natural conversational flow, we must store the history of the chat, and take it into account when interpreting messages.

Note that Telegram does not store the chat history for bots, so you should store them yourself. This is often done via sessions, but you can also use your own database.

In fact, we often don't need to know the entire chat history. The few most recent messages are often enough to remember, as we likely don't have to care about what the user sent back in 2018. It is therefore common to construct state, i.e. a small bit of data that stores where in the conversation where are. In our example, we would only need to store if the last question was about the age of the user, or about the age of their mother.

Scenes are a way to express this conversational style by allowing you to define a finite-state machine. Please google what this is, it is essential for the following discussion. The state is usually stored in the session data. They achieve this by isolating a part of the middleware into a block that can be entered and left.

Different bot frameworks have different syntax for this, but it typically works roughly like this (explanatory code, do not try to run):

// Define a separate part of the middleware handling.
const scene = new Scene('my-scene')
scene.command('start', ctx => ctx.reply('/start command from inside scene'))
scene.command('leave', ctx => ctx.scene.leave()) // leave scene

// Define regular bot.
const bot = new Bot('secret-token')
bot.use(session())
bot.use(scene)
bot.command('start', ctx => ctx.reply('/start command outside of scene'))
bot.command('enter', ctx => ctx.scene.enter('my-scene')) // enter scene

bot.start()

This could result in the following conversation.

< /start
>>> /start command outside of scene
< /enter
< /start
>>> /start command from inside scene
< /leave
< /start
>>> /start command outside of scene

In a way, every scene defines one step of the conversation. As you can define arbitrarily many of these scenes, you can define a conversational interface by creating a new instance of Scene for every step, and hence define the message handling for it.

Scenes are a good idea. The are a huge step forward from only defining dozens of handlers on the same middleware tree. Bots that do not use scenes (or a similar form of state management) are effectively forgetting everything that happened in the chat immediately after they're done handling a message. (If they seem like they remember their context, then this is more or less a workaround which relies on a message that you reply to, inline menus, or other information in order to avoid state management.)

2. Cool! So what is the problem?

Scenes effectively reduce the flow of a conversation to being in a state, and then transitioning into another state (ctx.scene.enter('goto')). This can be illustrated by translating scenes into routers:

const scene = new Router(ctx => ctx.session.scene)

// Define a separate part of the middleware handling.
const handler = new Composer()
scene.route('my-scene', handler)
handler.lazy(ctx => {
  const c = new Composer()
  c.command('start', ctx => ctx.reply('/start command from inside scene'))
  c.command('leave', ctx => ctx.session.scene = undefined) // leave scene
  return c
})

// Define regular bot.
const bot = new Bot('secret-token')
bot.use(session())
bot.use(scene)
bot.command('start', ctx => ctx.reply('/start command outside of scene'))
bot.command('enter', ctx => ctx.session.scene = 'my-scene') // enter scene

bot.start()

Instead of creating new Scene objects, we simply create new routes, and obtain the same behaviour with minimally more code.

This may work if you have two states. It may also work for three. However, the more often you instantiate Scene, the more states you add to your global pool of states, between which you're jumping around arbitrarily. This quickly becomes messy. It takes you back to the old days of defining a huge file of code without indentation, and then using GOTO to move around. This, too, works at a small scale, but considering GOTO harmful led to a paradigm shift that substantially advanced programming as a discipline.

In Telegraf, there are some ways to mitigate the problem. For example, once could add a way to group some scenes together into a namespace. As an example, Telegraf calls the Scene from above a Stage, and uses the word scene to group together several stages. It also allows you to force certain stages into a linear history, and calls this a wizard, in analogy to the multi-step UI forms.

With grammY, we try to rethink the state of the art, and to come up with original solutions to long standing problems. Admitting that Update objects are actually pretty complex objects led us to giving powerful tools to bot developers: filter queries and the middleware tree were born, and they are widely used in almost all bots. Admitting that sending requests is more than just a plain HTTP call (at least when you're working with Telegram) led us to developing API transformer functions: a core primitive that drastically changes how we think about plugins and what they can do. Admitting that long polling at scale is quite hard led us to grammY runner: the fastest long polling implementation that exists, outperforming all other JS frameworks by far.

Regarding conversational interfaces, the best we could come up with so far is GOTO. That was an okay first step a few years ago. Now, it is time to admit that this is harmful, and that we can do better.

3. So what have we done about this so far?

Not too much. Which is why this issue exists. So far, we've been recommending people to combine routers and sessions, rather than using scenes, as it does not use much more code, and providing the same plain old scenes for grammY is not ambitious enough.

There is a branch in this repository that contains some experiments with the future syntax that could be used, however, the feedback for it was mixed. It does bring some improvements to the situation as it provides a structure between the different steps in the conversation. Unfortunately, the resulting code is not too readable, and it makes things that belong together end up in different places of the code. It is always cool if the things that are semantically linked can be written close to each other.

As a consequence of this lack of progress, we need to have a proper discussion with everyone in the community in order to develop a more mature approach. The next section will suggest two ideas, one of them is the aforementioned one. Your feedback and ideas will impact the next step in developing conversational interfaces. Please speak up.

4. Some suggestions

Approach A: “Conversation Nodes”

This suggestion is the one the we've mentioned above. Its main contribution is to introduce a more implicit way of defining scenes. Instead of creating a new instance of a class for every step, you can just call conversation.wait(). This will internally create the class for you. As a result, you can have a more natural way of expressing the conversation. The wait calls make it clear where a message from the user is expected.

Here is the example from the top again. Handling invalid input is omitted intentionally for brevity.

const conversation = new Conversation('age-at-birth')

conversation.command('start', async ctx => {
  await ctx.reply('How old are you'))
  ctx.conversation.forward()
})

conversation.wait()
conversation.on('message:text', async ctx => {
  ctx.session.age = parseInt(ctx.msg.text, 10)
  await ctx.reply('Cool, how old is your mother?')
  ctx.conversation.forward()
})

conversation.wait()
conversation.on('message:text', async ctx => {
  const age = parseInt(ctx.msg.text, 10)
  await ctx.reply(`Alright, she was ${age - ctx.session.age} when you were born!`)
  ctx.conversation.leave()
})

This provides a simple linear flow that could be illustrated by

O
|
O
|
O

We can jump back and forth using ctx.conversation.forward(3) or ctx.conversation.backward(5).
The wait calls optionally take string identifiers if you want to jump to a specific point, rather than giving a relative number of steps.

Next, let us see how we can branch out, and have an alternative way of continuing the conversation.

const conversation = new Conversation('age-at-birth')

conversation.command('start', async ctx => {
  await ctx.reply('How old are you'))
  ctx.conversation.forward()
})

conversation.wait()

// start a new sub-conversation
const invalidConversation = conversation.filter(ctx => isNaN(parseInt(ctx.msg.text))).diveIn()
invalidConversation.on('message', ctx => ctx.reply('That is not a number, so I will assume you sent me the name of your pet'))
invalidConversation.wait()
// TODO: continue conversation about pets here

// Go on with regular conversation about age:
conversation.on('message:text', async ctx => {
  ctx.session.age = parseInt(ctx.msg.text, 10)
  await ctx.reply('Cool, how old is your mother?')
  ctx.conversation.forward()
})

conversation.wait()
conversation.on('message:text', async ctx => {
  const age = parseInt(ctx.msg.text, 10)
  await ctx.reply(`Alright, she was ${age - ctx.session.age} when you were born!`)
  ctx.conversation.leave()
})

We have now defined a conversation that goes like this:

O
|
O
| \
O O

That way, we can define conversation flows.

There are a number of improvements that could be done to this. If you have any concrete suggestions, please leave them below.

Approach B: “Nested Handlers”

Newcomers commonly try out something like this.

bot.on('start', async ctx => {
  await ctx.reply('How old are you?')
  bot.on('message', ctx => { /* ... */ })
})

grammY has a protection against this because it would lead to a memory leak, and eventually OOM the server. Every received /start command would add a handler that is installed globally and persistently. All but the first are unreachable code, given that next isn't called inside the nested handler.

It would be worth investigating if we can write a different middleware system that allows this.

const conversation = new Conversation()
conversation.on('start', async ctx => {
  await ctx.reply('How old are you?')
  conversation.on('message', ctx => { /* ... */ })
})

This would probably lead to deeply nested callback functions, i.e. bring us back to callback hell, something that could be called the GOTO statement of asynchronous programming.

What could we do to mitigate this?

Either way, this concept is still tempting. It is very intuitive to use. It obviously cannot be implemented with exactly the above syntax (because we are unable to reconstruct the current listeners on the next update, and we obviously cannot store the listeners in a database), but could try to figure out if small adjustments could make this possible. Internally, we would still have to convert this into something like an FSM, but maybe one that is generated on the fly. The dynamic ranges of the menu plugin could be used as inspiration here.

5. We need your feedback

Do you have a third idea? Can we combine the approaches A and B? How would you change them? Do you think the examples are completely missing the point? Any constructive feedback is welcome, and so are questions and concerns.

It would be amazing if we could find the right abstraction for this. It exists somewhere out there, we just have to find it.

Thank you!

@IlyaSemenov
Copy link
Contributor

IlyaSemenov commented Dec 25, 2021

I kind of implemented "nested handlers" approach in https://github.com/IlyaSemenov/grammy-scenes

I've extracted it to a npm package from one of my actual bot projects.

I am not 100% happy with the API and architecture I ended up with, but it allows to write concise code, while still being quite flexible (it extends the built-in Composer class, thus allowing to fallback to generic scene.use(...) for advanced middleware use cases).

One of the things I'm not sure in is handling the scenes context. I'm not convinced whether scenes should have separate first-class contexts, or if we should simply direct user to use "normal" sessions for their context data. In my initial prototypes I went with the former, which (back then) resulted in very complex Typescript generics, and it also wasn't straightforward when to clear the context and when to keep it. So I ditched all that in favour of "normal" sessions with some simple typings (see example in README) — but I'm not exactly happy about that. First class scene contexts still seem to be a better fit to me, they just need to be carefully implemented.

@KnorpelSenf
Copy link
Member Author

KnorpelSenf commented Dec 31, 2021

Approach C: “Wait and Resume”

On first sight, this suggestion may look similar to Approach A. However, it actually behaves very differently and it makes more sense to start explaining it from scratch.

Remember from https://grammy.dev/advanced/middleware that middleware is a tree in grammY. This tree will be traversed in depth-first order.

We regard a conversation also as a hierarchy of different flows. Always start with question Q0, if the user says YES, continue with Q1, else continue with Q2.

We suggest to use the former hierarchy to express the latter.

Here is the example from the top again. Handling invalid input is omitted intentionally for brevity.

const conversation = new Conversation('age-at-birth')

conversation.command('start', async ctx => {
  await ctx.reply('How old are you'))
  ctx.conversation.resume()
})

conversation.wait()

conversation.on('message:text', async ctx => {
  ctx.session.age = parseInt(ctx.msg.text, 10)
  await ctx.reply('Cool, how old is your mother?')
  ctx.conversation.resume()
})

conversation.wait()

conversation.on('message:text', async ctx => {
  const age = parseInt(ctx.msg.text, 10)
  await ctx.reply(`Alright, she was ${age - ctx.session.age} when you were born!`)
  ctx.conversation.leave()
})

This provides a simple linear flow that could be illustrated by

O
|
O
|
O

We can interrupt the middleware execution by inserting a wait call. We can call resume to continue traversing the middleware tree. In a way, calling ctx.conversation.resume is like calling next in the traditional middleware tree. (Please leave a comment if you think the method should be renamed to ctx.conversation.next in order to display this similarity.)

The wait calls optionally take string identifiers if you want to jump to a specific point, rather than only resuming after the next wait call.

Next, let us see how we can branch out, and have an alternative way of continuing the conversation.

const conversation = new Conversation('age-at-birth')

conversation.command('start', async ctx => {
  await ctx.reply('How old are you'))
  ctx.conversation.resume()
})

conversation.wait()

// start a new sub-conversation
const invalidConversation = conversation.filter(ctx => isNaN(parseInt(ctx.msg?.text ?? '')))
invalidConversation.on('message', ctx => ctx.reply('That is not a number, so I will assume you sent me the name of your pet. What animal is it?'))
invalidConversation.wait()
// TODO: continue conversation about pets here

// Go on with regular conversation about age:
conversation.on('message:text', async ctx => {
  ctx.session.age = parseInt(ctx.msg.text, 10)
  await ctx.reply('Cool, how old is your mother?')
  ctx.conversation.resume()
})
conversation.wait()

conversation.on('message:text', async ctx => {
  const age = parseInt(ctx.msg.text, 10)
  await ctx.reply(`Alright, she was ${age - ctx.session.age} when you were born!`)
  ctx.conversation.leave()
})

We have now defined a conversation that goes like this:

O
| \
O  O
|
O

That way, we can define conversation flows.

Note that the conversation will remain between two wait calls until you explicitly call ctx.conversation.resume. It may be desirable to let the conversation resume automatically, and instead let people remain explicitly. Here is how that could work:

const conversation = new Conversation('word-of-the-year', { autoResume: true })

conversation.on('message', async ctx => {
  await ctx.reply('What is your favourite English word?))
})
conversation.wait()
conversation.on('message', async ctx => {
  ctx.session.word = ctx.msg.text
  await ctx.reply('Why do you want this word to become the word of the year?'))
})
conversation.wait()
conversation.on('message', async ctx => {
  ctx.session.reason = ctx.msg.text
  await ctx.reply('Is there anything else you would like to say?'), {
    reply_markup: new InlineKeyboard().text('Nope', 'no')
  })
})
conversation.wait()
conversation.on('message', async ctx => {
  ctx.session.comment = ctx.msg.text
  await ctx.reply('Thank you for you submission'))
})
conversation.callbackQuery('no', ctx => ctx.reply('Skipped. Thanks for the submission!'))

Personally, I think this is so far the best option we have. I prefer it over the other two approaches. What do you think?

@KnorpelSenf
Copy link
Member Author

KnorpelSenf commented Dec 31, 2021

How Approach C Resembles Imperative Code

It struck me that we are getting full flexibility with statements, branching, loops, functions, and recursion using Approach C. In other words, we have the flexibility of code when designing how conversations work. This is a little bit revolutionary and maybe not too obvious on first sight. I would like to illustrate how it will work.

Statements

A statement in conversations is simply all middleware between two wait calls. Example:

// First statement
conversation.on('message', ctx => { /* ... */ })
conversation.wait()
// Second statement
conversation.on('message:text', ctx => { /* ... */ })
conversation.on('message:photo', ctx => { /* ... */ })
conversation.on('message', ctx => { /* ... */ })
conversation.wait()
// Third statement
conversation.on('message', ctx => { /* ... */ })

These individual blocks of handlers act like the individual instructions of a conversation.

Branching

This has already been described in the post above.

In short, you can use filter calls in the middleware (or branch or on or custom middleware or whatever) to perform branching the middleware tree. The branches of the middleware tree translate to the branches of the conversation hierarchy.

Loops

Iteration is possible by recursion (preferred, see below), or by jumping back to a fixed step. Admittedly, this is rather a jump statement than an actual loop. (If you have a use case for for/while/do-while loop and a syntax suggestion, we may add syntactic sugar abstracts away from the jumps.)

Example of jumping back:

conversation.on('message', ctx => { /* ... */ })
conversation.wait('begin-loop') // loop starts here
conversation.on('message', ctx => { /* ... */ }) // first loop statement
conversation.wait()
conversation.on('message', ctx => { /* ... */ }) // second loop statement
conversation.wait()
conversation.filter(condition, ctx => ctx.conversation.resume('begin-loop')) // loop condition
conversation.on('message', ctx => { /* ... */ }) // broke out of loop

Functions

You can define reusable parts of conversations. This lets you define sub-conversations. They can be included in different places of the main conversation, and nested as deep as you like. This is the same concept as functions in code: reusable named pieces of control flow.

In order to illustrate this, lets define a small part of the conversation that verifies the identity of the user by asking for their birthday.

const captcha = new Conversation('captcha', { autoResume: true })
captcha.use(async ctx => {
  await ctx.reply('Please enter your birthday to proceed. Day of month?', {
    reply_markup: numberKeyboard(1, 31) // builds a keyboard 31 buttons with numbers, omitted here
  })
})
captcha.wait()
captcha.on('callback_query:data', async ctx => {
  ctx.session.captcha.day = ctx.callbackQuery.data
  await ctx.reply('Month?', { reply_markup: numberKeyboard(1, 12) })
})
captcha.wait()
captcha.on('callback_query:data', async ctx => {
  ctx.session.captcha.month = ctx.callbackQuery.data
  await ctx.reply('Year?', { reply_markup: numberKeyboard(1900, new Date().getFullYear()) })
})
captcha.wait()
captcha.on('callback_query:data', async ctx => {
  ctx.session.captcha.year = ctx.callbackQuery.data
  await verifyBirthday(ctx, ctx.captcha) // checks birthday against a database, calls `ctx.conversation.leave` if incorrect
})

We now have defined a “function” for checking the birthday. We can “call” the function by passing it as conversational middleware to another conversation.

const mainConversation = new Conversation('rocket-launcher')

mainConversation.command('launch_rocket', captcha, async ctx => {
  await ctx.reply('You are verified, launching rocket 🚀')
})
mainConversation.command('purge', captcha, async ctx => {
  await ctx.reply('You are verified, purging all data')
})

Note how we can use the same captcha in different places. It behaves just like a middleware tree does, and it is closely integrated with grammY's middleware tree.

Recursion

It should be possible to use the middleware recursively. This implies some sort of call stack, but it isn't quite clear yet if we actually need to store it or not.

In order to use recursion, we could for example restart the captcha with another attempt if the verification fails. In the last step, we could do:

captcha.on('callback_query:data', async (ctx, next) => {
  ctx.session.captcha.year = ctx.callbackQuery.data
  await next()
})
// `verify` checks the birthday in `ctx.session.captcha` against a database, and returns `true` if it is correct
captcha.filter(verifyBirthday, ctx => ctx.conversation.resume()
// else: restart captcha
captcha.use(async (ctx, next) => {
  await ctx.reply('Wrong birthday, try again!')
  await next()
})
captcha.use(captcha)

This will re-enter the captcha until it is correct.

In reality, you probably also want to allow the user to exit the captcha manually, this was omitted here for brevity.

@IlyaSemenov
Copy link
Contributor

  1. How does one actually enter (start) a conversation? Please come up with sample top-level bot instance integration.

The only way I see is that one always uses a single top-level conversation with bot.use(topLevelConversation) which in turns handles /commands. Is that what you have in mind?

  1. How do you see this:
mainConversation.command('launch_rocket', captcha, async ctx => {
  await ctx.reply('You are verified, launching rocket 🚀')
})

surviving server restart? What if a user takes a 10 minutes break before selecting a month in the captcha sub-conversation, and CI/CD restarts the server during that time?

The command handler is by its nature momentary, it either completes or fails... unless I'm missing something.

  1. This proposed API is not exactly obvious:
// First statement
conversation.on('message', ctx => { /* ... */ })
conversation.wait() // <--- what do we "wait" for here? a message (above), or text/photo (below)?
// Second statement
conversation.on('message:text', ctx => { /* ... */ })
conversation.on('message:photo', ctx => { /* ... */ })
conversation.on('message', ctx => { /* ... */ })
conversation.wait()
// Third statement
conversation.on('message', ctx => { /* ... */ })
// <--- why don't we have wait here?

I'd rather suggest something like:

conversation.step(step => {
  step.on('message', ctx => { /* ... */ })
})
conversation.step(step => {
  step.on('message:text', ctx => { /* ... */ })
  step.on('message:photo', ctx => { /* ... */ })
  step.on('message', ctx => { /* ... */ })
})
conversation.step(step => {
  step.on('message', ctx => { /* ... */ })
})

This is immediately obvious what are the conversation steps and how the handlers are grouped. There is also no discrepancy in the number of wait calls. (Essentially, that is what I did in grammy-scenes).

@KnorpelSenf
Copy link
Member Author

KnorpelSenf commented Jan 1, 2022

Thanks for the feedback! :)

Those are some good questions, here are the explanations:

1

We want to be able to start conversations very flexibly from anywhere in any handler. This means that we cannot simply pass a conversation as middleware to start it. As an example, this does not work:

const conversation = new Conversation('id')
// ...

// This is NOT a good idea:
bot.filter(ctx => ctx.chat?.type === 'private').command('start', conversation)

The conversation must be able to receive all future updates in order to continue where it left off. In the above example, it would only be able to handle /start commands in private chat. We need to register it with bot.use. Hence, from the top-level, it will be used just like a menu from the menu plugin:

const conversation = new Conversation('id')
// ...

// Install it:
const bot = new Bot('token')
bot.use(session({ initial() { return {} } }))
bot.use(conversation)

// Enter a conversation from anywhere:
bot.filter(ctx => ctx.chat?.type === 'private').command('start', ctx => ctx.conversation.enter('id'))

Once you entered a conversation, the flow will follow the natural flow of the middleware, and sub-conversations are entered by reaching them in the middleware tree execution order.

2

The conversations plugin requires users to install the sessions plugin. As a result, we have access to persistent storage. We can use this to store the current state of the conversation across server restarts.

The plugin will be able to handle arbitrary restarts (potentially after every single incoming update). As long as you do not experience any disk corruption or other data loss, the user will not notice any intermediate downtime between the messages.

3

Maybe my formatting was a little off in the example above. Here are some definitions, I hope they help to shape a better intuition.

Statement. Also called step. A part of the middleware tree between two wait calls. Basically just any number of handlers that describe how to behave at the respective point in the conversation when different types of messages are incoming. The wait calls themselves are not considered part of a step.

wait call. A separator between two steps in the conversation. You can think of them as checkpoints from where to resume the conversation. They are boundaries that divide the middleware tree into adjacent sections. When an update arrives, we

  1. jump back to the last position where we waited,
  2. run the following section of the middleware up to the next wait statement, and
  3. store the position of the conversation in the session in case it was changed by ctx.conversation.resume calls or jumps or other things.

That's why we also don't need a closing wait call. When the middleware tree is exhausted, and no subsequent parts exist, then that's it. Appending another wait call would mean that once we're done with the last section, we'd wait for more messages, and eventually enter an empty part of the tree. The bot would then just stop reacting to anything because there are no handlers after the last wait call.

What are we waiting for? That cannot be answered in a straightforward manner. We obviously don't literally wait, because JS does not allow us to block the execution or something. It's not a sleep statement. The wait calls are simply properties of the middleware tree.

A better way to phrase this question would be to ask what happens when an update reaches the wait call while flowing through the middleware. In contrast, the answer to this new question is easy: the update is discarded. When an update reaches a wait call, it will stop propagating. It will behave like all middleware definitions ended there, and no further handlers were installed. Assuming that the handlers reached above the wait call will advance the state of the conversation, we are basically waiting for a new update to arrive before processing the middleware that is further down below the wait call. So technically, if asked what we are waiting for, we could answer a new update.

@KnorpelSenf
Copy link
Member Author

Regarding your suggestion, that's actually the old scenes approach we know, something that exists in many shapes and forms. You are basically defining a state automaton. Branching and reusability are hard. State automata inevitably lead to spaghetti code. How to conditionally skip a step? How to repeat several steps? How would something like using the captcha look? The answers (if possible at all) include wild jumping from one place in the code to the next, almost always using a number as index, or some kind of string identifiers. It's what I labelled conversational GOTO code in the original post of this issue.

For example, in your plugin grammy-scenes, we need to use move calls. They are basically the JUMP statements from assembly, applied to conversations. If I read move('foo'), I have no idea where to keep on reading. There is no connection from one step to the next. I must read all steps in my entire code, and find the one with the matching name. That sort of interruption in the reading flow does not contribute to readability. (Don't get me wrong, your implementation of scenes is excellent. My point is rather that the scenes abstraction is the wrong thing to do. Defining a flat list of instructions, and jumping back and forth between them puts us back to the stone age. That's why there has not been any official plugin for this yet.)

I understand that many people are looking for exactly scenes, because that's all we've known in the past years. It is so to say the best abstraction we have had until now, and other libs like Telegraf have done it already. However, I am fairly confident that once we give people the power of defining readable conversations that they can structure the same way they structure their code, it will become more obvious why I consider scenes harmful.

@IlyaSemenov
Copy link
Contributor

The conversations plugin requires users to install the sessions plugin. As a result, we have access to persistent storage. We can use this to store the current state of the conversation across server restarts.

That's not really what I was asking. I was referring to the particular piece of code:

mainConversation.command('launch_rocket', captcha, async ctx => {
  await ctx.reply('You are verified, launching rocket 🚀')
})

Let me alter it a little bit:

mainConversation.command('launch_rocket', captcha1, captcha2, async (ctx) => {
  await ctx.reply('You are verified, launching rocket 🚀')
})

In order to persist the state in the workflow one must uniquely identify the workflow position. In the specific code above, one must store not only the position inside a (internally named) captcha conversation, they must also somehow record that the conversation is a part of the outer workflow (unnamed list of launch_rocket command middlewares).

It slipped me at first that it's not bot.command but mainConversation.command which makes it doable indeed, although not exactly trivial. You will probably need to auto-generate these "command" conversation IDs such as based on command name + position of inner middleware in the list?

By the way, what if a user setups two command handlers in a conversation? I suppose you'll need to "stack" them internally?

mainConversation.command('launch_rocket', captcha1, captcha2, async (ctx, next) => {
  if (random() < 0.5) {
    await ctx.reply('You are lucky, proceeding...')
    await next()
  } else {
    await ctx.reply('Bad luck today!')
  }
})
mainConversation.command('launch_rocket', captcha3, async ctx => {
  await ctx.reply('You are verified, launching rocket 🚀')
})

What are we waiting for? That cannot be answered in a straightforward manner.

So that means the proposed library API leads to unreadable/indigestible code, which is not good.

I wasn't referring to myself when I was asking that question. (I personally understand the whole picture with the middleware trees. I published a few open source libraries of my own, including now 3 grammy plugins, and contributed to much more in the past years.) I was trying to illustrate how does this code look in a stranger's eyes. Everyone's going to ask "what do we wait for here?" and there must be a clear answer.

But I think we actually can answer that question: the conversation waits for the "proceed" instruction coming from the previously registered handlers.

@IlyaSemenov
Copy link
Contributor

How to conditionally skip a step?

Using the normal if statement:

main_scene.scene("step1", (scene) => {
	scene.on("message:text", async (ctx) => {
		if (ctx.message.text === "secret") {
			await ctx.reply(`Greetings, mylord`)
			ctx.scenes.move("step3")
		} else {
			await ctx.reply(`Hello`)
			ctx.scenes.move("step2")
		}
	})
})

How to repeat several steps?

You just don't move off the current step:

const main_scene = new Scene()

main_scene.scene<
	MainBotContext &
		SessionFlavor<{
			main_step1?: {
				names: string[]
			}
		}>
>("step1", (scene) => {
	scene.enter(async (ctx) => {
		await ctx.reply(`Send me 3 names.`)
	})
	scene.on("message:text", async (ctx) => {
		if (!ctx.session.main_step1) {
			ctx.session.main_step1 = { names: [] }
		}
		ctx.session.main_step1.names.push(ctx.message.text)
		if (ctx.session.main_step1.names.length >= 3) {
			await ctx.scenes.move("step2")
		}
	})
})

How would something like using the captcha look?

const captcha_scene = new Scene<
	Context &
		SessionFlavor<{
			captcha?: {
				answer: string
				next_scene: string
			}
		}>
>()
captcha_scene.enter(async (ctx, next_scene) => {
	const { answer, image } = await generateCaptcha()
	ctx.session.captcha = { answer, next_scene }
	await ctx.replyWithPhoto(image)
	await ctx.reply(`Enter letter you see on image above`)
})
captcha_scene.on("message:text", async (ctx) => {
	if (ctx.message.text === ctx.session.captcha?.answer) {
		await ctx.scenes.move(ctx.session.captcha.next_scene)
	} else {
		await ctx.reply(`Please try again.`)
	}
})

const rocket_scene = new Scene()
rocket_scene.enter((ctx) => ctx.scenes.inner("captcha", "launch"))
rocket_scene.scene("captcha", captcha_scene)
rocket_scene.scene("launch", (scene) => {
	scene.enter(async (ctx) => {
		await ctx.reply(`Launching rocket!`)
	})
})

This indeed requires passing the ID of the next (sub)step, but is that really a big deal? The code is still concise and reusable.


If I read move('foo'), I have no idea where to keep on reading.

Most of the time you read the next (sub)scene block. Since that is a 95% use case, this could indeed be improved, with ctx.scene.next() or something along the lines. Respectfully, I agree that naming a scene (a conversation step) should not be necessary. Names should be optional for jumps and such.

@grammyjs grammyjs deleted a comment from allcontributors bot Jan 2, 2022
@KnorpelSenf
Copy link
Member Author

KnorpelSenf commented Jan 2, 2022

But I think we actually can answer that question: the conversation waits for the "proceed" instruction coming from the previously registered handlers.

That's a smart way to think about it, I had not even considered that yet. Regarding the proceed instructions as events that “uncork” the wait calls is a cool analogy, it'll make it prominently into the docs once they're written. Good perspective!

@all-contributors add @IlyaSemenov for idea

You will probably need to auto-generate these "command" conversation IDs such as based on command name + position of inner middleware in the list?

Yes, that's correct. (Sorry, for misunderstanding the question at first, I didn't really get you were asking about the internal implementation.)

As we are giving the users the full capabilities of imperative code, we will need to store a stack of function calls each with its respective “return address” as well as a program counter per stack frame. This is more complex to implement than having a map object from string identifiers to steps (like scenes do), but I am happy to deal with complexity if that allows for readable code in user land.

The single remaining challenge I see with the implementation is how to handle conversation.lazy calls. Perhaps this will just fall into place, perhaps not. It only works if we let people generate middleware, not new conversations, that's for sure. We need to figure out what happens when people do strange stuff like conversation.lazy(() => Math.random() < 0.5 ? conversation : []). Either way, this is a single case that needs special attention. Only looking at the API surface, everything else is straightforward in what it does, so the implementation should be pretty clear in behaviour, and hence hopefully easy to implement, too.

but is that really a big deal?

In my opinion, yes! It's a huge deal. Let's see how the three things would look with grammY conversations:

const conv = new Conversation('illustration')

Using the normal if statement:

// We can use block statements just like in if-statements:
const isLord = bot.hears('secret'); {
        isLord.use(async ctx => {
                await ctx.reply(`Greetings, mylord`)
                ctx.conversation.resume()
        })
}
conv.use(async ctx => {
        await ctx.reply(`Hello`)
        ctx.conversation.resume()
})
conv.wait()
// ...

How to repeat several steps?

You just don't move off the current step:

conv.on('message:text', ctx => {
        ctx.session.main_step1 ??= { names: [] }
        ctx.session.main_step1.names.push(ctx.message.text)
  		if (ctx.session.main_step1.names.length >= 3) {
                ctx.conversation.resume()
 		}
})

How would something like using the captcha look?

Illustrated above.


If I may elaborate on the analogy between assembly and scenes: The concept of next_scene looks a lot like manually storing the return address to jump back to. Isn't it cool if we can make grammY abstract that away from us, just like C functions do?

Note that the above code examples still have a lot of potential for being made more readable. So far, this is kind of the “raw” version that directly operates on the conversations abstraction. It helps if we introduce a little bit of syntactic sugar.

Let's add bot.do. It is the same as telegraf.tap. It takes middleware, executes it, and makes sure that next is always called after it's done, no matter what the installed middleware does or does not do with next.

We cannot do if-else statements in the middleware trees, and hence cannot do them in conversations either (except for branch which requires this weird notion of receiving exactly two middlewares). We could for example solve this by adding a new composer method else which gets access to the false case of the most recent filter call, but we will have to confirm that this is actually a good idea. What do you think? It would allow for this:

const conv = new Conversation('illustration', { autoResume: true })

// Using the normal `if` statement:
const text = conv.on('message:text'); {
        const isLord = text.hears('secret'); {
                isLord.do(ctx => ctx.reply(`Greetings, mylord`))
        }
        const notLord = text.else(); {
                notLord.do(ctx => ctx.reply(`Hello`))
                notLord.wait()
                // do more step2 things here using `notLord`
        }
        text.wait()
        // do step3 things here
}

conv.wait()
// ...

No more jumps! This is much, much closer to the actual control flow that you meant by the example you wrote. It makes perfectly clear where the code waits, and where it does not. Compare it to the pseudo-version of the example:

IF text message:
    IF text is "secret":
        reply "Greetings, mylord"
    ELSE
        reply "Hello"
        wait for message
        // do step2 stuff
    wait for message
    // do step3 stuff

This is almost surprisingly identical with how the conversations code looks.


Admittedly, there a still a few details to figure out around grammY conversations. I think it makes sense to iterate further on this idea once we have a working implementation.

For example, we may consider to set autoResume to true by default, or to differentiate between the update falling through to the wait call, versus being handled by other middleware before. I am also not quite sure how mutual recursion will work, whether we need to implement some tail call optimisation internally or what not.

However, those are merely coding challenges. They have been solved before, and we can solve them again. There will be beta releases, we will have to collect feedback, odd edge cases are going to be found, and we will have to reiterate. Either way, the really hard stuff is how to design the API surface, and how conversations can be used, i.e. how the bot code will look. This seems to be pretty clear now. Thank you again for the questions :)

@allcontributors
Copy link
Contributor

@KnorpelSenf

I could not determine your intention.

Basic usage: @all-contributors please add @Someone for code, doc and infra

For other usages see the documentation

@KnorpelSenf
Copy link
Member Author

@all-contributors add @IlyaSemenov for ideas

@allcontributors
Copy link
Contributor

@KnorpelSenf

I've put up a pull request to add @IlyaSemenov! 🎉

@HeeroML
Copy link
Member

HeeroML commented Jan 3, 2022

Looking at both codes of you two, the one @IlyaSemenov postet looks really out of place. It's for me as a noob, much harder to understand, because it's departure from normal code written in grammY. I in the mean time fully understand @KnorpelSenf flow. Though the Else and hear are fishy. They are not replicates of bot.hears. and else isn't a thing previously. That would at least need a word in documentation. I also don't really like, that the Else exists at all.
I don't know if it's a good idea, but I'd rather have something :

if(text.noMatch){ //do something}

noMatch includes the success return of previous call text.hears. i suspect .else would be the same, just that it calls the following code. I hope this makes sense. Feel free to explain .else() 😁

@KnorpelSenf
Copy link
Member Author

KnorpelSenf commented Jan 3, 2022

Else works this way:

// Triggers on updates from private chats
bot.filter(ctx => ctx.chat?.type === "private", ctx => { /* ... */ })
// Triggers on updates not from private chats
bot.else(ctx => { /* ... */ })

// Triggers on forwards
bot.on(":forward_date", ctx => { /* ... */ }))
// Triggers on everything that's not a forward
bot.else(ctx => { /* ... */ })
// Same as else
bot.drop(matchFilter(":forward_date"), ctx => { /* ... */ })

// Triggers on "hi"
bot.hears("hi", ctx => { /* ... */ }))
// Triggers on all updates that don't match the hears trigger
bot.else(ctx => { /* ... */ })

@IlyaSemenov
Copy link
Contributor

We could for example solve this by adding a new composer method else which gets access to the false case of the most recent filter call, but we will have to confirm that this is actually a good idea. What do you think?

I have controversial feelings about this one. On the one hand, this introduces more "state" to the API, and typically the more stateless an app is, the more reliable and scalable it is.

On the other hand, in this particular case everything is already heavily coupled, so it's not making it really worse in that perspective, yet it indeed brings certain useful "sugar" to the user land code. So I guess considering this proposed pseudo wait API else shortcut will fit just fine.

If it were me, I'd also try to come up with something like (rephrasing your functional example):

const [isLord, notLord] = text.branch(text.hears('secret')) // or text.branch(text => text.hears('secret'))
isLord.do(...)
notLord.do(...)

@KnorpelSenf
Copy link
Member Author

bot.else would be completely stateless, just like bot.filter or bot.on do not need state. But based on the feedback from you two, it looks like its behaviour is much less intuitive than I expected. I don't think we should add it. We may reconsider it if someone can come up with a better suggestion.

Returning two composers from bot.branch is indeed an interesting idea, even though it'd be a breaking change. We could either change this behaviour for the next major release, or introduce a second method (maybe bot.ifElse?) which does exactly that.

I don't think we should confuse the composer instances returned by bot.hears with a predicate function that returns a boolean. Those are different things.

@IlyaSemenov
Copy link
Contributor

Consider this:

bot.filter(ctx => random() < 0.1, (ctx, next) => next())
bot.else(ctx => ctx.reply(`Bad luck!`))
bot.use(ctx => ctx.reply(`Lucky!`))

or even:

bot.filter(ctx => random() < 0.1, (ctx, next) => next())
bot.use(async (ctx, next) => { await ctx.reply(`Wait for it...`); await next(); } )
// can we run `else` not directly after `filter`? or do we disallow this during config time?
bot.else(ctx => ctx.reply(`Bad luck!`))
bot.use(ctx => ctx.reply(`Lucky!`))

Somehow else is supposed to be aware of the previous filter predicate result (and/or make sure that it was actually a filter type middleware). You can't simply reuse the previous filter during config time with drop/matchFilter because the filter is not necessary stateless as in my example.

It's not a big deal though (these implementation details are not visible in user land anyway), so I'd rather for it, it could be handy indeed for defining simple 'flows'.

@KnorpelSenf
Copy link
Member Author

I don't think I made clear enough what I meant. Let me give you a working solution

@KnorpelSenf
Copy link
Member Author

I tested both code snippets and they work flawlessly with the linked PR. First snippet:

bot.filter(ctx => random() < 0.1, (ctx, next) => next())
bot.else(ctx => ctx.reply(`Bad luck!`))
bot.use(ctx => ctx.reply(`Lucky!`))

image

Second snippet:

bot.filter(ctx => random() < 0.1, (ctx, next) => next())
bot.use(async (ctx, next) => { await ctx.reply(`Wait for it...`); await next(); } )
// can we run `else` not directly after `filter`? or do we disallow this during config time?
bot.else(ctx => ctx.reply(`Bad luck!`))
bot.use(ctx => ctx.reply(`Lucky!`))

image

Also, regarding your question

// can we run `else` not directly after `filter`? or do we disallow this during config time?

My PR does not interdict this, so the above code snippet actually works. But if we deem it counterintuitive that one can put statements between the if and the else part, and we need to protect people from having such code, then we can maybe add a line that will prevent people from doing that.

@KnorpelSenf
Copy link
Member Author

KnorpelSenf commented Jan 3, 2022

For completeness: there were four other lines in my file.

import { Bot } from "https://raw.githubusercontent.com/grammyjs/grammY/fe8f4ff16d78790c25d6cd62e6a929f93a43a781/src/mod.ts";
const random = () => Math.random()
const bot = new Bot("my-bot-token-was-here");

// each snippet came here

bot.start();

@IlyaSemenov
Copy link
Contributor

I see, this is quite elegant indeed. this.lastElse is still an additional stateful part in the composer but who cares anyway.

@KnorpelSenf
Copy link
Member Author

Well fair enough, we do have to add additional state to the composer (at least we add no extra state to the session).

I am still hesitant towards merging #152 because

bot.filter().else()

does not work, only

bot.filter()
bot.else()

does. It makes a lot of sense to us but it could be really confusing to newcomers.

In contrast, it also does not work to do bot.command("start").command("help") and we expect people to understand that, so perhaps merging would be fine? Idk … I'm undecided.

Either way, next up for me is to start implementing the actual conversations plugin. I have a few more ideas about some tweaks that should make the internals quite elegant, let's see if those work out the way I hope. We can wait for other people to provide more feedback on the else thingy, also if it needs to be renamed to something.

@IlyaSemenov
Copy link
Contributor

else thing apart, what I believe is extremely important for the conversations is somehow being able to push them forward with an external event. A typical use case will be a conversation that directs a user to an external OAuth page and when the auth is complete it somehow "proceeds" the conversation on a IPC signal from the http callback (OAuth success URL) handler.

Obviously, this should not rely on in-memory state.

I kind of added a duct tape solution in my (admittedly) duct tape scenes library, but that's using an ugly monkey patch generating a pseudo Telegram update... definitely not a proper library grade solution.

@KnorpelSenf
Copy link
Member Author

You are right, this is a common use case. I will need it myself.

Conversations are going to be storing their state in the session. Session data is loaded depending on the context object. The context object sources its information from a Telegram update. Hence, we can only operate the conversations when have an update.

We could break this chain at any point:

  1. fake a Telegram update without incoming webhook request/long polling response
  2. fake a context object without Telegram update
  3. load the session data manually without context object
  4. fake the session data without loading it

None of the options are too pretty.

If I understood you correctly, you went with option 1. That should work here, too, but I agree that it's not the best thing to do. Option 2 is really hard because the context object is tightly integrated with the update. Option 4 is even harder because one would have to know and replicate the internal data representation of the conversations. This leaves us with option 3. However, it requires that the user manually loads the session data, and somehow passes it to the conversations plugin (without middleware, because middleware can only be run if we have a context object, which we don't). Again, that's not pretty.

I therefore suggest to avoid a direct integration of conversations with external events.

Instead, we could add another composer method while. As a motivation, let's look at possible pseudo-code for the scenario.

WHILE no auth token in database
    reply "please authorize yourself: <link>"
    wait for message
reply "successfully authorized!"
...

The authorisation flow would then need to conclude with a deep linking redirect. This will open the bot, and send a /start message, which will cause the conversation to proceed.

In summary, the usage would be like this:

  1. User tries to use a feature that requires authorisation
  2. Bot sends auth link
  3. User clicks auth link, authorises the bot, and gets redirected back to Telegram
  4. User clicks START
  5. Bot continues conversation, now having access to the auth token

Analogously, we could add a do-while loop. This would make it simple to avoid the message saying “successfully authorized!” when the authorisation has been completed before.

Alternatively, the bot could proactively send a message to the user when the authorisation is complete. This would happen outside of the conversation.

Implementation-wise, these proposed control structures are trivial to add. I have a working solution locally, it is more elegant than the else method, and it integrates seamlessly with the draft implementation which I have of the conversations plugin.

@IlyaSemenov
Copy link
Contributor

That is correct, I went with option 1 — but that is because I didn't see a direct way for option 2 with monkey patches only.

However, I think grammy itself could implement option 2 natively, but not "faking" a context, rather using a separate context class for handlers of these externally generated updates. This could be well typed and not really be a hack or fake.

What I suggest is something like (just a sketch):

export declare class Composer<C extends Context> implements MiddlewareObj<C> {
  // ...
  onExternal(...middleware: Array<Middleware<ExternalUpdateContext<C>>>): void; // void for proof of concept
}

/*
Just a sketch to show the intention.

Probably it'll be needed to extract a BaseContext interface and extend it here and in Context,
and use it Middleware<> generics.
*/
interface ExternalUpdateContext<C extends Context> extends Omit<C, "update"> {
	payload?: any
}

and then:

conv.onExternal(async ctx => {
  // here we are well aware that ctx is not a Telegram-update context,
  // which, however, properly implements `ctx.chat` and `ctx.reply`
  await ctx.reply(`External event: ${ctx.payload}`)
})

httpServer.post("/callback", async ctx => {
  await bot.handleExternalEvent(ctx.query.chat_id, ctx.request.data)
})

This will actually allow streamlined user experience without having them to poll their status, or click /start deeplink.

@IlyaSemenov
Copy link
Contributor

IlyaSemenov commented Jan 6, 2022

Ah no, that won't work well. If it's going to bypass the other middlewares (non-onExternal), we are incorrectly advertising that this context is derived from C.

If we do pass it to other middlewares (as we want session etc. to work - that's what I do), they could be expecting the Update object which won't be there.

So yes it goes to faking an Update. However, there are still reasonable options:

  • ctx.update could be marked optional
  • ctx.update could be a TelegramUpdate | ExternalUpdate

After all, most of the time ctx.update is not needed in user land. They need shortcuts such as ctx.chat, ctx.from and ctx.reply which still can be implemented.

I understand that you're probably not going to approve either of these for generic use. It's a tough balance between feature richness and simplicity. Ideally, the types should be extensible enough to make this possible with an external plugin.

@KnorpelSenf
Copy link
Member Author

Messing with the update is not going to end well, at least not on the type level. The library is built with the reasonable assumption that Telegram updates are following the specs. Every single field that we might remove or make optional will inevitably remove some guarantees for some user. Note that adding a union member to the context type in middleware is a breaking change for every handler ever written with grammY. Also, you couldn't extend the context class anymore because you cannot extend union types.

This also doesn't make a lot of sense semantically. We write middleware to handle updates. Faking requests to call these handlers from other places is a big hack.

If we want to send a message to a chat upon other events, why don't we just do it? bot.api exists for a reason. It will look identical to the user, and cleanly separates concerns on our end.

You can use transformative context flavours to mess around with the context object as much as TypeScript allows. I don't think it's smart, but you're right that it's flexible enough to support this.

@IlyaSemenov
Copy link
Contributor

If we want to send a message to a chat upon other events, why don't we just do it?

We surely can, but we also want to move the conversation state forward in a natural way. Forcing users to click START once again, or having them click some inline button to proceed is a big hack as well (this time from the UX standpoint).

Thinking more, "moving the conversation" essentially means jumping over the current wait() statement (in the above terms). Can we somehow allow just that maybe? That wouldn't need a fake Update...just updating the session.


Note that in this case, it should be possible to distinguish whether the conversation is still "the same". For instance, imagine there are two commands /foo and /bar both starting two unrelated conversations, both doing some off-Telegram workflow (oauth login for simplicity, but on two different websites). Let's a user was in the middle of /foo conversation, opened the foo login page, then went back to Telegram and typed /bar, followed to the bar login page, and then suddenly completed the previously opened foo login. The bar conversation should not move forward in that case.

I've implemented ctx.scenes.getContinueToken() and ctx.scenes.continue(token) for that scenario, but that needs ctx (thus, faked Telegram update) for proceeding. Merging these two approaches, I think the token could simply include the current session ID, so the usage will be as simple as ctx.conversation.getWaitToken() and then bot.conversations.skipWait(token) or something along the lines (thus, no need to prepare the fake context with fake update).

@KnorpelSenf
Copy link
Member Author

Forcing users to click START once again, or having them click some inline button to proceed is a big hack as well (this time from the UX standpoint).

Fair enough, that's true …

Maybe one of the other solutions is better then.

Thinking more, "moving the conversation" essentially means jumping over the current wait() statement (in the above terms). Can we somehow allow just that maybe? That wouldn't need a fake Update...just updating the session.

I think that's already possible with what I suggested. Let's pick this point up again after I published some code.

Regarding the point about nested conversations, in my current code draft, I disallowed having several conversations in parallel. It's leads to non-obvious behaviour in user land code. Also, I don't think this would be a good user experience.

The described function calls are already possible, and they allow nesting.

@IlyaSemenov
Copy link
Contributor

Regarding the point about nested conversations

I wasn't describing nested conversations. In the scenario above, user started a first conversation with /foo, then typed /bar which fully disrupted the first conversation and started a new one from scratch.

The first conversation started a non-bot job that finished after the new conversation has been started. The proposed global resume/skipWait/whatever API should allow the first conversation to be resumed externally, but only if it hasn't been replaced by a new conversation (or resumed manually with some other .on() handler).

Does that make sense now?

@KnorpelSenf
Copy link
Member Author

I think so, yes. Thanks for the elaborations. I'll drop a message here as soon as the plugin code is ready. I think having something concrete to talk about will make it easier to properly address this use case. Then we can look into how exactly this should look.

@KnorpelSenf
Copy link
Member Author

KnorpelSenf commented Jan 7, 2022

@IlyaSemenov you can checkout the branch conversation-experiments and see a first version of how the implementation will look. This code runs without problems:

type MyContext = Context & ConversationFlavor;
const c = new Conversation<MyContext>("id");

c.do((ctx) => ctx.reply("Hi there, send me a message"));
c.wait();
c.do((ctx) => ctx.reply("Cool"));
c.do((ctx) => ctx.reply("I want another message"));
c.wait();
c.do((ctx) => ctx.reply("Awesome, now I'm done listening to you!"));

const bot = new Bot<MyContext>("");
bot.use(session({ initial: () => ({}) }));
bot.use(c);

bot.command(["start", "help"], (ctx) => ctx.reply("Send /enter"));
bot.command("enter", (ctx) => ctx.conversation.enter("id"));

bot.start();

This will enter the conversation, keeping track of the middleware execution order. When the wait statement is reached, the conversation will stop running, and the call stack is stored in the session. Upon the next message, the stack is read and replayed, hence continuing the conversation from where it left off.

I must warn you! This code is very much not working. This simple example runs, but the actual cool thing of nested statements etc still has at least one bug so it does not even work. That means no branching, no loops, etc. Also, everything is still full of debug statements and the code needs to be refactored more than twice to become more readable, not to mention better commented.

Now that I have expressed how uncomfortable I am with how early I upload this, it may still give you an idea how this will be implemented and what to expect. Please share whatever feedback you have, I know it's almost not working.

@IlyaSemenov
Copy link
Contributor

Please don't apologise - I understand what early prototypes are :)

So with the example above, the conversation intercepts everything:

> /enter
Hi there, send me a message
> /enter
Cool
I want another message

Normally, I would expect /command to be a global escape hatch for whatever conversation is currently active (unless the conversation step itself registers a local command handler).

Moving bot.use(c); below the command handlers makes ctx.conversation.enter crash with Cannot read properties of undefined (reading 'enter').

So I suppose you will want to separate the conversations control middleware and the current conversation.

@IlyaSemenov
Copy link
Contributor

Also, I'm yet to see how the actual handlers will work along with wait's, user input validation and nested conversations.

Right now, if I insert on("message:text") this behaves oddly (from the user perspective - I understand why it works like this):

c.do((ctx) => ctx.reply("Hi there, send me a message"))
c.wait()
c.do((ctx) => ctx.reply("Cool"))
c.on("message:text", (ctx) => ctx.reply(`You sent: ${ctx.message.text}`))
c.do((ctx) => ctx.reply("I want another message"))
c.wait()
c.do((ctx) => ctx.reply("Awesome, now I'm done listening to you!"))

result:

> /enter
Hi there, send me a message
> hi
Cool
You sent: hi
> hmm
I want another message
> more
Awesome, now I'm done listening to you!

@KnorpelSenf
Copy link
Member Author

Normally, I would expect /command to be a global escape hatch for whatever conversation is currently active (unless the conversation step itself registers a local command handler).

Moving bot.use(c); below the command handlers makes ctx.conversation.enter crash with Cannot read properties of undefined (reading 'enter').

So I suppose you will want to separate the conversations control middleware and the current conversation.

I see that point. You are right, this needs some redesign … I'll have to think about that, I'd like to avoid that people must register multiple things, but I'm not sure it's possible.

Right now, if I insert on("message:text") this behaves oddly

That's true, we need to encourage people to always use do:

c.on("message:text").do((ctx) => ctx.reply(`You sent: ${ctx.message.text}`))

I have come to accept this, even though I wasn't too happy about it at the start. Any other solutions I have investigated must somehow set more symbol properties on the context object to track if wait was called, and this adds a fair bit of magic to the wait call.

Right now, it's at least consistent: if you don't call next, the conversation will wait.

That being said, I'm still open for suggestions.

@KnorpelSenf
Copy link
Member Author

KnorpelSenf commented Jan 14, 2022

Another thing I'm pretty unhappy with is the narrow context types after filtering.

const texts = conversation.on("message:text")
{
    texts.do(ctx => ctx.reply("Echo: " + ctx.msg.text))
    texts.wait()
    texts.do(ctx => ctx.reply("Echo: " + ctx.msg.text)) // compiles, but may crash
}

Note that this kind of filtering does not actually work yet.

Currently, the way to mitigate this is to continue on the composer returned by wait, which has the correct types.

const texts = conversation.on("message:text")
{
    texts.do(ctx => ctx.reply("Echo: " + ctx.msg.text))
    const waitedAfterText = texts.wait()
    waitedAfterText.do(ctx => ctx.reply("Echo: " + ctx.msg.text)) // compiles, but may crash
}

This obviously is not ideal, because it still permits crashing code to compile. It makes sense that this is the case, because when resuming the execution, we would have to skip the checks installed by filter in order to jump back to the previous wait call. (This is btw the main part that is not implemented correctly yet.)

I still love the possibilities of being able to express conversational flows as code, and I am convinced that this is a very promising solution. I hope you agree that it's certainly is a better approach than scenes. However, the longer we work on this, the harder are the problems that need to be solved, and that's not a good sign.

I'll first try to solve some of the above problems with the current approach, but at the same time I'd like to explore alternative solutions to this implementation. The three main ideas I have are still very abstract, but they start like this:

  1. So far, the suggestion is to use imperative code to express conversations. If we only think about the typing problem, then a much more natural approach would be to support functional coding patterns. I have no clue how that would look yet.
  2. The suggested implementation could be considered as a way to provide an embedded DSL for defining conversations in JS. We could make this language external. In other words, we could allow people to define conversations either in a new file format with our own syntax, and then parse and interpret these files. That would mean we lose type safety, editor tooling, and basically everything else that goes beyond bare VIM on text files. I do not feel like creating a VSCode extension from scratch for a new programming language. We may want to look in to Xtext, but still that's taking it pretty far. An eDSL is so easy.
  3. We could merge the two above ideas. For example, people could define their conversations in a declarative way as large JS objects. This would allow us to define a schema which should be followed, and it supports at least some reusability. Of all the three ideas, I like this one the least.

@IlyaSemenov
Copy link
Contributor

IlyaSemenov commented Jan 23, 2022

@KnorpelSenf Please take a look at the new version of https://github.com/IlyaSemenov/grammy-scenes (README includes reasonably feature-full examples).

The primary change is that a scene (conversation) is now a middleware like in your examples (without separate step calls like in my earlier implementation).

I've architected things in the way that wait() creates a special type of middleware which breaks the execution, waits for Telegram update, and passes it to the inner middleware, which is supposed to call ctx.scene.resume() if it is pleased with user input and wants to proceed the conversation.

The implementation supports unconditional and conditional nested scenes, including recursion. I also added support for context-local session data (kind of "local variables" for the scenes, as opposed to "global" session data).


UPDATE: So far I've been using grammy-scenes with 2 bots, ~15 scenes, and ~100 "steps", and here are my observations.

@KnorpelSenf
Copy link
Member Author

@IlyaSemenov sorry for the long delay. I did not have much time for this in past few months, and also I started having more and more doubts about this. I have concluded that my idea above is bad. You are right. We have already talked about some of the disadvantages here, and I thought of several more, which I will skip now.

Thanks for your work on scenes, I appreciate it! This is great, it is probably the best implementation of scenes that I've seen so far. I found your observations especially valuable. It's cool to see that composition and nesting based around a concept of waiting and resuming is well-suited. However, I did not change my mind about scenes in general. I still can't read the code in a linear fashion (because every step has a different callback), and I have to know a lot about the plugin to make sense of what all the GOTO jumping does. This still interrupts the reading flow all the time.

This is why I created a new package with a completely different strategy than discussed at any point before. It is very much not usable yet, but there is an example bot in the README which outlines the concept. It can do the wait/resume thing, supports nesting, loops, functions, branching, recursion, error-handling with try-catch-finally, composition across modules, and so on. (We basically get all of this for free, it does not need to be supported explicitly.) It is trivial to implement the system you talked about which lets you resume based on external events (but I wanted to publish before adding that, so it's not supported yet). It has type safe local sessions, even across functions, and the code is more concise than any of the ideas discussed above. Also, it is very easy to understand the code even if you have never seen the package before.

Naturally, there are a few pitfalls. The main thing is that side-effects outside of grammY must be wrapped in function calls. Basically, you can communicate with the user and do everything with the Bot API, but external things which perform side-effects or are non-deterministic must be wrapped in another function call. Some people may know this pattern from React, there is is called useEffect. It is not pretty, so it must be documented well, but it is a known thing and I think people will cope with it.

I published the (unfinished) code at https://github.com/grammyjs/conversations.

Thank you for brainstorming here! I will close this issue now. There will be more things to talk about, but they are going to be specific to conversations, so they can be discussed in new issues in the new repository.

I hope you give this thing a try, I'd be looking forward to your feedback and criticism!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

No branches or pull requests

3 participants