[Question] Automatically 'close' assignments with multiple units after some time if missing agents #483

federicoruggeri · 2021-06-17T15:04:48Z

Let's consider a task where each assignment has multiple units (e.g. dialogue task).
I was wondering if it is possible (and what is the best possible way) to automatically mark an assignment as incomplete (or expired), after some time if not all units have been assigned to an agent.

An example to clarify:
task -> dialogue task
assignments -> [1, 2]
units -> [1, 2, 3, 4]

agent_1 connects and gets assigned to assignment_1, unit_1. After X minutes, no one else has connected to the application, so agent_1 is still the only agent in assignment_1.

I would like to have a timeout that, once expired, marks unit_1 and unit_2 (assignment_1) as expired.

Q1: is it possible?
Q2: what's the best way?

My attempt so far (dialogue task template): I've set a timeout on the front end (React) that marks episode_done (using onMessageSend). Everything seems fine on the front end (the client gets disconnected, the interface gets updated, etc.). However, nothing seems to update the database state of the unit. My guess is that something is not running because another worker still has to connect to the assignment.

Am I missing something?

I hope that the description is sufficiently clear.
Thanks in advance!

JackUrb · 2021-06-17T19:55:17Z

Hi @federicoruggeri, I'm not fully sure under what circumstances this feature would be used, but the reason the backend doesn't update is that there is no world running. We only launch the world for a dialogue task once everyone has connected:

Mephisto/mephisto/operations/supervisor.py

Lines 464 to 484 in c571bf9

    
    # See if the concurrent unit is ready to launch
    assignment = unit.get_assignment()
    agents = assignment.get_agents()
    if None in agents:
        agent.update_status(AgentState.STATUS_WAITING)
        return  # need to wait for all agents to be here to launch
    # Launch the backend for this assignment
    agent_infos = [self.agents[a.db_id] for a in agents if a is not None]
    assign_thread = threading.Thread(
        target=self._launch_and_run_assignment,
        args=(assignment, agent_infos, channel_info.job.task_runner),
        name=f"Assignment-thread-{assignment.db_id}",
    )
    for agent_info in agent_infos:
        agent_info.agent.update_status(AgentState.STATUS_IN_TASK)
        agent_info.assignment_thread = assign_thread
    assign_thread.start()
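
To make the effect of that check concrete, here is a minimal standalone sketch. It is not Mephisto code: `ready_to_launch` and the list-of-slots representation are hypothetical stand-ins for the `None in agents` test in the excerpt. It only shows that an assignment with an empty slot never reaches the launch step, which is why a front-end episode_done has no running world to receive it.

```python
from typing import List, Optional


def ready_to_launch(agents: List[Optional[str]]) -> bool:
    """Return True only when every slot of the assignment holds an agent."""
    return None not in agents


# assignment_1 has two units; only agent_1 has connected so far.
assignment_1 = ["agent_1", None]
print(ready_to_launch(assignment_1))  # False: agent_1 stays in the waiting state

# Once a second agent fills the remaining slot, the world can be launched.
assignment_1[1] = "agent_2"
print(ready_to_launch(assignment_1))  # True
```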

As for an implementation that would allow this functionality, I'm unclear how to do this cleanly.

federicoruggeri · 2021-06-18T08:03:19Z

Many thanks for the very quick reply! It makes sense.
The motivation behind this request is that, as far as I understand, there's no way to 'exit' a unit once you are in it (to free it again).
For instance, suppose agent_1 connects to an assignment with 2 units at time X, and agent_2 connects at time Y (Y >> X). Agent_1 is still counted as part of the assignment, although it is unlikely that agent_1 is still active after so much time. Basically, assignments with more than one unit inherently require synchronization, so I was wondering whether there's a good strategy for handling the case where synchronization is not achieved.

Please, correct me if I'm saying something wrong :D

JackUrb · 2021-06-18T12:44:33Z

Hm, for synchronized (live) tasks, after a certain timeout the first Agent should be issued a disconnect and Unit 1 should be put back into the pool (once the person leaves the page). It's possible this isn't happening, though: over the last few months I've heard from others that the disconnect event may not be registered by Heroku servers.
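
As a rough illustration of the intended behavior (a waiting agent times out and its unit goes back into the pool), here is a small sketch. Everything in it is an assumption made for the example: the `WaitingRoom` class, its method names, and the 600-second value are hypothetical and are not part of Mephisto. Time is passed in explicitly so the logic can be checked without actually waiting.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WaitingRoom:
    """Hypothetical tracker for agents waiting on a concurrent assignment."""

    timeout_seconds: float
    # Maps unit id -> time (in seconds) at which its agent started waiting.
    waiting_since: Dict[str, float] = field(default_factory=dict)
    # Units that have been released and can be handed to a new agent.
    pool: List[str] = field(default_factory=list)

    def start_waiting(self, unit_id: str, now: float) -> None:
        self.waiting_since[unit_id] = now

    def release_expired(self, now: float) -> List[str]:
        """Return timed-out units to the pool and report which ones."""
        expired = [
            unit_id
            for unit_id, started in self.waiting_since.items()
            if now - started >= self.timeout_seconds
        ]
        for unit_id in expired:
            del self.waiting_since[unit_id]
            self.pool.append(unit_id)
        return expired


room = WaitingRoom(timeout_seconds=600)  # arbitrary value for the example
room.start_waiting("unit_1", now=0)
print(room.release_expired(now=300))  # [] (still within the timeout)
print(room.release_expired(now=600))  # ['unit_1'] (returned to the pool)
```

In a real deployment something would have to call the expiry check periodically, and the waiting agent would also need to be told that it was disconnected.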

Likely, we're not triggering this function correctly:

Mephisto/mephisto/abstractions/architects/router/deploy/server.js

Line 244 in c571bf9

function handle_possible_disconnect(agent) {

If the socket isn't disconnecting and sending the close error, we'd need to find a way to catch that here instead (as a ping to a disconnected agent would fail):

Mephisto/mephisto/abstractions/architects/router/deploy/server.js

Line 195 in c571bf9

handle_forward(ping_packet);
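
The idea of catching a failed ping can be sketched as follows. This is written in Python purely for illustration, while the router itself is JavaScript; `ping_agents`, `fake_send`, and the callback names are hypothetical and do not correspond to functions in server.js. The assumption is that sending to a dead socket raises an error, which is then treated the same way as an explicit disconnect event.

```python
from typing import Callable, Dict, List


def ping_agents(
    agent_ids: List[str],
    send_ping: Callable[[str], None],
    on_disconnect: Callable[[str], None],
) -> Dict[str, bool]:
    """Ping each agent; treat a failed send as a possible disconnect."""
    alive = {}
    for agent_id in agent_ids:
        try:
            send_ping(agent_id)
            alive[agent_id] = True
        except OSError:
            # The socket is gone even though no close event was received.
            alive[agent_id] = False
            on_disconnect(agent_id)
    return alive


def fake_send(agent_id: str) -> None:
    # Stand-in transport: agent_1's connection has silently dropped.
    if agent_id == "agent_1":
        raise ConnectionResetError("socket closed")


dropped: List[str] = []
result = ping_agents(["agent_1", "agent_2"], fake_send, dropped.append)
print(result)   # {'agent_1': False, 'agent_2': True}
print(dropped)  # ['agent_1']
```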

Unfortunately, until I get a chance to add a local view for Heroku logs, I don't imagine this will be easy to debug.

federicoruggeri · 2021-07-08T12:54:12Z

Many thanks for the clear explanation! Do you remember where the timeout is defined? I would like to check how many seconds the system waits before returning the unit to the pool.

Thanks in advance.

JackUrb · 2021-07-08T14:01:20Z

I believe the timeout should be around 15 seconds. I added a change in #489 that should cause this to trigger more consistently.

pringshia · 2021-08-19T22:22:10Z

Closing for now, please feel free to reopen the issue if there are any further questions.

pringshia added the "question" (Further information is requested) label Aug 19, 2021

pringshia closed this as completed Aug 19, 2021
