Realtime websocket loses connection regularly when browser tab goes to background #121

Closed
GaryAustin1 opened this issue Dec 8, 2021 · 90 comments
Labels
bug Something isn't working

Comments

@GaryAustin1

GaryAustin1 commented Dec 8, 2021

Bug report

Describe the bug

With a simple subscription to a table (with or without RLS), the websocket drops and reconnects about every 3 minutes once the tab has been in the background for 5 minutes. Verified on Windows 10 with Chrome and Edge. Note that switching to a different tab in the browser puts the other tabs into background mode.

To Reproduce

var supabase = supabase.createClient("https://url.supabase.co", "auth code")
let mySubscription = supabase
  .from('status')
  .on('*', payload => {
    console.log('Change received!', payload)
  })
  .subscribe((status, e) => {
    // status and error are reported here whenever the connection state changes
    console.log('subscribe status,error', status, e);
  })

Start a subscription to a table. Open the dev console and turn on network logging and timestamps. Shrink the browser window or hide the tab under others. Wait more than 10 minutes.
The websocket will have disconnected and reconnected several times.

This is a pretty bad condition for real-time if it is to be relied on at all for more than the current active window time. During each disconnect changes to the table would be missed.

The problem is compounded because Supabase does not really document error and close handlers for realtime subscriptions at the supabase.js level or higher. Only the realtime.js description shows error handlers. It took reading the supabase.js code to see that one could do .subscribe((status,error)=>{console.log(status,error)}) to see connection failures. OTHERWISE THIS IS A SILENT FAILURE AS THE SUBSCRIPTION KEEPS RUNNING EVEN THOUGH DROPPING POTENTIAL UPDATES. Anytime realtime drops the socket, if you want reliable data you have to reload from your tables to get all recent changes.
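
A minimal sketch of the monitoring described above, not something supabase.js provides. `createStatusTracker` and `reloadFromTable` are hypothetical names, and the sketch assumes that `'SUBSCRIBED'` is the healthy status and that any other status means events may have been missed (status names differ between supabase.js versions).

```javascript
// Hypothetical helper: wraps the subscribe callback so that every
// re-subscribe after a drop triggers a reload of the table data.
function createStatusTracker(reloadFromTable) {
  let dropped = false; // true once any non-SUBSCRIBED status has been seen

  return function onStatus(status, error) {
    if (status === 'SUBSCRIBED') {
      if (dropped) {
        // Changes made while the socket was down were never delivered.
        reloadFromTable();
      }
      dropped = false;
      return;
    }
    // Assumption: any other status means realtime events may be missed.
    dropped = true;
    if (error) console.warn('realtime subscription problem:', status, error);
  };
}

// Usage with the subscription from the report (commented out because it
// needs a real client; handleChange and loadStatusRows are placeholders):
// supabase.from('status').on('*', handleChange)
//   .subscribe(createStatusTracker(loadStatusRows));
```

The first `'SUBSCRIBED'` does not trigger a reload because the app is assumed to load its initial data separately.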

Expected behavior

The websocket should remain alive even when the tab/window is hidden and in background mode. Realtime.js must account for the timer throttling that browsers apply to background tabs and keep the connection alive.

Screenshots

Below is about 15 minutes after leaving browser in background...

image
image

System information

Windows 10
supabase.js 1.28.5
Chrome or Edge

Additional context

Out of time for a few days but I'll try and get more info on these additional issues.
EDIT: I no longer believe there is an error with multiple connections for a single user.
Further investigation posted in my next issue points to a more general realtime.js bug on losing the refreshed token on a disconnect/reconnect of the websocket for any reason (the above being one).

@GaryAustin1 GaryAustin1 added the bug Something isn't working label Dec 8, 2021
@GaryAustin1
Author

Just want to add that I understand, and it should be made clear to users of realtime, that you need to monitor the subscribe status and take appropriate action if the subscription fails. The reason I dug into how to watch the subscribe status is that I use it as part of my online/offline logic and reload message counts when coming back online. But my overhead is such that doing this with a background tab every 3 minutes is very costly. I've already started down the path of using the visibility hook to not reload if hidden, but it will still be painful if the user is just moving through tabs on their screen and goes away for 10 minutes.

@GaryAustin1
Author

Looking into this more I see realtime uses websocket-node. I'm hopeful there are some settings that can deal with the background tab throttle giving client more time to ping. I've found no discussions on that particular package but here are others that have had to deal with this:
https://socket.io/blog/engine-io-4-release/#heartbeat-mechanism-reversal
https://stackoverflow.com/questions/66496547/signalr-and-or-timer-issues-since-chrome-88
SignalR/SignalR#4536

@fschoenfeldt

I see that you're using Phoenix. We're having similar issues with the websocket disconnecting after a while – mostly in an inactive browser tab.

@GaryAustin1
Author

Adding this comment here as well: Chrome is going to one-minute timer checks (which would break the 60 sec heartbeat).

> OK, here's the new bit in Chrome 88. Intensive throttling happens to timers that are scheduled when none of the minimal throttling or throttling conditions apply, and all of the following conditions are true:
> - The page has been hidden for more than 5 minutes.
> - The chain count is 5 or greater.
> - The page has been silent for at least 30 seconds.
> - WebRTC is not in use.
>
> In this case, the browser will check timers in this group once per minute. Similar to before, this means timers will batch together in these minute-by-minute checks.

https://developer.chrome.com/blog/timer-throttling-in-chrome-88/
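
The conditions quoted above can be written down as a small predicate. This is only an illustration of the rule as quoted: the field names are made up, the browser does not expose these values to page scripts, the precondition about the other throttling tiers is ignored, and aligning checks to whole minutes is a simplification.

```javascript
// Conditions as listed in the quoted Chrome 88 post. Field names are
// hypothetical; pages cannot actually read these values from the browser.
function isIntensivelyThrottled({ hiddenMs, chainCount, silentMs, webRtcInUse }) {
  const FIVE_MINUTES = 5 * 60 * 1000;
  const THIRTY_SECONDS = 30 * 1000;
  return (
    hiddenMs > FIVE_MINUTES &&
    chainCount >= 5 &&
    silentMs >= THIRTY_SECONDS &&
    !webRtcInUse
  );
}

// Under intensive throttling timers are checked once per minute, so a timer
// due at dueMs runs at the next check at or after dueMs.
// Simplification: checks are assumed to land on whole minutes.
function nextThrottledCheck(dueMs) {
  const ONE_MINUTE = 60 * 1000;
  return Math.ceil(dueMs / ONE_MINUTE) * ONE_MINUTE;
}
```

Under these assumptions a heartbeat timer that was due 90 seconds in would not run until the two-minute check, which is how a short heartbeat interval can still end up with gaps of up to a minute.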

@GaryAustin1
Author

Not directly related to the above issue (which is still there).

@w3b6x9 I'm not sure who at Supabase is the knowledge keeper of the phoenix realtime socket level stuff...

But I think it is important to understand what level of interruption a websocket can have and still have the server maintain an output queue of realtime changes to that socket.
For instance if the socket drops one heartbeat and then the reconnect process starts is the "channel" queue on the server still there and taking in realtime changes that can then be fed back down to the client on reconnect?
If so how long can the websocket not have connection before the server drops the channel?

This is important in determining when to reload a dataset because changes could have been missed.
Thanks

@w3b6x9
Member

w3b6x9 commented Jan 20, 2022

@GaryAustin1 if the server doesn't receive a heartbeat for 60 seconds (this is the default for all Supabase projects running Realtime RLS but can be customized) then it will sever the socket connection. When the socket connection is severed, all of the channel processes (every topic that a client was listening to via the socket conn) die. On reconnect, all of the topics that the client is listening to will join as new channel processes on the server.
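
As a rough illustration of the rule above (not the server's actual implementation), the 60 second window can be checked against a list of heartbeat arrival times. `findSeverTime` is a hypothetical name, and treating the cutoff as "strictly more than 60 seconds" is an assumption.

```javascript
// Returns the time (ms) at which the server would sever the socket, or null
// if every listed heartbeat arrived in time. heartbeatTimesMs must be sorted
// ascending. Only gaps between listed heartbeats are examined.
function findSeverTime(connectedAtMs, heartbeatTimesMs, timeoutMs = 60000) {
  let lastSeen = connectedAtMs;
  for (const t of heartbeatTimesMs) {
    if (t - lastSeen > timeoutMs) {
      // The window expired before this heartbeat arrived.
      return lastSeen + timeoutMs;
    }
    lastSeen = t;
  }
  return null;
}
```

Since the channel processes die when the socket is severed, any change committed between that time and the next successful join is never delivered, so the client has to reload its data after reconnecting.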

@GaryAustin1
Author

GaryAustin1 commented Feb 22, 2022

I've been looking into this a bit. Unfortunately without any sort of short term queue I believe Supabase needs to make clear that when a tab is not in focus, you need to stop realtime, and plan on restarting completely on focus using the visibility event. A tab not in focus can mean just a simple go check stocks on another tab and come back. Based on further research a queue would probably only benefit desktop browsers chrome/edge. Firefox is the only browser that has no issues on desktop. I've not tested on mobile yet.

Firebase does not have these issues with their realtime as they maintain a constant copy of your query and restore behind the scenes when your tab comes back. I believe they might also use webworkers to do this, but not sure.

I was hoping for some way to at least use the 5 minutes Chrome/Edge have before they throttle timers (which kills heartbeat) to keep from reloading if someone just leaves your realtime app/tab for a minute to look at something else, but it appears that only works on desktop. On mobile devices both android with chrome and iPad with safari the results range from lost connection within a minute to 5 minutes depending on powered or battery and only tab or background tab. There appears to be no way to get consistent time before loss of connection across multiple devices and browsers.

Here are some sample traces. Note anytime there is a connect error you could lose data and in some cases the device just stops running the tab and a few old realtime events get logged and then loss of data until device/tab comes back into focus. All of these have a program running generating a constant incrementing update count, so should always be +1 if no data loss.

Edge(Chrome is similar) on windows desktop showing pretty constant visibility event + 5 minutes to failure
edge-windows background tab
Edge (Chrome is similar) on windows desktop showing data loss
edge-windows background

Android/Chrome powered (chrome has a freeze event in addition to visibility but does not seem reliable)
Loss of data from freeze to visible
image

iPad battery front tab going to sleep
image

Then I have traces where "strange" things happen like this android/chrome one... no idea why there are no retries here and code gets to keep running.
image

@GaryAustin1
Author

GaryAustin1 commented Mar 4, 2022

@w3b6x9
Can you take a look at this discussion (in particular the flowchart and then the code in the bottom reply) when you get a chance (no hurry) and comment.
supabase/supabase#5641
I've been trying to come up with a straightforward way to handle realtime across all connection failures in a reasonably performant manner. So far limited testing is good on desktop; I'm starting on mobile devices soon.

Unfortunately the code is somewhat custom for each case of a subscription depending on table size (do you want just last x entries or whole table), the id column for keeping the "table" array updated, inserts at beginning or end and filter. I tried to push that code into an update handler.

@GaryAustin1
Author

EDIT: Add link
supabase/supabase#5641

@xxholyChalicexx

This hasn't been solved yet?

@w3b6x9
Member

w3b6x9 commented May 19, 2022

@xxholyChalicexx we have not visited this issue yet but will investigate in the near future. Is this currently blocking you? Have you come up with alternatives?

@xxholyChalicexx

Well, it's not blocking, but it gets annoying at times. As of now I try to catch the disconnect and reconnect, but that too is hit and miss. Thankfully it doesn't have any adverse effect, so for now I'm just making it work.

@zbennett

I just wanted to check in and see if there was any movement here?!

@w3b6x9
Member

w3b6x9 commented Sep 26, 2022

We'll be implementing a solution for this in the next few weeks. realtime-js heavily draws from phoenix-js and they have already implemented a solution for this: https://github.com/phoenixframework/phoenix/blob/bf1f2bfc9392c515081b1614df1b507f2c120fde/assets/js/phoenix/socket.js#L119. We'll be adopting that solution.

@zamorai

zamorai commented Nov 7, 2022

Has the solution been implemented?

@netgfx

netgfx commented Nov 28, 2022

Interested in this as well because I'm also seeing disconnects on inactive tabs and it causes the client to not receive changes from the DB which in turn places the client out of sync with the rest of the participants

@GaryAustin1
Author

@netgfx,
This is just one of the reasons the connection can be dropped (loss of signal, mobile power savings are at least two others). You need to have code in place to capture the disconnects and restart the process (including loading any old data you need) on every disconnect. Although realtime will in many cases reconnect, any data changes during that process are lost.

The flow chart here might be a bit dated supabase/supabase#5641 but shows a general idea.
Also another user has generated this (I've not used it): https://code.build/p/GZ6ioN6YzcpDwNwGNnDpEn/supabase-subscriptions-just-got-easier

@lhermann

I have the same problem as OP and want to express my support for fixing this issue.
My current workaround is to reload data from scratch on every successful SUBSCRIBED event. But they happen constantly when tab is in background so my server gets overwhelmed with reload requests.
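
One way to keep this workaround from flooding the server, sketched under assumptions: skip the reload while the tab is hidden and do a single reload when it becomes visible again. `createReloadGate`, `isHidden` and `reload` are hypothetical names; in a page `isHidden` could be `() => document.hidden`.

```javascript
// Hypothetical helper: defers reloads that arrive while the tab is hidden and
// replays at most one of them when the tab becomes visible.
function createReloadGate({ isHidden, reload }) {
  let pending = false; // a reload was skipped while the tab was hidden

  return {
    // Call on every SUBSCRIBED status.
    onSubscribed() {
      if (isHidden()) {
        pending = true;
        return false;
      }
      pending = false;
      reload();
      return true;
    },
    // Call from a visibilitychange listener once the tab is visible again.
    onVisible() {
      if (!pending) return false;
      pending = false;
      reload();
      return true;
    },
  };
}
```

However many reconnects happen in the background, the server sees one reload per return to the tab. The trade-off is that data shown in the hidden tab is stale until then.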

@alex1s1

alex1s1 commented Apr 3, 2023

Struggling with the same issue. I thought this was a rather easy fix?

@eliasm307

I'm currently having this issue. I'm using the NextJs clients and tried to play around with the realtime options, e.g. timeout and heartbeatIntervalMs, but they don't seem to have any effect.

@GaryAustin1
Author

GaryAustin1 commented Aug 1, 2023

The problem is that timers get throttled by the browser, and not much that relies on a timer works when the tab is in the background or a mobile device is in power-down mode.
You pretty much need to shut it down, wait for visibility to come back, and restart everything. IMO.
https://github.com/orgs/supabase/discussions/5641
and
https://github.com/GaryAustin1/Realtime2
have some more info and ideas, but the first is a bit dated and the second is not complete.

@netgfx

netgfx commented Aug 2, 2023

Could the connection be monitored by a web worker? That would solve the throttle or backgrounded tab issue

@claudio-bizzotto-zupit

I have the same problem. With the latest version of Chrome on Win11, the channel connection dies and no more updates are received.

@ioRekz

ioRekz commented Sep 25, 2023

same issue here

@Thimows

Thimows commented Apr 8, 2024

@w3b6x9 Is this solution implemented? I am still seeing the disconnection issue when the tab is not used for a while unfortunately..

@vfatia

vfatia commented Apr 25, 2024

We've been struggling with the same issue for a while and have found a solution that's working for us: disconnecting from the realtime channel whenever the tab is hidden and reconnecting when the tab is visible.
This avoids the core problem of the heartbeat dying in the background and means we then only need to handle the connect/disconnect gracefully.

Here's how we are doing it (app is in Svelte):

import { onMount } from 'svelte';
import { invalidateAll } from '$app/navigation'; // SvelteKit

// The posted snippet began mid-function. The wrapper below is reconstructed
// from the refreshSubscription() calls further down. channel, channelState and
// connected are component-level variables; $supabaseClient, $user and
// $createRealtimeNotification are this app's own stores.
const refreshSubscription = () => {
	const channelName = `channel.id`;
	channel = $supabaseClient
		.channel(channelName)
		.subscribe(async (status) => {
			switch (status) {
				case 'SUBSCRIBED':
					await channel.track({ user: $user });
					// Checks if a notification was sent for a connection error, sends a new notification to update the user. Doesn't send if everything is alright
					if (channelState === 'error') {
						$createRealtimeNotification = {
							id: channelName,
							type: 'success',
							action: () => null
						};
					}
					channelState = 'connected';
					// Drives app logic that the connection is valid
					connected = true;
					break;
				case 'TIMED_OUT':
				case 'CHANNEL_ERROR':
					if (!connected) {
						// Update state to drive logic on reconnect
						channelState = 'error';
						// Send message notifying the channel is disconnected
						$createRealtimeNotification = {
							id: channelName,
							type: 'error',
							action: () => null
						};
					}
					connected = false;
					break;
				case 'CLOSED':
				default:
					connected = false;
			}

			if (!connected) {
				// If disconnected reload all server functions
				invalidateAll();
			}
		});
};

function reconnectOnTabChange() {
	if (!document.hidden) {
		// Tab is visible again: subscribe from scratch
		refreshSubscription();
	} else {
		// Tab is hidden: drop the channel before the heartbeat gets throttled
		channel.unsubscribe();
	}
}

onMount(() => {
	refreshSubscription();

	document.addEventListener('visibilitychange', reconnectOnTabChange);

	return () => {
		channel.unsubscribe();
		document.removeEventListener('visibilitychange', reconnectOnTabChange);
	};
});
 

@netgfx

netgfx commented Apr 25, 2024

the above seems like a good solution. If the disconnection is an issue due to browser pausing the tab then perhaps this operation (to keep the connection alive) should be offloaded to a webworker that is never paused by default 🤔

@netgfx

netgfx commented Sep 6, 2024

> @netgfx what do you think of having a blob and a url option? 🤔

Sure that could be a way to go, although offering too many options could be confusing.
I think if you locally construct the blob from a URL it would be seamless for everyone, no matter where they load the worker from.

The other option would be to add an example to the docs on how to use this technique to bypass CORS restrictions and let the users implement it at will, it is not that complex either just a couple of lines of code.

@filipecabaco
Contributor

Building the blob from the URL is probably the best option; I'll tackle that.

@filipecabaco
Contributor

filipecabaco commented Sep 9, 2024

Screenshot 2024-09-09 at 11 47 57

CORS still gets triggered with this method 😞

private async _onConnOpen() {
    this.log('transport', `connected to ${this._endPointURL()}`)
    this._flushSendBuffer()
    this.reconnectTimer.reset()
    if (!this.worker) {
      this.heartbeatTimer && clearInterval(this.heartbeatTimer)
      this.heartbeatTimer = setInterval(
        () => this._sendHeartbeat(),
        this.heartbeatIntervalMs
      )
    } else {
      this.log('worker', `starting worker for from ${this.workerUrl!}`)
      const objectUrl = await this._workerObjectUrl(this.workerUrl!)
      this.workerRef = new Worker(objectUrl)
      this.workerRef.onerror = (error) => {
        this.log('worker', 'worker error', error.message)
        this.workerRef!.terminate()
      }
      this.workerRef.onmessage = (event) => {
        if (event.data.event === 'keepAlive') {
          this._sendHeartbeat()
        }
      }
      this.workerRef.postMessage({
        event: 'start',
        interval: this.heartbeatIntervalMs,
      })
    }

    this.stateChangeCallbacks.open.forEach((callback) => callback())!
  }
private async _workerObjectUrl(url: string): Promise<string> {
    const response = await this.fetch(url)
    const blob = await response.blob()
    return URL.createObjectURL(blob)
  }
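
For context, the worker side of the message exchange above could look roughly like the following. This is a sketch under assumptions, not the script shipped with realtime-js: it only relies on the two messages visible in the snippet (`{ event: 'start', interval }` from the page and `{ event: 'keepAlive' }` back), and the `'stop'` message plus the injected timer functions are additions for illustration.

```javascript
// Hypothetical worker-side logic. post, setIntervalFn and clearIntervalFn are
// injected so the handler can also be exercised outside a worker.
function createHeartbeatHandler(post, setIntervalFn = setInterval, clearIntervalFn = clearInterval) {
  let timer = null;

  return function onMessage(e) {
    const { event, interval } = e.data;
    if (event === 'start') {
      // Avoid stacking timers if 'start' is sent again after a reconnect.
      if (timer !== null) clearIntervalFn(timer);
      timer = setIntervalFn(() => post({ event: 'keepAlive' }), interval);
    } else if (event === 'stop') {
      // 'stop' is an assumption of this sketch; the snippet only shows 'start'.
      if (timer !== null) clearIntervalFn(timer);
      timer = null;
    }
  };
}

// Inside an actual worker file this would be wired up as:
// self.onmessage = createHeartbeatHandler((msg) => self.postMessage(msg));
```

The idea discussed in the thread is that the interval runs inside the worker, so the page only has to react to `keepAlive` messages instead of relying on its own throttled timer.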

@netgfx

netgfx commented Sep 9, 2024

> CORS still gets triggered with this method 😞

Does the same happen when loading the worker script from a CDN like supabase storage (public) or unpkg?
That pdf-viewer library does something like this:

import { Worker } from '@react-pdf-viewer/core';

<Worker workerUrl="https://unpkg.com/[email protected]/build/pdf.worker.min.js">
    <!-- The viewer component will be put here -->
    ...
</Worker>

so I think loading the worker from a CDN is the norm for this type of functionality

@filipecabaco
Contributor

#423

opened PR with your suggestion (as discussed in Discord) and works really well 🔥

@filipecabaco
Contributor

FYI PR has been merged - https://www.npmjs.com/package/@supabase/realtime-js/v/2.10.5-next.2 - 2.10.5-next.2

@netgfx

netgfx commented Sep 9, 2024

@filipecabaco I can confirm it works out of the box on codesandbox with minimal configuration:

const client = new RealtimeClient(
        "wss://PROJECT_URL.supabase.co/realtime/v1",
        {
          worker: true,
          heartbeatIntervalMs: 15000,
          logger: console.log,
          params: {
            apikey: "API_KEY",
          },
        }
      );

@filipecabaco
Contributor

filipecabaco commented Sep 9, 2024

way better and more elegant! thank you again for finding this approach @netgfx

@appelmoesje

I updated to 2.10.5-next.2, and now I’m seeing this error:

Uncaught (in promise) TypeError: URL.createObjectURL is not a function

image

@netgfx

netgfx commented Sep 10, 2024

> URL.createObjectURL is not a function

URL.createObjectURL is only available in a browser environment, if you are using this library via nodejs I don't think this solution will work.

You can check via:

if (typeof URL !== 'undefined' && typeof URL.createObjectURL === 'function') {
  console.log('URL.createObjectURL is available');
} else {
  console.log('URL.createObjectURL is not available');
}

But if you are using it via nodejs you shouldn't need to use the worker, because nodejs connection shouldn't drop, the issue of this post happens because the browser throttles javascript execution and the websocket connection closes after a while.
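
The check above can be extended to cover the `Worker` constructor as well, so the decision to enable the worker option is made once. This is a sketch: `canUseWorkerHeartbeat` is a hypothetical helper, and the assumption that both `Worker` and `URL.createObjectURL` are needed is taken from the code posted earlier in this thread.

```javascript
// Hypothetical helper: true only where both APIs used by the worker heartbeat
// code in this thread are present. env defaults to the global object.
function canUseWorkerHeartbeat(env = globalThis) {
  return (
    typeof env.Worker === 'function' &&
    env.URL != null &&
    typeof env.URL.createObjectURL === 'function'
  );
}

// Possible use when building the client options:
// const options = { worker: canUseWorkerHeartbeat(), heartbeatIntervalMs: 15000 };
```

Passing `env` explicitly makes it easy to check the behaviour for environments such as an extension background script, where one or both of these APIs may be missing.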

@filipecabaco
Contributor

filipecabaco commented Sep 10, 2024

> But if you are using it via nodejs you shouldn't need to use the worker, because nodejs connection shouldn't drop, the issue of this post happens because the browser throttles javascript execution and the websocket connection closes after a while.

this 👍 for node environments I would not advise using web workers, as they are not properly supported

but the screenshot is from a browser. @appelmoesje could you detail which version and which browser you are using?

@appelmoesje

@filipecabaco @netgfx I'm using it in a Chrome Extension background script so it does not have the browser context.

@appelmoesje

appelmoesje commented Sep 11, 2024

I'm currently using workerUrl to load the local worker.js file. I tried hosting it on a domain, but using chrome.runtime.getURL forces it to look for the file locally, so that wasn't viable (haven't explored this option further). The workerUrl is correct and points to the right file.

export const realtimeClient = new RealtimeClient(
    "wss://URL/realtime/v1",
    {
      worker: true,
      workerUrl: chrome.runtime.getURL('/assets/js/worker.js'),
      heartbeatIntervalMs: 15000,
      logger: console.log,
      params: {
        apikey: process.env.PLASMO_PUBLIC_SUPABASE_ANON_KEY,
      },
    }
);

But I still get the Worker undefined error:
image

The location in the RealtimeClient.ts:
image

@netgfx

netgfx commented Sep 11, 2024

@appelmoesje does the same error occur if you completely remove the workerUrl? The latest iteration of the library doesn't require it, it creates it internally.

@appelmoesje

If I remove the workerUrl I get this:

image

@filipecabaco
Contributor

this seems to be a bigger issue because of the way Chrome Extensions work ( https://www.reddit.com/r/learnjavascript/comments/18t42ic/stuck_on_trying_to_download_a_blob_for_chrome/ )

Maybe you can encode the content as a base64 blob and send it via workerUrl param?

@appelmoesje

I think I found a solution for my case. I added the code below to the background worker. This seems to work for now.

chrome.alarms.onAlarm.addListener(() => {
  console.log('Alarm triggered, keeping the service worker alive.');
  setupSupabaseChannel();
});

chrome.alarms.create('keep-alive', { periodInMinutes: 0.1 });

@filipecabaco
Contributor

@appelmoesje nice! yeah since chrome extensions are workers you can just capture and send the event 👍

@njoshi22

> URL.createObjectURL is only available in a browser environment, if you are using this library via nodejs I don't think this solution will work.
>
> You can check via:
>
> if (typeof URL !== 'undefined' && typeof URL.createObjectURL === 'function') {
>   console.log('URL.createObjectURL is available');
> } else {
>   console.log('URL.createObjectURL is not available');
> }
>
> But if you are using it via nodejs you shouldn't need to use the worker, because nodejs connection shouldn't drop, the issue of this post happens because the browser throttles javascript execution and the websocket connection closes after a while.

FWIW, my connection is dying / getting a TIMED_OUT on Node.

@filipecabaco
Contributor

@njoshi22 could you try reducing the heartbeat time to 20 seconds?

@njoshi22

njoshi22 commented Sep 14, 2024

I tried and deployed to a k8s cluster (with persistent connection), still timed out. I was planning on having a server that listened to realtime and then triggered notifications, looks like I won't be able to do that now without relying on Debezium or another external library, given how unpredictable these time outs are.

@filipecabaco
Contributor

I think this is from an unrelated issue. Try lowering the heartbeat to 25 seconds and check if it still persists. Currently I suspect that the default 30s is killing the connection somewhere, but I'm still investigating.

@njoshi22

Hmm - yeah I made it to 15s per your suggestion above. Happy to help debug further!

@filipecabaco
Contributor

This has been released in 2.10.7 👍

thank you for all the collaboration on this issue! especially @netgfx 🙏

closing it for now 🎉

@gittyupnow

> @filipecabaco I can confirm it works out of the box on codesandbox with minimal configuration:
>
> const client = new RealtimeClient(
>         "wss://PROJECT_URL.supabase.co/realtime/v1",
>         {
>           worker: true,
>           heartbeatIntervalMs: 15000,
>           logger: console.log,
>           params: {
>             apikey: "API_KEY",
>           },
>         }
>       );

I'm having realtime disconnection issues as well, but I'm using the JS Client Library. When I try that code in Chrome Console for instance, I get "Uncaught ReferenceError: RealtimeClient is not defined". Any idea what I can do? Thanks!

@netgfx

netgfx commented Dec 12, 2024

> I'm having realtime disconnection issues as well, but I'm using the JS Client Library. When I try that code in Chrome Console for instance, I get "Uncaught ReferenceError: RealtimeClient is not defined". Any idea what I can do? Thanks!

Realtime client is imported like this:
import { RealtimeClient } from "@supabase/realtime-js";
It is a separate library
https://github.com/supabase/realtime-js
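
For the JS Client Library case, supabase-js accepts realtime client options under a `realtime` key in the third argument of `createClient`, so a separate `RealtimeClient` may not be needed. The sketch below assumes the installed supabase-js bundles a realtime-js release that includes the worker option (2.10.7 per this thread); `buildSupabaseOptions` is a hypothetical helper and the values simply mirror the example quoted above.

```javascript
// Builds the third argument for createClient. The defaults mirror the
// RealtimeClient example in this thread; they are not recommendations.
function buildSupabaseOptions({ useWorker = true, heartbeatIntervalMs = 15000 } = {}) {
  return {
    realtime: {
      worker: useWorker,
      heartbeatIntervalMs,
    },
  };
}

// Usage (commented out because it needs the @supabase/supabase-js package and
// real credentials):
// import { createClient } from '@supabase/supabase-js';
// const supabase = createClient(
//   'https://PROJECT_URL.supabase.co',
//   'API_KEY',
//   buildSupabaseOptions()
// );
```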
