-
Notifications
You must be signed in to change notification settings - Fork 379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FireStore listener as a service stop working after a day #2721
Comments
When you say "without any errors" - what happens to the task exposed as
? That would be the first piece of interesting information. In terms of further diagnostics, there's nothing particularly built into the library to help here, unfortunately. You could turn on gRPC tracing, but that would generate a huge amount of logging. Does everything else in the application continue to work? In particular, are you doing other Firestore operations on the same If we could work out a way of reproducing this in an isolated way, that would be ideal. Do you think having one console application which had basically the code you've already shown, and another that just updates a single user entry once every (say) 15 seconds, would be a good start? I'd be happy to create that and just leave it running. |
I've now written some test code. Will leave it running and see what happens: using Google.Cloud.Firestore;
using System;
using System.Linq;
using System.Threading.Tasks;
namespace Issue2721
{
class Program
{
static void Main(string[] args)
{
Task updatingTask = UpdateAsync(args[0]);
Task listeningTask = ListenAsync(args[0]);
updatingTask.Wait();
listeningTask.Wait();
}
static async Task UpdateAsync(string projectId)
{
var db = FirestoreDb.Create(projectId);
var document = db.Collection("Issue2721").Document("testdoc");
while (true)
{
var now = DateTime.UtcNow;
await document.SetAsync(new { LastUpdated = now });
Log($"Updated to {now:HH:mm:ss.ffffff}");
await Task.Delay(TimeSpan.FromSeconds(15));
}
}
static async Task ListenAsync(string projectId)
{
var db = FirestoreDb.Create(projectId);
CollectionReference collection = db.Collection("Issue2721");
FirestoreChangeListener listener = collection.Listen(snapshot =>
{
var doc = snapshot.Documents.First();
var value = doc.GetValue<DateTime>("LastUpdated");
Log($"Observed value {value:HH:mm:ss.ffffff}");
});
await listener.ListenerTask;
}
static void Log(string message) =>
Console.WriteLine($"{DateTime.UtcNow:HH:mm:ss.ffffff}: {message}");
}
} |
Hey jskeet. Was working ok after some time? I will deploy your code to Azure Web Site and also will leave for couple days. Will update you on monday with results |
Eek! I sincerely apologise - I dropped the ball on this one. I started it running on my laptop first, and of course that dropped the connection when the laptop went to sleep. Will get it running on a desktop, writing to a file so I can see what's going on... |
(It's been running for 4 hours now - no problems so far, but we'll see what happens over the next 24-48 hours...) |
First test died after 18 hours due to the "updating" code failing with a ServiceUnavailable error. Rerunning - I'd really like to see the exact behavior you described. |
Yep, also deployed to Azure. Will provide link tomorrow. So if its running on Azure how to prevent from getting ServiceUnavailable error? Just to re-attach the listener to continue listening for changes? |
I had the same issue but it disappeared by itself somehow. I mean listeners stopped to work without exception after about one day |
@DenSmoke: I'm not quite sure I understand what you mean by "it disappeared by itself" - the issue disappeared, or the listener disappeared? If you still have listeners that are stopping reporting changes (but without the task failing), that's what we're trying to reproduce. @nicksav: The listener should reattach itself on ServiceUnavailable errors (I think - it certainly does for some errors; I'd need to check exactly which). In my case it was the code making the change that failed with that error. |
@jskeet I couldn't reproduce the issue again so "issue disappeared". It was on beta11 |
The new test has been running for 25 hours with no problems so far. That doesn't mean it won't happen eventually though... |
Problem happen again in .NET Core 2.2 console app with Google.Cloud.Firestore beta14 on Ubuntu 18.04 running in Compute Engine f1-micro instance. |
I have a similar issue. I am running an Asp.Net Core 2.1 Web API on Google Cloud App Engine. The FirestoreListener stops working eventually. I can't say it is precisely after 24 hours. It seems just random. From my analysis so far, it seems to be an issue with Firestore backend rather than the Firestore .NET SDK. Firestore randomly fails when reading/writing documents. It is not just about the listeners. For example. Yesterday my Asp.Net Core Web API had the following exception when getting a document snapshot: Grpc.Core.RpcException: Status(StatusCode=PermissionDenied, Detail="Missing or insufficient permissions.") I have seen this at least once every month. That is not the only one. I also have this one randomly once in a while when reading documents: Grpc.Core.RpcException: Status(StatusCode=Unavailable, Detail="The datastore operation timed out, or the data was temporarily unavailable.") My conclusion is that, as a beta product, it is possible that Firestore backend randomly fails to serve requests for whatever reason. For the specific listener issue, my workaround is to have a function triggered by a timer which will restore the listener as required. |
Thanks for the updates everyone - and particularly for that extra detail, @pauloevpr. I'll contact the Firestore internal team and point them at this issue. They may have some ideas about what's going on. |
Internal feedback suggests this may be a gRPC issue rather than a server side issue. Obviously it being hard to reproduce doesn't help us :( (The test I started 5 days ago is still going strong...) @pauloevpr: I've filed #2760 for the Unavailable issue. The PermissionDenied issue is odder. Have you seen it more than just the once? If you see it again, could you send me an email ([email protected]) with your project ID and a timestamp, and we'll look in the logs? |
Thanks - will check mine! |
@nicksav: Thanks very much for the log - that's interesting to see. Interestingly, my test failed, but in a different way: the process was aborted. I realized that the Main method wasn't going to tell me the listener task status, either. I've changed it to: static void Main(string[] args)
{
try
{
Task updatingTask = UpdateAsync(args[0]);
Task listeningTask = ListenAsync(args[0]);
listeningTask.Wait();
Console.WriteLine("Listening task finished");
updatingTask.Wait();
}
catch (Exception e)
{
Console.WriteLine(e);
}
} I've restarted the test. I suspect eventually we'll need to capture gRPC detailed logs, but as lots of folks are now going on holiday, I suspect that'll be in 2019. Apologies that it's taking so long to work out what's going on here, but we'll get there eventually... |
This issue with listeners started to happen suspiciously often. I had to restart our server 3 times today. Is there some good workaround to detect listener disfunction? I can restart listeners after some interval but this will consume operations quota |
My test failed today, but did so in a way which could be detected: the task returned by
This is great news! It could be an explanation of what's going wrong:
I can easily just swallow any exception that occurs while closing the stream, although I'd eventually want to have something a little more elegant than blindly swallowing it. At this stage I'd rather not publish (on nuget.org) another full release before Christmas, but I'd be happy to create a "test package" for you to try, if you'd be willing to do so. Even if you would rather not try that (and I'll test the fix either way myself) it suggests that if you just use the |
Thanks for explanation. I never awaited ListenerTask result so didn't see that exception was thrown. I can easily observe it now and restart listener on failure |
@DenSmoke: Should I take it from that comment that you'd prefer not to test a speculative fix? (I completely understand if so - just making sure I've understood you correctly.) I'm going to try that speculative fix now... (with a tiny bit of extra diagnostic code to observe when it's happened). |
@jskeet Yes, you understood correctly. I have not enough time to test the fix now |
@nicksav: Thanks - I had another error over the weekend, so need to work a bit more on the fix, but I'll hopefully have an interim build to test some time tomorrow. |
Update: I've had to change my test code to handle failures in the "update" side as well; I've also started logging the errors that cause the stream to be closed in the first place. It's possible that my fix did actually work, but that a lack of further updates was stopping it from showing anything more. |
I saw a failure yesterday and this morning which was unrelated (and which I've asked about internally). I could merge my expected fix in and do another release today; I don't believe it's harmful, and it would let others test over Christmas. |
This contains a slightly-speculative fix for issue googleapis#2721, where listeners may fail after a while.
This contains a slightly-speculative fix for issue #2721, where listeners may fail after a while.
I've now released 1.0.0-beta15. I won't close this issue just yet, but your feedback would be much appreciated. (The only change compared with 1.0.0-beta14 is swallowing exceptions when closing the listener stream.) Note that if you get an exception about "Server removed watch target" that's a separate issue - I've seen it twice so far today, but never before. |
Note that until #2776 is fixed (and even after that, really), it's still worth watching the |
hi guys |
@nicksav the most simple way: private async void ListenerRestarter(FirestoreChangeListener listener, Func<CancellationToken, Task> startListener, CancellationToken ct = default)
{
try
{
await listener.ListenerTask.ConfigureAwait(false);
}
catch (Exception ex)
{
if (ex.GetType() != typeof(TaskCanceledException))
{
//handle exception
await startListener(ct).ConfigureAwait(false);
}
else
throw ex;
}
} |
Just to make it easier private async void ListenerRestarter(FirestoreChangeListener listener, Func<CancellationToken, Task> startListener, CancellationToken ct = default)
{
try
{
await listener.ListenerTask.ConfigureAwait(false);
}
catch (Exception ex)
{
if(ex is TaskCanceledException)
throw ex;
//handle exception
await startListener(ct).ConfigureAwait(false);
}
} |
The listener ran without any exceptions for just over two weeks, so I suspect that we're fine apart from #2776. |
Something strange is going on today. My restarter catched these errors (time in UTC):
Idk is it related to this issue or not UPDATE:
|
Odd indeed. I'm at a conference right now, but will look into it when I can. |
@DenSmoke: Do you have access to a stack trace at all? It would be interesting to know exactly where that's coming from. "Aborted" isn't one of the status codes we retry on - but I'll ask whether we should. |
@jskeet program is working on Compute Engine instance so I can't debug it. But I can modify exception catcher to log stack trace also |
@DenSmoke: That would be handy, thanks. I've asked internally about whether we should add Aborted to the list of status codes to retry. |
@DenSmoke: It looks like I should add Aborted to the list of server statuses to retry. I'll do that tomorrow. |
@jskeet New errors with stack traces. What's interesting - they were thrown right after program start. Until January 11, there were almost no such errors, but now I see about 3 errors a day.
|
That's interesting about them being on start-up. It's good to know they're in MoveNext. When I've added retry, I'd expect these to go away. I'm hoping to make the changes and create another release today. |
- Retry on Aborted - Use RpcException when the server removes a target (we don't want to retry) Fixes googleapis#2776 Fixes googleapis#2721
- Retry on Aborted - Use RpcException when the server removes a target (we don't want to retry) Fixes googleapis#2776 Fixes googleapis#2721
Version: Google.Cloud.Firestore-1.0.0-beta13
Guys, I am having some issues with FireStore real time updates. I have deployed a web app to Azure Web Site. On a start up, this application subscribes to the updates in the FireStore users table.
`
`
All works perfect, but after a couple hours, or sometimes days, it just stop receiving updates without any errors. Is there any way for me to debug it to understand why?
The text was updated successfully, but these errors were encountered: