Why do we repeat our backing up in R-Instat if nothing has changed in the log file? #9236
@rdstern I agree. I'm surprised by that as well, but what surprises me even more is why the backup operations are written to the log file. I think this was not the case in previous versions. The backup operations should not be in the user's log file: they should be silent, something a user never needs to be aware of until the application crashes or is unintentionally killed.
@Patowhiz happy with your support of my first point. The times of backing up have been included recently, and I'm afraid at my request! That's by @N-thony, after discussions with Stephen. They form part of our recent "ultimate undo system", and I hope you'll like it. Neat, eh, for those who can cope?
@N-thony after my experience in the Bangladesh workshop I would like to take the backing-up a stage further. This is particularly important for our climatic data. The main data file is usually there. It is often used to get summaries, etc., without being changed. And soon we will be dealing with larger files from sub-daily data. Then often the daily data will be generated. It will still be useful to keep the primary data available, but it will rarely be changed and hence will rarely need backing up anew.
What has been the current backup experience with all the data frames (hourly, daily, summaries, etc.) open? I recall hearing that you were working with sub-daily data containing 7 million records, which I believe will become increasingly common in the next 5–10 years. I'm also curious about how this operation performed on users' machines. Did they notice any performance degradation during the process? I'm asking these questions with the understanding that R is single-threaded and might face input/output bottlenecks. The fact that no users have complained about the backup process is encouraging news to me and shows how personal computing has improved in developing countries.
@rdstern, I'm looking forward to your answers, and @jkmusyoka's experience as well. It would also be helpful to check with @N-thony whether this bottleneck could be related to the recent undo feature. Memory issues can sometimes slow down an application. My suggestion would be to follow the same steps using a version of the application without the undo feature (or with it switched off, if that's possible) and see whether the same performance degradation occurs. The reason I think this should be tested is that it might be simpler to optimise the undo functionality first, before addressing the optimisation of saving the data book contents.
@Patowhiz thank you for the words of caution here. There are two distinct levels to the tasks proposed here. The first is that we don't back up a new copy of the data book if nothing has changed. I suggest this is urgent, probably simple, and not related to your point above? So this could be done soon, and merged. The second task is more ambitious and does relate to the backing up of individual sheets, which is what is done in undo. Following your words of caution, let's leave that for now. I am happy that it is not in the December release, and we come back to the backing-up/undo area in 2025. For reference, the current undo has limits on the data frames it works on, and there is an option to turn it off completely. So undo is only possible on data sheets where backing up is not an issue, except that the data book may contain large data frames that cannot be undone but will, of course, still be backed up.
@rdstern Thank you for the feedback! I've now understood your separate points. Regarding point 1, I'm not sure if addressing it will lead to any significant improvement in the performance bottleneck. Over 90% of R-Instat operations involve changes to the data book, and it's rare for a user to go 10 minutes without performing an operation that alters it (in my opinion). I think that's why the developer of the original implementation went for a simpler solution. By "changes to the data book," I mean modifications to the data objects (e.g., data frames) or output objects (e.g., additions or deletions). If we define data book changes more narrowly as modifications to the data frames alone, there might be a very small performance improvement. However, I could be completely wrong and missing the point here. Looking forward to seeing the implementation!
@ChrisMarsh82 and @N-thony and @Patowhiz I am writing the help and have got as far as the log file. Here is a bit of my latest:
I often have R-Instat open while I am doing other things. I notice, from the figure above, that it backs up every 10 minutes, even if I have not used it since the last backup. Couldn't it check that, and only back up if something has changed? I can imagine situations (with large datasets) where this could be very annoying.
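The "only back up if something has changed" check suggested above can be sketched as comparing a fingerprint of the data book against the one taken at the last backup, and skipping the write when they match. The sketch below is purely illustrative and not R-Instat's actual code (R-Instat is VB.NET with an R backend); the function name and the hash-of-a-serialized-snapshot approach are assumptions. In practice a "dirty" flag set by every data-modifying operation would likely be cheaper than re-hashing a large data book on each timer tick.

```python
import hashlib
import pickle

def backup_if_changed(data_book, last_hash, backup_fn):
    """Run backup_fn(data_book) only when its fingerprint has changed.

    Returns (new_hash, did_backup). Hypothetical helper for illustration.
    """
    # Fingerprint a serialized snapshot; any stable fingerprint
    # (or a dirty flag maintained by edit operations) would do.
    digest = hashlib.sha256(pickle.dumps(data_book)).hexdigest()
    if digest == last_hash:
        return last_hash, False   # nothing changed since last backup: skip
    backup_fn(data_book)          # write the backup copy
    return digest, True

# Demo: the second timer tick finds no changes and skips the write.
writes = []
book = {"daily_rain": [1.2, 0.0, 5.4]}
h, did_first = backup_if_changed(book, None, writes.append)
h, did_second = backup_if_changed(book, h, writes.append)
```

With this check in place, an idle session (the scenario described above) would pay only the cost of computing the fingerprint every 10 minutes rather than rewriting the whole backup.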