Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Library sometimes deleted when application closed #343

Closed
3 tasks done
Nividica opened this issue Aug 8, 2024 · 5 comments
Closed
3 tasks done

[Bug]: Library sometimes deleted when application closed #343

Nividica opened this issue Aug 8, 2024 · 5 comments
Labels
System: Windows For Microsoft Windows TagStudio: Library Relating to the TagStudio library system Type: Bug Something isn't working as intended Type: File System File system interactions

Comments

@Nividica
Copy link

Nividica commented Aug 8, 2024

Checklist

  • I am using an up-to-date version.
  • I have read the documentation.
  • I have searched existing issues.

TagStudio Version

Alpha v9.3.2

Operating System & Version

Windows 10

Description

Occasionally when closing Tag Studio, via the X button, it will wipe the ts_library.json file, leaving it with zero bytes, despite the logs claiming the library was saved.

I spent some time fiddling around with the code and stumbled upon a potential solution:
Move the call to self.to_json() outside of the with open( block in the save_library_to_disk function.

Example

        # If `to_json` is called in the `with` block, the file will sometimes fail to write
        jsonLibrary = self.to_json()

        with open(
            self.library_dir / TS_FOLDER_NAME / filename, "w", encoding="utf-8"
        ) as outfile:
            outfile.flush()
            ujson.dump(
                jsonLibrary,
                outfile,
                ensure_ascii=False,
                escape_forward_slashes=False,
            )

Note that having jsonLibrary = self.to_json() inside the with open( block did not fix the issue.

I stumbled into this when I was logging the size of the jsonLibrary to see if it was actually being populated, and comparing the sizes at different places in the function.
(It was consistently the same size, regardless of if it wrote the bytes to disk or not.)

With that change in place I was able to go through over a dozen open/close cycles and it saves the library each time.
Where previously it would fail to save by the time I got to close number 5

I can't pretend to understand why this seems to have solved the issue, so I can not guarantee the robustness of the proposed fix.

Other Things I Tried

  • Moving outfile.flush() after the ujson.dump call
  • Adding outfile.close() after the ujson.dump call

Work Around

During my testing I discovered if I manually save the library then immediately close the app, the bug never seemed to happen.

Additional Info

Expected Behavior

Data should be written to the library file on exit

Steps to Reproduce

  1. Open Tag Studio
  2. Create a new library
  3. Manually save
  4. Note the size of the save file (mine is ~340KB)
  5. Close Tag Studio
  6. Open Tag Studio
  7. Wait for it to load the library
  8. Close Tag Studio
  9. Check the size of the save file
  10. If the save file is not 0 bytes, repeat steps 6-9

Logs

Log output when it fails to save

Saving Library...
[LIBRARY] Saving Library to Disk...
[LIBRARY] Formatting Tags to JSON...
[LIBRARY] Formatting Entries to JSON...
[LIBRARY] Formatting Collations to JSON...
[LIBRARY] Done Formatting to JSON!
[LIBRARY] Library saved to disk in 0.008 seconds
[SHUTDOWN] Ending Thumbnail Threads...

Log output when it successfully saves

Saving Library...
[LIBRARY] Saving Library to Disk...
[LIBRARY] Formatting Tags to JSON...
[LIBRARY] Formatting Entries to JSON...
[LIBRARY] Formatting Collations to JSON...
[LIBRARY] Done Formatting to JSON!
[LIBRARY] Library saved to disk in 0.031 seconds
[SHUTDOWN] Ending Thumbnail Threads...
@Nividica Nividica added the Type: Bug Something isn't working as intended label Aug 8, 2024
@CyanVoxel CyanVoxel added Priority: Critical An issue that requires immediate attention TagStudio: Library Relating to the TagStudio library system Type: File System File system interactions labels Aug 8, 2024
@Nividica
Copy link
Author

Nividica commented Aug 9, 2024

I've continued to try and track down the root cause of the issue, and it seems like it is a timing issue.
Potentially one outside of the app.

Starting with the original code, I added a print for the file path.
Then if I print the size of the file before and after a call to sleep, they disagree with one another

jsonPath = self.library_dir / TS_FOLDER_NAME / filename
print(f"[LIBRARY] Save path: {jsonPath}")
with open(
    jsonPath, "w", encoding="utf-8"
) as outfile:
    outfile.flush()
    ujson.dump(
        self.to_json(),
        outfile,
        ensure_ascii=False,
        escape_forward_slashes=False,
    )
    # , indent=4 <-- How to prettyprint dump
print(f"[LIBRARY] Pre-sleep Size: {jsonPath.stat().st_size} bytes")
time.sleep(0.1)
print(f"[LIBRARY] Post-sleep Size: {jsonPath.stat().st_size} bytes")

On my 3rd close, the bug happened:

[LIBRARY] Saving Library to Disk..
[LIBRARY] Save path: E:\Screenshots\.TagStudio\ts_library.json
[LIBRARY] Formatting Tags to JSON...
[LIBRARY] Formatting Entries to JSON...
[LIBRARY] Formatting Collations to JSON...
[LIBRARY] Done Formatting to JSON!
[LIBRARY] Pre-sleep Size: 348937 bytes
[LIBRARY] Post-sleep Size: 0 bytes
[LIBRARY] Library saved to disk in 0.140 seconds

Then I added a call to sleep just before ujson.dump, and the bug magically goes away.

with open(
    jsonPath, "w", encoding="utf-8"
) as outfile:
    outfile.flush()
    time.sleep(0.1)
    ujson.dump(
        self.to_json(),
        outfile,
        ensure_ascii=False,
        escape_forward_slashes=False,
    )
    # , indent=4 <-- How to prettyprint dump
print(f"[LIBRARY] Pre-sleep Size: {jsonPath.stat().st_size} bytes")
time.sleep(0.1)
print(f"[LIBRARY] Post-sleep Size: {jsonPath.stat().st_size} bytes")
[LIBRARY] Saving Library to Disk..
[LIBRARY] Save path: E:\Screenshots\.TagStudio\ts_library.json
[LIBRARY] Formatting Tags to JSON...
[LIBRARY] Formatting Entries to JSON...
[LIBRARY] Formatting Collations to JSON...
[LIBRARY] Done Formatting to JSON!
[LIBRARY] Pre-sleep Size: 348937 bytes
[LIBRARY] Post-sleep Size: 348937 bytes
[LIBRARY] Library saved to disk in 0.227 seconds

Ran this test 10 times, and no bug.

As a final set of tests I went back to the original code with just the sleep inside the with-block

with open(
    self.library_dir / TS_FOLDER_NAME / filename, "w", encoding="utf-8"
) as outfile:
    outfile.flush()
    time.sleep(0.1)
    ujson.dump(
        self.to_json(),
        outfile,
        ensure_ascii=False,
        escape_forward_slashes=False,
    )
    # , indent=4 <-- How to prettyprint dump

Ran this test 10 times, and no bug.

So there seems to be two independent "fixes" for the bug

  1. Move self.to_json() outside of the with-block
  2. Add time.sleep(0.1) just before ujson.dump

I can't wrap my head around what is going on here.
Option 2 would suggest there isn't enough time between opening the file and writing to the file.
Why would there need to be an artificial wait added between open and write?

If I drastically increase the sleep time for Option 2 to several seconds I can see that when the file is opened for writing the existing contents are, expectedly, cleared with a file size of 0, the data is saved, and the file size goes to ~340K

But none of that explains why Option 1 would have any effect at all, and yet I can't repro the bug when self.to_json() is outside of the with-block.

If the source of the bug really is some kind of race condition, then I'm starting to wonder if a true fix for this would be to save a new library file, delete the old library file, then rename the new file.

@Nividica
Copy link
Author

Nividica commented Aug 9, 2024

I ran with the temp file idea and have not run into the bug again

        tempJsonPath = self.library_dir / TS_FOLDER_NAME / "temp.json"
        jsonPath = self.library_dir / TS_FOLDER_NAME / filename

        # Delete temp file if exists
        tempJsonPath.unlink(missing_ok=True)

        # Write to a temp file
        with open(
            tempJsonPath, "w", encoding="utf-8"
        ) as outfile:
            ujson.dump(
                self.to_json(),
                outfile,
                ensure_ascii=False,
                escape_forward_slashes=False,
            )
        
        # Delete existing libary
        jsonPath.unlink(missing_ok=True)

        # Rename temp file
        tempJsonPath.rename(jsonPath)

In my opinion, of the three solutions, this one feels like the winner.

Pros

  • No weird calls to sleep
  • No worries about someone accidentally "optimizing" the code by getting rid of a variable and moving self.to_json() back inside the with-block
  • If something goes wrong, there is data saved somewhere.
    • If writing to the temp file fails or is interrupted by a crash or power cut, the original ts_library.json file is left untouched
    • If deleting ts_library.json fails, temp.json has the latest data
    • If renaming fails, temp.json still exists
  • The open_library code could be expanded to check for the temp.json and attempt to load it first
    • If temp.json fails to read/parse, fall back on ts_library.json
    • Could potentially check timestamps

Cons

  • Will briefly require twice as much disk space

@thrynae
Copy link

thrynae commented Aug 22, 2024

Just a minor note (which you will probably already think of yourself: there will at some point be a user that decides to name their database temp, at which point you will have a name collision. Perhaps the name would be a better choice? Still not guaranteed to be unique, but if you want a stable name, you need to stop at some point.
jsonPath = self.library_dir / TS_FOLDER_NAME / filename+".temp.json"
Or compute a hash of the file name and use that (+".json") as the temp file name.

@CyanVoxel CyanVoxel added the System: Windows For Microsoft Windows label Aug 25, 2024
@Leseratte10
Copy link

Leseratte10 commented Sep 9, 2024

Also, looking at the code, if the application crashes between the "unlink" and the "rename" call, there would again be data loss (unless, as suggested, code will be added to recover from the temp file in that case).

Maybe instead of unlink and rename (which first deletes the database and then renames the temp one to the real database name) it'd be better to use os.replace.

That way you could call os.replace(temp_db, actual_db) without first deleting the actual database file, and the OS will guarantee (as of Python 3.3) that the move will either fail entirely (keeping the old DB) or succeed entirely (and you'll have the new file saved). That means that there's no need for extra code to recover from a temp file - the database will always be there no matter when the process was interrupted.

Also, at least to me it looks like the code currently in save_library_to_disk could fail on any OS, not just Windows. If the "open" call succeeds (and truncates the existing file) but then the json dump calls fail for whatever reason, the file will probably be corrupted.

CyanVoxel added a commit that referenced this issue Nov 4, 2024
… (#554)

* fix: move `self.to_json()` outside `with` block

* feat: add library autosave on start

* fix: check if `backup_folder` exists before loading backup
@CyanVoxel
Copy link
Member

Should be fixed as of #554, coming with the next 9.4.x patch. If the issue persists, I can reopen this issue.

@CyanVoxel CyanVoxel removed the Priority: Critical An issue that requires immediate attention label Nov 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
System: Windows For Microsoft Windows TagStudio: Library Relating to the TagStudio library system Type: Bug Something isn't working as intended Type: File System File system interactions
Projects
None yet
Development

No branches or pull requests

4 participants