Duplicate readlists and missed books in readlists on scan with StoryArc in ComicInfo.xml. #1317

Closed
5 tasks done
Shakahron opened this issue Nov 29, 2023 · 15 comments
Labels: bug (Something isn't working), released

Comments

@Shakahron

Steps to reproduce

Scan library with many readlists set with StoryArc in embedded ComicInfo file.

Expected behavior

There shouldn't be duplicate readlists, and all books should be added to their readlists by a normal scan.

Actual behavior

I have a library full of oneshots, many of them in a readlist, so I don't know whether this also affects books in series. I noticed that occasionally a duplicate readlist with a single book in it is created, and sometimes books aren't added to a readlist unless you specifically scan those books for metadata.

Logs

No response

Komga version

1.8.2

Operating system

Windows 10

Installation method

from download.komga.org

Other details

No response

Acknowledgements

  • I have searched the existing issues (open AND closed) and this is a new ticket, NOT a duplicate or related to another open issue.
  • I have written a short but informative title.
  • I have checked the FAQ.
  • I have updated the app to the latest version.
  • I will fill out all of the requested information in this form.
@gotson
Owner

gotson commented Nov 30, 2023

I don't quite understand what the problem is, could you explain more or provide some screenshots maybe?

@Shakahron
Author

Shakahron commented Nov 30, 2023

I'll try and explain more. I have a library made up entirely of oneshots, so the oneshots folder is set as the library folder. There are a lot of readlists set with the StoryArc and StoryArcNumber tags in the ComicInfo.xml files in this library, so going through them individually would be a time-consuming process.
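
For readers unfamiliar with these tags, here is a minimal Python sketch of how a story arc and its position might be read from a ComicInfo.xml file. The sample document, its values, and the `read_story_arc` helper are hypothetical and only illustrate the two tags named above; this is not how Komga itself parses the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical ComicInfo.xml content for a single oneshot.
SAMPLE = """<ComicInfo>
  <Title>Example One-Shot</Title>
  <StoryArc>Example Arc</StoryArc>
  <StoryArcNumber>3</StoryArcNumber>
</ComicInfo>"""


def read_story_arc(xml_text):
    """Return (arc name, position in arc); either value may be None if the tag is absent."""
    root = ET.fromstring(xml_text)
    arc = root.findtext("StoryArc")
    number = root.findtext("StoryArcNumber")
    return (arc, int(number) if number else None)


print(read_story_arc(SAMPLE))
```

Every book that carries the same StoryArc value is expected to end up in one readlist of that name, ordered by StoryArcNumber.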

When I scan this library, not all of the books are going into the same readlist. Some are duplicated, some are split off into a separate readlist, and sometimes books that are meant to go into readlists aren't found unless you refresh metadata. I've blurred the screenshots as they are adult:

This one shows 2 readlists where 1 readlist is already complete with 4 books, but a duplicate was made with 1 book.
[Screenshot: firefox_giVbW7jKXe]

This one shows 2 readlists with 1 book, where there should be 1 readlist with 2 books.
[Screenshot: firefox_mMPpARse0e]

Here there are 6 books in this readlist where there should be 7. When I refresh the library metadata, sometimes the book is added, but sometimes I need to go and refresh metadata for the book itself.
[Screenshot: firefox_nekVUwP30E]

I hope that makes it clearer what the problem was.

@gotson
Owner

gotson commented Nov 30, 2023

Can you retrieve the API response for GET /api/v1/readlists? I'd need to have a look at the raw data in there.
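
A minimal Python sketch of one way to capture that response. The base URL, the credentials, and the `unpaged` query parameter are assumptions for illustration, and the `content`, `name`, and `bookIds` field names are assumed from a typical paged response, so check them against the real output.

```python
import base64
import json
import urllib.request


def fetch_readlists(base_url, user, password):
    """Send GET /api/v1/readlists with HTTP basic auth and return the parsed JSON body."""
    # `unpaged=true` is assumed to return every read list in a single page.
    request = urllib.request.Request(f"{base_url}/api/v1/readlists?unpaged=true")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(body):
    """Return (name, book count) pairs, assuming the read lists sit under a `content` key."""
    return [(rl["name"], len(rl.get("bookIds", []))) for rl in body.get("content", [])]


# Example usage (hypothetical host and account, not executed here):
# body = fetch_readlists("http://localhost:25600", "user@example.org", "password")
# print(json.dumps(body, indent=2))
# print(summarize(body))
```

The raw JSON is what was asked for; the summary only makes it quicker to spot two entries sharing a name.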

@gotson
Owner

gotson commented Nov 30, 2023

Side question, how many threads did you configure for background tasks?

@Shakahron
Author

I configured 8 threads for background tasks

@gotson
Owner

gotson commented Nov 30, 2023

> I configured 8 threads for background tasks

Can you set it to 1 and check whether you still have the problem? I surmise that processing in parallel may create duplicate read lists.

@Shakahron
Author

Shakahron commented Nov 30, 2023

I'll try that now. By the way, for your first question, I'm not sure how to do that. Can you point me in a general direction to get that for you?

@Shakahron
Author

Shakahron commented Nov 30, 2023

I figured it out. I'll get the API response after I finish testing 1 thread.

@Shakahron
Author

Shakahron commented Nov 30, 2023

The library hasn't finished scanning yet, but I'm 100% confident in saying that setting background tasks to 1 thread has completely fixed the issues. Do you still want the API response for readlists?

@gotson
Owner

gotson commented Nov 30, 2023

I have been testing this on my side. Even though I could not reproduce the duplicate read list (probably because my database / CPU are too fast on my dev machine), I can see that it could happen. I could definitely reproduce the issue where a read list did not contain all the items it was supposed to contain.

It's a multi-threading issue: there should be some resource locking when creating/updating readlists (and collections too), and that locking is missing.
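
To illustrate the kind of locking described, here is a minimal Python sketch, not the actual fix in Komga. `ReadListStore`, `add_book`, and `scan` are hypothetical names, and an in-memory list stands in for the database. The lock turns "look up the read list, create it if missing, append the book" into one atomic step.

```python
import threading


class ReadListStore:
    """Toy in-memory stand-in for read list storage (hypothetical, not Komga code)."""

    def __init__(self):
        self.readlists = []  # each entry: {"name": str, "books": list}
        self._lock = threading.Lock()

    def add_book(self, name, book_id):
        # Without the lock, two threads could both fail to find `name`
        # and each create its own read list, producing a duplicate.
        with self._lock:
            for readlist in self.readlists:
                if readlist["name"] == name:
                    readlist["books"].append(book_id)
                    return
            self.readlists.append({"name": name, "books": [book_id]})


def scan(store, arc, book_ids, workers=8):
    """Simulate `workers` background threads importing books for one story arc."""

    def worker(chunk):
        for book_id in chunk:
            store.add_book(arc, book_id)

    threads = [
        threading.Thread(target=worker, args=(book_ids[i::workers],))
        for i in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

With the lock held, eight workers importing the same arc still produce a single read list containing every book. A real implementation would also need the database update itself to be consistent, which this sketch does not model.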

I am adding that now.

@gotson gotson added bug Something isn't working and removed triage labels Nov 30, 2023
@gotson gotson closed this as completed in a4384a6 Nov 30, 2023
@gotson
Owner

gotson commented Nov 30, 2023

The fix won't remove the duplicates, though; you will need to do the cleanup yourself.

@Shakahron
Author

Do you mean the fix won't remove duplicates already in the library? Or are duplicates just something that can't be prevented with more than one thread?

@gotson
Owner

gotson commented Nov 30, 2023

> Do you mean the fix won't remove duplicates already in the library? Or are duplicates just something that can't be prevented with more than one thread?

The fix will not remove the existing duplicates; you will have to delete them yourself.

The fix will prevent new duplicates (hopefully!!)
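
For the manual cleanup, a small Python sketch that lists likely duplicates from read list data that has already been exported. The `id`, `name`, and `bookIds` keys are assumed field names and `duplicate_candidates` is a hypothetical helper. It only reports candidates and deletes nothing.

```python
from collections import defaultdict


def duplicate_candidates(readlists):
    """For each name used by more than one read list, keep the largest and return the ids of the rest."""
    by_name = defaultdict(list)
    for readlist in readlists:
        by_name[readlist["name"]].append(readlist)

    extras = {}
    for name, group in by_name.items():
        if len(group) > 1:
            # Largest read list first; the remaining ones are the cleanup candidates.
            ordered = sorted(group, key=lambda rl: len(rl["bookIds"]), reverse=True)
            extras[name] = [rl["id"] for rl in ordered[1:]]
    return extras
```

When the books are split across two read lists of the same name, rather than one being complete, the books in the candidate would need to be moved to the kept read list before it is deleted.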

@Shakahron
Author

That's not a problem at all, thanks so much for taking the time on this.

Contributor

🎉 This issue has been resolved in 1.8.3 (Release Notes)

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 31, 2023