Larger Library Challenges #278

Closed
txwireless opened this issue Jun 14, 2023 · 7 comments
Labels
status(previewed) This issue or pull request should be fixed in a released beta version


txwireless commented Jun 14, 2023

For those of us with larger libraries (in my case 80K+), there are a number of things that I've been fighting through.

  1. Call stack issue: raised in "Maximum call stack size exceeded" (#230) and fixed by PR "fix Maximum call stack size exceeded issue" (#231).

  2. Error about maximum open files - fixed by adding ulimits in excess of library files to docker-compose.yml config, in my case:
    ulimits:
      nofile:
        soft: "131072"
        hard: "131072"

  3. Various breaks in the synchronization with errors like this:
    [6/14/2023, 9:18:53 AM] INFO Sync-Engine: Detected recoverable error: Request failed with status code 503
    [6/14/2023, 9:18:53 AM] INFO CLI-Interface: Detected recoverable error, refreshing iCloud connection & retrying (# 1)...
    [6/14/2023, 9:18:53 AM] INFO CLI-Interface: ----------------------------------------------------------------------------------------------------------------------------------------------------------
    [6/14/2023, 9:18:53 AM] INFO Sync-Engine: Error occurred with 54135 asset(s) left in the download queue, clearing queue...
    [6/14/2023, 9:18:53 AM] INFO Sync-Engine: Error occurred with 5 pending job(s), waiting for queue to settle...

which result in the library being locked when you attempt to rerun the sync:
[6/14/2023, 11:35:47 AM] INFO CLI-Interface: Experienced fatal error at 6/14/2023, 11:35:47 AM: LIBRARY_LOCKED (FATAL): Library locked. Use --force (or FORCE env variable) to forcefully remove the lock (Locked by PID 88)
[6/14/2023, 11:35:47 AM] INFO CLI-Interface: ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[6/14/2023, 11:35:47 AM] ERROR Error-Handler: LIBRARY_LOCKED (FATAL): Library locked. Use --force (or FORCE env variable) to forcefully remove the lock (Locked by PID 88)

  4. Other errors in synchronization such as:
    [6/13/2023, 9:55:32 PM] INFO CLI-Interface: ----------------------------------------------------------------------------------------------------------------------------------------------------------
    [6/13/2023, 9:55:32 PM] INFO CLI-Interface: Experienced fatal error at 6/13/2023, 9:55:32 PM: APP_SYNC (FATAL): Sync failed caused by SYNC_UNKNOWN_SYNC (FATAL): Unknown sync error caused by read ET)
    [6/13/2023, 9:55:32 PM] INFO CLI-Interface: ----------------------------------------------------------------------------------------------------------------------------------------------------------
    [6/13/2023, 9:55:32 PM] ERROR Error-Handler: APP_SYNC (FATAL): Sync failed caused by SYNC_UNKNOWN_SYNC (FATAL): Unknown sync error caused by read ETIMEDOUT (Error Code: 3989ed9b-106c-4f38-85dc-7670)

which don't lock the library.

  5. After an error and a restarted sync, the data already downloaded was completely wiped out. I don't have the logs for that, sorry, but it does raise the bigger issue, particularly for large libraries, of how robust the sync process is: how errors are handled, how things are resumed, and whether a rerun has to go through everything again or picks up where it left off. I appreciate that this is an evolving project, and I would contribute more, albeit with the limited abilities I have. Stability and error handling are definitely in the enhancement area and, like trying to eradicate ants from your yard, a never-ending story. I would be remiss not to bring it up, though.

  6. Definitely in the enhancement area: it would be nice if the downloaded assets were not dropped into a single directory with random-character file names. Instead, it would be nice to have a hierarchy of folders based on picture date, with the file names they have in the underlying library, similar to how icloudpd works. I was using icloudpd, but the project has largely stalled and the PRs for shared library support haven't been successful so far. As almost all of our assets are in our shared library, that led me to this project. I do appreciate the objectives of a dedicated photo downloader, etc. that you've enumerated, and I appreciate your fine work. I also appreciate the challenges of hacking through the iCloud website API to figure this stuff out.

So far I've only successfully downloaded about 25K of my assets but it is a start and nice to have a backup of them. I'll try to add additional details as I find potential issues/fixes for large library support. Cheers and appreciate your efforts.
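
For point 2, here is a minimal Python sketch (not part of this project) of how one might check, from inside the container, whether the open-file limit exceeds the number of files the process is expected to hold open. The function name `nofile_headroom` and the idea of comparing against the library size are assumptions for illustration; `resource` is a Unix-only module from the Python standard library.

```python
import resource


def nofile_headroom(expected_open_files: int) -> dict:
    """Compare this process's open-file limits with an expected file count.

    `expected_open_files` is an assumption supplied by the caller, e.g. the
    number of assets in the library.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", so any expected count fits.
    soft_is_enough = soft == resource.RLIM_INFINITY or soft >= expected_open_files
    return {"soft": soft, "hard": hard, "soft_is_enough": soft_is_enough}
```

Calling `nofile_headroom(80_000)` after changing the `ulimits` section shows whether the new soft limit took effect for the running process.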

@steilerDev
Owner

Thanks for the feedback - some points from my POV:

1 - Already discussed :)

2 - Great point, will add this to the docs!

3 - The sync error is expected on long downloads: Apple invalidates the initially loaded data after about an hour, which requires a metadata reload. This could be done more gracefully, but dying and restarting should work and is simple (see point 5 below). What should not happen, however, is a locking error. Something is wrong in that logic, and this should be a separate bug.

4 - There are some edge cases in which axios and the download process can die, and some of them I'm not catching yet.

3, 4 and 5 are somewhat related, because I tried to design the program so that it gracefully recovers after a sync error. It should load everything that is on disk, compare it to what's in the cloud, and decide what to keep and what to remove based on this. So the logs where you are seeing a full wipe would be helpful, since this should not happen :)
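
The recovery idea described here (load what is on disk, compare it to the cloud, decide what to keep and what to remove) comes down to a set comparison. A minimal Python sketch, assuming every asset can be identified by an ID string; `plan_sync` and its result keys are hypothetical names for illustration, not the tool's actual code:

```python
from typing import Dict, Set


def plan_sync(local_ids: Set[str], remote_ids: Set[str]) -> Dict[str, Set[str]]:
    """Decide what to keep, download and delete after an interrupted sync."""
    return {
        # Already on disk and still in the cloud: nothing to do.
        "keep": local_ids & remote_ids,
        # In the cloud but not on disk yet: still needs downloading.
        "download": remote_ids - local_ids,
        # On disk but no longer in the cloud: candidate for removal.
        "delete": local_ids - remote_ids,
    }
```

Under this model a restart only re-downloads the `download` set, so a full wipe would only follow if the local set were read as empty or the remote set came back empty.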

6 - Actually, the _All-Photos folder is only the storage for the assets. After they are downloaded, the tool also replicates the folder structure of your albums and restores the file names in this view:
[image]
This, however, only happens after an initial asset sync. I'm not using the actual names but the IDs created by Apple for two reasons. First, one asset can be linked in multiple folders, so storing copies in each folder would be space inefficient, which is why I'm linking them. Second, there can be equally named files in the library, so storing them all in a central place under their original names could cause collisions. So I'm going for Apple's UUID.
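
To illustrate the layout described in point 6, here is a minimal Python sketch of a central store keyed by ID with album folders that only contain symlinks under the original file names. The helper `link_into_album` and the example paths are assumptions for illustration and not the tool's implementation.

```python
import os
from pathlib import Path


def link_into_album(store: Path, album: Path, asset_id: str, original_name: str) -> Path:
    """Expose one stored asset inside an album folder under its original name."""
    album.mkdir(parents=True, exist_ok=True)
    target = store / asset_id
    link = album / original_name
    # Replace a leftover link or file from an earlier run.
    if link.is_symlink() or link.exists():
        link.unlink()
    # A relative link keeps working if the whole library folder is moved.
    link.symlink_to(os.path.relpath(target, start=album))
    return link
```

The same stored file can be linked from any number of albums without being copied, and two assets that both happen to be called `IMG_1181.arw` never collide in the central store because they are stored under different IDs.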

@steilerDev
Owner

I've added documentation around increasing file limits. The other issues are currently hard for me to investigate (especially since it is easy to lose track when this ticket includes several of them).

If you could open a separate issue/discussion for the remaining challenging issues, that would be highly appreciated (please include logs and error codes there).

The locking behaviour is something I'm looking into at the moment.
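
As background for the `LIBRARY_LOCKED ... (Locked by PID 88)` error reported above, here is a minimal Python sketch of one common way to tell a stale PID lock from a live one. This is a generic technique and an assumption about how such a lock could be checked, not a description of this tool's locking logic; the function names are hypothetical, and `os.kill(pid, 0)` is used with Unix semantics.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (Unix)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_lock(lock_file: Path) -> bool:
    """Take the lock unless another live process holds it.

    Not atomic: a real implementation would need to guard against two
    processes racing between the check and the write.
    """
    if lock_file.exists():
        try:
            owner = int(lock_file.read_text().strip())
        except ValueError:
            owner = None  # unreadable content is treated as a stale lock
        if owner is not None and owner != os.getpid() and pid_is_running(owner):
            return False
    lock_file.write_text(str(os.getpid()))
    return True
```

Note that inside a container PIDs are reused across restarts, so a PID check alone can misjudge a stale lock as live.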

I'll be closing this for now - but don't hesitate to open a new issue for each of your problems - check out my brand new contributing guidelines in case you are unsure what to do!

steilerDev pushed a commit that referenced this issue Jul 4, 2023

github-actions bot commented Jul 5, 2023

This issue should be resolved with version v1.1.1-beta.2, please confirm.

github-actions bot added the status(previewed) label Jul 5, 2023
@lonevvolf

So now that I've finally sync'ed my 300k+ elements, I definitely wanted to comment on this item. Especially the feedback:

Definitely in the enhancement area: would be nice to have the downloaded assets not dropped into a single directory with random character names of the files. Instead, would be nice to have a hierarchy of folders based on picture date with file names what they are in the underlying library similar to how icloudpd works. I was utilizing icloudpd but the project has largely stalled and the PRs for shared library support thus far haven't been successful. As almost all of our assets are in our share library, it lead me to this project. I do appreciate the objectives of a dedicated photo downloader, etc. that you've enumerated and appreciate your fine work. I also appreciate the challenges of hacking through iCloud website api to figure this stuff out.

100% this would be a great feature, even for smaller libraries. I don't really make use of Albums, but rather browse by dates (Year, Month type of folder structure). This is how Photos organizes itself, and it's pretty comfortable for browsing old memories. The Albums strategy is ok if that's part of your main workflow, but unfortunately doesn't even work for large libraries apparently (all Albums are empty and have a long ID number instead of a name). I am extremely happy that I finally have a backup of the files, but would love to see some automatic organization applied.

But I did just want to add a huge thanks to @steilerDev - this tool downloaded my entire library in just a few days, whereas my Mac had been working on this task for a few months in the native Photos app.

Owner

steilerDev commented Jul 10, 2023

[B]ut unfortunately doesn't even work for large libraries apparently (all Albums are empty and have a long ID number instead of a name)

This sounds like a bug, unless all your items are part of the shared library OR you can't see symlinks through your network share, check out this discussion.

This is how it is supposed to look:
[image]

I'm trying to explain the local file structure in here - I should add some more information around the goals of the design process.

We could 'hide' the folder that is actually holding all the data and create an 'All Photos' meta album that uses date-based formatting (and links to the assets). This would add (imho unnecessary) logic and complexity, because you should already be able to sort the 'All Photos' album by modified time and browse the pictures in the same order as on the phone (as I'm reading and applying this metadata). File names would still be ugly, but is AUR0lYIXVYq9SuZQ3OGBqbpEkY9P.arw really worse than IMG_1181.arw?

Owner

steilerDev commented Jul 10, 2023

Having a better way to access the library (through a web-based application) is something that I've already thought about. However, this is probably the lowest-priority work item at the moment.

Unless someone from the community (with maybe some experience in web technology) wants to take a stab at it, I would not hold my breath for getting this done anytime soon though.

@lonevvolf

This sounds like a bug, unless all your items are part of the shared library OR you can't see symlinks through your network share, #161.

Ok, that was a good tip - I am indeed running this in a Docker container on Synology. I found out that the file browsing app on the Synology (File Station) doesn't show the links, but they are actually there when browsing in bash or mounting the SMB share. So the album names and links are ok.

However, all my albums are empty due to the
ICLOUD_PHOTOS_COUNT_MISMATCH (WARN): Received unexpected amount of records (expected 55 CPLMaster & 55 CPLAsset records, but got 0 CPLMaster & 0 CPLAsset records for album BA42E172-2EE6-47C9-8344-333F0EAF48AE)
errors (see #231 and common warnings).

Nonetheless, the automatic year/month folder sorting (i.e. 2020/05) would be much appreciated. It can certainly also be done with symbolic links, and in that case there's really no issue with duplicate pointers to files, as each file can be allocated to a single folder.
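
A minimal Python sketch of what such a year/month view built from symbolic links could look like, as a standalone script run against an already synced library. It assumes the file modification time reflects the picture date (the maintainer mentions applying this metadata above); `build_date_view` and the folder names are hypothetical and not a feature of the tool.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def build_date_view(store: Path, view: Path) -> int:
    """Create view/YYYY/MM/<file> symlinks for every file in the store.

    Returns the number of links newly created, so a second run is a no-op.
    """
    created = 0
    for asset in sorted(store.iterdir()):
        if not asset.is_file():
            continue
        # Assumption: mtime was set to the capture date during the sync.
        taken = datetime.fromtimestamp(asset.stat().st_mtime, tz=timezone.utc)
        folder = view / f"{taken:%Y}" / f"{taken:%m}"
        folder.mkdir(parents=True, exist_ok=True)
        link = folder / asset.name
        if not link.is_symlink():
            link.symlink_to(os.path.relpath(asset, start=folder))
            created += 1
    return created
```

Because each asset lands in exactly one month folder, every file in the store gets a single link and the original folder can stay visible next to the date view.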

We could 'hide' the folder that is actually holding all the data and create an 'All Photos' meta album that uses date based formatting (and links to the assets).

This could work, but no need to hide the original folder I suppose. If people want to see it, that's fine.

file names would still be ugly, however is AUR0lYIXVYq9SuZQ3OGBqbpEkY9P.arw really worse than IMG_1181.arw

Not a major issue for me - agree that a nonsense name is a nonsense name. :)

Having a better way to access the library (through a web based application) #120 - however this is probably the lowest priority work item at the moment.

This is not really something needed from my end at least.
