Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match existing songs by spotify URL instead of name #1641

Merged
merged 18 commits into from
Dec 10, 2022

Conversation

Domi250
Copy link
Contributor

@Domi250 Domi250 commented Oct 28, 2022

Match existing songs by spotify URL instead of name

Description

The download option checks whether a song exists by looking for the current spotify name. Because spotify song names change quite often, this leads to the same song being downloaded twice (with different name).
Similarly, the songs deleted by sync are also found by matching current filenames with the song's current name on spotify, causing similar effects.

This commit uses the spotify URL instead of the filename to ensure uniqueness of song files. It does so by modifying the comment tag, which previously contained the YouTubeURL to the format YouTubeURL|SpotifyURL, and using this spotify URL to match songs for equality.

Related Issue

#1578

Motivation and Context

  • Songs are no longer being downloaded twice
  • Songs are now properly being deleted
  • Some minor errors fixed along the way

How Has This Been Tested?

Tested on (almost) clean Debian.

  • I downloaded + synced playlists while adding and removing songs from the a spotify playlist, while changing the --format and --overwrite {metadata,skip,force} parameters in various combinations.
  • I also tested spotdl meta for all filetypes and overwrite options

Todo

This is as far as I could get without help. Nonetheless, this is not a finished commit:

  • This branch is behind, it has to be merged with the recent v4.0.1 changes.
  • There's a TODO in util/metadata.py, where I used redundant code in order to make it work. It was the only way I knew to extract the comment ID3 tag.
  • I tried changing as little as possible. But there still are quite a few changes and some might not seem optimal. If so, just ask for my reasoning or revert it.

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the CONTRIBUTING document
  • I have read the CORE VALUES document
  • I have added tests to cover my changes
  • All new and existing tests passed

@Silverarmor Silverarmor changed the base branch from master to dev October 28, 2022 22:18
@Silverarmor
Copy link
Member

How does this account for songs that are on Spotify multiple times, e.g. in different albums? They have different Spotify URLs but we wouldn't want to download them multiple times...

@Silverarmor Silverarmor added the Enhancement Enhancing spotDL label Oct 28, 2022
@Domi250
Copy link
Contributor Author

Domi250 commented Oct 28, 2022

I'm not sure what you mean, could you give an exemplary spotdl command (chain)?

Usually these songs are only once within a playlist or album. When downloading a playlist they therefore are also downloaded only once.
I barely changed the spotdl functionality. Take downloading, for instance: I only added a check, whether the working directory contains a file with the exactly same spotify URL. If so, this file is skipped.
The check for a file by name is still performed, we would not want to overwrite any files without explicit permission.

@Silverarmor
Copy link
Member

The below are all the same song, same artist, same content, but are released under different EPs/singles/albums. This is a common occurrence.
The current behaviour will only download the song once, as it will disregard it later since a song exists with the same name ("Dean Lewis - Hurtless"). This is intended behaviour.

Your PR will attempt to download the song 3x times, resulting in 2x YT-DLP errors (unintended behaviour as the program wastes resources trying to donload multiple times

https://open.spotify.com/track/1JFX2Sj2ySyU6nIL7X3LWN?si=253198e8cc594a63
https://open.spotify.com/track/0wxRBFbcFylycUF7uf6nPO?si=a2d651307f1c416a
https://open.spotify.com/track/6HJizCbaqaEQG1eLjn341Z?si=c33ee30d3c7a423d

@Domi250
Copy link
Contributor Author

Domi250 commented Nov 5, 2022

I assume you are talking about downloading a playlist containing only these 3 songs:
spotdl download {playlist_url}

Yes, this crashes. However, this is an old spotdl problem which occurs when songs with the exactly same name are right next to each other in a playlist. (It doesn't matter if it's exactly the same song with the same ID or not, only the resulting name matters.)
However, this error is not deterministic. Therefore you might get different error messages (or none) and need to try it about 5 times.
When downloading the songs sequentially (spotdl download {single_song_url}) there won't be an error. The reason is, as I mentioned in my previous answer, that my PR checks the songs ID and keeps the existing filename check.

I realize now, that my PR title is confusing, maybe just bad, since it contains "instead of" while still performing both checks. It stems from the conceptual idea I had at the beginning to make the song ID the "primary key" of a song. Errors like the YT-DLP one are the only reason I've kept the filename check in - to ensure current functionality.
Depending on how double songs should be handled (skipped or downloaded under a different name with a counter appended) this would provide more flexibility, I think.

@Domi250
Copy link
Contributor Author

Domi250 commented Nov 5, 2022

Functional overview

It might be confusing to grasp what I tried to change and how, without going thoroughly through all the code. So I thought I'd create an overview.
Before implementing this, I really thought long about the best way to fix the "same song downloaded twice" bug and keep spotdl backward compatible.

Metadata of songs:

Changed comment tag format from YouTubeURL to YouTubeURL|SpotifyURL, if possible.

Download option:

Get all files (with the chosen suffix, like .mp3) from the working directory and gather their spotify URL from the comment tag, if possible.
(unchanged:) If the name of a song from the current query already exists in the working directory, skip it.
If songs from the current query match one of the URLs, they are not downloaded again. The name (and therefore metadata) of these songs is outdated, but it wont be altered unless the user explicitly wants to.

Behaviour for overwrite options:

  • skip: Skip the song, even if its metadata is outdated (unchanged)
  • metadata: Don't download the song anew, but update its metadata (unchanged)
  • force: Always download the file (unchanged)

Sync option

Get all previously downloaded songs from the JSON save-file.
Determine which songs were removed from the playlist, and delete those.
No longer checks for already downloaded songs, as download already does this.
(unchanged:) Update save-file and pass songs to the downloader

Behaviour for overwrite options:

All three options are irrelevant for deleting songs and therefore now irrelevant for sync. Simply pass the used option to the downloader.

Meta option

Instead of using a songs name to search for its metadata, try using its spotify URL to match the song.

Copy link
Member

@xnetcat xnetcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please merge with upstream dev branch, if you can't merge it for some reason open a new PR, or let me know.

spotdl/download/downloader.py Outdated Show resolved Hide resolved
spotdl/download/downloader.py Outdated Show resolved Hide resolved
@GreenCobalt
Copy link

GreenCobalt commented Nov 22, 2022

The below are all the same song, same artist, same content, but are released under different EPs/singles/albums. This is a common occurrence. The current behaviour will only download the song once, as it will disregard it later since a song exists with the same name ("Dean Lewis - Hurtless"). This is intended behaviour.

Your PR will attempt to download the song 3x times, resulting in 2x YT-DLP errors (unintended behaviour as the program wastes resources trying to donload multiple times

https://open.spotify.com/track/1JFX2Sj2ySyU6nIL7X3LWN?si=253198e8cc594a63 https://open.spotify.com/track/0wxRBFbcFylycUF7uf6nPO?si=a2d651307f1c416a https://open.spotify.com/track/6HJizCbaqaEQG1eLjn341Z?si=c33ee30d3c7a423d

Those 3 are not the same song. The last is the normal version, and the other two are the acoustic version and a remix. Should spotDL not download these versions separately (being different versions of the same song), instead of rejecting them because they all have the same name?

@Domi250
Copy link
Contributor Author

Domi250 commented Nov 23, 2022

Those 3 are not the same song. The last is the normal version, and the other two are the acoustic version and a remix. Should spotDL not download these versions separately (being different versions of the same song), instead of rejecting them because they all have the same name?

You're right. And I also think that spotdl indeed should consider different song versions. That, and similar issues, is what my patch is about :-)
If the songs are distinguished by URL, spotdl would be able to tell them apart if just a single piece of metadata differs.

My patch won't solve your problem yet, but it's the first step.

@Domi250
Copy link
Contributor Author

Domi250 commented Nov 23, 2022

Please merge with upstream dev branch, if you can't merge it for some reason open a new PR, or let me know.

I managed to merge with the master branch. Is it really that important to merge with dev?

I'm new to Github. When I created my fork, I forked the master branch, and therefore the dev branch doesn't show up in my forked repo. I didn't find a way to merge with the dev branch other than using the Github conflict editor, which seems overly complicated at first glance.

Apart from that, I still have two TODOs in there which I can't fix on my own. One is described in detail, the other is because of a bug, where (as far as I can see) the "output" variable is misused.

@Domi250
Copy link
Contributor Author

Domi250 commented Nov 26, 2022

Nevermind, I managed to merge into dev :)

Now all that's left are the two TODOs I can't fix myself. If you have any questions let me know.

@jakermx
Copy link

jakermx commented Dec 3, 2022

I have the same issue, I suggest adding the track id as metadata and add an option to scan storage locations and crete exclude list to be used as --archive target

@xnetcat
Copy link
Member

xnetcat commented Dec 4, 2022

Will merge this for v4.1.0

@Domi250
Copy link
Contributor Author

Domi250 commented Dec 4, 2022

Hi Jakub, great to hear from you.
So what does 4.1.0 mean for this PR and me? Since my merge into the dev branch, 14 new commits have been made and every new spotdl version makes the merging process harder.
I've done everything I can do on my own. I could merge again into the dev branch, but this won't help if you are too diligent and create new features and patches all the time ;)

@xnetcat
Copy link
Member

xnetcat commented Dec 4, 2022

v4.1.0 will be after v4.0.6, I have some major API changes in mind that's why next release won't be a minor one. I will resolve all the conflicts that will arise, so you won't have to worry about this + I will fix all the TODO's in code.

@Domi250
Copy link
Contributor Author

Domi250 commented Dec 4, 2022

Oh, that's awfully nice of you. If there's anything I can do or if there's anything you'd like to know, just ask.

@xnetcat xnetcat merged commit 85c307d into spotDL:dev Dec 10, 2022
@xnetcat xnetcat mentioned this pull request Dec 16, 2022
@Domi250 Domi250 deleted the url-matching branch January 12, 2023 11:57
@isle9
Copy link

isle9 commented Feb 24, 2023

Is this supposed to be implemented in dev already then? Because I'm on dev and I'm not getting the spotify urls in my comments in addition to the regular yt urls.

@Domi250
Copy link
Contributor Author

Domi250 commented Feb 24, 2023

Yes it is, but there have been quite some changes to my code.
One of them is that the Spotify url is no longer saved in the comment, but as "Original Audio Source" (WOAS ID3-Tag).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Enhancing spotDL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants