Utility for downloading fanfiction in bulk from the Archive of Our Own


thenianblues/ao3downloader

 
 


What is this?

This is a program intended to help you download fanfiction from the Archive of Our Own in bulk. This program is primarily intended to work with links to the Archive of Our Own itself, but has a secondary function of downloading any Pinboard bookmarks that link to the Archive of Our Own. You can ignore the Pinboard functionality if you don't know what Pinboard is or don't use Pinboard.

Table of Contents

  • Announcements: List of changes that may be of note for returning users (not a complete changelog).
  • Instructions: Complete instructions for downloading and starting ao3downloader on Windows and Mac (running ao3downloader on Linux is left as an exercise for the reader). I have tried to make this as easy to follow as possible, even for those who have little experience with computers. If any of it is confusing, or you have a suggestion to improve the instructions, please contact me.
  • Menu Options: Explanation of the options you will see when you start ao3downloader and what they do. Note that most of these options will in turn present you with a series of prompts. These should largely be self-explanatory; however, if you are confused by any of the prompts, your question may be answered in the notes.
  • Notes: Explanation of some of ao3downloader's features and quirks that may not be immediately obvious. I recommend reading this.
  • Known Issues: List of bugs that I know about but haven't yet been able to fix. If you encounter strange behavior, there may be a workaround here.
  • Troubleshooting: If you encounter a problem running the script, please read this section carefully and do all of the steps in order to the best of your ability before sending a bug report.
  • Contact: How to get in contact with me. Don't be shy!

Announcements

The script now works with python version 3.10!

As of March 8, 2022 I have changed how file names are generated to allow for the inclusion of non-alphanumeric characters (cnovel fans rejoice). If you have a Process going on which relies on file names for the same fic being the same, please take note of this if/when you download the new version of the code.

As of May 14, 2022 I have reduced the maximum length of file and folder names generated by the script from 100 characters to 50 characters. This is to reduce the incidence of download failures caused by exceeding the maximum Windows file path length. Once again, note that this may cause the same fic to be saved under a different name than when it was downloaded previously.
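
To illustrate what the 50-character limit means in practice, here is a minimal sketch of turning a fic title into a file name. The helper name `make_filename`, the set of characters it strips, and the "strip, then truncate" order are all assumptions made for this example; this is not the script's actual naming code.

```python
import re

MAX_NAME_LENGTH = 50  # the limit described above (previously 100)

# Characters that Windows does not allow in file names.
# Which characters the real script removes is an assumption here.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')


def make_filename(title, max_length=MAX_NAME_LENGTH):
    """Hypothetical helper: drop invalid characters, then truncate."""
    cleaned = INVALID_CHARS.sub("", title).strip()
    # Truncate, then trim any trailing space left at the cut point.
    return cleaned[:max_length].rstrip()


# A 120-character title is cut down to 50 characters.
print(len(make_filename("a" * 120)))
```

Because non-alphanumeric characters are kept, a title written in Chinese characters passes through this sketch unchanged. It also shows why a fic downloaded under the old limit can end up with a different name: the same title is simply cut at a different point.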

Instructions

  1. install python. make sure to install version 3.9.0 or later. as of the time of writing, the latest version (3.10.5) will work.
  2. download the repository as a zip file. the "repository" means the folder containing the code.
    • if you are reading this on github, you can download the repository by clicking on the "Code" button in github and selecting "Download ZIP"
    • if you are reading this on my website, you can download the repository by clicking the button at the top of the page that says "Click to Download"
  3. unzip the zip file you just downloaded. this will create a folder. open it. if you see a file called "ao3downloader.py" then you're in the right place.
  4. run the script using the instructions for your operating system:
    • windows: double-click on "ao3downloader.cmd"
    • mac:
      • open a terminal window pointed to the folder containing "ao3downloader.py".
        • You can do this by right-clicking on the folder, going to Services at the bottom of the menu, and clicking "New Terminal at Folder". Alternatively, you can type "cd " and drag the folder to the terminal to copy the folder path.
      • enter the following commands one by one:
      python3 -m venv venv
      source venv/bin/activate
      python3 -m pip install --upgrade pip
      pip install -r requirements.txt
      python3 ao3downloader.py
      • after this initial setup, when you want to run the program you only need to enter:
      source venv/bin/activate
      python3 ao3downloader.py
      • note that if you delete the "venv" folder for any reason you will need to do the initial setup again.
    • other platforms: ao3downloader should work on any platform that supports python; however, you will need to do your own research into how to run python programs on your system.

Menu Options Explanation

  • 'download from ao3 link' - this works for most links to ao3. for example, you can use this to download a single work, a series, or any ao3 page that contains links to works or series (such as your bookmarks or an author's works). the program will download multiple pages automatically without the need to enter the next page link manually.
  • 'get all work links from an ao3 listing (saves links only)' - instead of downloading works, this will simply get a list of all the work links on the page you specify (as well as subsequent pages) and save them in a .txt file inside the downloads folder (one link on each line). this is useful if you prefer to download fics through FanFicFare or some other method, rather than using the ao3 download buttons. this option is much, much faster than a full download - usually only a few seconds per page.
  • 'download latest version of incomplete fics' - you can use this to check a folder on your computer (and any subfolders) for files downloaded from ao3 that are incomplete works. for each incomplete fic found, the program will check ao3 to see if there are any new chapters, and if so, will download the new version to the downloads folder.
  • 'download missing fics from series' - checks for files downloaded from ao3 that are part of a series, and for each series found, checks the series page on ao3 and downloads any fics in the series that are not already in your library.
  • 're-download fics saved in one format in a different format' - checks for all files downloaded from ao3 and redownloads every fic it finds (if possible - failed downloads due to deletion or other reasons will be logged). good if you change your mind about what format you want your library to be in. (file type choices for this option are not saved to settings.)
  • 'download bookmarks from pinboard' - download ao3 bookmarks from pinboard. ignore this if you don't use pinboard. to get the api token go to settings -> password on the pinboard website.
  • 'convert logfile into interactable html' - all downloads from ao3 (and some other actions) are logged in a file called log.jsonl in the 'logs' folder (if this folder does not exist it means no logs have been generated yet), along with information such as whether or not the download was successful, details about errors encountered, and so on. this option converts log.jsonl into a much more human-readable, searchable and sortable (click on the column headers to sort) html file that can be opened in any browser. the file is called 'logvisualization.html' and is saved in the same place as log.jsonl.
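
If you would rather inspect log.jsonl yourself, the sketch below shows one way to read it, assuming it follows the usual JSON Lines convention of one JSON object per line. The function name `parse_log_lines` is made up for this example, and no particular field names are assumed, since the readme does not list them.

```python
import json
from pathlib import Path


def parse_log_lines(lines):
    """Parse an iterable of JSON Lines strings into a list of dicts.

    Blank lines are skipped. An open file can be passed directly,
    since iterating over a file yields its lines.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if line:
            entries.append(json.loads(line))
    return entries


if __name__ == "__main__":
    log_path = Path("logs") / "log.jsonl"
    if log_path.exists():
        with open(log_path, encoding="utf-8") as f:
            entries = parse_log_lines(f)
        print(f"{len(entries)} log entries found")
    else:
        # Matches the note above: no 'logs' folder means no logs yet.
        print("no logs have been generated yet")
```

This only reads the file, so it should be safe to run alongside the features that depend on log.jsonl staying unmodified.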

Notes

  • IMPORTANT: some of your input choices are saved in a file called settings.json (in the same folder as ao3downloader.py). In some cases you will not be able to change these choices unless you clear your settings by deleting settings.json (or editing it, if you are comfortable with json). In addition, please note that saved settings include passwords and keys and are saved in plain text. Use appropriate caution with this file.
  • The purpose of entering your ao3 login information is to download archive-locked works or anything else that is not visible when you are not logged in. If you don't care about that, there is no need to enter your login information.
  • Try to keep your ao3 browsing to a minimum while the script is running. It won't break anything, but it may cause you to hit ao3's limit on how many hits to the site you are allowed within a certain time frame. This limit is per user, or per IP if you are not logged in. If this happens, the script will pause for 5 minutes to let the limit reset, and you may see a "Retry later" message when you try to open an ao3 page during that time. Don't be alarmed by this, just wait it out.
  • If you choose to 'get works from series links', then whenever the script encounters a work that is part of a series, it will also download the entire series that the work is a part of. This can dramatically extend the amount of time the script takes to run. If you don't want this, choose 'n' when you get this prompt. (Note that this will cause the program to ignore all series links, including e.g. series that you have bookmarked.)
  • If you choose to 'download embedded images' the script will look for image links on all works it downloads and attempt to save those images to an 'images' subfolder. Images will be titled with the name of the fic + 'imgxxx' to distinguish them.
    • Note that this feature does not encode any association between the downloaded images and the fic file aside from the file name.
    • Most file formats will include embedded image files anyway, regardless of whether you choose this option. I have confirmed this for PDF, EPUB, MOBI, and AZW3 file formats. (If you saw me contradict this in an earlier version of this readme... no you didn't)
    • Should an image download fail, the details of the failure will be logged in the log file with the message 'Problem getting image' along with the work link and the image link. It's a good idea to check the log file for these messages, since you may still be able to download the image manually or track it down some other way.
  • If you need to stop a download in the middle, you can just close the window. When you restart the script:
    • If you are using the option 'download from ao3 link', you will be given an option to restart the download from the page you left off on. Some downloads may be repeated.
    • If you are using the option 'download bookmarks from pinboard' or 're-download fics saved in one format in a different format', the list of fics to download will be retrieved as normal but will then be filtered to remove work links that meet the following conditions:
      • A record of a download attempt for that link is present in the log file AND
        • There is a fic with the same title already in the downloads folder OR
        • The download was marked as unsuccessful
    • If you are using the option 'download latest version of incomplete fics' or 'download missing fics from series', just make sure to add any fics you don't want to download again to your library (that is, the folder you entered when prompted 'input path to folder containing files you want to check for updates') and clean up any old versions before re-starting the download.
    • Most methods of avoiding repeat downloads rely on a file called log.jsonl which is generated by the script. Make sure not to move, delete, or modify log.jsonl if you want these features to work. (Using the option to generate the log visualization file is fine.)
  • When checking for incomplete fics, the code makes certain assumptions about how fic files are formatted. I have tried to make this logic as flexible as possible, but there is still some possibility that not all incomplete fics will be properly identified by the updater, especially if the files are old (since ao3 may have made changes to how they format fics for download over time) or have been edited.
  • If you need to keep a different version of python on your system for some other purpose, please note that these instructions may not work as expected when multiple versions of python are installed. However, I can point you toward the following resources:
    • Windows: the py launcher may be helpful to you
    • Mac and Linux: pyenv may be helpful to you
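
The rule in the notes above for skipping links when you restart a pinboard or re-download run can be written out as a small sketch. This is only an illustration of the stated condition: the function name `should_skip` and the log entry keys `link`, `title`, and `success` are assumptions, not the real names used in log.jsonl.

```python
def should_skip(link, log_entries, downloaded_titles):
    """Return True if `link` would be filtered out under the rule above.

    log_entries: list of dicts, one per logged download attempt.
    downloaded_titles: set of fic titles already in the downloads folder.
    The keys 'link', 'title' and 'success' are hypothetical.
    """
    for entry in log_entries:
        if entry.get("link") != link:
            continue
        # A download attempt for this link is present in the log, AND ...
        already_have = entry.get("title") in downloaded_titles  # ... same title exists, OR
        failed = entry.get("success") is False                  # ... it was marked unsuccessful
        if already_have or failed:
            return True
    return False


# Made-up example data.
log = [
    {"link": "link-a", "title": "First Fic", "success": True},
    {"link": "link-b", "title": "Second Fic", "success": False},
]
have = {"First Fic"}

to_download = [l for l in ["link-a", "link-b", "link-c"] if not should_skip(l, log, have)]
print(to_download)
```

In this example only `link-c` remains, because it has no record in the log at all. A link with a logged attempt whose file is missing and whose download succeeded would also be kept.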

Known Issues

  • With the exception of series links, if you enter a link to an ao3 page that contains links to works or series, but does not support multiple pages of results, the script will loop infinitely. Most notably, this applies to user dashboard pages. If this happens, you can close the window to get out of the loop.
  • Works that contain certain archive messages in either the work text or the tags may cause unexpected behavior. These problem phrases are:
    • Error 404
    • This work could have adult content.
    • This work is only available to registered users of the Archive
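
If you suspect a particular work is affected, a quick way to check is to search its text or tags for the phrases listed above. The sketch below does a plain, case-sensitive substring search; the function name `find_problem_phrases` is made up for this example, and whether the script itself matches these phrases exactly this way is an assumption.

```python
# The three phrases from the list above.
PROBLEM_PHRASES = (
    "Error 404",
    "This work could have adult content.",
    "This work is only available to registered users of the Archive",
)


def find_problem_phrases(text):
    """Return the problem phrases that appear in `text` (exact substring match)."""
    return [phrase for phrase in PROBLEM_PHRASES if phrase in text]


# Made-up example: a fic whose summary happens to mention a 404 page.
summary = "In which the hero's love letter returns Error 404 and chaos ensues."
print(find_problem_phrases(summary))
```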

Troubleshooting

  • Make sure you are using python version 3.9.0 or later. To check which version of python you are using:
    • Windows: open a command prompt and enter "python --version"
    • Mac: open a terminal window and enter "python3 --version"
  • If you are able to create logvisualization.html (menu option 'v'), take a look through the logs to see if there are any helpful error messages.
  • If there are no logs or the logs are unhelpful, look for a folder called "venv" inside the repository. Delete "venv" and try re-running the script.
  • If deleting venv doesn't work, try deleting the entire repository and re-downloading from github (but remember to save your existing downloads if you have any!)
  • If re-downloading the repository doesn't work, try uninstalling and reinstalling python.
    • Make sure you install version 3.9.0 or later.
    • Choose "Customize installation" when prompted, and check the "Add Python to environment variables" checkbox when it appears. (This option was previously called "add to PATH".) Everything else can be left as default.
  • If reinstalling python doesn't work, see this stackoverflow answer.
  • If you have tried all of the above and it still doesn't work, see below for how to send me a bug report.
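
As an alternative to the "--version" commands in the first step above, the version check can also be done from inside python. This is a minimal sketch; the function name `is_supported` is made up for this example, and the 3.9 minimum comes from the instructions in this readme.

```python
import sys

MINIMUM = (3, 9)  # minimum version stated in this readme


def is_supported(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if is_supported():
    print("python version is new enough")
else:
    print("python version is too old, please install 3.9.0 or later")
```

Save this as a .py file and run it with the same command you use to start ao3downloader, so that it reports on the python installation the script is actually using.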

Questions? Comments? Bug reports?

Feel free to head over to the discussion board and make a post, or create an issue. I prefer to communicate through the above channels if possible; however, I understand that many of my users don't have github accounts and may not want to make one just for this, so you can also email me at [email protected] if you prefer. Please include "ao3downloader" in the subject line of emails about the downloader. If you are reporting a bug, please describe exactly what you did to make the bug happen to the best of your ability. (More is more! Be as detailed as possible.)

(Please note that while I will absolutely do my best to get back to you, I can't make any promises - I have a job, etc.)
