URLExtractor

A script that extracts URLs from exported WhatsApp chats into a CSV file and annotates them.

Todo

  • Comment all methods.

  • Implement error logging.

  • Duplicate links should not be allowed.

    • But this might cause a problem when the links already sent to the API include the repeated link.
    • This might be an issue that appending the links could solve.
  • Make retrieving the last date in the exported chat faster.

  • Make parsing URLs faster with BeautifulSoup.

  • Make metadata parsing for particular websites much more detailed (e.g. YouTube, GitHub).

  • Make the script able to extract the titles of PDFs; this might require downloading each PDF.
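
One possible approach to the duplicate-links item is sketched below. It is only an illustration, not the script's actual code: the function names `load_seen_urls` and `append_new_urls`, the `url` column name, and the single-column CSV layout are all assumptions. The idea is to read the URLs already in the CSV, then append only unseen ones, so links that were already written (and possibly already sent to the API) are never rewritten or repeated.

```python
import csv
from pathlib import Path


def load_seen_urls(csv_path, url_column="url"):
    """Return the set of URLs already stored in the CSV (empty set if no file)."""
    path = Path(csv_path)
    if not path.exists():
        return set()
    with path.open(newline="", encoding="utf-8") as f:
        return {row[url_column] for row in csv.DictReader(f) if row.get(url_column)}


def append_new_urls(csv_path, urls, url_column="url"):
    """Append only URLs not yet in the CSV; return the added URLs in input order."""
    seen = load_seen_urls(csv_path, url_column)
    new_urls = []
    for url in urls:
        if url not in seen:
            seen.add(url)  # also filters repeats within this batch
            new_urls.append(url)

    path = Path(csv_path)
    # Write the header only when the file is new or empty.
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        # Assumption: the CSV has a single column holding the URL.
        writer = csv.DictWriter(f, fieldnames=[url_column])
        if write_header:
            writer.writeheader()
        writer.writerows({url_column: u} for u in new_urls)
    return new_urls
```

Because the function returns only the newly added URLs, the caller could pass just that list on to the annotation step, which would avoid sending the same link to the API twice.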