Refactoring importing process #92

Open
blackforestboi opened this issue Jan 31, 2017 · 0 comments

Hey folks,

So I made a click dummy for the new importing process. Happy to hear your input on it; the wireframes can be changed quite quickly. It starts with the screen you would see after clicking "Import History & Bookmarks" in the sidebar. You can access it here: https://invis.io/KZ8XQZ1BR#/216813851_SETUP_-_Analyse_URLS

Bold elements can be retrieved via the Chrome API.

@aquibm wanted to do the front-end part with React.

@ShamariFeaster I think this is highly relevant for your refactoring

On the development side, there are 7 modules that need to be built:

  1. Accessing history URLs via the chrome.history API and storing them as "empty shells" (without any crawled content) in PouchDB, containing:
    • Unique ID
    • url
    • title
    • text: none (default)
    • lastVisitTime
    • History_Item_ID (from Chrome ID)
    • array of visits
      • id
      • visitTime
    • is_bookmark: false (default)
    • bookmark_ID: none (default)
    • bookmark_date_added: none (default)
    • download_status: failed/successful/not_started
  2. Accessing bookmark URLs via the chrome.bookmarks API and storing them as "empty shells" (without any crawled content) in PouchDB.
    • First check whether the URL already exists. If yes, update:
      • is_bookmark: true
      • bookmark_ID: none (default)
      • bookmark_date_added: none (default)
    • If no, add empty shell containing:
      • Unique ID
      • url
      • title
      • text: none (default)
      • lastVisitTime (empty by default)
      • History_Item_ID (from Chrome ID)
      • array of visits (empty as for now)
        • id
        • visitTime
      • is_bookmark: true
      • bookmark_ID:
      • bookmark_date_added:
  3. Module that checks the file type and routes the URL to the matching download module:
    • HTML Document
    • PDF
  4. Module that downloads HTML via XMLHttpRequest
    • Update the stored record for the URL:
      • text
  5. Module that downloads PDFs
    • Update the stored record for the URL:
      • text
  6. Module that ensures the download continues if errors come up while downloading (including error logging).
    • Some issues currently stop the download, which should not happen. For example:
      • Security warnings (phishing websites)
      • Response times that are too long (implement a timeout)
  7. Module that keeps tabs on all the already (successfully) downloaded URLs, so that an import can easily be interrupted (browser restart, pausing, heavy error) and resumed. It logs progress during the import and checks whether a URL was already imported before the import process restarts.
    • Store a list of all URLs that were successful
      • When importing, check whether the URL was already downloaded
    • Store a list of all URLs that failed, including error messages
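
To make modules 3, 6 and 7 a bit more concrete, here is a rough sketch of how they could fit together. It is only an illustration under assumptions: `pickDownloader`, `withTimeout`, `ImportLog` and `runImport` are made-up names, the file type is guessed from a Content-Type header value, and the actual download is passed in as a function so the sketch does not depend on any browser API.

```typescript
// Hypothetical names throughout; none of this is existing project code.
type DownloadKind = "html" | "pdf" | "unsupported";

// Module 3: choose a download module from a Content-Type header value.
function pickDownloader(contentType: string | null): DownloadKind {
  const [first = ""] = (contentType ?? "").split(";");
  const mime = first.trim().toLowerCase();
  if (mime === "text/html" || mime === "application/xhtml+xml") return "html";
  if (mime === "application/pdf") return "pdf";
  return "unsupported";
}

// Module 6: reject if the work takes longer than `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Module 7: remember which URLs succeeded or failed so an import can resume.
class ImportLog {
  readonly succeeded = new Set<string>();
  readonly failed = new Map<string, string>(); // url -> error message

  isDone(url: string): boolean {
    return this.succeeded.has(url);
  }
  markSuccess(url: string): void {
    this.succeeded.add(url);
    this.failed.delete(url);
  }
  markFailure(url: string, message: string): void {
    this.failed.set(url, message);
  }
}

// Modules 6 and 7 together: a failing URL is logged and the loop moves on.
async function runImport(
  urls: string[],
  download: (url: string) => Promise<string>, // assumed to resolve with the page text
  log: ImportLog,
  timeoutMs = 10000,
): Promise<void> {
  for (const url of urls) {
    if (log.isDone(url)) continue; // already imported in an earlier run
    try {
      await withTimeout(download(url), timeoutMs);
      // The downloaded text would be written to the URL's record here.
      log.markSuccess(url);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      log.markFailure(url, message);
    }
  }
}
```

In a real extension the two collections in `ImportLog` would have to be persisted (for example in PouchDB) to survive a browser restart; keeping them in memory here is just to keep the sketch short.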

Open questions:

  • Are the history ID and the bookmark ID the same? If yes, we could check for the bookmark ID directly in the history API request and store the is_bookmark (etc.) fields with it.
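
Independent of how the ID question turns out, matching by URL (as module 2 already describes) would work either way. Below is a minimal sketch of the "empty shell" record and the bookmark merge. Assumptions: the URL doubles as the PouchDB `_id`, "none" is stored as `null`, the update case copies the bookmark's id and dateAdded, and the `...Like` interfaces are local stand-ins for the fields read from Chrome's HistoryItem, VisitItem and BookmarkTreeNode objects. `shellFromHistory` and `mergeBookmark` are hypothetical names, and an in-memory `Map` stands in for the database.

```typescript
// Local stand-ins for the Chrome objects; only the fields used here.
interface HistoryItemLike { id: string; url?: string; title?: string; lastVisitTime?: number; }
interface VisitItemLike { visitId: string; visitTime?: number; }
interface BookmarkLike { id: string; url?: string; title?: string; dateAdded?: number; }

// The "empty shell" record, using the field names from the list above.
interface PageDoc {
  _id: string; // Unique ID; assumption: the URL itself
  url: string;
  title: string;
  text: string | null; // filled in later by the download modules
  lastVisitTime: number | null;
  History_Item_ID: string | null;
  visits: { id: string; visitTime: number | null }[];
  is_bookmark: boolean;
  bookmark_ID: string | null;
  bookmark_date_added: number | null;
  download_status: "not_started" | "successful" | "failed";
}

// Module 1: build a shell from one history item and its visits.
function shellFromHistory(item: HistoryItemLike, visits: VisitItemLike[]): PageDoc | null {
  if (!item.url) return null; // nothing to import without a URL
  return {
    _id: item.url,
    url: item.url,
    title: item.title ?? "",
    text: null,
    lastVisitTime: item.lastVisitTime ?? null,
    History_Item_ID: item.id,
    visits: visits.map((v) => ({ id: v.visitId, visitTime: v.visitTime ?? null })),
    is_bookmark: false,
    bookmark_ID: null,
    bookmark_date_added: null,
    download_status: "not_started",
  };
}

// Module 2: update an existing shell, or add a new one for the bookmark.
function mergeBookmark(docs: Map<string, PageDoc>, bm: BookmarkLike): void {
  if (!bm.url) return; // bookmark folders have no URL
  const existing = docs.get(bm.url);
  if (existing) {
    existing.is_bookmark = true;
    existing.bookmark_ID = bm.id;
    existing.bookmark_date_added = bm.dateAdded ?? null;
    return;
  }
  docs.set(bm.url, {
    _id: bm.url,
    url: bm.url,
    title: bm.title ?? "",
    text: null,
    lastVisitTime: null,
    History_Item_ID: null,
    visits: [],
    is_bookmark: true,
    bookmark_ID: bm.id,
    bookmark_date_added: bm.dateAdded ?? null,
    download_status: "not_started",
  });
}
```

In the extension itself, the inputs would come from the chrome.history and chrome.bookmarks APIs, and the `Map` would be replaced by reads and writes against PouchDB.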
