Add YouTube downloading support via `youtube-dl’

Also requires rewriting the HTML to show the downloaded video.

See the hooks heading below.

Other custom backends?

CANCELLED Use the process sentinel to refresh the Attachments section when the download has completed

State “CANCELLED” from “TODO” [2016-08-17 Wed 22:32]

Value of point or the current buffer may have changed in the meantime. We will leave this up to the user.

Detect unsuccessful downloads?

Delete the corresponding ARCHIVED_AT property if wget did not exit properly and the directory is empty.

Fix unsavory list splicing via append/apply

State “DONE” from “TODO” [2016-09-11 Sun 20:22]

See org-board-wget-call.

Add customization options

State “DONE” from “TODO” [2016-08-17 Wed 22:24]

Show/hide wget buffer.

State “DONE” from “TODO” [2016-08-17 Wed 22:24]

Default wget flags.

Hooks

(Preprocessing) Add hook list for before downloading

Why is this needed?

Say I’m downloading from Google Cache. They don’t allow a wget user agent, so we need to use the `alist’ option “No-Agent”. A pre-processing hook could fix this so that you don’t even have to add the WGET_OPTIONS value yourself: it should be added for you.
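A minimal sketch of what such a pre-processing hook might look like. The hook variable and both functions are hypothetical names, not part of org-board, and the host name used to recognise a Google Cache URL is an assumption. The sketch relies on wget's documented behaviour that an empty `--user-agent=` suppresses the User-Agent header.

```elisp
(defvar my/org-board-before-archive-functions nil
  "Hypothetical abnormal hook run before downloading.
Each function is called with the URL and the current list of wget
options, and returns the (possibly extended) list of options.")

(defun my/org-board-strip-agent-for-google-cache (url options)
  "Return OPTIONS with an empty user agent added if URL is a Google Cache URL."
  ;; Assumption: Google Cache pages are served from this host.
  (if (string-match-p "webcache\\.googleusercontent\\.com" url)
      (cons "--user-agent=" options)
    options))

(defun my/org-board-run-before-archive (url options)
  "Thread OPTIONS through every function in the hypothetical hook for URL."
  (dolist (fn my/org-board-before-archive-functions options)
    (setq options (funcall fn url options))))

(add-hook 'my/org-board-before-archive-functions
          #'my/org-board-strip-agent-for-google-cache)
```

Each hook function returns a new option list rather than mutating state, so several pre-processors can be chained without knowing about each other.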

(Postprocessing) Use a process filter to catch downloaded links and pass to hook functions later

This would be for, say, auto-downloading videos from YouTube.

Show which org-board process finished

State “DONE” from “TODO” [2016-08-17 Wed 22:24]

Done by showing which org entry it belonged to (using process-put).
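As an illustration of the `process-put' approach, here is a rough sketch rather than the code org-board actually uses. The function names and the `my-org-entry' property key are made up for the example.

```elisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

(defun my/org-board-sentinel (process event)
  "Report which Org entry PROCESS belonged to once it has finished."
  (when (memq (process-status process) '(exit signal))
    (message "org-board finished for \"%s\": %s"
             (process-get process 'my-org-entry)
             (string-trim event))))

(defun my/org-board-start (heading program &rest args)
  "Start PROGRAM with ARGS asynchronously and tag the process with HEADING."
  (let ((process (apply #'start-process "org-board" nil program args)))
    ;; Store the entry on the process itself so the sentinel can read it back.
    (process-put process 'my-org-entry heading)
    (set-process-sentinel process #'my/org-board-sentinel)
    process))
```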

Allow deletion of entries

State “DONE” from “TODO” [2016-08-17 Wed 22:25]

Provide org-board

State “DONE” from “TODO” [2016-08-17 Wed 22:23]

Quickly open .html page

State “DONE” from “TODO” [2016-08-18 Thu 14:56]

At the moment, a link is provided in the ARCHIVED_AT property to the root archival folder. This is correct, but it would be better if there were a function to recurse through the directory structure and offer up all .html files for opening.

Use something like `find’ in dired.

Steps:

1. Take the most recent ARCHIVED_AT folder.
2. Run something like `find . -name "*.html"’. If there are none, open the folder; if there is one, open it; if there are several, offer them up for choice. With a prefix argument, open in Emacs.
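The steps above could be sketched as follows, using `directory-files-recursively' (Emacs 25.1 and later) in place of an external `find'. The function name is hypothetical, and this is not necessarily how org-board implements it. Files saved with a query string after `.html' will not match this regexp.

```elisp
(defun my/org-board-open-html (root &optional in-emacs)
  "Open an .html file found under ROOT, the most recent archive folder.
With IN-EMACS non-nil (a prefix argument interactively), visit the
file in Emacs instead of the system browser."
  (interactive "DArchive folder: \nP")
  (let* ((files (directory-files-recursively root "\\.html\\'"))
         (choice (cond ((null files) nil)                ; none found
                       ((null (cdr files)) (car files))  ; exactly one
                       (t (completing-read "Open: " files nil t)))))
    (cond ((null choice) (dired root))
          (in-emacs (find-file choice))
          (t (browse-url-of-file choice)))))
```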

Quick set of URL property, then archive immediately

State “DONE” from “TODO” [2016-08-18 Thu 15:34]

In the manual there is “org-set-effort” which apparently sets a property interactively. Use something similar, then start the archival process immediately with a call to org-board-archive.
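One possible shape for such a command, as a sketch only. The command name is hypothetical; `org-board-archive' is assumed to be callable with no arguments on the entry at point.

```elisp
(require 'org)
(declare-function org-board-archive "org-board")

(defun my/org-board-set-url-and-archive (url)
  "Set the URL property of the current entry to URL, then archive it."
  (interactive "sURL to archive: ")
  ;; A nil position means "the entry at point".
  (org-entry-put nil "URL" url)
  (org-board-archive))
```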

Add support for multiple URLs

State “DONE” from “TODO” [2016-08-20 Sat 12:55]

CANCELLED Add support for quotes in arguments

State “CANCELLED” from “TODO” [2016-08-24 Wed 22:00]
Does not seem like this will work. Use single quotes instead.

I cannot get --header="Accept: text/html" to work, for example.

Add diffing capability between archives

State “DONE” from “TODO” [2016-08-24 Wed 22:00]

Via “ztree”.

Add support for opening/renaming/deleting/mailing individual archives

Using a dispatcher, one line per archive in the current entry, using widgets, like recentf. The “open” dispatcher will open an archive, then close immediately, the “rename” will rename both the link description and the folder name, the “delete” will delete an archive.

There should also eventually be support for diffing from the dispatcher.

Open the file pointed to by the URL

[2016-09-22 Thu 14:44] If not found, show all files recursively.

[2016-09-29 Thu 08:20] This works now, except for URLs with a list of “?” parameters at the end. Also, Google Cache URLs, for instance, would need a small algorithm to detect non-directory slashes.

Levenshtein distance algorithm

List out all the files in the archival directory, take the Levenshtein distance with the URL, and open the closest one.
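A rough sketch of this idea, assuming Emacs 26.1 or later, where the built-in `string-distance' computes the Levenshtein distance. The function name is hypothetical, and comparing against each file's path relative to the archive directory is one choice among several.

```elisp
(defun my/org-board-closest-file (url directory)
  "Return the file under DIRECTORY whose relative path is closest to URL.
Closeness is measured by the Levenshtein distance."
  (let (best best-distance)
    ;; An empty regexp matches every file name.
    (dolist (file (directory-files-recursively directory "") best)
      (let ((distance (string-distance
                       url (file-relative-name file directory))))
        (when (or (null best-distance) (< distance best-distance))
          (setq best file
                best-distance distance))))))
```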

Better: find the name of the file from the URL structure directly

Then open the folder and open the file closest in name to the basename of the URL.

For example, we have a URL: http://www.foo.com/bar/taleb.html?params=test.

`wget’ will have downloaded the file “taleb.html?params=test” in the directory structure “www.foo.com/bar/”. So we need to extract the part after the protocol specification of the URL, (maybe use the Emacs URL library?). Then we can descend the directory structure and find the file we want to open.
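With the Emacs URL library, extracting the part after the protocol might look like the sketch below. The function name is hypothetical. `url-filename' returns the path together with the query string, which matches the file name described above.

```elisp
(require 'url-parse)

(defun my/org-board-url-to-relative-path (url)
  "Return the host of URL followed by its path and query string.
This is where the file is expected to sit under the archive root."
  (let ((parsed (url-generic-parse-url url)))
    (concat (url-host parsed) (url-filename parsed))))

(my/org-board-url-to-relative-path
 "http://www.foo.com/bar/taleb.html?params=test")
;; => "www.foo.com/bar/taleb.html?params=test"
```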

There are some edge cases where descending the structure is not so easy. For example, a Google Cache URL contains literal slashes that are actually part of a filename, not a directory structure. `wget’ encodes these as `%2F’ in the saved filename, reducing the ambiguity.

Ignoring the colon issue, say we have a URL like this (similar to Google Cache’s) where the last slash is actually part of the filename: http://www.foo.com/bar/:cache:google.com/news

We have to iteratively check first that “www.foo.com” is a directory, then check “www.foo.com/bar/”, then check “www.foo.com/bar/:cache:google.com” (which will fail) then use the failed part as part of the filename and open (whatever is closest to) that.
This won’t work when there is a slash later that does descend the directory structure, but that can’t be very common.
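The iterative check could be sketched like this. The function name is hypothetical; it walks down the path one component at a time and stops at the first component that is not a directory, returning the remainder so that it can be matched against file names.

```elisp
(defun my/org-board-descend (root relative-path)
  "Descend from ROOT along RELATIVE-PATH while its components are directories.
Return a cons (DIRECTORY . REST), where DIRECTORY is the deepest
existing directory and REST is the unmatched remainder, slashes
included, to be treated as (part of) a file name."
  (let ((directory (file-name-as-directory root))
        (components (split-string relative-path "/" t)))
    (while (and components
                (file-directory-p
                 (expand-file-name (car components) directory)))
      (setq directory (file-name-as-directory
                       (expand-file-name (car components) directory))
            components (cdr components)))
    (cons directory (mapconcat #'identity components "/"))))
```

For the example URL, with `www.foo.com/bar/' present under the root, REST would be `:cache:google.com/news', which can then be compared with the files in DIRECTORY.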

Log the command used by org-board in the timestamped archival root

State “DONE” from “TODO” [2016-08-25 Thu 11:46]

Better folder date format display

State “DONE” from “TODO” [2016-11-14 Mon 15:30]

The format for a date currently looks like 2016-08-24-22-23-20. Ideally it should look as much like an Org date as possible (without the spaces as they are a nuisance in filenames). Maybe they could be fontified somehow?
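As one possibility (not necessarily the format that was finally adopted), `format-time-string' can produce a name that reads like an Org timestamp but contains no spaces. The function name and the exact format string are only suggestions.

```elisp
(defun my/org-board-folder-name (&optional time)
  "Return an Org-like, space-free folder name for TIME (default: now)."
  ;; %a is the abbreviated day name and depends on the locale.
  (format-time-string "%Y-%m-%d-%a-%Hh%Mm%Ss" time))
```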

With prefix argument to org-board-archive, ask for custom label to append to the folder name

So that you can quickly differentiate between archives that you’ve taken with different arguments to wget.

Add more user agents

PARTIAL Add ability to quit wget process for current entry

Sometimes it happens that you made an error with the command, so there should be a quick way of doing this.

My current implementation does not take into account multiple archival processes.

Add more error checking

Check for no URL property set

Check that we have connectivity before starting `wget’

Add a sample capture template

Exporting options for an archive

Allow exporting an archived page or set of pages to PDF?

Zip archive to file

Zip all archives to file

Send zipped archive

HTML-occur archive?

Archive directly from “Eww” or “xwidget”

For this, you would need to set a default bookmark file… or maybe use org-capture as a backend?

You could have a capture template that runs a function which, in Eww, will take the URL of the current page and archive that one, or if not in Eww, take a URL from the clipboard.

Add wget output logging in the archival root

Autodetect `wget’ from PATH
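A small sketch of the detection, using the built-in `executable-find'. The variable name is hypothetical; falling back to the bare program name when nothing is found is an assumption about the desired behaviour.

```elisp
(defvar my/org-board-wget-program (or (executable-find "wget") "wget")
  "Absolute path to wget if it was found on PATH, else the bare name \"wget\".")
```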

Add dry run option

Add asterisks in doc for easy transfer to README

Add support for remote files

With remote archival. `process-file’ is apparently for this.

Add buffer marker pointing to archive entry

Write tests

Add support for searching attached content

See Github Issue #11.

Use httrack

See Github Issue #13.

Style

Code repetition in org-board-archive-dry-run

All the let bindings are copied from the worker functions. Extract to a macro.

Fix concat directory naming

Long-term

Annotation

With a captured page there should be some way to annotate it. Possibly JS in the browser, or possibly via links from a dedicated org annotation file? To discuss.

Export

Exporting a bookmarks file (or an org-board entry) isn’t spectacular: the properties are not shown, for example, so there’s no internal link to the file archived.

Bugs

FIXED Bug with %-encoded URLs

[2016-09-22 Thu 15:11] Calling `message’ (as in org-board-archive-dry-run) with a string containing a literal “%” makes Emacs signal an error if no arguments follow the format string. See Xah Lee:

Tip for elisp coders. Often, while still writing your program and in the debugging process, you might put (message myStr) to print some variables. However, if your “myStr” happens to contain “%” char such as in URL, then you’ll get a mysterious error “Not enough arguments for format string”.

Fix it by using the “%s” format specifier.
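For instance, a helper along these lines (the name is hypothetical) passes the URL as an argument, so any “%” it contains is never interpreted as a format directive:

```elisp
(defun my/org-board-show-url (url)
  "Display URL in the echo area, even if it contains literal % characters."
  ;; Unsafe alternative: (message url) treats URL itself as the format string.
  (message "%s" url))
```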
