-
Notifications
You must be signed in to change notification settings - Fork 0
/
urlwatch.1
72 lines (68 loc) · 2.3 KB
/
urlwatch.1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
.TH URLWATCH "1" "February 2011" "urlwatch 1.12" "User Commands"
.SH NAME
urlwatch \- Watch web pages and arbitrary URLs for changes
.SH SYNOPSIS
.B urlwatch
[\fIoptions\fR]
.SH DESCRIPTION
urlwatch watches a list of URLs for changes and prints out unified
diffs of the changes. You can filter always-changing parts of websites
by providing a "hooks.py" script.
.SH OPTIONS
.TP
\fB\-\-version\fR
show program's version number and exit
.TP
\fB\-h\fR, \fB\-\-help\fR
show the help message and exit
.TP
\fB\-v\fR, \fB\-\-verbose\fR
Show debug/log output
.TP
\fB\-\-urls\fR=\fIFILE\fR
Read URLs from the specified file
.TP
\fB\-\-hooks\fR=\fIFILE\fR
Use specified file as hooks.py module
.TP
\fB\-e\fR, \fB\-\-display\-errors\fR
Include HTTP errors (404, etc..) in the output
.SH ADVANCED FEATURES
urlwatch includes some advanced features that you have to activate by creating
a hooks.py file that specifies for which URLs to use a specific feature. You
can also use the hooks.py file to filter trivially-varying elements of a web
page.
.SS ICALENDAR FILE PARSING
This module allows you to parse .ics files that are in iCalendar format and
provide a very simplified text-based format for the diffs. Use it like this
in your hooks.py file:
from urlwatch import ical2txt
def filter(url, data):
if url.endswith('.ics'):
return ical2txt.ical2text(data).encode('utf-8') + data
# ...you can add more hooks here...
.SS HTML TO TEXT CONVERSION
There are three methods of converting HTML to text in the current version of
urlwatch: "lynx" (default), "html2text" and "re". The former two use
command-line utilities of the same name to convert HTML to text, and the last
one uses a simple regex-based tag stripping method (needs no extra tools).
Here is an example of using it in your hooks.py file:
from urlwatch import html2txt
def filter(url, data):
if url.endswith('.html') or url.endswith('.htm'):
return html2txt.html2text(data, method='lynx')
# ...you can add more hooks here...
.SH "FILES"
.TP
.B ~/.urlwatch/urls.txt
A list of HTTP/FTP URLs to watch (one URL per line)
.TP
.B ~/.urlwatch/lib/hooks.py
A Python module that can be used to filter contents
.TP
.B ~/.urlwatch/cache/
The state of web pages is saved in this folder
.SH AUTHOR
Thomas Perl <thp.io/about>
.SH WEBSITE
http://thp.io/2008/urlwatch/