forked from probcomp/bayeslite
-
Notifications
You must be signed in to change notification settings - Fork 1
/
HACKING
158 lines (111 loc) · 5.54 KB
/
HACKING
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
Notes for anyone hacking bayeslite
* Automatic tests
Every commit on long-term branches including master should pass
% ./check.sh
It builds bayeslite and runs the automatic smoke tests. Please run this
in your .git/hooks/pre-commit script during development, and in each
commit please add or update automatic tests for any bugs you add or
features you fix in that commit.
check.sh has two modes: if you pass no arguments, it runs smoke tests
only. There are tests that are slower, use the network, or otherwise
are better suited to testing during continuous integration only. If
you run a particular test, e.g.:
% ./check.sh --pdb tests/test_foo.py
that will run even the tests that would otherwise have been run during
continuous integration, for that file (and with --pdb it will drop you into the python debugger on the first failure). These tests are marked with
__ci_ in their names, and you can `git grep __ci_` to find them.
If you want to run all tests, you can
% ./check.sh tests/ shell/tests
Because this may take awhile, we don't require it of all commits, but
you should monitor the continuous integration suite, and rapidly roll
back your change if it causes a failure.
* Versions
Our version scheme, compatible with PEP 440:
<major>.<minor>[.<teeny>]a<date> (prerelease snapshot)
<major>.<minor>[.<teeny>]rc<N> (release candidate)
<major>.<minor>[.<teeny>] (release)
We do not currently make any semantic API compatibility guarantees
about the meaning of <major>, <minor>, and <teeny>.
In the source tree, the VERSION file contains either the current
release version number, or the most recent tagged version number
followed by a plus sign `+'. In that case, a suffix will be added to
the most recent version number, derived from `git describe', of the
form:
.post<N>+g<commitid>[.<dirty>]
The Git tag for a tagged version is named with a `v' prefix. Note
that the content of VERSION for any tagged version MUST NOT include a
`+' suffix, so that `cat VERSION' is sufficient to find the version
number, and `git describe' is not necessary.
To tag a new version:
1. Set VERSION to the new version, say `0.2.42':
% echo 0.2.42 > VERSION
% git commit -m 'Bump version to 0.2.42.' VERSION
2. Tag it with an annotated tag:
% git tag -a -m v0.2.42 v0.2.42
If you want, you can include release notes in the annotated tag
message.
3. Append `+' to the version in VERSION:
% echo 0.2.42+ > VERSION
% git commit -m 'Bump version to 0.2.42+.' VERSION
* SQL/BQL parameters
Use SQL/BQL parameters to pass strings and other values into SQL/BQL.
DO NOT use format strings.
DO: cursor.execute('UPDATE foo SET x = ? WHERE id = ?', (x, id))
DON'T: cursor.execute("UPDATE foo SET x = '%s' WHERE id = %d" % (x, id))
DON'T: cursor.execute("UPDATE foo SET x = '{}' WHERE id = {}".format(x, id))
DO: cursor.execute('SELECT x, y FROM t WHERE z = ?', (z,))
DON'T: cursor.execute('SELECT x, y FROM t WHERE z = ?', z)
DON'T: cursor.execute('SELECT x, y FROM t WHERE z = {}'.format(z))
Prefer named parameters if the query has more than one parameter and
covers multiple lines:
cursor = db.cursor().execute('''
SELECT COUNT(*)
FROM bayesdb_generator AS g, bayesdb_column AS c
WHERE g.id = :generator_id
AND g.tabname = c.tabname
AND c.colno = :colno
''', {
'generator_id': generator_id,
'colno': colno,
})
If the tables and columns in the query are determined dynamically,
then use bql_quote_name and format strings to assemble SQL/BQL
queries. But prefer to avoid this by writing different queries or
reusing subroutines that already do it, such as in bayeslite.core.
DO: from bayeslite import bql_quote_name
qt = bql_quote_name(table)
qc = bql_quote_name(column)
cursor.execute('SELECT %s FROM %s WHERE x = ?' % (qc, qt), (x,))
DON'T: cursor.execute('SELECT %s FROM %s WHERE x = ?' % (column, table), (x,))
DON'T: cursor.execute('SELECT %s FROM %s WHERE x = %d' % (qc, qt, x))
* Randomization
Avoid indiscriminate nondeterminism.
All random choices should be made from PRNGs with seeds that the user
can control, via the normal Python API and the bayeslite shell. Any
actual nondeterminism should be clearly labelled as such, e.g. a
future shell command to choose a seed from /dev/urandom.
* Python coding style
Generally follow PEP 8, with these exceptions:
- Single line between top-level definitions.
=> Two blank lines are needlessly verbose.
- Four spaces, instead of alignment, for continuation lines.
=> Using alignment penalizes descriptive function names and renaming.
Additional guidelines:
- General API should be exposed in the bayeslite module.
- General names should begin with the lexeme `bayesdb' or `bql'.
- Prefer greppable descriptive global functions over methods.
=> Legibility is more important than extensibility.
- Prefer single-quoted strings, except use """ for docstrings.
- unicode for user data; str for SQL/BQL text.
- Don't bother with pylint. It will waste your time.
- Prefer clarity over boilerplate.
=> No need to clutter every docstring with a sphinx template.
* SQL updates
When issuing an UPDATE command to sqlite3, if you can count the number
of rows it should affect, do so and assert that it affected that many
rows:
total_changes = bdb._sqlite3.totalchanges()
bdb.sql_execute('UPDATE ...', (...))
assert bdb._sqlite3.totalchanges() - total_changes == 1
XXX Should not use bdb.sqlite3 explicitly here.
XXX Not every use is so marked.