-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
For certain Bech32 strings, deleting or inserting a single character produces a string that is still valid #51
Comments
Hi, thanks for analysing this. This is not by design, but it can be explained from the specification. It's also a bit more general than what you explain (basically: take a bech32 string, xor a 1 into the last character, then push or pop as many 'q's as you like, and then xor a 1 into the last character again... should always give you a valid new bech32 string). We should document this. |
No worries! We discovered this in our testing, and wanted to share the finding. |
I can't fully replicate the claims of the OP when using BIP173 addresses for Bitcoin. Specifically, the "Cases" section seems to imply one or more
Here's code for checking the frequency of detecting an insertion error that uses the reference python library. It's easy to modify the regex and slice to check for deletion errors. from hashlib import sha256
import segwit_addr
import re
## Make results completely reproducible by using a shachain initialized
## from a seed.
SEED='foo'
## Print the seed into the log
print('Seed: "{}"'.format(SEED))
## Initialize the program value from our seed
program = sha256(bytes(SEED, 'utf-8')).digest()
typos_missed=0
typos_caught=0
iterations=0
while True:
## Derive a new (dummy) witness program
program = sha256(bytes(program)).digest()
## Create a v1 segwit address for the program. Use v1 because its
## only current restrictions are a length between 2 and 40 bytes (inclusive)
addr = segwit_addr.encode('bc', 1, program)
## If the address ends with 'p', attempt inserting a 'q' in the
## penultimate position and then see if the address decodes
if re.search('p$', addr):
mod_addr = addr[0:-1] + 'qp'
if segwit_addr.decode('bc', mod_addr) == (None, None):
typos_caught += 1
else:
typos_missed += 1
else:
continue
iterations += 1
if iterations % 1e4 == 0:
print("Missed:", typos_missed / iterations)
print("Caught:", typos_caught / iterations)
print("p$ addresses checked:", iterations)
break I don't think this changes anything significantly for this issue, but I wanted to mention it as I was doing some other analysis on the frequency of uncaught deletion errors and couldn't understand why, in millions of random addresses, I hadn't yet encountered a problem with any addresses mutated from '...qp' to '...p'. |
@harding Nice find. What you're seeing is the effects of the BIP173 address encoding beyond Bech32 itself. A segwit address is encoded in Bech32 as one symbol for the witness version, plus an 8bit-to-5bit conversion of the witness program bytes. This conversion has the following properties:
When you're turning a final "qp" into "p" for an otherwise 32-byte witness program, you end up with an invalid length (turning "qqp" into "p" should sometimes work, though). When you're turning "p" into "qp", one symbol from the checksum now becomes a data symbol. That data symbol becomes the final part of the 8-to-5 converted witness program in BIP173 addresses, so it must be correctly padded. If the original checksum bits on those places were not zero, the result will not be a valid address. |
For those not following the bitcoin-dev mailinglist, I've posted a more elaborate analysis of this (and other) kinds of errors in https://gist.github.com/sipa/a9845b37c1b298a7301c33a04090b2eb. I'll work on including the findings into BIP173, and proposing a variant of the algorithm that's resistant to all known classes of errors (with failure probability 1/2^30 or better). |
I've created a proposal to amend BIP-173 to address this issue, and would appreciate any feedback. |
Overview
For certain Bech32 strings, deleting or inserting just a single character can produce a string that is still valid when tested against the reference implementations.
Implementations Tested
Cases
There appear to be two main cases:
Insertion: if a valid Bech32 string has the suffix
p
, inserting a singleq
character immediately before thep
will produce another valid Bech32 string.Deletion: if a valid Bech32 string has the suffix
qp
, removing theq
character will produce another valid Bech32 string.These production rules can apparently be applied repeatedly, either by deleting many
q
characters, or by inserting manyq
characters, producing a valid Bech32 string after each step.Examples
The string
ii2134hk2xmat79tp
is a valid Bech32 string, with:ii2
34hk2xm
at79tp
Insertion
By inserting one or more
q
characters immediately before the finalp
character, we can produce more strings that are valid Bech32 strings:ii2134hk2xmat79tqp
ii2134hk2xmat79tqqp
ii2134hk2xmat79tqqqp
ii2134hk2xmat79tqqqqp
Deletion
Conversely, by deleting one or more
q
characters located immediately before a finalp
character, we can go the other way. The following are all valid Bech32 strings:eyg5bsz1l2mrq5ypl40hqqqp
eyg5bsz1l2mrq5ypl40hqqp
eyg5bsz1l2mrq5ypl40hqp
eyg5bsz1l2mrq5ypl40hp
Outstanding Question
It's currently unknown to us whether this is a property of the specification itself, or merely a feature of the reference implementations that we tested.
Background
We ran a suite of property tests against a modified version of the reference implementation, designed to simulate simple user errors such as omitting a character from, or inserting an extra character into, pseudo-randomly generated Bech32 strings.
The text was updated successfully, but these errors were encountered: