This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

IPNS Improvement Design Exploration #260

Open
7 of 23 tasks
Stebalien opened this issue Aug 31, 2017 · 0 comments
Labels
Candidate Open Problem Mutable Data Naming, Real-Time updates, IPNS


Design Exploration

This document explores some areas in which it would be nice to improve IPNS and some ways in which to do so. I'm posting it here so I/we don't lose it and in case anyone else is interested in picking up this work. This is a very rough document with some good ideas but not nearly enough thought put into them.

Design Goals

First, we need to decide on what we need out of IPNS (what we have (checked) and what
needs to be improved):

  1. A consistency/threat model:
  • Consistency
    • Between IPNS addresses.
    • Within a single IPNS address.
    • Forgery Proof (signed IPNS records).
    • Freshness Guarantees (IIRC, the current guarantee is "not expired").
  • (?) Censorship Resistance (DHTs can be censorship resistant).
  • (?) DoS Resistance (DHTs can be DoS resistant).
  2. Mobile/IoT friendly lookups:
  • No CPU intensive operations (e.g., really expensive crypto)
  • Minimal bandwidth usage (DHT lookups require many round trips).
  • No continuous background communication to avoid wasting power and
    bandwidth (IIRC, the DHT does a fair amount of background communication).
  • Minimal storage requirements.
  3. Low latency initial lookups
  4. Low latency updates (PubSub).
  5. Avoid having to constantly re-publish IPNS records.

This document focuses on discussing a system to tackle goals 1-3. We're currently working on using PubSub to tackle goal 4 and goal 5 could probably be bolted on to any system solving goals 1-3 with some incentive model.

Given these goals, the main design decisions from my perspective are:

  • Funding Model
  • Consistency/Threat Model

Parties

First, so we can have some consistent terminology, I'll define the following
parties:

  • Provider: An entity participating in the server-side of the IPNS distributed
    system.
  • Publisher: An entity who publishes IPNS records.
  • Client: An entity that retrieves IPNS records.

Funding Model

With the DHT, we amortize the cost over all participants. However, due to the
low-latency requirements of this system, we'll probably have to go with a more
centralized model and therefore may need a way to fund it.

Nice People Pay

One solution is to assume that some set of "benevolent" organizations will run
IPNS providers on their own dime. I say "benevolent" in quotes because
organizations may be willing to act as a provider for many reasons:

  1. Metadata: They want to know who requests which IPNS records, how IPNS records
    are clustered, etc.
  2. Political: Decentralized systems tend to be censorship resistant and tend to
    promote free speech.
  3. Invested: We (and other companies that rely on IPFS) might want to run one as
    it increases the value of IPNS, IPFS, and all related technology.

Pros

  • End users don't have to pay.
  • We don't have to implement a payment system.

Cons

There's no such thing as a truly free lunch. I'm always wary of free
centralized (or semi-centralized) services as anyone offering one usually has an
agenda (which may not be in society's best interest).

Publishers Pay

We could use a peering system (like ISPs) where publishers pay a single provider
and that provider agrees to store and serve the publisher's IPNS records and
exchange them with their (the provider's) peers.

Pros:

  • Someone is directly paying (no free lunch problem).

Cons:

Publishing IPNS records would cost something. This could be alleviated by
allowing free short-lived records and/or allowing free limited accounts.
Furthermore, publishers could always just publish to the DHT if they don't
care about latency.

Clients Pay

An alternative would be to let clients pay. For example, clients could pick a
few trusted IPNS providers, pre-pay for some number of request tokens (ecash?)
using a crypto currency, and then return a request token each time they make a
request to one of their chosen providers. Alternatively (far future), ISPs could
simply provide IPNS resolution as a service to their customers.

Pros:

  • May incentivize IPNS providers to store and serve popular IPNS records.
    Depending on the consistency model, this may not be relevant (IPNS providers
    may have to store all IPNS records).
  • Doesn't necessarily involve manual peering agreements.

Cons:

This costs users money, which could be a significant barrier to entry (any price is a
barrier to entry, even if it's absurdly low). However, we can always start off
by providing our own free IPNS provider and move to a paid model if necessary.

Conclusion

As I believe we could upgrade to either (or both) of the latter two payment
models later if needed, we should probably hold off on them for now. However, we
should leave room in the protocol for a payment system.

Consistency/Threat Model

I'm going to mix the consistency and threat models because they're inextricably
linked: the ways in which we trust each party determine how each party can
affect the system's guarantees.

  1. Authenticated: It must be (practically) impossible to create IPNS records
    without the associated private key. Luckily, we already have this property.
  2. Censorship Resistant: I would like to make it infeasible to censor IPNS
    record updates. While we do want to support censorship of IPLD objects
    using a voluntary blacklist, IPNS address censorship is too easily abused.
    Unfortunately, this censorship resistance could be abused to turn the IPNS
    system into a censorship resistant data-store (by storing the illegal/illicit
    data in the IPNS records themselves).
  3. DoS Resistant: The system must be resistant to DoS attacks. Unfortunately,
    it's generally impossible to be completely resistant. Note: this is not the
    same as censorship resistant. I consider censorship to be censoring
    individual records while allowing the system to continue functioning.
  4. Consistent: Exactly what consistency model we want is still an open question.
    I consider this part of the threat model because a malicious provider may
    attack the system by presenting clients with inconsistent state.

In the following discussion, I assume:

  1. There is a set of "important" (non-garbage) IPNS keys/records.
  2. There is a set of "trusted" IPNS providers.

I then break privilege into three categories:

  1. Unprivileged: Entities that don't control an "important" IPNS key and aren't
    in the "trusted" set of providers.
  2. Publishers: Entities that control an "important" IPNS key.
  3. Providers: Entities that control a "trusted" IPNS provider.

For reference:

  • [ ] Means I intend to allow the attack (trust the party not to do this).
  • [-] Means I intend to try to stop the attack but may not be able to entirely
    thwart it.
  • [x] Means I intend to prevent the attack (don't trust the party not to do this).

Unprivileged

I divide attacks from unprivileged entities into (exhaustive):

  • [-] DoS: Anyone may attempt to prevent this system from functioning (DoS). The
    system will have to have some DoS resistance but there's no way we can
    completely protect against DoS.
  • [x] Forgery: Attackers may try to forge IPNS records. This will be
    (practically) impossible (enforced by crypto).
  • [x] Censorship: Attackers may try to censor IPNS records. It should be
    practically impossible to censor an individual IPNS record without causing a
    wider DoS. While we can't stop a global adversary from shutting down the
    system (or disconnecting individual clients), we should be able to prevent
    such adversaries from censoring individual records.

Publishers

In addition to the unprivileged attacks listed above, publishers can try to
perform the following potentially undesirable actions (non-exhaustive?) with
respect to their own IPNS addresses:

  1. Collision: Publish multiple different IPNS records for the same IPNS
    address with the same timestamp.
  2. Backdate: Backdate an IPNS record.
  3. Partition: Present two different IPNS records to two different parties at
    the same time. These IPNS records don't necessarily need to have the same
    timestamp so this doesn't quite fall under the collision category.

All these problems are present in DNS+HTTPS. Furthermore, I believe these
problems are mostly out-of-scope for this system and we can probably layer a
"trusted" IPNS system on top. I've outlined a few ways to discourage such
behavior in Appendix A.

Therefore, I'm inclined to largely trust publishers when it comes to how they
manage their own IPNS addresses. We may end up choosing a consistency model that
forbids these actions but I don't consider that a goal.

Providers

Providers may perform the same attacks as unprivileged entities; however, the
threat model is significantly more nuanced. Furthermore, they may try to violate
the chosen consistency model.

Given that providers have reputation, the extent to which they can attack the
system isn't a simple can/can't. Therefore, I break the extent to which
providers can attack the system into the following categories:

  1. Can hinder (e.g., slow down).
  2. Can do but may lose reputation.
  3. Can do with sufficient collusion.
  4. Can.

DoS

For DoS, we can limit providers to categories 2 and 3. If all servers in a client's
"trusted providers" set refuse to operate, there's nothing we can do about it.
If only a subset of providers try to DoS the system, they'll lose reputation.

Forgery

As before, we prevent forgery using cryptography so we should be able to prevent
all attacks of this type (assuming the adversary has limited computational power
and the crypto is sound).

Censorship

A trickier attack to defend against is censorship, mostly due to the legal
issues involved (and the fact that this won't be a fully decentralized
system). I believe the best solution is to prevent silent censorship and allow
clients to try multiple providers (in different legal jurisdictions) if they
wish to work around censored records (possibly falling back on a fully
decentralized system like a DHT).

Unless we go for a fully decentralized system, we can't outright prevent
providers from censoring IPNS records because, given sufficient collusion, the
trusted set of IPNS providers could simply erase all evidence of an IPNS
record's existence. Furthermore, simply discouraging censorship by punishing
providers that censor IPNS record updates (category 2) will not be sufficient as
providers in some (all?) jurisdictions will inevitably be legally compelled to
censor some IPNS addresses (or face being shut down entirely). These are
just inherent issues with centralized systems and governments.

However, we should be able to discourage silent censorship by requiring that
providers inform clients that an IPNS record is being censored instead of
returning an old IPNS record or claiming that one doesn't exist. While some law
enforcement agencies won't like this requirement, I believe most will accept
that a provider can't do otherwise and abstain from simply shutting down the
provider entirely. This way, clients can try another provider in a different
legal jurisdiction if they encounter censorship. This will increase the latency
for retrieving highly censored records but I believe it's a reasonable
compromise.
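
A minimal sketch of the client-side behavior described above, assuming providers answer with an explicit status. The response shapes (`ok`, `censored`, `not_found`), the `lookup` method, and the function name `resolveAcrossProviders` are all hypothetical; nothing here is an existing IPNS API.

```javascript
// Hypothetical response shapes (assumptions for illustration only):
//   { status: "ok", record }   the provider returned a record
//   { status: "censored" }     the provider admits it is withholding the record
//   { status: "not_found" }    the provider claims no record exists

// Try each provider in order and return the first record found, along with
// the names of the providers that reported censoring the address.
function resolveAcrossProviders(address, providers) {
  const censoredBy = [];
  for (const provider of providers) {
    const response = provider.lookup(address);
    if (response.status === "ok") {
      return { record: response.record, censoredBy };
    }
    if (response.status === "censored") {
      // Censorship is not silent: remember who censored and keep trying.
      censoredBy.push(provider.name);
    }
  }
  return { record: null, censoredBy };
}

// Usage with two in-memory stand-in providers.
const providerA = { name: "A", lookup: () => ({ status: "censored" }) };
const providerB = {
  name: "B",
  lookup: (address) => ({
    status: "ok",
    record: { address, timestamp: 7, link: "/ipld/example" },
  }),
};
const lookupResult = resolveAcrossProviders("/ipns/example", [providerA, providerB]);
console.log(lookupResult.record.timestamp, lookupResult.censoredBy);
```

The extra round trip to provider B is the latency cost mentioned above: it is only paid for addresses that some provider actually censors.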

So, at the end of the day, I believe we can restrict providers to categories 1-3
(prevent them from outright censoring IPNS records).

Consistency

Finally, we need to decide on an actual consistency model.

In this section, I don't consider the threat model (under what attacks does the
system remain consistent) as this section is already more detailed than I would
like. Furthermore, doing so would (mostly) be a waste of effort as there's
little point in considering the possible threat models of consistency models we
don't end up using.

In this section, I consider the following consistency models:

  1. Monotonic Consistency: For any given IPNS address, if a client observes an
    IPNS record with timestamp T1, the client will never accept any record with a
    lower timestamp T2 < T1. I believe this is effectively the consistency model
    of the DHT.
  2. Strict Consistency: There's a total order of all IPNS record updates.
  3. Explicit Causal Dependencies: IPNS records explicitly state the minimum
    version/timestamp of all IPNS addresses on which they depend.
  4. Causal Consistency: If a publisher observes a set of IPNS records for
    addresses A[] with timestamps T[] and then publishes an IPNS record
    R, any client that observes R and then looks up A[i] will receive an
    IPNS record for A[i] with a timestamp greater than or equal to T[i]. This
    is a common consistency model in shared memory systems.
  5. Application Level Consistency: Instead of enforcing a consistency model at
    the IPNS level, we can enforce it at the application level as needed.

Monotonic Consistency

This can be enforced client-side by simply rejecting old IPNS records when a
newer one is known.
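
As a rough sketch of that client-side check, assuming records carry a numeric `timestamp` field (the class name `MonotonicCache` and the record shape are made up for illustration):

```javascript
// Remembers the newest record seen per IPNS address and rejects older ones.
class MonotonicCache {
  constructor() {
    this.newest = new Map(); // address -> newest record accepted so far
  }

  // Returns true if the record was accepted, false if it is older than a
  // record we already know for the same address.
  accept(address, record) {
    const known = this.newest.get(address);
    if (known !== undefined && record.timestamp < known.timestamp) {
      return false;
    }
    this.newest.set(address, record);
    return true;
  }
}

// Usage: a newer record is accepted, an older one is rejected afterwards.
const monotonicCache = new MonotonicCache();
monotonicCache.accept("/ipns/example", { timestamp: 5, link: "/ipld/v5" });
console.log(monotonicCache.accept("/ipns/example", { timestamp: 3, link: "/ipld/v3" }));
```

Note that this sketch accepts a record whose timestamp equals the known one, so it does nothing about the "collision" case discussed under Publishers.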

Pros

  • It's really simple (enforced entirely by the client).
  • IPNS providers could be simple caching resolvers built on top of the DHT
    (without modifying it).

Cons

  • The only real guarantee clients can rely on is that the IPNS records they get
    haven't expired.

Strict Consistency

Enforcing strict consistency would require some form of global byzantine
consensus system to ensure global consistency and some way to verify that an
IPNS record belongs to the current agreed upon state.

See Appendix B for details on how this system might work.

Pros

  • Strict consistency generally makes developers' lives easier.
  • Strict consistency would imply censorship resistance.
  • Should be fairly easy to design (see Appendix B)

Cons

  • It requires that every provider agree on the entire state of the world at
    any given point in time.
  • It's not interplanetary. At the very least, we'd need a separate IPNS network
    per "latency zone".

Explicit Causal Consistency

In practice, strict consistency may be overkill. Instead, we could consider a
system where each IPNS record describes the minimum state of the world on which
it depends. That is, the minimum version/timestamp of all IPNS addresses an IPNS
record depends on. Clients would use this information to determine the minimum
acceptable timestamp for any given IPNS record.
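
One way a client might track those minimums, as a sketch only. It assumes each record carries a hypothetical `dependencies` object mapping IPNS addresses to minimum numeric timestamps; the field name and the `DependencyTracker` class are not part of any existing record format.

```javascript
// Tracks, per IPNS address, the minimum timestamp required by the records
// the client has observed so far.
class DependencyTracker {
  constructor() {
    this.minimum = new Map(); // address -> minimum acceptable timestamp
  }

  // Raise the minimums according to the dependencies listed in a record.
  observe(record) {
    for (const [address, timestamp] of Object.entries(record.dependencies ?? {})) {
      const current = this.minimum.get(address) ?? -Infinity;
      this.minimum.set(address, Math.max(current, timestamp));
    }
  }

  // A record is acceptable if it is at least as new as every observed
  // record required it to be.
  isAcceptable(record) {
    const required = this.minimum.get(record.address) ?? -Infinity;
    return record.timestamp >= required;
  }
}

// Usage: "/ipns/app" depends on "/ipns/lib" at timestamp 4 or newer.
const tracker = new DependencyTracker();
tracker.observe({
  address: "/ipns/app",
  timestamp: 10,
  dependencies: { "/ipns/lib": 4 },
});
console.log(tracker.isAcceptable({ address: "/ipns/lib", timestamp: 3 }));
```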

Pros

  • Doesn't require any cooperation between providers.
  • Works well in a partitioned network (all consistency information is encoded in
    the IPNS records themselves).

Cons

  • Potentially large IPNS records.
  • Publishers (applications) need to somehow track dependencies. Simply treating
    the entire "read set" (set of IPNS seen to date) as the dependency list isn't
    feasible as it would make IPNS records massive.

Causal Consistency

Instead of explicitly listing dependencies, IPNS records could include, for
every provider used by the publisher (likely just one), a "pointer" to the
current state of that provider's timeline. Clients would then use this information
to verify that their provider's state is at least as up-to-date as the listed
states.

Note: This sounds really inefficient but I've convinced myself that there are
reasonably efficient ways to do this, especially if we trust third parties
(e.g., a set of trusted providers) to do the actual consistency verification.

Pros

  • Unlike strict consistency, this is interplanetary. One can think of this as
    strict consistency from the publisher's point of view.

Cons

  • This would require a lot more "original" design work than a strict consistency
    system (unless I just haven't read the relevant material).

Application-Level Consistency

It's worth noting that explicit consistency can be achieved at the application
level by including the oldest acceptable IPNS record along with IPNS links. For
example, given:

minimum_ipns_records = {
  "/ipns/$a": {
    timestamp: ...,
    link: "/ipld/...",
  },
  "/ipns/$b": {
    timestamp: ...,
    link: "/ipld/..."
  }
}; // Can be a separate object (shared between multiple IPLD objects).

thing = {
  "minimum_ipns_records": minimum_ipns_records,
  ...
}

The application would fall back on the listed IPNS records if the ones it
retrieves from the system are older.
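
The fallback rule could look roughly like this. It is a sketch that assumes numeric timestamps and a `null` result when the lookup finds nothing; `resolveWithMinimum` is a hypothetical helper name.

```javascript
// Pick between the record retrieved from the IPNS system and the minimum
// record embedded in the application's own data.
function resolveWithMinimum(address, retrieved, minimumRecords) {
  const minimum = minimumRecords[address];
  if (minimum === undefined) {
    return retrieved; // the application has no requirement for this address
  }
  if (retrieved === null || retrieved.timestamp < minimum.timestamp) {
    return minimum; // lookup failed or returned something too old
  }
  return retrieved;
}

// Usage: the system returns a stale record, so the embedded one wins.
const minimumRecords = {
  "/ipns/a": { timestamp: 10, link: "/ipld/pinned" },
};
const staleRecord = { timestamp: 8, link: "/ipld/old" };
const chosen = resolveWithMinimum("/ipns/a", staleRecord, minimumRecords);
console.log(chosen.link);
```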

Pros

  • By doing this at the application level, applications can enforce any
    consistency guarantees they need without paying (in performance) for any
    consistency guarantees they don't.
  • Even if an IPNS link goes dead (the newest IPNS record expires), applications
    will be able to resolve the IPNS address to some valid IPLD name.

Cons

  • It shifts the burden to application developers.

Appendix

Appendix A: Publisher Shenanigans

Below, I outline a few ways we can deal with misbehaving publishers with crypto.
Even if we choose a consistency model for the global IPNS system that doesn't
allow these kinds of shenanigans, the ideas presented below may still be useful
in partitioned networks. Note: you don't have to read this section, I mostly
included it to have these ideas recorded somewhere.

Collision

We could use some special crypto to ensure that issuing two IPNS records with
the same timestamp reveals the secret key. This will likely not be an effective
deterrent for short-term attacks so it may not be that useful. However, unlike a
byzantine agreement system, this solution works in a partitioned network.

Backdate

We can build an unbroken chain of IPNS records where each record can have at
most one successor. We should be able to enforce this property by using some
form of single-use signature scheme where making two signatures with the
same key reveals the key. This would obviously need to be significantly fleshed
out.

Unfortunately, unlike a byzantine agreement system, this would still allow
publishers to backdate up to their last published IPNS record.

However, like the crypto solution to collisions, this can be used in a
partitioned network.
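
The single-use signature scheme itself is not sketched here. The snippet below only illustrates the chain property it would enforce: given a set of records that each name their predecessor, a client can detect when one record has more than one successor. The `id` and `previous` fields and the name `findForks` are assumptions for illustration.

```javascript
// Find records that have more than one successor in a chain of IPNS records.
// Each record is assumed to look like { id, previous }, where `previous` is
// the id of its predecessor (or null for the first record).
function findForks(records) {
  const successors = new Map(); // predecessor id -> ids of records claiming it
  for (const record of records) {
    if (record.previous === null) continue;
    const ids = successors.get(record.previous) ?? [];
    ids.push(record.id);
    successors.set(record.previous, ids);
  }
  return [...successors.entries()]
    .filter(([, ids]) => ids.length > 1)
    .map(([previous, ids]) => ({ previous, successors: ids }));
}

// Usage: r3 and r3b both claim r2 as their predecessor.
const chain = [
  { id: "r1", previous: null },
  { id: "r2", previous: "r1" },
  { id: "r3", previous: "r2" },
  { id: "r3b", previous: "r2" },
];
const forks = findForks(chain);
console.log(JSON.stringify(forks));
```

With the crypto described above, finding such a fork would also reveal the publisher's key. This sketch only detects the fork.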

Appendix B: Strict Consistency IPNS

Given that I like this option the most, I've put a bit of thought into
how it might be implemented. Unless I'm mistaken, we should be able to build
much of it on top of IPLD (which is one of the reasons I like this option).

First, at every time step, the system would need to agree on what IPNS record
updates to accept. To do this, we'd use some form of byzantine agreement system.

However, the threat model isn't the same as that of cryptocurrencies. Instead of
double spending, malicious parties will likely want to remove an IPNS address
from the system for either monetary (extortion, competition) or political
reasons (censorship). We may be able to use this to simplify some parts of the system.

Second, we'd need a way to distribute the IPNS records along with proofs that
each IPNS record belongs to the current block. To do this, I'd store all IPNS
records in a single merkle tree (using IPLD) that all providers agree on (using
the byzantine consensus system). The root hash would name the current state and
the proof of membership would consist of a path of IPLD objects from the root to
the IPNS record.

Finally, to prove that the root hash is authentic without forcing clients to
verify the entire blockchain, providers could all sign the current root hash
(e.g., using IPNS records) and clients could fetch a quorum of such signatures
as needed.
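
A sketch of the quorum check a client might run, under the assumption that signature verification is supplied by the caller. The `verify` callback, the `{ provider, signature }` shape, and the `hasQuorum` name are hypothetical, and the stand-in `fakeVerify` below is not real cryptography.

```javascript
// Returns true if at least `threshold` distinct trusted providers have
// produced a valid signature over `rootHash`.
function hasQuorum(rootHash, signatures, trustedProviders, threshold, verify) {
  const valid = new Set();
  for (const { provider, signature } of signatures) {
    if (trustedProviders.has(provider) && verify(provider, rootHash, signature)) {
      valid.add(provider); // a provider counts once, however often it signs
    }
  }
  return valid.size >= threshold;
}

// Usage with a stand-in verifier (NOT real crypto, illustration only).
const fakeVerify = (provider, rootHash, signature) =>
  signature === `${provider}:${rootHash}`;
const trusted = new Set(["p1", "p2", "p3"]);
const collected = [
  { provider: "p1", signature: "p1:root-42" },
  { provider: "p1", signature: "p1:root-42" }, // duplicate, counted once
  { provider: "p2", signature: "p2:root-42" },
  { provider: "evil", signature: "evil:root-42" }, // not in the trusted set
];
console.log(hasQuorum("root-42", collected, trusted, 2, fakeVerify));
```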

Note: This m/n signatories system may not be necessary in some byzantine consensus systems. For example, given an expected-consensus-based system, it should be possible for peers to know whether a block is valid for some timestamp. We could even have an "expected N" system where we usually end up with duplicate blocks; that's not really an issue for us because we don't have the double-spending problem.

Example IPNS Tree

Given:

  • /ipns/aaa
  • /ipns/aab
  • /ipns/abb

Tree:

root = {
  "a": ipns_a
};

// ---

ipns_a = {
  "a": ipns_aa,
  "b": ipns_ab
};

// ---

ipns_aa = {
  "a": ipns_aaa,
  "b": ipns_aab
};

ipns_ab = {
  "b": ipns_abb
};

// ---

ipns_aaa = IPLD_RECORD_AAA;
ipns_aab = IPLD_RECORD_AAB;
ipns_abb = IPLD_RECORD_ABB;
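
For illustration, a tree of this shape can be built mechanically from the addresses, and the membership "path" is just the list of nodes from the root to the record. This sketch uses plain nested objects and assumes all keys have the same length. In the real design each link would be an IPLD hash link, which is what would make the path a proof. The `buildTree` and `pathTo` names are made up.

```javascript
// Build a prefix tree keyed by one character of the IPNS key per level.
// Assumes all keys have the same length (no key is a prefix of another).
function buildTree(records) {
  const root = {};
  for (const [address, record] of Object.entries(records)) {
    const key = address.replace("/ipns/", "");
    let node = root;
    for (const ch of key.slice(0, -1)) {
      node[ch] = node[ch] ?? {};
      node = node[ch];
    }
    node[key[key.length - 1]] = record;
  }
  return root;
}

// Return the nodes visited from the root down to the record, or null if
// the address is not in the tree.
function pathTo(root, address) {
  const key = address.replace("/ipns/", "");
  const path = [root];
  let node = root;
  for (const ch of key) {
    if (typeof node !== "object" || node === null || !(ch in node)) {
      return null;
    }
    node = node[ch];
    path.push(node);
  }
  return path;
}

// Usage with the three example addresses.
const tree = buildTree({
  "/ipns/aaa": "IPLD_RECORD_AAA",
  "/ipns/aab": "IPLD_RECORD_AAB",
  "/ipns/abb": "IPLD_RECORD_ABB",
});
const membershipPath = pathTo(tree, "/ipns/aab");
console.log(membershipPath.length, membershipPath[membershipPath.length - 1]);
```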


3 participants