PoRep security policy (FIP-0047) #415
Replies: 11 comments 29 replies
-
I think there are (at least) two different parts to the policy to be decided here.
For part 1, on the surface it would seem desirable to have SPs re-seal the "same" sectors they have already committed. That is, perform a new proof of replication for the same underlying data (unsealed sector CID, aka CommD). If they did this, then any parties relying on the data sealed into a sector could continue to do so. Storage markets and the FIL+ program are two obvious such parties, but as we enable more programmability there could be many more on- and off-chain entities that rely on the assurances offered by PoRep.

I point this out because, as far as I know today, we don't know how to do this. We don't have a specific technique for proving that a re-seal of a sector commits to the same data as the original proof-of-replication. The unsealed sector CID is not stored in chain state. We don't yet know that it's impossible, but developing a mechanism (or proving we can't) could represent some weeks of work. We'd be far better off working this out in advance of discovering a flaw. The unsealed sector CIDs do exist in the blockchain message history, but that is not immediately accessible to actors. A possible technique would be to build an off-chain database of them, and SPs would prove inclusion of the appropriate (Sector ID, CommD) when re-sealing.

If we can't find a practical re-sealing mechanism, we should also prepare for the case that we can't require SPs to re-commit to the same data. This means that providers would recover power by committing new sectors. The default behaviour of the built-in storage market and FIL+ would be to consider the old sectors terminated when the proof validity deadline expires. This is an unattractive position for clients, but it turns out we are already working towards the mitigation we'd need. Proposals like #313 work towards the ability to transfer FIL+ verified pieces of data from an expiring sector into a new one, and #298 lays the foundations for the same capability in any storage market. These mechanisms could allow a provider to transfer client data into new sectors, maintain deal-related commitments, and let their old sectors terminate "empty" (with any penalties correspondingly reduced to those of CC sectors). These mechanisms' utility in responding to a PoRep flaw might motivate implementation of those specific capabilities sooner rather than later (FYI @ZenGround0).

For part 2, I think this should primarily be driven by analysis of effects on network power, collateral, token supply, SP profitability etc. to find which policy would be least disruptive.
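To make the off-chain database idea from part 1 more concrete, here is a minimal sketch of how an SP might prove inclusion of a (Sector ID, CommD) pair against a single committed root. It assumes a plain binary Merkle tree over SHA-256; the comment above does not specify any particular structure, and every name here (`leaf`, `build_levels`, `prove`, `verify`) is hypothetical rather than part of any Filecoin API.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf(sector_id: int, comm_d: bytes) -> bytes:
    # Prefix 0x00 separates leaf hashes from interior-node hashes.
    return _h(b"\x00" + sector_id.to_bytes(8, "big") + comm_d)


def build_levels(leaves):
    """Return all tree levels, from the leaves up to the single root."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]  # pad odd levels by repeating the last node
        levels.append(
            [_h(b"\x01" + cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)]
        )
    return levels


def prove(levels, index):
    """Collect (sibling_hash, node_is_left_child) pairs from leaf to root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return path


def verify(root, sector_id, comm_d, path):
    """Recompute the root from the claimed record and the sibling path."""
    node = leaf(sector_id, comm_d)
    for sibling, node_is_left in path:
        pair = node + sibling if node_is_left else sibling + node
        node = _h(b"\x01" + pair)
    return node == root


# Usage: three illustrative records; only the root would need to be on-chain.
records = [(1, b"a" * 32), (2, b"b" * 32), (3, b"c" * 32)]
levels = build_levels([leaf(s, c) for s, c in records])
root = levels[-1][0]
proof = prove(levels, 2)
```

In this sketch an actor would only need the root in state; the SP supplies the record and its sibling path when re-sealing. How the root would be agreed upon and kept current is exactly the open question raised above.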
-
I would like to propose a modification to changes proposed in #366, dealing with increased
The current SDM proposal increases the maximum sector commitment length from 1.5 to 5 years. It does it by increasing the
Proposal
The core of the proposal is to separate the period of validity of a proof from the period of commitment for a sector. This leaves the 1.5-year sector extension process in place to maintain proof validity, but allows SPs to commit to longer periods. The existing sector
A new sector property is introduced called
The proof refresh window creates a trade-off: the larger the window, the more refreshes can happen in a single batch, but the more frequently each proof must be refreshed. The proof expiration is not freely chosen by the SP, but takes a value that is derived and quantised from the sector’s activation epoch. A storage provider can at any point call RefreshProofExpiration(SectorSelector) to request a refreshed proof expiration. This call only results in an actual refresh of the ProofExpiration if called within ProofRefreshWindow of the ProofExpiration. In case of a PoRep bug, the
It also has the following benefits:
Limitations of this mechanism include:
For more technical details please see this document.
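The refresh rule described above can be sketched as follows. This is an illustration under stated assumptions, not the proposed actor code: the 540-day validity, the 60-day window, the one-day quantum, and all names (`Sector`, `new_sector`, `refresh_proof_expiration`) are assumed for the example. The only fixed fact used is that a Filecoin epoch is 30 seconds, so there are 2880 epochs per day.

```python
from dataclasses import dataclass

EPOCHS_PER_DAY = 2880                        # 30-second epochs
PROOF_VALIDITY = 540 * EPOCHS_PER_DAY        # assumed: 1.5 years
PROOF_REFRESH_WINDOW = 60 * EPOCHS_PER_DAY   # assumed window size
QUANTUM = EPOCHS_PER_DAY                     # assumed quantisation step


@dataclass
class Sector:
    activation: int
    commitment_expiration: int
    proof_expiration: int


def quantise_up(epoch: int, quantum: int = QUANTUM) -> int:
    """Round an epoch up to the next multiple of the quantum."""
    return -(-epoch // quantum) * quantum


def new_sector(activation: int, commitment_epochs: int) -> Sector:
    # The proof expiration is derived from the activation epoch,
    # independently of how long the SP chose to commit for.
    return Sector(
        activation=activation,
        commitment_expiration=activation + commitment_epochs,
        proof_expiration=quantise_up(activation + PROOF_VALIDITY),
    )


def refresh_proof_expiration(sector: Sector, now: int) -> bool:
    """Extend the proof expiration only when called inside the window."""
    if now > sector.proof_expiration:
        return False  # proof already expired
    if sector.proof_expiration - now > PROOF_REFRESH_WINDOW:
        return False  # too early: the call is accepted but changes nothing
    sector.proof_expiration += PROOF_VALIDITY
    return True


# Usage: a 5-year commitment whose proof must be refreshed along the way.
s = new_sector(activation=0, commitment_epochs=5 * 365 * EPOCHS_PER_DAY)
early = refresh_proof_expiration(s, now=100 * EPOCHS_PER_DAY)    # no-op
in_window = refresh_proof_expiration(s, now=500 * EPOCHS_PER_DAY)
```

Treating an early call as a harmless no-op is what lets an SP batch many sectors into one message: sectors outside their window are simply skipped.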
-
I've posted a draft FIP at #446. The proposal introduces a proof expiration mechanism which supports the orderly processing of all sectors in the network. This mechanism would need to be implemented before committing any sectors with longer durations – it can't be deferred until we discover a PoRep bug. However, the policy about what to do in case of a bug does not need to be implemented yet (or, hopefully, ever).
-
@anorth Hi, I would like to know whether this proposal applies to Committed Capacity (CC) sectors or to sectors containing storage deals. Thanks~
-
Hi @anorth @jennijuju If a storage provider extends a sector's lifetime by another 1.5 years on the day before it expires, will the power for sectors containing storage deals now be 5x, or even less?
-
@jennijuju Thanks a lot! May I know when this proposal will come into effect, please?
-
The policy for this FIP stated here is
Is point 5 correct? If so, it seems to place an immediate and substantial burden on SPs in the case of a PoRep bug. Unlucky sectors will have arbitrarily short times to reseal (such as 1 day). More precisely, simulating the stated policy, assuming a maximum 5-year sector commitment, gives a distribution of ‘time-to-reseal’ in the event of bug discovery that’s shown in the image attached. You can see that a substantial proportion have short reseal times (e.g. < 2 weeks). While this is a disaster response policy, such a short time for any proportion of sectors seems unnecessarily disruptive to SPs. If this interpretation of the policy is correct, it raises some questions about the balance of trade-offs, and the policy should potentially be improved:
But first, I’d like to find out if the above policy interpretation/statement is what’s planned to be implemented. 🙏
-
Hi @geoff-vball, with regard to the most recent commit to the FIP locking down the exact parameters for the policy, we at the CryptoEcon Lab have some analysis on specifically this that we would like to share. We are currently working on consolidating all our research and intend to share it publicly by Monday/Tuesday at the latest.
-
We recently conducted an analysis of the incentives in FIP-047 and created a report detailing some of our findings. We welcome all suggestions and feedback and wish to open a discussion on locking down the exact parameters for the policy, specifically the
Our key findings can be summarized as follows:
-
Background
FIP-0047 aims to provide a security mechanism for the network in case a flaw is discovered in the PoRep algorithm used to commit sectors. It establishes a rolling 1.5-year schedule for expiring the validity of the PoRep proof for every sector, requiring each to be replaced or terminated. This schedule must be implemented prior to any sector being committed for >1.5 years and any such flaw being discovered, as (a) doing so afterwards would involve a large and complicated state migration in the middle of a presumably fragile network and intense time pressure to fix the actual flaw, and (b) implementing the schedule now clearly communicates to SPs how such a flaw would be handled, so they can mitigate risk as they see fit.
The mechanism specified in FIP-0047 requires a storage provider to send a message to refresh the proof for each sector in a 60-day window before the end of each 1.5-year validity period. If the message is not sent, the sector is immediately terminated. This ensures that every sector’s proof has been checked within the past 1.5 years, and presents a simple mechanism for the network to deny subsequent refreshes if a flaw is discovered.
Motivation for change
Despite this mechanism being approved and implemented (but not yet active on the network), there are two reasons motivating a change:
Proposal
We (@Kubuxu, @ZenGround0 and I) have a concept for an alternative implementation of FIP-0047 which would solve both problems: a much simpler mechanism for a schedule of proof expirations with no need for a message from storage providers to update proof validity. In brief:
Compared with the original implementation, this proposal:
The most difficult part of this proposal is likely to be a data structure for an arbitrary-cardinality collection of sector numbers. A single bitfield has bounded capacity, so we need a collection of them with appropriate indexing and splitting logic.
Migration
A one-time migration populates the schedule with existing sectors. It requires reading sector infos, but not writing them.
Constraints
FIP-0047 is already approved, implemented, and scheduled for activation in nv19. It’s scheduled there in order to unblock extensions to maximum sector commitments, such as FIP-0052 (approved) or FIP-0056. We don’t intend to impact the timeline of those changes being activated. Thus any alteration to FIP-0047 needs to be implemented and tested in a very short amount of time. We have reason to believe that this proposal is simple enough to be implemented on such a tight timeline, and still result in a mechanism that is better for SPs and simpler and safer than the existing mechanism. In the worst case, we can just ship the existing FIP-0047 implementation.
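As a rough illustration of the data-structure problem mentioned above, the following sketch keeps an unbounded set of sector numbers as a map from chunk index to a fixed-width bitmask. It is an assumption-laden toy, not the design the authors have in mind: the class name `SectorSet`, the `chunk_bits` parameter, and the choice to index chunks by `sector_number // chunk_bits` are all hypothetical.

```python
class SectorSet:
    """Unbounded set of sector numbers stored as bounded-size bitmask chunks."""

    def __init__(self, chunk_bits: int = 1024):
        self.chunk_bits = chunk_bits
        self.chunks = {}  # chunk index -> int used as a bitmask

    def add(self, sector_number: int) -> None:
        idx, bit = divmod(sector_number, self.chunk_bits)
        self.chunks[idx] = self.chunks.get(idx, 0) | (1 << bit)

    def remove(self, sector_number: int) -> None:
        idx, bit = divmod(sector_number, self.chunk_bits)
        mask = self.chunks.get(idx, 0) & ~(1 << bit)
        if mask:
            self.chunks[idx] = mask
        else:
            self.chunks.pop(idx, None)  # drop empty chunks entirely

    def __contains__(self, sector_number: int) -> bool:
        idx, bit = divmod(sector_number, self.chunk_bits)
        return bool((self.chunks.get(idx, 0) >> bit) & 1)

    def __len__(self) -> int:
        return sum(bin(mask).count("1") for mask in self.chunks.values())

    def __iter__(self):
        # Yield sector numbers in ascending order.
        for idx in sorted(self.chunks):
            mask, bit = self.chunks[idx], 0
            base = idx * self.chunk_bits
            while mask:
                if mask & 1:
                    yield base + bit
                mask >>= 1
                bit += 1


# Usage with a tiny chunk size so the chunking is visible.
sectors = SectorSet(chunk_bits=8)
for n in (3, 9, 1000):
    sectors.add(n)
```

Keying chunks by a fixed range of sector numbers sidesteps explicit splitting logic, because no chunk can ever exceed `chunk_bits` entries. A real on-chain structure would also have to account for serialisation cost, which this sketch ignores.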
-
@Kubuxu and I have now realised an even simpler solution: we can do nothing after all. We have convinced ourselves that in fact we can make no code or state changes now, and still establish an orderly off-boarding of sectors over a defined period of time in the event that becomes necessary. The new mechanism for doing so is even more friendly to SPs, giving them more control over which sectors to terminate or replace and in what sequence.
Background
FIP-0047 aims to provide a security mechanism for the network in case a flaw is discovered in the PoRep algorithm used to commit sectors. It establishes a rolling 1.5-year schedule for expiring the validity of the PoRep proof for every sector, requiring each to be replaced or terminated. We thought that this schedule must be implemented prior to any sector being committed for >1.5 years and any such flaw being discovered, as (a) doing so afterwards would involve a large and complicated state migration in the middle of a presumably fragile network and intense time pressure to fix the actual flaw, and (b) implementing the schedule now clearly communicates to SPs how such a flaw would be handled, so they can mitigate risk as they see fit.
However, we have since realised that an actual schedule (i.e. sequence of sectors and epochs) is not required. We need a scheme for orderly replacement or termination of the sectors with old proofs, but we do not need the specific schedule to be established ahead of time. Instead, we can simply require that all SPs terminate or replace some fraction of their old sectors per proving period, such that all of them are terminated after 540 days. The network doesn't care which ones happen when, and the SP can choose those sectors that are most valuable (e.g. those with deals first). We just need to count them.
Motivation for change
Product impact: All schemes which establish a schedule for termination remove freedom for the SP in arranging their re-sealing operations. While "in order of ActivationEpoch % 1.5y" is a natural sequence, it might not match the SP's utility function at all, and is far from evenly distributed network-wide. An even distribution can be obtained with "SectorNumber % 540", but this essentially randomises the sequence for the SP, and is also unaligned with utility. There's no clear reason for the network to prefer one sequence to another (both the proposals were arbitrary), except to prefer whatever the SP prefers, to keep their operation as intact as possible in the circumstances.
Complexity: Doing nothing is much simpler! Also, the code to enforce this scheme will be much simpler than either prior proposal. This does postpone all of the work to enforce the off-boarding until such time as we might discover and respond to a PoRep flaw. We can choose to do some of the implementation work ahead of time and leave it dormant.
Proposal
In the event of a PoRep flaw, a network upgrade will introduce a new proof type and associated code, and specify a start epoch
Then the following changes are made to miner actor operational code:
This scheme lets the SP choose to terminate/replace sectors in whatever sequence they wish, so long as they at least keep up with a rate of uniform progress that would replace all their sectors within the specified period. If the SP lacks sealing capacity they will need to manually terminate, rather than replace, some sectors, but they can choose which ones. The inducement for an SP to keep up is that their deadlines will be faulted if they don't. This puts those sectors on a fixed timeline to forced termination anyway, but costs more in fees than just terminating them up front.
One notable difference between this scheme and the previous ones is that the forced termination is an explicit SP action (unless they let deadlines fault), which means they pay gas. Prior schemes did the forced termination of unreplaced sectors in cron. This may be a case where we consider a gas rebate for part of the gas costs of manual termination to be appropriate. Such a rebate is a network subsidy, but in this case, where the network is requiring the SPs to re-seal in order to recover from a network flaw, extracting additional payment seems undesirable (the gas would represent a transfer from SPs to token holders). We must charge at least some gas, though, to keep from overloading block validation too much.
Next steps
While FIP-0047 is already approved and scheduled for activation, it should be either re-written or replaced. The motivation of clearly communicating to SPs how such a flaw is expected to be handled remains important. FIP-0047 could be reduced to an informational one describing this policy.
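The "rate of uniform progress" idea can be sketched as a simple counting rule. This is one possible reading, not the specified miner actor logic: it assumes one proving period per day (24 hours), a 540-day off-boarding period, and rounding the requirement up; the function names are hypothetical.

```python
OFFBOARDING_DAYS = 540  # period over which all old sectors must be gone


def required_processed(total_old: int, days_since_start: int,
                       period_days: int = OFFBOARDING_DAYS) -> int:
    """Minimum number of old sectors terminated or replaced by a given day."""
    elapsed = min(max(days_since_start, 0), period_days)
    # Integer ceiling of total_old * elapsed / period_days.
    return -(-(total_old * elapsed) // period_days)


def is_on_schedule(total_old: int, processed: int, days_since_start: int) -> bool:
    """True if the SP has kept up with uniform progress so far."""
    return processed >= required_processed(total_old, days_since_start)


# Usage: an SP with 1000 old-proof sectors, halfway through the period.
halfway = required_processed(1000, 270)
```

The check only counts sectors, so the SP remains free to decide which ones go first. An SP that falls behind this line is the case where, per the description above, deadlines would be faulted.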
-
This is a proposal for the Filecoin network to ratify a policy to be adopted in case an insecurity is discovered in the theory or implementation of proof-of-replication. The discussion is prompted by #386 which, by proposing to change the maximum sector commitment duration, changes the implicit policy that exists today. But this is a discussion that we should have in any case.
Background
Proof-of-replication and network security
The Filecoin network uses cryptographic techniques to provide assurance of the physical uniqueness of sectors of data (proof of replication, or PoRep) and their ongoing availability (proof of space-time, PoSt). These mechanisms provide the proof of work underlying the blockchain’s security (in addition to the security offered by pledge collateral stake).
Some of the cryptography involved has been developed relatively recently. It is possible that there are errors in either the theory or implementation, or that errors may be introduced one day, that undermine the desired assurances. The result of such an error would most likely be that storage providers could “cheat” the network to claim they were maintaining more committed storage than they in fact possessed. This would reduce network security as consensus power could be gained without the expected physical infrastructure commitment. It is also unfair to non-cheating providers, assuming knowledge of the flaw was limited. In the case of an error in PoRep, it is likely that there would be no possible protocol change that could detect the cheating sectors after commitment.
This situation has already arisen once in the life of the Filecoin network. The v1.1 PoRep algorithm patched a bug in the v1 PoRep implementation that weakened the security assurances of sectors. The bug was responsibly reported to the Filecoin team and it is unknown if it was ever exploited by a provider.
We are not aware of any similar bugs at this time. Filecoin storage is secure as far as we can ascertain.
Current policy
The network today has an implicit policy on what to do if another such bug is detected:
The policy might thus be summarised as: put up with the potentially-insecure power for a limited period of time, but retain existing commitments of providers to the network and vice-versa. The 1.5-year window was selected as a compromise: a longer maximum commitment would be beneficial for storage stability, but a shorter bound improves the response time in case of a bug.
Pre-genesis network designers may have considered this policy a placeholder, to be replaced on the fly in the event of a bug. But without alternative ideas expressed clearly in FIPs or code, participants might reasonably assume that the current code is a stable policy.
Changing the maximum sector commitment duration
The sector commitment duration (currently 1.5 years) is the period of availability a provider commits to when first proving a sector. When a sector is close to expiration, a provider can extend the sector for up to the same duration again. This can be repeated until the sector maximum lifetime (currently 5 years), after which further extensions are prohibited.
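As a small worked example of the extension rule just described, the sketch below lists the latest possible expiration after the initial commitment and each maximal extension. It assumes 1.5 years is counted as 540 days and 5 years as 1825 days, and that every extension is for the maximum allowed duration; the function name is hypothetical.

```python
MAX_COMMITMENT_DAYS = 540      # assumed: 1.5 years
MAX_LIFETIME_DAYS = 5 * 365    # assumed: 5 years


def expiration_schedule(activation_day: int = 0,
                        max_commitment: int = MAX_COMMITMENT_DAYS,
                        max_lifetime: int = MAX_LIFETIME_DAYS) -> list:
    """Expiration day after the initial commitment and each maximal extension."""
    limit = activation_day + max_lifetime
    expirations = []
    expiration = activation_day
    while expiration < limit:
        # Each step adds one full commitment, capped at the maximum lifetime.
        expiration = min(expiration + max_commitment, limit)
        expirations.append(expiration)
    return expirations


# Usage: today's parameters versus a 5-year maximum commitment.
today = expiration_schedule()
proposed = expiration_schedule(max_commitment=MAX_LIFETIME_DAYS)
```

Under these assumptions, today's parameters give one initial commitment plus three extensions, the last one shortened by the lifetime cap, whereas a 5-year maximum commitment reaches the cap in a single step.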
The sector duration multiplier proposal proposes increasing the maximum sector commitment duration to match the maximum lifetime of 5 years, in order to most simply provide increased rewards for longer commitments.
This demands a new analysis and policy for how the network should respond to an error in current or future PoRep implementations. We should express such a policy in any case, but the proposed extension of commitment duration makes it more urgent.
Proposal
This proposal aims to establish a concrete and widely-accepted policy for how the network should respond to possible future bugs that compromise the security of storage-based consensus.
Goals
It may not be necessary to implement code embodying the agreed policy until necessitated by discovery of an insecurity. It’s quite likely that the details of the flaw would inform the implementation. A FIP-level ratification of a concrete policy, but without requiring code, provides a good balance between community consensus and future adaptability.
Ideas
We identify two basic classes of policy, some of which have parameters and/or multiple mechanisms that could realise them.
Note that these are policies for disaster response. We should not expect the outcomes to be desirable, as compared with PoRep continuing to be secure. But it’s important to identify what trade-offs network participants would prefer in the event of such a discovery.
Option A: status quo
One possible policy is to continue today’s implicit policy, but allow insecure power to persist for >3x the current duration, up to 5 years.
Storage providers in this scenario could be prohibited from extending the life of any sectors initially committed for less than 5 years, but given the very large proposed power multipliers offered for long commitments, it’s likely that 5-year-committed sectors dominate the network power table at most times. Such sectors might be part-way through their life at the time a flaw is detected.
This is the easiest to implement (no change from today) but may be unpalatable from a network security point of view. While new sectors would gradually reduce the power attributable to possibly-cheating ones, significant insecure power would persist for years.
Option B: re-sealing
Another possible policy is to allow/require storage providers to re-seal their committed sectors with a new PoRep algorithm within some fixed time window in order to maintain power. The time window for re-sealing is a policy parameter. We might assume that 1.5 years is an appropriate value for minor weaknesses in security, but insufficient to address a major flaw that retro-actively weakens all power. In such cases, the re-sealing deadline might have to be much shorter.
Another parameter of mandated re-sealing is any penalty to be extracted for sectors which are not re-sealed. Failure to re-seal before the deadline could be considered an early termination of the sector, where the provider would pay the termination penalty (at present, approximately 90 days worth of projected reward at the epoch of commitment, but may differ in the future). Alternatively, failure to reseal before the deadline (which was unknowable when the sector was committed) could be considered a “normal” expiration, effectively forgiving the commitments in case of a PoRep flaw. Or something in between: a penalty that differs from the usual early-termination fee.
We might not be certain about the appropriate re-sealing window in advance of identifying a concrete flaw, but we should strive to align on a default for minor flaws with similar impact to the V1 PoRep. Similarly, it may not be possible to lock in an appropriate failure-to-reseal penalty, but we should strive for alignment on a value given a moderate re-sealing window. It might be easier to set a tight re-sealing deadline if failure to meet it is not penalised too hard.
Discussion
Informing storage provider business operations
Storage providers need some level of certainty over network behaviour in order to appropriately fund and structure the risk they take. For example, if Option B were ratified, a provider could choose to commit only short-duration sectors and effectively avoid most unexpected obligations to re-seal, or seek the higher returns from longer commitments but factor in or insure against the potential costs of re-sealing.
Without clarity on the network’s response to such a possibility, providers might assume the status quo and be ill-prepared for re-sealing were it subsequently mandated. Destabilising storage provider business and operations by invalidating assumptions would likely make a bad situation even worse.
Analysis
The options above, and any new ones proposed in this discussion, require some analysis to understand their impacts. We need to answer questions like:
We should understand what we are trading off between the various policy options and parameters in response to severe network distress.
Implementation feasibility
The options around re-sealing require significant implementation work to realise on-chain. At the scale of the Filecoin network even today, traversing all sectors to compute new deadlines or terminate faulty sectors is a large amount of on-chain processing that would have to be spread over many weeks or months. An algorithm to prove that a new PoRep commits to the same data as the original proof must also be described.
Before adopting a policy, we should gain a reasonable idea of its implementation feasibility.
Limits
There are limits to the scope of events we can plan for. An insecure proof-of-replication is one network risk which has been considered since Filecoin’s early development, and the current 1.5-year maximum commitment provides a mitigation to some class of flaws.
However, there is always the possibility of some class of flaw that researchers and implementers have not predicted. The policy we arrive at here may not help in all possible situations, and we should avoid applying it if it doesn’t fit whatever flaw we might discover. Nevertheless, for this class of well-understood possible flaw, a clear policy agreed up-front should greatly aid the network in navigating that situation should it arise.