Stop downloading entire segments when retrieving objects #3362

Draft
wants to merge 8 commits into base: main
Conversation

teor2345
Member

@teor2345 teor2345 commented Jan 31, 2025

What it does

This PR fixes two performance edge cases in object retrieval:

  • downloading a segment to work out how much padding the segment has
  • downloading a segment to remove the parent segment header at the start of that segment

Instead, it:

  • tries all the possible padding lengths against the object hash (see the sketch after this list)
  • manually decodes and discards the segment version variant, parent segment header, and the start of the block continuation containing the continuing object
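
To make the first point concrete, here is a minimal sketch of the padding-trial idea. It is not the PR's actual code: MAX_PADDING, strip_padding, and the direct use of the blake3 crate are illustrative assumptions.

// Minimal sketch, assuming the maximum possible padding is a small known
// bound; the real bound depends on the segment format.
const MAX_PADDING: usize = 2;

/// Tries each possible amount of trailing segment padding, returning the
/// candidate object bytes whose hash matches the expected object hash.
fn strip_padding<'a>(data: &'a [u8], expected_hash: &[u8; 32]) -> Option<&'a [u8]> {
    (0..=MAX_PADDING.min(data.len())).find_map(|padding| {
        let candidate = &data[..data.len() - padding];
        // Hash the candidate and compare it against the requested object hash.
        (blake3::hash(candidate).as_bytes() == expected_hash).then_some(candidate)
    })
}

Because the number of candidate padding lengths is small and hashing is cheap compared to a network round trip, trying each candidate locally avoids downloading the rest of the segment just to compute the exact padding.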

How it's written

There is no slow path any more, because that path downloaded full segments, which this code doesn’t need to do.

This code has more constants, structs, and documentation, because the assumptions in the old code were undocumented, or implemented as hard-coded calculations inside each function. That made them tricky to understand, review, maintain, and test. (Which is why multiple bugs have slipped through review and testing.)

Other Changes

This PR fixes a miscalculation of segment offsets, which had not been changed (or tested) since 2023, when the segment format was rewritten to remove the encoded number of segment items.

It also does some minor refactors, and adds some utility methods on pieces and segments.

Closes #3318.

TODO

  • Write tests for each object reconstruction edge case (in this PR, or a future PR)

Code contributor checklist:

@teor2345 teor2345 added the bug (Something isn't working) and improvement (it is already working, but can be better) labels Jan 31, 2025
@teor2345 teor2345 self-assigned this Jan 31, 2025
@teor2345 teor2345 requested a review from nazar-pc as a code owner January 31, 2025 02:22
Member

@vedhavyas vedhavyas left a comment


Makes sense. Will look through again once the remaining tests are added.

Member

@nazar-pc nazar-pc left a comment


I went up to the "Stop downloading full segments during object retrieval" commit, and it was not obvious to me how it stops downloading full segments, due to the large number of changes and the new implementation not being an extension of the previous one, at least not in any obvious way.

Another challenge is that there is a huge public API surface provided, but it is not clear why all of those bits need to be public, which makes it hard to figure out where to even start looking at this. The previous version had a somewhat straightforward approach: there was a single public struct + error enum + 2 constants (not sure if both needed to be public, though). The struct had just a constructor and a fetching method; the fetching method in turn had a few conditional paths for the code to go through, which were each mostly self-contained.

I no longer see the fast paths in the code as clearly as before; also, judging by the number of constants present and the extensive documentation, there are a lot of assumptions baked in and not even all cases covered yet.

The PR description was not enough for me to get even an overview of the chosen implementation approach here; there is just a brief mention of the outcome, and 1451 changed lines in a single commit that changes everything. I could really benefit from a more extensive explanation of the "what" and "why", maybe even a P2P discussion of the approach and the changes. I'll have to fully understand what this is doing, how, and why before I can approve it.

Non-controversial cleanups should go into a separate PR, to merge them quickly and reduce the diff here.

@@ -111,11 +108,19 @@ impl Decode for Segment {
}

impl Segment {
    /// Decodes and returns the segment variant, leaving the rest of the input as-is.
    pub fn decode_variant<I: Input>(input: &mut I) -> Result<u8, parity_scale_codec::Error> {
Member

I understand you might need this, but I see little point in having just this method; you can already do let variant = input[0]; anywhere in your code directly, with the same exact result.

You need low-level decoding of objects for data retrieval, which makes sense, but in that case either the decoding logic should be abstracted away in a nice way and integrated logically into subspace-archiver in full, or it should not be here at all, and you just accept the fact that encoding and parsing for data-retrieval purposes are unfortunately disconnected for now and live in different places, despite being inherently related to each other.

I'd keep them disconnected for now, until the full scope of necessary APIs is clear and something nice and logical can be implemented, instead of bolting low-level methods onto data structures that were never meant to expose them. Everything except the highest-level APIs was private on purpose. Exposing the lowest-level components, such as decode_variant(), seems to leak abstractions.
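
For context, a minimal sketch of the inline alternative this comment describes, at a hypothetical call site (read_byte is a provided method on parity_scale_codec's Input trait; parse_segment_prefix is an invented name for illustration):

use parity_scale_codec::Input;

// Hypothetical call site: the variant byte can be read inline, without a
// dedicated public method on Segment.
fn parse_segment_prefix<I: Input>(input: &mut I) -> Result<u8, parity_scale_codec::Error> {
    // Consumes exactly one byte; the rest of the input is left as-is.
    let variant = input.read_byte()?;
    Ok(variant)
}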

@@ -176,6 +182,34 @@ pub enum SegmentItem {
ParentSegmentHeader(SegmentHeader),
}

impl SegmentItem {
    /// Decodes and returns the segment item variant and length, leaving the rest of the input as-is.
    pub fn decode_variant_and_length<I: Input>(
Member

This is a similarly low-level API that I don't think should exist here.

Comment on lines 3 to +5
#![feature(exact_size_is_empty)]
#![feature(slice_take)]
#![feature(trusted_len)]
Member

Suggested change
#![feature(exact_size_is_empty)]
#![feature(slice_take)]
#![feature(trusted_len)]
#![feature(exact_size_is_empty, slice_take, trusted_len)]

Comment on lines +26 to +31
/// The minimum size of the segment header.
//
// V0 variant + segment index + segment commitment + prev segment header hash +
// last archived block: number + progress variant (no data in the variant).
pub const MIN_SEGMENT_HEADER_SIZE: usize =
    1 + 8 + SegmentCommitment::SIZE + Blake3Hash::SIZE + 4 + 1;
Member

It is not just the V0 variant; any variant in SCALE encoding is exactly one byte, so the comment could be clearer.

Also, for some reason you're using 8 for the segment index (instead of adding a constant there, like for the others) and using ::SIZE, which by accident happens to match the encoded size, but is in fact the in-memory size, not the encoded size of the data structure.

I'd rather add const fn encoded_size() methods to the data structures instead of relying on SIZE in external scope like here (which unfortunately also uses the SIZE constant):

pub(crate) const fn encoded_size() -> usize {
    RecordWitness::SIZE + RecordCommitment::SIZE + Blake3Hash::SIZE
}

Alternatively, I'd add a dynamic calculation like the one done here:

#[inline]
pub fn encoded_size() -> usize {
    let default = SectorMetadataChecksummed::from(SectorMetadata {
        sector_index: 0,
        pieces_in_sector: 0,
        // TODO: Should have been just `::new()`, but https://github.com/rust-lang/rust/issues/53827
        // SAFETY: Data structure filled with zeroes is a valid invariant
        s_bucket_sizes: unsafe { Box::new_zeroed().assume_init() },
        history_size: HistorySize::from(SegmentIndex::ZERO),
    });
    default.encoded_size()
}

It can be cached, and from a runtime-cost perspective it will not be much different from a constant, but it will automatically adjust if any code changes ever happen. That is much better from a maintenance point of view than numbers like 8 and 4, which, despite being nicely documented, will not prevent compilation if things change in a breaking way.
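
As a hypothetical sketch of the suggested pattern applied to the segment header (min_encoded_size is an invented name; SegmentHeader, SegmentCommitment::SIZE, and Blake3Hash::SIZE are the existing types and constants from the diff above, with the same SIZE caveat noted in this comment):

// Sketch only: moves the size calculation onto the data structure itself,
// so a breaking format change is caught closer to the definition site.
impl SegmentHeader {
    /// Minimum SCALE-encoded size: one variant byte + segment index (u64) +
    /// segment commitment + prev segment header hash + last archived block
    /// number (u32) + one progress variant byte (no data in the variant).
    pub(crate) const fn min_encoded_size() -> usize {
        1 + core::mem::size_of::<u64>()
            + SegmentCommitment::SIZE
            + Blake3Hash::SIZE
            + core::mem::size_of::<u32>()
            + 1
    }
}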

Member Author

@teor2345 teor2345 left a comment

Nazar (and Ved if you want), let’s schedule a video call after I’m finished with the private EVM stuff?

I went up to the "Stop downloading full segments during object retrieval" commit, and it was not obvious to me how it stops downloading full segments, due to the large number of changes and the new implementation not being an extension of the previous one, at least not in any obvious way.

I understand it would be easier to review a minor rewrite that added extra steps to the existing functions, but there are a bunch of reasons why that wasn’t possible.

It was tricky to extend the old code, because it had a lot of unnecessary assumptions (and in some cases, they were wrong). That’s why this code is a partial rewrite: we don’t want to download segments, so I had to remove that entire codepath and update the other codepath to handle all the edge cases.

Another challenge is that there is a huge public API surface provided, but it is not clear why all of those bits need to be public, which makes it hard to figure out where to even start looking at this. The previous version had a somewhat straightforward approach: there was a single public struct + error enum + 2 constants (not sure if both needed to be public, though). The struct had just a constructor and a fetching method; the fetching method in turn had a few conditional paths for the code to go through, which were each mostly self-contained.

Sure, happy to make most of the types and constants private.

I was trying to keep code roughly in the same place to minimise the diff, but it might be more readable if I move some of the lower-level operations to a submodule.

I no longer see the fast paths in the code as clearly as before; also, judging by the number of constants present and the extensive documentation, there are a lot of assumptions baked in

There is no slow path any more, because that path downloaded full segments, which this code doesn’t need to do. The assumptions in the old code were undocumented, or implemented as hard-coded calculations inside each function. That made them tricky to understand, review, maintain, and test. (Which is why multiple bugs have slipped through review and testing.)

and not even all cases covered yet.

I’m pretty sure this code covers all the necessary cases. Why are you concerned that some cases aren’t covered?

The PR description was not enough for me to get even an overview of the chosen implementation approach here; there is just a brief mention of the outcome, and 1451 changed lines in a single commit that changes everything. I could really benefit from a more extensive explanation of the "what" and "why", maybe even a P2P discussion of the approach and the changes. I'll have to fully understand what this is doing, how, and why before I can approve it.

Yep, happy to have a chat. Currently it’s not clear to me what the specific gaps are between the PR description and the code comments, so a call is a good way to work that out.

I’m also not sure how an incremental commit series could be used to make these changes; let’s talk about that too.

Non-controversial cleanups should go into a separate PR, to merge them quickly and reduce the diff here.

Sure, happy to do that.

I feel like there’s a fairly complicated set of rules about what gets a separate PR, what gets put together in the same PR, what gets separate commits, and what gets put together in the same commit. Can one of the more experienced team members write it down somewhere?

@nazar-pc
Member

nazar-pc commented Feb 2, 2025

There is no slow path any more, because that path downloaded full segments, which this code doesn’t need to do. The assumptions in the old code were undocumented, or implemented as hard-coded calculations inside each function. That made them tricky to understand, review, maintain, and test. (Which is why multiple bugs have slipped through review and testing.)

Thanks, that is definitely helpful context. Having a single code path that handles everything does make sense and would probably be better, though I'll have to understand better how this is achieved, because I do believe the paths are somewhat distinct and not that difficult to understand in isolation (though they certainly must be non-buggy to be useful).

I’m pretty sure this code covers all the necessary cases. Why are you concerned that some cases aren’t covered?

I meant the TODOs and some limitations (like the size of supported objects) mentioned in the code. I don't necessarily have anything more specific than that.

Can one of the more experienced team members write it down somewhere?

I don't think there is, or even needs to be, a set of hard rules. Just looking at the changes, it is not hard to guess that some of them will take more time to understand and merge than others, while the rest are mostly trivial and would otherwise be carried through all of the potential rebases and reviews. So it just logically makes sense to send them separately, to be merged and done with.

@teor2345 teor2345 marked this pull request as draft February 3, 2025 08:38
@teor2345
Member Author

teor2345 commented Feb 3, 2025

I've moved this to draft because I'm focused on Private EVM at the moment. I'll pick it back up once I've updated that PR.

Labels
bug (Something isn't working), improvement (it is already working, but can be better)
Development
Successfully merging this pull request may close these issues:
Reduce the number of pieces downloaded by the object fetcher in subspace-gateway