patches series for dfuse #15645

Open
wants to merge 18 commits into
base: google/2.6

@wangdi1 wangdi1 commented Dec 18, 2024

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

ashleypittman and others added 13 commits December 18, 2024 19:26
When dfuse sees I/O as well-aligned 128k reads, read a MB at a time
and cache the result, allowing for faster read bandwidth for
well-behaved applications and large files.

Create a new in-memory descriptor for file contents, pull in a
whole descriptor on first read and perform all other reads from
the same result.

This should give much higher bandwidth for well behaved applications
and should be easy to extend to proper readahead in future.

Signed-off-by: Ashley Pittman <[email protected]>
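
The chunk-read idea in the commit message above can be sketched roughly as follows. This is a minimal illustration in Python, not the dfuse implementation: the `ChunkCache` class, the `backend_read` callable and the 1 MiB chunk size are all assumptions made for the example.

```python
CHUNK_SIZE = 1024 * 1024   # assumed chunk size of 1 MiB
IO_SIZE = 128 * 1024       # the well-aligned read size named in the commit


class ChunkCache:
    """Hypothetical per-file cache of whole chunks, keyed by chunk index."""

    def __init__(self, backend_read):
        # backend_read(offset, length) -> bytes stands in for a network read.
        self._backend_read = backend_read
        self._chunks = {}
        self.backend_calls = 0

    def read(self, offset, length):
        aligned = length == IO_SIZE and offset % IO_SIZE == 0
        if not aligned:
            # Anything that is not a well-aligned 128k read bypasses the cache.
            self.backend_calls += 1
            return self._backend_read(offset, length)

        index = offset // CHUNK_SIZE
        chunk = self._chunks.get(index)
        if chunk is None:
            # First touch of this chunk: pull in the whole chunk once.
            self.backend_calls += 1
            chunk = self._backend_read(index * CHUNK_SIZE, CHUNK_SIZE)
            self._chunks[index] = chunk

        # CHUNK_SIZE is a multiple of IO_SIZE, so an aligned read never
        # straddles two chunks.
        start = offset - index * CHUNK_SIZE
        return chunk[start:start + length]
```

With this sketch, eight consecutive aligned 128k reads covering one chunk cost a single backend read; an unaligned read is passed straight through.
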
This only serves to add confusion at this point.

Signed-off-by: Ashley Pittman <[email protected]>
Create an active_inode struct and allocate it for all inodes which have more than
one open handle. This allows us to share state/caching data across open handles
more easily and to better support concurrent readers. Future work here will improve
performance for concurrent readers when caching is used, and allow us to make
the in-memory inode struct smaller, which will save memory.

Signed-off-by: Ashley Pittman [email protected]
Attach pre-read buffers to the inode rather than the open file handle. This code
was written with the idea that a client would open and read the file and that
would populate the kernel cache. In practice, however, there are often concurrent
readers for the same file. The first-to-open was doing the pre-read, but this was
often not the first process to perform a read, and there would often be multiple
reads for the same regions, all of which would hit the network.

Use the "active" entry on the inode to launch a pre-read on first open and have
any request on any file handle use the pre-read buffer for replies if possible.

In addition, when a read is to be serviced from the pre-read buffer but the
data is not yet in memory, rather than spinning on a lock and consuming a fuse
thread, add a descriptor to a callback list so that the thread is released and
the reply is made sooner, once the data is available.

This greatly reduces the number of duplicate network round-trip reads for
workloads where multiple clients are trying to fetch the same data, something
that we see a lot in some applications.

Signed-off-by: Ashley Pittman [email protected]
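
The callback-list approach described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the dfuse code: `PreReadBuffer`, its method names and the `reply` callbacks are hypothetical, and a Python lock stands in for whatever synchronisation dfuse actually uses.

```python
import threading


class PreReadBuffer:
    """Hypothetical inode-level pre-read buffer shared by all open handles."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = None     # set once the pre-read has completed
        self._waiters = []    # (offset, length, reply) queued before completion

    def read(self, offset, length, reply):
        """Reply at once if the data is present; otherwise queue and return."""
        with self._lock:
            if self._data is None:
                # Do not block the calling thread: park the request instead.
                self._waiters.append((offset, length, reply))
                return False
            data = self._data
        reply(data[offset:offset + length])
        return True

    def complete(self, data):
        """Called once when the pre-read finishes; answers every queued request."""
        with self._lock:
            self._data = data
            waiters, self._waiters = self._waiters, []
        # Reply outside the lock so callbacks cannot deadlock against read().
        for offset, length, reply in waiters:
            reply(data[offset:offset + length])
```

A request that arrives before the pre-read completes returns immediately, and its reply is sent from `complete()`; the calling thread is never held while waiting for data.
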
From #15298

Handle concurrent reads in the chunk_read code. Rather than assuming
each slot only gets requested once, save the slot number as part of the
request and handle multiple requests.

This corrects the behaviour and avoids a crash when multiple readers read
the same file concurrently and improves the performance in this case.

Required-githooks: true

Signed-off-by: Ashley Pittman <[email protected]>
From #15528

If a read matches a currently outstanding read, simply connect
the two and, when the reply arrives from the network, respond
to both requests.

Ashley Pittman <[email protected]>
Required-githooks: true
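
Connecting a new read to a matching outstanding read can be sketched as follows. This is a single-threaded illustration in Python, not the dfuse code: `InflightReads`, the `send` callable and the choice to match on an exact `(offset, length)` pair are assumptions made for the example.

```python
class InflightReads:
    """Hypothetical table of reads that were sent but not yet answered."""

    def __init__(self, send):
        # send(offset, length) stands in for issuing the network read.
        self._send = send
        self._pending = {}   # (offset, length) -> list of reply callbacks

    def submit(self, offset, length, reply):
        """Return True if a network read was issued, False if piggybacked."""
        key = (offset, length)
        waiters = self._pending.get(key)
        if waiters is not None:
            # A matching read is already outstanding: attach to it.
            waiters.append(reply)
            return False
        self._pending[key] = [reply]
        self._send(offset, length)
        return True

    def on_reply(self, offset, length, data):
        """Deliver one network reply to every request attached to it."""
        for reply in self._pending.pop((offset, length), []):
            reply(data)
```

Two identical requests submitted back to back produce one call to `send`, and both callbacks fire when `on_reply` is invoked for that range.
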
Fix a bug where linear read was not correctly saved to the directory.
Improve the NLT testing of pre_read to not just invoke it but to use
the statistics to verify correct operation.

Required-githooks: true

Signed-off-by: Ashley Pittman <[email protected]>
Add the readahead RPC to the open read list earlier to make sure
the following read does not send a duplicate RPC.

Required-githooks: true

Signed-off-by: Di Wang <[email protected]>
Set the dcache update timer when the timer is initialized, otherwise
the keep_cache flag will not be set for opendir, and a concurrent
opendir might then truncate the directory page cache unnecessarily.

Set the valid timer for inode entries created by readdirplus,
otherwise the inode might need to be looked up again.

Required-githooks: true

Signed-off-by: Di Wang <[email protected]>
The object open is not needed in dfuse_cb_open: if the
following fetch only needs to read from the kernel page
cache, the object layout is not needed at all.

Required-githooks: true

Signed-off-by: Di Wang <[email protected]>
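
Deferring the object open until a read actually reaches the backend can be sketched as follows. This is a minimal illustration in Python, not the dfuse code: `LazyHandle` and the `open_object` callable are hypothetical names, and the sketch assumes that reads satisfied by the kernel page cache never reach the handle at all.

```python
class LazyHandle:
    """Hypothetical file handle that opens the backing object on first read."""

    def __init__(self, open_object):
        # open_object() -> callable(offset, length) -> bytes; assumed to be
        # the expensive step that the open path no longer performs.
        self._open_object = open_object
        self._reader = None

    def read(self, offset, length):
        if self._reader is None:
            # Only a read that misses the page cache pays for the open.
            self._reader = self._open_object()
        return self._reader(offset, length)
```

A handle that is opened and closed without any backend read never calls `open_object`, which is the saving the commit message describes.
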
Use the chan patch to avoid some contention in the fuse kernel.

Required-githooks: true

Signed-off-by: Ashley Pittman <[email protected]>
Temporarily force readdirplus in all cases for now, which can
save some lookup RPCs in some cases. This can be removed once
we use object enumeration to replace the normal key
enumeration for readdir.

Required-githooks: true

Signed-off-by: Di Wang <[email protected]>
Read from the readahead cache aggressively; this may need to be
improved later.

Required-githooks: true
Allow-unstable-test: true
Signed-off-by: Di Wang <[email protected]>
@daosbuild1

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15645/1/execution/node/357/log

@daosbuild1

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15645/1/execution/node/314/log

@daosbuild1

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15645/1/execution/node/273/log

@daosbuild1

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15645/1/execution/node/317/log

Errors are component not formatted correctly, Ticket number prefix incorrect, PR title is malformatted. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments, Unable to load ticket data
https://daosio.atlassian.net/browse/patches

fix style

Allow-unstable-test: true
Required-githooks: true

Signed-off-by: Di Wang <[email protected]>
@daosbuild1

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15645/2/execution/node/362/log

@daosbuild1

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15645/2/execution/node/370/log

@daosbuild1

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15645/2/execution/node/356/log

@daosbuild1

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15645/2/execution/node/348/log

@daosbuild1

Test stage Unit Test on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15645/2/display/redirect

fix style

Run-GHA: true
Allow-unstable-test: true
Required-githooks: true

Signed-off-by: Di Wang <[email protected]>
…use' into wangdi/google_26_dfuse

Run-GHA: true
Allow-unstable-test: true
Required-githooks: true
@daosbuild1

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15645/3/execution/node/343/log

@daosbuild1

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15645/3/execution/node/342/log

@daosbuild1

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15645/3/execution/node/349/log

@daosbuild1

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15645/3/execution/node/348/log

Revert the chan patch, since the RPM build fails with it.

Run-GHA: true
Allow-unstable-test: true
Required-githooks: true
Signed-off-by: Di Wang <[email protected]>
Skip-func-hw-test: true
Required-githooks: true
Merge branch 'wangdi/google_26' into wangdi/google_26_dfuse