open(): add support for O_DIRECT flag #2046

Merged 4 commits on Aug 20, 2024


francescolavra
Member

When this flag is set, file I/O is performed directly on the storage device, bypassing the page cache. Direct file I/O requires the address and length of user buffers, as well as the file offset, to be aligned to the filesystem block size (which is 512 bytes for TFS); if this requirement is not met, I/O syscalls return -EINVAL.
As part of these changes, the filesystem code has been decoupled from the page cache code so that it is possible to use a filesystem without the page cache. All bootloaders (legacy PC and UEFI) and tools (mkfs, dump, tfs-fuse) are now built without the page cache code; this allows avoiding unnecessary memory copying when reading from or writing to the filesystem, and decreases binary file sizes, thereby speeding up the boot process.
The second commit fixes a segmentation fault that was occurring in the TFS FUSE driver when closing a directory file descriptor.
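
The alignment rule stated above for O_DIRECT can be sketched as a simple check. This is only an illustration: `direct_io_aligned` is a hypothetical helper and is not taken from the Nanos sources, and the 512-byte block size is the TFS value quoted in the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Nanos sources: returns true when the
 * buffer address, the length and the file offset are all multiples of
 * block_size (512 bytes for TFS, per the description above). */
bool direct_io_aligned(const void *buf, size_t len, uint64_t offset,
                       size_t block_size)
{
    /* The mask test below is only valid for power-of-two block sizes. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    uint64_t mask = (uint64_t)block_size - 1;

    /* OR-ing the three values lets a single test catch any misaligned one. */
    return (((uint64_t)(uintptr_t)buf | (uint64_t)len | offset) & mask) == 0;
}
```

An application that wants to satisfy this rule would typically allocate its buffers with an aligned allocator such as `posix_memalign()` and keep lengths and offsets at multiples of the block size; a request failing the check is the case in which the I/O syscalls return -EINVAL.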

Calls to DMA functions are being moved from the pagecache code to
the generic filesystem code, in preparation for a future commit
which will allow using filesystems without a page cache.
In order to decrease the number of closure allocations, 2 new
callback functions (file_read and file_write) have been added to
struct filesystem: the implementation of these functions expects to
be passed DMA buffers, while the read and write closures stored in
struct fsfile operate on normal memory buffers.

The `fsf` field of the file struct associated with a non-regular file
must be NULL.

It is now possible to use a filesystem without the page cache.
All bootloaders (legacy PC and UEFI) and tools (mkfs, dump,
tfs-fuse) are now built without the pagecache code; this allows
avoiding unnecessary memory copying when reading from or writing to
the filesystem, and decreases binary file sizes, thereby speeding
up the boot process.
`#ifdef KERNEL` pre-processor directives have been moved from the
pagecache code (which is now exclusively for kernel use) to the
filesystem code (which can now be used by code that does not use a
page cache).
When reading from a filesystem, populating the scatter-gather list
is now done by the calling code (which does not do any copying of
memory buffers), and the SG list is consumed inside the read
implementation (i.e., either the pagecache code, or the driver of
the underlying storage device).
In order to support the sendfile() syscall (which does not supply
any memory buffers when reading from the input file), the
pagecache_node_fetch_pages() function has been amended to take
optional SG list and completion arguments, so that the calling code
can access a populated SG list when the completion is invoked.
filesystem_read_linear() and filesystem_write_linear() have been
enhanced to convert unaligned buffers to aligned buffers, which is
required in order to interface directly with filesystem code
without going through the page cache.
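
The commit message does not say how the conversion to aligned buffers is done. One common approach, shown here purely as an assumption, is to read the enclosing block-aligned range into an aligned bounce buffer and then copy out the requested bytes. The sketch below computes that range; `struct aligned_range` and `align_request` are hypothetical names, not Nanos APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a block-aligned range; not a Nanos type. */
struct aligned_range {
    uint64_t start;   /* request offset rounded down to a block boundary */
    uint64_t length;  /* bytes to transfer, a multiple of the block size */
    uint64_t skip;    /* bytes to skip at the start of the bounce buffer */
};

/* Returns the smallest block-aligned range that covers [offset, offset + length). */
struct aligned_range align_request(uint64_t offset, uint64_t length,
                                   uint64_t block_size)
{
    /* Rounding with a mask requires a power-of-two block size. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    uint64_t mask = block_size - 1;
    struct aligned_range r;

    r.start = offset & ~mask;                     /* round down */
    r.skip = offset - r.start;                    /* leading bytes to discard */
    r.length = (r.skip + length + mask) & ~mask;  /* round the end up */
    return r;
}
```

With 512-byte blocks, a 100-byte request at offset 700 maps to one block starting at offset 512, with the caller's data beginning 188 bytes into the bounce buffer.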
sg_file_io closures in the fdesc structure have been replaced by
file_iov closures, which take a struct iovec array instead of an
sg_list; this allows eliminating conversions between struct iovec
arrays and SG lists when SG lists are not needed (as in the network
socket and Unix domain socket implementations), and will allow
creating file-descriptor-specific SG lists (which may not
necessarily be a 1-to-1 mapping between struct iovec entries and
SG list buffers) when SG lists are needed (as in the regular file
I/O implementation).
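
To illustrate why the mapping between struct iovec entries and SG list buffers need not be 1-to-1, here is a minimal sketch that skips empty iovec entries and merges entries that are contiguous in memory. `struct sg_entry` and `iov_to_sg` are hypothetical and much simpler than the actual Nanos sg_list; the merging policy is an assumption chosen for the example, not something the commit message specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>   /* struct iovec */

/* Hypothetical SG buffer descriptor; the real Nanos sg_list differs. */
struct sg_entry {
    void *base;
    size_t len;
};

/* Fills out[] from iov[0..iovcnt). Returns the number of SG entries
 * produced, or -1 if more than max_out entries would be needed. */
int iov_to_sg(const struct iovec *iov, int iovcnt,
              struct sg_entry *out, int max_out)
{
    int n = 0;

    assert(out != NULL && iovcnt >= 0);
    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len == 0)
            continue;   /* an empty iovec entry yields no SG buffer */
        if (n > 0 &&
            (char *)out[n - 1].base + out[n - 1].len == (char *)iov[i].iov_base) {
            /* Contiguous with the previous buffer: extend it instead. */
            out[n - 1].len += iov[i].iov_len;
            continue;
        }
        if (n == max_out)
            return -1;
        out[n].base = iov[i].iov_base;
        out[n].len = iov[i].iov_len;
        n++;
    }
    return n;
}
```

In this sketch, four iovec entries can collapse into two SG buffers, which is the kind of file-descriptor-specific decision that a closure taking the iovec array directly is free to make.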

When the O_DIRECT flag is set, file I/O is performed directly on the
storage device, bypassing the page cache. Direct file I/O requires
the address and length of user buffers, as well as the file offset,
to be aligned to the filesystem block size (which is 512 bytes for
TFS); if this requirement is not met, I/O syscalls return -EINVAL.
francescolavra merged commit cdb01c7 into master on Aug 20, 2024
5 checks passed
francescolavra deleted the feature/o_direct branch on August 20, 2024 16:01
francescolavra added a commit to nanovms/gpu-nvidia that referenced this pull request on Sep 11, 2024
Since nanovms/nanos#2046, the
filesystem_read_entire() kernel function reads directly from the
storage device instead of from the page cache, thus it cannot be
called by nv_get_firmware() because the filesystem lock is already
acquired.
This change amends nv_get_firmware() so that firmware file contents
are retrieved from the page cache.