Wrong structs dirent, dirent64 #2669

safinaskar · 2022-02-04T19:58:56Z

Your structs dirent, dirent64 are wrong at x86_64-unknown-linux-gnu. Their field d_name contains 256 bytes. But names can be bigger than 256 bytes in NTFS (in Linux). I was able to create NTFS partition in my Linux and then create file with name "ыыыы..." (250 times "ы", this is Cyrillic letter). Such name contains more than 256 bytes (if you have difficulties with reproducing, I can provide more details).

So, fix structs. I am not sure how exactly fix should be implemented. Maybe [c_char; 512], but in this case we need to double check what is true upper limit. Maybe we can simply put [c_char; 1] or [c_char; 0]. Of course, it would be perfect to make dirent unsized, but with machine word sized reference. Unfortunately, as well as I know, such unsized structs are not possible in stable Rust

The text was updated successfully, but these errors were encountered:

devnexen · 2022-02-05T06:30:13Z

I m sorry to say but in fact dirent* struct are in fact correct, I invite you to look at the man pages/headers for confirmation. Note that libc is not a "smart wrapper" neither but reflects what the system libc are.
What seems more appropriate in your case is the getdents api usage.

safinaskar · 2022-02-05T11:08:24Z

@devnexen consider this code:

// This code for x86_64-unknown-linux-gnu
use std::ffi::CStr;
fn main() {
    unsafe {
        let dir = libc::opendir(b"/mnt/3\x00" as *const u8 as *const i8);
        assert_ne!(dir, std::ptr::null_mut());
        loop {
            let ent: *mut libc::dirent = libc::readdir(dir);
            if ent == std::ptr::null_mut() {
                break;
            }
            let name = CStr::from_ptr((*ent).d_name.as_ptr());
            println!("{}", name.to_bytes().len());
            println!("{:?}", name);
        }
    }
}

This code correctly handles file names bigger than 256 bytes. Does it invoke undefined behavior? I. e. is reading past formal struct size undefined behavior?

safinaskar · 2022-02-05T11:11:51Z

getdents

getdents is not wrapped by libc crate, nor by glibc. So I will have to manually wrap it, this is error-prone

devnexen · 2022-02-05T12:14:15Z

you can call it via syscall (the syscalls ids are present in libc) and map the linux_dirent* I think.

safinaskar · 2022-02-05T15:34:06Z

I still think that we should research whether accessing dirent past its formal size is undefined behavior (possibly by reading https://rust-lang.github.io/unsafe-code-guidelines ). If yes, then we should fix or remove readdir function from libc, because it is impossible to use it correctly

devnexen · 2022-02-05T15:42:12Z

or you can copy the result into a sized buffer. I disagree removing readdir honestly.

safinaskar · 2022-02-05T15:47:29Z

or you can copy the result into a sized buffer

You mean read result from dirent to sized buffer? Okey, but is this reading undefined behavior?

devnexen · 2022-02-05T16:37:43Z

Note that unsafe really means what it means, inside this block, rust safety and guarantees do not apply, it is the responsibility of code user to handle carefully possible side effects, use after free, stack overflow ... just like with language like C.

If anyone wants to chime in the discussion feel free :-)

digama0 · 2023-06-04T16:51:25Z

@devnexen consider this code:

// This code for x86_64-unknown-linux-gnu
use std::ffi::CStr;
fn main() {
    unsafe {
        let dir = libc::opendir(b"/mnt/3\x00" as *const u8 as *const i8);
        assert_ne!(dir, std::ptr::null_mut());
        loop {
            let ent: *mut libc::dirent = libc::readdir(dir);
            if ent == std::ptr::null_mut() {
                break;
            }
            let name = CStr::from_ptr((*ent).d_name.as_ptr());
            println!("{}", name.to_bytes().len());
            println!("{:?}", name);
        }
    }
}

This code correctly handles file names bigger than 256 bytes. Does it invoke undefined behavior? I. e. is reading past formal struct size undefined behavior?

I don't have all the context to answer this question definitively, but assuming that readdir actually returned a contiguous allocation with a null-terminated d_name field of longer than 256 bytes, your code as written is suspicious and possibly UB (whether it is actually UB is rust-lang/unsafe-code-guidelines#134) due to creating a reference at (*ent).d_name.as_ptr(), but you can rewrite it to be definitely not UB by using addr_of!((*ent).d_name) instead.

safinaskar · 2023-06-05T11:05:47Z

@digama0 , thanks a lot for answer. This is very big gotcha. This means lib::dirent should be fixed or at very least properly documented

safinaskar · 2023-06-05T13:21:50Z

I just found that actual std implementation of ReadDir is even more careful. The authors reject addr_of!((*ent).d_name) as wrong and instead perform offset computations manually. See code starting from this line: https://github.com/rust-lang/rust/blob/42f28f9eb41adb7a197697e5e2d6535d00fd0f4a/library/std/src/sys/unix/fs.rs#L676

digama0 · 2023-06-05T14:40:19Z

@RalfJung Could you comment on the correctness of the comment there? My take is that the reason addr_of!((*ptr).d_name) is said to be invalid is because the allocation may not actually be 256 bytes (plus the rest of the struct), and (at least for now) addr_of!((*ptr).d_name) asserts that the entire d_name: [c_char; 256] field is within an allocation. So an alternative way to fix the code would be to have a version of dirent with a d_name: [c_char; 0] VLA and using that to do the offset computation.

RalfJung · 2023-06-05T15:06:40Z

Yeah that comment sounds accurate for the current very conservative semantics we have for *ptr (where the pointer has to be fully aligned and dereferenceable). That comment is also a good argument for why we'd probably want a more relaxed semantics...

tgross35 · 2024-08-29T05:33:23Z

It sounds like we can improve things with a breaking change. If anyone has any proposals, feel free to put up a PR for 1.0.

safinaskar · 2024-08-29T07:17:25Z

@tgross35 , obviously, perfect solution would be waiting for "thin cstr" and making thin cstr (not reference to thin cstr, but unsized thin cstr itself) last member of dirent

tgross35 · 2024-08-29T07:26:44Z

Yeah, I do wish we had that available. Since we don't though, changing to [c_char; 0] is probably our best bet.

nwalfield · 2024-09-11T12:03:59Z

This issue is about the name being longer than 256 bytes. I'd like to add that the name could even be shorter than 256 bytes! I have observed this in practice and this is corroborated by this comment in the std library:

                // The dirent64 struct is a weird imaginary thing that isn't ever supposed
                // to be worked with by value. Its trailing d_name field is declared
                // variously as [c_char; 256] or [c_char; 1] on different systems but
                // either way that size is meaningless; only the offset of d_name is
                // meaningful. The dirent64 pointers that libc returns from readdir64 are
                // allowed to point to allocations smaller _or_ LARGER than implied by the
                // definition of the struct.

I noticed this, because the following code was crashing:

unsafe { *self.entry }.d_type

where self.entry is libc::dirent64 or libc::dirent. At least when compiling in debugging mode, the unsafe block copies the whole structure, and then accesses d_type. The memcpy can result in a seg fault if the direct data structure is smaller, at the end of a page, and the following page is not mapped:

Thread 7 "store::certd::t" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 10589]
memcpy () at src/string/x86_64/memcpy.s:18
warning: 18     src/string/x86_64/memcpy.s: No such file or directory
(gdb) up
#1  0x000055f054f18d34 in openpgp_cert_d::unixdir::DirEntry::file_type (self=0x7fa822fdb780)
    at src/unixdir.rs:144
144             FileType(unsafe { *self.entry }.d_type)

Making d_name a [c_char; 0] (or even a [c_char; 1] since the name is NUL terminated) would prevent this issue, I think.

RalfJung · 2024-09-11T12:09:54Z

Yeah by wrapping *self.entry in curly braces you are forcing that value to be loaded and moved. I would recommend you use unsafe { *self.entry.d_type } to avoid such an unnecessary load + move.

nwalfield · 2024-09-11T12:30:09Z

Yeah by wrapping *self.entry in curly braces you are forcing that value to be loaded and moved. I would recommend you use unsafe { *self.entry.d_type } to avoid such an unnecessary load + move.

Thanks a lot for the suggestion!

nwalfield · 2024-09-13T10:16:02Z

For the record, unsafe { *self.entry.d_type } doesn't work, but this does:

        unsafe { (&*self.entry).d_type }

(Here's my complete change.)

RalfJung · 2024-09-13T10:23:14Z

unsafe { (*self.entry).d_type } should also work.

nwalfield · 2024-09-13T11:20:31Z

Indeed it does. Thanks!

safinaskar added the C-bug Category: bug label Feb 4, 2022

safinaskar mentioned this issue Apr 5, 2023

[strict provenance] Fix System APIs That Are Liars rust-lang/rust#95496

Open

safinaskar mentioned this issue Jun 4, 2023

Stacked Borrows: raw pointer usable only for T too strict? rust-lang/unsafe-code-guidelines#134

Open

safinaskar mentioned this issue Apr 4, 2024

[libc 1.0] Put a big warning to Linux's dirent docs #3650

Closed

tgross35 added this to the 1.0 milestone Aug 29, 2024

tgross35 added the E-medium E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. label Aug 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong structs dirent, dirent64 #2669

Wrong structs dirent, dirent64 #2669

safinaskar commented Feb 4, 2022

devnexen commented Feb 5, 2022 •

edited

Loading

safinaskar commented Feb 5, 2022 •

edited

Loading

safinaskar commented Feb 5, 2022

devnexen commented Feb 5, 2022

safinaskar commented Feb 5, 2022

devnexen commented Feb 5, 2022 •

edited

Loading

safinaskar commented Feb 5, 2022

devnexen commented Feb 5, 2022 •

edited

Loading

digama0 commented Jun 4, 2023 •

edited

Loading

safinaskar commented Jun 5, 2023

safinaskar commented Jun 5, 2023

digama0 commented Jun 5, 2023

RalfJung commented Jun 5, 2023

tgross35 commented Aug 29, 2024

safinaskar commented Aug 29, 2024

tgross35 commented Aug 29, 2024 •

edited

Loading

nwalfield commented Sep 11, 2024

RalfJung commented Sep 11, 2024

nwalfield commented Sep 11, 2024

nwalfield commented Sep 13, 2024

RalfJung commented Sep 13, 2024

nwalfield commented Sep 13, 2024

Wrong structs dirent, dirent64 #2669

Wrong structs dirent, dirent64 #2669

Comments

safinaskar commented Feb 4, 2022

devnexen commented Feb 5, 2022 • edited Loading

safinaskar commented Feb 5, 2022 • edited Loading

safinaskar commented Feb 5, 2022

devnexen commented Feb 5, 2022

safinaskar commented Feb 5, 2022

devnexen commented Feb 5, 2022 • edited Loading

safinaskar commented Feb 5, 2022

devnexen commented Feb 5, 2022 • edited Loading

digama0 commented Jun 4, 2023 • edited Loading

safinaskar commented Jun 5, 2023

safinaskar commented Jun 5, 2023

digama0 commented Jun 5, 2023

RalfJung commented Jun 5, 2023

tgross35 commented Aug 29, 2024

safinaskar commented Aug 29, 2024

tgross35 commented Aug 29, 2024 • edited Loading

nwalfield commented Sep 11, 2024

RalfJung commented Sep 11, 2024

nwalfield commented Sep 11, 2024

nwalfield commented Sep 13, 2024

RalfJung commented Sep 13, 2024

nwalfield commented Sep 13, 2024

devnexen commented Feb 5, 2022 •

edited

Loading

safinaskar commented Feb 5, 2022 •

edited

Loading

devnexen commented Feb 5, 2022 •

edited

Loading

devnexen commented Feb 5, 2022 •

edited

Loading

digama0 commented Jun 4, 2023 •

edited

Loading

tgross35 commented Aug 29, 2024 •

edited

Loading