fuse: Ensure fd has same flags as read/write #154

Merged

Conversation

Champ-Goblem (Contributor)

We found when running minio with recent versions of nydus that it was no longer able to boot properly, due to an Invalid Argument error when writing to a temporary file.
Tracing the high-level flow showed that the file was originally opened with O_DIRECT; later, if the buffer size was not correctly aligned, fcntl with F_SETFL would be used to remove the O_DIRECT flag.
In virtiofs there is no independent call for fcntl, so the underlying file descriptor opened by nydus still had the O_DIRECT flag set, meaning that a write to this file would fail with -EINVAL whenever the buffer size was not aligned.

This patch aims to fix this by first recording the open flags for a file in the HandleData struct, then on read/write checking that these flags match the flags provided by virtiofs in the read/write message. If the flags do not match, the current flags of the file are fetched with F_GETFL and updated with F_SETFL if they also do not match.
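The comparison the patch describes can be sketched as follows. This is a hypothetical, dependency-free sketch: `needs_setfl` is an illustrative helper (the actual method in the PR is `check_fd_flags`), and the octal constants are the Linux x86-64 values hard-coded for the example (the real code uses the libc crate):

```rust
// Hypothetical sketch of the flag comparison described above. Constants
// are the Linux x86-64 values, hard-coded to keep the example
// dependency-free; the real code uses the libc crate.
const O_ACCMODE: i32 = 0o3;     // mask for O_RDONLY/O_WRONLY/O_RDWR
const O_DIRECT: i32 = 0o40000;  // status flag that F_SETFL can toggle

/// True when the status flags in the virtiofs read/write message differ
/// from the flags recorded when the file was opened, i.e. when an
/// fcntl(F_GETFL) check and possible fcntl(F_SETFL) update are needed.
fn needs_setfl(open_flags: i32, request_flags: i32) -> bool {
    // The access mode cannot be changed by F_SETFL, so compare only the
    // remaining status flags.
    (open_flags & !O_ACCMODE) != (request_flags & !O_ACCMODE)
}

fn main() {
    // File opened with O_DIRECT, but the guest later cleared it via fcntl:
    assert!(needs_setfl(O_DIRECT, 0));
    // Flags already match: no fcntl round-trip needed on the hot path.
    assert!(!needs_setfl(0, 0));
    println!("flag check ok");
}
```

Because the fcntl calls only happen on a mismatch, the common case stays a cheap integer comparison.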

) -> io::Result<usize> {
let data = self.get_data(handle, inode, libc::O_RDONLY)?;

// Manually implement File::try_clone() by borrowing fd of data.file instead of dup().
// It's safe because the `data` variable's lifetime spans the whole function,
// so data.file won't be closed.
let f = unsafe { File::from_raw_fd(data.get_handle_raw_fd()) };

self.check_fd_flags(data, f.as_raw_fd(), flags)?;
Contributor

So we're doing this check on each read/write request; I'm a bit worried about the performance hit. Do you happen to have any performance numbers for this?

Contributor


Only do fcntl calls if the flags don't match; that seems to have little performance impact in most cases. But still, some actual performance numbers would be good.

Contributor Author


I'll see what I can do. Any preference on what kind of performance tests should be run?

Contributor


Typical fio tests would be fine, e.g. rand-read and rand-write to see IOPS, and seq-read and seq-write to see bandwidth.
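For reference, a job file for the rand-write IOPS case might look like the following. The parameters are guesses reconstructed from the logs below (bs=4k, iodepth=256, 16 jobs, ~300s runtime), not the exact configuration that produced the numbers:

```ini
; Illustrative fio job (rand-write IOPS case); parameters are reconstructed
; from the logs below and may differ from the ones actually used.
[global]
ioengine=libaio
direct=1
time_based=1
runtime=300
numjobs=16
group_reporting=1

[test]
rw=randwrite
bs=4k
iodepth=256
size=1024m
```

The sequential bandwidth runs would swap in rw=write/read, bs=1M, and iodepth=64.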

Contributor Author


With the change to support fcntl:

=== maximum write throughput ===
test: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
...
fio-3.28
Starting 16 processes
test: Laying out IO file (1 file / 1024MiB)
(line repeated 16 times, once per job)
Jobs: 12 (f=12): [_(2),W(3),_(1),W(9),_(1)][94.6%][w=302MiB/s][w=302 IOPS][eta 00m:04s]
test: (groupid=0, jobs=16): err= 0: pid=681: Thu Sep 21 11:21:11 2023
  write: IOPS=240, BW=250MiB/s (262MB/s)(15.7GiB/64245msec); 0 zone resets
    slat (msec): min=3, max=230, avg=64.83, stdev= 8.12
    clat (msec): min=12, max=4369, avg=3940.07, stdev=714.01
     lat (msec): min=29, max=4436, avg=4004.24, stdev=718.88
    clat percentiles (msec):
     |  1.00th=[  489],  5.00th=[ 2366], 10.00th=[ 3406], 20.00th=[ 3943],
     | 30.00th=[ 4010], 40.00th=[ 4212], 50.00th=[ 4212], 60.00th=[ 4212],
     | 70.00th=[ 4212], 80.00th=[ 4212], 90.00th=[ 4329], 95.00th=[ 4396],
     | 99.00th=[ 4396], 99.50th=[ 4396], 99.90th=[ 4396], 99.95th=[ 4396],
     | 99.99th=[ 4396]
   bw (  KiB/s): min=110604, max=577714, per=99.35%, avg=254098.10, stdev=4581.05, samples=1945
   iops        : min=  108, max=  564, avg=246.67, stdev= 4.48, samples=1945
  lat (msec)   : 20=0.03%, 50=0.08%, 100=0.13%, 250=0.33%, 500=0.49%
  lat (msec)   : 750=0.54%, 1000=0.65%, 2000=2.14%, >=2000=99.59%
  cpu          : usr=0.12%, sys=0.05%, ctx=15489, majf=0, minf=932
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=2.5%, >=64=97.4%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,15430,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=250MiB/s (262MB/s), 250MiB/s-250MiB/s (262MB/s-262MB/s), io=15.7GiB (16.8GB), run=64245-64245msec

=== maximum write iops ===
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
...
fio-3.28
Starting 16 processes
Jobs: 16 (f=16): [w(12),f(1),w(3)][19.9%][w=12.5MiB/s][w=3191 IOPS][eta 20m:18s]
test: (groupid=0, jobs=16): err= 0: pid=702: Thu Sep 21 11:26:13 2023
  write: IOPS=2768, BW=10.9MiB/s (11.4MB/s)(3261MiB/300010msec); 0 zone resets
    slat (usec): min=1040, max=105549, avg=5772.53, stdev=698.06
    clat (msec): min=4, max=1792, avg=1469.68, stdev=115.14
     lat (msec): min=9, max=1798, avg=1475.45, stdev=115.46
    clat percentiles (msec):
     |  1.00th=[ 1267],  5.00th=[ 1318], 10.00th=[ 1351], 20.00th=[ 1385],
     | 30.00th=[ 1418], 40.00th=[ 1435], 50.00th=[ 1469], 60.00th=[ 1502],
     | 70.00th=[ 1536], 80.00th=[ 1569], 90.00th=[ 1586], 95.00th=[ 1636],
     | 99.00th=[ 1720], 99.50th=[ 1737], 99.90th=[ 1787], 99.95th=[ 1787],
     | 99.99th=[ 1787]
   bw (  KiB/s): min= 8948, max=13484, per=99.60%, avg=11085.15, stdev=48.02, samples=9587
   iops        : min= 2236, max= 3370, avg=2770.59, stdev=12.01, samples=9587
  lat (msec)   : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.02%, 250=0.06%
  lat (msec)   : 500=0.09%, 750=0.08%, 1000=0.09%, 2000=100.14%
  cpu          : usr=0.23%, sys=0.27%, ctx=831265, majf=0, minf=932
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,830645,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=10.9MiB/s (11.4MB/s), 10.9MiB/s-10.9MiB/s (11.4MB/s-11.4MB/s), io=3261MiB (3419MB), run=300010-300010msec

=== maximum read throughput ===
test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
...
fio-3.28
Starting 16 processes
Jobs: 2 (f=2): [_(3),R(1),_(4),R(1),_(7)][88.5%][r=271MiB/s][r=271 IOPS][eta 00m:09s]                         
test: (groupid=0, jobs=16): err= 0: pid=723: Thu Sep 21 11:27:22 2023
  read: IOPS=235, BW=247MiB/s (259MB/s)(15.9GiB/66182msec)
    slat (usec): min=11, max=4258.1k, avg=33716.07, stdev=245711.26
    clat (msec): min=4, max=42387, avg=3411.42, stdev=5590.17
     lat (msec): min=8, max=42387, avg=3445.46, stdev=5628.07
    clat percentiles (msec):
     |  1.00th=[   92],  5.00th=[  275], 10.00th=[  292], 20.00th=[  550],
     | 30.00th=[  860], 40.00th=[ 1368], 50.00th=[ 1905], 60.00th=[ 2165],
     | 70.00th=[ 2433], 80.00th=[ 4329], 90.00th=[ 7013], 95.00th=[13355],
     | 99.00th=[17113], 99.50th=[17113], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=32146, max=3269067, per=100.00%, avg=711233.57, stdev=57849.82, samples=759
   iops        : min=   29, max= 3192, avg=693.92, stdev=56.47, samples=759
  lat (msec)   : 10=0.03%, 20=0.07%, 50=0.34%, 100=0.72%, 250=2.10%
  lat (msec)   : 500=13.98%, 750=11.24%, 1000=5.09%, 2000=21.89%, >=2000=49.13%
  cpu          : usr=0.01%, sys=0.29%, ctx=12549, majf=0, minf=79272
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=1.9%, >=64=98.1%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=15603,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=247MiB/s (259MB/s), 247MiB/s-247MiB/s (259MB/s-259MB/s), io=15.9GiB (17.1GB), run=66182-66182msec

=== maximum read iops ===
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
...
fio-3.28
Starting 16 processes
Jobs: 2 (f=2): [_(4),r(1),_(10),r(1)][37.0%][r=10.5MiB/s][r=2678 IOPS][eta 08m:41s] 
test: (groupid=0, jobs=16): err= 0: pid=744: Thu Sep 21 11:32:28 2023
  read: IOPS=2894, BW=11.3MiB/s (11.9MB/s)(3411MiB/300590msec)
    slat (nsec): min=962, max=5136.6M, avg=4593969.96, stdev=34594060.27
    clat (msec): min=4, max=61194, avg=1409.71, stdev=5726.73
     lat (msec): min=6, max=61194, avg=1414.29, stdev=5745.09
    clat percentiles (msec):
     |  1.00th=[   84],  5.00th=[   97], 10.00th=[  104], 20.00th=[  165],
     | 30.00th=[  186], 40.00th=[  199], 50.00th=[  218], 60.00th=[  253],
     | 70.00th=[  296], 80.00th=[  326], 90.00th=[  393], 95.00th=[ 5805],
     | 99.00th=[17113], 99.50th=[17113], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=  125, max=152461, per=100.00%, avg=11731.50, stdev=1710.19, samples=9473
   iops        : min=   29, max=38113, avg=2932.63, stdev=427.52, samples=9473
  lat (msec)   : 10=0.03%, 20=0.06%, 50=0.18%, 100=7.19%, 250=51.96%
  lat (msec)   : 500=33.16%, 750=0.68%, 1000=0.18%, 2000=0.61%, >=2000=6.31%
  cpu          : usr=0.13%, sys=0.16%, ctx=831102, majf=0, minf=1952
  IO depths    : 1=0.0%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=870190,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=11.3MiB/s (11.9MB/s), 11.3MiB/s-11.3MiB/s (11.9MB/s-11.9MB/s), io=3411MiB (3577MB), run=300590-300590msec

With the same version of nydus, without the above change:

=== maximum write throughput ===
test: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
...
fio-3.28
Starting 16 processes
test: Laying out IO file (1 file / 1024MiB)
(line repeated 16 times, once per job)
Jobs: 11 (f=11): [W(1),_(1),W(1),f(1),W(1),_(1),W(2),_(1),W(2),_(1),W(2),f(1),_(1)][94.6%][w=488MiB/s][w=488 IOPS][eta 00m:04s]
test: (groupid=0, jobs=16): err= 0: pid=681: Thu Sep 21 14:38:43 2023
  write: IOPS=240, BW=251MiB/s (263MB/s)(15.8GiB/64441msec); 0 zone resets
    slat (msec): min=20, max=119, avg=65.10, stdev= 4.15
    clat (msec): min=20, max=4233, avg=3959.85, stdev=687.79
     lat (msec): min=41, max=4300, avg=4024.31, stdev=692.01
    clat percentiles (msec):
     |  1.00th=[  460],  5.00th=[ 2467], 10.00th=[ 3507], 20.00th=[ 3943],
     | 30.00th=[ 4212], 40.00th=[ 4212], 50.00th=[ 4212], 60.00th=[ 4212],
     | 70.00th=[ 4212], 80.00th=[ 4212], 90.00th=[ 4212], 95.00th=[ 4212],
     | 99.00th=[ 4212], 99.50th=[ 4212], 99.90th=[ 4212], 99.95th=[ 4212],
     | 99.99th=[ 4245]
   bw (  KiB/s): min=147472, max=698499, per=98.93%, avg=253803.02, stdev=5134.91, samples=1967
   iops        : min=  144, max=  681, avg=247.75, stdev= 5.00, samples=1967
  lat (msec)   : 50=0.10%, 100=0.13%, 250=0.36%, 500=0.53%, 750=0.48%
  lat (msec)   : 1000=0.44%, 2000=2.11%, >=2000=99.79%
  cpu          : usr=0.09%, sys=0.07%, ctx=15564, majf=0, minf=932
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=2.5%, >=64=97.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,15529,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=251MiB/s (263MB/s), 251MiB/s-251MiB/s (263MB/s-263MB/s), io=15.8GiB (16.9GB), run=64441-64441msec

=== maximum write iops ===
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
...
fio-3.28
Starting 16 processes
Jobs: 16 (f=16): [w(16)][21.3%][w=10.8MiB/s][w=2773 IOPS][eta 18m:37s]
test: (groupid=0, jobs=16): err= 0: pid=702: Thu Sep 21 14:43:45 2023
  write: IOPS=2965, BW=11.6MiB/s (12.2MB/s)(3492MiB/300010msec); 0 zone resets
    slat (usec): min=1512, max=18408, avg=5389.43, stdev=498.81
    clat (msec): min=2, max=1759, avg=1372.24, stdev=102.40
     lat (msec): min=8, max=1765, avg=1377.63, stdev=102.67
    clat percentiles (msec):
     |  1.00th=[ 1250],  5.00th=[ 1267], 10.00th=[ 1284], 20.00th=[ 1301],
     | 30.00th=[ 1318], 40.00th=[ 1334], 50.00th=[ 1351], 60.00th=[ 1385],
     | 70.00th=[ 1418], 80.00th=[ 1435], 90.00th=[ 1485], 95.00th=[ 1552],
     | 99.00th=[ 1687], 99.50th=[ 1737], 99.90th=[ 1737], 99.95th=[ 1754],
     | 99.99th=[ 1754]
   bw (  KiB/s): min= 9168, max=13939, per=99.64%, avg=11876.63, stdev=48.36, samples=9587
   iops        : min= 2292, max= 3484, avg=2968.63, stdev=12.09, samples=9587
  lat (msec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.02%
  lat (msec)   : 250=0.05%, 500=0.08%, 750=0.08%, 1000=0.08%, 2000=100.15%
  cpu          : usr=0.20%, sys=0.25%, ctx=890412, majf=0, minf=932
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,889811,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=11.6MiB/s (12.2MB/s), 11.6MiB/s-11.6MiB/s (12.2MB/s-12.2MB/s), io=3492MiB (3661MB), run=300010-300010msec

=== maximum read throughput ===
test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
...
fio-3.28
Starting 16 processes
Jobs: 4 (f=4): [_(2),R(1),_(1),f(2),_(8),R(1),_(1)][92.0%][r=271MiB/s][r=271 IOPS][eta 00m:06s]                             
test: (groupid=0, jobs=16): err= 0: pid=723: Thu Sep 21 14:44:54 2023
  read: IOPS=233, BW=245MiB/s (256MB/s)(15.8GiB/66208msec)
    slat (usec): min=11, max=4258.6k, avg=31207.01, stdev=242980.70
    clat (msec): min=9, max=53644, avg=3284.90, stdev=5541.66
     lat (msec): min=41, max=53645, avg=3316.33, stdev=5577.67
    clat percentiles (msec):
     |  1.00th=[  176],  5.00th=[  439], 10.00th=[  542], 20.00th=[  567],
     | 30.00th=[  802], 40.00th=[ 1334], 50.00th=[ 1770], 60.00th=[ 2165],
     | 70.00th=[ 2433], 80.00th=[ 3842], 90.00th=[ 6275], 95.00th=[13489],
     | 99.00th=[17113], 99.50th=[17113], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=31738, max=2645840, per=100.00%, avg=742078.10, stdev=52503.95, samples=726
   iops        : min=   18, max= 2583, avg=721.61, stdev=51.29, samples=726
  lat (msec)   : 10=0.01%, 50=0.09%, 100=0.28%, 250=1.27%, 500=4.65%
  lat (msec)   : 750=20.20%, 1000=11.15%, 2000=21.06%, >=2000=46.16%
  cpu          : usr=0.01%, sys=0.28%, ctx=12630, majf=0, minf=70309
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=1.7%, >=64=98.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=15440,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=245MiB/s (256MB/s), 245MiB/s-245MiB/s (256MB/s-256MB/s), io=15.8GiB (17.0GB), run=66208-66208msec

=== maximum read iops ===
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
...
fio-3.28
Starting 16 processes
Jobs: 3 (f=3): [_(5),r(1),_(5),r(1),_(3),r(1)][13.2%][r=8280KiB/s][r=2070 IOPS][eta 33m:31s]          
test: (groupid=0, jobs=16): err= 0: pid=744: Thu Sep 21 14:49:59 2023
  read: IOPS=2203, BW=8867KiB/s (9080kB/s)(2604MiB/300684msec)
    slat (nsec): min=953, max=7803.9M, avg=5105884.01, stdev=72895125.93
    clat (usec): min=1430, max=94736k, avg=1854257.54, stdev=4917381.31
     lat (usec): min=1743, max=94736k, avg=1859333.26, stdev=4926565.03
    clat percentiles (msec):
     |  1.00th=[   90],  5.00th=[  106], 10.00th=[  182], 20.00th=[  236],
     | 30.00th=[  326], 40.00th=[  368], 50.00th=[  388], 60.00th=[  418],
     | 70.00th=[  443], 80.00th=[  535], 90.00th=[ 4463], 95.00th=[11745],
     | 99.00th=[17113], 99.50th=[17113], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=  112, max=128891, per=100.00%, avg=10813.70, stdev=1294.36, samples=7830
   iops        : min=   16, max=32220, avg=2702.49, stdev=323.56, samples=7830
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.04%, 50=0.11%
  lat (msec)   : 100=2.00%, 250=18.78%, 500=58.08%, 750=2.55%, 1000=0.72%
  lat (msec)   : 2000=2.55%, >=2000=15.78%
  cpu          : usr=0.16%, sys=0.32%, ctx=586228, majf=0, minf=932
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=662447,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=8867KiB/s (9080kB/s), 8867KiB/s-8867KiB/s (9080kB/s-9080kB/s), io=2604MiB (2730MB), run=300684-300684msec

eryugey (Contributor) commented Sep 20, 2023

Also, would you please rebase onto the latest master branch? The macOS CI got fixed there.

@Champ-Goblem force-pushed the setfl-when-rw-flags-dont-match branch 2 times, most recently from c8e705f to c40c7cc on September 25, 2023 09:51
@@ -333,14 +333,16 @@ struct HandleData {
inode: Inode,
file: File,
lock: Mutex<()>,
open_flags: u32,
Contributor


How about using AtomicU32 here so we can avoid the RwLock in Arc<RwLock<HandleData>>?
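The suggestion could look roughly like this. The struct is reduced to the one field under discussion, and the method names are illustrative, not necessarily the ones in the final patch:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Minimal sketch of the reviewer's suggestion: storing open_flags in an
// AtomicU32 lets concurrent read/write handlers observe and update the
// flags without taking a lock, so HandleData can live in a plain Arc
// instead of Arc<RwLock<HandleData>>. Other fields are elided.
struct HandleData {
    open_flags: AtomicU32,
}

impl HandleData {
    fn new(flags: u32) -> Self {
        HandleData { open_flags: AtomicU32::new(flags) }
    }

    // Relaxed ordering suffices here: the flags are a self-contained
    // value and no other memory is synchronized through them.
    fn get_flags(&self) -> u32 {
        self.open_flags.load(Ordering::Relaxed)
    }

    fn set_flags(&self, flags: u32) {
        self.open_flags.store(flags, Ordering::Relaxed);
    }
}

fn main() {
    let data = HandleData::new(0o40000); // opened with O_DIRECT (Linux value)
    data.set_flags(0);                   // F_SETFL cleared O_DIRECT
    assert_eq!(data.get_flags(), 0);
    println!("atomic flags ok");
}
```

The trade-off is that the flag update is no longer atomic with the fcntl call itself, which is acceptable here since concurrent requests with conflicting flags would race at the kernel level anyway.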

Contributor Author


Understood, I have pushed this suggestion.

We found when running minio with recent versions of nydus that it
was no longer able to boot properly due to Invalid Argument when
writing to a temporary file.
Tracing the high-level flow showed that the file was originally
opened with O_DIRECT, then later on if the buffer size was not
correctly aligned, fcntl with F_SETFL would be used to remove
the O_DIRECT flag.
In virtiofs there is no independent call for fcntl, so the
underlying file descriptor opened by nydus still had the
O_DIRECT flag set, meaning that a write to this file
would error with -EINVAL as the buffer size was not aligned.

This patch aims to fix this by first recording the open flags for
a file in the `HandleData` struct, then on read/write checking
these flags match the flags provided by virtiofs in the read/write
message. If the flags do not match, the current flags of the file
descriptor are set with F_SETFL and updated on the HandleData entry.

Signed-off-by: Champ-Goblem <[email protected]>
@bergwolf (Contributor) left a comment


lgtm, thanks!

@bergwolf bergwolf merged commit e42310a into cloud-hypervisor:master Oct 25, 2023
4 checks passed