I was trying to re-check whether Python itself is becoming a bottleneck for smallfile; specifically, I wanted to see whether pypy3 was faster than python3 and could keep up with the "find" utility for reading directory trees. Initially pypy3 would not run because of a new dependency on PyYAML, but once I commented out the YAML parsing option in smallfile (just 2 lines), pypy3 worked fine.
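A lower-impact way to get the same effect would be to make the import optional rather than commenting it out (a hypothetical sketch, not the actual smallfile change):

# guard the PyYAML import so interpreters without it can still run
# (hypothetical sketch; in the actual test the import was just commented out)
try:
    import yaml  # only needed when YAML parameter input is requested
except ImportError:
    yaml = None  # callers must check for None before using YAML input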
So I created a tree containing 1 million 0-byte (empty) files using a single smallfile thread, ran the smallfile readdir operation with both pypy3 and python3, and then compared the results to the "find" utility on the same tree. No cache-dropping was done, so all the metadata could be memory-resident. While this may seem an unfair comparison, NVDIMM-N memories can provide similarly low response times, and cached-storage performance is also something we need to measure. So the 3 commands were:
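(The exact invocations were along these lines; the flags and counts shown here are reconstructed from context, not copied from the original run:)

python3 ./smallfile_cli.py --top /var/tmp/smf --operation readdir --threads 1 --files 1000000
pypy3   ./smallfile_cli.py --top /var/tmp/smf --operation readdir --threads 1 --files 1000000
find /var/tmp/smf/file_srcdir > /dev/null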
The results were:

test      thousands of files/sec
-------   ----------------------
python3   160
pypy3     352
find      1000
Are all 3 benchmarks doing the same system calls? When I used strace to compare, smallfile was originally at a disadvantage: it was making extra system calls to see whether other threads had finished, specifically stat()ing stonewall.tmp in the shared network directory every 100 files. This is not a big deal when doing actual file reads/writes, but for readdir it significantly increases the number of system calls. Here's what smallfile was doing per directory:
5693 openat(AT_FDCWD, "/var/tmp/smf/file_srcdir/bene-laptop/thrd_00/d_001/d_002/d_008", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 8
5693 fstat(8, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
5693 getdents(8, /* 112 entries */, 32768) = 5168
5693 getdents(8, /* 0 entries */, 32768) = 0
5693 close(8) = 0
5693 stat("/var/tmp/smf/network_shared/stonewall.tmp", 0x7ffc19b899e0) = -1 ENOENT (No such file or directory)
5693 stat("/var/tmp/smf/network_shared/stonewall.tmp", 0x7ffc19b899e0) = -1 ENOENT (No such file or directory)
5693 stat("/var/tmp/smf/network_shared/stonewall.tmp", 0x7ffc19b899e0) = -1 ENOENT (No such file or directory)
5693 stat("/var/tmp/smf/network_shared/stonewall.tmp", 0x7ffc19b899e0) = -1 ENOENT (No such file or directory)
5693 stat("/var/tmp/smf/network_shared/stonewall.tmp", 0x7ffc19b899e0) = -1 ENOENT (No such file or directory)
By adding the parameter "--stonewall N" (disabling the stonewall check) we get rid of the excess stat() system calls, and the per-directory sequence becomes optimal: just the openat/fstat/getdents/close calls shown above. Rerunning the tests with this parameter, we get:
test      thousands of files/sec
-------   ----------------------
python3   172   (was 160)
pypy3     380   (was 352)
find      1000
Comparing the system call pattern of the "find" command with strace showed that the find utility is actually making more system calls.
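(An easy way to reproduce the comparison is strace's summary mode, which prints a per-syscall count at exit; the command arguments here are illustrative:)

strace -c find /var/tmp/smf/file_srcdir > /dev/null
strace -c -f python3 ./smallfile_cli.py --top /var/tmp/smf --operation readdir --threads 1 --files 1000000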
pypy3 and python3 both use 100% CPU according to "top", which means each is saturating a whole core and can't go any faster; most of their time is spent in user space, not system space. Yet even pypy3 is only about 1/3 the speed of "find".
So what conclusion do we draw from this? Perhaps smallfile and similar utilities will need to be rewritten in a compiled language to keep up with modern storage hardware.