Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Systematically compare/investigate performance on Zarrs #54

Open
yarikoptic opened this issue Feb 7, 2024 · 46 comments
Open

Systematically compare/investigate performance on Zarrs #54

yarikoptic opened this issue Feb 7, 2024 · 46 comments
Assignees
Labels
high priority Work on these first performance Efficient use of time and space

Comments

@yarikoptic
Copy link
Member

@satra did just a quick&dirty test using zarr viewer comparing access directly to S3 URL and then our testbed dandi.centerforopenneuroscience.org instance (on a zarr in 000108 under /dandisets/ not the manifests based one since no manifest for it was there yet) so there are no quantified characteristics available but

  • it took notably longer to load from our instance than from S3 when zooming in
  • overall browsing of the dav instance got significantly slower/less responsive

We need to come up with

  • some sample script which implements basic IO (e.g. reading the slice X) on a given zarr
  • time a single individual access run
  • time N parallel accesses to different or the same slices (assess scalability here as well)

Compare

  • direct S3
  • direct dandidav (no apache reverse proxy setup)
  • dandi.centerforopenneuroscience.org

to identify how much any component contributes and if we could improve any aspect (e.g. parallel access etc). I hope that it is not all simply due to redirects.

Later we need to create some nice "versioned zarr" with a few versions and to use for benchmarks and also benchmark on the /zarr endpoints.

@yarikoptic yarikoptic added the performance Efficient use of time and space label Feb 7, 2024
@yarikoptic
Copy link
Member Author

Here are preliminary results for a simple "download subtree part of zarr" using aws s3 and rclone directly from S3, vs access from dandidav's dandisets/ and then zarrs/ endpoints:

bash script
#!/bin/bash

set -eu

ds=000108
asset=0dfbb4b4-be63-45a6-8354-347cb98fdb5b
if false; then
        # just ot cut/paste outputs below
        curl --silent https://api.dandiarchive.org/api/dandisets/000108/versions/draft/assets/$asset/ | { echo -n "version="; jq -r '.digest["dandi:dandi-zarr-checksum"]'; }
        curl --silent https://api.dandiarchive.org/api/dandisets/000108/versions/draft/assets/$asset/ | { echo -n "zarr="; jq -r '.contentUrl[1]' | sed -e 's,.*zarr/,,g' -e 's,/,,g'; }
        curl --silent https://api.dandiarchive.org/api/dandisets/000108/versions/draft/assets/$asset/ | { echo -n "path="; jq -r '.path' ; }
fi
# too big already
#zarr=04767811-3cea-43c8-956d-81da5e496f80
#version=WRONG!!
#path=sub-MITU01/ses-20211002h21m40s58/micr/sub-MITU01_ses-20211002h21m40s58_sample-37_stain-YO_run-1_chunk-10_SPIM.ome.zarr

# a little smaller
version=0395d0a3767524377b58da3945b3c063-48379--27115470
zarr=0d5b9be5-e626-4f6a-96da-b6b602954899
path=sub-U01hm15x/ses-20220731h17m24s47/micr/sub-U01hm15x_ses-20220731h17m24s47_sample-mEhmAD031x15R2_ABETA_stain-ABETA_run-1_chunk-1_SPIM.ome.zarr

part=${PART:-0/0/0/0/0/}
: ${CONCUR:=20}

OUT="/tmp/zarr-bm"
rm -rf "$OUT"/* || :


#set -x
# simple download directly from S3
get_aws_s3() {
        aws configure set default.s3.max_concurrent_requests $CONCUR
        /usr/bin/time chronic aws s3 --no-sign-request sync s3://dandiarchive/zarr/$zarr/$part "$1"
}

get_rclone_s3() {
        aws configure set default.s3.max_concurrent_requests $CONCUR
        /usr/bin/time rclone sync --transfers $CONCUR DANDI-S3:dandiarchive/zarr/$zarr/$part "$1"
}

get_rclone_dandisets() {
        /usr/bin/time rclone sync --transfers $CONCUR DANDI-WEBDAV:dandisets/$ds/draft/$path/$part "$1"
}

get_rclone_zarr_manifest() {
        /usr/bin/time rclone sync --transfers $CONCUR DANDI-WEBDAV:zarrs/${zarr:0:3}/${zarr:3:3}/${zarr}/$version/$part "$1"
}

echo "Downloading part $part within zarr $zarr asking for up to $CONCUR processes"
for method in get_aws_s3 get_rclone_s3 get_rclone_dandisets get_rclone_zarr_manifest; do
        out="$OUT/$method"
        echo "---------------"
        echo "$method:  $out"
        $method "$out"
        checksum=$(TQDM_DISABLE=1 zarrsum local "$out" | tail -n 1)
        if [ -z "$part" ] && [ $checksum != "$version" ]; then
                echo "wrong checksum $checksum != $version"
        fi
        if [ -n "$part" ]; then
                echo "checksum $checksum"
        fi
done

so we see that /dandisets/ endpoint is quite slow, but the manifests based and shorter path of /zarrs/ one is quite performant (well -- still twice slower but that is just that -- twice! ;))

(dev3) yoh@typhon:~/proj/dandi/zarr-benchmarking/tools$ PART=0/0/0/0/ CONCUR=10 ./simple_parallel_get.sh
Downloading part 0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 10 processes
---------------
get_aws_s3:  /tmp/zarr-bm/get_aws_s3
22.49user 2.03system 0:29.91elapsed 82%CPU (0avgtext+0avgdata 119156maxresident)k
0inputs+24064outputs (0major+110906minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_s3:  /tmp/zarr-bm/get_rclone_s3
3.64user 0.94system 0:16.16elapsed 28%CPU (0avgtext+0avgdata 85516maxresident)k
0inputs+24064outputs (0major+5286minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
4.63user 1.99system 3:16.25elapsed 3%CPU (0avgtext+0avgdata 74280maxresident)k
0inputs+24064outputs (0major+5663minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
3.47user 1.55system 0:32.48elapsed 15%CPU (0avgtext+0avgdata 74440maxresident)k
0inputs+24064outputs (0major+5382minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480

on a serial access we are up to par with aws on /zarrs/ but rclone beats us:

(dev3) yoh@typhon:~/proj/dandi/zarr-benchmarking/tools$ PART=0/0/0/0/ CONCUR=1 ./simple_parallel_get.sh
Downloading part 0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 1 processes
---------------
get_aws_s3:  /tmp/zarr-bm/get_aws_s3
20.58user 1.82system 3:06.44elapsed 12%CPU (0avgtext+0avgdata 109096maxresident)k
0inputs+24064outputs (0major+97814minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_s3:  /tmp/zarr-bm/get_rclone_s3
4.69user 1.75system 2:43.25elapsed 3%CPU (0avgtext+0avgdata 81728maxresident)k
0inputs+24064outputs (0major+6314minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
4.86user 2.44system 12:44.14elapsed 0%CPU (0avgtext+0avgdata 71256maxresident)k
0inputs+24064outputs (0major+3675minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
4.45user 2.20system 3:04.90elapsed 3%CPU (0avgtext+0avgdata 71740maxresident)k
0inputs+24064outputs (0major+4791minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480

the /dandisets/ path remains many times slower - we should look into optimizing there. But I guess the first thing would be to establish efficient dandi-archive API per

@jwodder
Copy link
Member

jwodder commented Feb 13, 2024

@yarikoptic Yes, a dedicated dandi-archive endpoint would definitely help with efficiency.

@yarikoptic
Copy link
Member Author

with CONCUR=20 rclone scales linearly to cutting in half from 10. rclone of dandidav remains the same -- suggests that there might be limiting factor in parallel handling of requests
(dev3) yoh@typhon:~/proj/dandi/zarr-benchmarking/tools$ PART=0/0/0/0/ CONCUR=20 ./simple_parallel_get.sh
Downloading part 0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 20 processes
---------------
get_aws_s3:  /tmp/zarr-bm/get_aws_s3
30.80user 2.00system 0:23.41elapsed 140%CPU (0avgtext+0avgdata 127672maxresident)k
0inputs+24064outputs (0major+111824minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_s3:  /tmp/zarr-bm/get_rclone_s3
3.90user 1.00system 0:08.26elapsed 59%CPU (0avgtext+0avgdata 91768maxresident)k
0inputs+24064outputs (0major+5026minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
4.68user 1.94system 3:28.99elapsed 3%CPU (0avgtext+0avgdata 74680maxresident)k
0inputs+24064outputs (0major+5036minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
3.79user 1.43system 0:32.42elapsed 16%CPU (0avgtext+0avgdata 77672maxresident)k
0inputs+24064outputs (1major+4009minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
For /dandisets/ -- seems to stay at about 30% CPU
shub@falkor:~$ pidstat -p 3044013 1
Linux 6.1.0-10-amd64 (falkor) 	02/13/2024 	_x86_64_	(8 CPU)

09:15:06 AM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
09:15:07 AM  1001   3044013   34.00    1.00    0.00    0.00   35.00     0  dandidav
09:15:08 AM  1001   3044013   29.00    3.00    0.00    0.00   32.00     0  dandidav
09:15:09 AM  1001   3044013   30.00    2.00    0.00    0.00   32.00     0  dandidav
09:15:10 AM  1001   3044013   29.00    2.00    0.00    0.00   31.00     0  dandidav
09:15:11 AM  1001   3044013   30.00    2.00    0.00    0.00   32.00     0  dandidav
09:15:12 AM  1001   3044013   25.00    4.00    0.00    0.00   29.00     0  dandidav
09:15:13 AM  1001   3044013   26.00    2.00    0.00    0.00   28.00     0  dandidav
For `/zarr/` even less
shub@falkor:~$ pidstat -p 3044013 1
Linux 6.1.0-10-amd64 (falkor) 	02/13/2024 	_x86_64_	(8 CPU)

09:16:56 AM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
09:16:57 AM  1001   3044013    9.00    1.00    0.00    0.00   10.00     0  dandidav
09:16:58 AM  1001   3044013    5.00    1.00    0.00    0.00    6.00     0  dandidav
09:16:59 AM  1001   3044013   10.00    0.00    0.00    0.00   10.00     0  dandidav
09:17:00 AM  1001   3044013   11.00    1.00    0.00    0.00   12.00     0  dandidav
09:17:01 AM  1001   3044013    9.00    0.00    0.00    0.00    9.00     0  dandidav
09:17:02 AM  1001   3044013    8.00    1.00    0.00    0.00    9.00     0  dandidav
09:17:03 AM  1001   3044013    9.00    0.00    0.00    0.00    9.00     0  dandidav
09:17:04 AM  1001   3044013    9.00    2.00    0.00    0.00   11.00     0  dandidav
09:17:05 AM  1001   3044013    9.00    2.00    0.00    0.00   11.00     0  dandidav
09:17:06 AM  1001   3044013    9.00    1.00    0.00    0.00   10.00     0  dandidav
09:17:07 AM  1001   3044013   11.00    1.00    0.00    0.00   12.00     0  dandidav
09:17:08 AM  1001   3044013   10.00    1.00    0.00    0.00   11.00     0  dandidav
09:17:09 AM  1001   3044013   10.00    2.00    0.00    0.00   12.00     0  dandidav

Although no %wait I wonder if logging (#69) already something effecting it.

@jwodder
Copy link
Member

jwodder commented Feb 13, 2024

@yarikoptic Did you build the dandidav binary with --release? If not, you should.

@satra
Copy link
Member

satra commented Feb 13, 2024

@yarikoptic - just a note that testing throughput could be broken into two parts (reading, reading + verifying). the former is what any zarr reader would do. it wouldn't try to check that it received the correct bytes. the latter is what dandi cli would do to ensure that we do receive the correct bytes. and the tools used above do various things depending on flags. also for parallel read from s3, s5cmd is significantly faster than most other tools (https://github.com/peak/s5cmd).

@yarikoptic
Copy link
Member Author

@satra thanks for the note. In the script/output above I report the zarr checksum we have for all downloads -- all good as long as they match. If it is not partial download of a zarr (like in those above) -- it would explicitly compare to the target overall zarr checksum. Verification is not taken as part of the benchmark. If you have some python / any other code you would like to benchmark zarrs -- please share.

re s5cmd, I will add it then. Probably in favor over aws s3 which consistently shows that it is slow.

@yarikoptic
Copy link
Member Author

FWIW

with default settings s5cmd with direct access to s3 at 00.48 definitely beats everything down (note: for aws s3 invocation I just set concurrency to 255) especially if I do not request any specific level of concurrency and s5cmd goes wild to its top 255 etc - it is about 3 times faster than rclone access from our dandidav. But rclone itself is slower than s5cmd in direct access to s3 - and our overhead for it is quite small (2.13 rclone on s3 vs 2.30 rclone on our manifests)
yoh@typhon:~/proj/dandi/zarr-benchmarking$ tools/simple_parallel_get.sh
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 without specifying explicit number of processes
---------------
get_aws_s3:  /tmp/zarr-bm/get_aws_s3
TIME: 0:10.58 real      104.23 user     0.56 sys        990% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
---------------
get_s5cmd_s3:  /tmp/zarr-bm/get_s5cmd_s3
TIME: 0:00.48 real      2.00 user       0.24 sys        464% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
---------------
get_rclone_s3:  /tmp/zarr-bm/get_rclone_s3
TIME: 0:02.13 real      0.30 user       0.10 sys        19% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:15.81 real      0.31 user       0.15 sys        2% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.30 real      0.29 user       0.08 sys        16% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
but if I give rclone explicit up to 30 concurrent processes, it improves to 00.59 and actually beats s5cmd at the same level of concurrency making it in 00.61. Unfortunately rclone access to dandidav manifests stays at ceiling of 02.12
yoh@typhon:~/proj/dandi/zarr-benchmarking$ CONCUR=30 tools/simple_parallel_get.sh
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 30 processes
---------------
get_aws_s3:  /tmp/zarr-bm/get_aws_s3
TIME: 0:08.16 real      23.55 user      0.21 sys        290% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
---------------
get_s5cmd_s3:  /tmp/zarr-bm/get_s5cmd_s3
TIME: 0:00.61 real      1.43 user       0.19 sys        262% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
---------------
get_rclone_s3:  /tmp/zarr-bm/get_rclone_s3
TIME: 0:00.59 real      0.48 user       0.10 sys        99% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:12.14 real      0.46 user       0.12 sys        4% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.12 real      0.32 user       0.12 sys        21% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
and indeed we reach ceiling at some where around 20 concurrent processes
yoh@typhon:~/proj/dandi/zarr-benchmarking$ for c in 1 5 10 20 30 40 50; do CONCUR=$c METHODS=get_rclone_zarr_manifest tools/simple_parallel_get.sh; done
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 1 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:07.93 real      0.32 user       0.11 sys        5% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 5 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.21 real      0.30 user       0.12 sys        19% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 10 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.19 real      0.36 user       0.10 sys        21% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 20 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.12 real      0.36 user       0.12 sys        23% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 30 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.13 real      0.38 user       0.10 sys        23% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 40 processes
all prior results were with prior "debug" build. With fresh "release" build situation is qualitatively the same.
yoh@typhon:~/proj/dandi/zarr-benchmarking$ for c in 1 5 10 20 30 40 50; do CONCUR=$c METHODS=get_rclone_zarr_manifest tools/simple_parallel_get.sh; done
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 1 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:10.93 real      0.31 user       0.16 sys        4% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 5 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.31 real      0.29 user       0.08 sys        16% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 10 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.12 real      0.33 user       0.09 sys        20% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 20 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.08 real      0.27 user       0.12 sys        19% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 30 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.08 real      0.35 user       0.09 sys        21% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 40 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.09 real      0.37 user       0.12 sys        23% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 50 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.10 real      0.33 user       0.13 sys        22% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280

@jwodder said that he already looked into scalability a little but have not found a resolution yet.

Given that rclone seems generally can reach s5cmd performance on access to S3, I think solving scalability solution for the /zarr/ endpoint should provide a comparable solution altogether.

@yarikoptic
Copy link
Member Author

FWIW: I considered possibility that it is apache as reverse proxy making a limit. It is not the case: ran on falkor against apache and directly against dandidav -- the same
falkor:~/proj/dandi/zarr-benchmarking
$> for c in 1 5 10 20 30 40 50; do CONCUR=$c METHODS=get_rclone_zarr_manifest RCLONE_DANDI_WEBDAV=DANDI-WEBDAV-LOCAL tools/simple_parallel_get.sh; done
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 1 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:07.89 real      0.29 user       0.14 sys        5% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 5 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.22 real      0.27 user       0.12 sys        17% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 10 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.15 real      0.31 user       0.11 sys        19% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 20 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.10 real      0.24 user       0.17 sys        19% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 30 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.17 real      0.39 user       0.15 sys        25% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 40 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.10 real      0.36 user       0.10 sys        21% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 50 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.09 real      0.32 user       0.17 sys        23% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
(dev3.11) (git)/home/yoh/proj/dandi/zarr-benchmarking:[master]
#22 !2680 [0].....................................:Wed Feb 14 08:47:45:.
falkor:~/proj/dandi/zarr-benchmarking
*$> for c in 1 5 10 20 30 40 50; do CONCUR=$c METHODS=get_rclone_zarr_manifest RCLONE_DANDI_WEBDAV=DANDI-WEBDAV tools/simple_parallel_get.sh; done
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 1 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:07.69 real      0.29 user       0.16 sys        6% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 5 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.20 real      0.26 user       0.12 sys        17% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 10 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.14 real      0.25 user       0.16 sys        19% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 20 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.11 real      0.27 user       0.20 sys        22% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 30 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.11 real      0.28 user       0.13 sys        19% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 40 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.12 real      0.24 user       0.20 sys        20% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 50 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.12 real      0.27 user       0.14 sys        19% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280

FWIW I have pushed current version of the helper to https://github.com/dandi/zarr-benchmarking/blob/master/tools/simple_parallel_get.sh

@jwodder
Copy link
Member

jwodder commented Feb 14, 2024

@yarikoptic To be clear: You're concluding that dandidav can only support a maximum of about 20 requests based on the fact that the total wallclock runtime to process concurrent requests stops decreasing at 20, correct? If so, that interpretation seems backwards: if dandidav can't handle more than 20 concurrent requests, then sending more requests would result in some not being handled immediately, causing the total runtime to increase.

@yarikoptic
Copy link
Member Author

correct.

I don't think so since the overall number of requests is constant (number of files in that zarr path) and the only thing changing is how many we are submitting in parallel. So it is a question of throughput really. My trivilized thinking/explanation which simplifies as in submitter not waiting for response (like we do) but overall of the same reasoning. If dandidav can handle max 20 requests in 1 sec, the time to process 100 requests would remain constant 5 seconds, regardless in how many requests (above 20) I would request at once (in parallel).

@jwodder
Copy link
Member

jwodder commented Feb 14, 2024

@yarikoptic I think I misunderstood what you were doing with concurrent processes. I thought you were running, say, 20 copies of rclone against the same endpoint in parallel, but it's actually just one rclone process that internally divides its requests into concurrent batches of 20.

@yarikoptic
Copy link
Member Author

correct! indeed it might also be valuable to test against multiple clients in parallel bombarding the same/different zarrs.

@jwodder
Copy link
Member

jwodder commented Feb 15, 2024

@yarikoptic I'm going to ask about request limits on a Rust forum, and I need to know how many CPUs (not cores) the machine dandidav is running on has. I believe running python3 -c "import os; print(os.cpu_count())" should give the correct value.

@jwodder
Copy link
Member

jwodder commented Feb 15, 2024

@yarikoptic Also, could you run for c in 1 5 10 20 30 40 50; do CONCUR=$c ... using the get_rclone_dandisets method instead of get_rclone_zarr_manifest? I want to confirm that both endpoints exhibit the 20-request limit with a "release" build.

@yarikoptic
Copy link
Member Author

I believe running python3 -c "import os; print(os.cpu_count())" should give the correct value.

32

@yarikoptic Also, could you run for c in 1 5 10 20 30 40 50; do CONCUR=$c ... using the get_rclone_dandisets method instead of get_rclone_zarr_manifest?

Sure, doing against LOCAL

*$> for c in 1 5 10 20 30 40 50; do CONCUR=$c METHODS=get_rclone_zarr_manifest RCLONE_DANDI_WEBDAV=DANDI-WEBDAV-LOCAL tools/simple_parallel_get.sh; done
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 1 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:13.37 real      0.29 user       0.22 sys        3% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 5 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.47 real      0.25 user       0.17 sys        17% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 10 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.15 real      0.36 user       0.10 sys        21% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 20 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.10 real      0.30 user       0.18 sys        23% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 30 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.09 real      0.39 user       0.12 sys        24% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 40 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.14 real      0.36 user       0.14 sys        23% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 50 processes
---------------
get_rclone_zarr_manifest:  /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.10 real      0.34 user       0.15 sys        23% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280

shub@falkor:~$ dandidav-test/dandidav --version
dandidav 0.2.0 (commit: 4998bd8)

shub@falkor:~$ file dandidav-test/dandidav
dandidav-test/dandidav: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=825b59e00515dad24f6ae389e6a79c65765232dd, for GNU/Linux 3.2.0, with debug_info, not stripped

if needed next time - to expedite, just run the script on e.g. drogon: it is now on github. I can also make account for you on either typhon (where I did originally) or falkor (webserver), just let me know.

@jwodder jwodder added this to the Current priorities milestone Feb 17, 2024
@yarikoptic
Copy link
Member Author

Hi @jwodder , anything from Rust community on the issue of scalability? What to look for?

@jwodder
Copy link
Member

jwodder commented Mar 7, 2024

@yarikoptic No, my post got no attention. The only thing I've learned is that axum (the web framework dandidav uses) does not impose any request limits by default, so the bottleneck isn't coming from there.

@yarikoptic
Copy link
Member Author

but what is it -- limited number of threads ?

@jwodder
Copy link
Member

jwodder commented Mar 7, 2024

@yarikoptic tokio (the async runtime) does limit the number of worker threads to the number of CPUs by default, but you said that's 32, which wouldn't explain why we're seeing bottlenecks at 20. Also, since the code is async, it's possible for more than 32 requests to be processed at once anyway.

@yarikoptic
Copy link
Member Author

is it tunable? if we double it and it doesn't have effect -- must be smth else. if not -- we could figure out why 20 and not 32 ;-)

@jwodder
Copy link
Member

jwodder commented Mar 7, 2024

@yarikoptic Yes, it's tunable. If you set the environment variable TOKIO_WORKER_THREADS to an integer when running the server, that'll be how many worker threads are used.

If you're really paranoid, you can check what Rust thinks the number of CPUs is as follows:

  • Create a new directory on the server machine and cd into it

  • Create a Cargo.toml file with the following contents:

    [package]
    name = "cpu-count"
    version = "0.1.0"
    edition = "2021"
    
    [dependencies]
    num_cpus = "1.16.0"
  • Create a src/main.rs file with the following contents:

    fn main() {
        println!("Number of CPUs: {}", num_cpus::get());
    }
  • Run cargo run.

@yarikoptic
Copy link
Member Author

FWIW.

Setting it to 10 caused performance ceiling at 10

on server: TOKIO_WORKER_THREADS=10 ./dandidav

on client typhon

(zarr-manifests) yoh@typhon:~/proj/dandi/zarr-benchmarking$ for c in 1 5 10 15 20 30 40 50; do CONCUR=$c METHODS=get_rclone_dandisets RCLONE_DANDI_WEBDAV=DANDI-WEBDAV tools/simple_parallel_get.sh; done
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 1 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:37.85 real      0.39 user       0.12 sys        1% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 5 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:09.88 real      0.39 user       0.16 sys        5% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 10 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:08.74 real      0.41 user       0.11 sys        6% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 15 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:08.20 real      0.43 user       0.11 sys        6% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 20 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:08.36 real      0.46 user       0.10 sys        6% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 30 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:08.44 real      0.50 user       0.18 sys        8% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 40 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:08.27 real      0.56 user       0.16 sys        8% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 50 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:08.59 real      0.83 user       0.20 sys        12% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
setting to 100 - the same ceiling at 20-30
(zarr-manifests) yoh@typhon:~/proj/dandi/zarr-benchmarking$ for c in 1 5 10 15 20 30 40 50; do CONCUR=$c METHODS=get_rclone_dandisets RCLONE_DANDI_WEBDAV=DANDI-WEBDAV tools/simple_parallel_get.sh; done
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 1 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:34.04 real      0.29 user       0.20 sys        1% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 5 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:09.10 real      0.37 user       0.13 sys        5% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 10 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:08.74 real      0.31 user       0.24 sys        6% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 15 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:08.34 real      0.40 user       0.13 sys        6% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 20 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:08.42 real      0.41 user       0.15 sys        6% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 30 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:08.22 real      0.51 user       0.21 sys        8% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 40 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:08.79 real      0.54 user       0.13 sys        7% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 50 processes
---------------
get_rclone_dandisets:  /tmp/zarr-bm/get_rclone_dandisets
TIME: 0:08.38 real      0.87 user       0.25 sys        13% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280

so there is indeed more to it

@jwodder
Copy link
Member

jwodder commented Mar 8, 2024

@yarikoptic This is largely a shot in the dark, but on the server that dandidav is running on, are 20 of the CPU cores performance cores and the rest efficiency cores? (I don't know how you'd check this.) I don't know how core scheduling works with efficiency cores, but since all of dandidav's work is relatively short-lived, maybe only the performance cores are being used?

@yarikoptic
Copy link
Member Author

I even had no clue that there are different kinds of cores. FWIW, to rule out cores effect we could check scalability while running on some other server. Could you please run on typhon which has 32 cores? ndoli has 40

@jwodder
Copy link
Member

jwodder commented Mar 11, 2024

@yarikoptic

I even had no clue that there are different kinds of cores.

It's a new thing that ARM came out with in 2011, and then Intel made its own version in 2022-ish. (Thus, if the server's CPUs are from 2021 or earlier, then they're just "normal" cores and my theory is wrong.) I only found out about the different types last week or the week before when I was looking at the specs for new MacBooks.

Could you please run on typhon which has 32 cores?

I don't think you ever gave me access to typhon, and I don't know how to access it. (At the very least, typhon.datalad.org doesn't resolve for me.)

@jwodder
Copy link
Member

jwodder commented Mar 11, 2024

@yarikoptic I ran both dandidav and the timing script on typhon, and the 20-request limit was still present:

get_rclone_dandisets:

Processes Time User Sys CPU
1 0:38.22 0.35 0.17 1%
5 0:10.30 0.42 0.16 5%
10 0:09.48 0.39 0.12 5%
15 0:08.83 0.37 0.22 6%
20 0:09.43 0.38 0.17 6%
30 0:09.24 0.42 0.16 6%
40 0:10.11 0.53 0.19 7%
50 0:09.68 0.53 0.20 7%

get_rclone_zarr_manifest:

Processes Time User Sys CPU
1 0:10.67 0.35 0.22 5%
5 0:02.28 0.32 0.10 18%
10 0:02.14 0.35 0.17 24%
15 0:02.10 0.42 0.10 25%
20 0:02.10 0.27 0.10 18%
30 0:02.12 0.57 0.11 32%
40 0:02.10 0.42 0.14 27%
50 0:02.09 0.37 0.13 24%

@jwodder
Copy link
Member

jwodder commented May 20, 2024

@yarikoptic I wrote https://github.com/jwodder/batchdav for measuring WebDAV traversal time using a given number of workers. Unlike rclone, rather than spending time downloading files from S3, the only requests it makes for the actual files are HEAD requests that don't follow redirects.

I've run it at various different points, and the times from different sessions have varied by about an order of magnitude, which I suspect is due to differing amounts of load on Heroku. Some of the fastest times I've ever gotten are:

Workers Time (mean) Time (stddev)
1 8.399036695100001 0.36142910510463516
5 1.6700318244 0.12919592271200123
10 1.0409548316000001 0.10855610294283857
15 0.7129774931999999 0.06181837739373458
20 0.750514105 0.10966455557731183
30 0.7945123642999999 0.10238084442203854
40 0.7258895968 0.08116879741778966
50 0.7132875974999999 0.07944869527032605

Make of that what you will.

@yarikoptic
Copy link
Member Author

Just to recall/summarize what we have so far:

  • We (for /zarr endpoints) seems to be (just) slightly over twice slower than the fast s5cmd (or rclone) when actually accessing some sample zarr per my much earlier investigation.
    • we have not timed yet on some other use cases/access patterns besides the one crafted
  • For dandidav there seems to be some ceiling at about 20 workers the nature of which we do not yet know, and we do not know how deciding that factor is ATM.

@jwodder
Copy link
Member

jwodder commented May 29, 2024

@yarikoptic I'm starting to think that the "20 workers ceiling" is just a statistical illusion resulting from the combination of two factors:

  1. As the number of concurrent requests to an axum server grows, the average response time increases. (Based on my observations, this happens regardless of whether the number of concurrent requests is greater than or less than the number of threads used by the server.)

  2. As the number of concurrent requests made by a client grows, assuming the response time remains constant, the overall runtime decreases.

It seems that the decrease from (2) is dominant until some point in the 20-30 concurrent requests range, after which the increase from (1) is dominant, thereby producing a minimum runtime around 20 workers.

@yarikoptic
Copy link
Member Author

it could indeed be the case. But why and how much the response time increases for the 1.?

@jwodder
Copy link
Member

jwodder commented May 31, 2024

@yarikoptic I'm not entirely sure why it happens, but you can see the increase in the following statistical summary of the individual request times per number of client workers when using batchdav from my MacBook Pro to traverse [1]:

- Workers: 1
  Qty: 4700
  Min: 0.028803198
  Q1:  0.03833497475
  Med: 0.042684092
  Q3:  0.04846862325
  Max: 0.282854373
  Avg: 0.04520333965510638
  StdDev: 0.014187492721929993

- Workers: 5
  Qty: 4700
  Min: 0.027815688
  Q1:  0.0367942775
  Med: 0.0411618505
  Q3:  0.0465202755
  Max: 0.282241313
  Avg: 0.042935500837659575
  StdDev: 0.009896549514405408

- Workers: 10
  Qty: 4700
  Min: 0.028312506
  Q1:  0.04303246275
  Med: 0.0491158515
  Q3:  0.05691809825
  Max: 0.160982916
  Avg: 0.051420016434468085
  StdDev: 0.012632252899625567

- Workers: 15
  Qty: 4700
  Min: 0.030884503
  Q1:  0.047435413749999995
  Med: 0.054248813
  Q3:  0.06353448650000001
  Max: 0.1791786
  Avg: 0.05743126231170213
  StdDev: 0.015056378924438578

- Workers: 20
  Qty: 4700
  Min: 0.032487009
  Q1:  0.0505410675
  Med: 0.057821753000000004
  Q3:  0.07021392474999999
  Max: 0.34045238
  Avg: 0.06307433582808511
  StdDev: 0.019146996600473823

- Workers: 30
  Qty: 4700
  Min: 0.034624271
  Q1:  0.07324338025
  Med: 0.09015912849999999
  Q3:  0.11517259725
  Max: 0.553115742
  Avg: 0.10026274531425532
  StdDev: 0.041730714375020364

- Workers: 40
  Qty: 4700
  Min: 0.036998484
  Q1:  0.09305445600000001
  Med: 0.1177166345
  Q3:  0.14652155375
  Max: 0.367216551
  Avg: 0.12432899197702128
  StdDev: 0.04396168177875441

- Workers: 50
  Qty: 4700
  Min: 0.041797039
  Q1:  0.11373270075
  Med: 0.14867984350000002
  Q3:  0.19339987925000002
  Max: 0.565099972
  Avg: 0.16100923599723405
  StdDev: 0.06995097492611861

The effect also happens when both the dandidav server and the batchdav client are run on smaug:

- Workers: 1
  Qty: 4700
  Min: 0.001521174
  Q1:  0.001634275
  Med: 0.001657998
  Q3:  0.0017370222499999998
  Max: 0.006338718
  Avg: 0.0017425304072340424
  StdDev: 0.000276490867720157

- Workers: 5
  Qty: 4700
  Min: 0.001630908
  Q1:  0.002234742
  Med: 0.0026924534999999998
  Q3:  0.0033815309999999997
  Max: 0.008711644
  Avg: 0.0029348376385106385
  StdDev: 0.0009457047047697312

- Workers: 10
  Qty: 4700
  Min: 0.001978361
  Q1:  0.00286769725
  Med: 0.003462554
  Q3:  0.0038105122499999997
  Max: 0.01161249
  Avg: 0.0035703647621276595
  StdDev: 0.0010899940171207661

- Workers: 15
  Qty: 4700
  Min: 0.002201353
  Q1:  0.0035673577499999998
  Med: 0.0037393955
  Q3:  0.0041512415000000006
  Max: 0.011563436
  Avg: 0.004090452591276596
  StdDev: 0.0011780001120566517

- Workers: 20
  Qty: 4700
  Min: 0.002114947
  Q1:  0.0042447985
  Med: 0.004752021
  Q3:  0.005448158
  Max: 0.016708018
  Avg: 0.005145343596170212
  StdDev: 0.0015452715150670242

- Workers: 30
  Qty: 4700
  Min: 0.00217514
  Q1:  0.00622427575
  Med: 0.0073929024999999995
  Q3:  0.00870435825
  Max: 0.018373898
  Avg: 0.007615053831702127
  StdDev: 0.0022078254266568575

- Workers: 40
  Qty: 4700
  Min: 0.002216688
  Q1:  0.0084486705
  Med: 0.010468385
  Q3:  0.012738740250000002
  Max: 0.026942456
  Avg: 0.010717534040851065
  StdDev: 0.003395530609215751

- Workers: 50
  Qty: 4700
  Min: 0.002178145
  Q1:  0.009653720250000001
  Med: 0.012588034000000001
  Q3:  0.01565948575
  Max: 0.032009146
  Avg: 0.012759034398085107
  StdDev: 0.004308517810350476

@yarikoptic
Copy link
Member Author

interesting. So indeed it pretty much jumps twice in Avg going from 20 to 40, but not before, i.e. before 20 it grows slower than factor for number of workers (e.g. it is growing twice in going from 1 to 10). Just for academic purposes -- could you produce those Avg times only for our dandidav and smaug only but with step 1? then I would like to look at estimate of Avg[n]/Avg[n/k] (k==2 or some other) -- is it jumping rapidly higher at some specific value and either that value the same for the both, and where it crosses the horizontal line for k - i.e. were increasing number of workers would not be of help.

So indeed your observation was probably right spot on that response time is what keeps us from "going faster", but the question remains why such rapid response time growth? it could again be if there is some hardcoded size of some queue somewhere to process only N requests in parallel, and that would increase average time since N+1st request would need to wait first to get its turn, which would add up to response time.

@jwodder
Copy link
Member

jwodder commented May 31, 2024

@yarikoptic

Just for academic purposes -- could you produce those Avg times only for our dandidav and smaug only but with step 1?

Over what range of worker values?

@yarikoptic
Copy link
Member Author

how academic you want to go? ;) if not much -- till 50, if more -- till 100 ;)

@jwodder
Copy link
Member

jwodder commented May 31, 2024

@yarikoptic

then I would like to look at estimate of Avg[n]/Avg[n/k] (k==2 or some other)

I'm not sure what formula you're trying to describe here. If k is a fixed number, then the given expression will always be equal to k, since Avg[n/k] = Avg[n]/k, so Avg[n]/Avg[n/k] = Avg[n]/(Avn[n]/k) = k.

@jwodder
Copy link
Member

jwodder commented May 31, 2024

The statistics for 1 through 50 workers against webdav.dandiarchive.org are:

- Workers: 1
  Qty: 1880
  Min: 0.030198719
  Q1:  0.04047590875
  Med: 0.0445370265
  Q3:  0.05099806525
  Max: 0.505651443
  Avg: 0.04770227094627659
  StdDev: 0.021967394220812318

- Workers: 2
  Qty: 1880
  Min: 0.029291391
  Q1:  0.039277515
  Med: 0.0443762905
  Q3:  0.05052834975
  Max: 0.117624117
  Avg: 0.0460287705106383
  StdDev: 0.010271837123910967

- Workers: 3
  Qty: 1880
  Min: 0.030182589
  Q1:  0.0407896065
  Med: 0.0459593295
  Q3:  0.05419803475
  Max: 0.274188679
  Avg: 0.05009355483138298
  StdDev: 0.016656739601998753

- Workers: 4
  Qty: 1880
  Min: 0.027692972
  Q1:  0.03707500075
  Med: 0.0416833945
  Q3:  0.04726783925
  Max: 0.267129368
  Avg: 0.04420503393617021
  StdDev: 0.012988273800523325

- Workers: 5
  Qty: 1880
  Min: 0.026963886
  Q1:  0.03712360225
  Med: 0.041677756999999996
  Q3:  0.04774004750000001
  Max: 0.277971116
  Avg: 0.0438017427356383
  StdDev: 0.011928574725979245

- Workers: 6
  Qty: 1880
  Min: 0.029059732
  Q1:  0.04042438225
  Med: 0.0462466995
  Q3:  0.055243211
  Max: 0.13999787
  Avg: 0.04965774111223404
  StdDev: 0.013700461119878111

- Workers: 7
  Qty: 1880
  Min: 0.028257397
  Q1:  0.038818042
  Med: 0.043996682999999995
  Q3:  0.0508364065
  Max: 0.146299726
  Avg: 0.046456366545744684
  StdDev: 0.011793250847199541

- Workers: 8
  Qty: 1880
  Min: 0.028853611
  Q1:  0.042837879
  Med: 0.051541457
  Q3:  0.06957287175
  Max: 0.320239926
  Avg: 0.05949934435265958
  StdDev: 0.025074489714542465

- Workers: 9
  Qty: 1880
  Min: 0.028510246
  Q1:  0.0403969415
  Med: 0.046127439
  Q3:  0.05523004225
  Max: 0.492025285
  Avg: 0.05122154616861702
  StdDev: 0.022898344247715256

- Workers: 10
  Qty: 1880
  Min: 0.026451435
  Q1:  0.040683591500000005
  Med: 0.0464000255
  Q3:  0.0539399995
  Max: 0.148002937
  Avg: 0.049111625025531916
  StdDev: 0.012994588316983175

- Workers: 11
  Qty: 1880
  Min: 0.028418947
  Q1:  0.04411216275
  Med: 0.0509800795
  Q3:  0.060620849500000004
  Max: 0.277585998
  Avg: 0.05454666861170213
  StdDev: 0.01615620883278283

- Workers: 12
  Qty: 1880
  Min: 0.029657315
  Q1:  0.0464608035
  Med: 0.0547166485
  Q3:  0.06761030975
  Max: 0.271769288
  Avg: 0.05995401004361702
  StdDev: 0.020413754904561217

- Workers: 13
  Qty: 1880
  Min: 0.030319581
  Q1:  0.047687136000000005
  Med: 0.056807948999999996
  Q3:  0.07380057725
  Max: 0.274448025
  Avg: 0.06486728137287234
  StdDev: 0.025753669748910947

- Workers: 14
  Qty: 1880
  Min: 0.030179559
  Q1:  0.045507504250000004
  Med: 0.053886984
  Q3:  0.06721529375
  Max: 0.293695972
  Avg: 0.06026955950478723
  StdDev: 0.024470219442912906

- Workers: 15
  Qty: 1880
  Min: 0.030015622
  Q1:  0.04543004825000001
  Med: 0.053180841
  Q3:  0.06633464275
  Max: 0.332681016
  Avg: 0.05880427830053192
  StdDev: 0.021664812097799337

- Workers: 16
  Qty: 1880
  Min: 0.031588945
  Q1:  0.04561960825
  Med: 0.0529881105
  Q3:  0.06584364175
  Max: 0.246868099
  Avg: 0.05973010918191489
  StdDev: 0.023111749345733613

- Workers: 17
  Qty: 1880
  Min: 0.028995174
  Q1:  0.0507760395
  Med: 0.059060258000000004
  Q3:  0.07307642375000001
  Max: 0.176472585
  Avg: 0.06394386569787235
  StdDev: 0.019133970826324433

- Workers: 18
  Qty: 1880
  Min: 0.033611645
  Q1:  0.047814756750000006
  Med: 0.054737262999999994
  Q3:  0.06728223875
  Max: 0.266728307
  Avg: 0.06048719055638298
  StdDev: 0.020122678711662696

- Workers: 19
  Qty: 1880
  Min: 0.031449239
  Q1:  0.05059406775
  Med: 0.059452169
  Q3:  0.0753107175
  Max: 0.273290625
  Avg: 0.06658642856702128
  StdDev: 0.024967574900272436

- Workers: 20
  Qty: 1880
  Min: 0.031575064
  Q1:  0.05169245275
  Med: 0.0623448345
  Q3:  0.0797893495
  Max: 0.218454668
  Avg: 0.06761636774840425
  StdDev: 0.021800469336841686

- Workers: 21
  Qty: 1880
  Min: 0.030493716
  Q1:  0.049088047499999996
  Med: 0.058016233
  Q3:  0.0733673765
  Max: 0.161157289
  Avg: 0.06381228032765957
  StdDev: 0.02059089904339535

- Workers: 22
  Qty: 1880
  Min: 0.038303362
  Q1:  0.063974428
  Med: 0.0835617045
  Q3:  0.112530386
  Max: 0.404207984
  Avg: 0.0964025817819149
  StdDev: 0.0478784277111293

- Workers: 23
  Qty: 1880
  Min: 0.035781558
  Q1:  0.05550457075
  Med: 0.070056598
  Q3:  0.090656272
  Max: 0.387076747
  Avg: 0.07993766085638297
  StdDev: 0.037897550636719124

- Workers: 24
  Qty: 1880
  Min: 0.036041731
  Q1:  0.060921950249999995
  Med: 0.074656136
  Q3:  0.0957148505
  Max: 0.346587787
  Avg: 0.08106884284627659
  StdDev: 0.028429675142782072

- Workers: 25
  Qty: 1880
  Min: 0.037840619
  Q1:  0.0629756705
  Med: 0.07655896400000001
  Q3:  0.09650261800000001
  Max: 0.243017182
  Avg: 0.08326741997712767
  StdDev: 0.028154275468516673

- Workers: 26
  Qty: 1880
  Min: 0.038228939
  Q1:  0.06477089224999999
  Med: 0.0803242125
  Q3:  0.1049138705
  Max: 0.746675107
  Avg: 0.09239364253031915
  StdDev: 0.04400628406814501

- Workers: 27
  Qty: 1880
  Min: 0.036371745
  Q1:  0.06456503425000001
  Med: 0.082823384
  Q3:  0.10855006725
  Max: 0.327976212
  Avg: 0.09352845995957447
  StdDev: 0.041411974328631786

- Workers: 28
  Qty: 1880
  Min: 0.033921641
  Q1:  0.06529700699999999
  Med: 0.079911845
  Q3:  0.098540441
  Max: 0.244019412
  Avg: 0.08469663869095745
  StdDev: 0.026363707305355814

- Workers: 29
  Qty: 1880
  Min: 0.034247912
  Q1:  0.0756947665
  Med: 0.0973659885
  Q3:  0.1295879955
  Max: 0.359678624
  Avg: 0.10843607427553192
  StdDev: 0.046264176305897506

- Workers: 30
  Qty: 1880
  Min: 0.041054888
  Q1:  0.08897897675
  Med: 0.111961684
  Q3:  0.14239367225
  Max: 0.361699035
  Avg: 0.11918543665638298
  StdDev: 0.041927636909251303

- Workers: 31
  Qty: 1880
  Min: 0.037353538
  Q1:  0.0744029605
  Med: 0.09185067499999999
  Q3:  0.11388209600000002
  Max: 0.291891112
  Avg: 0.09756171123829786
  StdDev: 0.033404854076688355

- Workers: 32
  Qty: 1880
  Min: 0.03760705
  Q1:  0.085931312
  Med: 0.108399407
  Q3:  0.13513629
  Max: 0.440032619
  Avg: 0.11592635626117022
  StdDev: 0.043976380306986175

- Workers: 33
  Qty: 1880
  Min: 0.037930898
  Q1:  0.07457999950000001
  Med: 0.0922163005
  Q3:  0.1134090805
  Max: 0.399841632
  Avg: 0.1001791668212766
  StdDev: 0.03959930437732155

- Workers: 34
  Qty: 1880
  Min: 0.039043573
  Q1:  0.0907591495
  Med: 0.1130351855
  Q3:  0.142176184
  Max: 0.360311793
  Avg: 0.12165001820053192
  StdDev: 0.04454270684016887

- Workers: 35
  Qty: 1880
  Min: 0.042111625
  Q1:  0.075454718
  Med: 0.09126802249999999
  Q3:  0.11218065899999999
  Max: 0.352494531
  Avg: 0.09615554100159573
  StdDev: 0.028982116873632273

- Workers: 36
  Qty: 1880
  Min: 0.039344224
  Q1:  0.08958393799999999
  Med: 0.1119491465
  Q3:  0.14033767500000002
  Max: 0.293377311
  Avg: 0.11830818654840426
  StdDev: 0.03981244324362139

- Workers: 37
  Qty: 1880
  Min: 0.039811418
  Q1:  0.08431090075
  Med: 0.1039188205
  Q3:  0.13125420075
  Max: 0.370457522
  Avg: 0.11323512460053191
  StdDev: 0.042970400619024594

- Workers: 38
  Qty: 1880
  Min: 0.037180738
  Q1:  0.090248271
  Med: 0.112835429
  Q3:  0.15111971925
  Max: 0.373409753
  Avg: 0.1269275020468085
  StdDev: 0.05279329914936619

- Workers: 39
  Qty: 1880
  Min: 0.039293444
  Q1:  0.11518510925
  Med: 0.14848806399999998
  Q3:  0.179011966
  Max: 0.415970775
  Avg: 0.1506312693547872
  StdDev: 0.049322983206494066

- Workers: 40
  Qty: 1880
  Min: 0.046026323
  Q1:  0.11352183775000001
  Med: 0.14302828
  Q3:  0.17975541675
  Max: 0.370786007
  Avg: 0.15082671499095743
  StdDev: 0.053878775153732364

- Workers: 41
  Qty: 1880
  Min: 0.042243285
  Q1:  0.1109422555
  Med: 0.14818105850000002
  Q3:  0.19478194499999998
  Max: 0.506666957
  Avg: 0.15950820534840426
  StdDev: 0.06547950686216591

- Workers: 42
  Qty: 1880
  Min: 0.037987138
  Q1:  0.09977411575
  Med: 0.126900079
  Q3:  0.16130051475
  Max: 0.414916882
  Avg: 0.1339623592744681
  StdDev: 0.049423841060615835

- Workers: 43
  Qty: 1880
  Min: 0.037800664
  Q1:  0.09326244
  Med: 0.11499287050000001
  Q3:  0.147206116
  Max: 0.351152881
  Avg: 0.12314373566170214
  StdDev: 0.04392976003292658

- Workers: 44
  Qty: 1880
  Min: 0.045228382
  Q1:  0.11149409525000001
  Med: 0.142151954
  Q3:  0.17733455124999997
  Max: 0.352633191
  Avg: 0.1458120733218085
  StdDev: 0.048546879938201996

- Workers: 45
  Qty: 1880
  Min: 0.032151294
  Q1:  0.09642264975
  Med: 0.124894909
  Q3:  0.16097414975
  Max: 0.370443691
  Avg: 0.13475005337234042
  StdDev: 0.05526900201009612

- Workers: 46
  Qty: 1880
  Min: 0.035793279
  Q1:  0.1005359305
  Med: 0.12990985900000002
  Q3:  0.16520698749999999
  Max: 0.402757271
  Avg: 0.1377827691856383
  StdDev: 0.05330886453400441

- Workers: 47
  Qty: 1880
  Min: 0.034641088
  Q1:  0.10163916175
  Med: 0.128542445
  Q3:  0.16163195425
  Max: 0.417836856
  Avg: 0.13393576528404255
  StdDev: 0.04574783805801064

- Workers: 48
  Qty: 1880
  Min: 0.038120426
  Q1:  0.1009265035
  Med: 0.1216764585
  Q3:  0.14816999349999999
  Max: 0.33581898
  Avg: 0.12660648065159574
  StdDev: 0.04069829660003819

- Workers: 49
  Qty: 1880
  Min: 0.039849426
  Q1:  0.1212875695
  Med: 0.1510706935
  Q3:  0.186415799
  Max: 0.416485898
  Avg: 0.15878518579680853
  StdDev: 0.06036032166784821

- Workers: 50
  Qty: 1880
  Min: 0.037548742
  Q1:  0.10892622575
  Med: 0.1389205055
  Q3:  0.17040354549999998
  Max: 0.3649731
  Avg: 0.14284811705425532
  StdDev: 0.04865791195576917

and the statistics for 1 through 50 workers against dandidav run on smaug are:

- Workers: 1
  Qty: 1880
  Min: 0.001263773
  Q1:  0.00133388425
  Med: 0.0013593185000000002
  Q3:  0.0015099735000000001
  Max: 0.004031561
  Avg: 0.0014336062680851064
  StdDev: 0.00021230805297713868

- Workers: 2
  Qty: 1880
  Min: 0.001304008
  Q1:  0.0023371572499999997
  Med: 0.0027662885
  Q3:  0.0028487119999999998
  Max: 0.308872887
  Avg: 0.0029223692627659576
  StdDev: 0.009984066924141

- Workers: 3
  Qty: 1880
  Min: 0.001296164
  Q1:  0.00183014175
  Med: 0.0023113185
  Q3:  0.0027977139999999998
  Max: 0.007494042
  Avg: 0.0024289806186170216
  StdDev: 0.0007714845810454107

- Workers: 4
  Qty: 1880
  Min: 0.001476163
  Q1:  0.0017672965
  Med: 0.002210318
  Q3:  0.0027398375
  Max: 0.006795677
  Avg: 0.0023590983409574467
  StdDev: 0.0007249617108132787

- Workers: 5
  Qty: 1880
  Min: 0.001531298
  Q1:  0.0017849205000000001
  Med: 0.0021271855
  Q3:  0.00267039975
  Max: 0.006667797
  Avg: 0.0023463242468085106
  StdDev: 0.0007397465174556249

- Workers: 6
  Qty: 1880
  Min: 0.001524057
  Q1:  0.001845738
  Med: 0.0021813795
  Q3:  0.002759644
  Max: 0.00867191
  Avg: 0.0023938246590425533
  StdDev: 0.0007519526091665974

- Workers: 7
  Qty: 1880
  Min: 0.001562128
  Q1:  0.0019453505
  Med: 0.002354564
  Q3:  0.00287174825
  Max: 0.00764177
  Avg: 0.002533195656382979
  StdDev: 0.0007906696561381343

- Workers: 8
  Qty: 1880
  Min: 0.001589142
  Q1:  0.00203439075
  Med: 0.0025405005
  Q3:  0.0029089575
  Max: 0.008614692
  Avg: 0.0026349355132978727
  StdDev: 0.0008139821589503481

- Workers: 9
  Qty: 1880
  Min: 0.001621635
  Q1:  0.0020699185
  Med: 0.0025790609999999997
  Q3:  0.0029408827499999997
  Max: 0.008686138
  Avg: 0.0026694448521276597
  StdDev: 0.000870264843030878

- Workers: 10
  Qty: 1880
  Min: 0.001612252
  Q1:  0.0021259549999999997
  Med: 0.0026734535
  Q3:  0.002990397
  Max: 0.011281332
  Avg: 0.0027496460813829786
  StdDev: 0.0009604458237714645

- Workers: 11
  Qty: 1880
  Min: 0.001662213
  Q1:  0.00232134425
  Med: 0.00281172
  Q3:  0.0031252265
  Max: 0.008985129
  Avg: 0.002875568825
  StdDev: 0.0008067738217647319

- Workers: 12
  Qty: 1880
  Min: 0.001654777
  Q1:  0.0024010327499999996
  Med: 0.00286444
  Q3:  0.0032087217500000004
  Max: 0.008042452
  Avg: 0.0029810785728723406
  StdDev: 0.0009001125875987497

- Workers: 13
  Qty: 1880
  Min: 0.001753488
  Q1:  0.00269221475
  Med: 0.0029310225
  Q3:  0.003244117
  Max: 0.008923116
  Avg: 0.003159238369680851
  StdDev: 0.0010739580608455101

- Workers: 14
  Qty: 1880
  Min: 0.001878984
  Q1:  0.00279802825
  Med: 0.0030029695
  Q3:  0.00335819025
  Max: 0.009186735
  Avg: 0.003239842380851064
  StdDev: 0.0010454811829535239

- Workers: 15
  Qty: 1880
  Min: 0.001881073
  Q1:  0.00289385625
  Med: 0.0030782995000000002
  Q3:  0.0033833510000000006
  Max: 0.009639233
  Avg: 0.0033117405170212766
  StdDev: 0.0009538718940503049

- Workers: 16
  Qty: 1880
  Min: 0.001758127
  Q1:  0.0029827680000000002
  Med: 0.0031780425
  Q3:  0.0036075400000000002
  Max: 0.00985623
  Avg: 0.0035179418771276597
  StdDev: 0.00106202138386058

- Workers: 17
  Qty: 1880
  Min: 0.001698211
  Q1:  0.00309828975
  Med: 0.0033287755
  Q3:  0.0037702165
  Max: 0.00956896
  Avg: 0.003620856455851064
  StdDev: 0.000950433325226505

- Workers: 18
  Qty: 1880
  Min: 0.001733253
  Q1:  0.00321529975
  Med: 0.0035158245
  Q3:  0.0040366855
  Max: 0.010607861
  Avg: 0.0038342121420212767
  StdDev: 0.0010568011060368978

- Workers: 19
  Qty: 1880
  Min: 0.001679107
  Q1:  0.0033450877499999997
  Med: 0.0037434225
  Q3:  0.0042829305
  Max: 0.010970623
  Avg: 0.004054301981914894
  StdDev: 0.0011403780273799113

- Workers: 20
  Qty: 1880
  Min: 0.001753481
  Q1:  0.0035197832499999997
  Med: 0.0039864215
  Q3:  0.00463226675
  Max: 0.01182554
  Avg: 0.004275437218085106
  StdDev: 0.0012002224551602373

- Workers: 21
  Qty: 1880
  Min: 0.00170803
  Q1:  0.0037598575
  Med: 0.004227930499999999
  Q3:  0.0048891495
  Max: 0.011399548
  Avg: 0.004492716238829787
  StdDev: 0.0012513384522316816

- Workers: 22
  Qty: 1880
  Min: 0.00184426
  Q1:  0.00383840575
  Med: 0.0044025755
  Q3:  0.00506287225
  Max: 0.012582417
  Avg: 0.004631916492553192
  StdDev: 0.0012674463353114294

- Workers: 23
  Qty: 1880
  Min: 0.001799269
  Q1:  0.00411439525
  Med: 0.0047580585000000005
  Q3:  0.00549100925
  Max: 0.0133126
  Avg: 0.005019725092553191
  StdDev: 0.0014947137003477953

- Workers: 24
  Qty: 1880
  Min: 0.001767635
  Q1:  0.004360623
  Med: 0.0051616675
  Q3:  0.00609550975
  Max: 0.021617576
  Avg: 0.005581546595212766
  StdDev: 0.001995528306263226

- Workers: 25
  Qty: 1880
  Min: 0.001696459
  Q1:  0.00432315025
  Med: 0.005074832499999999
  Q3:  0.0060352800000000005
  Max: 0.016029885
  Avg: 0.005448785329787234
  StdDev: 0.0018234227957253304

- Workers: 26
  Qty: 1880
  Min: 0.001738747
  Q1:  0.00450427125
  Med: 0.00537091
  Q3:  0.0063196929999999995
  Max: 0.015939861
  Avg: 0.0057183237005319154
  StdDev: 0.0019740240416537933

- Workers: 27
  Qty: 1880
  Min: 0.001740717
  Q1:  0.004693919749999999
  Med: 0.005576348
  Q3:  0.006506601250000001
  Max: 0.021465816
  Avg: 0.005882277896808511
  StdDev: 0.0019466468031291949

- Workers: 28
  Qty: 1880
  Min: 0.00176789
  Q1:  0.0048308875000000005
  Med: 0.005758545
  Q3:  0.006819377749999999
  Max: 0.015323992
  Avg: 0.006031835842021276
  StdDev: 0.001922897442174265

- Workers: 29
  Qty: 1880
  Min: 0.001696134
  Q1:  0.00486328175
  Med: 0.005882376999999999
  Q3:  0.007093499750000001
  Max: 0.01394583
  Avg: 0.006147665209042553
  StdDev: 0.0019238022404711732

- Workers: 30
  Qty: 1880
  Min: 0.001908575
  Q1:  0.0051919192500000004
  Med: 0.0061346535
  Q3:  0.007290996500000001
  Max: 0.01582853
  Avg: 0.006376735376595745
  StdDev: 0.0019019806167858422

- Workers: 31
  Qty: 1880
  Min: 0.001891961
  Q1:  0.005276604
  Med: 0.006309998000000001
  Q3:  0.00776528825
  Max: 1.161150329
  Avg: 0.02563116205425532
  StdDev: 0.14664862957074778

- Workers: 32
  Qty: 1880
  Min: 0.001629076
  Q1:  0.00513431675
  Med: 0.006151182
  Q3:  0.007334462
  Max: 0.014651206
  Avg: 0.006374087170212766
  StdDev: 0.001916222117081965

- Workers: 33
  Qty: 1880
  Min: 0.001858345
  Q1:  0.00533344625
  Med: 0.006365368
  Q3:  0.00757438675
  Max: 0.014003605
  Avg: 0.0065586491207446805
  StdDev: 0.0019488889528209194

- Workers: 34
  Qty: 1880
  Min: 0.001690778
  Q1:  0.00527693425
  Med: 0.006455962
  Q3:  0.00794573675
  Max: 0.014477802
  Avg: 0.00667195309468085
  StdDev: 0.0020118379439124324

- Workers: 35
  Qty: 1880
  Min: 0.001682999
  Q1:  0.0054949974999999995
  Med: 0.0066973365
  Q3:  0.0081161295
  Max: 0.014337319
  Avg: 0.006880291247872341
  StdDev: 0.0020261844816378692

- Workers: 36
  Qty: 1880
  Min: 0.001451109
  Q1:  0.005782872499999999
  Med: 0.006866042
  Q3:  0.00836848425
  Max: 0.015716399
  Avg: 0.007094151496808511
  StdDev: 0.002074590570273339

- Workers: 37
  Qty: 1880
  Min: 0.001468595
  Q1:  0.00572000025
  Med: 0.0071595165
  Q3:  0.008752421
  Max: 0.015978881
  Avg: 0.007264743660106383
  StdDev: 0.0021887910912431413

- Workers: 38
  Qty: 1880
  Min: 0.001503958
  Q1:  0.005949213
  Med: 0.0074373765000000005
  Q3:  0.00881852275
  Max: 0.015739134
  Avg: 0.007506660809574468
  StdDev: 0.0022989601972854966

- Workers: 39
  Qty: 1880
  Min: 0.001568519
  Q1:  0.00611301875
  Med: 0.007520967
  Q3:  0.00924958425
  Max: 0.0175203
  Avg: 0.0076871529292553194
  StdDev: 0.00237760051469743

- Workers: 40
  Qty: 1880
  Min: 0.00155
  Q1:  0.00613054425
  Med: 0.007776119
  Q3:  0.009510827500000001
  Max: 1.341618803
  Avg: 0.03601654922287234
  StdDev: 0.1913100155478866

- Workers: 41
  Qty: 1880
  Min: 0.001700998
  Q1:  0.00625324975
  Med: 0.007943563
  Q3:  0.009707434
  Max: 0.017197804
  Avg: 0.008015607229255319
  StdDev: 0.0025232677616616943

- Workers: 42
  Qty: 1880
  Min: 0.001610776
  Q1:  0.006439681
  Med: 0.0081648185
  Q3:  0.009795644999999999
  Max: 0.017197947
  Avg: 0.00819962987712766
  StdDev: 0.0025295954104922855

- Workers: 43
  Qty: 1880
  Min: 0.001788056
  Q1:  0.0066565795
  Med: 0.008349584
  Q3:  0.01030028425
  Max: 0.017926146
  Avg: 0.00848668465957447
  StdDev: 0.002687127405740674

- Workers: 44
  Qty: 1880
  Min: 0.0015614
  Q1:  0.00687974825
  Med: 0.008691787
  Q3:  0.010593080250000001
  Max: 0.01789089
  Avg: 0.008757082086702127
  StdDev: 0.0027471290834987358

- Workers: 45
  Qty: 1880
  Min: 0.001745542
  Q1:  0.00683847725
  Med: 0.008724561499999998
  Q3:  0.010632159
  Max: 0.017958523
  Avg: 0.008861858163829787
  StdDev: 0.0028737972963138346

- Workers: 46
  Qty: 1880
  Min: 0.001713159
  Q1:  0.00691737225
  Med: 0.009000072500000001
  Q3:  0.011032118250000002
  Max: 0.020088445
  Avg: 0.009054291571808511
  StdDev: 0.002923192755706995

- Workers: 47
  Qty: 1880
  Min: 0.001718873
  Q1:  0.007218515
  Med: 0.009135034
  Q3:  0.01111018125
  Max: 0.01960762
  Avg: 0.00922941064787234
  StdDev: 0.0029166310775548205

- Workers: 48
  Qty: 1880
  Min: 0.00177512
  Q1:  0.006972514000000001
  Med: 0.0091963275
  Q3:  0.01164765225
  Max: 0.019551087
  Avg: 0.009358817180319149
  StdDev: 0.0031469010699963773

- Workers: 49
  Qty: 1880
  Min: 0.001708601
  Q1:  0.00712719575
  Med: 0.009461824
  Q3:  0.011794431250000001
  Max: 0.018784388
  Avg: 0.009477265148404255
  StdDev: 0.003119737553731081

- Workers: 50
  Qty: 1880
  Min: 0.001716787
  Q1:  0.0072522212499999995
  Med: 0.009557792
  Q3:  0.01166624275
  Max: 0.0233712
  Avg: 0.009536564675000001
  StdDev: 0.0031595471975162073

Would you like me to provide the raw data, containing each of the individual request times?

@yarikoptic
Copy link
Member Author

here what I get for k=2

image

and smaug

image

assuming that excursion is noise (in particular for smaug at 15), it remains under 2, so in principle should still scale up although very slowly, and indeed that ratio plato is around around 12-15 in both cases, i.e. somewhere at 24-30 we reach so that ratio of response time at N becomes 1.8 of response time at N/2 parallel requests.

if zoom in at smaug one

image

we see that it pretty much grows linearly until then. So there is IMHO some inherent factor, possibly dependent on instance (what is # of processors/cores on heroku) but not clearly so, which just ceils scalability there.,

@yarikoptic
Copy link
Member Author

BTW -- while running on smaug what do you see for the CPU load on dandidav (since no actual download is involved I guess could differ from prior stated)?

@jwodder
Copy link
Member

jwodder commented May 31, 2024

@yarikoptic The load steadily increased as the number of workers increased, capping out at around 4.2 for the 5-second load average.

@yarikoptic
Copy link
Member Author

what about cpu%? ie is that task becoming somehow IO bound vs CPU?

@jwodder
Copy link
Member

jwodder commented Jun 3, 2024

@yarikoptic I ran watch -n1 'ps -q "$(pgrep -d, dandidav)" -o cuu,args' while running batchdav against dandidav on smaug, and within a few second the CPU percentage was at 99.999%, where it remained for the rest of batchdav's run.

@yarikoptic
Copy link
Member Author

so may be the ceiling is due to bottleneck in the implementation stack somewhere :-/ Are there convenient profiling tools in the rust land to see what it is mostly "thinking about"?

@jwodder
Copy link
Member

jwodder commented Jun 3, 2024

@yarikoptic There are a number of profilers. I'm starting out with flamegraph, but I'm getting the error:

Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 3:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
failed to sample program

For the record, flamegraph's docs suggest setting perf_event_paranoid to -1.

@yarikoptic
Copy link
Member Author

yarikoptic commented Jun 3, 2024

Should be all set (but not for every user; and not persistent across reboots), you were added to the perf_users, so either re-login or start a new shell using newgrp perf_users.

@jwodder
Copy link
Member

jwodder commented Jun 6, 2024

@yarikoptic Side note: From the flamegraph, I've discovered that the functionality for logging how much memory dandidav uses (added for #118) is taking up a lot of time; when I remove the functionality, requests to dandidav speed up by an order of magnitude. However, response times still increase with the number of concurrent requests.

@jwodder
Copy link
Member

jwodder commented Jun 6, 2024

@yarikoptic I ended up posting a question about this on axum's repository: tokio-rs/axum#2772

@jwodder jwodder self-assigned this Jul 11, 2024
@jwodder jwodder added the high priority Work on these first label Jul 22, 2024
@jwodder jwodder removed this from the Current priorities milestone Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high priority Work on these first performance Efficient use of time and space
Projects
None yet
Development

No branches or pull requests

3 participants