Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Object store - List bucket's content is very slow #4680

Closed
Stan-RED opened this issue Oct 19, 2023 · 10 comments · Fixed by #4712
Closed

Object store - List bucket's content is very slow #4680

Stan-RED opened this issue Oct 19, 2023 · 10 comments · Fixed by #4712
Labels
defect Suspected defect such as a bug or regression

Comments

@Stan-RED
Copy link

Observed behavior

Listing items in the specific bucket of the object store is very slow, even if there is only one item.

Expected behavior

Expected to get a list of the items faster.

Server and client version

nats-server 2.10.3
natscli 0.1.1 (that is recommended here https://docs.nats.io/nats-concepts/what-is-nats/walkthrough_setup)
NATS.Client for .NET 1.1.0

Host environment

Microsoft Windows NT 10.0.19045.0
CPU: AMD Ryzen 9 5950X
NVMe: ADATA SX8200PNP

[68776] 2023/10/19 13:47:51.238565 [INF] ---------------- JETSTREAM ----------------
[68776] 2023/10/19 13:47:51.238565 [INF]   Max Memory:      95.94 GB
[68776] 2023/10/19 13:47:51.238565 [INF]   Max Storage:     1.00 TB
[68776] 2023/10/19 13:47:51.238565 [INF]   Store Directory: "e:\Temp\nats\jetstream"
[68776] 2023/10/19 13:47:51.238565 [INF] -------------------------------------------

Steps to reproduce

I have created a bucket as in docs: nats obj add testbucket, and added a big file to it (~18G): nats obj add testbucket somefile.mp4. When trying to get the list of the objects in the bucket: nats obj ls testbucket I have received a message "nats: error: context deadline exceeded" with some delay.

In the server log I got this message:
[68776] 2023/10/19 14:48:10.303340 [WRN] Internal subscription on "$JS.API.CONSUMER.CREATE.OBJ_testbucket.32td3LZE.$O.testbucket.M.>" took too long: 6.8698428s

When upgraded timeout to 10s: /nats obj ls testbucket --timeout=10s it was ok.

I supposed that this is client-side problems and created a simple app using the latest NATS.Client for .NET 1.1.0:

using NATS.Client;
using NATS.Client.JetStream;
using NATS.Client.ObjectStore;

ConnectionFactory factory = new();
IConnection connection = factory.CreateConnection();

var jso = JetStreamOptions
    .Builder()
    .WithOptOut290ConsumerCreate(true)
    .Build()
;

IObjectStore os = connection.CreateObjectStoreContext("testbucket", ObjectStoreOptions.Builder(jso).Build());

DateTime start = DateTime.Now;
var response = os.GetList();
Console.WriteLine(DateTime.Now - start);

It was also failing with default timeout. After some tuning it started working and in a consequent 10 runs time to get the list with the single object there was varying from 3.5s to 5.0s. A bit better but anyway very slow. Adding the second 14G file push this delay to 7.5+ seconds, another 8G file added one more second to this process and so on.

At the same time getting files by name is quick. It starts to download immediately.

As I see, the name of the file is added to the end of the stream (the last block file). I don't know yet how the storage is organized internally , but if it is using streams, maybe it would make sense to separate blob stream from the stream with meta-information to remove this functional dependency on the blob size? Because small objects works much faster.

@Stan-RED Stan-RED added the defect Suspected defect such as a bug or regression label Oct 19, 2023
@derekcollison
Copy link
Member

derekcollison commented Oct 19, 2023

Do you see same slow behavior with the NATS cli?

@Stan-RED
Copy link
Author

Do you see same slow behavior with the NATS cli?

Is there any other CLI different from the one mentioned in the documentation - https://docs.nats.io/nats-concepts/what-is-nats/walkthrough_setup (https://github.com/nats-io/natscli/releases) and I have used?

@derekcollison
Copy link
Member

That's the one.

@Stan-RED
Copy link
Author

That's the one.

Then yes. The slow behavior have been started from the CLI tool. As I mentioned in the beginning of the "Steps to reproduce" there was error: "nats: error: context deadline exceeded" when trying to list objects with nats obj ls testbucket CLI command and in log was:

[68776] 2023/10/19 14:48:10.303340 [WRN] Internal subscription on
"$JS.API.CONSUMER.CREATE.OBJ_testbucket.32td3LZE.$O.testbucket.M.>" took too long: 6.8698428s

So almost 7s for the single file in the bucket.

@derekcollison
Copy link
Member

Would you be willing to create a stream snapshot of the ObjectStore and share with the Synadia team?

@Stan-RED
Copy link
Author

Would you be willing to create a stream snapshot of the ObjectStore and share with the Synadia team?

How to do this? And does it make sense to upload huge files somewhere if this behavior is reproducible? I have taken for example a Fedora Rawhide distro (previously it was tested in Windows 10) with the latest NATS release:

[stan@fedora ~]$ dnf info nats-server
...
Name         : nats-server
Version      : 2.10.3
Release      : 1.fc40
Architecture : x86_64
Size         : 14 M
Source       : nats-server-2.10.3-1.fc40.src.rpm

Then I took a CentOS distro (12G) for example and put it into the object store bucket in the same way:

$ curl http://centos.uni-sofia.bg/centos/8-stream/isos/x86_64/CentOS-Stream-8-x86_64-latest-dvd1.iso --output centos1.iso
$ nats obj add testbucket
$ nats obj put testbucket /home/stan/centos1.iso
$ nats obj ls testbucket
nats: error: context deadline exceeded

This is a log from the server with more details.

[1634489] 2023/10/23 11:55:44.083596 [INF]   Version:  2.10.3
[1634489] 2023/10/23 11:55:44.083600 [INF]   Git:      [not set]
[1634489] 2023/10/23 11:55:44.083603 [INF]   Name:     NAEMCRCEOGWRPQXGXOFJAW5PL6YBGEZNWOSHI4NPBPQW54DCJX352ISG
[1634489] 2023/10/23 11:55:44.083610 [INF]   Node:     u17m6wzJ
[1634489] 2023/10/23 11:55:44.083613 [INF]   ID:       NAEMCRCEOGWRPQXGXOFJAW5PL6YBGEZNWOSHI4NPBPQW54DCJX352ISG
[1634489] 2023/10/23 11:55:44.084051 [INF] Starting JetStream
[1634489] 2023/10/23 11:55:44.084264 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[1634489] 2023/10/23 11:55:44.084268 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[1634489] 2023/10/23 11:55:44.084270 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[1634489] 2023/10/23 11:55:44.084272 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[1634489] 2023/10/23 11:55:44.084274 [INF]
[1634489] 2023/10/23 11:55:44.084276 [INF]          https://docs.nats.io/jetstream
[1634489] 2023/10/23 11:55:44.084278 [INF]
[1634489] 2023/10/23 11:55:44.084280 [INF] ---------------- JETSTREAM ----------------
[1634489] 2023/10/23 11:55:44.084283 [INF]   Max Memory:      93.92 GB
[1634489] 2023/10/23 11:55:44.084285 [INF]   Max Storage:     46.96 GB
[1634489] 2023/10/23 11:55:44.084287 [INF]   Store Directory: "/tmp/nats/jetstream"
[1634489] 2023/10/23 11:55:44.084290 [INF] -------------------------------------------
[1634489] 2023/10/23 11:55:44.084591 [INF]   Starting restore for stream '$G > OBJ_tb1'
[1634489] 2023/10/23 11:55:44.084798 [INF]   Restored 0 messages for stream '$G > OBJ_tb1' in 0s
[1634489] 2023/10/23 11:55:44.085284 [INF] Listening for client connections on 0.0.0.0:4222
[1634489] 2023/10/23 11:55:44.086110 [INF] Server is ready
[1634489] 2023/10/23 11:56:40.871630 [WRN] Internal subscription on "$JS.API.CONSUMER.CREATE.OBJ_tb1.7VM0i7b5.$O.tb1.M.>" took too long: 5.170127768s
[1634489] 2023/10/23 11:56:40.871648 [WRN] Internal subscription on "$JS.API.CONSUMER.CREATE.OBJ_tb1.7VM0i7b5.$O.tb1.M.>" took too long: 5.170148617s
[1634489] 2023/10/23 11:56:40.871652 [WRN] Internal subscription on "$JS.API.CONSUMER.CREATE.OBJ_tb1.7VM0i7b5.$O.tb1.M.>" took too long: 5.170164157s
[1634489] 2023/10/23 11:56:40.871661 [WRN] 127.0.0.1:52042 - cid:9 - Readloop processing time: 5.170173817s

@derekcollison
Copy link
Member

nats stream backup

Just need to use full stream name, which you can see via nats stream ls -a

@derekcollison
Copy link
Member

If we can reproduce with your snapshot on our side, that would be very helpful. Thanks.

@derekcollison
Copy link
Member

You are correct, we can upload any large asset ourselves, will attempt to follow your steps and reproduce. Thanks for your patience and apologies for not reading more thoroughly.

@derekcollison
Copy link
Member

I can see the behavior here locally. Will keep you posted. And again thanks for the patience.

wallyqs added a commit that referenced this issue Oct 26, 2023
…mber for deliver last by subject (#4712)

Previously we would need to move through all msg blocks and load all of
the blocks into memory. We optimized this regardless to flush the cache
if we loaded it like other places in the code, but for this specifically
we load subject's totals which is already in memory and for each
matching subject we now simply load the last message for that subject
and add the sequence number to the skip list.

This drastically improves consumers that want last per subject, like KV
watchers and object store lists.

Signed-off-by: Derek Collison <[email protected]>

Resolves: #4680
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect Suspected defect such as a bug or regression
Projects
None yet
2 participants