Why should we use arena? #4327

little-bird-in-china · 2018-02-22T03:19:26Z

As far as I know, Arena is not a memory pool that reuses allocated memory by maintaining a free list; it just caches more and more memory as messages are created on the arena. Isn't Google's tcmalloc a better and more straightforward way to improve overall performance? I just want to take advantage of protobuf for network transmission.

xfxyjwf · 2018-02-22T23:57:45Z

Right now using arena with opensource protobuf doesn't gain you much, but inside Google, we have seen massive improvement by adopting arena. I think protobuf arena has two advantages that tcmalloc can't offer:

1. The ability to deallocate an entire proto message tree in big chunks. With arena, it's possible to allocate everything in a proto message tree in one or several bulk chunks of memory. When you are done with the message, you just need to deallocate these few large chunks. Without arena, deleting a proto message tree results in numerous small delete calls, one for every small object a proto may hold. Basically, with arena we can skip all the destructor calls, which we can't do otherwise.
2. Better locality. With protobuf arena, objects belonging to the same proto message tree are placed in adjacent memory, whereas tcmalloc doesn't know whether an object is part of a proto message tree and is likely to interleave protos with non-protos.

I think most of the benefit we saw comes from (1). Unfortunately this isn't the case with opensource protobuf, because string fields are not allocated in the arena. Internally we have a hack to allocate something that looks like a string in the arena and cast it to a string when accessed, but that isn't portable. We also don't have ctype=STRING_PIECE support in opensource, which could help with the issue. I know some users run arena with their own patch implementing ctype=STRING_PIECE. I don't think arena can be widely used until we address the string issue.
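To make advantage (1) concrete, here is a minimal sketch of a bump-pointer arena in standard C++. It is not protobuf's implementation: `ToyArena`, `Node`, `build_chain`, and `sum_chain` are hypothetical names used only for illustration. It shows many small objects being carved out of a few large blocks, with teardown costing one `free()` per block and no per-object destructor calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical toy arena, NOT protobuf's Arena: it only illustrates the idea.
class ToyArena {
 public:
  explicit ToyArena(std::size_t block_size) : block_size_(block_size) {}
  ToyArena(const ToyArena&) = delete;
  ToyArena& operator=(const ToyArena&) = delete;

  // Teardown is one free() per block; no per-object destructor runs.
  ~ToyArena() {
    for (void* block : blocks_) std::free(block);
  }

  // Bump-allocates a T from the current block, starting a new block if full.
  template <typename T, typename... Args>
  T* Create(Args&&... args) {
    static_assert(std::is_trivially_destructible<T>::value,
                  "destructors are skipped, so T must not need one");
    return new (Allocate(sizeof(T), alignof(T))) T{std::forward<Args>(args)...};
  }

  std::size_t block_count() const { return blocks_.size(); }

 private:
  void* Allocate(std::size_t size, std::size_t align) {
    assert(size <= block_size_ && align <= alignof(std::max_align_t));
    std::size_t offset = (used_ + align - 1) / align * align;  // round up
    if (blocks_.empty() || offset + size > block_size_) {
      void* block = std::malloc(block_size_);
      if (block == nullptr) throw std::bad_alloc();
      blocks_.push_back(block);
      offset = 0;
    }
    used_ = offset + size;
    return static_cast<char*>(blocks_.back()) + offset;
  }

  std::size_t block_size_;
  std::size_t used_ = 0;
  std::vector<void*> blocks_;
};

// A small "message tree" stand-in: a singly linked chain of nodes.
struct Node {
  int value;
  Node* next;
};

// Builds n nodes holding 0..n-1, all owned by the arena.
Node* build_chain(ToyArena& arena, int n) {
  Node* head = nullptr;
  for (int i = 0; i < n; ++i) head = arena.Create<Node>(i, head);
  return head;
}

long sum_chain(const Node* head) {
  long total = 0;
  for (const Node* n = head; n != nullptr; n = n->next) total += n->value;
  return total;
}
```

With a 1024-byte block size, building a 1000-node chain takes a few dozen `malloc` calls at most instead of 1000, and destroying the `ToyArena` releases the whole chain without visiting any node.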

little-bird-in-china · 2018-02-23T09:05:25Z

@xfxyjwf Thanks for your quick reply. I have two more questions about my scenario, in which a server holds tens of thousands of TCP connections kept alive with heartbeat packets:

1. Should I set a threshold for each connection to control overall memory usage, and when the threshold is reached, free all messages by resetting the corresponding arena? Or should I use the arena in a different way?
2. Since I only need to hold a few messages in memory for each thread, can I detach some out-of-date messages and reuse the memory they hold, so that I don't need to request memory from the OS again?

xfxyjwf · 2018-02-23T19:43:40Z

The common patterns:

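1. With arena: one arena per message. Something like:

{
  google::protobuf::Arena arena;  // proto2::Arena in Google's internal tree
  // foo is owned by the arena: do not delete it or hold it in a unique_ptr.
  Foo* foo = google::protobuf::Arena::CreateMessage<Foo>(&arena);
  foo->ParseFromString(data);
  ... use foo ...
  // arena is destructed here, releasing foo and everything it owns
}

2. Without arena: reuse proto messages with a free list.

Foo* foo = free_list_->Pop();
foo->ParseFromString(data);  // parsing clears the previous contents first
... use foo ...
free_list_->Push(foo);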

(1) works well if the message structure is complex. You can also fine-tune memory allocation using ArenaOptions. For example, you can provide an initial block, so that if the message fits into this block no memory allocation/deallocation happens at all. However, as I mentioned, string fields won't be allocated on the arena, so it doesn't help if you have lots of string fields.
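The initial-block idea can be sketched with the C++17 standard library's `std::pmr::monotonic_buffer_resource`, which is an analogue of an arena and not protobuf itself (in protobuf the corresponding knobs are the `initial_block` and `initial_block_size` fields of `ArenaOptions`). The helper name `fits_in_initial_block` and the 1024-byte block size are assumptions made for this example. The upstream resource is deliberately set to `null_memory_resource()`, so any attempt to grow past the caller-provided block fails loudly instead of silently falling back to the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Hypothetical helper: returns true if `count` ints fit entirely inside a
// caller-provided 1024-byte block, i.e. no heap allocation was needed.
bool fits_in_initial_block(std::size_t count) {
  alignas(std::max_align_t) std::byte block[1024];
  // With null_memory_resource() as upstream, outgrowing the initial block
  // throws std::bad_alloc rather than allocating from the heap.
  std::pmr::monotonic_buffer_resource arena(block, sizeof(block),
                                            std::pmr::null_memory_resource());
  try {
    std::pmr::vector<int> values(&arena);
    values.reserve(count);  // single allocation request of count * sizeof(int)
    for (std::size_t i = 0; i < count; ++i) {
      values.push_back(static_cast<int>(i));
    }
    return true;
  } catch (const std::bad_alloc&) {
    return false;
  }
}
```

Here `fits_in_initial_block(100)` succeeds without touching the heap, while `fits_in_initial_block(1000)` needs more than the block can hold and reports failure. A real arena would normally fall back to allocating further blocks instead of failing.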

(2) was the most common pattern before we had arena support, and that's probably still true today. Protobuf objects have the property that proto.Clear() doesn't deallocate any memory but instead caches it for reuse. So if you reuse the same proto object, memory allocation is kept to a minimum. Compared to arena, proto.Clear() still has a cost because it needs to traverse the entire message tree, but it's much better than deleting the proto object and is therefore used very widely. This is likely the best pattern for your use case as well. You can use either a global free list or a per-thread free list. In its simplest form you can just reuse a single proto object again and again.

There is one catch: because proto.Clear() doesn't deallocate memory, the memory usage of the reused proto keeps increasing. The reused proto basically holds enough memory to accommodate every message ever parsed into it. For example, if one message uses repeated field "a" and another message uses repeated field "b", the reused proto will keep the memory for both. The more complex your message structure is, the faster the memory usage grows. For this reason, a free-list implementation usually deletes an object after a certain number of uses, so that a newly allocated object starts accumulating memory afresh.
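A free list with retirement after a fixed number of uses might look like the sketch below. It is self-contained and does not use protobuf: `ToyMessage` is a hypothetical stand-in whose `Clear()` keeps vector capacity, mimicking the proto.Clear() behavior described above, and `FreeList` with its `max_uses` parameter is an assumed design, not an API from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generated message. Clear() empties the fields
// but keeps their capacity, so memory accumulates across reuses.
struct ToyMessage {
  std::vector<int> a;
  std::vector<int> b;
  int uses = 0;

  void Clear() {
    a.clear();  // std::vector::clear() leaves capacity unchanged
    b.clear();
  }
  std::size_t Capacity() const { return a.capacity() + b.capacity(); }
};

// Hypothetical free list that retires a message after `max_uses` uses so
// its accumulated memory is eventually returned to the allocator.
class FreeList {
 public:
  explicit FreeList(int max_uses) : max_uses_(max_uses) {}

  // Hands out a pooled message if one exists, otherwise a fresh one.
  std::unique_ptr<ToyMessage> Pop() {
    if (pool_.empty()) return std::make_unique<ToyMessage>();
    std::unique_ptr<ToyMessage> msg = std::move(pool_.back());
    pool_.pop_back();
    return msg;
  }

  // Returns a message to the pool, or drops it once it has been used enough.
  void Push(std::unique_ptr<ToyMessage> msg) {
    if (++msg->uses >= max_uses_) return;  // retire: unique_ptr deletes it
    msg->Clear();
    pool_.push_back(std::move(msg));
  }

  std::size_t size() const { return pool_.size(); }

 private:
  int max_uses_;
  std::vector<std::unique_ptr<ToyMessage>> pool_;
};
```

A per-thread variant would simply give each thread its own `FreeList`; a global one would need a mutex around `Pop` and `Push`. The right value for `max_uses` depends on how varied the messages are, so treat it as a tuning parameter.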

little-bird-in-china · 2018-02-24T12:10:48Z

I think I got it.

ryanolson · 2018-08-12T01:57:10Z

@xfxyjwf

You mention strings not working great in arenas, but what about bytes? Bytes are pseudo-strings, but since they don’t need to be marshaled into some object, my assumption would be that arenas would be excellent for receiving bytes.

Especially if you wanted to receive these bytes directly into some special block of pinned memory, e.g. cudaMallocHost memory, using ArenaOptions.

Do arenas make sense for FlatBuffers? It seems like this might be the mechanism to do zero copy directly in and out of the memory blocks you reserve for messages.

xfxyjwf · 2018-08-12T05:19:37Z

@ryanolson In protobuf C++ API, string fields and bytes fields are both stored as std::string so the same issue applies: neither of them will be stored efficiently in protobuf arena. That can be solved by open-sourcing the zero copy support (see #1896), which includes StringPiece (basically std::string_view) support and that will allow a string or bytes field to alias memory in the arena directly.
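The difference between a std::string living in an arena and a view aliasing arena memory can be demonstrated with plain standard C++. This is only an illustration of the general behavior, not protobuf code: the helper names `inside`, `string_contents_in_block`, and `view_contents_in_block` are hypothetical, and a local buffer stands in for an arena block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>
#include <string>
#include <string_view>

// True if p points inside [buf, buf + size).
bool inside(const void* p, const void* buf, std::size_t size) {
  auto addr = reinterpret_cast<std::uintptr_t>(p);
  auto base = reinterpret_cast<std::uintptr_t>(buf);
  return addr >= base && addr < base + size;
}

// Places a std::string object in the block. The object itself lives there,
// but 100 characters exceed any small-string buffer, so the characters are
// allocated on the heap, outside the block.
bool string_contents_in_block() {
  alignas(std::string) unsigned char block[256];
  std::string* s = new (block) std::string(100, 'x');
  bool result = inside(s->data(), block, sizeof(block));
  std::destroy_at(s);  // the destructor must still run to free the heap part
  return result;
}

// Copies the bytes into the block and refers to them with a string_view:
// the characters live in the block and nothing needs destroying.
bool view_contents_in_block() {
  char block[256];
  const char payload[] = "hello arena";
  std::memcpy(block, payload, sizeof(payload) - 1);
  std::string_view view(block, sizeof(payload) - 1);
  return inside(view.data(), block, sizeof(block)) && view == "hello arena";
}
```

The first helper reports that the string's characters are not in the block, which is why arena-allocated messages with std::string fields still pay for heap allocation and destructor calls. The second shows the aliasing behavior that a view-based field type would allow.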

0x007004 · 2018-10-15T09:59:42Z

@xfxyjwf Hi, I have a question about arena.
protobuf 3.6.1 now supports creating strings in the arena. Given your earlier advice that arena won't be widely usable until the string issue is addressed, can I now use this version to improve performance?

Sorry, my English is bad. Thank you.
Looking forward to your reply.

acozzette · 2018-10-15T17:00:58Z

@ly82882592 No, unfortunately we still do not have a solution for this. We will probably need to introduce a string ctype based on std::string_view to be able to store string data directly on the arena.

0x007004 · 2018-10-16T02:44:10Z

@acozzette Oh, thank you.

liuzhijiang · 2022-01-19T03:06:45Z

Is an arena-allocated string class going to be included in official protobuf releases any time soon?

aagor · 2024-11-22T13:54:20Z

@acozzette As protobuf supports features.(pb.cpp).string_type = VIEW now, are there any plans to support allocating string contents on the arena (and not only std::string on arena, contents on heap)?

With this, using protobuf without dynamic memory allocation should be possible.

acozzette · 2024-11-22T17:06:26Z

Yes, we do plan to have VIEW-type strings support arena allocation. We don't have a specific timeline for it, though.

xfxyjwf added question c++ labels Feb 22, 2018

little-bird-in-china closed this as completed Feb 24, 2018

wangyoucao577 mentioned this issue Jan 6, 2020

Refine Traffic Proxy GRPC Protocol Telenav/osrm-backend#127

Closed

osrf-migration mentioned this issue Apr 15, 2020

Consider flatbuffers gazebosim/gz-transport#23

Closed

take-cheeze mentioned this issue Apr 2, 2021

Enable protobuf C++ arena onnx/onnx#3359

Closed
