Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing performance issues and out-of-order packets #916

Merged
merged 11 commits into from
May 19, 2021
Merged

Fixing performance issues and out-of-order packets #916

merged 11 commits into from
May 19, 2021

Conversation

urbanishimwe
Copy link
Collaborator

@urbanishimwe urbanishimwe commented Apr 11, 2021

Reducing CPU context switching and number of goroutines.

Packet capture and packet processing now use only two goroutines which helps to minimize CPU context switches. Spawning too many goroutines is harmful here.

Optimized packet capture - allocated memory only when required, and only for data which is used

Using ZeroCopy methods from libpcap library to avoid unnecessary allocations. Now memory gets allocated ONLY for the valid packets, and only for the packets which have the data. E.g. no SYN/FIN packets are used now. Additionally we now use sync.Pool for re-using packet objects, which helps to re-use already allocated memory.

Simplification and optimization of request/response detection

There is no SYN/FIN packets anymore etc. Now only packet payload is used to detect start and end of the packet. More over payload detection now does not require generating a total “message” buffer, and works with individual packet payloads.

Message payloads now concatenated from packets only in the end when message is dispatched. Also, before checking if message is complete, added additional check if all received packets in the valid order, e.g. if their SEQ is valid, and no packets are missing.

Reworked chunked encoding validation, and now it does not need expensive operation of re-calculating all the chunks. Now it “trust” that client gives valid chunk body, check if packets are in the right order (e.g. SEQ match), and checks if message ends with the right suffix. All is done with 0 allocations.

Parsing all Headers using proto.GetHeaders was proved to be very slow. Now we only parse the headers we need(and do it only once).

Packets gets matched together using ACK, which on high RPS removed chances of duplicating IDs. Additionally, even if packets are received out of order, now it will properly sort them, before dispatching the message.

Changes in ID generation algorithm

Message ID generation and relations between request and response IDs is fully rewritten. Responses now do not have to lookup for request data in order to get the same ID. ID no rely on the fact that SEQ of the first packet of the response should be the same as ACK of the request. If previously Message ID contained random values, like current timestamp, now it has a consistent algorithm which is based on TCP stream id (SrcPort + DstPort + SrcIP/DstIP) and current ACK/SEQ number (to distinguish multiple messages within the same stream).

BPF filter optimizations

When tracking response it now uses a more accurate BPF rule to filter only needed traffic.

Misc

The packet code is now fully moved to tcp/Packet, so packet processing done only once in one place.

TCP output now has a 5 second timeout, and has a proper Close method.

Fully switching to go modules and removing vendoring.

tcp/tcp_message.go Outdated Show resolved Hide resolved
tcp/tcp_message.go Outdated Show resolved Hide resolved
@buger
Copy link
Owner

buger commented Apr 12, 2021

Since there are a lot of mutex usage now, maybe think about using https://github.com/cornelk/hashmap instead.

See the difference:

BenchmarkReadHashMapWithWritesUint-8      	  200000	      8395 ns/op
BenchmarkReadGoMapWithWritesUintMutex-8   	   10000	    143793 ns/op

@buger
Copy link
Owner

buger commented Apr 12, 2021

The main thing to ensure here is that packet "capture" happens in one thread, and packet "processing" in another.
And actually if we ensure that all packet processing is inside "single" goroutine, then we do not need to put mutexes around maps. And "slower" performance processing goroutine will not affect that much "capture" goroutine.

@urbanishimwe urbanishimwe requested a review from buger April 12, 2021 09:52
@urbanishimwe urbanishimwe force-pushed the tcp-timer branch 2 times, most recently from 00a86fd to 6ea7244 Compare April 25, 2021 19:04
}
}

func (parser *MessageParser) Close() error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method MessageParser.Close should have comment or be unexported

tcp/tcp_message.go Show resolved Hide resolved
tcp/tcp_message.go Show resolved Hide resolved
Copy link
Collaborator Author

@urbanishimwe urbanishimwe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this naming makes more sense, thanks!

proto/proto.go Outdated Show resolved Hide resolved
proto/proto.go Outdated Show resolved Hide resolved
proto/proto.go Outdated Show resolved Hide resolved
proto/proto.go Outdated Show resolved Hide resolved
buger added 2 commits May 2, 2021 21:17
Too smart, and can cause to unexpected errors.
// Message is an interface used to provide feedback or store dummy data for future use
type Message interface {
// Message is an interface used to provide protocol state or store dummy data for future use
type ProtocolStateSetter interface {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this interface has both setter(SetProtocolState) and getter(ProtocolState), it may be called ProtocolState

buger added 2 commits May 3, 2021 14:34
proto/proto.go Outdated
@@ -19,6 +19,7 @@ package proto
import (
"bufio"
"bytes"
_ "fmt"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a blank import should be only in a main or test package, or have a comment justifying it

@buger buger changed the title global message timer and improved proto.HasFullPayload Fixing performance issues and out-of-order packets May 3, 2021
@@ -129,3 +143,7 @@ func (o *TCPOutput) connect(address string) (conn net.Conn, err error) {
func (o *TCPOutput) String() string {
return fmt.Sprintf("TCP output %s, limit: %d", o.address, o.limit)
}

func (o *TCPOutput) Close() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method TCPOutput.Close should have comment or be unexported

state.headerStart = MIMEHeadersStartPos(data)
if state.headerStart < 0 {
return false
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if block ends with a return statement, so drop this else and outdent its block

@@ -129,3 +143,7 @@ func (o *TCPOutput) connect(address string) (conn net.Conn, err error) {
func (o *TCPOutput) String() string {
return fmt.Sprintf("TCP output %s, limit: %d", o.address, o.limit)
}

func (o *TCPOutput) Close() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method TCPOutput.Close should have comment or be unexported

state.headerStart = MIMEHeadersStartPos(data)
if state.headerStart < 0 {
return false
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if block ends with a return statement, so drop this else and outdent its block

parser.Emit(m)
}

func (parser *MessageParser) Emit(m *Message) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method MessageParser.Emit should have comment or be unexported

@@ -117,171 +157,176 @@ func (m *Message) Sort() {
sort.SliceStable(m.packets, func(i, j int) bool { return m.packets[i].Seq < m.packets[j].Seq })
}

// Handler message handler
type Handler func(*Message)
func (m *Message) Finalize() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Message.Finalize should have comment or be unexported

return false
}

func (m *Message) PacketData() [][]byte {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Message.PacketData should have comment or be unexported

}

// Packets returns packets of the message
func (m *Message) Packets() []*Packet {
return m.packets
}

func (m *Message) MissingChunk() bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Message.MissingChunk should have comment or be unexported

return
}

func (pckt *Packet) MessageID() uint64 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Packet.MessageID should have comment or be unexported

@buger buger merged commit e74e945 into master May 19, 2021
@buger buger deleted the tcp-timer branch May 19, 2021 17:11
xingren23 referenced this pull request in xingren23/goreplay Sep 4, 2021
### Reducing CPU context switching and number of goroutines. 
Packet capture and packet processing now use only two goroutines which helps to minimize CPU context switches. Spawning too many goroutines is harmful here. 

### Optimized packet capture - allocated memory only when required, and only for data which is used
Using ZeroCopy methods from libpcap library to avoid unnecessary allocations. Now memory gets allocated ONLY for the valid packets, and only for the packets which have the data. E.g. no SYN/FIN packets are used now. Additionally we now use `sync.Pool` for re-using packet objects, which helps to re-use already allocated memory. 

### Simplification and optimization of request/response detection
There is no SYN/FIN packets anymore etc. Now only packet payload is used to detect start and end of the packet. More over payload detection now does not require generating a total “message” buffer, and works with individual packet payloads. 

Message payloads now concatenated from packets only in the end when message is dispatched. Also, before checking if message is complete, added additional check if all received packets in the valid order, e.g. if their SEQ is valid, and no packets are missing. 

Reworked chunked encoding validation, and now it does not need expensive operation of re-calculating all the chunks. Now it “trust” that client gives valid chunk body, check if packets are in the right order (e.g. SEQ match), and checks if message ends with the right suffix. All is done with 0 allocations. 

Parsing all Headers using `proto.GetHeaders` was proved to be very slow. Now we only parse the headers we need(and do it only once).

Packets gets matched together using ACK, which on high RPS removed chances of duplicating IDs. Additionally, even if packets are received out of order, now it will properly sort them, before dispatching the message.

### Changes in ID generation algorithm
Message ID generation and relations between request and response IDs is fully rewritten. Responses now do not have to lookup for request data in order to get the same ID. ID no rely on the fact that SEQ of the first packet of the response should be the same as ACK of the request. If previously Message ID contained random values, like current timestamp, now it has a consistent algorithm which is based on TCP stream id (SrcPort + DstPort + SrcIP/DstIP) and current ACK/SEQ number (to distinguish multiple messages within the same stream).

### BPF filter optimizations
When tracking response it now uses a more accurate BPF rule to filter only needed traffic. 

### Misc
The packet code is now fully moved to tcp/Packet, so packet processing done only once in one place.

TCP output now has a 5 second timeout, and has a proper Close method.

Fully switching to go modules and removing vendoring.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants