Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf] Cache the absolute map square position of the first submap #77668

Merged
merged 1 commit into from
Nov 9, 2024

Conversation

CLIDragon
Copy link
Contributor

@CLIDragon CLIDragon commented Nov 8, 2024

Summary

Performance "the absolute map square position of the first submap"

Purpose of change

bub_from_abs previously took ~3% of CPU time (when waiting in the refugee centre). It now takes ~0.2%.

Describe the solution

Cache the result of project_to and manually inline the tripoint subtraction. Manually inlining is surprisingly important. Compare the following assembly for un-inlined and inlined variants:

Non-Inlined

	sub	rsp, 24
; File point.h
; 145  :         return tripoint( x - rhs.x, y - rhs.y, z - rhs.z );
	mov	eax, DWORD PTR [r8]
	sub	eax, DWORD PTR [rcx+64]
; 136  :     constexpr tripoint( int X, int Y, int Z ) : x( X ), y( Y ), z( Z ) {}
	mov	DWORD PTR $T1[rsp], eax
; 145  :         return tripoint( x - rhs.x, y - rhs.y, z - rhs.z );
	mov	eax, DWORD PTR [r8+4]
	sub	eax, DWORD PTR [rcx+68]
; 136  :     constexpr tripoint( int X, int Y, int Z ) : x( X ), y( Y ), z( Z ) {}
	mov	DWORD PTR $T1[rsp+4], eax
; 145  :         return tripoint( x - rhs.x, y - rhs.y, z - rhs.z );
	mov	eax, DWORD PTR [r8+8]
	sub	eax, DWORD PTR [rcx+72]
; File coordinates.h
; 114  :         explicit constexpr coord_point_base( Ts &&... ts ) : raw_( std::forward<Ts>( ts )... ) {}
	movsd	xmm0, QWORD PTR $T1[rsp]
	movsd	QWORD PTR [rdx], xmm0
	mov	DWORD PTR [rdx+8], eax
; File map.cpp
; 10053:     return tripoint_bub_ms{ (p - abs_ms).raw() };
	mov	rax, rdx
; 10054: }
	add	rsp, 24
	ret	0

Inlined

; 10053:     return tripoint_bub_ms { p.x() - abs_ms.x(), p.y() - abs_ms.y(), p.z()};
	mov	eax, DWORD PTR [r8]
	mov	r9d, DWORD PTR [r8+4]
	sub	eax, DWORD PTR [rcx+64]
; File coordinates.h
; 114  :         explicit constexpr coord_point_base( Ts &&... ts ) : raw_( std::forward<Ts>( ts )... ) {}
	mov	r10d, DWORD PTR [r8+8]
; File map.cpp
; 10053:     return tripoint_bub_ms { p.x() - abs_ms.x(), p.y() - abs_ms.y(), p.z()};
	sub	r9d, DWORD PTR [rcx+68]
; File point.h
; 136  :     constexpr tripoint( int X, int Y, int Z ) : x( X ), y( Y ), z( Z ) {}
	mov	DWORD PTR [rdx], eax
; File map.cpp
; 10053:     return tripoint_bub_ms { p.x() - abs_ms.x(), p.y() - abs_ms.y(), p.z()};
	mov	rax, rdx
; File point.h
; 136  :     constexpr tripoint( int X, int Y, int Z ) : x( X ), y( Y ), z( Z ) {}
	mov	DWORD PTR [rdx+4], r9d
	mov	DWORD PTR [rdx+8], r10d
; File map.cpp
; 10054: }
	ret	0

Describe alternatives you've considered

None.

Testing

Nothing's visibly broken. This is a refactor before anything else so there shouldn't be any issues. abs_sub is only set through set_abs_sub so abs_ms and abs_sub should always be in sync.

Additional context

It should be possible to inline all tripoint operations automatically, perhaps using something like reinterpret_cast for construction.

@github-actions github-actions bot added [C++] Changes (can be) made in C++. Previously named `Code` Code: Performance Performance boosting code (CPU, memory, etc.) json-styled JSON lint passed, label assigned by github actions labels Nov 8, 2024
@CLIDragon CLIDragon force-pushed the map-get-abs-perf branch 2 times, most recently from fee4f96 to 5790e73 Compare November 8, 2024 04:33
@github-actions github-actions bot added the astyled astyled PR, label is assigned by github actions label Nov 8, 2024
@RenechCDDA
Copy link
Member

RenechCDDA commented Nov 8, 2024

If you're caching it and then only using the x and y coordinates you might as well store it as a point_abs_ms instead of a tripoint_abs_ms. This also emulates the old behavior exactly, just in case there should ever be a future case where abs_sub would ever be set to a non-0 z value.

In the case of a character waiting in the refugee centre, calls are in
the order of 10,000,000 each in-game hour.
@github-actions github-actions bot added the BasicBuildPassed This PR builds correctly, label assigned by github actions label Nov 8, 2024
@akrieger
Copy link
Member

akrieger commented Nov 9, 2024

Yeah ok I see what the asm is doing. For whatever reason the previous version was spilling intermediates to the stack, but the inlined version is stackless. I can't really explain it except maybe the optimizer thought using the sse mov to copy x+y from the stack was cheaper than two movs from individual registers (I have seen it aggressively optimize to shoving things into the wide registers in the past).|

I'm a little confused though because I'd love to see the asm before caching the call to project_to. While academically interesting I cannot imagine that the extra instructions and stack usage is measurably slower.

@akrieger akrieger merged commit b8793ea into CleverRaven:master Nov 9, 2024
20 of 26 checks passed
@CLIDragon
Copy link
Contributor Author

There was actually a measurable change between the inline and non-inline variants. However, most of the work done was in the project_to call, as you can see from the size of the asm.

sub     rsp, 40                             ; 00000028H
mov     eax, DWORD PTR [rcx+8]
mov     r10, rdx
mov     r9d, DWORD PTR [rcx+12]
mov     edx, DWORD PTR [r8]
lea     ecx, DWORD PTR [rax+rax*2]
shl     ecx, 2
lea     eax, DWORD PTR [r9+r9*2]
shl     eax, 2
sub     edx, ecx
mov     DWORD PTR $T5[rsp+4], eax
mov     eax, DWORD PTR [r8+4]
mov     DWORD PTR $T5[rsp], ecx
mov     rcx, QWORD PTR $T5[rsp]
shr     rcx, 32                             ; 00000020H
sub     eax, ecx
mov     DWORD PTR $T3[rsp], edx
mov     ecx, DWORD PTR [r8+8]
mov     DWORD PTR $T3[rsp+4], eax
mov     rax, QWORD PTR $T3[rsp]
shr     rax, 32                             ; 00000020H
mov     DWORD PTR $T4[rsp+4], eax
mov     rax, r10
mov     DWORD PTR $T4[rsp], edx
movsd   xmm0, QWORD PTR $T4[rsp]
movsd   QWORD PTR [r10], xmm0
mov     DWORD PTR [r10+8], ecx
add     rsp, 40                             ; 00000028H
ret     0

@CLIDragon CLIDragon deleted the map-get-abs-perf branch November 10, 2024 01:37
@xcdxbaxa9 xcdxbaxa9 mentioned this pull request Nov 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
astyled astyled PR, label is assigned by github actions BasicBuildPassed This PR builds correctly, label assigned by github actions [C++] Changes (can be) made in C++. Previously named `Code` Code: Performance Performance boosting code (CPU, memory, etc.) json-styled JSON lint passed, label assigned by github actions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants