feat: support neon, sse simd and dynamic dispatch #56
This PR provides two ways to support multi-arch dispatch: dispatch at compile time (static dispatch) and dispatch at runtime (dynamic dispatch). Dynamic dispatch is implemented with gcc/clang function multiversioning, which prevents these functions from being inlined at compile time, so performance will be worse.
The structure of the arch folder:
├── avx2
│ ├── base.h
│ ├── itoa.h
│ ├── quote.h
│ ├── simd.h
│ ├── skip.h
│ ├── str2int.h
│ └── unicode.h
├── common
│ ├── quote_common.h
│ ├── quote_tables.h
│ ├── skip_common.h
│ ├── unicode_common.h
│ └── x86_common
│ ├── itoa.h
│ ├── quote.inc.h
│ └── skip.inc.h
├── neon
│ ├── base.h
│ ├── itoa.h
│ ├── quote.h
│ ├── simd.h
│ ├── skip.h
│ ├── str2int.h
│ └── unicode.h
├── simd_base.h
├── simd_dispatch.h
├── simd_itoa.h
├── simd_quote.h
├── simd_skip.h
├── simd_str2int.h
├── sonic_cpu_feature.h
├── sse
│ ├── base.h
│ ├── itoa.h
│ ├── quote.h
│ ├── simd.h
│ ├── skip.h
│ ├── str2int.h
│ └── unicode.h
├── target_macro.h
└── x86_ifuncs
├── base.h
├── ifunc_macro.h
├── itoa.h
├── quote.h
├── skip.h
└── str2int.h
How to add a new function
To add a new simd function (foo in the snippets below):
namespace sonic_json {
namespace internal {
namespace avx2 {
void foo() { return; }
} // namespace avx2
} // namespace internal
} // namespace sonic_json
namespace sonic_json {
namespace internal {
__attribute__((target(HASWELL))) inline void foo() { return avx2::foo(); }
__attribute__((target(WESTMERE))) inline void foo() { return sse::foo(); }
__attribute__((target("default"))) inline void foo() { return sse::foo(); }
}
}
#pragma once
#include "simd_dispatch.h"
#include INCLUDE_ARCH_FILE(foo.h)
namespace sonic_json {
namespace internal {
SONIC_USING_ARCH_FUNC(foo);
}
}
How to add a new architecture
If there is a new architecture named Y86 and its simd support needs to be added to sonic, then:
#if defined(__Y86__)
#define SONIC_HAVE_Y86
#endif
#if defined(SONIC_STATIC_DISPATCH)
#if defined(SONIC_HAVE_Y86)
#define SONIC_USING_ARCH_FUNC(func) using Y86::func
#define INCLUDE_ARCH_FILE(file) SONIC_STRINGIFY(Y86/file)
#endif
#elif defined(SONIC_DYNAMIC_DISPATCH)
#if defined(SONIC_HAVE_Y86)
#define SONIC_USING_ARCH_FUNC(func)
#define INCLUDE_ARCH_FILE(file) SONIC_STRINGIFY(y86_ifuncs/file)
#endif
#endif
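The static-dispatch macros above can be illustrated with a small self-contained sketch. Everything prefixed `SKETCH_` and the `sketch` namespace are hypothetical and only mimic the shape of `SONIC_USING_ARCH_FUNC`. The architecture is picked from compiler-predefined macros here, which is an assumption and not necessarily how sonic detects features.

```cpp
#include <cassert>
#include <string>

// Assumption: choose one architecture at compile time from predefined macros.
#if defined(__AVX2__)
#define SKETCH_ARCH avx2
#elif defined(__ARM_NEON) || defined(__aarch64__)
#define SKETCH_ARCH neon
#else
#define SKETCH_ARCH sse  // label used as the default in this sketch only
#endif

// Mimics the shape of SONIC_USING_ARCH_FUNC: re-export one arch's function.
#define SKETCH_USING_ARCH_FUNC(func) using SKETCH_ARCH::func

namespace sketch {
namespace internal {

// Each architecture keeps its implementation in its own namespace.
namespace avx2 {
inline std::string foo() { return "avx2"; }
}  // namespace avx2
namespace sse {
inline std::string foo() { return "sse"; }
}  // namespace sse
namespace neon {
inline std::string foo() { return "neon"; }
}  // namespace neon

// Expands to `using <arch>::foo;`, so callers just write internal::foo().
SKETCH_USING_ARCH_FUNC(foo);

}  // namespace internal
}  // namespace sketch
```

Because the `using` declaration names a single ordinary inline function, a call to `sketch::internal::foo()` resolves at compile time and stays eligible for inlining, which is the property the description attributes to static dispatch.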
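sonic's multi-architecture design supports both selecting the instruction set at compile time and selecting suitable instructions at runtime according to the platform it runs on. Both are supported because deciding at runtime prevents functions/interfaces that use simd from being inlined at compile time, which causes some performance loss.
arch directory structure:
├── avx2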
│ ├── base.h
│ ├── itoa.h
│ ├── quote.h
│ ├── simd.h
│ ├── skip.h
│ ├── str2int.h
│ └── unicode.h
├── common
│ ├── quote_common.h
│ ├── quote_tables.h
│ ├── skip_common.h
│ ├── unicode_common.h
│ └── x86_common
│ ├── itoa.h
│ ├── quote.inc.h
│ └── skip.inc.h
├── neon
│ ├── base.h
│ ├── itoa.h
│ ├── quote.h
│ ├── simd.h
│ ├── skip.h
│ ├── str2int.h
│ └── unicode.h
├── simd_base.h
├── simd_dispatch.h
├── simd_itoa.h
├── simd_quote.h
├── simd_skip.h
├── simd_str2int.h
├── sonic_cpu_feature.h
├── sse
│ ├── base.h
│ ├── itoa.h
│ ├── quote.h
│ ├── simd.h
│ ├── skip.h
│ ├── str2int.h
│ └── unicode.h
├── target_macro.h
└── x86_ifuncs
├── base.h
├── ifunc_macro.h
├── itoa.h
├── quote.h
├── skip.h
└── str2int.h
avx2, sse, neon: simd implementation code for each specific architecture.
How to add a new function
namespace sonic_json {
namespace internal {
namespace avx2 {
void foo() { return; }
} // namespace avx2
} // namespace internal
} // namespace sonic_json
namespace sonic_json {
namespace internal {
__attribute__((target(HASWELL))) inline void foo() { return avx2::foo(); }
__attribute__((target(WESTMERE))) inline void foo() { return sse::foo(); }
__attribute__((target("default"))) inline void foo() { return sse::foo(); }
}
}
#pragma once
#include "simd_dispatch.h"
#include INCLUDE_ARCH_FILE(foo.h)
namespace sonic_json {
namespace internal {
SONIC_USING_ARCH_FUNC(foo);
}
}
How to add a new architecture
Suppose there is a new architecture called Y86 and its simd support needs to be added to sonic; then:
#if defined(__Y86__)
#define SONIC_HAVE_Y86
#endif
#if defined(SONIC_STATIC_DISPATCH)
#if defined(SONIC_HAVE_Y86)
#define SONIC_USING_ARCH_FUNC(func) using Y86::func
#define INCLUDE_ARCH_FILE(file) SONIC_STRINGIFY(Y86/file)
#endif
#elif defined(SONIC_DYNAMIC_DISPATCH)
#if defined(SONIC_HAVE_Y86)
#define SONIC_USING_ARCH_FUNC(func)
#define INCLUDE_ARCH_FILE(file) SONIC_STRINGIFY(y86_ifuncs/file)
#endif
#endif
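To make the inlining cost of runtime dispatch concrete, here is a minimal sketch that resolves an implementation once and then calls it through a function pointer. This is a manual stand-in, not the gcc/clang function multiversioning this PR uses. All names are hypothetical, and the "wide" variant is plain C++ standing in for a real SIMD routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Portable baseline: counts spaces one byte at a time.
inline size_t count_spaces_scalar(const uint8_t* p, size_t n) {
  size_t c = 0;
  for (size_t i = 0; i < n; ++i) c += (p[i] == ' ');
  return c;
}

// Stand-in for an arch-specific variant (no intrinsics, just unrolled by 4).
inline size_t count_spaces_wide(const uint8_t* p, size_t n) {
  size_t c = 0, i = 0;
  for (; i + 4 <= n; i += 4)
    c += (p[i] == ' ') + (p[i + 1] == ' ') + (p[i + 2] == ' ') +
         (p[i + 3] == ' ');
  for (; i < n; ++i) c += (p[i] == ' ');
  return c;
}

using CountFn = size_t (*)(const uint8_t*, size_t);

// Picks an implementation from the CPU the program is actually running on.
inline CountFn resolve_count_spaces() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) return count_spaces_wide;
#endif
  return count_spaces_scalar;
}

// Resolved once; later calls go through a pointer, which the compiler
// generally cannot inline into the caller.
inline size_t count_spaces(const uint8_t* p, size_t n) {
  static const CountFn fn = resolve_count_spaces();
  return fn(p, n);
}

}  // namespace sketch
```

The indirect call in `count_spaces` is the analogue of the non-inlinable multiversioned functions mentioned in the description: the target is only known at runtime, so the call site cannot be folded into its caller the way a statically dispatched function can.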
Codecov Report
@@ Coverage Diff @@
## master #56 +/- ##
==========================================
+ Coverage 95.04% 95.88% +0.84%
==========================================
Files 22 21 -1
Lines 2785 2431 -354
==========================================
- Hits 2647 2331 -316
+ Misses 138 100 -38
... and 3 files with indirect coverage changes
Performance
using common::EqBytes4;
using common::SkipLiteral;
using sse::GetNextToken; // !!!Not efficency
What is the reason for this comment?
GetNextToken is a template function, so the multiversion function mechanism cannot be used to let the compiler select a version automatically. The sse version is chosen here on all architectures.
@@ -479,7 +515,15 @@ struct simd256<int8_t> : num256<int8_t> {
template <>
struct simd256<uint8_t> : num256<uint8_t> {
using Base = num256<uint8_t>;
using Base::Base;
// using Base::Base;
Why can't this be reused here? In theory, under O3 optimization, it should all be inlined.
It has nothing to do with inlining. No constructor is provided here, so the constructor is the default one and is not inlined. As a result the constructor gets used under an architecture other than the specified one, which triggers a compiler error.
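It would be best to post the relative performance test data for the current branch versus the master branch, in static mode and in dynamic mode respectively; that should make things clearer.
Updated.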
9c10f87 to 227105a
sum = sum * 10 + (c[i] - '0');
i++;
}
man_nd = i;
Why is there no neon simd version implemented here?
For now I have not figured out how to implement this function with neon.
__attribute__((target("default"))) inline uint8_t skip_space_safe(
const uint8_t*, size_t&, size_t, size_t&, uint64_t&) {
return 0;
Could returning 0 here cause a problem? This is effectively the fallback logic, and it will be executed on anything other than westmere and haswell.
Yes. As long as no fallback is implemented, it would be better to use a static assert here to block compilation.
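As a third option besides returning 0 or blocking compilation, the default target could run a scalar routine. The following is only a sketch of what such a fallback might look like: the name `skip_space_scalar` and its simplified three-argument signature are hypothetical and do not match sonic's `skip_space_safe`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar fallback: advances `pos` past JSON whitespace and
// returns the first non-whitespace byte, or 0 if the input is exhausted.
inline uint8_t skip_space_scalar(const uint8_t* data, size_t& pos, size_t len) {
  while (pos < len) {
    uint8_t c = data[pos++];
    if (c != ' ' && c != '\t' && c != '\n' && c != '\r') return c;
  }
  return 0;  // only whitespace remained
}
```

In this sketch a return value of 0 means "no token found" instead of being an unconditional stub, so callers on CPUs without a SIMD variant would still get correct results, at scalar speed.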
}) +\
select({
"static_dispatch": static_dispatch_copts,
"dynamic_dispatch": dynamic_dispatch_copts,
Does dynamic mode need the mavx2 compile option? In theory it should no longer be needed, right?
Nothing here requires the mavx2 compile option to be used.
Main changes