problem with build #24
Comments
It looks like your compiler is missing _mm_testz_si128() and has a strange definition of _mm_blend_epi16(). |
Hi, |
So, I have tested on a clean Ubuntu Server install (14.04.4).
Are you doing something different? |
I copied and pasted your steps (the only difference in my case was that I downloaded the source from dpdk.org rather than from git), but the problem remains the same :( : You are in 'detached HEAD' state. You can look around, make experimental If you want to create a new branch to retain commits you create, you may git checkout -b new_branch_name HEAD is now at a38e5ec... version: 2.2.0 Can you show the output from your fresh Ubuntu install? Mine is: ii gcc 4:4.8.2-1ubuntu6 amd64 GNU C compiler |
Same as yours:
Could you try forcing the target machine in the DPDK .config? Something like this should do the trick:
and then try to re-build dpdk and pktj. |
Hi, same output:
|
If your CPU doesn't support the SSE4 instructions, _mm_testz_si128 and _mm_blend_epi16 probably won't be defined. Try 'grep sse4 /proc/cpuinfo'. If it doesn't output one or more lines with "flags" at the beginning that has sse4_1 and maybe sse4_2 somewhere in the list of flags, you need to build on a box with a newer CPU. |
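The same check that `grep sse4 /proc/cpuinfo` performs by eye can be sketched in C. This is only an illustration: the helper name `cpu_flags_has` is hypothetical and is not part of packet-journey or DPDK. It looks for a flag as a whole token in a "flags" line, so that `sse3` does not accidentally match `ssse3`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `flag` appears as a whole token in a
 * /proc/cpuinfo-style "flags" line, 0 otherwise. */
static inline int cpu_flags_has(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at the line start or after a separator... */
        int start_ok = (p == line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        /* ...and must end at the line end or before a separator. */
        int end_ok = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';

        if (start_ok && end_ok)
            return 1;
        p++; /* keep scanning after a partial match */
    }
    return 0;
}
```

Calling it with a line read from /proc/cpuinfo and the flag `"sse4_1"` tells you whether the build host can run the SSE4.1 code paths.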
And this is the case. I have:
So is there no chance of using packet-journey on this CPU? |
You could replace those SSE4 intrinsic calls with a combination of SSE2 operations. But you're probably better off just trying to use a newer CPU. |
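For reference, here is a minimal sketch of what such SSE2-only replacements could look like. The function names `testz_si128_sse2` and `blend_epi16_sse2` are hypothetical (they are not in packet-journey or DPDK), and the sketch assumes an x86 compiler that provides `<emmintrin.h>`. Unlike the real `_mm_blend_epi16`, the blend mask here is a runtime value rather than a compile-time immediate.

```c
#include <emmintrin.h> /* SSE2 intrinsics only */

/* Same result as _mm_testz_si128(a, b): 1 if (a & b) is all zero, else 0. */
static inline int testz_si128_sse2(__m128i a, __m128i b)
{
    __m128i anded = _mm_and_si128(a, b);
    /* Each byte equal to zero becomes 0xFF; all 16 set means all-zero. */
    __m128i is_zero = _mm_cmpeq_epi8(anded, _mm_setzero_si128());
    return _mm_movemask_epi8(is_zero) == 0xFFFF;
}

/* Same selection rule as _mm_blend_epi16(a, b, imm): for each 16-bit lane i,
 * take the lane from b if bit i of imm is set, otherwise from a. */
static inline __m128i blend_epi16_sse2(__m128i a, __m128i b, unsigned imm)
{
    /* Expand the 8 mask bits into 8 lanes of all-ones or all-zeros.
     * _mm_set_epi16 takes lanes from highest (7) to lowest (0). */
    const __m128i mask = _mm_set_epi16(
        (imm & 0x80) ? -1 : 0, (imm & 0x40) ? -1 : 0,
        (imm & 0x20) ? -1 : 0, (imm & 0x10) ? -1 : 0,
        (imm & 0x08) ? -1 : 0, (imm & 0x04) ? -1 : 0,
        (imm & 0x02) ? -1 : 0, (imm & 0x01) ? -1 : 0);

    /* (mask & b) | (~mask & a) */
    return _mm_or_si128(_mm_and_si128(mask, b), _mm_andnot_si128(mask, a));
}
```

Both helpers cost a few more instructions than the single SSE4.1 operations they stand in for, which is part of why a newer CPU is the simpler route.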
OK, thank you for the answer. I'm testing DPDK on an old computer with two gigabit ports, so I will stay with this hardware. |
Some SSE4 ops here are mixed with unaligned loads and stores, so they might be something you want to replace with memcpy() anyway. The BGP filtering logic seems weird (mask with value) and is unaligned too; I guess we can replace this _mm_testz_si128 with a more readable form that reads header structs, without a performance penalty. For the memory-aligned SSE4.1 blends in processx4_step3, I don't know. read/blend/store is neat, but we could probably store 64+32 bits without reading the contents first. Do you need to overwrite all 12 bytes of the MAC header, or is dst enough? Did you already try memcpy() and see poor performance? Do we still have a lab to benchmark this? |
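To make the "more readable form" idea concrete, here is a rough sketch of a BGP check that reads the header fields directly instead of masking a 128-bit load. It is an assumption-laden illustration, not the actual kni_rate_limit_step() logic: the name `is_bgp_ipv4` is hypothetical, it assumes "BGP" means TCP port 179 over IPv4, and it reads fields by byte offset so that alignment does not matter.

```c
#include <stddef.h>
#include <stdint.h>

#define PROTO_TCP 6   /* IPv4 protocol number for TCP */
#define BGP_PORT  179 /* well-known BGP port */

/* Read a big-endian 16-bit value without any alignment requirement. */
static inline uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: `ip` points at the start of an IPv4 header and `len`
 * is the number of valid bytes from there. Returns 1 if the packet is TCP
 * with source or destination port 179, else 0. */
static inline int is_bgp_ipv4(const uint8_t *ip, size_t len)
{
    size_t ihl;
    const uint8_t *tcp;

    if (len < 20 || (ip[0] >> 4) != 4)
        return 0;

    ihl = (size_t)(ip[0] & 0x0F) * 4; /* IPv4 header length in bytes */
    if (ihl < 20 || len < ihl + 4)    /* need both TCP port fields */
        return 0;

    if (ip[9] != PROTO_TCP)           /* byte 9 is the protocol field */
        return 0;

    tcp = ip + ihl;
    return read_be16(tcp) == BGP_PORT || read_be16(tcp + 2) == BGP_PORT;
}
```

A real version would also need to handle IPv6 and whatever VLAN offset applies before the IP header.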
Hi, |
@kalou :
For the record, the SSE calls were all aligned at the beginning and mostly come from the DPDK source code. We could not keep them aligned: when we enabled some features (mostly related to VLAN handling by the NIC in the qemu case) the alignment broke, so we just added a 'u' to the call. We need to make another pass over those calls to check whether we can remove the unaligned accesses. As for using a memcpy()-like version, I agree that with the current state of the code it may not make that big a difference, because processx4_step_checkneighbor() has grown from something simple into the big mess we have now because of the additional features, and it is now the bottleneck. In any case, a version where those SIMD calls are not hard-coded but switchable at compile time would definitely be interesting, especially if we can have a memcpy() version and an ARM SIMD version. This comment also covers the calls in kni_rate_limit_step(); more about that function below.
So, my feeling is that yes, we must overwrite the src MAC too. In some cases it is not mandatory (for example, for our current internal usage I believe it may be OK not to overwrite it, but I'm unsure), but in a more general scenario it will be required. As for performance in the general case, even if replacing all SIMD calls with memcpy() costs only a 5% drop, that 5% represents a fair number of packets when you are at maximum line rate, so I feel it would be better to keep the SIMD calls when possible.
Yep, we have a new one. It is not currently set up for testing pktj, but it would be easy to modify it. The old one on which I made my tests was torn down for another use. Now, if I had some time, I would:
I hope to have some time (read: Gandi-allocated time) soon to work on at least the first point and a little on the second, and to work on the other points during my free time (read: during the night, especially the ubpf thing). I will refresh the perf tests with the new lab when I get back to working on pktj. I will also publish some tests without the SIMD calls on that occasion. |
Those lovely SIMD calls are mostly efficient in long, unrolled chunks of code where many operations are applied to intermediate values. The load/unaligned load/op/store case is probably not so much of a bargain when all you need is an unconditional store. The MAC question was whether DPDK or something else (something I understand poorly) will be rewriting the source mac or not - maybe only when it's zeroed out, maybe always. Simplifies writing memcpy(eth, mac, ETH_ALEN) - I'm such a lazy person. I agree this rate-limit should be a more configurable, ACL based one - only our corner case use justifies keeping this as-is here. I'd go for the ip->proto + tcp->port check in the meantime. |
@brushek you might want to try the https://github.com/Gandi/packet-journey/tree/devel/memcpy branch, that basically replaces SSE4.1 calls with plain C. @klnikita If you want to check this out and compare the perf, I'd be glad - I'll try to spawn a vm locally here now |
@kalou - thank You - it is building now on my old dell 1950 :) - I will test tomorrow. |
@kalou Sorry, I misunderstood your MAC question. Yes, you have to set the source MAC yourself (so it's missing in your commit); the generic TX offload capabilities are listed here http://www.dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.h#n828 . I saw your memcpy branch and it looks fine. You could have used http://www.dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ether.h#n254 for copying those MACs, but let's try the rep movsb version first. @kalou @brushek I won't have time to properly test the perf of the memcpy branch for a few days, but I will be happy to make a release right now with the memcpy version switchable at compile time. @kalou if you are OK with that, I will push a small macro/ifdef commit to the memcpy branch. |
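A compile-time switch of that kind might look roughly like the sketch below. Everything named here is an assumption made for illustration: the macro `PKTJ_USE_SSE41`, the function `rewrite_macs`, and the blend mask value `0x3F` (which selects the first six 16-bit lanes, i.e. the 12 bytes of dst and src MAC) are not taken from the actual commit. The SSE4.1 path is only compiled when the macro is defined and the compiler targets SSE4.1; otherwise the plain memcpy() path is used.

```c
#include <stdint.h>
#include <string.h>

#if defined(PKTJ_USE_SSE41) && defined(__SSE4_1__)
#include <smmintrin.h> /* SSE4.1: _mm_blend_epi16 */
#endif

#define MAC_PAIR_LEN 12 /* 6 bytes dst MAC + 6 bytes src MAC */

/* Hypothetical helper: overwrite dst and src MAC at the start of `frame`.
 * `frame` must have at least 16 accessible bytes and `macs` must point to
 * 16 readable bytes whose first 12 are dst||src, so that both code paths
 * stay within bounds. Bytes 12..15 of the frame are left unchanged. */
static inline void rewrite_macs(uint8_t *frame, const uint8_t *macs)
{
#if defined(PKTJ_USE_SSE41) && defined(__SSE4_1__)
    /* read / blend / store: keep lanes 6..7 from the frame, take 0..5 from macs */
    __m128i te = _mm_loadu_si128((const __m128i *)frame);
    __m128i ve = _mm_loadu_si128((const __m128i *)macs);
    te = _mm_blend_epi16(te, ve, 0x3F);
    _mm_storeu_si128((__m128i *)frame, te);
#else
    /* Plain C: an unconditional 12-byte store, no read of the frame needed */
    memcpy(frame, macs, MAC_PAIR_LEN);
#endif
}
```

Building with something like `-DPKTJ_USE_SSE41 -msse4.1` would select the SIMD path, while a default build on an older CPU falls back to memcpy().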
@klnikita thanks for the ether_addr_copy pointer, I think that's exactly what we needed here - I've pushed a new version. |
Sorry for the late reply. I have set up a new test lab; it is not exactly the same as the previous one, since right now I'm using Intel 82599ES 10-Gigabit cards and not the XL710 40-Gigabit cards (I will try to set up two servers with XL710 cards later this week). The version with ether_addr_copy() has almost the same results as the SIMD version: we see less than 1% performance loss. The tests were done with very few ACLs and routes in the LPM4/6, but the impact should be similar for both versions in a more realistic test. |
Hello,
I get the following error messages when compiling packet-journey:
CC main.o
/usr/src/packet-journey/app/main.c: In function 'kni_rate_limit_step':
/usr/src/packet-journey/app/main.c:544:3: error: implicit declaration of function '_mm_testz_si128' [-Werror=implicit-function-declaration]
if (_mm_testz_si128(data, data)) {
^
/usr/src/packet-journey/app/main.c:544:3: error: nested extern declaration of '_mm_testz_si128' [-Werror=nested-externs]
/usr/src/packet-journey/app/main.c: In function 'process_step3':
/usr/src/packet-journey/app/main.c:874:2: error: implicit declaration of function '_mm_blend_epi16' [-Werror=implicit-function-declaration]
te = _mm_blend_epi16(te, ve, MASK_ETH);
^
/usr/src/packet-journey/app/main.c:874:2: error: nested extern declaration of '_mm_blend_epi16' [-Werror=nested-externs]
/usr/src/packet-journey/app/main.c:874:5: error: incompatible types when assigning to type '__m128i' from type 'int'
te = _mm_blend_epi16(te, ve, MASK_ETH);
^
/usr/src/packet-journey/app/main.c: In function 'processx4_step3':
/usr/src/packet-journey/app/main.c:942:8: error: incompatible types when assigning to type '__m128i' from type 'int'
te[0] = _mm_blend_epi16(te[0], ve[0], MASK_ETH);
^
/usr/src/packet-journey/app/main.c:943:8: error: incompatible types when assigning to type '__m128i' from type 'int'
te[1] = _mm_blend_epi16(te[1], ve[1], MASK_ETH);
^
/usr/src/packet-journey/app/main.c:944:8: error: incompatible types when assigning to type '__m128i' from type 'int'
te[2] = _mm_blend_epi16(te[2], ve[2], MASK_ETH);
^
/usr/src/packet-journey/app/main.c:945:8: error: incompatible types when assigning to type '__m128i' from type 'int'
te[3] = _mm_blend_epi16(te[3], ve[3], MASK_ETH);
^
cc1: all warnings being treated as errors
make[2]: *** [main.o] Error 1
make[1]: *** [all] Error 2
make: *** [app] Error 2
Can you help?