Skip to content
This repository has been archived by the owner on Mar 26, 2021. It is now read-only.

problem with build #24

Open
brushek opened this issue Mar 7, 2016 · 19 comments
Open

problem with build #24

brushek opened this issue Mar 7, 2016 · 19 comments

Comments

@brushek
Copy link

brushek commented Mar 7, 2016

Hello,

I have following error messages during compilation of packet-journey:
CC main.o
/usr/src/packet-journey/app/main.c: In function 'kni_rate_limit_step':
/usr/src/packet-journey/app/main.c:544:3: error: implicit declaration of function '_mm_testz_si128' [-Werror=implicit-function-declaration]
if (_mm_testz_si128(data, data)) {
^
/usr/src/packet-journey/app/main.c:544:3: error: nested extern declaration of '_mm_testz_si128' [-Werror=nested-externs]
/usr/src/packet-journey/app/main.c: In function 'process_step3':
/usr/src/packet-journey/app/main.c:874:2: error: implicit declaration of function '_mm_blend_epi16' [-Werror=implicit-function-declaration]
te = _mm_blend_epi16(te, ve, MASK_ETH);
^
/usr/src/packet-journey/app/main.c:874:2: error: nested extern declaration of '_mm_blend_epi16' [-Werror=nested-externs]
/usr/src/packet-journey/app/main.c:874:5: error: incompatible types when assigning to type '__m128i' from type 'int'
te = _mm_blend_epi16(te, ve, MASK_ETH);
^
/usr/src/packet-journey/app/main.c: In function 'processx4_step3':
/usr/src/packet-journey/app/main.c:942:8: error: incompatible types when assigning to type '__m128i' from type 'int'
te[0] = _mm_blend_epi16(te[0], ve[0], MASK_ETH);
^
/usr/src/packet-journey/app/main.c:943:8: error: incompatible types when assigning to type '__m128i' from type 'int'
te[1] = _mm_blend_epi16(te[1], ve[1], MASK_ETH);
^
/usr/src/packet-journey/app/main.c:944:8: error: incompatible types when assigning to type '__m128i' from type 'int'
te[2] = _mm_blend_epi16(te[2], ve[2], MASK_ETH);
^
/usr/src/packet-journey/app/main.c:945:8: error: incompatible types when assigning to type '__m128i' from type 'int'
te[3] = _mm_blend_epi16(te[3], ve[3], MASK_ETH);
^
cc1: all warnings being treated as errors
make[2]: *** [main.o] Error 1
make[1]: *** [all] Error 2
make: *** [app] Error 2

Can You help ?

@klnikita
Copy link
Contributor

klnikita commented Mar 7, 2016

It's look like that your compiler is missing _mm_testz_si128() and has a strange definition of _mm_blend_epi16().
Which compiler are you using ?

@brushek
Copy link
Author

brushek commented Mar 7, 2016

Hi,
I have gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.1)

@klnikita
Copy link
Contributor

klnikita commented Mar 9, 2016

So, I have tested on a clean ubuntu-server install (14.04.4).
I have exactly the same version of gcc as you but I'm unable to reproduce your issue.
I have done:

git clone git://dpdk.org/dpdk ; cd dpdk ; git checkout v2.2.0 ; make config T=x86_64-native-linuxapp-gcc ; make
export RTE_SDK=~/dpdk
export RTE_TARGET=build
git clone https://github.com/Gandi/packet-journey.git ; cd packet-journey ; make

Are you doing something different ?

@brushek
Copy link
Author

brushek commented Mar 9, 2016

I copy/paste Your way (In my case there were only one difference: I downloaded source from dpdk.org, not from git) but problem remain the same.. :( :
brushek@rtr-krk-new:$ rm -rf dpdk/
brushek@rtr-krk-new:
$ ls
brushek@rtr-krk-new:$ ls
brushek@rtr-krk-new:
$ git clone git://dpdk.org/dpdk ; cd dpdk ; git checkout v2.2.0 ; make config T=x86_64-native-linuxapp-gcc ; make;export RTE_SDK=~/dpdk;export RTE_TARGET=build;git clone https://github.com/Gandi/packet-journey.git ; cd packet-journey ; make
Cloning into 'dpdk'...
remote: Counting objects: 39236, done.
remote: Compressing objects: 100% (9261/9261), done.
remote: Total 39236 (delta 30921), reused 37603 (delta 29719)
Receiving objects: 100% (39236/39236), 23.77 MiB | 6.64 MiB/s, done.
Resolving deltas: 100% (30921/30921), done.
Checking connectivity... done.
Note: checking out 'v2.2.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b new_branch_name

HEAD is now at a38e5ec... version: 2.2.0
Configuration done
== Build lib
== Build lib/librte_compat
SYMLINK-FILE include/rte_compat.h
== Build lib/librte_eal
== Build lib/librte_eal/common
SYMLINK-FILE include/generic/rte_atomic.h
[....]
CC main.o
LD dpdk_proc_info
INSTALL-APP dpdk_proc_info
INSTALL-MAP dpdk_proc_info.map
Build complete [build]
Cloning into 'packet-journey'...
remote: Counting objects: 2525, done.
remote: Total 2525 (delta 0), reused 0 (delta 0), pack-reused 2525
Receiving objects: 100% (2525/2525), 1.50 MiB | 662.00 KiB/s, done.
Resolving deltas: 100% (1719/1719), done.
Checking connectivity... done.
== lib
== libneighbour
CC neighbour.o
AR libneighbour.a
SYMLINK-FILE include/libneighbour.h
INSTALL-LIB libneighbour.a
== libnetlink
CC netlink.o
AR libnetlink.a
SYMLINK-FILE include/libnetlink.h
INSTALL-LIB libnetlink.a
== app
CC main.o
/home/brushek/dpdk/packet-journey/app/main.c: In function ‘kni_rate_limit_step’:
/home/brushek/dpdk/packet-journey/app/main.c:544:3: error: implicit declaration of function ‘_mm_testz_si128’ [-Werror=implicit-function-declaration]
if (_mm_testz_si128(data, data)) {
^
/home/brushek/dpdk/packet-journey/app/main.c:544:3: error: nested extern declaration of ‘_mm_testz_si128’ [-Werror=nested-externs]
/home/brushek/dpdk/packet-journey/app/main.c: In function ‘process_step3’:
/home/brushek/dpdk/packet-journey/app/main.c:874:2: error: implicit declaration of function ‘_mm_blend_epi16’ [-Werror=implicit-function-declaration]
te = _mm_blend_epi16(te, ve, MASK_ETH);
^
/home/brushek/dpdk/packet-journey/app/main.c:874:2: error: nested extern declaration of ‘_mm_blend_epi16’ [-Werror=nested-externs]
/home/brushek/dpdk/packet-journey/app/main.c:874:5: error: incompatible types when assigning to type ‘__m128i’ from type ‘int’
te = _mm_blend_epi16(te, ve, MASK_ETH);
^
/home/brushek/dpdk/packet-journey/app/main.c: In function ‘processx4_step3’:
/home/brushek/dpdk/packet-journey/app/main.c:942:8: error: incompatible types when assigning to type ‘__m128i’ from type ‘int’
te[0] = _mm_blend_epi16(te[0], ve[0], MASK_ETH);
^
/home/brushek/dpdk/packet-journey/app/main.c:943:8: error: incompatible types when assigning to type ‘__m128i’ from type ‘int’
te[1] = _mm_blend_epi16(te[1], ve[1], MASK_ETH);
^
/home/brushek/dpdk/packet-journey/app/main.c:944:8: error: incompatible types when assigning to type ‘__m128i’ from type ‘int’
te[2] = _mm_blend_epi16(te[2], ve[2], MASK_ETH);
^
/home/brushek/dpdk/packet-journey/app/main.c:945:8: error: incompatible types when assigning to type ‘__m128i’ from type ‘int’
te[3] = _mm_blend_epi16(te[3], ve[3], MASK_ETH);
^
cc1: all warnings being treated as errors
make[2]: *** [main.o] Error 1
make[1]: *** [all] Error 2
make: *** [app] Error 2

Can You show from Your fresh ubuntu
dpkg -l | grep gcc ?

mine is:

ii gcc 4:4.8.2-1ubuntu6 amd64 GNU C compiler
ii gcc-4.8 4.8.4-2ubuntu114.04.1 amd64 GNU C compiler
ii gcc-4.8-base:amd64 4.8.4-2ubuntu1
14.04.1 amd64 GCC, the GNU Compiler Collection (base package)
ii gcc-4.9-base:amd64 4.9.3-0ubuntu4 amd64 GCC, the GNU Compiler Collection (base package)
ii libgcc-4.8-dev:amd64 4.8.4-2ubuntu1~14.04.1 amd64 GCC support library (development files)
ii libgcc1:amd64 1:4.9.3-0ubuntu4 amd64 GCC support library

@klnikita
Copy link
Contributor

klnikita commented Mar 9, 2016

Same as yours:

nikita@ubuntu:~$ dpkg -l | grep gcc
ii  gcc                                4:4.8.2-1ubuntu6                 amd64        GNU C compiler
ii  gcc-4.8                            4.8.4-2ubuntu1~14.04.1           amd64        GNU C compiler
ii  gcc-4.8-base:amd64                 4.8.4-2ubuntu1~14.04.1           amd64        GCC, the GNU Compiler Collection (base package)
ii  gcc-4.9-base:amd64                 4.9.3-0ubuntu4                   amd64        GCC, the GNU Compiler Collection (base package)
ii  libgcc-4.8-dev:amd64               4.8.4-2ubuntu1~14.04.1           amd64        GCC support library (development files)
ii  libgcc1:amd64                      1:4.9.3-0ubuntu4                 amd64        GCC support library

May you try to force the target machine in the dpdk .config ? Something like that should do the trick:

sed -ri -e 's,(RTE_MACHINE=).*,\1"default",' dpdk/build/.config

and then try to re-build dpdk and pktj.

@brushek
Copy link
Author

brushek commented Mar 9, 2016

Hi same output:


brushek@rtr-krk-new:~$ sed -ri -e 's,(RTE_MACHINE=).*,\1"default",' dpdk/build/.config
brushek@rtr-krk-new:~$ cd dpdk/
brushek@rtr-krk-new:~/dpdk$ make
== Build lib
== Build lib/librte_compat
[...]
Build complete [build]
brushek@rtr-krk-new:~/dpdk$ cd packet-journey/
brushek@rtr-krk-new:~/dpdk/packet-journey$ make
== lib
== libneighbour
  CC neighbour.o
  AR libneighbour.a
  INSTALL-LIB libneighbour.a
== libnetlink
  CC netlink.o
  AR libnetlink.a
  INSTALL-LIB libnetlink.a
== app
  CC main.o
/home/brushek/dpdk/packet-journey/app/main.c: In function ‘kni_rate_limit_step’:
/home/brushek/dpdk/packet-journey/app/main.c:544:3: error: implicit declaration of function ‘_mm_testz_si128’ [-Werror=implicit-function-declaration]
   if (_mm_testz_si128(data, data)) {
   ^
/home/brushek/dpdk/packet-journey/app/main.c:544:3: error: nested extern declaration of ‘_mm_testz_si128’ [-Werror=nested-externs]
/home/brushek/dpdk/packet-journey/app/main.c: In function ‘process_step3’:
/home/brushek/dpdk/packet-journey/app/main.c:874:2: error: implicit declaration of function ‘_mm_blend_epi16’ [-Werror=implicit-function-declaration]
  te = _mm_blend_epi16(te, ve, MASK_ETH);
  ^
/home/brushek/dpdk/packet-journey/app/main.c:874:2: error: nested extern declaration of ‘_mm_blend_epi16’ [-Werror=nested-externs]
/home/brushek/dpdk/packet-journey/app/main.c:874:5: error: incompatible types when assigning to type ‘__m128i’ from type ‘int’
  te = _mm_blend_epi16(te, ve, MASK_ETH);
     ^
/home/brushek/dpdk/packet-journey/app/main.c: In function ‘processx4_step3’:
/home/brushek/dpdk/packet-journey/app/main.c:942:8: error: incompatible types when assigning to type ‘__m128i’ from type ‘int’
  te[0] = _mm_blend_epi16(te[0], ve[0], MASK_ETH);
        ^
/home/brushek/dpdk/packet-journey/app/main.c:943:8: error: incompatible types when assigning to type ‘__m128i’ from type ‘int’
  te[1] = _mm_blend_epi16(te[1], ve[1], MASK_ETH);
        ^
/home/brushek/dpdk/packet-journey/app/main.c:944:8: error: incompatible types when assigning to type ‘__m128i’ from type ‘int’
  te[2] = _mm_blend_epi16(te[2], ve[2], MASK_ETH);
        ^
/home/brushek/dpdk/packet-journey/app/main.c:945:8: error: incompatible types when assigning to type ‘__m128i’ from type ‘int’
  te[3] = _mm_blend_epi16(te[3], ve[3], MASK_ETH);
        ^

@mgsmith1000
Copy link

If your CPU doesn't support the SSE4 instructions, _mm_testz_si128 and _mm_blend_epi16 probably won't be defined.

Try 'grep sse4 /proc/cpuinfo'. If it doesn't output one or more lines with "flags" at the beginning that has sse4_1 and maybe sse4_2 somewhere in the list of flags, you need to build on a box with a newer CPU.

@brushek
Copy link
Author

brushek commented Mar 9, 2016

And this is the case - I have :

model name : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz

So there is no chance to use packet-journey on this cpu ?

@mgsmith1000
Copy link

You could replace those SSE4 intrinsic calls with a combination of SSE2 operations. But you're probably better off just trying to use a newer CPU.

@brushek
Copy link
Author

brushek commented Mar 10, 2016

OK, thank You for answer. I'm testing DPDK on old computer with two gigabit ports, I will stay with this hardware.

@kalou
Copy link

kalou commented Mar 11, 2016

Some SSE4 ops here are mixed with unaligned loads and stores, so it might be something you want to replace with memcpy() anyway. The BGP filtering logic seems weird (mask with value) and unaligned too, I guess we can remove this _mm_testz_si128 for a more readable form reading header structs without performance penalty.

For the memory aligned SSE4.1 blends in processx4_step3, I don't know. read/blend/store is neat but we could probably store 64+32 bits without reading the contents first.

Do you need to overwrite 12 bytes of mac header ? Or is dst enough ? Did you try memcpy() already and had poor perfs ?

Do we still have a lab to benchmark this ?

@brushek
Copy link
Author

brushek commented Mar 11, 2016

Hi,
I can this server hold to run later - I'm trying to make ruter with bird (for routing bgp ospf, ospfv3) and zebra (only zebra) for setting IPs. Now I just run KNI and set all on top of it. If I could try your project, I can test Your changes in code, but I'm not so familiar with this kind of programming in C.

@klnikita
Copy link
Contributor

@kalou :

Some SSE4 ops here are mixed with unaligned loads and stores, so it might be something you want to replace with memcpy() anyway. The BGP filtering logic seems weird (mask with value) and unaligned too, I guess we can remove this _mm_testz_si128 for a more readable form reading header structs without performance penalty.

For the story, the SSE calls were all aligned at the beginning and are mostly from the DPDK source code. We cannot keep them aligned because when we enabled some features (mostly related to the vlan handling by the nic in the qemu case) the alignment became broken, so we just added a 'u' in the call. We must make a new pass on those calls for checking if we can remove the unaligmnent.

For the concern about the use of a memcpy() like version, I agree that with the current state of the code it may be won't make a that big difference because of the current state of processx4_step_checkneighbor() which has came from something simple to the big current mess with have now because of the additional features and is now the bottleneck.

Anyway making a version without those SIMD calls hard-coded but switchable at compile time would be definitely something interesting, especially if we can have a memcpy() version and an ARM SIMD version. This comment include the calls in kni_rate_limit_step(), more to say about that function bellow.

For the memory aligned SSE4.1 blends in processx4_step3, I don't know. read/blend/store is neat but we could probably store 64+32 bits without reading the contents first.

Do you need to overwrite 12 bytes of mac header ? Or is dst enough ? Did you try memcpy() already and had poor perfs ?

So, my feeling is, yes, we must overwrite the src mac too. Even if in some case it won't be mandatory (for example for our current internal usage I believe it may be ok to not overwrite it, but I'm unsure) but in a more general scenario it will be required.
Concerning the perf test with a memcpy() version, as I say previously, it would be interesting to propose that version too.

In the more general case concerning performances, even if replacing all SIMD calls with memcpy() will be only a 5% drop of perf, those 5% represent a certain bunch of packets when you are at your maximum line rate so I feel that it would be better to keep the SIMD calls when possible.

Do we still have a lab to benchmark this ?

Yep we have a new one, not currently setup for testing pktj but it would be easy to modify it. The old one on which I have made my tests was teared down for another usage.

Now, if I would have some time, I would:

  • refactor processx4_step_checkneighbor() into smaller steps for removing some branches and make it a little bit more vectorized (yep, more of those unreadable SIMD calls, I know your feeling)
  • rewrite entirely the kni_rate_limit_step() function with something more flexible (ideally with something like eBPF, I was planning to try https://github.com/rlane/ubpf)
  • make the SIMD calls switchable at compile time (some macros).

I hope to have some time (read here Gandi allocated time)soon to work on at least the first point and a little bit the second point and work on the other points during my free time (read here during the night , especially the ubpf thing).

I will refresh the perf tests with the new lab when I will work back on pktj. I will publish also some tests without the SIMD calls at this occasion.

@kalou
Copy link

kalou commented Mar 11, 2016

Those lovely SIMD calls are mostly efficient in long, unrolled chunks of code where many operations are applied to intermediate values. The load/unaligned load/op/store case is probably not so much of a bargain when all you need is an unconditional store.

The MAC question was whether DPDK or something else (something I understand poorly) will be rewriting the source mac or not - maybe only when it's zeroed out, maybe always. Simplifies writing memcpy(eth, mac, ETH_ALEN) - I'm such a lazy person.

I agree this rate-limit should be a more configurable, ACL based one - only our corner case use justifies keeping this as-is here. I'd go for the ip->proto + tcp->port check in the meantime.

@kalou
Copy link

kalou commented Mar 11, 2016

@brushek you might want to try the https://github.com/Gandi/packet-journey/tree/devel/memcpy branch, that basically replaces SSE4.1 calls with plain C.

@klnikita If you want to check this out and compare the perf, I'd be glad - I'll try to spawn a vm locally here now

@brushek
Copy link
Author

brushek commented Mar 12, 2016

@kalou - thank You - it is building now on my old dell 1950 :) - I will test tomorrow.

@klnikita
Copy link
Contributor

@kalou I misunderstood your MAC question sorry, you have to set the source mac yourself yes (so it's missing in your commit), the generic tx offload capabilities are listed here http://www.dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.h#n828 . I saw your memcpy branch, its look fine, you may have use http://www.dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ether.h#n254 for copying those MAC but let's try the rep movsb version first.

@kalou @brushek I won't have the time to properly test the perf of the memcpy branch before a few days but I will be happy to make a release right now with the memcpy version being switchable at compile time. @kalou if you are ok with that I will push a small macro/ifdef commit in the memcpy branch.

@kalou
Copy link

kalou commented Mar 12, 2016

@klnikita thanks for the ether_addr_copy pointer, I think that's exactly what we needed here - I've pushed a new version.
I did not test anything here - so let's not bother releasing or ifdef'ing this until we have an idea of how it works.

@klnikita
Copy link
Contributor

Sorry for the late reply, I have setup a new test lab, it is not exactly the same as the previous one since right now I'm using Intel 82599ES 10-Gigabit cards and not the xl710 40-Gigabit cards (I will try to put two servers with xl710 cards later this week).

The version with ether_addr_copy() has almost the same results as the version with SIMD, we have less than 1% of performance loss. The tests were done with really few ACLs and routes in the LPM4/6 but the impact should be similar with both version if I will do a more realistic test.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants