Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add standard deviation mode for pcap captures #13

Closed
shu-soleil opened this issue Jul 1, 2021 · 15 comments
Closed

add standard deviation mode for pcap captures #13

shu-soleil opened this issue Jul 1, 2021 · 15 comments
Assignees

Comments

@shu-soleil
Copy link

Hello,

We'd like to add a new mode of pcap captures for standard deviation calculation. We implemented on the FPGA side, the capture of the sum value of input^2 (commit to "pcap_std_dev" branch on PandABlocks-FPGA repo). On the server side, we'll need to add a new mode to calculate the standard deviation with the new sum of input^2 value, and the existing mean value also the captured number:
image

Could you give some advices of adding this new capture mode on the server ?
Many thanks !

@shu-soleil
Copy link
Author

Concerning the question of capture modes, we'd like to have at least "mean stddev" mode, "min max mean stddev" mode would be a nice plus.

@Araneidae
Copy link
Contributor

I'm trying to remind myself of how the existing hardware interface to capture works. Can you please document in some detail how your new firmware implementation fits into this, please? I have the following description of the existing control interface taken from hardware.h:

/* Macros for formatting position and extension bus entries into capture field
 * values.  The bit layout for each type is as follows:
 *
 *                  32            9 8        4   2    0
 *                  +------------+-+----------+-+------+
 *  Position bus    |        0   |0|  pos-ix  |0| mode |
 *                  +------------+-+----------+-+------+
 *
 *                                9   7      4
 *                  +------------+-+-+--------+--------+
 *  Extension bus   |        0   |1|0| ext-ix |    0   |
 *                  +------------+-+-+--------+--------+
 */
    
/* Definitions of position capture fields. */
#define POS_FIELD_VALUE         0
#define POS_FIELD_DIFF          1
#define POS_FIELD_SUM_LOW       2
#define POS_FIELD_SUM_HIGH      3
#define POS_FIELD_MIN           4
#define POS_FIELD_MAX           5

If I remember correctly, each selected mode returns a single 32-bit value, hence the SUM field is returned as two separate values.

So I think my first questions are:

  • How many new position capture fields have you defined in your firmware?
  • What values are you returning? I'm presuming that you've defined a sum of squares result, but I then have to ask how many bits of result are you assigning? You'll need a lot of bits, naïvely something in the region of 96 (three 32-bit words).
  • That's going to be a lot of resources, so it occurs to me that we might want some way to interrogate the FPGA image to know whether this feature is available ... unfortunately we don't have a register with this function defined yet, however there are unallocated *REG spaces in slots 10 to 12.

@shu-soleil
Copy link
Author

We added 3 more fields: POS_FIELD_SUM2_LOW, POS_FIELD_SUM2_MID, POS_FIELD_SUM2_LOW

  • How many new position capture fields have you defined in your firmware?

The firmware returns the sum of squares result composed of 3 registers of 32-bit words (VHDL file):

  • Mode 6 - Sum^2 Low
  • Mode 7 - Sum^2 Middle
  • Mode 8 - Sum^2 High
  • What values are you returning? I'm presuming that you've defined a sum of squares result, but I then have to ask how many bits of result are you assigning? You'll need a lot of bits, naïvely something in the region of 96 (three 32-bit words).

In the firmware we used principally DSP (92 used among 400) to calcul squares. We've made a firmware test by putting them in existing position fields (min, max and diff) and verify their captured values which seem to be ok.

  • That's going to be a lot of resources, so it occurs to me that we might want some way to interrogate the FPGA image to know whether this feature is available ... unfortunately we don't have a register with this function defined yet, however there are unallocated *REG spaces in slots 10 to 12.

@Araneidae
Copy link
Contributor

Araneidae commented Jul 13, 2021

This looks perfectly sensible to me, and it should be reasonably straightforward to add this functionality to the server. However, we don't want to offer "stddev" capture options when the firmware option isn't available, and I suspect we may want to conditionally include this functionality.

@thomascobb , how do you want to proceed with this? I suspect if you add this firmware option to our existing FGPA builds a number of them will fail to synthesise; this certainly needs to be checked!

I suggest adding register *REG slot 10 15 as an FPGA_OPTIONS register to which we can add an "StdDev Enabled" bit, or you may prefer to add this as another configuration flag in the files accompanying the firmware image.

Regarding server implementation, there are a number of issues that will make it a bit tricky.

  • The list of available capture options is defined in pos_out.c. This will be easy enough to extend, though we'll need to filter the options depending on whether StdDev is enabled in the firmware.
  • The processing of captured data is mainly managed in capture.c, though the hardware configuration is prepared in prepare.c. The first complication is that we'll need to add a new capture frame size for uint-96.
  • One remaining difficulty is that we'll need to do some arithmetic on 96-bit numbers, and I'm sorry to see we don't have __uint128_t on our platform, so this will need to some workaround.

@thomascobb
Copy link

@shu-soleil please could you merge master into the std_dev branch of the FPGA repo?

@tomtrafford do you have time to work on this? If so, please could you check the FPGA utilization on the current 3.0a1 tag zpkgs that you made the other day, then when master is merged into the std_dev branch, build that and check the FPGA utilization there too?

@Araneidae
Copy link
Contributor

In the firmware we used principally DSP (92 used among 400)

How does that add up? I'm guessing your POS bus wasn't fully populated, as I would have guessed your usage would be at least 4 DSPs per channel unless you're not accumulating on every tick.

I'm struggling to find the implementation in your branch: can you link me to the implementation, please?

@thomascobb
Copy link

thomascobb commented Jul 13, 2021

Is it this: pcap_frame_mode.vhd lines 187-213?

@Araneidae
Copy link
Contributor

Thank you, that looks like it. I was wondering if the accumulator could be done inside the DSP, and it would be interesting to look at what fabric has been generated for that (the actual multiplication occurs on line 104).

What's with the configurable 8 bit shift on the accumulator? Is that something I already take into account (I see it also applies to the sum as well as the sum of squares)? Can't say I remember that functionality.

@Araneidae
Copy link
Contributor

@shu-soleil please could you merge master into the std_dev branch of the FPGA repo?

I've created a pull request: PandABlocks/PandABlocks-FPGA#49

@thomascobb
Copy link

What's with the configurable 8 bit shift on the accumulator? Is that something I already take into account (I see it also applies to the sum as well as the sum of squares)? Can't say I remember that functionality.

The sample_count is a 32-bit number, so will overflow after ~30s. We put in the shift to allow frames of longer than 30s to be averaged:

https://github.com/PandABlocks/PandABlocks-FPGA/blob/aa1f9bdf1d3381cc6c77c41138a12c9e66899e4b/modules/pcap/hdl/pcap_frame.vhd#L193

As we shift both the sum and the number of counts, the server doesn't need to do anything.

@shu-soleil
Copy link
Author

shu-soleil commented Jul 15, 2021

Regarding server implementation, there are a number of issues that will make it a bit tricky.

  • The list of available capture options is defined in pos_out.c. This will be easy enough to extend, though we'll need to filter the options depending on whether StdDev is enabled in the firmware.
  • The processing of captured data is mainly managed in capture.c, though the hardware configuration is prepared in prepare.c. The first complication is that we'll need to add a new capture frame size for uint-96.
  • One remaining difficulty is that we'll need to do some arithmetic on 96-bit numbers, and I'm sorry to see we don't have __uint128_t on our platform, so this will need to some workaround.

Thanks for the precisions, do you think we could make the calculation like below to avoid using __uint128_t ?

@Araneidae
Copy link
Contributor

I haven't worked out the details yet, but I anticipate that we'll need to calculate

using 96 bits of integer precision. Simple arithmetic (multiply, add, subtract) is pretty easy to extend to multiple words, the ARM assembler is pretty friendly, and gcc now even provides a builtin __builtin_add_overflow to help with writing wide arithmetic operations. I am though a bit concerned about the factor, I'll need to work out the best way to compute that.

Once we've computed this difference of two large numbers, for which we'll need to use all 96 bits, we'll be back to 64 bits and will probably be able to fall back to floating point for the rest of the calculation.

As I say, I haven't worked out the details yet, but it will clearly require a little care.

@thomascobb
Copy link

@shu-soleil I have talked to @Araneidae and he is happy to do the work on the server.

That's going to be a lot of resources, so it occurs to me that we might want some way to interrogate the FPGA image to know whether this feature is available ... unfortunately we don't have a register with this function defined yet, however there are unallocated *REG spaces in slots 10 to 12.

I've made PandABlocks/PandABlocks-FPGA#62 with a task for this, and for you to capture the work on the FPGA side

@shu-soleil
Copy link
Author

Hi thanks ! We'll take a look with Thierry.

Araneidae added a commit that referenced this issue Nov 11, 2021
The capture options remain as before, but can now be any combination of
Value, Min, Max, Diff, Sum, Mean.

This implements github issue #23 and is a step towards #13
@Araneidae
Copy link
Contributor

This is now implemented in commit 3fb86b6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants