Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Anomalies in scheduling of TCP tasks #22

Open
dchirgwin opened this issue Apr 22, 2020 · 7 comments
Open

Anomalies in scheduling of TCP tasks #22

dchirgwin opened this issue Apr 22, 2020 · 7 comments

Comments

@dchirgwin
Copy link

dchirgwin commented Apr 22, 2020

I shall prefix all my other comments by saying that the timing is very complicated, and it's possible that I have misunderstood the intention of the code in places. However, I have used detailed logs from Tracealyzer to analyze this, and I'm fairly sure there is a problem. I'm running on PIC32MX under FreeRTOS.

First Issue

First issue is that stackTaskRate is not being calculated properly. The value is set in _TCPIPStackCreateTimer() based on a complicated rounding calculation involving SYS_TMR_TickCounterFrequencyGet().

static bool _TCPIPStackCreateTimer(void)
{
    tcpip_stack_tickH = SYS_TMR_CallbackPeriodic(TCPIP_STACK_TICK_RATE, 0, _TCPIP_STACK_TickHandler);
    if(tcpip_stack_tickH != SYS_TMR_HANDLE_INVALID)
    {
        uint32_t sysRes = SYS_TMR_TickCounterFrequencyGet();
        uint32_t rateMs = ((sysRes * TCPIP_STACK_TICK_RATE) + 999 )/1000;    // round up
        stackTaskRate = (rateMs * 1000) / sysRes;
        // SYS_TMR_CallbackPeriodicSetRate(tcpip_stack_tickH, rateMs);
        return true;
    }

    SYS_ERROR_PRINT(SYS_ERROR_ERROR, TCPIP_STACK_HDR_MESSAGE "Tick registration failed: %d\r\n", TCPIP_STACK_TICK_RATE);
    return false;
}

However, the timer uses SYS_TMR_CallbackPeriodic() which is based on a pure millisecond timer, and is not derived from the SYS_TMR_TickCounterFrequency.

I am using the following configuration:

  • TCPIP_TCP_TASK_TICK_RATE = 10
  • TCPIP_STACK_TICK_RATE = 5
  • TCPIP_STACK_Task() is called at 1ms intervals

Instead of getting a call to TCPIP_TCP_Task() every 10ms, I get a call every 7ms (on average).
This is because stackTaskRate has been set to 7 instead of 5, which means that _TCPIPStackSignalTmo() is eating away at pSigEntry->currTmo in larger chunks than it should.

Solution

I think the code that sets stackTaskRate just needs to be simplified like this:

        stackTaskRate = TCPIP_STACK_TICK_RATE;

Second (related) Issue

Originally I had a slightly different configuration:

  • TCPIP_TCP_TASK_TICK_RATE = 5
  • TCPIP_STACK_TICK_RATE = 5
  • TCPIP_STACK_Task() is called at 1ms intervals

This caused very weird errors because TCPIP_TCP_TASK_TICK_RATE is smaller than stackTaskRate, which caused periodic overflows in _TCPIPStackSignalTmo() as pSigEntry->currTmo got increasingly negative, and eventually wrapped around to be positive.
The end result was that the periodic TCP processsing (e.g. the Nagle algorithm using TCPIP_TCP_AUTO_TRANSMIT_TIMEOUT_VAL) went idle for about 23 seconds every 106 seconds.

When the signal handler is installed by _TCPIPStackSignalHandlerRegister() there is a check to make sure the period of the signal handler asyncTmoMsis not smaller than stackTaskRate.

            if(pSignalEntry->signalHandler == 0 || pSignalEntry->signalHandler == signalHandler)
            {   // found module slot
                pSignalEntry->signalHandler = signalHandler;
                if ((asyncTmoMs != 0) && (asyncTmoMs < stackTaskRate))
                {
                    asyncTmoMs = stackTaskRate;
                }
                pSignalEntry->asyncTmo = pSignalEntry->currTmo = asyncTmoMs;
                return pSignalEntry;

The problem is that, due to the order of initialization, stackTaskRate has not been initialized yet, so this test always passes!

_TCPIPStackSignalHandlerRegister() is called during initialization of the TCPIP stack, but TCPIP_STACK_CreateTimer() is called from _TCPIPStackIsRunState() which is called from TCPIP_STACK_Task(), i.e. AFTER initialization.

Solution

One way to address this might be to separate the initialization of stackTaskRate from the actual starting of the timer. startTaskRate can be calculated early during intialization, and the timer started from TCPIP_STACK_Task() as currently.

@adrian-aur
Copy link
Collaborator

Thank you for your report.

  • The second issue that you reported has already been fixed in the stack some time ago and will be part of the v3.6.0 release. Yes, it was a bug that could cause the currTmo to go negative.

  • For the 1st issue, the code was originally developed for a tick system where there is no guarantee that the TCP/IP process period is always a multiple of the system tick. For example, the system tick is 2 ms and the required callback period is 5 ms. That's why the rounding up and the calculation will render in this case the rateMs == 3 (instead of 2.5) and stackTaskRate == 6 ticks (i.e. 3 ms). Also, notice the extra call to SYS_TMR_CallbackPeriodicSetRate() to adjust the result to the correct integer of ticks.

For your example though, if the system timer resolution is 1 ms, i.e. SYS_TMR_TickCounterFrequencyGet() returns 1000, the calculation should work without any rounding taking effect:
TCPIP_STACK_TICK_RATE = 5, then
sysRes = 1000
rateMs = ((1000 * 5) + 999) / 1000 = 5
stackTaskRate = (5 * 1000) / 1000 = 5

so the stack TCPIP_STACK_Task() will be called every 5 ms, as intended.
Is there an error in the SYS_TMR_TickCounterFrequencyGet(), it doesn't return 1000, or what is the problem?

@dchirgwin
Copy link
Author

Thanks for your reply, Adrian.
You're correct that SYS_TMR_TickCounterFrequencyGet() doesn't return 1000 in my project.
It returns 266, so maybe that's the root of the problem I was seeing.

This is derived from SYS_COUNT_TICK_CONV, as seen in sys_time_h2_adapter.h

#define SYS_COUNT_TICK_CONV         150000
static __inline__ uint32_t __attribute__((always_inline)) SYS_TMR_TickCounterFrequencyGet ( void )
{
	return SYS_TIME_FrequencyGet()/SYS_COUNT_TICK_CONV;
}

My SYS_Time_Frequency is 40MHz,
hence SYS_TMR_TickCounterFrequency is defined as 40,000,000 / 150,000 = 266

I understand the intention of the rounding, but my concern is that the SYS_TMR_TickCounterFrequency value is not actually related to how the task "ticks" are generated.

SYS_TMR_CallbackPeriodic() does not specify a number of ticks, rather it specifies a number of milliseconds. Therefore, I think that stackTaskRate should be calculated using milliseconds, not using ticks.

Obviously, if SYS_COUNT_TICK_CONV was 1000 then 1 SYS_TMR tick = 1 millisecond and there would be no difference. But if SYS_COUNT_TICK_CONV is not 1000, this anomaly occurs.

@dchirgwin
Copy link
Author

I've just seen what may be the critical line of code:

You mention a call to SYS_TMR_CallbackPeriodicSetRate() to adjust the result. In my generated code, this call is commented out, so the adjustment is never made.

@adrian-aur
Copy link
Collaborator

The call to the SYS_TMR_CallbackPeriodicSetRate() is commented out as the H3 timer no longer supports it. So it's the same in my code, no problem here.

There seems to be a disconnect between the setting of the timer and the result of the SYS_TMR_TickCounterFrequencyGet(). It should match the settings on your system. So practically that "hard coded" value SYS_COUNT_TICK_CONV == 150000 is wrong.
I'll have to investigate to understand how that value is generated and if it should adjust itself depending on the settings. For now please adjust it manually so that you get the correct timing, until I get an answer.

@adrian-aur
Copy link
Collaborator

The issue has been already corrected and it will be part of the next release. The sys_time_h2_adapter should not use the hard coded SYS_COUNT_TICK_CONV but the value returned by the call SYS_TIME_MSToCount(1).
Please give it a try, it should fix your issue.

Of course the 2nd issue remains (the one caused by the late initialization of the system timer and wrong values for the module timeouts). I can send you a quick snapshot of the fix, if you want to implement it yourself until the release is available.

@dchirgwin
Copy link
Author

Thanks for the update. I'm pleased and impressed you found the "smoking gun" - it's horrendously complex stuff around there!
Regarding the fix, I've got some workarounds in place so I'm happy to wait for the next release, but thanks for the offer.

@adrian-aur
Copy link
Collaborator

Some of the complexity is caused by the need of signaling of RX/TX and TMO events, and running under an RTOS but also bare metal.
Regarding the timing itself, I agree that it should be updated to use directly the H3 timing scheme, ms based, as the tick scheme is no longer present.
I'll create an internal issue for this update. It won't be probably part of the next release, but of a future one.
Thank you for reporting these issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants