Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of #752 breaks v1.1 flow #772

Closed
msmits-definium opened this issue Aug 14, 2019 · 9 comments
Closed

Implementation of #752 breaks v1.1 flow #772

msmits-definium opened this issue Aug 14, 2019 · 9 comments
Assignees
Labels

Comments

@msmits-definium
Copy link

In implementing #752 the CIDsStickyAnsCmds[] array got removed. This means that now every sticky command is removed.

This change breaks V1.1:
After joining, the MAC command RekeyInd is sent as DataUp (CID 11). The server responds with RekeyConf. This is now removed from the mac commands list because there is no check with CIDsStickyAnsCmds[]. It is however also removed a second time in ProcessMacCommands, case SRV_MAC_REKEY_CONF.

case SRV_MAC_REKEY_CONF:
{
    uint8_t serverMinorVersion = payload[macIndex++];

    // Compare own LoRaWAN Version with server's
    if( MacCtx.NvmCtx->Version.Fields.Minor == serverMinorVersion )
    {
        // If they equal remove the sticky RekeyInd MAC-Command.
        LoRaMacCommandsGetCmd( MOTE_MAC_REKEY_IND, &macCmd);
        LoRaMacCommandsRemoveCmd( macCmd );
    }
    break;
}

Because there is no check on the return value of LoRaMacCommandsGetCmd, the remove proceeds and results in an invalid deletion on the underlying linked list, It causes a hardfault on my LPC 54628.

See trace below which gives an idea of th flow:

[lorawan] initialisation
[lorawan] join;attempt=0
[lorawan] join=executing
[lorawan] lwn_network_task RECV
DEBUG - ProcessRadioTxDone
$ [lorawan] lwn_network_task RECV
DEBUG - ProcessRadioRxDone
LoRaMacCommandsAddCmd - cid: 11 <---- mac command added
DEBUG - ProcessRadioRxDone - exit
[lorawan] JOIN OK
[lorawan] MIB_DEV_ADDR: FC0023C4
[lorawan] join=success
[lorawan] session=get
[lorawan] SEND START
[lorawan] lwn_network_task RECV
DEBUG - ProcessRadioTxDone
[lorawan] lwn_network_task RECV
DEBUG - ProcessRadioRxDone
LoRaMacCommandsRemoveCmd - cid: 11 <---- mac command removed first time, OK
ProcessMacCommands - payload[macIndex]: 11
LoRaMacCommandsRemoveCmd - cid: 181 <----- mac command remove attempted 2nd time but on empty list, FAIL

As mentioned above this fail results in a system crash / hardfault.

@catsup
Copy link

catsup commented Aug 14, 2019

Could you elaborate on why you believe the change to LoRaMacCommandsRemoveStickyAnsCmds() in #752 caused this failure ?

Unless I missed something, I don't see any change in behavior since curElement->IsSticky is set up from IsSticky(), which is equivalent to looping through the CIDsStickyAnsCmds[] array:

static bool IsSticky( uint8_t cid )
{
    switch( cid )
    {
        case MOTE_MAC_DL_CHANNEL_ANS:
        case MOTE_MAC_RX_PARAM_SETUP_ANS:
        case MOTE_MAC_RX_TIMING_SETUP_ANS:
            return true;
        default:
            return false;
    }
}

@ConnyBusy
Copy link
Contributor

@catsup

Note that @msmits-definium is right and he is referring to LW1.1 which has this function instead:

static bool IsSticky( uint8_t cid )
{
    switch( cid )
    {
        case MOTE_MAC_RESET_IND:
        case MOTE_MAC_REKEY_IND:
        case MOTE_MAC_DEVICE_MODE_IND:
        case MOTE_MAC_DL_CHANNEL_ANS:
        case MOTE_MAC_RX_PARAM_SETUP_ANS:
        case MOTE_MAC_RX_TIMING_SETUP_ANS:
            return true;
        default:
            return false;
    }
}

@catsup
Copy link

catsup commented Aug 14, 2019

Thanks @ConnyBusy

Still, I'd expect a function called LoRaMacCommandsRemoveStickyAnsCmds() to remove all sticky commands, not just a select few.

That said, the problem described by @msmits-definium rather seems to be the missing check of a return value:

Because there is no check on the return value of LoRaMacCommandsGetCmd, the remove proceeds and results in an invalid deletion on the underlying linked list

@ConnyBusy
Copy link
Contributor

In LW1.1 not all sticky mac commands are removed on the same condition. Meanwhile e.g.
MOTE_MAC_DL_CHANNEL_ANS, MOTE_MAC_RX_PARAM_SETUP_ANS, MOTE_MAC_RX_TIMING_SETUP_ANS can be removed after a reception of any downlink in RX windows, others like MOTE_MAC_RESET_IND and MOTE_MAC_REKEY_IND will be removed upon reception of a confirmation from the server with the corresponding MAC responses. Meaning those must only be removed if a confirmation is received otherwise it will cause a device to resend a join request after ADR_ACK_LIMIT transmission of REKEY_IND for example.

@catsup
Copy link

catsup commented Aug 14, 2019

I'm not much up to speed on 1.1 at this stage, but if that is so, I guess we may need a distinct function for that purpose.

Nevertheless, I strongly believe that error handling is not optional - if a function may return an error, the return value should be checked rather than relying on incorrect data leading to a crash further down the chain.

From a quick glance at the code, it appears that in case of failure of LoRaMacCommandsGetCmd(), the returned mac command pointer may contain random data, passing that to LoRaMacCommandsRemoveCommand() obviously leading to unpredictable results.

@msmits-definium
Copy link
Author

Thanks everyone for the helpful comments so far.

To summarize the problem:

  • I'm using the v1.1 flow. This is important to note, because the problem won't appear in v1.0.3
  • At one point during ProcessRadioRxDone, a MAC command RekeyConf is processed.
  • The RekeyConf gets removed in ProcessRadioRxDone at the call to RemoveMacCommands
  • Then, still in ProcessRadioRxDone, ProcessMacCommands gets called.
  • In ProcessMacCommands we end up in case SRV_MAC_REKEY_CONF
  • There, LoRaMacCommandsRemoveCmd is always called because there is no check on the return value of LoRaMacCommandsGetCmd.
  • So that in turn results in the double deleteion on the underlying linked list, which causes a crash.

As mentioned above by user catsup, I've simply worked around this as follows, with a check on the return value of LoRaMacCommandsGetCmd:

        case SRV_MAC_REKEY_CONF:

        {

            uint8_t serverMinorVersion = payload[macIndex++];



            // Compare own LoRaWAN Version with server's

            if( MacCtx.NvmCtx->Version.Fields.Minor == serverMinorVersion )

            {

                // If they equal remove the sticky RekeyInd MAC-Command.

                // todo: workaround for github issue 752

                if (LoRaMacCommandsGetCmd( MOTE_MAC_REKEY_IND, &macCmd) == LORAMAC_COMMANDS_SUCCESS) {

                    LoRaMacCommandsRemoveCmd( macCmd );

                } 

            }

            break;

        }

For now this works well enough for me. I will leave it up to the discretion of the Stackforce people as to how they want to implement a permamennt solution fo this.

@djaeckle
Copy link

Hi all,

many thanks to @msmits-definium for creating a new issue. Thanks for all your analysis and reports. We will apply both, an error handling for LoRaMacCommandsGetCmd before we perform other operations on the commands and will fix LoRaMacCommandsGetCmd to not pass random data to macCmd in case of failure.

I will push the fixes as soon as possible.

@djaeckle djaeckle added the bug label Aug 15, 2019
@djaeckle djaeckle added this to the Release Version 4.4.3 milestone Aug 15, 2019
@djaeckle djaeckle self-assigned this Aug 15, 2019
@msmits-definium
Copy link
Author

Thanks @djaeckle.
Just to let you know that I had to do the same fix of checking the return value in two more cases:
SRV_MAC_RESET_CONF
SRV_MAC_DEVICE_MODE_CONF

@djaeckle
Copy link

djaeckle commented Sep 2, 2019

Hi @msmits-definium ,

thanks for the amendment, i will take it into account.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants