Removal of TAN messages and new capability to record in-transit messages in the RTI #61

Soroosh129 · 2022-04-04T17:06:50Z

This PR removes TAN messages entirely.

Instead, a federate with a physical action that is connected to a network output now periodically creates a dummy event, which forces the federate to advance its tag and allows downstream federates to make progress. The period is set with coordination-options, e.g., coordination-options: {advance-message-interval: 10 msec}.
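A minimal C sketch of the dummy-event decision, assuming time is an int64_t count of nanoseconds and that FOREVER is INT64_MAX. The function next_dummy_event_time, its parameters, and the MSEC macro are hypothetical illustrations, not the reactor-c API. The idea: if the federate's next real event is further away than advance-message-interval, schedule a dummy event one interval ahead so the federate's tag keeps advancing.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t instant_t;   /* assumed: nanoseconds */
typedef int64_t interval_t;

#define FOREVER INT64_MAX                       /* assumed sentinel for "never" */
#define MSEC(t) ((interval_t)(t) * 1000000LL)   /* hypothetical: milliseconds to ns */

/* Hypothetical: returns the time of a dummy event to schedule,
   or FOREVER if no dummy event is needed. */
instant_t next_dummy_event_time(instant_t current_time,
                                instant_t next_event_time,
                                interval_t advance_message_interval,
                                bool has_physical_action_to_network_output) {
    /* Only federates whose physical action feeds a network output need this. */
    if (!has_physical_action_to_network_output || advance_message_interval <= 0) {
        return FOREVER;
    }
    /* Never create a dummy event at (or past) FOREVER; also avoids overflow. */
    if (current_time >= FOREVER - advance_message_interval) {
        return FOREVER;
    }
    instant_t candidate = current_time + advance_message_interval;
    /* A real event at or before the candidate already advances the tag. */
    if (next_event_time <= candidate) {
        return FOREVER;
    }
    return candidate;
}
```

For example, with a 10 msec interval and an empty event queue at time 0, the sketch returns MSEC(10); if a real event is already pending at 5 msec, it returns FOREVER. The overflow guard reflects the concern addressed by the later commit "Fixed issue where a dummy event could be created at FOREVER".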

Fixing this bug exposed another bug in the RTI: the RTI could lose track of a federate's actual earliest next event (see this comment for more detail) and, as a result, send incorrect Tag Advance Grant (TAG) messages. This bug was fixed by adding a queue to the RTI that records all messages currently in transit.
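To make the in-transit record concrete, here is a minimal sketch, assuming a fixed-capacity unsorted array instead of the pqueue the PR actually uses, and assuming a message stops counting as in transit once the destination federate reports completing a tag at or after the message's tag. All names (in_transit_record_t, record_in_transit, clean_in_transit, effective_next_event) are hypothetical. The key point is that the RTI's view of a federate's earliest next event is the minimum of the NET the federate reported and the earliest tag still in transit to it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tag representation: time in ns plus a microstep. */
typedef struct { int64_t time; uint32_t microstep; } tag_t;

/* Lexicographic comparison: time first, then microstep. */
static int tag_compare(tag_t a, tag_t b) {
    if (a.time != b.time) return a.time < b.time ? -1 : 1;
    if (a.microstep != b.microstep) return a.microstep < b.microstep ? -1 : 1;
    return 0;
}

#define MAX_IN_TRANSIT 64  /* hypothetical fixed capacity */

/* Hypothetical per-federate record of messages the RTI has forwarded
   but the federate has not yet processed. */
typedef struct {
    tag_t tags[MAX_IN_TRANSIT];
    size_t size;
} in_transit_record_t;

/* Called when the RTI forwards a message with this tag to the federate. */
bool record_in_transit(in_transit_record_t* q, tag_t tag) {
    if (q->size == MAX_IN_TRANSIT) return false;  /* record is full */
    q->tags[q->size++] = tag;
    return true;
}

/* Called when the federate reports completing `completed`: every message
   with a tag at or before it is assumed to have been processed. */
void clean_in_transit(in_transit_record_t* q, tag_t completed) {
    size_t kept = 0;
    for (size_t i = 0; i < q->size; i++) {
        if (tag_compare(q->tags[i], completed) > 0) {
            q->tags[kept++] = q->tags[i];
        }
    }
    q->size = kept;
}

/* The RTI's view of the federate's earliest next event: the smaller of
   the reported NET and the earliest tag still in transit to it. */
tag_t effective_next_event(tag_t reported_net, const in_transit_record_t* q) {
    tag_t result = reported_net;
    for (size_t i = 0; i < q->size; i++) {
        if (tag_compare(q->tags[i], result) < 0) result = q->tags[i];
    }
    return result;
}
```

With messages at (100, 0) and (50, 1) in transit and a reported NET of (80, 0), the effective next event is (50, 1); after the federate completes (60, 0), only (100, 0) remains and the reported NET wins again.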

edwardalee

LGTM.

Soroosh129 · 2022-04-04T17:26:16Z

Actually, I realized that this fix is a bit of a red herring.

The added second check of _lf_bounded_NET(&tag) compares a modified version of tag, whose tag.time was set to get_physical_time() at some point in the recent past, against the current physical time. As a result, it always returns false.

This effectively disables TAN, which is why it "fixed" the issue. However, TAN messages are important for ensuring that progress is made.

Background of the problem: TAN messages appear to cause incorrect Tag Advance Grants to be sent by the RTI, causing STP violations in centralized coordination. I'm not sure why. It's very clear they are the issue because commenting out the content of handle_time_advance_notice in the RTI causes the STP violations to go away.


edwardalee

The cosmetic changes look good to me, but the one substantive change is probably not right. Before the change, a TAN would result in calling send_downstream_advance_grants_if_safe for all downstream federates, and after the change only for immediately downstream federates. But at worst, it should be harmless to call it for all downstream federates because send_downstream_advance_grants_if_safe calls send_advance_grant_if_safe, which checks for each federate whether it is actually safe to send a TAG. Since the docs for that latter function say clearly it should be called on all downstream federates, I suspect there was a reason for that. I suggest reverting this change and merging in the cosmetic changes.
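The difference between the two variants can be sketched as a walk over the federate graph. This is a hypothetical illustration: the topology is a fixed adjacency matrix, and check_grant stands in for send_advance_grant_if_safe (here it only counts calls). A visited array keeps the transitive walk finite when federates form a cycle. Because each call only checks whether a TAG is safe, visiting every transitively downstream federate costs extra checks but should not by itself send an unsafe grant.

```c
#include <stdbool.h>

#define NUM_FEDERATES 4  /* hypothetical federation size */

/* downstream[i][j] is true if federate j is immediately downstream of i. */
static bool downstream[NUM_FEDERATES][NUM_FEDERATES];

/* Stand-in for send_advance_grant_if_safe(): only counts checks. */
static int grant_checks[NUM_FEDERATES];
static void check_grant(int fed) { grant_checks[fed]++; }

/* Immediate-only variant: checks the direct successors of `fed`. */
void notify_immediate_downstream(int fed) {
    for (int j = 0; j < NUM_FEDERATES; j++) {
        if (downstream[fed][j]) check_grant(j);
    }
}

/* Transitive variant: depth-first walk; `visited` prevents infinite
   recursion when federates form a cycle. */
static void visit_downstream(int fed, bool visited[]) {
    visited[fed] = true;
    for (int j = 0; j < NUM_FEDERATES; j++) {
        if (downstream[fed][j] && !visited[j]) {
            check_grant(j);
            visit_downstream(j, visited);
        }
    }
}

void notify_all_downstream(int fed) {
    bool visited[NUM_FEDERATES] = {false};
    visit_downstream(fed, visited);
}
```

For a cycle 0 → 1 → 2 → 0, notify_all_downstream(0) checks federates 1 and 2 once each, while notify_immediate_downstream(0) checks only federate 1.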

core/federated/RTI/rti.c

The race condition occurs when a NET message that a federate sent in a previous cycle crosses a message that the RTI is forwarding to that federate. This leaves the RTI with an incorrect view of the federate's NET.

edwardalee · 2022-04-09T15:42:49Z

The RTI, as defined in this branch, causes the following tests in federated to lock up and time out: LoopDistributedDouble.lf, PingPongDistributed.lf, LoopDistributedCentralized.lf. Looks like TANs are not being sent when they should be.

The new logic does two things:

1. When replacing a federate's NET with a larger value, the RTI double-checks that the federate has completed the previously promised NET.
2. When forwarding a message updates a federate's next event, the RTI now attempts to send TAGs and PTAGs.
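The two rules can be sketched as follows, assuming the RTI keeps, per federate, the last completed tag and its current view of the NET. The struct and function names are hypothetical, and the locking and message sending that the real RTI performs are omitted; the boolean results only indicate when a caller should try sending TAG/PTAG.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { int64_t time; uint32_t microstep; } tag_t;  /* assumed */

static int tag_compare(tag_t a, tag_t b) {
    if (a.time != b.time) return a.time < b.time ? -1 : 1;
    if (a.microstep != b.microstep) return a.microstep < b.microstep ? -1 : 1;
    return 0;
}

/* Hypothetical per-federate state kept by the RTI. */
typedef struct {
    tag_t completed;   /* last tag the federate reported as completed */
    tag_t next_event;  /* RTI's current view of the federate's NET */
} federate_view_t;

/* Rule 1: a larger NET replaces the old one only if the federate has
   completed the previously promised NET. Returns true if updated. */
bool update_next_event_on_net(federate_view_t* fed, tag_t net) {
    if (tag_compare(net, fed->next_event) > 0
            && tag_compare(fed->completed, fed->next_event) < 0) {
        return false;  /* keep the earlier, still-outstanding NET */
    }
    fed->next_event = net;
    return true;
}

/* Rule 2: forwarding a message may move the federate's next event earlier.
   Returns true if it did, in which case the caller should try TAG/PTAG. */
bool update_next_event_on_forward(federate_view_t* fed, tag_t message_tag) {
    if (tag_compare(message_tag, fed->next_event) < 0) {
        fed->next_event = message_tag;
        return true;
    }
    return false;
}
```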

Soroosh129 · 2022-06-03T22:37:30Z

@lhstrh @edwardalee While the number of changed lines appears large (+1,134 −697), it is mostly inflated by replacing tabs with spaces. I appreciate your indulgence for this long-overdue stylistic change :)

edwardalee

LGTM, except for one potential error raised in the code. Also, the PR is misnamed because it does much more than remove TAN messages. Perhaps "Remove TAN messages and record in-transit messages in the RTI"? I did a double take on in_transit_message_record_q_t not being a pointer, but then I realized it is a pair of pointers, so this seems reasonable to me.

core/federated/RTI/message_record/message_record.c

petervdonovan · 2022-06-04T21:50:55Z

core/federated/RTI/message_record/message_record.c

```diff
+    in_transit_message_record_t* head_of_in_transit_messages = (in_transit_message_record_t*)pqueue_peek(queue->main_queue);
+    while (head_of_in_transit_messages != NULL) { // Queue is not empty
+        // The message record queue is ordered according to the `time` field, so we need to check
+        // all records with the minimum `time` and find those that have the smallest tag.
```


If I understand correctly, this procedure (and the one above it) is complicated because pqueue priorities have to be 64 bits, which makes it hard to sort items by tag instead of by time. The implementation looks complicated, and it might have suboptimal time complexity in programs that frequently schedule events a microstep in the future.

I have already suggested that attempts to cram priorities into a single word might not be serving us very well; maybe now is a good time to reconsider that?
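The cost described above can be seen in a small sketch, assuming records come out of the queue ordered by time only (a single 64-bit priority), so ties on time must be broken by scanning microsteps. index_of_min_tag and tag_compare are hypothetical helpers, not the code in message_record.c, and an array ordered by time stands in for successive pqueue pops. The scan is linear in the number of records sharing the minimum time, which grows when many events are scheduled a microstep apart; a queue ordered by a full-tag comparator like tag_compare would yield the minimum tag directly.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int64_t time; uint32_t microstep; } tag_t;  /* assumed */

/* Full-tag ordering that a comparator-based queue could use directly. */
int tag_compare(tag_t a, tag_t b) {
    if (a.time != b.time) return a.time < b.time ? -1 : 1;
    if (a.microstep != b.microstep) return a.microstep < b.microstep ? -1 : 1;
    return 0;
}

/* records[0..n-1] are ordered by time only. Requires n >= 1.
   Returns the index of the smallest tag by scanning every record that
   shares the minimum time. */
size_t index_of_min_tag(const tag_t* records, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n && records[i].time == records[0].time; i++) {
        if (records[i].microstep < records[best].microstep) best = i;
    }
    return best;
}
```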

This has been part of a long-running and interesting conversation. Would you like to create an issue or a discussion for this?

core/federated/RTI/rti.h

Hotfix: Fixed an issue wher TAN was being sent incorrectly

3f66b56

Soroosh129 requested a review from edwardalee April 4, 2022 17:06

Soroosh129 changed the title ~~Hotfix: Fixed an issue wher TAN was being sent incorrectly~~ Hotfix: Fixed an issue where TAN messages were being sent incorrectly Apr 4, 2022

Slightly adjusted debug message

972fffb

edwardalee approved these changes Apr 4, 2022


Soroosh129 added 5 commits April 4, 2022 14:15

Prevent transitive TAGs upon receiving a TAN

83d8d00

Update completed based on TAN not next_event

8fe8d60

Update next_event unconditionally

54b31c8

Revert "Update next_event unconditionally"

ebb19dc

This reverts commit 54b31c8.

Revert "Update completed based on TAN not next_event"

b0e8500

This reverts commit 8fe8d60.

edwardalee requested changes Apr 5, 2022


core/federated/RTI/rti.c (outdated, resolved)

Soroosh129 and others added 2 commits April 5, 2022 13:02

Revert back to the transitive logic for TAN messages in the RTI

808a189

Address a race condition

5ac67c4


Soroosh129 added 15 commits May 27, 2022 09:35

Merge remote-tracking branch 'origin/main' into hotfix-TAN

3e2d329

Fix merge artifacts

e59095b

Fixed more merge artifacts

29c3f04

Do not exit

40b190e

Removed in_transit_message because it's too coarse-grained

77449b9

Moved handle_timed_message and adjusted logic a bit

8e18f27

Updated LF ref

76b1980

Tabs only

74d588e

Adjusted checks for TAG and PTAG

48f85bf

Comments

b0aace2

Intsert dummy events for TAN messages

bcd65af

Comments, FIXME, and a small fix

5aaafb0

Purge TAN. Instead, create dummy events if needed

44dc5a1

Updated LF ref

4082de9

Soroosh129 added 9 commits May 30, 2022 23:14

Converted tabs to spaces

9801a4f

Removed in_transit_message

15b2f43

Adjusted log messages

fe87760

Added a pqueue for in-transit messages

98dd2c2

Comments only

f26d342

Comments only

f46abe8

Updated LF ref

f257a3b

Updated LF ref

a228fb3

Fixed issue where a dummy event could be created at FOREVER

eaec0e6

Soroosh129 changed the title ~~Hotfix: Fixed an issue where TAN messages were being sent incorrectly~~ Remove TAN messages Jun 3, 2022

Soroosh129 requested review from edwardalee and lhstrh June 3, 2022 22:33

edwardalee requested changes Jun 4, 2022


core/federated/RTI/message_record/message_record.c (outdated, resolved)

core/federated/RTI/message_record/message_record.c (resolved)

Soroosh129 changed the title ~~Remove TAN messages~~ Remove TAN messages and record in-transit messages in the RTI Jun 4, 2022

petervdonovan reviewed Jun 5, 2022


Soroosh129 added 3 commits June 10, 2022 09:42

Comments only

b2baf96

Address comment from @edwardalee

f05be58

Updated LF ref

5ea5303

edwardalee approved these changes Jun 10, 2022


Soroosh129 force-pushed the hotfix-TAN branch from 0b1f196 to 5ea5303 June 10, 2022 20:52

Soroosh129 added 2 commits June 10, 2022 16:20

Merge remote-tracking branch 'origin/main' into hotfix-TAN

2b1d3ec

Updated LF ref

445b2c8

petervdonovan mentioned this pull request Jun 10, 2022

PQueue priorities sometimes need more bits #87 (open)

Fixed bizarre merge artifact

47c70ed

Soroosh129 merged commit c1eb156 into main Jun 11, 2022

Soroosh129 deleted the hotfix-TAN branch June 11, 2022 05:26

lhstrh changed the title ~~Remove TAN messages and record in-transit messages in the RTI~~ Removal of TAN messages and new capability to record in-transit messages in the RTI Jul 20, 2022

lhstrh added the enhancement Enhancement of existing feature label Jul 20, 2022

