Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #928 and #929 - Modularize software bus routing, add msg map hash #947

Merged

Conversation

skliper
Copy link
Contributor

@skliper skliper commented Oct 13, 2020

Describe the contribution
Fix #928 - refactor SB and move routing to a module
Fix #929 - adds message map hash implementation
Partial #948 - fixed double lock potential

Key concepts:

  • added SBR module
  • Message map and routing table are in SBR, access APIs on SB side
  • SB still owns destination logic
  • no more route index or pointers being passed around, all route id and message id based
  • hash message map "oversized" (4x) to minimize collisions (only ~10% single collisions on the real system with a full routing table)
    • added debug event for collisions during add
  • dropped routing push/pop, dropped "key" in direct implementation
  • around a 10-20% performance hit to hash via rough comparison on a linux box, no memory impact from message id "space", just based on needed routes (with overhead)
  • hash designed for 32 bit, a change in CFE_SB_MsgId_Atom_t size may require implementation updates

Remaining work:

  • Unit test stubs (could split into new PR), 2 related TODOs in code

Also

  • Deletes unused code CFE_SB_FindGlobalMsgIdCnt
  • Note writes to file not protected by lock (before or after refactor)
  • CFE_SB_SendPrevSubs isn't really protected by the lock... might be worth removing the locks completely (they don't really solve anything, double lock potential eliminated
    • could just use the private CFE_SBR_ForEachRouteId without a lock instead of messages and reduce all the extra logic
  • Whitespace/alignment is very difficult to work with
  • Excessive comments (in my opinion)
  • violates declare variables at the start of function coding standard, all observed violations fixed

Testing performed

  • All unit tests pass, rough performance testing, spot checked routing table size impacts on core with nominal (256), 64, and 32. 32 causes dropped subscriptions/full table and was useful for observing collisions (~10% one level seen) on a realistic set of message ids. Everything performed as expected.
  • Passed bundle CI at https://travis-ci.com/github/skliper/cFS/builds/191437587

Expected behavior changes

  • Individual events for deleting destinations when deleting a pipe removed to avoid a race condition
  • It's resource intensive to try to report message ID map in message ID order (either search or brute force) for any significantly sized map on an unsorted routing table, allowing out of order makes this reasonable again and gets rid of the need for ForEachMsgId API. Need to trade out of order vs implementing a search.

System(s) tested on

  • Hardware: cFS Dev Server
  • OS: Ubuntu 18.04
  • Versions: bundle + this commit

Additional context
None

Third party code
None

Contributor Info - All information REQUIRED for consideration of pull request
Jacob Hageman - NASA/GSFC

@skliper skliper added the CCB:Ready Ready for discussion at the Configuration Control Board (CCB) label Oct 13, 2020
@skliper
Copy link
Contributor Author

skliper commented Oct 13, 2020

Ready for proof of concept discussion...

@skliper skliper force-pushed the fix928-route-direct-module branch from 48abc44 to 80bce1a Compare October 13, 2020 22:09
@skliper
Copy link
Contributor Author

skliper commented Oct 14, 2020

CCB 2020-10-14: Delayed discussion, will splinter or discuss next week.

@skliper
Copy link
Contributor Author

skliper commented Oct 14, 2020

Worth noting this is a step in the direction of implementing just the direct lookup (no longer has the extra abstraction layers for other implementations). The hash implementation will have the extra hash layer, but no need to maintain it in the direct lookup implementation. This results in simplification of the direct lookup implementation.

@skliper skliper requested review from a user and jwilmot October 14, 2020 17:02
Copy link
Contributor

@jphickey jphickey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks pretty good ... nice cleanup/simplification to SB.

Most critical is really to get the API right, then the rest of the implementation can fall from that.

I'd also like to seriously consider changing the list structure to be circular in nature (where the end of the list points back to the beginning again, rather than NULL) as this makes the code much simpler.

fsw/cfe-core/src/sb/cfe_sbr.h Outdated Show resolved Hide resolved
fsw/cfe-core/src/sb/cfe_sbr.h Outdated Show resolved Hide resolved
fsw/cfe-core/src/sb/cfe_sb_priv.c Show resolved Hide resolved
@skliper
Copy link
Contributor Author

skliper commented Oct 15, 2020

Trade removing the individual unsubscribe events when deleting a pipe to avoid having to unlock which introduces a possible race (old implementation unlocked so could race). Would be messy to store all the messages up to send at the end after the unlock, other suggestions welcome.

@astrogeco astrogeco removed the CCB:Ready Ready for discussion at the Configuration Control Board (CCB) label Oct 15, 2020
@skliper
Copy link
Contributor Author

skliper commented Oct 16, 2020

Implementation/testing update - add a debug event for collisions (on subscribe), added hash implementation.

Enabled hash and there were no collisions with the nominal routing table size. Reduced table size to 32 and there were only 4 single level collisions so the factor of 4 on the hash table seems very reasonable for handing a full routing table (and the hash performed well). The messages past 32 (there's more like 40 or so in the framework) got rejected as expected.

Performing minor cleanup on the test cases (fails for size 32 routing table). Some event counts could theoretically change due to the new debug message but none seen in practice. Could switch subscribe test event count asserts to be >= to handle it, but as long as the hash function isn't changed and the routing table isn't incredibly small (16 or less) they will work fine.

@skliper
Copy link
Contributor Author

skliper commented Oct 20, 2020

Brute force ForEachMsgId is unreasonable for any significantly large MsgId space for the unsorted routing table implementation. Search would also be resource intensive for larger routing tables... only behavior difference from ForEachRouteId is order, propose removing ForEachMsgId (only used for message map reporting, could be sorted offline if needed). Note for sorted routing tables, the behavior of the two APIs is the same (in msgid order).

EDIT - specified this difference is only for unsorted implementation

@skliper skliper added the CCB:Ready Ready for discussion at the Configuration Control Board (CCB) label Oct 20, 2020
@excaliburtb
Copy link

tagging self

- Removed all index/pointer accesses of message map and routing table
- All references now via IDs and APIs
- Note SB still owns destination logic (unchanged linear linked list)
- Limited whitespace fixes for readability
- Resolved observed instances of variables not declared at the
  start of functions
- Cleaned comments
- Resolved potential double locks in CFE_SB_SendPrevSubs
- Route and message write to file no longer guaranteed in msgid
  order to maintain performance for large msgid space implementations
- Removed unused CFE_SB_FindGlobalMsgIdCnt
- Clarified CFE_PLATFORM_SB_MAX_MSG_IDS config param description
- Eliminated potential race in CFE_SB_PipeDeleteFull
- Individual destination removal debug events no longer reported
  during a CFE_SB_PipeDeleteFull
@skliper skliper force-pushed the fix928-route-direct-module branch from eed8c3a to 7bb7585 Compare October 21, 2020 13:52
@skliper
Copy link
Contributor Author

skliper commented Oct 21, 2020

Refactored commits for review. Ready.

@skliper skliper changed the title (WIP) Fix #928 - Modularize software bus routing Fix #928 and #929 - Modularize software bus routing, add msg map hash Oct 21, 2020
@skliper skliper marked this pull request as ready for review October 21, 2020 13:55
- Implementation for direct map and unsorted routing table
- Includes full coverage tests
- Removed msg key and route stack concepts from direct map
- Message map size based on used routes
- Oversized (4x) to limit collisions while retaining
  resonable size related to routing table (still smaller)
- ~10% single collisions seen for full routing table
  with realistic message ID use
- Oversizing means map can never fill, simplifies logic
- Observed approximately 10%-20% performance hit, trade
  against memory use (can now use full 32 bit MsgId space)
- Hash intended for 32 bit, if CFE_SB_MsgId_Atom_t size
  changes may require modification to hash
- Also added full coverage unit tests
- Linking SBR with SB unit test, not stubbed
- Confirms matching functionality (with updates for
  intended changes)
@skliper skliper force-pushed the fix928-route-direct-module branch from 7bb7585 to 9b56d52 Compare October 21, 2020 14:03
@skliper
Copy link
Contributor Author

skliper commented Oct 21, 2020

Squashed cppcheck warning.

@astrogeco
Copy link
Contributor

astrogeco commented Oct 21, 2020

CCB 2020-10-21 APPROVED

Conversation about collisions. System can be sized appropriately to avoid collisions. Users can deterministically predict the bound by appropriately choosing the MsgIDs. There's a debug event upon subscription which alerts about collisions.

Open issue to add documentation about ideal usage and predetermining IDs to avoid collisions.

Because it's a module, users can replace the algorithm for efficient. QUESTION: How is this set up in the build system?

Make separate PR for stubs.

@skliper
Copy link
Contributor Author

skliper commented Oct 21, 2020

Issues submitted, see #961 and #962.

@astrogeco astrogeco removed the CCB:Ready Ready for discussion at the Configuration Control Board (CCB) label Oct 21, 2020
@astrogeco astrogeco changed the base branch from main to integration-candidate October 21, 2020 22:41
@astrogeco astrogeco merged commit 12f2ff1 into nasa:integration-candidate Oct 21, 2020
astrogeco added a commit to nasa/cFS that referenced this pull request Oct 21, 2020
* Note: algorithm designed for a 32 bit int, changing the size of
* CFE_SB_MsgId_Atom_t may require an update to this impelementation
*/
CFE_SB_MsgId_Atom_t CFE_SBR_MsgIdHash(CFE_SB_MsgId_t MsgId)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should an internal helper function be static?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly could be. I'm not consistently strict about static since they are occasionally useful from a coverage test point of view, but it's not necessary in this case. Happy to change if you want to put in an issue.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good.
I'm a little less clear about the organization. We have the fsw/cfe-core directory with the cFE subsystems, then we have the modules directory that has the msg, sbr, cfe_assert, cfe_testcase, and cfe_testrunner.

In the mission_defaults.cmake we have cfe-core, osal, PSP, msg, and sbr. Seems like msg and sbr are peers to the cfe-core subsystems, (sb, es, tbl, eve, time) or maybe even subsystems of SB. I know mission_defaults.cmake does not define the architecture, but it leaves the impression to me that these modules are peers to osal, psp, etc.

The routing module has a common file and APIs, then it has optional (hash or direct) modules. Is the intention to allow the replacement of everything with a different implementation? Or would the common files (in unsorted.c) always be used? If these common functions in unsorted will always be used, I would argue for leaving them in the SB directory and having modules for hash and direct routing.

In the end, I think we need a big reorganization of the code to reflect the evolving nature of how things go together. But at a minimum I think it would be nice at least put the sbr and msg modules in the "fsw" directory next to "cfe-core":
Then we would have:
fsw/
fsw/cfe-core
fsw/modules
fsw/modules/sbr
fsw/modules/msg

or maybe even fsw/cfe-core/modules/...

@skliper
Copy link
Contributor Author

skliper commented Oct 27, 2020

The routing module has a common file and APIs, then it has optional (hash or direct) modules. Is the intention to allow the replacement of everything with a different implementation?

Yes. That's the real difference from the core, the module can be completely replaced with a custom implementation.

Or would the common files (in unsorted.c) always be used?

Not if someone implements a sorted routing table with a search instead of a message map.

In the end, I think we need a big reorganization of the code to reflect the evolving nature of how things go together.

Agreed! opened #972 to discuss. I like the msg/sbr modules and core being peers, and eventually all the cfe elements could become modules (someone could entirely replace table services or time services for example). It's been an as-needed process so far though.

@skliper skliper deleted the fix928-route-direct-module branch February 1, 2021 22:08
@skliper skliper added this to the 7.0.0 milestone Sep 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add message id to message key hash option Move SB route lookup (including insert method) to a module
4 participants