-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1.2] CHIP_CONFIG_MAX_GROUP_ENDPOINTS_PER_FABRIC and Matter Bridge #36882
Comments
The log output is related to the value CHIP_CONFIG_MAX_GROUP_ENDPOINTS_PER_FABRIC=17 |
@Christoph6848 Well, you need to change CHIP_CONFIG_MAX_GROUP_ENDPOINTS_PER_FABRIC in your bridge's config to have the right value, no? |
@bzbarsky-apple Yes of course. We changed this value to 50 (due o the fact that our bridge should support 50 dynamic endpoints and every endpoint has to support the groups cluster). Afterwards we got a group table size of 200( 4*50). After filling the group table the read group table request will fail (no really fails but not all groups will be added to the response). It seems that max packet size is reached and therefore some entries are missing!! Probably this is because of a NO_MEMORY error within ReadGroupTable method of group key management server implementation. We based our implementation on Tag v1.2.0.1 of connectedhomeip repository. |
If the list is too long, it won't fit in a single packet, but will get chunked into multiple packets, so that part should be ok.... |
@bzbarsky-apple this is exactly the problem that the list will NOT be chunked. I assume the problem in TLVWriter. I changed already mUseChainedBuffers to true but it seems that in this case I got some more issues. |
It seens that the problem is in Engine.cpp .... In BuildAndSendSingleReportData the writer is configured to support only one Packet wih SDU size. This will be used for ReadGroupTable and therefore some entries are missing. @bzbarsky-apple Could you please confirm this?? I recognized that in the meantime (newer version than 1.2.0.1) some changes are done in the area so the size of the buffer is increased. |
Attached the calling tree when error occurs Use Case: Read the group table containing more bytes than supported by a single package. In this case as mentioned just some entries are missing and the resposne is send w/o this. I assume the problem is that RetrieveClusterdData (Engine.cpp) does not return an error and therefore the response is send directly despite the fact not all entries are encoded. We have to stay on 1.2.0.1 branch therefore it would be very nice to have a fix available based on this branch. With the bug we will not have the chance to get our certification. |
Ah! Because |
Thank you very much!!! Hopefully than also based on Tag v1.2.0.1. We planned our certification to be done on February ..... |
It depends on how easy it is to backport the fix. You might need to do that part yourself.... |
…ectly. We were not propagating out encoding status, so when we reached end of packet we would just silently return CHIP_NO_ERROR instead of indicating we had more data to encode. Fixes project-chip#36882
Thank you very much for your support!!! The fix helps and was more or less easy to integrate in our software. You saved (for the moment) our project schedule. Nevetheless it seems that in case of CHIP_CONFIG_MAX_GROUP_ENDPOINTS_PER_FABRIC 50 now we got an timeout error probably due to very, very fast KVS storage writes which are even in case of running the software on a powerful Linux PC leads to a timeout. We used the CHIPLinuxStorageIni.cpp for KVS storage. Probably I have to reduce "real" file system writes to run the test successful. Or do you have another idea for this issue?? In case of disabling file system writes at all (comment the CommitConfig method) the test runs fine w/o any problems. |
@Christoph6848 I think your actual storage needs to do something to throttle disk writes... CHIPLinuxStorageIni.cpp is not very efficient that way, as you note. :( This issue keeps coming up every so often; maybe there is something groups could do to not stress storage so much, but no good suggestions for it yet. |
@bzbarsky-apple It seems that also in case of disabling persistent storage at all (by patching the Commit Message) on target device (also a Linux system) the timeout occurs during GroupKeySetWrite .... I attached a Log of the TC-RR-1.1 run for Test harness and the DUT. I recognized a lot of "Dropped ..." messages perhaps this is the root cause. Test Harness Log: DUT Log: |
…ectly. (project-chip#36975) We were not propagating out encoding status, so when we reached end of packet we would just silently return CHIP_NO_ERROR instead of indicating we had more data to encode. Fixes project-chip#36882
…ectly. (project-chip#36975) We were not propagating out encoding status, so when we reached end of packet we would just silently return CHIP_NO_ERROR instead of indicating we had more data to encode. Fixes project-chip#36882
…ectly. (project-chip#36975) We were not propagating out encoding status, so when we reached end of packet we would just silently return CHIP_NO_ERROR instead of indicating we had more data to encode. Fixes project-chip#36882
…ectly. (project-chip#36975) We were not propagating out encoding status, so when we reached end of packet we would just silently return CHIP_NO_ERROR instead of indicating we had more data to encode. Fixes project-chip#36882
No, that's just saying the event (in the IM sense) buffer is full and old events are being evicted; that's normal. So something else is going on. |
…ectly. (project-chip#36975) We were not propagating out encoding status, so when we reached end of packet we would just silently return CHIP_NO_ERROR instead of indicating we had more data to encode. Fixes project-chip#36882
Reproduction steps
Due to definition the definition mentioned above indicates the maximum number of EPs containing the Groups Cluster. For a Bridge (e.g. which will support 50 dynamic Endpoints) this has to be 50.
After changing this the TC TC_RR_1.1 will always fail (it seems that 16 is the limit until the TC will PASS).
Error
[1734507253.734478][55:58] CHIP:DMG: ], [1734507253.734515][55:58] CHIP:DMG: 0x3 = "" (0 chars), [1734507253.734548][55:58] CHIP:DMG: }, [1734507253.734577][55:58] CHIP:DMG: { [1734507253.734607][55:58] CHIP:DMG: 0xfe = 6, [1734507253.734634][55:58] CHIP:DMG: 0x1 = 274, [1734507253.734659][55:58] CHIP:DMG: 0x2 = [ [1734507253.734686][55:58] CHIP:DMG: 3, [1734507253.734720][55:58] CHIP:DMG: ], [1734507253.734753][55:58] CHIP:DMG: 0x3 = "" (0 chars), [1734507253.734789][55:58] CHIP:DMG: }, [1734507253.734820][55:58] CHIP:DMG: ], [1734507253.734848][55:58] CHIP:DMG: }, [1734507253.735062][55:58] CHIP:DMG: [1734507253.735086][55:58] CHIP:DMG: }, [1734507253.735345][55:58] CHIP:DMG: [1734507253.735377][55:58] CHIP:DMG: ], [1734507253.735607][55:58] CHIP:DMG: [1734507253.735692][55:58] CHIP:DMG: SuppressResponse = true, [1734507253.735721][55:58] CHIP:DMG: InteractionModelRevision = 11 [1734507253.735744][55:58] CHIP:DMG: } [1734507253.759441][55:58] CHIP:EM: <<< [E:41133i S:17225 M:172345987 (Ack:188164374)] (S) Msg TX to 14:0000000012344321 [28DA] --- Type 0000:10 (SecureChannel:StandaloneAck) [1734507253.759549][55:58] CHIP:IN: (S) Sending msg 172345987 on secure session with LSID: 17225 [1734507253.759747][55:58] CHIP:EM: Flushed pending ack for MessageCounter:188164374 on exchange 41133i [1734507253.759852][55:58] CHIP:DL: HandlePlatformSpecificBLEEvent 32793 [MatterTest] 12-18 07:34:13.762 ERROR Exception occurred in test_TC_RR_1_1. Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/mobly/base_test.py", line 783, in exec_one_test test_method() File "/root/python_testing/matter_testing_support.py", line 1008, in async_runner return asyncio.run(body(*args, **kwargs)) File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run return loop.run_until_complete(main) File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete return future.result() File "/root/python_testing/TC_RR_1_1.py", line 482, in test_TC_RR_1_1 await self.validate_group_table(num_fabrics_to_commission, fabric_unique_clients, group_table_written, fabric_table) File "/root/python_testing/TC_RR_1_1.py", line 690, in validate_group_table asserts.assert_equal(found_groups, len(group_table_written[client_idx]), File "/usr/local/lib/python3.10/dist-packages/mobly/asserts.py", line 68, in assert_equal _call_unittest_assertion(_pyunit_proxy.assertEqual, File "/usr/local/lib/python3.10/dist-packages/mobly/asserts.py", line 52, in _call_unittest_assertion raise signals.TestFailure(my_msg, extras=extras) mobly.signals.TestFailure: Details=67 != 68 Found group count does not match written value., Extras=None
Bug prevalence
Blocker for certification
GitHub hash of the SDK that was being used
181b0cb
Platform
other
Platform Version(s)
v1.2.0.1
Type
Spec Compliance Issue
Anything else?
No response
The text was updated successfully, but these errors were encountered: