
[1.2] CHIP_CONFIG_MAX_GROUP_ENDPOINTS_PER_FABRIC and Matter Bridge #36882

Closed
Christoph6848 opened this issue Dec 18, 2024 · 14 comments · Fixed by #36975

Comments

@Christoph6848

Reproduction steps

By definition, the option mentioned above sets the maximum number of endpoints per fabric that contain the Groups cluster. For a bridge that will, for example, support 50 dynamic endpoints, this has to be 50.

After changing this value, test case TC_RR_1.1 always fails (16 seems to be the highest value for which the TC still passes).
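For reference, this is roughly what such an override looks like in a project-level configuration header (a minimal sketch; the exact file name and build integration depend on the platform and are not taken from this issue):

```cpp
// Hypothetical CHIPProjectConfig.h override for a bridge exposing 50 dynamic
// endpoints, each of which implements the Groups cluster.
#define CHIP_CONFIG_MAX_GROUP_ENDPOINTS_PER_FABRIC 50
```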

Error

[1734507253.734478][55:58] CHIP:DMG: ],
[1734507253.734515][55:58] CHIP:DMG: 0x3 = "" (0 chars),
[1734507253.734548][55:58] CHIP:DMG: },
[1734507253.734577][55:58] CHIP:DMG: {
[1734507253.734607][55:58] CHIP:DMG: 0xfe = 6,
[1734507253.734634][55:58] CHIP:DMG: 0x1 = 274,
[1734507253.734659][55:58] CHIP:DMG: 0x2 = [
[1734507253.734686][55:58] CHIP:DMG: 3,
[1734507253.734720][55:58] CHIP:DMG: ],
[1734507253.734753][55:58] CHIP:DMG: 0x3 = "" (0 chars),
[1734507253.734789][55:58] CHIP:DMG: },
[1734507253.734820][55:58] CHIP:DMG: ],
[1734507253.734848][55:58] CHIP:DMG: },
[1734507253.735062][55:58] CHIP:DMG:
[1734507253.735086][55:58] CHIP:DMG: },
[1734507253.735345][55:58] CHIP:DMG:
[1734507253.735377][55:58] CHIP:DMG: ],
[1734507253.735607][55:58] CHIP:DMG:
[1734507253.735692][55:58] CHIP:DMG: SuppressResponse = true,
[1734507253.735721][55:58] CHIP:DMG: InteractionModelRevision = 11
[1734507253.735744][55:58] CHIP:DMG: }
[1734507253.759441][55:58] CHIP:EM: <<< [E:41133i S:17225 M:172345987 (Ack:188164374)] (S) Msg TX to 14:0000000012344321 [28DA] --- Type 0000:10 (SecureChannel:StandaloneAck)
[1734507253.759549][55:58] CHIP:IN: (S) Sending msg 172345987 on secure session with LSID: 17225
[1734507253.759747][55:58] CHIP:EM: Flushed pending ack for MessageCounter:188164374 on exchange 41133i
[1734507253.759852][55:58] CHIP:DL: HandlePlatformSpecificBLEEvent 32793
[MatterTest] 12-18 07:34:13.762 ERROR Exception occurred in test_TC_RR_1_1.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mobly/base_test.py", line 783, in exec_one_test
    test_method()
  File "/root/python_testing/matter_testing_support.py", line 1008, in async_runner
    return asyncio.run(body(*args, **kwargs))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/root/python_testing/TC_RR_1_1.py", line 482, in test_TC_RR_1_1
    await self.validate_group_table(num_fabrics_to_commission, fabric_unique_clients, group_table_written, fabric_table)
  File "/root/python_testing/TC_RR_1_1.py", line 690, in validate_group_table
    asserts.assert_equal(found_groups, len(group_table_written[client_idx]),
  File "/usr/local/lib/python3.10/dist-packages/mobly/asserts.py", line 68, in assert_equal
    _call_unittest_assertion(_pyunit_proxy.assertEqual,
  File "/usr/local/lib/python3.10/dist-packages/mobly/asserts.py", line 52, in _call_unittest_assertion
    raise signals.TestFailure(my_msg, extras=extras)
mobly.signals.TestFailure: Details=67 != 68 Found group count does not match written value., Extras=None

Bug prevalence

Blocker for certification

GitHub hash of the SDK that was being used

181b0cb

Platform

other

Platform Version(s)

v1.2.0.1

Type

Spec Compliance Issue

Anything else?

No response

Christoph6848 added the bug and needs triage labels Dec 18, 2024
@Christoph6848
Author

The log output above was captured with CHIP_CONFIG_MAX_GROUP_ENDPOINTS_PER_FABRIC=17.

@bzbarsky-apple
Contributor

@Christoph6848 Well, you need to change CHIP_CONFIG_MAX_GROUP_ENDPOINTS_PER_FABRIC in your bridge's config to have the right value, no?

@Christoph6848
Author

@Christoph6848 Well, you need to change CHIP_CONFIG_MAX_GROUP_ENDPOINTS_PER_FABRIC in your bridge's config to have the right value, no?

@bzbarsky-apple Yes, of course. We changed this value to 50 (because our bridge should support 50 dynamic endpoints and every endpoint has to support the Groups cluster). Afterwards we got a group table size of 200 (4*50). After filling the group table, reading the group table fails (it does not really fail, but not all groups are included in the response). It seems that the maximum packet size is reached and therefore some entries are missing.

Probably this is because of a NO_MEMORY error within the ReadGroupTable method of the Group Key Management server implementation. We based our implementation on tag v1.2.0.1 of the connectedhomeip repository.

@bzbarsky-apple
Contributor

If the list is too long, it won't fit in a single packet, but will get chunked into multiple packets, so that part should be ok....

@Christoph6848
Author

@bzbarsky-apple This is exactly the problem: the list will NOT be chunked. I assume the problem is in TLVWriter. I already changed mUseChainedBuffers to true, but in that case I ran into additional issues.

@Christoph6848
Author

It seems that the problem is in Engine.cpp: in BuildAndSendSingleReportData the writer is configured to hold only a single packet of SDU size. This writer is also used for ReadGroupTable, and therefore some entries are missing.

@bzbarsky-apple Could you please confirm this? I noticed that in the meantime (in versions newer than 1.2.0.1) some changes were made in this area so that the size of the buffer is increased.

@Christoph6848
Author

Attached is the call tree at the point where the error occurs.

[image: call tree]

Use case: read a group table containing more bytes than fit in a single packet. In this case, as mentioned, some entries are simply missing and the response is sent without them. I assume the problem is that RetrieveClusterData (Engine.cpp) does not return an error, and therefore the response is sent directly even though not all entries have been encoded.

We have to stay on the 1.2.0.1 branch, so it would be very nice to have a fix available based on this branch. With this bug we will not have a chance to get our certification.

@bzbarsky-apple
Contributor

This is exactly the problem: the list will NOT be chunked

Ah! Because ReadGroupTable is buggy... Fix coming up.

@Christoph6848
Author

This is exactly the problem: the list will NOT be chunked

Ah! Because ReadGroupTable is buggy... Fix coming up.

Thank you very much! Hopefully the fix will also be available based on tag v1.2.0.1. We have planned our certification for February...

@bzbarsky-apple
Contributor

It depends on how easy it is to backport the fix. You might need to do that part yourself....

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jan 7, 2025
…ectly.

We were not propagating out encoding status, so when we reached end of packet we
would just silently return CHIP_NO_ERROR instead of indicating we had more data
to encode.

Fixes project-chip#36882
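As a rough illustration of the failure mode the commit message describes (a simplified, hypothetical sketch, not the SDK's actual ReadGroupTable implementation; ExampleEntry is made up and the header paths may differ between SDK versions), a list attribute read through AttributeValueEncoder::EncodeList must propagate the per-item encode status so the reporting engine knows the list still needs to be chunked:

```cpp
#include <app/AttributeAccessInterface.h>
#include <lib/core/CHIPError.h>
#include <lib/support/CodeUtils.h>
#include <vector>

// Hypothetical stand-in for the real group table entry type.
struct ExampleEntry
{
    uint16_t groupId;
};

CHIP_ERROR ReadExampleList(const std::vector<ExampleEntry> & entries, chip::app::AttributeValueEncoder & aEncoder)
{
    return aEncoder.EncodeList([&entries](const auto & encoder) -> CHIP_ERROR {
        for (const auto & entry : entries)
        {
            // Buggy pattern: call encoder.Encode(...) and discard its status. When the
            // packet is full, Encode() fails, the loop keeps going, and the lambda still
            // returns CHIP_NO_ERROR -- the reporting engine then treats the list as
            // complete and sends a truncated report (the "67 != 68" failure above).
            //
            // Fixed pattern: propagate the status so the engine rolls back to the last
            // complete item and continues the list in the next ReportData packet.
            ReturnErrorOnFailure(encoder.Encode(entry.groupId));
        }
        return CHIP_NO_ERROR;
    });
}
```

The actual fix is in #36975; this sketch only shows why a silently swallowed encode error results in a shorter-than-expected GroupTable list.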
@Christoph6848
Author

Thank you very much for your support! The fix helps and was more or less easy to integrate into our software. You saved (for the moment) our project schedule.

Nevertheless, with CHIP_CONFIG_MAX_GROUP_ENDPOINTS_PER_FABRIC set to 50 we now get a timeout error, probably caused by the sheer number of KVS storage writes issued in rapid succession, which leads to a timeout even when the software runs on a powerful Linux PC. We use CHIPLinuxStorageIni.cpp for KVS storage. Probably I have to reduce "real" file system writes to make the test run successfully.

Or do you have another idea for this issue? If file system writes are disabled entirely (by commenting out the CommitConfig method), the test runs fine without any problems.

@bzbarsky-apple
Contributor

@Christoph6848 I think your actual storage needs to do something to throttle disk writes... CHIPLinuxStorageIni.cpp is not very efficient in that regard, as you note. :(

This issue keeps coming up every so often; maybe there is something the groups code could do to not stress storage so much, but there are no good suggestions for it yet.
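One hypothetical way to do that throttling (purely an illustrative sketch under the assumption that the application owns its PersistentStorageDelegate; the class name, members, and flushing policy are inventions, not an SDK facility) is to wrap the real delegate, buffer writes in RAM, and persist them in one batch from a timer or after a burst of operations:

```cpp
#include <lib/core/CHIPPersistentStorageDelegate.h>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Write-coalescing wrapper: reads fall through to the real storage unless a
// newer value is pending in RAM; writes are buffered; Flush() persists the
// buffered values in a single pass.
class CoalescingStorage : public chip::PersistentStorageDelegate
{
public:
    explicit CoalescingStorage(chip::PersistentStorageDelegate & inner) : mInner(inner) {}

    CHIP_ERROR SyncGetKeyValue(const char * key, void * buffer, uint16_t & size) override
    {
        auto it = mPending.find(key);
        if (it == mPending.end())
        {
            return mInner.SyncGetKeyValue(key, buffer, size);
        }
        if (size < it->second.size())
        {
            return CHIP_ERROR_BUFFER_TOO_SMALL;
        }
        std::memcpy(buffer, it->second.data(), it->second.size());
        size = static_cast<uint16_t>(it->second.size());
        return CHIP_NO_ERROR;
    }

    CHIP_ERROR SyncSetKeyValue(const char * key, const void * value, uint16_t size) override
    {
        const uint8_t * bytes = static_cast<const uint8_t *>(value);
        mPending[key].assign(bytes, bytes + size); // deferred to Flush()
        return CHIP_NO_ERROR;
    }

    CHIP_ERROR SyncDeleteKeyValue(const char * key) override
    {
        mPending.erase(key);
        return mInner.SyncDeleteKeyValue(key);
    }

    CHIP_ERROR Flush()
    {
        for (const auto & kv : mPending)
        {
            CHIP_ERROR err = mInner.SyncSetKeyValue(kv.first.c_str(), kv.second.data(),
                                                    static_cast<uint16_t>(kv.second.size()));
            if (err != CHIP_NO_ERROR)
            {
                return err;
            }
        }
        mPending.clear();
        return CHIP_NO_ERROR;
    }

private:
    chip::PersistentStorageDelegate & mInner;
    std::map<std::string, std::vector<uint8_t>> mPending;
};
```

Whether deferring commits like this is acceptable for certification (buffered data is lost if power fails before Flush()) is a separate question; the sketch only illustrates the point above about reducing how often the backing ini file gets rewritten.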

@Christoph6848
Author

@bzbarsky-apple It seems that even with persistent storage disabled entirely (by patching out the Commit method) on the target device (also a Linux system), the timeout still occurs during GroupKeySetWrite ....

I have attached logs of the TC-RR-1.1 run for the Test Harness and the DUT. I noticed a lot of "Dropped ..." messages; perhaps this is the root cause.

Test Harness Log:
tc_rr_11_timeout.log

DUT Log:
dut_tc_rr_11.log

Christoph6848 reopened this Jan 8, 2025
github-project-automation bot moved this from Done to In Progress in [Device Type] Bridge Jan 8, 2025
mergify bot closed this as completed in #36975 Jan 9, 2025
mergify bot closed this as completed in c459b64 Jan 9, 2025
github-project-automation bot moved this from In Progress to Done in [Device Type] Bridge Jan 9, 2025
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jan 9, 2025
…ectly. (project-chip#36975)

We were not propagating out encoding status, so when we reached end of packet we
would just silently return CHIP_NO_ERROR instead of indicating we had more data
to encode.

Fixes project-chip#36882
@bzbarsky-apple
Contributor

I noticed a lot of "Dropped ..." messages; perhaps this is the root cause.

No, that's just saying the event (in the IM sense) buffer is full and old events are being evicted; that's normal. So something else is going on.

andy31415 pushed a commit that referenced this issue Jan 10, 2025
…ectly. (#36975) (#37019)

We were not propagating out encoding status, so when we reached end of packet we
would just silently return CHIP_NO_ERROR instead of indicating we had more data
to encode.

Fixes #36882
andy31415 pushed a commit that referenced this issue Jan 10, 2025
…ectly. (#36975) (#37020)

We were not propagating out encoding status, so when we reached end of packet we
would just silently return CHIP_NO_ERROR instead of indicating we had more data
to encode.

Fixes #36882
andy31415 pushed a commit that referenced this issue Jan 10, 2025
…ectly. (#36975) (#37021)

We were not propagating out encoding status, so when we reached end of packet we
would just silently return CHIP_NO_ERROR instead of indicating we had more data
to encode.

Fixes #36882
Alami-Amine pushed a commit to Alami-Amine/connectedhomeip that referenced this issue Jan 12, 2025
…ectly. (project-chip#36975)

We were not propagating out encoding status, so when we reached end of packet we
would just silently return CHIP_NO_ERROR instead of indicating we had more data
to encode.

Fixes project-chip#36882