Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[watermarkorch] add watermarkorch, extend queue and pg counters with wat… #629

Merged
merged 5 commits into from
Nov 23, 2018

Conversation

mykolaf
Copy link
Collaborator

@mykolaf mykolaf commented Sep 26, 2018

…ermark stats

What I did

  • Add new orch - watermarkorch
  • add new flex counter groups - QUEUE_WATERMARK, PG_WATERMARK
  • add new queue & pg flex plugins - watermark_queue.lua, watermark_pg.lua
    Why I did it
    This is new code forthe watermarks feature
    High Level Design Document
    How I verified it
    Manually verified on mlnx switch.
    Details if related

@mykolaf mykolaf force-pushed the wm branch 4 times, most recently from 489ab81 to e2df6cf Compare September 27, 2018 18:40
@lguohan
Copy link
Contributor

lguohan commented Oct 3, 2018

retest this please

@mykolaf mykolaf force-pushed the wm branch 2 times, most recently from adf20ae to ef7a81f Compare October 12, 2018 09:18
Mykola Faryma added 2 commits October 12, 2018 09:30
…ermark stats

Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
@mykolaf mykolaf force-pushed the wm branch 3 times, most recently from b7a9a41 to de001b4 Compare October 16, 2018 11:37
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
@mykolaf
Copy link
Collaborator Author

mykolaf commented Oct 16, 2018

Depends on sonic-utilities sonic-net/sonic-utilities#327

@wendani wendani self-requested a review October 16, 2018 16:59
m_telemetryInterval = to_uint<uint32_t>(i.second.c_str());
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else case of not supported key?

{
counters_stream << delimiter << sai_serialize_ingress_priority_group_stat(it);
delimiter = comma;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add the pg watermark stats here, they will be enabled if the corresponding "FLEX_COUNTER_TABLE|FLEX_COUNTER_STATUS" is "enabled" in the CONFIG_DB. We may take the similar approach as in the PFCWD that we listen to the SET_COMMAND and DEL_COMMAND to be able to enable and disable polling watermark stats at run time. Similar case for the queue watermark stats above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The syncd has the logic for listening to FC STATUS in place:
https://github.com/Azure/sonic-sairedis/blob/master/syncd/syncd.cpp#L2767
There is also the ability to enable/disable WM polling via counterpoll utility:
https://github.com/Azure/sonic-utilities/blob/614dbeec6a0bb99c0020e276f95cfe60bc6a1a60/counterpoll/main.py#L99

#include "timer.h"

extern "C" {
#include "sai.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orch.h is already including sai.h

void doTask(NotificationConsumer& consumer);
void doTask(SelectableTimer &timer);

void init_pg_ids();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we follow same format for all member functions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@@ -505,6 +506,7 @@ void PfcWdSwOrch<DropHandler, ForwardHandler>::registerInWdDb(const Port& port,
if (!c_queueStatIds.empty())
{
string str = counterIdsToStr(c_queueStatIds, sai_serialize_queue_stat);
queueFieldValues.emplace_back(STATS_MODE_FIELD, STATS_MODE_READ);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for PORT_STATS, QUEUE_STATS, PG_WATERMARK_STATS, and QUEUE_WATERMARK_STATS, STATS_MODE is set to the corresponding GROUP_TABLE. For PFCWD, it is set to FLEX_COUNTER_TABLE. Why the difference?

Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>

vector<FieldValueTuple> fieldValues;
fieldValues.emplace_back(QUEUE_PLUGIN_FIELD, queueWmSha);
fieldValues.emplace_back(POLL_INTERVAL_FIELD, QUEUE_WATERMARK_FLEX_STAT_COUNTER_POLL_MSECS);
Copy link
Contributor

@wendani wendani Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into the flex counter thread in syncd. Before I get more acquainted, a general question in my mind is that each flex counter table has its own polling interval. How does the flex counter thread in syncd make sure each one gets polled and serviced just on time when the interval expires?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the code in sycnd. flex counter thread is per flex counter table. So it makes sense to me now.

m_pgTable = unique_ptr<Table>(new Table(m_counter_db.get(), COUNTERS_PG_NAME_MAP));
m_pgPortTable = unique_ptr<Table>(new Table(m_counter_db.get(), COUNTERS_PG_PORT_MAP));
m_pgIndexTable = unique_ptr<Table>(new Table(m_counter_db.get(), COUNTERS_PG_INDEX_MAP));

m_flex_db = shared_ptr<DBConnector>(new DBConnector(FLEX_COUNTER_DB, DBConnector::DEFAULT_UNIXSOCKET, 0));
m_flexCounterTable = unique_ptr<ProducerTable>(new ProducerTable(m_flex_db.get(), FLEX_COUNTER_TABLE));
m_flexCounterGroupTable = unique_ptr<ProducerTable>(new ProducerTable(m_flex_db.get(), FLEX_COUNTER_GROUP_TABLE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general question: why the flex counter table and the flex counter group table need to be ProducerTable? Qi mentioned that ProducerTable is a FIFO queue. So it is used when we care about the sequences the redis request is queued.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mykolaf We should use ProducerTable for flex counter table and flex counter group table. This is what used in the pfcwd. Just curious about the reason behind it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wenda Flex counter uses ConsumerTable, so we use the related producer. There is no specific reason for using it, it was just implemented that way.

@mykolaf mykolaf force-pushed the wm branch 2 times, most recently from cc3f266 to 3881120 Compare October 19, 2018 16:20
@qiluo-msft
Copy link
Contributor

retest this please

@lguohan
Copy link
Contributor

lguohan commented Oct 26, 2018

retest this please

@mykolaf
Copy link
Collaborator Author

mykolaf commented Oct 29, 2018

VS test dependencies:

  1. "set counter" infra (sonic-sairedis):
    Add VS support for setting stats via redis DB channel sonic-sairedis#366
  2. watermark CLI
    [watermarks] add watermarkstat, watermarkcfg and aliases sonic-utilities#327
    @wendani could you take a look?

Probably sonic-utilities pointer also needs to be updated in order to support WM CLI in the docker-vs)

@lguohan
Copy link
Contributor

lguohan commented Oct 31, 2018

retest this please

@mykolaf
Copy link
Collaborator Author

mykolaf commented Nov 2, 2018

@lguohan all 3 of the watermark tests passed, but some other recently introduced tests failed (

@lguohan
Copy link
Contributor

lguohan commented Nov 2, 2018

retest this please

@lguohan
Copy link
Contributor

lguohan commented Nov 2, 2018

@mykolaf , can you sign the cla?

Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
@mykolaf
Copy link
Collaborator Author

mykolaf commented Nov 2, 2018

@lguohan That was some bug, I already signed it 8 month ago.
Anyway, now it shows that it's signed.

vector<FieldValueTuple> pgPortVector;
vector<FieldValueTuple> pgIndexVector;

for (size_t pgIndex = 0; pgIndex < port.m_priority_group_ids.size(); ++pgIndex)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lossy PG does not have SAI_INGRESS_PRIORITY_GROUP_STAT_XOFF_ROOM_WATERMARK_BYTES. So the query value will always return 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on the SAI implementation of it. If the SAI returns some status other than success, the flex counter will mark the counter as unsupported and stop polling. For this cases we will show the value as 'N/A' in the output.
But having made a little test, I see that I get a 0 for lossy pg. Is this different on some other platform?
From my point of view, it seems logical that even lossy PG has a counter for headroom watermark, even if it has lossy profile applied. Looks like the structure of PG's counters doesn't care of pg being lossy/lossless.

@lguohan
Copy link
Contributor

lguohan commented Nov 21, 2018

retest this please

2 similar comments
@lguohan
Copy link
Contributor

lguohan commented Nov 21, 2018

retest this please

@lguohan
Copy link
Contributor

lguohan commented Nov 23, 2018

retest this please

@lguohan lguohan merged commit b750a4b into sonic-net:master Nov 23, 2018
@mykolaf mykolaf deleted the wm branch February 21, 2019 11:59
m_appDb.get(),
"WATERMARK_CLEAR_REQUEST");
auto clearNotifier = new Notifier(m_clearNotificationConsumer, this, "WM_CLEAR_NOTIFIER");
Orch::addExecutor(clearNotifier);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For user issued clear, why not just directly clear through the CLI but signal through an APPL_DB event channel, and delegate to WatermarkOrch to do so? What is the benefit here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed in the draft design of the feature, in current implementation it doesn't give any advantages.

if (pg_shared_wm) then
redis.call('HSET', periodic_table_name .. ':' .. KEYS[i], 'SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES', periodic_shared_wm and math.max(tonumber(pg_shared_wm), tonumber(periodic_shared_wm)) or pg_shared_wm)
redis.call('HSET', persistent_table_name .. ':' .. KEYS[i], 'SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES', persistent_shared_wm and math.max(tonumber(pg_shared_wm), tonumber(persistent_shared_wm)) or pg_shared_wm)
redis.call('HSET', user_table_name .. ':' .. KEYS[i], 'SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES', user_shared_wm and math.max(tonumber(pg_shared_wm), tonumber(user_shared_wm)) or pg_shared_wm)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of ternary operator is fancy, but we do not need to issue a redis hset command if the new value is no greater than the historic high, which is an expense operation in the context of a syncd flex counter thread.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
Need to mirror the neighbor solicitation packets - type 135.

Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this pull request Mar 1, 2023
…onic-net#629)

* add a README to tests directory to describe how to run 'make check'

* small spelling and grammar fix

Co-authored-by: Syd Logan <slogan621@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants