Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[show] Add show ssd CLI #502

Closed
wants to merge 2 commits into from
Closed

Conversation

mudsut4ke
Copy link

- What I did
Add commands to display SSD information

  ssd_all               Execute all test iterms of SSD
  ssd_badblock          Check the later bad block status which may...
  ssd_capcity           Show ssd capcity
  ssd_firmwareinfo      Show ssd fwinfo
  ssd_health            Check the health status
  ssd_help              Show all SSD cmd
  ssd_pecycle           Show ssd P/E cycle
  ssd_remaintime        Show ssd Remaining Time
  ssd_sn                Show ssd serial number
  ssd_temperature       Read SSD temperature C

- How I did it
Add function to display ssd information to 'show/main.py'

- How to verify it

  1. Install smartmontool ([docker-platform-monitor]: Add smartmontools 6.6-1 sonic-buildimage#2703)
  2. Run show ssd command as above

- Previous command output (if the output of a command-line utility has changed)

admin@sonic:~$ show
Usage: show [OPTIONS] COMMAND [ARGS]...

  SONiC command line - 'show' command

Options:
  -?, -h, --help  Show this message and exit.

Commands:
  aaa                   Show AAA configuration
  acl                   Show ACL related information
  arp                   Show IP ARP table
  clock                 Show date and time
  ecn                   Show ECN configuration
  environment           Show environmentals (voltages, fans, temps)
  interfaces            Show details of the network interfaces
  ip                    Show IP (IPv4) commands
  ipv6                  Show IPv6 commands
  line                  Show all /dev/ttyUSB lines and their info
  lldp                  LLDP (Link Layer Discovery Protocol)...
  logging               Show system log
  mac                   Show MAC (FDB) entries
  mirror_session        Show existing everflow sessions
  mmu                   Show mmu configuration
  ndp                   Show IPv6 Neighbour table
  ntp                   Show NTP information
  pfc                   Show details of the priority-flow-control...
  platform              Show platform-specific hardware info
  priority-group        Show details of the PGs
  processes             Display process information
  queue                 Show details of the queues
  reboot-cause          Show cause of most recent reboot
  route-map             show route-map
  runningconfiguration  Show current running configuration...
  services              Show all daemon services
  startupconfiguration  Show startup configuration information
  system-memory         Show memory information
  tacacs                Show TACACS+ configuration
  techsupport           Gather information for troubleshooting
  uptime                Show system uptime
  users                 Show users
  version               Show version information
  vlan                  Show VLAN information
  warm_restart          Show warm restart configuration and state
  watermark             Show details of watermark

- New command output (if the output of a command-line utility has changed)

admin@sonic:~$ show
Usage: show [OPTIONS] COMMAND [ARGS]...

  SONiC command line - 'show' command

Options:
  -?, -h, --help  Show this message and exit.

Commands:
  aaa                   Show AAA configuration
  acl                   Show ACL related information
  arp                   Show IP ARP table
  clock                 Show date and time
  ecn                   Show ECN configuration
  environment           Show environmentals (voltages, fans, temps)
  interfaces            Show details of the network interfaces
  ip                    Show IP (IPv4) commands
  ipv6                  Show IPv6 commands
  line                  Show all /dev/ttyUSB lines and their info
  lldp                  LLDP (Link Layer Discovery Protocol)...
  logging               Show system log
  mac                   Show MAC (FDB) entries
  mirror_session        Show existing everflow sessions
  mmu                   Show mmu configuration
  ndp                   Show IPv6 Neighbour table
  ntp                   Show NTP information
  pcie_checkid          Chedk Vender ID and Devie ID Check
  pcie_help             Show all SSD cmd
  pcie_lnkspeed         Chedk the link speed
  pfc                   Show details of the priority-flow-control...
  platform              Show platform-specific hardware info
  priority-group        Show details of the PGs
  processes             Display process information
  queue                 Show details of the queues
  reboot-cause          Show cause of most recent reboot
  route-map             show route-map
  runningconfiguration  Show current running configuration...
  services              Show all daemon services
  ssd_all               Execute all test iterms of SSD
  ssd_badblock          Check the later bad block status which may...
  ssd_capcity           Show ssd capcity
  ssd_firmwareinfo      Show ssd fwinfo
  ssd_health            Check the health status
  ssd_help              Show all SSD cmd
  ssd_pecycle           Show ssd P/E cycle
  ssd_remaintime        Show ssd Remaining Time
  ssd_sn                Show ssd serial number
  ssd_temperature       Read SSD temperature C
  startupconfiguration  Show startup configuration information
  system-memory         Show memory information
  tacacs                Show TACACS+ configuration
  techsupport           Gather information for troubleshooting
  uptime                Show system uptime
  users                 Show users
  version               Show version information
  vlan                  Show VLAN information
  warm_restart          Show warm restart configuration and state
  watermark             Show details of watermark

@mudsut4ke
Copy link
Author

mudsut4ke commented Apr 8, 2019

@sandycelestica Please follow this PR

Copy link
Contributor

@jleveque jleveque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make 'ssd' a group and all ssd-related commands should be nested under that group. This will help consolidate the commands. For example,

show ssd all
show ssd badblock
show ssd capcity
...

Many examples of groups like this already exist in this file.

* [show] update ssd cli format
* [show] fix invalid capacity text
click.echo(line.strip())
return
else:
for line in output:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code block repeats 4 times. Function?



#
# function name : print_test_title
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why printing firmware version or capacity is a "test"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this part later.If you have the detail request of the content.Please let me know.I plan to display like this:"Show SSD Firmware info"


checkin = 1
testname = "Vender ID and Devie ID Check Test"
command = "sudo lspci -n | grep b960 "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is b960? vid? pid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b960 is the switch chip(BCM56960) of Seastone. Here we just check switch chip info.

else:
checkin = 0
click.echo(line.strip())
click.echo("Device Model Not Match 3ME3 3ME4 or 3IE3")
Copy link
Collaborator

@andriymoroz-mlnx andriymoroz-mlnx Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to extend the code if disk vendor is not one from the list?
upd:
to make it extendable consider using plugins model
see: https://github.com/Azure/SONiC/wiki/SSD-Health-design-(Draft)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggest. Currently this SSD diag is just for Seastone. So not support other types of SSD right now. And we are planning to extend it.

Copy link
Collaborator

@andriymoroz-mlnx andriymoroz-mlnx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show utility is mostly a wrapper
I suggest to have a separate utility (like sfputil, psuutil, intfutil, etc) and call it from show

@mudsut4ke
Copy link
Author

@sandycelestica, Please check the above comment

checkin = 1

if(checkin == 1):
remainingtime = poweron * 100 / 7.62 - poweron
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remainingtime = poweron * 100 / 7.62 - poweron
remainingtime = poweron * (100 / 7.62 - 1)
remainingtime = poweron * 12.12
what is remaining time for SSD?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a formula to calculate this value:
ID Hex Attribute

09 (09) Power on Hours
To estimate remaining hours of use: ID# 9 (100/7.26) – 1 (x ID# 9)

In our code,
if "Power_On_Hours" in line:
rawval = line.split()[-1]
poweron = int(rawval)
checkin = 1

we get the poweron first and then calculate the remaining time follow the formula.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't understand why remaining time is calculated as poweron*12
why remaining is always greater?

@jleveque
Copy link
Contributor

@mudsut4ke, @sandycelestica: Are you aware of this in-progress design for SSD health: https://github.com/Azure/SONiC/wiki/SSD-Health-design-(Draft)?

@sandycelestica
Copy link
Contributor

show utility is mostly a wrapper
I suggest to have a separate utility (like sfputil, psuutil, intfutil, etc) and call it from show

From the beginning we discussed with Guo Han. He suggested to modify the main.py in show. So we merged it into show.

@mudsut4ke
Copy link
Author

mudsut4ke commented Apr 10, 2019

Add @grantshao Please help Sandy review this change

@mudsut4ke
Copy link
Author

show utility is mostly a wrapper
I suggest to have a separate utility (like sfputil, psuutil, intfutil, etc) and call it from show

From the beginning we discussed with Guo Han. He suggested to modify the main.py in show. So we merged it into show.

@lguohan, Do you have a comment ?

checkin = 1

if(checkin == 1):
remainingtime = poweron * 100 / 7.62 - poweron
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't understand why remaining time is calculated as poweron*12
why remaining is always greater?


@ssd.command()
@click.argument("device")
def all(device):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all() can call other functions and not duplicate the code...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK,I'll update it as your suggestion.Thanks



#
# check_pcie_speed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to have pcie in a separate PR
or at least fix the subject...

Copy link
Collaborator

@liat-grozovik liat-grozovik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general comment i think it is bad practice to add a CLI which can work on specific platforms.
The idea is that the code/utility provided can be extended to other vendors at any time and for that it should be designed to be like that.

Also, as this is only a utility I believe it is only show the current value which is good but it is partially. I think it should be added as a new API to the Platform API as a component and can be monitored as well. for example if some threshold expires we raise an error so the user/admin can handle.

In addition as it is relevant to multiple vendors we usually discuss the design before the PR in the community meeting or at least share a design doc. Can you please share?

@liat-grozovik
Copy link
Collaborator

We had an SSD HLD in the last community meeting. I think merging this CLI now will contradict the design held. The design is also referred earlier on this PR discussion.
My suggestion is to close the PR and to wait for the SSD implementation by Andiry Moroz.

@andriymoroz-mlnx could you please share the design document in https://github.com/Azure/SONiC/tree/master/doc as PRfollowing the review comments?

@andriymoroz-mlnx
Copy link
Collaborator

@andriymoroz-mlnx could you please share the design document in https://github.com/Azure/SONiC/tree/master/doc as PRfollowing the review comments?

Created https://github.com/Azure/SONiC/pull/378/files

@mudsut4ke mudsut4ke closed this May 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants