Skip to content

Commit

Permalink
powerpc/powernv: Read OPAL error log and export it through sysfs
Browse files Browse the repository at this point in the history
Based on a patch by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This patch adds support to read error logs from OPAL and export
them to userspace through a sysfs interface.

We export each log entry as a directory in /sys/firmware/opal/elog/

Currently, OPAL will buffer up to 128 error log records, we don't
need to have any knowledge of this limit on the Linux side as that
is actually largely transparent to us.

Each error log entry has the following files: id, type, acknowledge, raw.
Currently we just export the raw binary error log in the 'raw' attribute.
In a future patch, we may parse more of the error log to make it a bit
easier for userspace (e.g. to be able to display a brief summary in
petitboot without having to have a full parser).

If we have >128 logs from OPAL, we'll only be notified of 128 until
userspace starts acknowledging them. This limitation may be lifted in
the future and with this patch, that should "just work" from the linux side.

A userspace daemon should:
- wait for error log entries using normal mechanisms (we announce creation)
- read error log entry
- save error log entry safely to disk
- acknowledge the error log entry
- rinse, repeat.

On the Linux side, we read the error log when we're notified of it. This
possibly isn't ideal as it would be better to only read them on-demand.
However, this doesn't really work with current OPAL interface, so we
read the error log immediately when notified at the moment.

I've tested this pretty extensively and am rather confident that the
linux side of things works rather well. There is currently an issue with
the service processor side of things for >128 error logs though.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  • Loading branch information
stewartsmith authored and ozbenh committed Mar 7, 2014
1 parent 60b9622 commit 774fea1
Show file tree
Hide file tree
Showing 6 changed files with 394 additions and 1 deletion.
60 changes: 60 additions & 0 deletions Documentation/ABI/stable/sysfs-firmware-opal-elog
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
What: /sys/firmware/opal/elog
Date: Feb 2014
Contact: Stewart Smith <stewart@linux.vnet.ibm.com>
Description:
This directory exposes error log entries retrieved
through the OPAL firmware interface.

Each error log is identified by a unique ID and will
exist until explicitly acknowledged to firmware.

Each log entry has a directory in /sys/firmware/opal/elog.

Log entries may be purged by the service processor
before retrieved by firmware or retrieved/acknowledged by
Linux if there is no room for more log entries.

In the event that Linux has retrieved the log entries
but not explicitly acknowledged them to firmware and
the service processor needs more room for log entries,
the only remaining copy of a log message may be in
Linux.

Typically, a user space daemon will monitor for new
entries, read them out and acknowledge them.

The service processor may be able to store more log
entries than firmware can, so after you acknowledge
an event from Linux you may instantly get another one
from the queue that was generated some time in the past.

The raw log format is a binary format. We currently
do not parse this at all in kernel, leaving it up to
user space to solve the problem. In future, we may
do more parsing in kernel and add more files to make
it easier for simple user space processes to extract
more information.

For each log entry (directory), there are the following
files:

id: An ASCII representation of the ID of the
error log, in hex - e.g. "0x01".

type: An ASCII representation of the type id and
description of the type of error log.
Currently just "0x00 PEL" - platform error log.
In the future there may be additional types.

raw: A read-only binary file that can be read
to get the raw log entry. These are
<16kb, often just hundreds of bytes and
"average" 2kb.

acknowledge: Writing 'ack' to this file will acknowledge
the error log to firmware (and in turn
the service processor, if applicable).
Shortly after acknowledging it, the log
entry will be removed from sysfs.
Reading this file will list the supported
operations (curently just acknowledge).
13 changes: 13 additions & 0 deletions arch/powerpc/include/asm/opal.h
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,11 @@ extern int opal_enter_rtas(struct rtas_args *args,
#define OPAL_LPC_READ 67
#define OPAL_LPC_WRITE 68
#define OPAL_RETURN_CPU 69
#define OPAL_ELOG_READ 71
#define OPAL_ELOG_WRITE 72
#define OPAL_ELOG_ACK 73
#define OPAL_ELOG_RESEND 74
#define OPAL_ELOG_SIZE 75
#define OPAL_FLASH_VALIDATE 76
#define OPAL_FLASH_MANAGE 77
#define OPAL_FLASH_UPDATE 78
Expand Down Expand Up @@ -823,6 +828,13 @@ int64_t opal_lpc_write(uint32_t chip_id, enum OpalLPCAddressType addr_type,
uint32_t addr, uint32_t data, uint32_t sz);
int64_t opal_lpc_read(uint32_t chip_id, enum OpalLPCAddressType addr_type,
uint32_t addr, __be32 *data, uint32_t sz);

int64_t opal_read_elog(uint64_t buffer, size_t size, uint64_t log_id);
int64_t opal_get_elog_size(uint64_t *log_id, size_t *size, uint64_t *elog_type);
int64_t opal_write_elog(uint64_t buffer, uint64_t size, uint64_t offset);
int64_t opal_send_ack_elog(uint64_t log_id);
void opal_resend_pending_logs(void);

int64_t opal_validate_flash(uint64_t buffer, uint32_t *size, uint32_t *result);
int64_t opal_manage_flash(uint8_t op);
int64_t opal_update_flash(uint64_t blk_list);
Expand Down Expand Up @@ -863,6 +875,7 @@ extern void opal_get_rtc_time(struct rtc_time *tm);
extern unsigned long opal_get_boot_time(void);
extern void opal_nvram_init(void);
extern void opal_flash_init(void);
extern int opal_elog_init(void);

extern int opal_machine_check(struct pt_regs *regs);
extern bool opal_mce_check_early_recovery(struct pt_regs *regs);
Expand Down
2 changes: 1 addition & 1 deletion arch/powerpc/platforms/powernv/Makefile
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
obj-y += setup.o opal-takeover.o opal-wrappers.o opal.o
obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
obj-y += rng.o
obj-y += rng.o opal-elog.o

obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_PCI) += pci.o pci-p5ioc2.o pci-ioda.o
Expand Down
Loading

0 comments on commit 774fea1

Please sign in to comment.