Enhancement Request: Alerts from Vitals #158

Closed
DerickJohnson opened this issue Jan 14, 2023 · 32 comments
Labels
enhancement New feature or request

Comments

@DerickJohnson

Hi Jason!

I love the work you've done with pypowerwall and the dashboard. It's helped IMMENSELY when trying to talk to support when they often don't have the information they need to help.

I was wondering if there was an easy way to add the alerts information from :8675/vitals to the influxdb for monitoring long term (to see how they change over time in different scenarios). I want to try and create a panel for the information but I didn't see it in the current set of vitals.

My powerwall seems to get stuck in infinite loops trying to update firmware and then failing to update (FWUpdateFailed) bringing the whole system down. I use the CLI to pull the alerts but they change constantly. It would be nice to capture how they change in the dashboard to see if there are any patterns. Let me know if there's an easy way to get these into influx.

Thank you again for all the great work!

@DerickJohnson
Author

Actually, I'm going to try and figure this out and report back here in case others want an example for the same thing.

My first attempt is using the telegraf.local file:

[[inputs.http]]
urls = [
"http://pypowerwall:8675/vitals"
]
name_override = "alerts"
method = "GET"
insecure_skip_verify = true
timeout = "4s"
data_format = "json"
json_query = "[my gateway identifier from json].alerts"

Getting a parse error in the telegraf logs, so going to experiment some more.

@jasonacox
Owner

Awesome! Thanks for opening this @DerickJohnson - Let us know what you come up with.

FYI - The pypowerwall proxy has a macro that aggregates all alerts if that helps: http://pypowerwall:8675/alerts

@BuongiornoTexas
Contributor

> Getting a parse error in the telegraf logs, so going to experiment some more.

Try json_query = "[my gateway identifier from json].alerts.0". If this fixes the parse error, the problem is the query is returning an array, while telegraf/influx is expecting a unique value at a time stamp.

@DerickJohnson
Author

> > Getting a parse error in the telegraf logs, so going to experiment some more.
>
> Try json_query = "[my gateway identifier from json].alerts.0". If this fixes the parse error, the problem is the query is returning an array, while telegraf/influx is expecting a unique value at a time stamp.

Thank you @BuongiornoTexas! Yes, I noticed that issue in the docs about it looking for an object or array of objects. I’m going to continue later in the evening looking at the processors for influx to convert the array to an object or array of objects. I want to experiment with something like a cached set of alerts with true/false values so as new alerts come in, they get cached and show a false value so over time I can see which are flagging and choose to filter out ones that are not relevant. I’ll update here as I make progress.

@DerickJohnson
Copy link
Author

Unfortunately, the array is dynamically sized: it contains only the alerts that are currently showing up, and they won't be in the same position even when the set of alerts present stays constant over multiple calls.

@BuongiornoTexas
Contributor

You could follow jasonacox's example and create a pypowerwall proxy page similar to vitals that returns a json dictionary of all active alerts, or maybe even all possible alerts and true/false values for the alerts that are active/inactive. This would deal with the problem of uniqueness.
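A minimal sketch of that idea in Python, assuming the alerts arrive as a plain list of strings (the shape the /alerts endpoint returns). The function name `alerts_to_dict` and the `known` parameter are hypothetical and not part of pypowerwall:

```python
def alerts_to_dict(active, known=()):
    """Map alert names to 1 (active) or 0 (known but inactive).

    `active` is the list of alert strings; `known` is an optional
    iterable of alert names that should always appear in the result.
    """
    # Start with every known alert marked inactive.
    result = {name: 0 for name in known}
    # Mark each active alert; duplicates in the list collapse to one key.
    for name in active:
        result[name] = 1
    return result
```

For example, `alerts_to_dict(["GridCodesWrite", "GridCodesWrite"], known=["FWUpdateFailed"])` gives `{"FWUpdateFailed": 0, "GridCodesWrite": 1}`, so every alert has exactly one value per poll regardless of its position in the list.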

@DerickJohnson
Author

Didn’t think of that, I’ll check it out!

@DerickJohnson
Author

DerickJohnson commented Jan 16, 2023

Alright, so I got something working, but I'm still trying to figure out the best way of representing it in a Grafana panel. Following @BuongiornoTexas's recommendation, I created an additional proxy page that returns a dictionary with a value of 1 for each alert that is currently on. I then added a Starlark processor to keep track of alerts that have flagged in the past, so that the metric still shows a value for alerts that no longer show up (there may be a way to use "null, undefined or NaN" values, but the couple of methods I tried didn't work as I expected in the chart). I'm going to experiment more tomorrow with the Grafana component.

Here are the minor updates I made in the meantime:

server.py

elif self.path == '/alerts/pw':     # should this be a different url? I followed temps
    # Alerts in dictionary format: each active alert name maps to 1
    pwalerts = {}
    alerts = pw.alerts()
    for alert in alerts:
        pwalerts[alert] = 1
    message = json.dumps(pwalerts)

Which shows the data at the endpoint /alerts/pw for now like this:
image

telegraf.local

[[inputs.http]]
        urls = [
                "http://pypowerwall:8675/alerts/pw"
        ]
        name_override = "alerts"
        method = "GET"
        insecure_skip_verify = true
        timeout = "4s"
        data_format = "json"
        
[[processors.starlark]]
source = '''
state = {
        "last": {}
}
def dict_union(x, y):
        z = {}
        z.update(x)
        z.update(y)
        return z
def apply(metric):
        url = metric.tags.get("url")
        last = state["last"]
        if url and url  == "http://pypowerwall:8675/alerts/pw":
                base = {x: 0 for x in metric.fields.keys()} #For updating existing total key set
                current = {x: 1 for x in metric.fields.keys()} #Currently flagging keys
                result = dict_union(last,current)
                state["last"] = dict_union(last, base)
                new_metric = Metric("all_alerts")
                for k, v in result.items():
                        new_metric.fields[str(k)] = v
                return new_metric
        else:
                return metric
'''

Which reformats the data so that errors that no longer show up still have a key with a value of 0 for continued coverage (as long as telegraf is up).
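For readers more comfortable with Python than Starlark, the state-tracking logic above can be modelled roughly as follows. This is only an illustrative sketch: `make_tracker` is a hypothetical name, and it works on plain dictionaries rather than Telegraf metrics.

```python
def make_tracker():
    """Return a function that mimics the processor's sticky-zero behaviour."""
    seen = {}  # every alert name observed so far, each mapped to 0

    def apply(active_fields):
        # Alerts present in this poll are reported as 1.
        current = {name: 1 for name in active_fields}
        # Alerts seen in earlier polls default to 0 unless active now.
        result = {**seen, **current}
        # Remember the full key set (as zeroes) for the next poll.
        seen.update({name: 0 for name in active_fields})
        return result

    return apply
```

Calling the returned function first with `{"FWUpdateFailed": 1}` and then with `{"SystemConnectedToGrid": 1}` produces `{"FWUpdateFailed": 0, "SystemConnectedToGrid": 1}` on the second call. As in the processor, the memory of past alerts lives only in the running process, so it resets on restart.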

I then used this query

SELECT *::field from raw.all_alerts

and the "Status History" visualization to get this:
image

So still some work to do to finalize a couple of things in the visuals (to make sure the data format is right). Once things look good I can submit this as an additional commented example in the telegraf.local.sample file as well as the new endpoint in server.py (and allow people to create their own visuals) or include a visual as well.

That is, if you think others will want to use this, of course. It may just be useful to me since I have so many errors all the time 😄

As a side note, I'm primarily a JavaScript developer, so I had to learn a few things to get familiar with the setup. If anything is done incorrectly, I apologize. I couldn't use any of my fancy object spread or array operators that I'm used to.

@jasonacox
Owner

Nice job @DerickJohnson ! On the extension of server.py, I could accept that as a PR for pypowerwall if you wanted to submit it.

I think the dynamic horizontal history graph is useful. I suggest removing the "OK" and "Err" text and just use a color vs. blank (transparent) to indicate state, since some alerts are actually positive (not Err). The only problem is that some of these alerts seem to have a long TTL. For example, "FWUpdateSucceeded" stays lit, similar to the "FWUpdateFailed". I suppose it would still be useful information to see (appear/disappear).

I have also thought it would be nice to have a vertical scrolling time log of alerts. Each alert would have a timestamp of when it shows up, perhaps even events we define like (PW at 100%, Storm Event, Grid offline, Reserve Level changed). It could be joined to your panel. Something like this (ignore the data - just example):

image

@DerickJohnson
Author

Absolutely, I haven't perfected that visual yet since what you mentioned is true about some alerts being positive when present so not really something to want to be alerted about (all the data is probably good to keep but not to show). I'll submit that PR and keep working on the visual. I like that scrolling time log idea as well.

@DerickJohnson
Author

DerickJohnson commented Jan 22, 2023

Hey Jason! I'm going to close this issue as the request is complete now. I ended up using the state transition visual to help me find out why (or at least get more information around) the system getting stuck in an infinite loop and not being able to operate. I think I narrowed it down to losing connection to the meters that do all the measurements. The PVInverter comms alert flagging seems to be a leading indicator before it dies (not just in this instance but in the many others I have). Here's a fun visual to show you what I was working with and how the alerts timeline is helping. You can see the disabledRelay and PVInverterComms around the time it goes out and the PVInverterComms never returns (the reason it stays red is because even after a reboot, it's there with systemConnectedToGrid as the first alerts). All others are reset:

image

I also did a forced grid outage in there just to see what alerts came up :-). The blue color is for "informational" alerts as discussed earlier in the thread. I used overrides in Grafana for that.

@jasonacox
Owner

@DerickJohnson - This is amazing! Can you share the panel or dashboard JSON for anyone else wanting to set it up?

Did this help you make a case with Tesla?

@DerickJohnson
Author

DerickJohnson commented Jan 24, 2023

Absolutely, here is the Panel JSON:

{
  "id": 65,
  "gridPos": {
    "h": 16,
    "w": 16,
    "x": 0,
    "y": 1
  },
  "type": "state-timeline",
  "title": "Alerts",
  "transformations": [],
  "datasource": {
    "type": "influxdb",
    "uid": "q8odLDzgz"
  },
  "pluginVersion": "9.1.2",
  "fieldConfig": {
    "defaults": {
      "custom": {
        "lineWidth": 0,
        "fillOpacity": 70,
        "spanNulls": true
      },
      "color": {
        "mode": "continuous-GrYlRd"
      },
      "mappings": [
        {
          "options": {
            "0": {
              "color": "transparent",
              "index": 0
            },
            "1": {
              "color": "red",
              "index": 1
            }
          },
          "type": "value"
        },
        {
          "options": {
            "match": "null+nan",
            "result": {
              "color": "transparent",
              "index": 2
            }
          },
          "type": "special"
        }
      ],
      "thresholds": {
        "mode": "absolute",
        "steps": [
          {
            "color": "transparent",
            "value": null
          }
        ]
      },
      "unit": "none"
    },
    "overrides": [
      {
        "matcher": {
          "id": "byName",
          "options": "FWUpdateSucceeded"
        },
        "properties": [
          {
            "id": "mappings",
            "value": [
              {
                "options": {
                  "0": {
                    "color": "transparent",
                    "index": 1
                  },
                  "1": {
                    "color": "blue",
                    "index": 0
                  }
                },
                "type": "value"
              }
            ]
          }
        ]
      },
      {
        "matcher": {
          "id": "byName",
          "options": "GridCodesWrite"
        },
        "properties": [
          {
            "id": "mappings",
            "value": [
              {
                "options": {
                  "0": {
                    "color": "transparent",
                    "index": 1
                  },
                  "1": {
                    "color": "blue",
                    "index": 0
                  }
                },
                "type": "value"
              }
            ]
          }
        ]
      },
      {
        "matcher": {
          "id": "byName",
          "options": "PINV_a067_overvoltageNeutralChassis"
        },
        "properties": [
          {
            "id": "mappings",
            "value": [
              {
                "options": {
                  "0": {
                    "color": "transparent",
                    "index": 1
                  },
                  "1": {
                    "color": "blue",
                    "index": 0
                  }
                },
                "type": "value"
              }
            ]
          }
        ]
      },
      {
        "matcher": {
          "id": "byName",
          "options": "POD_w110_SW_EOC"
        },
        "properties": [
          {
            "id": "mappings",
            "value": [
              {
                "options": {
                  "0": {
                    "color": "transparent",
                    "index": 1
                  },
                  "1": {
                    "color": "blue",
                    "index": 0
                  }
                },
                "type": "value"
              }
            ]
          }
        ]
      },
      {
        "matcher": {
          "id": "byName",
          "options": "PVS_a019_MciStringC"
        },
        "properties": [
          {
            "id": "mappings",
            "value": [
              {
                "options": {
                  "0": {
                    "color": "transparent",
                    "index": 1
                  },
                  "1": {
                    "color": "blue",
                    "index": 0
                  }
                },
                "type": "value"
              }
            ]
          }
        ]
      },
      {
        "matcher": {
          "id": "byName",
          "options": "PVS_a020_MciStringD"
        },
        "properties": [
          {
            "id": "mappings",
            "value": [
              {
                "options": {
                  "0": {
                    "color": "transparent",
                    "index": 1
                  },
                  "1": {
                    "color": "blue",
                    "index": 0
                  }
                },
                "type": "value"
              }
            ]
          }
        ]
      },
      {
        "matcher": {
          "id": "byName",
          "options": "PodCommissionTime"
        },
        "properties": [
          {
            "id": "mappings",
            "value": [
              {
                "options": {
                  "1": {
                    "color": "blue",
                    "index": 0
                  }
                },
                "type": "value"
              }
            ]
          }
        ]
      },
      {
        "matcher": {
          "id": "byName",
          "options": "SYNC_a001_SW_App_Boot"
        },
        "properties": [
          {
            "id": "mappings",
            "value": [
              {
                "options": {
                  "0": {
                    "color": "transparent",
                    "index": 1
                  },
                  "1": {
                    "color": "blue",
                    "index": 0
                  }
                },
                "type": "value"
              }
            ]
          }
        ]
      },
      {
        "matcher": {
          "id": "byName",
          "options": "SYNC_a044_IslanderDisconnectWithin2s"
        },
        "properties": [
          {
            "id": "mappings",
            "value": [
              {
                "options": {
                  "0": {
                    "color": "transparent",
                    "index": 1
                  },
                  "1": {
                    "color": "blue",
                    "index": 0
                  }
                },
                "type": "value"
              }
            ]
          }
        ]
      },
      {
        "matcher": {
          "id": "byName",
          "options": "SystemConnectedToGrid"
        },
        "properties": [
          {
            "id": "mappings",
            "value": [
              {
                "options": {
                  "0": {
                    "color": "transparent",
                    "index": 1
                  },
                  "1": {
                    "color": "blue",
                    "index": 0
                  }
                },
                "type": "value"
              }
            ]
          }
        ]
      },
      {
        "matcher": {
          "id": "byName",
          "options": "THC_w061_CAN_TX_FIFO_Overflow"
        },
        "properties": [
          {
            "id": "mappings",
            "value": [
              {
                "options": {
                  "0": {
                    "color": "transparent",
                    "index": 1
                  },
                  "1": {
                    "color": "blue",
                    "index": 0
                  }
                },
                "type": "value"
              }
            ]
          }
        ]
      },
      {
        "matcher": {
          "id": "byName",
          "options": "PodCommissionTime"
        },
        "properties": [
          {
            "id": "mappings",
            "value": [
              {
                "options": {
                  "0": {
                    "color": "transparent",
                    "index": 1
                  },
                  "1": {
                    "color": "blue",
                    "index": 0
                  }
                },
                "type": "value"
              }
            ]
          }
        ]
      }
    ]
  },
  "options": {
    "mergeValues": true,
    "showValue": "never",
    "alignValue": "left",
    "rowHeight": 0.51,
    "legend": {
      "showLegend": true,
      "displayMode": "list",
      "placement": "bottom"
    },
    "tooltip": {
      "mode": "single",
      "sort": "none"
    }
  },
  "targets": [
    {
      "datasource": {
        "type": "influxdb",
        "uid": "q8odLDzgz"
      },
      "groupBy": [
        {
          "params": [
            "$__interval"
          ],
          "type": "time"
        },
        {
          "params": [
            "null"
          ],
          "type": "fill"
        }
      ],
      "hide": false,
      "measurement": "alerts4",
      "orderByTime": "ASC",
      "policy": "raw",
      "query": "SELECT *::field from raw.all_alerts where $timeFilter",
      "rawQuery": true,
      "refId": "A",
      "resultFormat": "table",
      "select": [
        [
          {
            "params": [
              "HighCPU"
            ],
            "type": "field"
          },
          {
            "params": [],
            "type": "count"
          }
        ]
      ],
      "tags": []
    }
  ]
}

The "all_alerts" measurement comes from the Starlark processor I posted earlier in the thread. For most people it's probably unnecessary, and the panel could target the "alerts" measurement directly. It was my attempt to keep alerts that disappear over time in the view (which also makes them easier to color-code than a value that doesn't show up at all). I'll let you know if it ends up helping them root-cause it!

@jasonacox
Owner

Thanks @DerickJohnson! I love this. I added your telegraf/starlark processor to my test rig. I really like having the alert data in the dashboard!

image

I believe this is handy enough to add to the standard telegraf.conf along with a basic panel at the bottom of the stock dashboard.json. I agree with you that we could just add the /alerts/pw data without the starlark processor for most use cases (but we could leave a commented out version in telegraf.local if someone wanted to easily turn it on). I picked a neutral "blue" for all alerts but I like "dimming" the normal alerts (a union between what you had in there and what I had) and somehow highlighting the rest.

I removed the starlark processor and tested an "Off Grid" event. Because the processor wasn't sending in zeroes, it didn't detect that "SystemConnectedToGrid" had fallen off. I switched to basic thresholds and tweaked the alerts related to grid for more color. ;)

image

The one issue I see is that the "state timeline" panel doesn't auto-size so in your case (someone who sees a lot of different alerts), you will get a compressed squashed list. It is easy for the user to expand but not as dynamic as I would like.

In any case, I do think this would be useful to the community. Thank you!

jasonacox added a commit that referenced this issue Jan 24, 2023
@DerickJohnson
Author

@jasonacox, that's amazing! I just pulled in your changes and will keep that panel as well. I like the aesthetics of the different shades/thresholding.

This data has been super interesting. Not only am I seeing a wide range of alerts over time (this is the laundry list I have):
image

But I am also seeing little "blips" that I wouldn't have noticed otherwise:
image

Thanks again for all the work on this. It's been a lot of fun!

@jasonacox jasonacox added the enhancement New feature or request label Jan 25, 2023
@jasonacox
Owner

I agree, I never knew about the blips! The time series logging of these alerts is brilliant. Thanks @DerickJohnson .

It is super clear by the number of alerts that there is something seriously wrong with your system. I hope you are able to get it fixed soon! While it's no consolation, your broken system has been a treasure of discovery for the community. Thanks for telling your story and contributing to the dashboard!

I've added the new alerts you mention above to our growing list on the pyPowerwall README.

Thanks again! 🙏

@youzer-name
Contributor

youzer-name commented Jan 31, 2023

I tried to add this to my system today, but I am either doing something wrong or there is an error in the code. I've made a lot of customizations to my setup, so I don't use the upgrade or install scripts, which could be my issue.

I removed pypowerwall and recreated it.

When I go to http://[IP]:8675/version I get: {"version": "22.26.2 8cd8cac4", "vint": 222602}

I added the inputs.http to telegraf.conf as per the current code:

# Alert Data
[[inputs.http]]
        urls = [
                "http://pypowerwall:8675/alerts/pw"
        ]
        name_override = "alerts"
        method = "GET"
        insecure_skip_verify = true
        timeout = "4s"
        data_format = "json"

Nothing is going into InfluxDB and Telegraf is logging:

2023-01-31T16:38:00Z E! [inputs.http] Error in plugin: [url=http://pypowerwall:8675/alerts/pw]: parsing metrics failed: invalid character '<' looking for beginning of value

When I go to http://pypowerwall:8675/alerts/pw in my browser, I get a 404 Not Found error and a redirect to the main :8675/ page

When I go to http://pypowerwall:8675/alerts (without the /pw at the end) I get:

[
    "GridCodesWrite",
    "GridCodesWrite",
    "PodCommissionTime",
    "FWUpdateSucceeded",
    "SystemConnectedToGrid",
    "SelfConsumptionReservedLimit",
    "PINV_a067_overvoltageNeutralChassis",
    "PINV_a067_overvoltageNeutralChassis",
    "SYNC_a001_SW_App_Boot",
    "SYNC_a046_DoCloseArguments"
]

Do I have something configured wrong or should the telegraf.conf file not have that "/pw" on the end of the URL?

Edit: I tried without the /pw and just got a different error.

2023-01-31T16:50:05Z E! [inputs.http] Error in plugin: [url=http://pypowerwall:8675/alerts]: parsing metrics failed: must be an object or an array of objects

Is this maybe a Powerwall+ only feature? I have two Powerwall 2's, but not the +.
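The two Telegraf errors above point at two different response shapes: an HTML page (the body starts with '<') and a bare JSON array of strings. A small Python helper like the one below can tell them apart when debugging by hand. It is a sketch only; `diagnose_payload` and its return labels are made up for illustration, and it does not call Telegraf or pypowerwall.

```python
import json


def diagnose_payload(body):
    """Classify a response body by the shape of its content."""
    text = body.lstrip()
    if text.startswith("<"):
        return "html"  # e.g. an error page rather than JSON
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return "invalid"
    if isinstance(data, dict):
        return "object"  # a JSON object of name/value pairs
    if isinstance(data, list) and all(isinstance(x, dict) for x in data):
        return "array-of-objects"
    return "unsupported"  # e.g. a bare array of strings
```

With the bodies quoted above, the /alerts output classifies as "unsupported", while a dictionary such as `{"GridCodesWrite": 1}` classifies as "object".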

@mcbirse
Collaborator

mcbirse commented Feb 1, 2023

Hi @youzer-name - what pypowerwall version are you running?

# Show pypowerwall version
docker logs pypowerwall

Does it say [t24] in the logs?

01/29/2023 05:19:44 PM [proxy] [INFO] pyPowerwall [0.6.0] Proxy Server [t24] - HTTP Port 8675
01/29/2023 05:19:44 PM [proxy] [INFO] pyPowerwall Proxy Started

The http://pypowerwall:8675/alerts/pw endpoint was only added in t24. I am wondering if you might need to pull the latest version since you received a 404 error for that url.
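If you would rather check this in a script than by eye, the banner line can be parsed with a regular expression. The sketch below assumes the log format shown in the excerpt above; the helper names `proxy_build` and `has_alerts_pw` are hypothetical.

```python
import re

# Matches the startup banner format shown in the log excerpt above.
BANNER = re.compile(
    r"pyPowerwall \[(?P<version>[^\]]+)\] Proxy Server \[t(?P<build>\d+)\]"
)


def proxy_build(log_text):
    """Return (version, build number) from the first banner line, or None."""
    match = BANNER.search(log_text)
    if match is None:
        return None
    return match.group("version"), int(match.group("build"))


def has_alerts_pw(log_text, minimum_build=24):
    """True if the banner reports a proxy build of at least `minimum_build`."""
    info = proxy_build(log_text)
    return info is not None and info[1] >= minimum_build
```

Feeding it the output of `docker logs pypowerwall` would then show whether the running proxy is at t24 or later.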

@youzer-name
Contributor

@mcbirse
I thought I had pulled the latest, but I was on T22. That sorted it, thanks!

@jasonacox
Owner

Thanks @mcbirse !

@youzer-name FYI -

You can also get the version of pypowerwall by hitting the endpoint: http://pypowerwall:8675/stats

If it won't pull the latest, this is the way to upgrade pypowerwall (the same steps are in upgrade.sh):

# stop and delete pypowerwall
docker stop pypowerwall
docker rm pypowerwall
docker images | grep pypowerwall | awk '{print $3}' | xargs docker rmi -f

# restart stack
./compose-dash.sh up -d

BTW, I would love to see screenshots of Alerts to see how they differ between systems. Over time we can better color code the alerts.

@BJReplay
Contributor

BJReplay commented Feb 1, 2023

> BTW, I would love to see screenshots of Alerts to see how they differ between systems. Over time we can better color code the alerts.

image

Installed just before 5pm

At 6:56pm put a kettle on which triggered the change in state in RealPowerAvailableLimited, though the reported SOC was still 100%. That was the end of export for the day.

Powerwall was discharging from that point, feeding the home, but reported SOC didn't drop below 100% until 7:56pm when POD_w110_SW_EOC dropped off.

@wreiske
Contributor

wreiske commented Feb 9, 2023

Loving the alert visibility!!

image

@jasonacox
Owner

Nice!! Thanks @wreiske - you have some interesting things going on there!

One bug that I have found with the state-timeline graph (it is beta after all) is that it will glitch and not always align the labels to the lines if you zoom the browser window. I noticed that happened to you when you posted that. A refresh will fix it but is a bit annoying. Still, the data is awesome to have in a graph form.

@BJReplay
Contributor

Just a quick note - I know this is a closed issue, but this is where the discussion is, and I figured people who cared would see this: I had the firmware upgrade to 22.26.5 today, and this is the alerts panel covering the outage:

image

Of most interest to me is the Max CPU alert for about 10 minutes after the upgrade.

@mcbirse
Collaborator

mcbirse commented Feb 23, 2023

@BJReplay - that's cool, thanks for posting! I think it's helpful to post alert examples that are seen under certain conditions. I have some examples from today also, and not sure where else to post these, so this seems like a good place.

Just curious, have you updated to the latest dashboard.json that includes the rename by regex transform? As it should be showing the alerts without the "max_" prefix in that case.

Oh and, just noting your firmware upgrade - guessing mine may be imminent in that case as I think you are in Aus too?

Anyway, today I had some electrical work being done on my house, so it was a great opportunity to obtain some alert data from the Powerwall.

The electrician had to shut off the grid supply, as well as switch off all breakers in the gateway. My server running Powerwall-Dashboard is on a UPS though, and connected via ethernet direct to the gateway - so monitoring was still active (at least until the server shutdown after about 50mins when the UPS got low on battery).

Below are the alerts received during this scenario (power switched off just after 10:40am to just before midday).

image

And what the Tesla app was showing:

image
image
image

I also have a shell script running that polls the gateway for some basic data such as grid status etc. and sends alert e-mails on changes.

This highlighted some never seen before grid status values of:

  • SystemMicroGridFaulted, and
  • SystemWaitForUser

As below:

EvnTime: 2023-02-23 10:44:57
Message: Grid status changed to unknown value
Details: Status changed from SystemIslandedActive to SystemMicroGridFaulted
EvnTime: 2023-02-23 10:45:02
Message: Power outage detected
Details: Status changed from SystemMicroGridFaulted to SystemIslandedActive

The system switched back and forth between SystemIslandedActive and SystemMicroGridFaulted for a few minutes, until finally responding with SystemWaitForUser:

EvnTime: 2023-02-23 10:47:13
Message: Grid status changed to unknown value
Details: Status changed from SystemIslandedActive to SystemWaitForUser

Unfortunately I did not get any alert data or grid status codes from when all the power was turned back on, as my server had shut down by that time.
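The shell script itself isn't shown in the thread. As a rough illustration of the change-detection part only, here is a Python sketch that reports each transition in a sequence of polled status values; the function name and the (time, status) input format are assumptions.

```python
def status_changes(samples):
    """Yield (time, previous, current) whenever the status value changes.

    `samples` is an iterable of (time, status) pairs in chronological order.
    """
    previous = None
    for when, status in samples:
        # The first sample only establishes a baseline; nothing is reported.
        if previous is not None and status != previous:
            yield when, previous, status
        previous = status
```

Each yielded tuple carries what a "Status changed from X to Y" message needs: the time of the change, the old value and the new value.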

@DerickJohnson
Author

@BJReplay Yes! I see the Max CPU alert all the time when the gateway has been rebooted (and you see "updating devices" on the gateway 192.168.91.1 page).

@mcbirse I also had a bunch of those same alerts when my system unexpectedly went down during a grid outage (the wait for user one I remember specifically).

Really cool to see all the different alert scenarios. It's definitely helped me understand more about the system operation. For example, I think the _EOC alert we see might mean "end of charge" or something like that since it happens when my battery is full. I also found a couple of explanations for other alerts like "battery unexpected power" in this document: https://sunbridgesolar.com/wp-content/uploads/2021/03/Tesla_Powerhub_Manual_User.pdf. Most are the self explanatory ones, but a couple were helpful.

@BJReplay
Contributor

> Just curious, have you updated to the latest dashboard.json that includes the rename by regex transform? As it should be showing the alerts without the "max_" prefix in that case.

@mcbirse No, I haven't - I have a heavily customised dashboard, and I haven't bothered. I guess I will load it, save the panel as a library panel, then load that panel into my customised panel.

@mcbirse
Collaborator

mcbirse commented Feb 24, 2023

@BJReplay - no worries, all good. I have a custom dashboard setup as well. Typically I merge new changes into mine in a somewhat manual process... by comparing the .json files in VSCode (or BeyondCompare sometimes) and then merging the new elements I want.

Considering yours is heavily customised though, it might be easier to just edit your alerts panel manually if you like and add the rename by regex, as below:

image
image
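The exact transform settings are only visible in the screenshots, so the pattern below is an assumption. The effect of a rename-by-regex that drops the "max_" prefix can be illustrated in Python like this (`strip_prefix` is a hypothetical helper, not a Grafana API):

```python
import re


def strip_prefix(field_name, prefix="max_"):
    """Remove a leading aggregation prefix from a field name, if present."""
    # Anchor at the start so "max_" elsewhere in the name is left alone.
    return re.sub(rf"^{re.escape(prefix)}", "", field_name)
```

For instance, `strip_prefix("max_FWUpdateFailed")` returns `"FWUpdateFailed"`, while a name without the prefix is returned unchanged.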

@jasonacox
Owner

@mcbirse and @BJReplay - this is gold! Thanks for documenting.

@DerickJohnson It would be good to capture some of the info in the Alerts Table in that document - seems most of it would apply to residential Powerwall users too. I have been documenting Device names and Alerts as I discover them here: https://github.com/jasonacox/pypowerwall#devices-and-alerts - I'll try to add what makes sense.

> This highlighted some never seen before grid status values of:
>
> SystemMicroGridFaulted, and
> SystemWaitForUser

@mcbirse great discovery!! This is something I should add to pypowerwall's grid_status() function. Right now it is sending back an undefined Null (None) for those conditions. Would it be correct to still categorize them as "DOWN" conditions? Unlike "SystemTransitionToGrid" or "SystemTransitionToIsland" these are not waiting on SYNC for the transition.

@mcbirse
Collaborator

mcbirse commented Feb 27, 2023

Hi @jasonacox - I agree that makes sense, those status responses should be classified as "DOWN" rather than returning Null/None.

The Tesla app itself was displaying "Grid outage", in this case followed by the extra text "Powerwall Inactive". Regardless, it classified it as a grid outage as well.
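The classification being discussed could be sketched as a simple lookup table. This is not pypowerwall's actual grid_status() code: the "SystemGridConnected" key and the "UP"/"SYNCING" labels are assumptions for illustration, and only the status names quoted in this thread are otherwise used.

```python
# Raw status strings mapped to coarse states.
# "SystemGridConnected" and the "UP"/"SYNCING" labels are assumed here;
# the two newly observed values map to "DOWN" as proposed in this thread.
GRID_STATES = {
    "SystemGridConnected": "UP",
    "SystemIslandedActive": "DOWN",
    "SystemTransitionToGrid": "SYNCING",
    "SystemTransitionToIsland": "SYNCING",
    "SystemMicroGridFaulted": "DOWN",
    "SystemWaitForUser": "DOWN",
}


def classify_grid_status(raw):
    """Return the coarse state for a raw status, or None if unrecognised."""
    return GRID_STATES.get(raw)
```

Keeping the mapping in one dictionary means a newly discovered status value only needs a one-line addition, and anything unrecognised still comes back as None.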

@ibmaster

Thanks for the Alerts feature, it seems to have caught one of my Powerwalls dying.
Specifically these Alert codes were thrown, and persist, from time of failure:

  • BatteryFault
  • CANUsageAlert (transient error code)
  • POD_f003_FW_Batt_OV (firmware detected a cell over-voltage condition?)
  • POD_f007_SW_Batt_OC (software detected a cell over-current condition?)
  • POD_f113_DCDC_Self_Test_Fail
  • POD_w113_DCDC_Self_Test_Fail

Thought others might find this info useful.

@jasonacox
Owner

Thanks @ibmaster !
