_binary_to_variant function crashing on some blocks #2228

Closed
cshintov opened this issue Feb 8, 2024 · 10 comments · Fixed by #2246, #2247, #2248 or #2249

@cshintov

cshintov commented Feb 8, 2024

I'm running 6 mainnet archive nodes. Since this morning, a few blocks on all of them return a 500 Internal Server Error.

curl --location $url/v1/trace_api/get_block --header 'Content-Type: application/json' --data '{"block_num": "356339452" }' | jq .

{
  "code": 500,
  "message": "Internal Service Error",
  "error": {
    "code": 3015013,
    "name": "unpack_exception",
    "what": "Unpack data exception",
    "details": [
      {
        "message": "Stream unexpectedly ended; unable to unpack field 'to' of struct 'transfer'",
        "file": "abi_serializer.cpp",
        "line_number": 374,
        "method": "_binary_to_variant"
      }
    ]
  }
}

The nodes are running v4.0.5.

I tried rewinding the node back using

nodeos --genesis-json /root/local/genesis.json --data-dir /data --config-dir /root/local \
--hard-replay-blockchain --terminate-at-block 356339451

But afterwards it complained about a database version mismatch:

warn  2024-02-08T11:54:53.441 nodeos    chain_plugin.cpp:1077         plugin_initialize    ] 3060005 bad_database_version_exception: Database is an unknown or unsupported version
state database version pre-dates versioning, please restore from a compatible snapshot or replay!
    {}
    nodeos  controller.cpp:687 validate_db_version

error 2024-02-08T11:54:53.472 nodeos    main.cpp:161                  main                 ] 3060005 bad_database_version_exception: Database is an unknown or unsupported version
state database version pre-dates versioning, please restore from a compatible snapshot or replay!
    {}
    nodeos  controller.cpp:687 validate_db_version
rethrow
    {}
    nodeos  chain_plugin.cpp:1077 plugin_initialize

So I tried syncing from a recent snapshot, snapshot-2024-02-08-04-eos-v6-0356339018.bin.

Even then, this block returned the same error.

There are a few more blocks like this within the next 10k blocks:

356339452
356339534
356340111
356340363
356340822
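
To check whether the failures are limited to particular blocks, the same request can be scripted over the block numbers listed above. The following is a minimal sketch, not an official tool: the function names `build_request`, `summarize`, and `probe` are made up for illustration, and the base URL is whatever node you want to test. It assumes only the `/v1/trace_api/get_block` endpoint and the error shape shown in the response above.

```python
import json
import urllib.error
import urllib.request

# Block numbers reported as failing in this issue.
FAILING_BLOCKS = [356339452, 356339534, 356340111, 356340363, 356340822]


def build_request(base_url: str, block_num: int) -> urllib.request.Request:
    """Build the same POST that the curl command above sends."""
    body = json.dumps({"block_num": str(block_num)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/trace_api/get_block",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response: dict) -> str:
    """Return "ok", or the error name plus the first detail message."""
    error = response.get("error")
    if error is None:
        return "ok"
    details = error.get("details") or [{}]
    return f"{error.get('name', 'unknown')}: {details[0].get('message', '')}"


def probe(base_url: str, block_num: int) -> str:
    """Query one block; an HTTP 500 still carries a JSON body worth reading."""
    try:
        request = build_request(base_url, block_num)
        with urllib.request.urlopen(request, timeout=30) as resp:
            return summarize(json.load(resp))
    except urllib.error.HTTPError as err:
        return summarize(json.loads(err.read()))
```

Calling `probe(base_url, n)` for each `n` in `FAILING_BLOCKS`, once against your own node and once against a public endpoint, gives a quick side-by-side view of which blocks fail where.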

Is this a network-wide issue or specific to my nodes, and how can I recover?

The only thing left to do is upgrade to v5.0.0, which I'm currently trying.

@cshintov
Author

cshintov commented Feb 8, 2024

The upgrade to v5.0.0 also didn't help. Something bad is lurking here:

EOS_THROW( unpack_exception, "Stream unexpectedly ended; unable to unpack field '${f}' of struct '${p}'",
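
For readers unfamiliar with this error: it is raised when the serializer runs out of bytes partway through a struct. The sketch below is a simplified Python illustration of that situation, not the actual `abi_serializer.cpp` logic. It assumes the standard `eosio.token` `transfer` layout (`from` and `to` as 8-byte names, `quantity` as a 16-byte asset, `memo` as a length-prefixed string); `unpack_transfer` and `UnpackError` are hypothetical names.

```python
class UnpackError(Exception):
    """Raised when the byte stream ends before a field is complete."""


# Fixed-width fields of eosio.token's `transfer`, in ABI order.
FIXED_FIELDS = [("from", 8), ("to", 8), ("quantity", 16)]


def _fail(field: str) -> UnpackError:
    return UnpackError(
        f"Stream unexpectedly ended; unable to unpack field '{field}' of struct 'transfer'"
    )


def unpack_transfer(data: bytes) -> dict:
    """Split raw action data into fields, failing on truncated input."""
    pos = 0
    out = {}
    for field, size in FIXED_FIELDS:
        if pos + size > len(data):
            raise _fail(field)
        out[field] = data[pos:pos + size]
        pos += size

    # memo: varuint32 (LEB128) length prefix followed by that many bytes.
    length = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise _fail("memo")
        byte = data[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    if pos + length > len(data):
        raise _fail("memo")
    out["memo"] = data[pos:pos + length].decode("utf-8")
    return out
```

In this sketch, an 8-byte buffer (just enough for `from`) produces the same message about field `to` as the response above, which suggests the bytes being decoded are shorter than a full `transfer` payload.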

@heifner
Member

heifner commented Feb 8, 2024

curl --location http://eos.greymass.com/v1/trace_api/get_block --header 'Content-Type: application/json' --data '{"block_num": "356339452" }' | jq

Works.

@heifner
Member

heifner commented Feb 8, 2024

What non-default options do you have on your trace_api_plugin node?

@cshintov
Author

cshintov commented Feb 8, 2024

config.ini

plugin = eosio::chain_plugin
plugin = eosio::chain_api_plugin
plugin = eosio::net_plugin
plugin = eosio::http_plugin
plugin = eosio::state_history_plugin
plugin = eosio::trace_api_plugin
abi-serializer-max-time-ms = 50000
chain-state-db-size-mb = 1000000
enable-account-queries = true
http-server-address = 0.0.0.0:{{ env "NOMAD_PORT_rpc" }}
access-control-allow-origin = *
access-control-allow-headers = Origin, X-Requested-With, Content-Type, Accept
http-max-response-time-ms = 5000
verbose-http-errors = true
http-validate-host = false
p2p-listen-endpoint = 0.0.0.0:{{ env "NOMAD_PORT_wire" }}
p2p-server-address = {{ env "NOMAD_IP_wire" }}:{{ env "NOMAD_HOST_PORT_wire" }}
p2p-peer-address = peer.main.alohaeos.com:9876
p2p-peer-address = eos.edenia.cloud:9876
p2p-peer-address = p2p.eos.cryptolions.io:9876
p2p-peer-address = p2p.donates2eden.io:9876
p2p-peer-address = mainnet.eosamsterdam.net:9876
p2p-peer-address = p2p.eosflare.io:9876
p2p-peer-address = p2p.bitmars.one:8080
p2p-peer-address = 34.96.75.100:8099
p2p-peer-address = p2p.eos.detroitledger.tech:1337
p2p-peer-address = eos.seed.eosnation.io:9876
p2p-peer-address = peer1.eosphere.io:9876
p2p-peer-address = p2p.eossweden.org:9876
p2p-peer-address = eos.hashfin.com:9876
p2p-peer-address = eos.p2p.eosusa.io:9882
p2p-peer-address = eos.newdex.one:9876
p2p-max-nodes-per-host = 150
max-clients = 150
sync-fetch-span = 1000
trace-history = true
chain-state-history = true
state-history-endpoint = 0.0.0.0:{{ env "NOMAD_PORT_history" }}
trace-history-debug-mode = true
state-history-log-retain-blocks = 10713600
trace-rpc-abi = eosio=/root/local/eosio.abi
trace-rpc-abi = eosio.token=/app/reference-contracts/build/contracts/eosio.token/eosio.token.abi
trace-rpc-abi = eosio.msig=/app/reference-contracts/build/contracts/eosio.msig/eosio.msig.abi
trace-rpc-abi = eosio.wrap=/app/reference-contracts/build/contracts/eosio.wrap/eosio.wrap.abi

And running the chain as

command = "nodeos"
args    = [
  "--data-dir","/data",
  "--config-dir", "/root/local",
  "--disable-replay-opts",
  "--genesis-json", "/root/local/genesis.json",
]

normally, and with a snapshot:

command = "nodeos"
args    = [
  "--data-dir", "/data",
  "--config-dir", "/root/local",
  "--snapshot", "/data/snapshots/snapshot-2024-02-08-04-eos-v6-0356339018.bin",
]

Building the image with

Dockerfile

ARG version="5.0.0"
ARG cdt_version="4.0.1"

FROM ubuntu:20.04

ARG version
ARG cdt_version

ENV VERSION=${version}
LABEL VERSION=${version}

ADD scripts/ /opt/tatum.io

ENV USER_ID=3002
ENV GROUP_ID=3002

ARG DEBIAN_FRONTEND=noninteractive
ARG TZ=Etc/UTC

RUN apt-get update
RUN apt-get update --fix-missing
RUN apt-get install -y apt-utils
RUN apt-get install -y curl tzdata
RUN apt-get install -y zip unzip libncurses5 wget git build-essential cmake libboost-all-dev libcurl4-openssl-dev libgmp-dev libssl-dev libusb-1.0.0-dev libzstd-dev time pkg-config llvm-11-dev nginx npm yarn jq gdb lldb
RUN apt-get install -y gcc g++ make tar jq bash nano netcat-openbsd
RUN curl -fsSL https://deb.nodesource.com/setup_lts.x | bash -
RUN apt-get update
RUN apt-get install -y nodejs
RUN apt-get autoremove -y

WORKDIR /app
RUN npm install -g npm

RUN npm install -D webpack-cli
RUN npm install -D webpack
RUN npm install -D webpack-dev-server

COPY scripts/bootstrap_env.sh .
RUN ./bootstrap_env.sh ${version} ${cdt_version}

RUN if [ ${USER_ID:-0} -ne 0 ] && [ ${GROUP_ID:-0} -ne 0 ]; then \
    userdel -f www-data && \
    if getent group www-data ; then groupdel www-data; fi && \
    groupadd -g ${GROUP_ID} www-data && \
    useradd -l -u ${USER_ID} -g www-data www-data && \
    install -d -m 0755 -o www-data -g www-data /home/www-data && \
    chown --changes --silent --no-dereference --recursive \
          --from=33:33 ${USER_ID}:${GROUP_ID} \
          /home/www-data \
          /app \
   ;fi

RUN mkdir -p /home/www-data/nodes

# port for nodeos p2p
EXPOSE 9876
# port for nodeos http
EXPOSE 8888
# port for state history
EXPOSE 8080
# port for webapp
EXPOSE 8000

STOPSIGNAL SIGINT

bootstrap_env.sh

#! /bin/sh

set -e

ARCH=`uname -m`

ORG="AntelopeIO"
NODE_VERSION=${1:-"4.0.4"}
CDT_VERSION=${2:-"4.0.0"}

# Fetches the first layer of the container image from the GitHub Container Registry, and extracts the contents of the downloaded layer.
# Gives you the binaries of the container image without having to pull the entire image.
# You get leap-dev_<version>_<arch>.deb from here.
CONTAINER_PACKAGE=AntelopeIO/experimental-binaries
GH_ANON_BEARER=$(curl -s "https://ghcr.io/token?service=registry.docker.io&scope=repository:${CONTAINER_PACKAGE}:pull" | jq -r .token)
curl -s -L -H "Authorization: Bearer ${GH_ANON_BEARER}" https://ghcr.io/v2/${CONTAINER_PACKAGE}/blobs/$(curl -s -L -H "Authorization: Bearer ${GH_ANON_BEARER}" https://ghcr.io/v2/${CONTAINER_PACKAGE}/manifests/v${NODE_VERSION} | jq -r .layers[0].digest) | tar -xz

# Choose architecture
if [ "${ARCH}" = "x86_64" ]; then
   wget https://github.com/${ORG}/leap/releases/download/v${NODE_VERSION}/leap_${NODE_VERSION}_amd64.deb
   apt install -y ./leap_${NODE_VERSION}_amd64.deb
   apt install -y ./leap-dev_${NODE_VERSION}-ubuntu20.04_amd64.deb
   wget https://github.com/${ORG}/cdt/releases/download/v${CDT_VERSION}/cdt_${CDT_VERSION}_amd64.deb
   apt install -y ./cdt_${CDT_VERSION}_amd64.deb
else
   apt install -y ./leap-${NODE_VERSION}_arm64.deb
   apt install -y ./leap-dev_${NODE_VERSION}-ubuntu20.04_arm64.deb
   wget https://github.com/${ORG}/cdt/releases/download/v${CDT_VERSION}/cdt_${CDT_VERSION}_arm64.deb
   apt install -y ./cdt_${CDT_VERSION}_arm64.deb
fi

# Removing *.deb files that were pulled earlier to save space
rm *.deb

# Clone and build reference contracts
git clone https://github.com/${ORG}/reference-contracts
cd reference-contracts
mkdir build
cd build
cmake ..
make -j4

# replace org and other variables

@cshintov
Author

Hi @heifner, did you get a chance to look into this?

@heifner
Member

heifner commented Feb 14, 2024

Hi @heifner, did you get a chance to look into this?

Sorry, no. Hopefully someone can take a look soon. I will say that we had no other reports of issues with trace_api_plugin. Potentially some kind of environment issue.

@aaroncox
Contributor

aaroncox commented Feb 14, 2024

If it helps debug, the trace API running on eos.greymass.com only uses the following configuration for the trace_api:

plugin = eosio::trace_api_plugin
trace-dir = /mnt/history/traces
trace-no-abis = true

We don't use any other trace related configuration options.

We haven't seen any issues like this before - but we also don't decode server side, we handle decoding on the client that's making the request.
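
As a rough illustration of what client-side decoding involves, the sketch below converts the 8-byte on-wire form of an account name back into text. It is a sketch under stated assumptions, not the client code used by any endpoint in this thread: it assumes the raw action data arrives as a hex string, and `decode_name` and `first_two_names` are hypothetical helper names. The 5-bit character map is the standard Antelope name encoding.

```python
CHARMAP = ".12345abcdefghijklmnopqrstuvwxyz"


def decode_name(raw: bytes) -> str:
    """Decode an 8-byte little-endian name into its text form."""
    if len(raw) != 8:
        raise ValueError("a name is exactly 8 bytes")
    value = int.from_bytes(raw, "little")
    chars = []
    for i in range(12):
        # Characters 1-12 use 5 bits each, starting at the top of the word.
        chars.append(CHARMAP[(value >> (59 - 5 * i)) & 0x1F])
    # The 13th character only has the 4 lowest bits.
    chars.append(CHARMAP[value & 0x0F])
    # Trailing dots are padding, not part of the name.
    return "".join(chars).rstrip(".")


def first_two_names(hex_data: str) -> tuple:
    """Read the two leading names (e.g. `from` and `to` of a transfer)."""
    data = bytes.fromhex(hex_data)
    if len(data) < 16:
        raise ValueError("action data too short for two names")
    return decode_name(data[:8]), decode_name(data[8:16])
```

Because the client checks the length itself, a short payload surfaces as a local `ValueError` for that one action rather than as a failed block request.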

@tedcahalleos
Member

Hi @cshintov - if Aaron's response does not correct the issue, you can contact me via email at ted.cahall@eosnetwork.com and we can set up a call or chat to discuss other options.

@tmeinlschmidt

tmeinlschmidt commented Feb 15, 2024

Hi all,

On behalf of @cshintov:

@aaroncox yes, you're not deserializing with the ABI on the server side as we do. When we tried that option we had no issues; we get the raw output.

@tedcahalleos I tried commenting out the eosio.token ABI in the config file. With that change I'm able to get the trace from the node (with trace-no-abis = false), so I assume there's an issue with certain blocks/transactions.

We're using the latest CDT, 4.0.1.

None of these blocks work with the eosio.token ABI enabled; all fail with the same error:

357118176
357118965
357118255
357291244
357291896
357295793
357295889
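
The workaround described above, commenting out the `eosio.token` entry, can be sketched as a small helper. This is an illustrative sketch only: `disable_trace_abi` is a hypothetical name, and it relies on `#` starting a comment in `config.ini`. It works around the symptom on the node and does not fix the underlying decoding problem.

```python
def disable_trace_abi(config_text: str, account: str) -> str:
    """Comment out the `trace-rpc-abi` entry for one account."""
    out = []
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        # The value looks like "<account>=<path to abi file>".
        entry_account = value.split("=", 1)[0].strip()
        if sep and key.strip() == "trace-rpc-abi" and entry_account == account:
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

The account is compared exactly, so disabling `eosio.token` leaves the `eosio`, `eosio.msig`, and `eosio.wrap` entries untouched.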

thanks

@tedcahalleos
Member

@tmeinlschmidt Please email me at ted.cahall@eosnetwork.com so we can discuss offline. We are done with this ticket in terms of helping diagnose without speaking to you offline. Happy to set up a call or zoom once you contact me.

@heifner heifner self-assigned this Feb 15, 2024
@heifner heifner added the OCI Work exclusive to OCI team label Feb 15, 2024
@heifner heifner moved this from Todo to In Progress in Team Backlog Feb 20, 2024
@heifner heifner added this to the Leap v3.2.6 milestone Feb 20, 2024
@heifner heifner linked a pull request Feb 20, 2024 that will close this issue
heifner added a commit that referenced this issue Feb 20, 2024
[3.2] TraceAPI: Correctly convert return value via ABI
heifner added a commit that referenced this issue Feb 20, 2024
[3.2 -> 4.0] TraceAPI: Correctly convert return value via ABI
heifner added a commit that referenced this issue Feb 20, 2024
heifner added a commit that referenced this issue Feb 20, 2024
heifner added a commit that referenced this issue Feb 20, 2024
[4.0 -> 5.0] TraceAPI: Correctly convert return value via ABI
heifner added a commit that referenced this issue Feb 20, 2024
[5.0 -> main] TraceAPI: Correctly convert return value via ABI
@github-project-automation github-project-automation bot moved this from In Progress to Done in Team Backlog Feb 20, 2024
@heifner heifner removed the triage label Feb 20, 2024