Teraslice Elasticsearch Reader ES6 Worker Error #3962

Closed
godber opened this issue Feb 10, 2025 · 8 comments

Comments

@godber
Member

godber commented Feb 10, 2025

We have been doing a large re-index and have occasionally (roughly 1 in 200k slices) seen slice failures with the following error attached to the slice. The job uses Elasticsearch asset version 4.0.5 with the following API config (redacted) and reads from an Elasticsearch 6.5.4 cluster. It writes to Kafka, but that doesn't seem relevant.

    "apis": [
        {
            "_name": "elasticsearch_reader_api:id",
            "connection": "es_data",
            "index": "docs-2024.11.17.3",
            "type": "doc",
            "key_type": "base64url",
            "starting_key_depth": 6,
            "id_field_name": "_key",
            "size": 300000
        }
    ],

I'm not sure ... is this a worker processing a slice experiencing a Transport fault talking to ES6?

TSError: aborted
    at Module.pRetry (file:///app/source/packages/utils/dist/src/promises.js:103:21)
    at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:38:22)
    at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
    at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
    at pRetry (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:87682:17)
    at async ElasticsearchReaderAPI.fetch (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:89520:20)
    at async ElasticsearchIDFetcher.handle (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:80183:33)
    at async file:///app/source/packages/job-components/dist/src/execution-context/worker.js:75:33
    at async file:///app/source/packages/utils/dist/src/promises.js:215:19
    at async file:///app/source/packages/utils/dist/src/promises.js:215:19
    at async WorkerExecutionContext._runSliceOnce (file:///app/source/packages/job-components/dist/src/execution-context/worker.js:295:29)
    at async Module.pRetry (file:///app/source/packages/utils/dist/src/promises.js:88:16)
    at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:38:22)
    at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
    at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
    at _errorHandlerFn (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:107674:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
Caused by: TSError: aborted
    at pRetry (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:87682:17)
    at async ElasticsearchReaderAPI.fetch (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:89520:20)
    at async ElasticsearchIDFetcher.handle (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:80183:33)
    at async file:///app/source/packages/job-components/dist/src/execution-context/worker.js:75:33
    at async file:///app/source/packages/utils/dist/src/promises.js:215:19
    at async file:///app/source/packages/utils/dist/src/promises.js:215:19
    at async WorkerExecutionContext._runSliceOnce (file:///app/source/packages/job-components/dist/src/execution-context/worker.js:295:29)
    at async Module.pRetry (file:///app/source/packages/utils/dist/src/promises.js:88:16)
    at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:38:22)
    at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
    at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
    at _errorHandlerFn (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:107674:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
Caused by: TSError: aborted
    at _errorHandlerFn (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:107674:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
Caused by: ConnectionError: aborted
    at IncomingMessage.<anonymous> (/app/source/node_modules/elasticsearch6/lib/Transport.js:260:23)
    at IncomingMessage.emit (node:events:524:28)
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)

The Teraslice cluster version is as follows:

{
    "arch": "x64",
    "clustering_type": "kubernetesV2",
    "name": "teraslice-cluster1",
    "node_version": "v22.13.0",
    "platform": "linux",
    "teraslice_version": "v2.12.3"
}

Edit: Changed ES cluster version from 6.5.2 to 6.5.4.
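
The "Caused by" chain in the trace above bottoms out in a `ConnectionError: aborted` raised by the ES6 client's Transport. A minimal sketch of pulling that innermost error out programmatically, assuming each wrapper exposes the wrapped error on a `cause` property (the real `TSError` may store it differently; `CauseLike` and `rootCause` are hypothetical names, not part of Teraslice):

```typescript
// Hypothetical shape for an error that wraps another error.
// Assumption: the wrapped error lives on `cause`; TSError may differ.
interface CauseLike {
    name?: string;
    message?: string;
    cause?: unknown;
}

// Follow `cause` links until none are left, guarding against cycles.
function rootCause(err: unknown): unknown {
    const seen = new Set<unknown>();
    let current: unknown = err;
    while (current != null && typeof current === 'object' && !seen.has(current)) {
        seen.add(current);
        const next = (current as CauseLike).cause;
        if (next == null) break;
        current = next;
    }
    return current;
}

// Mirror the chain from the trace: TSError -> TSError -> ConnectionError.
const transportError: CauseLike = { name: 'ConnectionError', message: 'aborted' };
const sliceError: CauseLike = {
    name: 'TSError',
    message: 'aborted',
    cause: { name: 'TSError', message: 'aborted', cause: transportError },
};

const rootError = rootCause(sliceError) as CauseLike;
console.log(`${rootError.name}: ${rootError.message}`); // ConnectionError: aborted
```

Logging the root error's name next to the slice error would make it easier to tell a transport fault from a query or mapping problem at a glance.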

@godber
Member Author

godber commented Feb 10, 2025

I guess it's possible this should be filed on the Elasticsearch asset instead.

@godber
Member Author

godber commented Feb 10, 2025

There don't appear to be any errors in the ES data node logs that correlate with these slice errors, and the clusters were all healthy: green, with no GCs bigger or longer than usual.

@godber
Member Author

godber commented Feb 11, 2025

I've tracked down a worker that experienced one of these slice errors and it didn't have anything else to add really. Just the procedural stuff (redacted a bit and some of these lines are clipped):

[2025-02-10T22:52:53.715Z]  INFO: teraslice/7 on ts-wkr-es-store-kafka-4411806d-19a5-a9f6896654xqg4: slice d6218c85-d7b0-461b-a9a4-897930bcd4a4 completed (assignment=worker, module=worker, worker_id=hvf_iu9m, ex_id=9268a52>
[2025-02-10T22:52:53.829Z] ERROR: teraslice/7 on ts-wkr-es-store-kafka-4411806d-19a5-a9f6896654xqg4: (assignment=worker, module=worker_context, worker_id=hvf_iu9m, ex_id=9268a523-8401-4bc2-b09c-b1742b321d24, job_id=54a1806>
    A slice error occurred {
      slice: {
        slice_id: '83d83549-a4d3-4a32-bc10-17231c152f49',
        slicer_id: 2,
        slicer_order: 68,
        request: { keys: [Array], count: 4850 },
        _created: '2025-02-10T22:51:22.799Z'
      }
    }
    --
    TSError: aborted
        at pRetry (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:87682:17)
        at async ElasticsearchReaderAPI.fetch (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:89520:20)
        at async ElasticsearchIDFetcher.handle (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:80183:33)
        at async file:///app/source/packages/job-components/dist/src/execution-context/worker.js:75:33
        at async file:///app/source/packages/utils/dist/src/promises.js:215:19
        at async file:///app/source/packages/utils/dist/src/promises.js:215:19
        at async WorkerExecutionContext._runSliceOnce (file:///app/source/packages/job-components/dist/src/execution-context/worker.js:295:29)
        at async Module.pRetry (file:///app/source/packages/utils/dist/src/promises.js:88:16)
        at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:38:22)
        at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
        at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
        at _errorHandlerFn (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:107674:11)
        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
[2025-02-10T22:52:53.840Z] ERROR: teraslice/7 on ts-wkr-es-store-kafka-4411806d-19a5-a9f6896654xqg4: slice state for 9268a523-8401-4bc2-b09c-b1742b321d24 has been marked as error (assignment=worker, module=slice, worker_id>
    TSError: aborted
        at Module.pRetry (file:///app/source/packages/utils/dist/src/promises.js:103:21)
        at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:38:22)
        at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
        at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
        at pRetry (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:87682:17)
        at async ElasticsearchReaderAPI.fetch (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:89520:20)
        at async ElasticsearchIDFetcher.handle (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:80183:33)
        at async file:///app/source/packages/job-components/dist/src/execution-context/worker.js:75:33
        at async file:///app/source/packages/utils/dist/src/promises.js:215:19
        at async file:///app/source/packages/utils/dist/src/promises.js:215:19
        at async WorkerExecutionContext._runSliceOnce (file:///app/source/packages/job-components/dist/src/execution-context/worker.js:295:29)
        at async Module.pRetry (file:///app/source/packages/utils/dist/src/promises.js:88:16)
        at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:38:22)
        at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
        at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
        at _errorHandlerFn (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:107674:11)
        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
[2025-02-10T22:52:53.841Z] ERROR: teraslice/7 on ts-wkr-es-store-kafka-4411806d-19a5-a9f6896654xqg4: slice 83d83549-a4d3-4a32-bc10-17231c152f49 run error (assignment=worker, module=worker, worker_id=hvf_iu9m, ex_id=9268a52>
    TSError: Slice failed processing, caused by TSError: aborted
        at SliceExecution._markFailed (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:117:15)
        at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:48:17)
        at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
        at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
[2025-02-10T22:52:54.313Z]  INFO: teraslice/7 on ts-wkr-es-store-kafka-4411806d-19a5-a9f6896654xqg4: analytics for slice: slice_id: "62e5d797-3ec9-4bbe-9338-dfdcd781e67b", slicer_id: 2, slicer_order: 69, _created: "2025-02>
[2025-02-10T22:52:54.313Z]  INFO: teraslice/7 on ts-wkr-es-store-kafka-4411806d-19a5-a9f6896654xqg4: slice 62e5d797-3ec9-4bbe-9338-dfdcd781e67b completed (assignment=worker, module=worker, worker_id=hvf_iu9m, ex_id=9268a52>
[2025-02-10T22:52:55.500Z]  INFO: teraslice/7 on ts-wkr-es-store-kafka-4411806d-19a5-a9f6896654xqg4: analytics for slice: slice_id: "8b6369c5-0274-4947-b92c-5dfe9d555068", slicer_id: 2, slicer_order: 75, _created: "2025-02>

@godber
Member Author

godber commented Feb 11, 2025

It's worth pointing out that the jobs that had slice errors were reading from ES 6.5.4, but we have one other job reading from ES 6.8.6 that has NOT had a slice failure.

@sotojn sotojn self-assigned this Feb 11, 2025
@godber godber changed the title from "Teraslice Elasticsearch Reader ES6 Slicer(??) Error" to "Teraslice Elasticsearch Reader ES6 Worker Error" Feb 11, 2025
@sotojn
Contributor

sotojn commented Feb 11, 2025

I started tracing which lines of code were being run according to the stack trace, and will list them below to give a better idea of what's going on:

Caused by: ConnectionError: aborted
    at IncomingMessage.<anonymous> (/app/source/node_modules/elasticsearch6/lib/Transport.js:260:23)

Error above happens here:
https://github.com/elastic/elasticsearch-js/blob/098aef0a5826ee1124ac9618293612b5b2b84da4/lib/Transport.js#L251-L255

Caused by: TSError: aborted
    at _errorHandlerFn (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:107674:11)

The TSError above occurs here, on line 839:

function _errorHandler(fn, data, reject, fnName = '->unknown()') {
    const retry = _retryFn(fn, data, reject);
    return function _errorHandlerFn(err) {
        const retryable = isErrorRetryable(err);
        if (retryable) {
            retry();
        } else {
            reject(
                new TSError(err, {
                    context: {
                        fnName,
                        connection,
                    },
                })
            );
        }
    };
}

    at async ElasticsearchIDFetcher.handle (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:80183:33)

Occurs in job-components here:

    async handle(sliceRequest?: unknown): Promise<DataEntity[]> {
        return DataEntity.makeArray(await this.fetch(sliceRequest));
    }
}

    at async ElasticsearchReaderAPI.fetch (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:89520:20)

Occurs in elasticsearch-asset-apis here:
https://github.com/terascope/elasticsearch-assets/blob/4bbd428e07382199a2cddaea2e3788ea0e8716ed/packages/elasticsearch-asset-apis/src/elasticsearch-reader-api/ElasticsearchReaderAPI.ts#L158-L170
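
For illustration, the retry-or-reject branch in `_errorHandlerFn` above can be sketched as follows. This is only a sketch and not the actual code in elasticsearch-assets: `isTransientConnectionError`, `makeErrorHandler`, and `TRANSIENT_MESSAGES` are hypothetical names, and it assumes the two transport failures seen in this thread's worker logs (`aborted` and `socket hang up`) are safe to retry.

```typescript
// Hypothetical list of transient transport failures, taken from the two
// error messages that appear in the worker logs in this thread.
const TRANSIENT_MESSAGES: readonly string[] = ['aborted', 'socket hang up'];

interface HasMessage {
    message?: unknown;
}

// Returns true when the error message exactly matches a known transient failure.
function isTransientConnectionError(err: unknown): boolean {
    if (err == null || typeof err !== 'object') return false;
    const { message } = err as HasMessage;
    if (typeof message !== 'string') return false;
    const normalized = message.trim().toLowerCase();
    return TRANSIENT_MESSAGES.some((known) => known === normalized);
}

// Same shape as the handler above: retry transient errors, reject the rest.
function makeErrorHandler(
    retry: () => void,
    reject: (err: unknown) => void
): (err: unknown) => void {
    return function errorHandlerFn(err: unknown): void {
        if (isTransientConnectionError(err)) {
            retry();
        } else {
            reject(err);
        }
    };
}
```

With a predicate like this, an `aborted` response would go back through the retry path instead of failing the slice, while errors that are not on the list would still be rejected immediately.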

@sotojn
Contributor

sotojn commented Feb 13, 2025

I've validated that the fix in terascope/elasticsearch-assets#1365 resolves the issue above. I used Chaos Mesh to fail incoming HTTP requests to Elasticsearch 6.5.4 while running elasticsearch-assets:v4.0.5 and was able to produce a similar error.

Elasticsearch used:

{
  "name" : "gliqkEy",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "m-M337BXRoSiWdNKqjTZIQ",
  "version" : {
    "number" : "6.5.4",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "d2ef93d",
    "build_date" : "2018-12-17T21:17:40.758843Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Job file used:

{
    "name": "es-to-noop",
    "lifecycle": "once",
    "workers": 1,
    "log_level": "info",
    "assets": [
        "elasticsearch:4.0.5"
    ],
    "operations": [
        {
            "_op": "elasticsearch_reader",
            "connection": "es6",
            "index": "random-data-1",
            "size": 2500,
            "date_field_name": "created"
        },
        {
            "_op": "noop"
        }
    ]
}

Worker logs using elasticsearch-assets:v4.0.5:

[2025-02-13T00:02:00.085Z]  INFO: teraslice/10 on ts-wkr-es-to-noop-713f41de-6cc7-5d88db94df-fpgpc: analytics for slice: slice_id: "e6c537d1-44f4-4734-9c9c-917f2f96b589", slicer_id: 0, slicer_order: 170, _created: "2025-02-13T00:01:27.563Z", time: [157, 0], memory: [29164088, 1568], size: [20000, 20000] (assignment=worker, module=slice, worker_id=NcrF4JHE, ex_id=18a5d575-3bf8-4378-9661-9fed1daacae1, job_id=713f41de-6cc7-43f9-a982-de3f69cc2899, slice_id=e6c537d1-44f4-4734-9c9c-917f2f96b589)
[2025-02-13T00:02:00.085Z]  INFO: teraslice/10 on ts-wkr-es-to-noop-713f41de-6cc7-5d88db94df-fpgpc: slice e6c537d1-44f4-4734-9c9c-917f2f96b589 completed (assignment=worker, module=worker, worker_id=NcrF4JHE, ex_id=18a5d575-3bf8-4378-9661-9fed1daacae1, job_id=713f41de-6cc7-43f9-a982-de3f69cc2899)
[2025-02-13T00:02:00.190Z] ERROR: teraslice/10 on ts-wkr-es-to-noop-713f41de-6cc7-5d88db94df-fpgpc: (assignment=worker, module=worker_context, worker_id=NcrF4JHE, ex_id=18a5d575-3bf8-4378-9661-9fed1daacae1, job_id=713f41de-6cc7-43f9-a982-de3f69cc2899, err.code=INTERNAL_SERVER_ERROR)
    A slice error occurred {
      slice: {
        slice_id: 'e693a559-e098-4ca3-a7bd-e1546714e371',
        slicer_id: 0,
        slicer_order: 171,
        request: {
          start: '2025-02-12T21:14:02.000Z',
          end: '2025-02-12T21:14:03.000Z',
          limit: '2025-02-12T21:15:57.000Z',
          holes: [],
          count: 20000
        },
        _created: '2025-02-13T00:01:27.563Z'
      }
    }
    --
    TSError: aborted
        at pRetry (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:87682:17)
        at async ElasticsearchReaderAPI.fetch (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:89520:20)
        at async ElasticsearchDateFetcher.handle (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:80183:33)
        at async file:///app/source/packages/job-components/dist/src/execution-context/worker.js:75:33
        at async file:///app/source/packages/utils/dist/src/promises.js:215:19
        at async file:///app/source/packages/utils/dist/src/promises.js:215:19
        at async WorkerExecutionContext._runSliceOnce (file:///app/source/packages/job-components/dist/src/execution-context/worker.js:295:29)
        at async Module.pRetry (file:///app/source/packages/utils/dist/src/promises.js:88:16)
        at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:38:22)
        at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
        at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
        at _errorHandlerFn (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:107674:11)
        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
[2025-02-13T00:02:00.195Z] ERROR: teraslice/10 on ts-wkr-es-to-noop-713f41de-6cc7-5d88db94df-fpgpc: slice state for 18a5d575-3bf8-4378-9661-9fed1daacae1 has been marked as error (assignment=worker, module=slice, worker_id=NcrF4JHE, ex_id=18a5d575-3bf8-4378-9661-9fed1daacae1, job_id=713f41de-6cc7-43f9-a982-de3f69cc2899, slice_id=e693a559-e098-4ca3-a7bd-e1546714e371, err.code=INTERNAL_SERVER_ERROR)
    TSError: aborted
        at Module.pRetry (file:///app/source/packages/utils/dist/src/promises.js:103:21)
        at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:38:22)
        at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
        at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
        at pRetry (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:87682:17)
        at async ElasticsearchReaderAPI.fetch (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:89520:20)
        at async ElasticsearchDateFetcher.handle (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:80183:33)
        at async file:///app/source/packages/job-components/dist/src/execution-context/worker.js:75:33
        at async file:///app/source/packages/utils/dist/src/promises.js:215:19
        at async file:///app/source/packages/utils/dist/src/promises.js:215:19
        at async WorkerExecutionContext._runSliceOnce (file:///app/source/packages/job-components/dist/src/execution-context/worker.js:295:29)
        at async Module.pRetry (file:///app/source/packages/utils/dist/src/promises.js:88:16)
        at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:38:22)
        at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
        at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
        at _errorHandlerFn (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:107674:11)
        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
[2025-02-13T00:02:00.195Z] ERROR: teraslice/10 on ts-wkr-es-to-noop-713f41de-6cc7-5d88db94df-fpgpc: slice e693a559-e098-4ca3-a7bd-e1546714e371 run error (assignment=worker, module=worker, worker_id=NcrF4JHE, ex_id=18a5d575-3bf8-4378-9661-9fed1daacae1, job_id=713f41de-6cc7-43f9-a982-de3f69cc2899, err.code=INTERNAL_SERVER_ERROR)
    TSError: Slice failed processing, caused by TSError: aborted
        at SliceExecution._markFailed (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:117:15)
        at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:48:17)
        at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
        at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
[2025-02-13T00:02:00.493Z] ERROR: teraslice/10 on ts-wkr-es-to-noop-713f41de-6cc7-5d88db94df-fpgpc: (assignment=worker, module=worker_context, worker_id=NcrF4JHE, ex_id=18a5d575-3bf8-4378-9661-9fed1daacae1, job_id=713f41de-6cc7-43f9-a982-de3f69cc2899, err.code=INTERNAL_SERVER_ERROR)
    A slice error occurred {
      slice: {
        slice_id: 'cdc2e891-e326-42ac-b1f2-56109354cd8a',
        slicer_id: 0,
        slicer_order: 172,
        request: {
          start: '2025-02-12T21:14:03.000Z',
          end: '2025-02-12T21:14:04.000Z',
          limit: '2025-02-12T21:15:57.000Z',
          holes: [],
          count: 16137
        },
        _created: '2025-02-13T00:01:27.569Z'
      }
    }
    --
    TSError: socket hang up
        at pRetry (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:87682:17)
        at async ElasticsearchReaderAPI.fetch (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:89520:20)
        at async ElasticsearchDateFetcher.handle (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:80183:33)
        at async file:///app/source/packages/job-components/dist/src/execution-context/worker.js:75:33
        at async file:///app/source/packages/utils/dist/src/promises.js:215:19
        at async file:///app/source/packages/utils/dist/src/promises.js:215:19
        at async WorkerExecutionContext._runSliceOnce (file:///app/source/packages/job-components/dist/src/execution-context/worker.js:295:29)
        at async Module.pRetry (file:///app/source/packages/utils/dist/src/promises.js:88:16)
        at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:38:22)
        at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
        at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
        at _errorHandlerFn (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:107674:11)
        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
[2025-02-13T00:02:00.496Z] ERROR: teraslice/10 on ts-wkr-es-to-noop-713f41de-6cc7-5d88db94df-fpgpc: slice state for 18a5d575-3bf8-4378-9661-9fed1daacae1 has been marked as error (assignment=worker, module=slice, worker_id=NcrF4JHE, ex_id=18a5d575-3bf8-4378-9661-9fed1daacae1, job_id=713f41de-6cc7-43f9-a982-de3f69cc2899, slice_id=cdc2e891-e326-42ac-b1f2-56109354cd8a, err.code=INTERNAL_SERVER_ERROR)
    TSError: socket hang up
        at Module.pRetry (file:///app/source/packages/utils/dist/src/promises.js:103:21)
        at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:38:22)
        at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
        at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
        at pRetry (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:87682:17)
        at async ElasticsearchReaderAPI.fetch (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:89520:20)
        at async ElasticsearchDateFetcher.handle (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:80183:33)
        at async file:///app/source/packages/job-components/dist/src/execution-context/worker.js:75:33
        at async file:///app/source/packages/utils/dist/src/promises.js:215:19
        at async file:///app/source/packages/utils/dist/src/promises.js:215:19
        at async WorkerExecutionContext._runSliceOnce (file:///app/source/packages/job-components/dist/src/execution-context/worker.js:295:29)
        at async Module.pRetry (file:///app/source/packages/utils/dist/src/promises.js:88:16)
        at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:38:22)
        at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
        at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)
        at _errorHandlerFn (file:///app/assets/bd33953c2886d977354da2d9a90a4fa11015fed5/index.js:107674:11)
        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
[2025-02-13T00:02:00.496Z] ERROR: teraslice/10 on ts-wkr-es-to-noop-713f41de-6cc7-5d88db94df-fpgpc: slice cdc2e891-e326-42ac-b1f2-56109354cd8a run error (assignment=worker, module=worker, worker_id=NcrF4JHE, ex_id=18a5d575-3bf8-4378-9661-9fed1daacae1, job_id=713f41de-6cc7-43f9-a982-de3f69cc2899, err.code=INTERNAL_SERVER_ERROR)
    TSError: Slice failed processing, caused by TSError: socket hang up
        at SliceExecution._markFailed (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:117:15)
        at async SliceExecution.run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/slice.js:48:17)
        at async Worker.runOnce (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:169:13)
        at async _run (file:///app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:124:17)

Could not reproduce it on elasticsearch-assets:v4.2.1.

@sotojn
Contributor

sotojn commented Feb 13, 2025

For what it's worth, here is the scheduled Chaos Mesh "experiment" that I ran against it, in case we ever need to reproduce this:

kind: Schedule
apiVersion: chaos-mesh.org/v1alpha1
metadata:
  namespace: services-dev1
  name: abort-3-2
  annotations:
    experiment.chaos-mesh.org/pause: 'false'
spec:
  schedule: '*/1 * * * *'      # this is just cron syntax for "every minute do this"
  startingDeadlineSeconds: null
  concurrencyPolicy: Forbid
  historyLimit: 1
  type: HTTPChaos
  httpChaos:
    selector:
      namespaces:
        - services-dev1
      labelSelectors:
        app: elasticsearch
    mode: all
    target: Response
    abort: true
    port: 9200
    path: '*'
    duration: 500ms

@godber
Member Author

godber commented Feb 13, 2025

@godber godber closed this as completed Feb 13, 2025