Split processor in Pipelines removing the trailing empty values #48498

pankaj-k · 2019-10-25T00:17:14Z

Elasticsearch version: 7.2.0

Plugins installed: [none]

JVM version: java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

OS version: Windows 10, Version 1709

Description of the problem including expected versus actual behavior:
Using the split processor to split a CSV line (there is no csv processor for pipelines) drops the trailing empty values.
Actual: A,,B,, gives A, '', B.
Expected: A, '', B, '', ''

Java's String.split(regex) behaves the same way by default, but the overload String.split(regex, limit) retains trailing empty strings when a negative limit such as -1 is passed. The split processor has no equivalent option.
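For reference, the difference between the two Java overloads can be sketched as below. This is only an illustration of the standard String.split behaviour described above; the class name SplitDemo and its helper methods are hypothetical and are not part of Elasticsearch.

```java
import java.util.Arrays;

// Hypothetical demo class; only String.split itself is real JDK API.
public class SplitDemo {

    // Default overload: trailing empty strings are removed from the result.
    static String[] splitDefault(String line, String separator) {
        return line.split(separator);
    }

    // Negative limit: the pattern is applied as often as possible and
    // trailing empty strings are kept.
    static String[] splitKeepTrailing(String line, String separator) {
        return line.split(separator, -1);
    }

    public static void main(String[] args) {
        // prints [A, , B]
        System.out.println(Arrays.toString(splitDefault("A,,B,,", ",")));
        // prints [A, , B, , ]
        System.out.println(Arrays.toString(splitKeepTrailing("A,,B,,", ",")));
    }
}
```

With the input A,,B,, the default overload returns three elements, while the -1 variant returns all five fields, which is the behaviour requested for the split processor.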

Steps to reproduce:

Create a simple pipeline:

PUT _ingest/pipeline/test_pipeline
{
  "description": "test",
  "processors": [
    {
      "split": {
        "field": "message",
        "target_field": "splitdata",
        "separator": ","
      }
    }
  ]
}

Test it.

GET _ingest/pipeline/test_pipeline/_simulate
{
  "docs": [
    {
      "_source" :{
        "message" : "A,,B,,"
      }
    }
  ]
}

Result:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "message" : "A,,B,,",
          "splitdata" : [
            "A",
            "",
            "B"
          ]
        },
        "_ingest" : {
          "timestamp" : "2019-10-23T04:25:26.277Z"
        }
      }
    }
  ]
}

The two empty fields after 'B' are dropped.

Test with a different input.

GET _ingest/pipeline/test_pipeline/_simulate
{
  "docs": [
    {
      "_source" :{
        "message" : "A,,B,,C"
      }
    }
  ]
}

Result:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "message" : "A,,B,,C",
          "splitdata" : [
            "A",
            "",
            "B",
            "",
            "C"
          ]
        },
        "_ingest" : {
          "timestamp" : "2019-10-23T04:27:38.400Z"
        }
      }
    }
  ]
}

Here the empty values are preserved, because none of them is trailing.

elasticmachine · 2019-10-25T09:07:59Z

Pinging @elastic/es-core-features (:Core/Features/Ingest)

cbuescher added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Oct 25, 2019

danhermann self-assigned this Oct 25, 2019

danhermann added the team-discuss label Oct 28, 2019

danhermann mentioned this issue Oct 29, 2019

Add option to split processor for preserving trailing empty fields #48664

Merged

danhermann removed the team-discuss label Oct 29, 2019

danhermann closed this as completed in #48664 Oct 30, 2019

danhermann mentioned this issue Oct 30, 2019

[7.x] Add option to split processor for preserving trailing empty fields #48685

Merged

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed
