Compatibility problems with Spark 2.3.0 #13

fhuertas · 2018-06-05T13:27:46Z

This method is not valid for Spark 2.3.0 because the addBatch function now receives a streaming Dataset, which cannot be written with the official batch connector.

dannybusch · 2018-06-18T14:13:25Z

Is there any news on how to write a Spark Cassandra sink for Structured Streaming in Spark 2.3.x?

fhuertas · 2018-06-18T16:25:48Z

I worked on it a month ago and didn't find an easy solution. I just read that there is a new Spark version. I will check whether this method is valid with 2.3.1. If not, I will have to think of another way to solve this problem.

fhuertas · 2018-06-19T07:27:54Z

Confirmed: Spark 2.3.1 still passes the DataFrame as a streaming one, so this solution is not valid.

When I have time, I will try to implement it another way, maybe using foreach.
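For readers wondering what the foreach route could look like, here is a minimal sketch against Spark's `ForeachWriter` API. The class name `RowSinkWriter` and the `writeRow` function are hypothetical and only stand in for whatever call actually persists one row; nothing here is taken from this project.

```scala
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical writer: `writeRow` is a placeholder for the call that persists
// a single row (for example an INSERT executed through a Cassandra session).
class RowSinkWriter(writeRow: Row => Unit) extends ForeachWriter[Row] {

  // Called once per partition and trigger; returning true means
  // "go ahead and process the rows of this partition".
  override def open(partitionId: Long, version: Long): Boolean = true

  // Called for every row of the partition.
  override def process(value: Row): Unit = writeRow(value)

  // Called when the partition is finished; errorOrNull is null on success.
  override def close(errorOrNull: Throwable): Unit = ()
}
```

It would be attached to a streaming DataFrame with something like `df.writeStream.foreach(new RowSinkWriter(row => ...)).start()`. Writing row by row is usually slower than a batch write, so this is more of a fallback than a replacement for the sink.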

redsk · 2018-08-23T10:07:03Z

Hi @fhuertas, do you have any news on this?

gitsparky · 2018-10-30T15:38:07Z

The idea from a PR for the Hortonworks Spark HBase connector (shc) seems to work, see:
hortonworks-spark/shc#238

fhuertas · 2018-11-02T08:53:58Z

This weekend I'm going to try to adapt this code and test if it works.

override def addBatch(batchId: Long, data: DataFrame): Unit = synchronized {
  // use a local variable to make sure the map closure doesn't capture the whole DataFrame
  val schema = data.schema
  // convert the internal rows back to external Row objects, partition by partition
  val res = data.queryExecution.toRdd.mapPartitions { rows =>
    val converter = CatalystTypeConverters.createToScalaConverter(schema)
    rows.map(converter(_).asInstanceOf[Row])
  }

  // rebuild a plain (non-streaming) DataFrame that the batch writer accepts
  val df = sqlContext.sparkSession.createDataFrame(res, schema)
  df.write
    .options(specifiedHBaseParams)
    .format(defaultFormat)
    .save()
}

It looks good and could be a good solution.
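Adapted to a Cassandra sink, the same idea might look roughly like the sketch below. It is not the code of this project: the names `BatchConversion`, `toBatchDataFrame` and `CassandraBatchSink` are hypothetical, and it assumes the spark-cassandra-connector is on the classpath and is addressed through the `org.apache.spark.sql.cassandra` format with `keyspace` and `table` options.

```scala
import org.apache.spark.sql.{DataFrame, Row, SQLContext, SaveMode}
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.sql.execution.streaming.Sink

object BatchConversion {
  // Rebuild a non-streaming DataFrame from the rows of `data`.
  def toBatchDataFrame(sqlContext: SQLContext, data: DataFrame): DataFrame = {
    // Local copy so the closure below does not capture the whole DataFrame.
    val schema = data.schema
    val rows = data.queryExecution.toRdd.mapPartitions { internalRows =>
      val converter = CatalystTypeConverters.createToScalaConverter(schema)
      internalRows.map(converter(_).asInstanceOf[Row])
    }
    sqlContext.sparkSession.createDataFrame(rows, schema)
  }
}

// Hypothetical sink: class name and constructor parameters are illustrative.
class CassandraBatchSink(sqlContext: SQLContext, keyspace: String, table: String)
    extends Sink {

  override def addBatch(batchId: Long, data: DataFrame): Unit = synchronized {
    BatchConversion.toBatchDataFrame(sqlContext, data)
      .write
      .format("org.apache.spark.sql.cassandra") // assumed connector format name
      .options(Map("keyspace" -> keyspace, "table" -> table))
      .mode(SaveMode.Append) // append each micro-batch to the existing table
      .save()
  }
}
```

Keeping the conversion in its own function makes it possible to check it against a local SparkSession without a Cassandra cluster. Note that `Sink` lives in an internal package, so it may change between Spark releases.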

fhuertas · 2018-11-06T11:19:18Z

I have run a test that works with 2.3.0, 2.3.1 and 2.3.2, and it looks good. I will upload a branch with the preliminary version this afternoon.

It is the same idea as the one proposed in the issue: hortonworks-spark/shc#238

Thanks @gitsparky for the link
