Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rf_agg_approx_quantiles function #429

Merged
merged 11 commits into from
Jan 13, 2020

Conversation

vpipkt
Copy link
Member

@vpipkt vpipkt commented Nov 21, 2019

Initial implementation with quantiles

To Do

  • Add to python API
  • Add to SQL API
  • Docs
  • release notes

Initial implementation with quantiles

Signed-off-by: Jason T. Brown <jason@astraea.earth>
@vpipkt vpipkt requested a review from metasim November 21, 2019 18:54

describe("DataFrame.tileStats extension methods") {

val df = TestData.sampleGeoTiff.toDF()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would like to see a test of this with projected raster tile, raster ref, raster source etc. maybe in integration tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get there with Expressions...

Copy link
Member

@metasim metasim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I working on some alternative approaches to discuss that I think will be more "idiomatic RasterFrames".

@@ -155,6 +155,9 @@ trait DataFrameMethods[DF <: DataFrame] extends MethodExtensions[DF] with Metada
def withPrefixedColumnNames(prefix: String): DF =
self.columns.foldLeft(self)((df, c) ⇒ df.withColumnRenamed(c, s"$prefix$c").asInstanceOf[DF])

/** */
def tileStat(): RasterFrameStatFunctions = new RasterFrameStatFunctions(self)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on an alternative approach to discuss, one that takes an approach more in line with the other aggregate functions (i.e. rf_agg_foo_bar). While I know that Spark DataFrames uses this extension method approach in a couple of places (stats, na, ...) I think it adds an inconsistency that can feel arbitrary if the same thing can be achieved with columnar functions. na is a good example of one that can't, because it's effectively implementing a builder for NA handling, but I don't think were in that situation.

*
* @since 2.0.0
*/
def approxTileQuantile(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this copied? If so, we need to attribute and reference, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya oops


describe("DataFrame.tileStats extension methods") {

val df = TestData.sampleGeoTiff.toDF()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get there with Expressions...

val result = df.tileStat().approxTileQuantile(
Array("tile", "tilePlus2"),
Array(0.25, 0.75),
0.00001
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to default this? Or is it strongly scale dependent?

…lt-in functions all the way to a custom aggregate expression.
@metasim
Copy link
Member

metasim commented Nov 25, 2019

See this for a couple alternate approaches that don't depend DataFrame extension methods.

s22s#57

Copy link
Member

@metasim metasim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See alternate approaches and let's converge on one.

@vpipkt vpipkt changed the title Add DataFrame tileStats extension Add rf_agg_approx_quantiles function Jan 2, 2020
@vpipkt
Copy link
Member Author

vpipkt commented Jan 2, 2020

@metasim tests failing on the SQL expression. Not sure how to work in all the different parameters. We could easily register some functions to compute default percentiles like median, quartiles, quantiles, deciles, etc at default relative error.

@metasim
Copy link
Member

metasim commented Jan 3, 2020

@vpipkt I think enabling SQL support is going to take some more work, given the SQL function parameter requirements.... perhaps we need to look for something in the official API that uses non-columnar parameters to a function (does it exist)? For some reason I'm loath to make all of those parameters columnar, but perhaps that's the proper way to do it

I think the current question is do we merge this without support for SQL, or do we work this out first?

@vpipkt
Copy link
Member Author

vpipkt commented Jan 3, 2020

I am fine with a little divergence in the API between SQL and python/scala over this.

Possible paths:

  1. No SQL function. In docs state there is no SQL support.
  2. Expose SQL functions for "canned" quantiles; Can add these to Python and Scala as well
    1. rf_agg_approx_median
    2. rf_agg_approx_quartiles
    3. rf_agg_approx_quantiles
    4. rf_agg_approx_deciles

Signed-off-by: Jason T. Brown <jason@astraea.earth>
@vpipkt
Copy link
Member Author

vpipkt commented Jan 3, 2020

Latest commit should pass tests and is basically the option 1 above

Recommend we add an issue for option 2, lower priority.

Signed-off-by: Jason T. Brown <jason@astraea.earth>
tile.foreachDouble(d => if (!isNoData(d)) result = result.insert(d))
buffer.update(0, result.toRow)
}
else buffer
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiler is complaining about this. Since the function returns Unit this line could be omitted?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh, that was me not thinking "mutably". It can go.

@vpipkt vpipkt merged commit 1730af9 into locationtech:develop Jan 13, 2020
@vpipkt vpipkt deleted the feature/tile-quantile branch January 13, 2020 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants