-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate schemaParser property to avroSchemaParser task #33
Conversation
With this approach there is more freedom in what Schema Parser can be provided
@nevillelyh wdyt? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @RustedBones WDYT?
src/main/scala/sbtavro/SbtAvro.scala
Outdated
@@ -49,11 +49,26 @@ object SbtAvro extends AutoPlugin { | |||
|
|||
lazy val avroArtifactTasks: Seq[TaskKey[File]] = Seq(Compile, Test).map(_ / packageAvro) | |||
|
|||
val avroSchemaParser = taskKey[() => Schema.Parser](".avsc schema parser") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to have TaskKey[() => Schema.Parser]
instead of TaskKey[Schema.Parser]
as signature ?
I felt that tasks are executed every time if not cached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is unfortunate. The behaviour of Schema Parser is as such that it changes internal state while parsing new files - see AvscFilesCompiler.java
. That is the reason why it is needed to actually re-create (or clone, but no such option is given) new Parser object and we can not reuse original from the task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to get rid of this
val schemaParser = new AtomicReference(new Schema.Parser())
which is just here to store the parser configuration.
What about using a builder setting avroSchemaParserBuilder: SettingKey[SchemaParserBuilder]
with
public class SchemaParserBuilder {
private final Schema.Parser parser = new Schema.Parser();
private SchemaParserBuilder() { }
public Schema.Parser build() {
Schema.Parser copy = new Schema.Parser();
copy.addTypes(parser.getTypes());
copy.setValidate(parser.getValidate());
copy.setValidateDefaults(parser.getValidateDefaults());
return copy;
}
public SchemaParserBuilder withTypes(Map<String, Schema> types) {
parser.addTypes(types);
return this;
}
public SchemaParserBuilder withValidate(boolean validate) {
parser.setValidate(validate);
return this;
}
public SchemaParserBuilder withValidateDefaults(boolean validate) {
parser.setValidateDefaults(validate);
return this;
}
public static SchemaParserBuilder newBuilder() {
return new SchemaParserBuilder();
}
}
We can also define the extra avroExternalTypes: SettingKey[Map[String, Schema]]
to ease external schema injection.
If user want to have a complete custom parser, they will have to give a new builder implementation.
Overall it's more code but it guides more the users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is on a way to remove val schemaParser = new AtomicReference(new Schema.Parser())
(annotating as deprecated).
I did not remove it because not wanting to break suddenly user code.
Builder might be a better api than supplier.
I think it might sense to deprecate settings avroValidate
and avroValidateDefaults
.
Then from user POV code would look like:
avroSchemaParserBuilder := SchemaParserBuilder.newBuilder().withValidate(???). withValidateDefaults(???)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Let's avoid sudden breaking changes.
See this PR
override def parse(file: java.io.File): org.apache.avro.Schema = { | ||
val schema = super.parse(file) | ||
if (schema.getType == org.apache.avro.Schema.Type.RECORD) { | ||
schema.addProp("com.cavorite.sbt-avro.artifact", moduleID.toString()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the effect of this statement ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the test to demonstrate our use case - I would like to add custom property (avro allows un-recognized properties in schema, citation needed...) in this case the artifact where compiled schema is published, so data consumers can easily see how to include compiled dependency in their project.
This particular case would generate schema like:
{
"type": "record",
"name": "...",
"namespace": "...",
"fields": ["..."],
"com.cavorite.sbt-avro.artifact": "avscparser-test:avscparser-test:0.1.0-SNAPSHOT"
}
val parser = new AnnotateWithArtifactSchemaParser(projectID.value) | ||
val b = org.apache.avro.Schema.createEnum( | ||
"B", null, "com.cavorite.test.avscparser", java.util.Collections.singletonList("B1")) | ||
parser.addTypes(java.util.Collections.singletonMap("B", b)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that adding the types is required if you don't want to break caching.
This also ignore the validation settings
val types = schemaParser.get().getTypes
val validate = avroValidate.value
val validateDefaults = avroValidateDefaults.value
// copy of global schemaParser to avoid race condition
// when deprecated schemaParser is removed, just create new parser instance
new Schema.Parser()
.addTypes(types)
.setValidate(validate)
.setValidateDefaults(validateDefaults)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a demonstration in test how I see user code - by implementing task they choose explicitly validations (or rely on default which changes from false in v1.8 to true in v1.9...), and they migrate adding types from now deprecated schemaParser
to this task instead.
parser builder
This allows for much simplier approach to SchemaParserBuilder setting
import com.spotify.avro.mojo.SchemaParserBuilder | ||
import org.apache.avro.Schema | ||
|
||
case class DefaultSchemaParserBuilder(types: util.Map[String, Schema], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice ! we should probably move types
to a scala collection and call java converters in the build. This will give a cleaner api in sbt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense. Also I will make it List instead, as key is just schema name (and not used anyways later in Avro implementation for this call :D)
import org.apache.avro.Schema; | ||
|
||
public interface SchemaParserBuilder { | ||
Schema.Parser build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not keeping the withTypes
, withValidate
and withValidateDefaults
since all Schema.Parser
have to support those ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When users implement their own builder, they most likely set those values themselves within it.
The default implementation still has ability for setting validations.
This approach allows for smaller api footprint allowing more flexibility how users want to implement builder, and not restricting us to future changes in Avro API (adding more validations, removing some?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then again - adding those function would allow easier control over validations for users if org infra team would provide custom builder (as I imagine it would be for us at Spotify :))
@sisidra @RustedBones good to merge? We can do a release for now & iterate from there? |
Can we update version in ThisBuild := "3.0.0-SNAPSHOT" |
With this approach there is more freedom in what Schema Parser can be provided.