Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate schemaParser property to avroSchemaParser task #33

Merged
merged 9 commits into from
Aug 7, 2020

Conversation

sisidra
Copy link
Contributor

@sisidra sisidra commented Jul 22, 2020

With this approach there is more freedom in what Schema Parser can be provided.

With this approach there is more freedom in what Schema Parser can be provided
@sisidra
Copy link
Contributor Author

sisidra commented Jul 22, 2020

@nevillelyh wdyt?

@nevillelyh nevillelyh requested a review from RustedBones July 23, 2020 19:02
Copy link
Collaborator

@nevillelyh nevillelyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @RustedBones WDYT?

@@ -49,11 +49,26 @@ object SbtAvro extends AutoPlugin {

lazy val avroArtifactTasks: Seq[TaskKey[File]] = Seq(Compile, Test).map(_ / packageAvro)

val avroSchemaParser = taskKey[() => Schema.Parser](".avsc schema parser")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to have TaskKey[() => Schema.Parser] instead of TaskKey[Schema.Parser] as signature ?
I felt that tasks are executed every time if not cached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is unfortunate. The behaviour of Schema Parser is as such that it changes internal state while parsing new files - see AvscFilesCompiler.java. That is the reason why it is needed to actually re-create (or clone, but no such option is given) new Parser object and we can not reuse original from the task.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to get rid of this

val schemaParser = new AtomicReference(new Schema.Parser())

which is just here to store the parser configuration.

What about using a builder setting avroSchemaParserBuilder: SettingKey[SchemaParserBuilder] with

public class SchemaParserBuilder {

    private final Schema.Parser parser = new Schema.Parser();

    private SchemaParserBuilder() { }

    public Schema.Parser build() {
        Schema.Parser copy = new Schema.Parser();
        copy.addTypes(parser.getTypes());
        copy.setValidate(parser.getValidate());
        copy.setValidateDefaults(parser.getValidateDefaults());
        return copy;
    }

    public SchemaParserBuilder withTypes(Map<String, Schema> types) {
        parser.addTypes(types);
        return this;
    }

    public SchemaParserBuilder withValidate(boolean validate) {
        parser.setValidate(validate);
        return this;
    }

    public SchemaParserBuilder withValidateDefaults(boolean validate) {
        parser.setValidateDefaults(validate);
        return this;
    }

    public static SchemaParserBuilder newBuilder() {
        return new SchemaParserBuilder();
    }
}

We can also define the extra avroExternalTypes: SettingKey[Map[String, Schema]] to ease external schema injection.

If user want to have a complete custom parser, they will have to give a new builder implementation.
Overall it's more code but it guides more the users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is on a way to remove val schemaParser = new AtomicReference(new Schema.Parser()) (annotating as deprecated).
I did not remove it because not wanting to break suddenly user code.
Builder might be a better api than supplier.
I think it might sense to deprecate settings avroValidate and avroValidateDefaults.

Then from user POV code would look like:

  avroSchemaParserBuilder := SchemaParserBuilder.newBuilder().withValidate(???). withValidateDefaults(???)

Copy link
Collaborator

@RustedBones RustedBones Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Let's avoid sudden breaking changes.
See this PR

override def parse(file: java.io.File): org.apache.avro.Schema = {
val schema = super.parse(file)
if (schema.getType == org.apache.avro.Schema.Type.RECORD) {
schema.addProp("com.cavorite.sbt-avro.artifact", moduleID.toString())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the effect of this statement ?

Copy link
Contributor Author

@sisidra sisidra Jul 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the test to demonstrate our use case - I would like to add custom property (avro allows un-recognized properties in schema, citation needed...) in this case the artifact where compiled schema is published, so data consumers can easily see how to include compiled dependency in their project.
This particular case would generate schema like:

{
  "type": "record",
  "name": "...",
  "namespace": "...",
  "fields": ["..."],
  "com.cavorite.sbt-avro.artifact": "avscparser-test:avscparser-test:0.1.0-SNAPSHOT"
}

val parser = new AnnotateWithArtifactSchemaParser(projectID.value)
val b = org.apache.avro.Schema.createEnum(
"B", null, "com.cavorite.test.avscparser", java.util.Collections.singletonList("B1"))
parser.addTypes(java.util.Collections.singletonMap("B", b))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that adding the types is required if you don't want to break caching.
This also ignore the validation settings

          val types = schemaParser.get().getTypes
          val validate = avroValidate.value
          val validateDefaults = avroValidateDefaults.value
          // copy of global schemaParser to avoid race condition
          // when deprecated schemaParser is removed, just create new parser instance
          new Schema.Parser()
            .addTypes(types)
            .setValidate(validate)
            .setValidateDefaults(validateDefaults)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a demonstration in test how I see user code - by implementing task they choose explicitly validations (or rely on default which changes from false in v1.8 to true in v1.9...), and they migrate adding types from now deprecated schemaParser to this task instead.

import com.spotify.avro.mojo.SchemaParserBuilder
import org.apache.avro.Schema

case class DefaultSchemaParserBuilder(types: util.Map[String, Schema],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice ! we should probably move types to a scala collection and call java converters in the build. This will give a cleaner api in sbt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense. Also I will make it List instead, as key is just schema name (and not used anyways later in Avro implementation for this call :D)

import org.apache.avro.Schema;

public interface SchemaParserBuilder {
Schema.Parser build();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keeping the withTypes, withValidate and withValidateDefaults since all Schema.Parser have to support those ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When users implement their own builder, they most likely set those values themselves within it.
The default implementation still has ability for setting validations.
This approach allows for smaller api footprint allowing more flexibility how users want to implement builder, and not restricting us to future changes in Avro API (adding more validations, removing some?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then again - adding those function would allow easier control over validations for users if org infra team would provide custom builder (as I imagine it would be for us at Spotify :))

@nevillelyh
Copy link
Collaborator

@sisidra @RustedBones good to merge? We can do a release for now & iterate from there?

@RustedBones
Copy link
Collaborator

Can we update version.sbt with

version in ThisBuild := "3.0.0-SNAPSHOT"

@RustedBones RustedBones merged commit 2518fad into sbt:master Aug 7, 2020
@sisidra sisidra deleted the custom-avsc-parser branch August 25, 2020 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants