Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codegen generates inefficient Java code for Long and Boolean mandatory parameters #283

Open
jkronegg opened this issue Feb 6, 2025 · 3 comments

Comments

@jkronegg
Copy link
Contributor

jkronegg commented Feb 6, 2025

👓 What did you see?

For Long parameters, the codegen generates Java code which looks like this:

private final Long seconds;
private final Long nanos;

public Duration(
    Long seconds,
    Long nanos
) {
    this.seconds = requireNonNull(seconds, "Duration.seconds cannot be null");
    this.nanos = requireNonNull(nanos, "Duration.nanos cannot be null");
}

The parameters of type Long are most of the time required to be non-null, but still we are storing them into a object, not a primitive type.

This occurs for the following generated types:

  • Duration (both seconds and nanos fields)
  • Location (line field but not column field)
  • TestCaseStarted (attempt field)
  • Timestamp (both seconds and nanos fields)

Note that Group contains a non-mandatory field start.

The Boolean type is also impacted by the same issue, i.e. for the generated classes :

  • ParameterType (preferForRegularExpressionMatch and useForSnippets fields)
  • TestCaseFinished (willBeRetried field)
  • TestRunFinished (success field)

✅ What did you expect to see?

If the java.java.erb code generator see a Long which is mandatory, it should generate a long primitive type, so that no conversion is done.
This will avoid back and forth Long <-> long conversions, so will lead to faster code.

The same apply to Boolean type.

This would probably cause a breaking change because all interfaces to the impacted classes will change.

📦 Which tool/library version are you using?

gherkin 31.0.0 (which uses messages-27.2.0)

🔬 How could we reproduce it?

N/A (Look at the generated code).

📚 Any additional context?

The performance impact has been detected while working on cucumber/gherkin#361, using IntelliJ Profiler (it's hard to see the real impact because there is a lot of generated classes which suffer from this syndrome).

@mpkorstanje
Copy link
Contributor

I've always understood null checks to be quite efficient and we're not boxing/unboxing any primitives.

How much time is lost here in the context of parsing Gherkin?

@jkronegg
Copy link
Contributor Author

jkronegg commented Feb 6, 2025

It's hard to tell the full performance impact, but at least creating the Location object takes about 6% of io.cucumber.gherkin.Parser.parse(). That's relatively low and because of the breaking change, so I'm not sure it's the top priority to correct this issue.

@mpkorstanje
Copy link
Contributor

That doesn't sound too terrible, nearly every line has a location, multiple if there table cells on the line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants