Codegen generates inefficient Java code for `Long` and `Boolean` mandatory parameters #283

jkronegg · 2025-02-06T13:01:17Z

👓 What did you see?

For Long parameters, the codegen generates Java code which looks like this:

private final Long seconds;
private final Long nanos;

public Duration(
    Long seconds,
    Long nanos
) {
    this.seconds = requireNonNull(seconds, "Duration.seconds cannot be null");
    this.nanos = requireNonNull(nanos, "Duration.nanos cannot be null");
}

Parameters of type Long are required to be non-null most of the time, yet we still store them as objects rather than as primitive types.

This occurs for the following generated types:

Duration (both seconds and nanos fields)
Location (line field but not column field)
TestCaseStarted (attempt field)
Timestamp (both seconds and nanos fields)

Note that Group contains a non-mandatory field start.

The Boolean type is affected by the same issue, namely in these generated classes:

ParameterType (preferForRegularExpressionMatch and useForSnippets fields)
TestCaseFinished (willBeRetried field)
TestRunFinished (success field)

✅ What did you expect to see?

If the java.java.erb code generator sees a mandatory Long, it should generate a long primitive type, so that no conversion is needed.
This would avoid back-and-forth Long <-> long conversions and should lead to faster code.

The same applies to the Boolean type.

This would probably cause a breaking change because all interfaces to the impacted classes will change.
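As a rough sketch of the requested output (hand-written here, not produced by the generator), a Duration with primitive fields might look like the following. The getter names and the package-private visibility are assumptions made for illustration; the real template may differ.

```java
// Hypothetical sketch of the requested codegen output, not actual generated code.
// Mandatory fields are stored as primitives, so no null check is required.
final class Duration {
    private final long seconds;
    private final long nanos;

    Duration(long seconds, long nanos) {
        // A primitive long can never be null, so requireNonNull is dropped.
        this.seconds = seconds;
        this.nanos = nanos;
    }

    // Assumed getter names; returning long instead of Long is the breaking change.
    long getSeconds() {
        return seconds;
    }

    long getNanos() {
        return nanos;
    }
}
```

A non-mandatory field such as Group.start would presumably have to stay a boxed Long, since a primitive cannot represent an absent value.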

📦 Which tool/library version are you using?

gherkin 31.0.0 (which uses messages-27.2.0)

🔬 How could we reproduce it?

N/A (Look at the generated code).

📚 Any additional context?

The performance impact was detected while working on cucumber/gherkin#361, using the IntelliJ Profiler (it's hard to see the real impact because many generated classes suffer from this problem).

mpkorstanje · 2025-02-06T13:33:23Z

I've always understood null checks to be quite efficient and we're not boxing/unboxing any primitives.

How much time is lost here in the context of parsing Gherkin?

jkronegg · 2025-02-06T13:58:45Z

It's hard to tell the full performance impact, but creating the Location object alone takes about 6% of io.cucumber.gherkin.Parser.parse(). That's relatively low, and because fixing it would be a breaking change, I'm not sure this issue is a top priority.

mpkorstanje · 2025-02-06T19:52:16Z

That doesn't sound too terrible: nearly every line has a location, and multiple if there are table cells on the line.
