Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce StringFormattingCheck checkstyle rule #81603

Merged

Conversation

pugnascotia
Copy link
Contributor

The new rule Checks for calls to String#formatted(Object...) that
include format specifiers that are not locale-safe. This method always
uses the default Locale, and so for our purposes it is safer to use
String#format(Locale, String, Object...).

Note that this rule can currently only detect violations when calling
formatted() on a string literal or text block. In theory, it could be
extended to detect violations in local variables or statics.

The new rule Checks for calls to `String#formatted(Object...)` that
include format specifiers that are not locale-safe. This method always
uses the default `Locale`, and so for our purposes it is safer to use
`String#format(Locale, String, Object...)`.

Note that this rule can currently only detect violations when calling
`formatted()` on a string literal or text block. In theory, it could be
extended to detect violations in local variables or statics.
@pugnascotia pugnascotia added >enhancement :Delivery/Tooling Developer tooliing and automation v8.1.0 labels Dec 9, 2021
@elasticmachine elasticmachine added the Team:Delivery Meta label for Delivery team label Dec 9, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@mark-vieira
Copy link
Contributor

Wouldn't it be simpler to just use forbidden apis for this? We'd just have to add the method signature here.

@pugnascotia
Copy link
Contributor Author

It would be simpler, yes, but the .formatted() method is nice to have and in most cases won't cause a problem. I thought it would be a shame to ban it outright.

@mark-vieira
Copy link
Contributor

It would be simpler, yes, but the .formatted() method is nice to have and in most cases won't cause a problem. I thought it would be a shame to ban it outright.

How would that differ to this checkstyle rule? In what scenarios are we allowing it's use? You can always use @SuppressForbidden for one-off scenarios.

@pugnascotia
Copy link
Contributor Author

This PR arose from #80751 where we're moving to text blocks in a lot of places, and at present it uses formatted in a lots of places.

There are 2 ways of looking at this:

  1. Since .formatted(...) is equivalent to String.format(String, Object...), we should just ban it, since the latter is already banned. The existing ban is reasonable since there's an alternative.
  2. Alternatively, since there's no .formatted(...) that accepts a locale, permit safe uses and warn about unsafe uses.

I suppose a problem with allowing .formatted(...) is that it raises the question of whether we should also have a rule that allows String.format(String, Object...). I'm inclined to say now, but I confess that feels arbitrary.

Basically, I like the flow and succinctness of being able to do e.g. "Hello, %s!".format("world") instead of String.format(Locale.ROOT, "Hello, %s!", "world"). But I accept that this is an inconsistent position.

@mark-vieira
Copy link
Contributor

permit safe uses and warn about unsafe uses

Right, how do we determine what is a "safe" usage though. Does the current checkstyle rule take that into account?

@mark-vieira
Copy link
Contributor

Maybe just banning it production code but allowing it in test code is a good compromise. I suspect we intend to use it in tests much more often, and the terseness will make reading tests considerably easier so there's considerable benefit there.

@pugnascotia
Copy link
Contributor Author

permit safe uses and warn about unsafe uses

Right, how do we determine what is a "safe" usage though. Does the current checkstyle rule take that into account?

@arteam and I both went through an exercise of using all the format specifier with a locale known to be problematic, and this checkstyle rule checks for those format specifiers that will break.

I think banning in prod and allowing in test is a good compromise, I'll adjust this PR accordingly.

@pugnascotia
Copy link
Contributor Author

I've added a forbidden APIs signature in es-server-signatures.txt, so that it doesn't apply to test code.

Am I right in saying that checkstyle runs before forbiddenApis? I wondered about swapping them around, so that someone doesn't spent time fixing a Checkstyle violation, only to then hit a forbidden API error.

@mark-vieira
Copy link
Contributor

Am I right in saying that checkstyle runs before forbiddenApis?

There's no explicit ordering between them so far as I understand. We don't want to make checkstyle run after forbiddenApis because the latter requires compiling the code and the former does not. Such an ordering rule would now make checkstyle wait on code compilation to finish which would reduce build performance.

@pugnascotia
Copy link
Contributor Author

OK, so what do we think - OK to merge? @arteam WDYT?

@mark-vieira
Copy link
Contributor

Do we still need this check if we have added the method to forbidden APIs? Isn't this redundant?

@pugnascotia
Copy link
Contributor Author

Well I only banned it in non-test code, which is what I thought you were suggesting?

@arteam
Copy link
Contributor

arteam commented Dec 14, 2021

We still want to prevent the user to use %d or %f with formatting in test code, because it can fail on some random East Asian locale.

@arteam
Copy link
Contributor

arteam commented Dec 14, 2021

The check looks good to me!

@mark-vieira
Copy link
Contributor

We still want to prevent the user to use %d or %f with formatting in test code, because it can fail on some random East Asian locale.

Ah, ok. I wasn't aware the check was that specific. Roger that.

@arteam
Copy link
Contributor

arteam commented Dec 14, 2021

Some practical examples:

|  Welcome to JShell -- Version 17.0.1
|  For an introduction type: /help intro

jshell> Arrays.stream(Locale.getAvailableLocales())
   ...>         .map(locale -> Map.entry(locale, String.format(locale, "%d", 93425)))
   ...>         .sorted(Map.Entry.<Locale, String>comparingByValue().reversed()
   ...>                 .thenComparing(e -> e.getKey().toString()))
   ...>         .forEach(e -> System.out.printf("%16s - %s%n", e.getKey(), e.getValue()));
             sat - ᱙᱓᱔᱒᱕
          sat_IN - ᱙᱓᱔᱒᱕
    sat_IN_#Olck - ᱙᱓᱔᱒᱕
      sat__#Olck - ᱙᱓᱔᱒᱕
              my - ၉၃၄၂၅
           my_MM - ၉၃၄၂၅
     my_MM_#Mymr - ၉၃၄၂၅
              dz - ༩༣༤༢༥
           dz_BT - ༩༣༤༢༥
     dz_BT_#Tibt - ༩༣༤༢༥
th_TH_TH_#u-nu-thai - ๙๓๔๒๕
              as - ৯৩৪২৫
           as_IN - ৯৩৪২৫
     as_IN_#Beng - ৯৩৪২৫
              bn - ৯৩৪২৫
           bn_BD - ৯৩৪২৫
     bn_BD_#Beng - ৯৩৪২৫
           bn_IN - ৯৩৪২৫
             mni - ৯৩৪২৫
          mni_IN - ৯৩৪২৫
    mni_IN_#Beng - ৯৩৪২৫
      mni__#Beng - ৯৩৪২৫
              mr - ९३४२५
           mr_IN - ९३४२५
     mr_IN_#Deva - ९३४२५
              ne - ९३४२५
           ne_IN - ९३४२५
           ne_NP - ९३४२५
     ne_NP_#Deva - ९३४२५
              sa - ९३४२५
           sa_IN - ९३४२५
     sa_IN_#Deva - ९३४२५
              fa - ۹۳۴۲۵
           fa_AF - ۹۳۴۲۵
           fa_IR - ۹۳۴۲۵
     fa_IR_#Arab - ۹۳۴۲۵
              ks - ۹۳۴۲۵

@arteam
Copy link
Contributor

arteam commented Dec 15, 2021

I've merged #80751, let's check this rule on the new changes!

@arteam
Copy link
Contributor

arteam commented Dec 15, 2021

@elasticmachine update branch

@pugnascotia
Copy link
Contributor Author

The new rule found some problems, which I've addressed.

@arteam
Copy link
Contributor

arteam commented Dec 15, 2021

Thank you!

@pugnascotia
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/eql-correctness

@pugnascotia
Copy link
Contributor Author

Unsurprisingly, I also had to make some changes for the new forbidden API config.

@mark-vieira Back to you.

Copy link
Contributor

@mark-vieira mark-vieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jkakavas
Copy link
Member

Can this be backported to 8.0 ? We need it to resolve #81804

pugnascotia added a commit that referenced this pull request Dec 23, 2021
The new rule Checks for calls to `String#formatted(Object...)` that
include format specifiers that are not locale-safe. This method always
uses the default `Locale`, and so for our purposes it is safer to use
`String#format(Locale, String, Object...)`.

Note that this rule can currently only detect violations when calling
`formatted()` on a string literal or text block. In theory, it could be
extended to detect violations in local variables or statics.

Note that this change also forbids `.formatted()` in server code, so
we are only permitted to use `.formatted()` in test code.
@pugnascotia
Copy link
Contributor Author

@jkakavas now backported in 4d9f96b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Delivery/Tooling Developer tooliing and automation >enhancement Team:Delivery Meta label for Delivery team v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants