-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
common code refactoring #976
common code refactoring #976
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most things seem to fall on the "right" side of the re-grouping, at least in terms of making a common reusable core goes.
I'm not really sure about where the License Checker should "live", but it seems both deeply tied to both the pool's implementation and the Elasticsearch output's requirements. Since we know that the requirements of other consumers using this common core will have differing requirements, maybe we should build out a preflight abstraction so that a consumer can define its own preflight check.
I mashed my keyboard a bit to prototype the kind of thing I'm envisioning. The setup is a bit verbose, but the usage ends up being pretty readable https://gist.github.com/yaauie/d73c4ab4284909cd3e0a5309ea3cd938
# @param url [LogStash::Util::SafeURI] | ||
# @param meta [Hash] | ||
def license_check!(url, meta) | ||
if oss? || valid_es_license?(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- consider pulling
oss?
into this module - consider pulling
valid_es_license?
and its chain of helper functions (get_license
, others?) down into this module - document exactly what is needed from the class this module is included into (maybe just
perform_request_to_url
?) If this can only be mixed into pool and depends on its internals, it might not be worth breaking out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I considered this, and settled for only exposing just a main license_check!
method because I do not think more will be required on the data_streams side to implement the check but I think oss?
and valid_es_license?
should probably go there too. I also wanted to avoid performing too deep a refactoring just for a slight change in the check logic. This struck me as an acceptable way to externalize the check - I will definitely add a big comment about its usage and context; the fact it is only meant to live within the Pool class.
I know this a rather loose integration and I'll see how I can add relevant specs to make sure the api for this mixin is somehow protected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I opted to also extract oss?
, valid_es_license?
, log_license_deprecation_warn
together with the main entry point license_check!
.
end | ||
|
||
# defined in license_check.rb | ||
license_check!(url, meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a pre-flight proc (connection_preflight
?) could be registered by the thing creating the client pool, which would enable different common consumers to do their own pre-flights, perhaps including their own specific license-checking behaviour or registering a no-op preflight if there is no extra steps needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The license checking logic is badly intertwined in the Pool logic and I am a bit worried about performing this refactor at this stage. I agree that it would be a positive code cleanup but I am unsure it is worth the investment now-now just for the benefit of making the data_streams output work. This proposed solution is rather non-intrusive and I will add more comments about the intent. Maybe keep a tab on this as a future refactoring followup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can disregard this above comment, I moved that discussion at the top level.
b15e7e7
to
53dab0c
Compare
0e4f82b
to
e7b0b06
Compare
@yaauie For some reason I did not catch your top level comment above and only followed up the inline comments. I agree in essence with your license checking proposal. As you know the license checking logic is badly intertwined in the Pool logic and I am a bit worried about performing this refactor at this stage with the added risks. I agree that it would be a positive code cleanup but I am unsure it is worth the investment now-now just for the benefit of making the data_streams output work. This proposed solution is rather non-intrusive with minimal code change. I suggest we followup with your license checking proposal after the first data_streams iteration? |
WDYT about renaming |
This is ready be moved from Draft to Ready for review.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the Elasticsearch Output's license checker is statically embedded in the pool implementation (e.g., by including a module into the pool's class), we cannot reuse the pool from another plugin that has its own license-checking logic. This is the primary reason why I think we need to pass a pre-flight checker to the pool via the client builder, so that each consumer of the pool can define its own license checking independently of the pool's implementation.
@yaauie as discussed, thanks for the heads up on this, I now see the potential problem with my suggestion and definitely need to refactor it. |
@yaauie I refactored the license checking to use a proper class and not a mixin which will solve the problem of collision when/if both the elasticsearch and the data_streams outputs are used in the same VM. This license checking class is required in the top The license checking class, similar to the prior mixin is made to minimized change as much as possible. Other than minimizing risk, one of the reason is that all this code between the pool, http_client, http_client_builder and the related specs is really entangled and a refactor similar to what you suggested would require quite a deeper refactoring and lots of specs changes and at this point I am not sure we should take that on plus the related risks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A big part of the motivation to my previous recommendation (pre-flight), was to separate concerns. Requiring the license check's implementation to manipulate the pool's internals is dangerous, because it effectively makes the internal details of the pool implementation part of its API.
If the pre-flight is only responsible for emitting true
/false
and its check?
method has access to the pool and base url being pre-flighted with no expectation that it be used to mutate said pool or url, then any implementation can tell us whether it is licensed without having to care about pool internals. This is especially important because the first consumer of these APIs we plan creating is going to live in a separate repository, with separate versioning and release timing, and very little ability to positively ensure lock-step matching of versions when installed into Logstash.
I think that a pre-flight as I suggested can be plumbed through in very much the same way you have plumbed a license checker through, except that when no pre-flight is given we should rely on a noop implementation whose Preflight#check?
always returns true
(the LogStash::Outputs::Elasticsearch#build_client
is used from logstash core's central pipeline management, so we want to avoid changes that would affect a no-arguments version of that method).
|
||
# license_check in mixed in from LogStash::Outputs::ElasticSearchPoolMixin::LicenseChecker | ||
# separately defined in the elasticsearch output and the elasticsearch_data_streams output. | ||
@license_checker.license_check!(url, meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we only had a pre-flight checker as I had suggested, that was given a reference to this pool and the url
and simply returned true or false but was not expected to manipulate pool internals, then we could worry about pool internals here, in the pool implementation, and not in different consumers of this API in different code-bases.
# license_check in mixed in from LogStash::Outputs::ElasticSearchPoolMixin::LicenseChecker | |
# separately defined in the elasticsearch output and the elasticsearch_data_streams output. | |
@license_checker.license_check!(url, meta) | |
if @preflight.check?(this, url) | |
meta[:state] = :alive | |
else | |
logger.warn("Connection failed pre-flight check", :url => url) | |
end |
The Elasticsearch Output then, could define its own pre-check that considered all of the stages in its license check without having to care about pool internals:
https://gist.github.com/yaauie/d73c4ab4284909cd3e0a5309ea3cd938#file-preflight_check-rb-L89
# normally this class is passed from the Commons#build_client method to allow | ||
# alternate license checking logic form other implementations like the data streams output. | ||
require "logstash/outputs/elasticsearch/license_check" | ||
@license_checker = LogStash::ElasticSearchOutputLicenseChecker.new(self, logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may break Logstash core's central pipeline management, which already reaches into the Elasticsearch Output to build_client
without any args. I think we need to have a no-op always-success implementation if no checker is given.
@yaauie I agree with your assessment, my goal was do really minimize change. I also agree we can probably improve without too much code disruption. WRT to core's pipeline management which relies on the current |
Central Management does its own license checking once the pool is established, and the flow includes helpful hints to a user when they connect to an Elasticsearch that is unlicensed. Right now, the Elasticsearch Output does warn when connecting a default-distribution Logstash to an unlicensed Elasticsearch (and Central Management gets this warning "for free" whether it wants it or not), but in 8.0 we intend to change that behaviour to block the connection from being usable. When we do this, if Logstash's Central management continues to use the Elasticsearch Output's I believe that the current license check warning missed the fact that the pool was also being used by Logstash core. It is really hard to keep track of what things are reaching into this plugin and depend on the specific shape of its internals, and because we cannot lock-step this plugin with the things that reach into it, it is very important that we establish clear boundaries. Refactoring this license check up to the specific consumer that needs it (and back-tracking its accidental exposure out of the consumers that don't) is the only way I see to make this maintainable, especially as we look to add consumers. |
1- IMO refactoring core's usage of There are 2 places where
Both places display this comment:
The good news here is that once we are done with this PR, we will be able to update core x-pack code in both these places to use the extracted common code here and properly build a client with an appropriate license checking class without having to instantiate the whole plugin. 2- I pushed changes to remove dependencies to the Pool class form the license checker. I believe we are getting closer to what you would like to see but also by keeping very close & similar implementation and license checking flow. The only functional difference now is that the HTTP request to fetch the license is done systematically on every Again I understand this is not exactly how you see how the license checker should be ideally structured but this implementation has the benefit of not disturbing the specs too much which is preferable because if you refactor the code and also have to significantly change the specs then you increase the risk of introducing behaviour changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO refactoring core's usage of build_client is out of scope and my intention is to keep the no-arg build_client compatible
[...]
The good news here is that once we are done with this PR, we will be able to update core x-pack code in both these places to use the extracted common code here and properly build a client with an appropriate license checking class without having to instantiate the whole plugin.
My point is that we must not do connection-level license checking in the Logstash core usages of this plugin's shared pool, because those usages require a valid connection to walk people toward the correct configuration. We accidentally do so now, which is a bug (thankfully we only emit a warning and haven't yet made the breaking change to prevent the connection from being added to the pool). We have the opportunity to not perpetuate this bug.
This implementation has the benefit of not disturbing the specs too much which is preferable because if you refactor the code and also have to significantly change the specs then you increase the risk of introducing behaviour changes.
Exposing an API is an inflection point, and failure to get that API right injects a lot of risk that we cannot ignore.
I know we're doing a lot of back-and-forth here, and it is probably because I failed to fully-explain why I was proposing the pre-flight interface in the shape I proposed it.
- the
Preflight#check?
method in my proposal was carefully crafted to take thisPool
and theurl
as arguments. This allows any implementation to do whatever pre-flight checking it needs to do, including but not limited to license checks. When a consumer of this API can only be used with specific versions of Elasticsearch (e.g., because it depends on a feature introduced mid-major) or with specific Elasticsearch features enabled (e.g., data streams) it can codify such requirement in its pre-check. I believe that this is a boundary that solves our current use-case (the Elasticsearch Output), while providing sufficient room for planned and unplanned future work.
I've put together a PR against your branch to show what I'm talking about. It includes the existing on-pool spec validation of the ElasticsearchOutputLicenseChecker, adds unit-specs for the pool using a license checker that doesn't care bout the license checker's internals, and includes a noop license checker implementation to be used by external consumers like Logstash core when build_client
is called without specifying a license checker.
Ok sure, but I don't think changing this behaviour in v7 is a good idea. For v8 core x-pack can pass in a no-op license checking class to its |
@yaauie about your proposal colinsurprenant#1 is essentially what I had right before I removed passing As for your proposal for using the +1 on the changes you suggest, minus the change in behaviour of the no-arg |
Connection-level license checking was implemented on the Elasticsearch Output to deprecate a specific use of the Elasticsearch Output (#875), and was accidentally consumed by the Logstash core classes that reach in and use the Elasticsearch Output's |
@yaauie I understand but still I don't think we should change that behaviour which will apply to any logstash version installing that new version of the elasticseach output plugin, I think this is risky and that we should only introduce the explicit no-op license checker into |
@yaauie I pushed the changes as suggested and discussed above. |
@yaauie following up on our discussion and your pointers on the x-pack license checking logic, AFAICS both the monitoring and management are in fact doing the license checking outside the connection creation of The x-pack license checking in https://github.com/elastic/logstash/tree/master/x-pack/lib/license_checker is used in both places :
So in all x-pack cases So I am +1 on making the no-op license checker the default upon a no-arg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 this is looking like all the necessary separations and should set us up for success moving forward.
I'm not particularly comfortable about dropping things directly into the LogStash
namespace, and would prefer if file paths and class names generally matched.
We have prior art for plugins that publish sharable code doing so in the LogStash::PluginMixins
namespace. Perhaps with that in mind:
- The Elasticsearch Output's license checker could live as
LogStash::Outputs::Elasticsearch::LicenseChecker
inlib/logstash/outputs/elasticsearch/license_checker.rb
, which would clearly identify it as being part of the ES Output plugin. - The shared noop license checker could live as
LogStash::PluginMixins::Elasticsearch::LicenseChecker::NOOP_INSTANCE
or similar inlib/logstash/plugin_mixins/elasticsearch/license_checker.rb
- The shared common module you extracted could live as
LogStash::PluginMixins::Elasticsearch::Common
inlib/logstash/plugin_mixins/elasticsearch/common.rb
While the pool and http client various other code is also now effectively shared by nature of the common module exposing them, there certainly is risk in change file paths, and that probably outweighs the benefit of a wholesale rearrangement.
@yaauie +1 on the namespacing. WDYT about #973 |
28144f8
to
60ebbd9
Compare
44db991
to
efc1c06
Compare
@yaauie I pushed the namespace shuffle. |
end | ||
|
||
alive = @license_checker.appropriate_license?(self, url) | ||
meta[:state] = alive ? :alive : :unlicensed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The :unlicensed
state isn't really a thing at this point, it was a state I could foresee that we need once we enforce license checking in 8.0.0
Actually setting it here may have consequences we aren't aware of, for example in places where we check for :state != :alive
or :state == :dead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so would :dead
be the right state for this then?
The valid states and their meanings should ideally be encoded somewhere as constants or such. No sure we want to introduce that refactor here but we should clean that up at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So looking more closely at the code the 2 places where meta is really used is:
logstash-output-elasticsearch/lib/logstash/outputs/elasticsearch/http_client/pool.rb
Line 267 in efc1c06
@state_mutex.synchronize { @url_info.select {|url,meta| meta[:state] != :alive } }.each do |url,meta|
healthcheck!
cycles through all @url_info
with a meta other than :alive
to see if it can be reused, so here any value, like :unlicensed
is ok, a health check will be performed periodically on these.
logstash-output-elasticsearch/lib/logstash/outputs/elasticsearch/http_client/pool.rb
Lines 441 to 451 in efc1c06
@url_info.each do |url,meta| meta_in_use = meta[:in_use] next if meta[:state] == :dead if lowest_value_seen.nil? || meta_in_use < lowest_value_seen lowest_value_seen = meta_in_use eligible_set = [[url, meta]] elsif lowest_value_seen == meta_in_use eligible_set << [url, meta] end end
get_connection
will pick a connection if it is not :dead
. Here I am not sure if the logic should not be to only consider the :alive
ones instead of the not :dead
ones if we introduce another state like :unlicensed
. The question is: would is make sense for get_connection
to return a :unlicensed
connection? I wouldn't think so.
Currently a :dead
connection is set upon HostUnreachableError
in
logstash-output-elasticsearch/lib/logstash/outputs/elasticsearch/http_client/pool.rb
Lines 402 to 405 in efc1c06
rescue HostUnreachableError => e | |
# Mark the connection as dead here since this is likely not transient | |
mark_dead(url, e) | |
raise e |
So with this, it seems that we could also simply set the meta[:state] = :dead
instead of a new :unlicensed
state and we would not have to change anything else in the logic. The above 2 places where meta is looked at would work; healthcheck!
would perform a check on a :dead
url and get_connection
would not pick it up if it was maked as :dead
.
LMKWYT.
@jsvd I think I addressed all your review comments. Only thing I'd ask you to double check is the LMKWYT. |
def discover_cluster_uuid | ||
return unless defined?(plugin_metadata) | ||
cluster_info = client.get('/') | ||
plugin_metadata.set(:cluster_uuid, cluster_info['cluster_uuid']) | ||
rescue => e | ||
# TODO introducing this logging message breaks many tests that need refactoring | ||
# @logger.error("Unable to retrieve elasticsearch cluster uuid", error => e.message) | ||
end | ||
|
||
def successful_connection? | ||
!!maximum_seen_major_version | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe data streams output will need this cluster uuid and connection predicate too, maybe these should be in common.rb?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm good question; currently discover_cluster_uuid
is called in the context of the template installer
logstash-output-elasticsearch/lib/logstash/outputs/elasticsearch.rb
Lines 436 to 448 in 9451cfb
@template_installer ||= Thread.new do | |
sleep_interval = @retry_initial_interval | |
until successful_connection? || @stopping.true? | |
@logger.debug("Waiting for connectivity to Elasticsearch cluster. Retrying in #{sleep_interval}s") | |
Stud.stoppable_sleep(sleep_interval) { @stopping.true? } | |
sleep_interval = next_sleep_interval(sleep_interval) | |
end | |
if successful_connection? | |
discover_cluster_uuid | |
install_template | |
setup_ilm if ilm_in_use? | |
end | |
end |
and I assumed this was rather specific to the ES output only - this setup_after_successful_connection
method is only called from ES output register
and not from the DS output.
Is this something the DS output will require? the the :cluster_uuid
in the plugin metadata useful in other places?
plugin_metadata.set(:cluster_uuid, cluster_info['cluster_uuid']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cluster_uuid has been used for linking the logstash instance with the ES cluster for monitoring purposes when metric collection is done externally. So we still need to provide this metadata when the plugin uses datastreams, i.e. have something like:
def setup_after_successful_connection
sleep_interval = @retry_initial_interval
until successful_connection? || @stopping.true?
@logger.debug("Waiting for connectivity to Elasticsearch cluster. Retrying in #{sleep_interval}s")
Stud.stoppable_sleep(sleep_interval) { @stopping.true? }
sleep_interval = next_sleep_interval(sleep_interval)
end
discover_cluster_uuid if successful_connection?
end
Looking at the test failures, can you check if the OSS test that took 6h was a hiccup somehow? Looking at the output log from travis it all seemed to work well? |
@jsvd yeah I looked and did not understand a) why it took 6hrs and b) why is shows a failed state although all tests passed and the exit code was 0 🤔 |
@jsvd the previous build error is gone but now there's a new one which seems unrelated.
I am trying to reproduce locally. |
Can't reproduce locally, I am restarting the job to see. |
Green ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran a few tests with Elasticsearch OSS and default distro and it all seems to work fine.
Overall I'm not too giddy about the license checker injection, but we first need to stop using plugin internals for Central Management.
@colinsurprenant I missed the uuid comment, which I think justifies a last change to have the successful_connection and discover_cluster_uuid apparatus in the common file #976 (comment) |
@jsvd ok, makes sense for UUID - good catch. |
@jsvd I pushed the changes related to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after green build
60ebbd9
to
38a7d36
Compare
65c48f8
to
049a1a0
Compare
specific elasticsearch methods are now into the main elasticsearch class file and shared methods have been moved into the PluginMixins::ElasticSearch::Common namespace. The license checking code has been externalized and can now be specified as argument to the build_client method. This is code refactorting that has no end-user impact.
049a1a0
to
8fe59c4
Compare
This version bump wraps #973 and #976 to refactor methods and configuration options. This release does not contain user-facing changes other than the management/monitoring in logstash core which depends on this plugin should produce less irrelevant licensing logs by benefiting the no-op license checking class.
This is work toward Data Streams integration elastic/logstash#12178
Specific elasticsearch methods are now into the main elasticsearch class file and shared methods have been moved into the PluginMixins::ElasticSearch::Common namespace. The license checking code has been externalized and can now be specified as argument to the
build_client
method.This is code refactorting that has no end-user impact.