-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add internal/inputs package and rewrite tsbs_generate_data #64
Conversation
385e049
to
abb6960
Compare
For a long time, our two generation binaries -- tsbs_generate_data and tsbs_generate_queries -- have shared (roughly) a fair bit of code, especially when it comes to flags and validation. However, they were never truly in sync and combining them has been a long wanted to-do. Similarly, to enable better tooling around TSBS, it would be beneficial if more of its functionality was exposed as a library instead of a CLI that needs to be called. To those ends, this PR is a first step in addressing both of them. It introduces the internal/inputs package, which can eventually be moved to pkg/inputs when we are ready for other things to consume its API. This package will contain the structs, interfaces, and functions for generating 'input' to other TSBS tools. For now, that only includes generating data files (for tsbs_load_* binaries) and query files (for tsbs_run_queries_* binaries). This PR starts by introducing these building blocks and converting tsbs_generate_data to use it. The idea is that each type of input (e.g., data, queries) is handled by a Generator, which is customized by a GeneratorConfig. The config contains fields such as the PRNG seed, number of items to generate, etc, which are used by the Generator to control the output. These GeneratorConfigs come with a means of easily adding their fields to a flag.FlagSet, making them work well with CLIs while also not restricting their use to only CLIs. Once configured, this GeneratorConfig is passed to a Generator, which then produces the output. This design has a few other nice features to help cleanup TSBS. One, it uses an approach of bubbling up errors and passing them back to the caller, allowing for more graceful error handling. CLIs can output them to the console, while other programs using the library can pass them to another error handling system if they desire. Two, Generators should be designed with an Out field that allows the caller to point to any io.Writer it wants -- not just the console or a file. The next step will be to convert tsbs_generate_queries to use this as well, which will be done in a follow up PR.
Codecov Report
@@ Coverage Diff @@
## master #64 +/- ##
==========================================
+ Coverage 51.58% 52.14% +0.55%
==========================================
Files 73 75 +2
Lines 3586 3592 +6
==========================================
+ Hits 1850 1873 +23
+ Misses 1720 1699 -21
- Partials 16 20 +4
Continue to review full report at Codecov.
|
return fmt.Errorf(errCannotParseTimeFmt, g.config.TimeEnd, err) | ||
} | ||
|
||
if g.Out == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential suggestion (but not really a blocker for merging this): One thing that might be useful here for external tooling is if we could somehow specify an in-memory buffer that implements the io.Writer interface (like bytes.Buffer or any custom writer) instead of just stdout or a file. Then a tool could potentially handle the entire data generation/loading pipeline in memory without having to rely on intermediate artifacts on disk or stdout, which could be useful in some cases.
This becomes way more compelling for handling the output of queries and inserts, where we could also implement custom readers to help tools digest TSBS output more easily. But could definitely see it being useful for the generation phase as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I designed this to work a bit like https://golang.org/pkg/os/exec/#Cmd here, so you actually can do that. Basically you would just set Out
to the io.Writer
you want it to go to and it should work as expected. This conditional only captures the case where you don't do that, in which case it sets it to os.Stdout.
Example:
func main() {
c := &DataGeneratorConfig{...}
var buf bytes.Buffer
dg := &DataGenerator{Out: &buf} // all output should be written to buf, not Stdout
dg.Generate(c)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great -- huge step forward for getting better tooling around TSBS.
For a long time, our two generation binaries -- tsbs_generate_data
and tsbs_generate_queries -- have shared (roughly) a fair bit of
code, especially when it comes to flags and validation. However,
they were never truly in sync and combining them has been a long
wanted to-do. Similarly, to enable better tooling around TSBS, it
would be beneficial if more of its functionality was exposed as a
library instead of a CLI that needs to be called.
To those ends, this PR is a first step in addressing both of them.
It introduces the internal/inputs package, which can eventually be
moved to pkg/inputs when we are ready for other things to consume
its API. This package will contain the structs, interfaces, and
functions for generating 'input' to other TSBS tools. For now, that
only includes generating data files (for tsbs_load_* binaries) and
query files (for tsbs_run_queries_* binaries). This PR starts by
introducing these building blocks and converting tsbs_generate_data
to use it.
The idea is that each type of input (e.g., data, queries) is handled
by a Generator, which is customized by a GeneratorConfig. The config
contains fields such as the PRNG seed, number of items to generate,
etc, which are used by the Generator to control the output. These
GeneratorConfigs come with a means of easily adding their fields
to a flag.FlagSet, making them work well with CLIs while also not
restricting their use to only CLIs. Once configured, this
GeneratorConfig is passed to a Generator, which then produces the
output.
This design has a few other nice features to help cleanup TSBS.
One, it uses an approach of bubbling up errors and passing them
back to the caller, allowing for more graceful error handling. CLIs
can output them to the console, while other programs using the
library can pass them to another error handling system if they
desire. Two, Generators should be designed with an Out field that
allows the caller to point to any io.Writer it wants -- not
just the console or a file.
The next step will be to convert tsbs_generate_queries to use this
as well, which will be done in a follow up PR.