Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add internal/inputs package and rewrite tsbs_generate_data #64

Merged
merged 1 commit into from
Apr 4, 2019

Conversation

RobAtticus
Copy link
Member

For a long time, our two generation binaries -- tsbs_generate_data
and tsbs_generate_queries -- have shared (roughly) a fair bit of
code, especially when it comes to flags and validation. However,
they were never truly in sync and combining them has been a long
wanted to-do. Similarly, to enable better tooling around TSBS, it
would be beneficial if more of its functionality was exposed as a
library instead of a CLI that needs to be called.

To those ends, this PR is a first step in addressing both of them.
It introduces the internal/inputs package, which can eventually be
moved to pkg/inputs when we are ready for other things to consume
its API. This package will contain the structs, interfaces, and
functions for generating 'input' to other TSBS tools. For now, that
only includes generating data files (for tsbs_load_* binaries) and
query files (for tsbs_run_queries_* binaries). This PR starts by
introducing these building blocks and converting tsbs_generate_data
to use it.

The idea is that each type of input (e.g., data, queries) is handled
by a Generator, which is customized by a GeneratorConfig. The config
contains fields such as the PRNG seed, number of items to generate,
etc, which are used by the Generator to control the output. These
GeneratorConfigs come with a means of easily adding their fields
to a flag.FlagSet, making them work well with CLIs while also not
restricting their use to only CLIs. Once configured, this
GeneratorConfig is passed to a Generator, which then produces the
output.

This design has a few other nice features to help cleanup TSBS.
One, it uses an approach of bubbling up errors and passing them
back to the caller, allowing for more graceful error handling. CLIs
can output them to the console, while other programs using the
library can pass them to another error handling system if they
desire. Two, Generators should be designed with an Out field that
allows the caller to point to any io.Writer it wants -- not
just the console or a file.

The next step will be to convert tsbs_generate_queries to use this
as well, which will be done in a follow up PR.

For a long time, our two generation binaries -- tsbs_generate_data
and tsbs_generate_queries -- have shared (roughly) a fair bit of
code, especially when it comes to flags and validation. However,
they were never truly in sync and combining them has been a long
wanted to-do. Similarly, to enable better tooling around TSBS, it
would be beneficial if more of its functionality was exposed as a
library instead of a CLI that needs to be called.

To those ends, this PR is a first step in addressing both of them.
It introduces the internal/inputs package, which can eventually be
moved to pkg/inputs when we are ready for other things to consume
its API. This package will contain the structs, interfaces, and
functions for generating 'input' to other TSBS tools. For now, that
only includes generating data files (for tsbs_load_* binaries) and
query files (for tsbs_run_queries_* binaries). This PR starts by
introducing these building blocks and converting tsbs_generate_data
to use it.

The idea is that each type of input (e.g., data, queries) is handled
by a Generator, which is customized by a GeneratorConfig. The config
contains fields such as the PRNG seed, number of items to generate,
etc, which are used by the Generator to control the output. These
GeneratorConfigs come with a means of easily adding their fields
to a flag.FlagSet, making them work well with CLIs while also not
restricting their use to only CLIs. Once configured, this
GeneratorConfig is passed to a Generator, which then produces the
output.

This design has a few other nice features to help cleanup TSBS.
One, it uses an approach of bubbling up errors and passing them
back to the caller, allowing for more graceful error handling. CLIs
can output them to the console, while other programs using the
library can pass them to another error handling system if they
desire. Two, Generators should be designed with an Out field that
allows the caller to point to any io.Writer it wants -- not
just the console or a file.

The next step will be to convert tsbs_generate_queries to use this
as well, which will be done in a follow up PR.
@codecov-io
Copy link

codecov-io commented Apr 3, 2019

Codecov Report

Merging #64 into master will increase coverage by 0.55%.
The diff coverage is 83.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #64      +/-   ##
==========================================
+ Coverage   51.58%   52.14%   +0.55%     
==========================================
  Files          73       75       +2     
  Lines        3586     3592       +6     
==========================================
+ Hits         1850     1873      +23     
+ Misses       1720     1699      -21     
- Partials       16       20       +4
Impacted Files Coverage Δ
internal/inputs/generator.go 52.63% <52.63%> (ø)
internal/inputs/utils.go 63.15% <63.15%> (ø)
internal/inputs/generator_data.go 90% <90%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f07cde4...abb6960. Read the comment docs.

return fmt.Errorf(errCannotParseTimeFmt, g.config.TimeEnd, err)
}

if g.Out == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential suggestion (but not really a blocker for merging this): One thing that might be useful here for external tooling is if we could somehow specify an in-memory buffer that implements the io.Writer interface (like bytes.Buffer or any custom writer) instead of just stdout or a file. Then a tool could potentially handle the entire data generation/loading pipeline in memory without having to rely on intermediate artifacts on disk or stdout, which could be useful in some cases.

This becomes way more compelling for handling the output of queries and inserts, where we could also implement custom readers to help tools digest TSBS output more easily. But could definitely see it being useful for the generation phase as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I designed this to work a bit like https://golang.org/pkg/os/exec/#Cmd here, so you actually can do that. Basically you would just set Out to the io.Writer you want it to go to and it should work as expected. This conditional only captures the case where you don't do that, in which case it sets it to os.Stdout.

Example:

func main() {
  c := &DataGeneratorConfig{...}
  var buf bytes.Buffer
  dg := &DataGenerator{Out: &buf}  // all output should be written to buf, not Stdout
  dg.Generate(c)

Copy link
Contributor

@LeeHampton LeeHampton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great -- huge step forward for getting better tooling around TSBS.

@RobAtticus RobAtticus merged commit b581528 into timescale:master Apr 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants