Skip to content
This repository has been archived by the owner on Aug 13, 2019. It is now read-only.

Reduce index file size caused by symbol references by using indicies rather than offsets #262

Merged
merged 4 commits into from
Jan 18, 2018

Conversation

cstyan
Copy link
Contributor

@cstyan cstyan commented Jan 17, 2018

fixes #249

Left comments in sections where depending on whether we're using v1 or v2 all that would change is the var name (offset vs index) but not any of the functionality.

Wasn't entirely sure if we want to change this struct:

type hashEntry struct {
	keys   []string
	offset uint64
}

cc: @gouthamve @fabxc

│ │ len(str_n) <uvarint> │ str_1 <bytes> │ │
│ └──────────────────────┴───────────────┘ │
│ ├──────────────────┬───────────────────┤ │
│ │ n <uvarint> │ str_1 <bytes> │ │
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len(str) was actually the number of bytes we have to subsequently read to know where the string ends. The field was not the offset.

Looking at the code actually does not serilize the index (correctly) so the update of the docs here seems to be revertible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay I misread this then. We don't want to include a field for the index in this diagram?

index/index.go Outdated
w.symbols[s] = uint32(w.pos) + headerSize + uint32(w.buf2.len())
if w.Version == 2 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this change I suppose but do we really want the writer to be subject to versioning. I think we are fine with always have the writer write the most recent format version and just having the reader supporting multiple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is dependent on how we intend to support different index file versions in the future. Is the intention to only allow older versions for read purposed with the intention that they can load in an old tsdb database with a newer version of the code and it would then be written to an index file again with the new format?

If we don't intend to allow writing of index files with old versions then I'll remove the version checks from the writer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to write older versions.

@gouthamve
Copy link
Collaborator

Thanks!

@gouthamve gouthamve merged commit 467948f into prometheus-junkyard:master Jan 18, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tsdb: Reduce index file size caused by symbol references
3 participants