src.ctf.fs: add and use medops to iterate on a ds_file_group using the index

This patch solves the problem of reading multiple snapshots of the same
tracing session taken quickly.  Taking the snapshots quickly enough can
cause them to overlap, in which case some complete / identical packets
will be found in different snapshots.

As an example, imagine we have three snapshots, with packets labeled A
to G belonging to the same stream instance in all snapshots.  We could
have the following sequence of packets in each snapshot:

  - snapshot 1: A B C D
  - snapshot 2:     C D E F
  - snapshot 3:       D E F G

Babeltrace 1 reads these three snapshots successfully.  In fact, it
simply treats them as three different traces, ordering their events
together in the output.  As a result, events from packet D are present
three times in the output.  So while it works (as in, Babeltrace exits
with status 0), it's probably not what a user would want.

Babeltrace 2 (before this patch) hits the following assert, which
validates that messages produced by iterators never go back in time:

    11-11 15:13:23.874  8329  8329 F LIB/MSG-ITER call_iterator_next_method@iterator.c:872 Babeltrace 2 library postcondition not satisfied; error is:
    11-11 15:13:23.874  8329  8329 F LIB/MSG-ITER call_iterator_next_method@iterator.c:872 Clock snapshots are not monotonic
    11-11 15:13:23.874  8329  8329 F LIB/MSG-ITER call_iterator_next_method@iterator.c:872 Aborting...

This is because Babeltrace 2 groups all CTF traces sharing the same UUID
(which is the case for our three snapshots) and gives them to the same
src.ctf.fs component to read.  The component groups data stream files
from the various snapshots by stream instance id, and sorts them
according to the timestamp of their first event.  It then reads them
sequentially, from end to end, assuming that the start of the second
data stream file is after the end of the first data stream file.  Using
our example above, the src.ctf.fs component would therefore try to read
packets in this order:

    A B C D C D E F D E F G
           ^
	   `- ouch!

In this case, we want to read all packets exactly once, in the right
order.

The solution brought by this patch is to iterate over the packets by
following the index, instead of reading all data stream files from end
to end.
Index entries were already de-duplicated by commit

    ctf: de-duplicate index entries

So the index already refers to a single instance of each packet.  We can
therefore use it as a playlist of the packets to read.
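The playlist idea can be sketched with a toy de-duplication pass over
per-snapshot index entries.  The `entry` struct and `merge_dedup`
function below are illustrative stand-ins, not the real
ctf_fs_ds_index API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified index entry: the real ctf_fs_ds_index_entry
 * also records the file path, offset and packet size. */
struct entry {
	unsigned long long ts_begin; /* clock snapshot of first event */
};

/* Append the entries of `src` to `dst`, skipping any entry whose
 * ts_begin was already seen.  Returns the new number of entries in
 * `dst` (which must have enough room).  This mirrors the idea of
 * "ctf: de-duplicate index entries": identical packets found in
 * overlapping snapshots are kept only once, so the resulting index
 * can serve as a playlist of packets to read. */
size_t merge_dedup(struct entry *dst, size_t dst_len,
		const struct entry *src, size_t src_len)
{
	size_t i, j;

	for (i = 0; i < src_len; i++) {
		int dup = 0;

		for (j = 0; j < dst_len; j++) {
			if (dst[j].ts_begin == src[i].ts_begin) {
				dup = 1;
				break;
			}
		}

		if (!dup)
			dst[dst_len++] = src[i];
	}

	return dst_len;
}
```

Merging the three overlapping snapshots from the example above yields
seven unique packets (A through G), each to be read exactly once.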

The change mainly revolves around adding a new set of CTF message
iterator medium operations, called ctf_fs_ds_group_medops.  Instead of
the medium being a single data stream file, as in
ctf_fs_ds_file_medops, this new medium is conceptually the sequence of
all packets described by the index of a ctf_fs_ds_file_group, possibly
spread out across different data stream files.

A new optional medium operation called `switch_packet` is added.  When
the CTF message iterator is done reading a packet, it calls this method,
indicating to the medium that it is at the frontier of two packets.  If
the medium is aware of the packets (like ctf_fs_ds_group_medops is) and
knows that the following packet is not contiguous with the previous
packet, it can reposition its "read head" at the right place (open the
new file, go to the right offset).  Immediately after calling
`switch_packet`, the message iterator calls the `request_bytes` method,
which allows the medium to return a buffer containing the bytes of the
next packet.

When the packet-aware medium has its `switch_packet` method called but
there are no more packets to read, it returns
CTF_MSG_ITER_MEDIUM_STATUS_EOF.  That brings the message iterator in the
STATE_CHECK_EMIT_MSG_STREAM_END state, which will close the stream.

If the `switch_packet` method is not provided by the medium, the message
iterator just continues, assuming that the bytes of the next packet are
contiguous with those of the previous packet.
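A minimal model of this protocol, with illustrative names standing in
for the real ctf_msg_iter types and CTF_MSG_ITER_MEDIUM_STATUS_* codes,
might look like:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the optional switch_packet medium operation described
 * above.  All names here are illustrative; the real operations also
 * deal with buffers, sizes and the full status enum. */
enum status { STATUS_OK = 0, STATUS_EOF = 1 };

struct medops {
	/* Optional: NULL for media whose packets are contiguous. */
	enum status (*switch_packet)(void *data);
};

/* Called by the iterator at the frontier of two packets.  STATUS_EOF
 * from a packet-aware medium means there is no packet left, which
 * would send the real iterator to STATE_CHECK_EMIT_MSG_STREAM_END. */
enum status on_packet_frontier(const struct medops *ops, void *data)
{
	if (!ops->switch_packet) {
		/* No method provided: assume the next packet's bytes
		 * are contiguous with the previous packet's. */
		return STATUS_OK;
	}

	return ops->switch_packet(data);
}

/* A packet-aware medium backed by a fixed-size packet playlist. */
struct group_data {
	size_t next;
	size_t count;
};

enum status group_switch_packet(void *v)
{
	struct group_data *d = v;

	if (d->next >= d->count)
		return STATUS_EOF;

	/* Here the real medium would open the right file and mmap the
	 * right offset before the next request_bytes call. */
	d->next++;
	return STATUS_OK;
}
```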

The ctf_fs_ds_file_medops medium operations are still necessary for
reading individual ctf_fs_ds_files, when initializing src.ctf.fs
components.  This is needed when building the index from a stream or for
applying tracer fixups.

This also somewhat simplifies the interaction between the src.ctf.fs
iterator and the ctf_msg_iter.  Previously, we would read each data
stream file until the message iterator returned EOF.  If it wasn't the
last data stream file, we would reset the iterator to read the next data
stream file.  The
ctf_msg_iter_set_emit_stream_{beginning,end}_message functions were
necessary to tell the message iterator whether to send the stream
beginning or end messages, because it otherwise had no idea whether the
data stream file it was reading was the first or the last one.  The
ctf_msg_iter_set_medops_data function was necessary to swap the data of
the ctf_fs_ds_file_medops to point to the new data stream file.

With the ctf_fs_ds_group_medops, the CTF message iterator only returns
EOF when the stream is done.  The data passed to the
ctf_fs_ds_group_medops has everything it needs to go through all the
packets, so it doesn't need to change.
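The resulting consumer-side loop can be modeled like this; `iter_next`,
`consume_all` and the types are illustrative stand-ins for the real
ctf_msg_iter API:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the real iterator type and statuses. */
enum it_status { IT_OK = 0, IT_EOF = 1 };

struct group_iter {
	size_t next_packet;
	size_t packet_count; /* everything needed to walk all packets */
};

/* With the group medops, a single iterator walks every packet of the
 * stream; IT_EOF is only returned once the whole stream is done. */
enum it_status iter_next(struct group_iter *it)
{
	if (it->next_packet >= it->packet_count)
		return IT_EOF;

	it->next_packet++;
	return IT_OK;
}

/* Consumer loop: no per-data-stream-file reset, no medops-data swap,
 * no toggling of stream beginning/end message emission. */
size_t consume_all(struct group_iter *it)
{
	size_t n = 0;

	while (iter_next(it) == IT_OK)
		n++;

	return n;
}
```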

Change-Id: I72f6d1e09b87414fb83f68cb57abb1f2dc61b439
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
simark authored and jgalar committed Nov 15, 2019
1 parent c0def27 commit 148f02c
Showing 7 changed files with 375 additions and 157 deletions.
13 changes: 13 additions & 0 deletions src/logging/comp-logging.h
@@ -271,4 +271,17 @@
} \
} while (0)

/*
* Logs error and appends error cause from message iterator context.
*
* There is no BT_SELF_MSG_LOGE yet, so use BT_COMP_LOGE for now.
*/
#define BT_MSG_ITER_LOGE_APPEND_CAUSE(_self_msg_iter, _fmt, ...) \
do { \
BT_COMP_LOG(BT_LOG_ERROR, bt_self_message_iterator_borrow_component(_self_msg_iter), \
_fmt, ##__VA_ARGS__); \
(void) BT_CURRENT_THREAD_ERROR_APPEND_CAUSE_FROM_MESSAGE_ITERATOR( \
_self_msg_iter, _fmt, ##__VA_ARGS__); \
} while (0)

#endif /* BABELTRACE_LOGGING_COMP_LOGGING_H */
35 changes: 26 additions & 9 deletions src/plugins/ctf/common/msg-iter/msg-iter.c
@@ -708,7 +708,7 @@ void release_all_dscopes(struct ctf_msg_iter *msg_it)
static
enum ctf_msg_iter_status switch_packet_state(struct ctf_msg_iter *msg_it)
{
enum ctf_msg_iter_status status = CTF_MSG_ITER_STATUS_OK;
enum ctf_msg_iter_status status;
bt_self_component *self_comp = msg_it->self_comp;

/*
@@ -732,6 +732,30 @@ enum ctf_msg_iter_status switch_packet_state(struct ctf_msg_iter *msg_it)
release_all_dscopes(msg_it);
msg_it->cur_dscope_field = NULL;

if (msg_it->medium.medops.switch_packet) {
enum ctf_msg_iter_medium_status medium_status;

medium_status = msg_it->medium.medops.switch_packet(msg_it->medium.data);
if (medium_status == CTF_MSG_ITER_MEDIUM_STATUS_EOF) {
/* No more packets. */
msg_it->state = STATE_CHECK_EMIT_MSG_STREAM_END;
status = CTF_MSG_ITER_STATUS_OK;
goto end;
} else if (medium_status != CTF_MSG_ITER_MEDIUM_STATUS_OK) {
status = (int) medium_status;
goto end;
}

/*
* After the packet switch, the medium might want to give us a
* different buffer for the new packet.
*/
status = request_medium_bytes(msg_it);
if (status != CTF_MSG_ITER_STATUS_OK) {
goto end;
}
}

/*
* Adjust current buffer so that addr points to the beginning of the new
* packet.
@@ -768,6 +792,7 @@ enum ctf_msg_iter_status switch_packet_state(struct ctf_msg_iter *msg_it)
msg_it->snapshots.end_clock = UINT64_C(-1);
msg_it->state = STATE_DSCOPE_TRACE_PACKET_HEADER_BEGIN;

status = CTF_MSG_ITER_STATUS_OK;
end:
return status;
}
@@ -3149,14 +3174,6 @@ enum ctf_msg_iter_status read_packet_header_context_fields(
return status;
}

BT_HIDDEN
void ctf_msg_iter_set_medops_data(struct ctf_msg_iter *msg_it,
void *medops_data)
{
BT_ASSERT(msg_it);
msg_it->medium.data = medops_data;
}

BT_HIDDEN
enum ctf_msg_iter_status ctf_msg_iter_seek(struct ctf_msg_iter *msg_it,
off_t offset)
22 changes: 17 additions & 5 deletions src/plugins/ctf/common/msg-iter/msg-iter.h
@@ -45,7 +45,8 @@
*/

/**
* Medium operations status codes.
* Medium operations status codes. These use the same values as
* libbabeltrace2.
*/
enum ctf_msg_iter_medium_status {
/**
@@ -64,6 +65,9 @@ enum ctf_msg_iter_medium_status {
/** General error. */
CTF_MSG_ITER_MEDIUM_STATUS_ERROR = -1,

/** Memory error. */
CTF_MSG_ITER_MEDIUM_STATUS_MEMORY_ERROR = -12,

/** Everything okay. */
CTF_MSG_ITER_MEDIUM_STATUS_OK = 0,
};
@@ -94,6 +98,9 @@ enum ctf_msg_iter_status {
/** General error. */
CTF_MSG_ITER_STATUS_ERROR = CTF_MSG_ITER_MEDIUM_STATUS_ERROR,

/** Memory error. */
CTF_MSG_ITER_STATUS_MEMORY_ERROR = CTF_MSG_ITER_MEDIUM_STATUS_MEMORY_ERROR,

/** Everything okay. */
CTF_MSG_ITER_STATUS_OK = CTF_MSG_ITER_MEDIUM_STATUS_OK,
};
@@ -186,6 +193,15 @@ struct ctf_msg_iter_medium_ops {
*/
enum ctf_msg_iter_medium_status (* seek)(off_t offset, void *data);

/**
* Called when the message iterator wishes to inform the medium that it
* is about to start a new packet.
*
* After the iterator has called switch_packet, the following call to
* request_bytes must return the content at the start of the next
* packet.
*/
enum ctf_msg_iter_medium_status (* switch_packet)(void *data);

/**
* Returns a stream instance (weak reference) for the given
* stream class.
@@ -293,10 +309,6 @@ BT_HIDDEN
enum ctf_msg_iter_status ctf_msg_iter_curr_packet_last_event_clock_snapshot(
struct ctf_msg_iter *msg_it, uint64_t *last_event_cs);

BT_HIDDEN
void ctf_msg_iter_set_medops_data(struct ctf_msg_iter *msg_it,
void *medops_data);

BT_HIDDEN
enum ctf_msg_iter_status ctf_msg_iter_seek(
struct ctf_msg_iter *msg_it, off_t offset);
215 changes: 214 additions & 1 deletion src/plugins/ctf/fs-src/data-stream-file.c
@@ -99,7 +99,9 @@ enum ctf_msg_iter_medium_status ds_file_munmap(
* mapping. If the currently mmap-ed region already contains
* `requested_offset_in_file`, the mapping is kept.
*
* Set `ds_file->requested_offset_in_mapping` based on `requested_offset_in_file`
* Set `ds_file->requested_offset_in_mapping` based on `requested_offset_in_file`,
* such that the next call to `request_bytes` will return bytes starting at that
* position.
*
* `requested_offset_in_file` must be a valid offset in the file.
*/
@@ -301,6 +303,217 @@ struct ctf_msg_iter_medium_ops ctf_fs_ds_file_medops = {
.seek = medop_seek,
};

struct ctf_fs_ds_group_medops_data {
/* Weak, set once at creation time. */
struct ctf_fs_ds_file_group *ds_file_group;

/*
* Index (as in element rank) of the index entry of ds_file_groups'
* index we will read next (so, the one after the one we are reading
* right now).
*/
guint next_index_entry_index;

/*
* File we are currently reading. Changes whenever we switch to
* reading another data file.
*
* Owned by this.
*/
struct ctf_fs_ds_file *file;

/* Weak, for context / logging / appending causes. */
bt_self_message_iterator *self_msg_iter;
bt_logging_level log_level;
};

static
enum ctf_msg_iter_medium_status medop_group_request_bytes(
size_t request_sz,
uint8_t **buffer_addr,
size_t *buffer_sz,
void *void_data)
{
struct ctf_fs_ds_group_medops_data *data = void_data;

/* Return bytes from the current file. */
return medop_request_bytes(request_sz, buffer_addr, buffer_sz, data->file);
}

static
bt_stream *medop_group_borrow_stream(
bt_stream_class *stream_class,
int64_t stream_id,
void *void_data)
{
struct ctf_fs_ds_group_medops_data *data = void_data;

return medop_borrow_stream(stream_class, stream_id, data->file);
}

/*
* Set `data->file` to prepare it to read the packet described
* by `index_entry`.
*/

static
enum ctf_msg_iter_medium_status ctf_fs_ds_group_medops_set_file(
struct ctf_fs_ds_group_medops_data *data,
struct ctf_fs_ds_index_entry *index_entry,
bt_self_message_iterator *self_msg_iter,
bt_logging_level log_level)
{
enum ctf_msg_iter_medium_status status;

BT_ASSERT(data);
BT_ASSERT(index_entry);

/* Check if that file is already the one mapped. */
if (!data->file || strcmp(index_entry->path, data->file->file->path->str) != 0) {
/* Destroy the previously used file. */
ctf_fs_ds_file_destroy(data->file);

/* Create the new file. */
data->file = ctf_fs_ds_file_create(
data->ds_file_group->ctf_fs_trace,
self_msg_iter,
data->ds_file_group->stream,
index_entry->path,
log_level);
if (!data->file) {
BT_MSG_ITER_LOGE_APPEND_CAUSE(self_msg_iter,
"failed to create ctf_fs_ds_file.");
status = CTF_MSG_ITER_MEDIUM_STATUS_ERROR;
goto end;
}
}

/*
* Ensure the right portion of the file will be returned on the next
* request_bytes call.
*/
status = ds_file_mmap(data->file, index_entry->offset);
if (status != CTF_MSG_ITER_MEDIUM_STATUS_OK) {
goto end;
}

status = CTF_MSG_ITER_MEDIUM_STATUS_OK;

end:
return status;
}

static
enum ctf_msg_iter_medium_status medop_group_switch_packet(void *void_data)
{
struct ctf_fs_ds_group_medops_data *data = void_data;
struct ctf_fs_ds_index_entry *index_entry;
enum ctf_msg_iter_medium_status status;

/* If we have gone through all index entries, we are done. */
if (data->next_index_entry_index >=
data->ds_file_group->index->entries->len) {
status = CTF_MSG_ITER_MEDIUM_STATUS_EOF;
goto end;
}

/*
* Otherwise, look up the next index entry / packet and prepare it
* for reading.
*/
index_entry = g_ptr_array_index(
data->ds_file_group->index->entries,
data->next_index_entry_index);

status = ctf_fs_ds_group_medops_set_file(
data, index_entry, data->self_msg_iter, data->log_level);
if (status != CTF_MSG_ITER_MEDIUM_STATUS_OK) {
goto end;
}

data->next_index_entry_index++;

status = CTF_MSG_ITER_MEDIUM_STATUS_OK;
end:
return status;
}

BT_HIDDEN
void ctf_fs_ds_group_medops_data_destroy(
struct ctf_fs_ds_group_medops_data *data)
{
if (!data) {
goto end;
}

ctf_fs_ds_file_destroy(data->file);

g_free(data);

end:
return;
}

enum ctf_msg_iter_medium_status ctf_fs_ds_group_medops_data_create(
struct ctf_fs_ds_file_group *ds_file_group,
bt_self_message_iterator *self_msg_iter,
bt_logging_level log_level,
struct ctf_fs_ds_group_medops_data **out)
{
struct ctf_fs_ds_group_medops_data *data;
enum ctf_msg_iter_medium_status status;

BT_ASSERT(self_msg_iter);
BT_ASSERT(ds_file_group);
BT_ASSERT(ds_file_group->index);
BT_ASSERT(ds_file_group->index->entries->len > 0);

data = g_new0(struct ctf_fs_ds_group_medops_data, 1);
if (!data) {
BT_MSG_ITER_LOGE_APPEND_CAUSE(self_msg_iter,
"Failed to allocate a struct ctf_fs_ds_group_medops_data");
status = CTF_MSG_ITER_MEDIUM_STATUS_MEMORY_ERROR;
goto error;
}

data->ds_file_group = ds_file_group;
data->self_msg_iter = self_msg_iter;
data->log_level = log_level;

/*
* No need to prepare the first file: ctf_msg_iter will call
* switch_packet before reading the first packet, so it will be
* done then.
*/

*out = data;
status = CTF_MSG_ITER_MEDIUM_STATUS_OK;
goto end;

error:
ctf_fs_ds_group_medops_data_destroy(data);

end:
return status;
}

void ctf_fs_ds_group_medops_data_reset(struct ctf_fs_ds_group_medops_data *data)
{
data->next_index_entry_index = 0;
}

struct ctf_msg_iter_medium_ops ctf_fs_ds_group_medops = {
.request_bytes = medop_group_request_bytes,
.borrow_stream = medop_group_borrow_stream,
.switch_packet = medop_group_switch_packet,

/*
* We don't support seeking with these medops.  It would probably be
* possible, but it's not needed at the moment.
*/
.seek = NULL,
};

static
struct ctf_fs_ds_index_entry *ctf_fs_ds_index_entry_create(
bt_self_component *self_comp, bt_logging_level log_level)
