src.ctf.fs: add and use medops to iterate on a ds_file_group using the index

This patch solves the problem of reading multiple snapshots of the same
tracing session taken quickly.  Taking the snapshots quickly enough can
cause them to overlap, in which case some complete / identical packets
will be found in different snapshots.

As an example, imagine we have three snapshots, with packets labeled A
to G belonging to the same stream instance in all snapshots.  We could
have the following sequence of packets in each snapshot:

  - snapshot 1: A B C D
  - snapshot 2:     C D E F
  - snapshot 3:       D E F G

Babeltrace 1 reads these three snapshots successfully.  In fact, it
simply treats them as three different traces, ordering their events
together in the output.  As a result, events from packet D are present
three times in the output.  So while it works (as in, Babeltrace exits
with status 0), it's probably not what a user would want.

Babeltrace 2 (before this patch) hits the following assert, which
validates that messages produced by iterators never go back in time:

    11-11 15:13:23.874  8329  8329 F LIB/MSG-ITER call_iterator_next_method@iterator.c:872 Babeltrace 2 library postcondition not satisfied; error is:
    11-11 15:13:23.874  8329  8329 F LIB/MSG-ITER call_iterator_next_method@iterator.c:872 Clock snapshots are not monotonic
    11-11 15:13:23.874  8329  8329 F LIB/MSG-ITER call_iterator_next_method@iterator.c:872 Aborting...

This is because Babeltrace 2 groups all CTF traces sharing the same UUID
(which is the case for our three snapshots) and gives them to the same
src.ctf.fs component to read.  The component groups data stream files
from the various snapshots by stream instance id, and sorts them
according to the timestamp of their first event.  It then reads them
sequentially, from end to end, assuming that the start of the second
data stream file is after the end of the first data stream file.  Using
our example above, the src.ctf.fs component would therefore try to read
packets in this order:

    A B C D C D E F D E F G
           ^
	   `- ouch!

In this case, we want to read all packets exactly once, in the right
order.

The solution brought by this patch is to iterate over the packets by
following the index, instead of reading all data stream files from end
to end.
Index entries were already de-duplicated by commit

    ctf: de-duplicate index entries

So the index already refers to a single instance of each packet.  We can
therefore use it as a playlist of the packets to read.
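The playlist idea can be sketched with a toy de-duplication pass over
per-snapshot index entries.  The `entry` struct and `merge_dedup`
function below are illustrative stand-ins, not the real
ctf_fs_ds_index API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified index entry: the real ctf_fs_ds_index_entry
 * also records the file path, offset and packet size. */
struct entry {
	unsigned long long ts_begin; /* clock snapshot of first event */
};

/* Append the entries of `src` to `dst`, skipping any entry whose
 * ts_begin was already seen.  Returns the new number of entries in
 * `dst` (which must have enough room).  This mirrors the idea of
 * "ctf: de-duplicate index entries": identical packets found in
 * overlapping snapshots are kept only once, so the resulting index
 * can serve as a playlist of packets to read. */
size_t merge_dedup(struct entry *dst, size_t dst_len,
		const struct entry *src, size_t src_len)
{
	size_t i, j;

	for (i = 0; i < src_len; i++) {
		int dup = 0;

		for (j = 0; j < dst_len; j++) {
			if (dst[j].ts_begin == src[i].ts_begin) {
				dup = 1;
				break;
			}
		}

		if (!dup)
			dst[dst_len++] = src[i];
	}

	return dst_len;
}
```

Merging the three overlapping snapshots from the example above yields
seven unique packets (A through G), each to be read exactly once.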

The change mainly revolves around adding a new set of CTF message
iterator medium operations, called ctf_fs_ds_group_medops.  Instead of
the medium being a single data stream file, as in
ctf_fs_ds_file_medops, this new medium is conceptually the sequence of
all packets described by the index of a ctf_fs_ds_file_group, possibly
spread out across different data stream files.

A new optional medium operation called `switch_packet` is added.  When
the CTF message iterator is done reading a packet, it calls this method,
indicating to the medium that it is at the frontier of two packets.  If
the medium is aware of the packets (like ctf_fs_ds_group_medops is) and
knows that the following packet is not contiguous with the previous
packet, it can reposition its "read head" at the right place (open the
new file, go to the right offset).  Immediately after calling
`switch_packet`, the message iterator calls the `request_bytes` method,
which allows the medium to return a buffer containing the bytes of the
next packet.

When the packet-aware medium has its `switch_packet` method called but
there are no more packets to read, it returns
CTF_MSG_ITER_MEDIUM_STATUS_EOF.  That brings the message iterator in the
STATE_CHECK_EMIT_MSG_STREAM_END state, which will close the stream.

If the `switch_packet` method is not provided by the medium, the message
iterator just continues, assuming that the bytes of the next packet are
contiguous with those of the previous packet.
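A minimal model of this protocol, with illustrative names standing in
for the real ctf_msg_iter types and CTF_MSG_ITER_MEDIUM_STATUS_* codes,
might look like:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the optional switch_packet medium operation described
 * above.  All names here are illustrative; the real operations also
 * deal with buffers, sizes and the full status enum. */
enum status { STATUS_OK = 0, STATUS_EOF = 1 };

struct medops {
	/* Optional: NULL for media whose packets are contiguous. */
	enum status (*switch_packet)(void *data);
};

/* Called by the iterator at the frontier of two packets.  STATUS_EOF
 * from a packet-aware medium means there is no packet left, which
 * would send the real iterator to STATE_CHECK_EMIT_MSG_STREAM_END. */
enum status on_packet_frontier(const struct medops *ops, void *data)
{
	if (!ops->switch_packet) {
		/* No method provided: assume the next packet's bytes
		 * are contiguous with the previous packet's. */
		return STATUS_OK;
	}

	return ops->switch_packet(data);
}

/* A packet-aware medium backed by a fixed-size packet playlist. */
struct group_data {
	size_t next;
	size_t count;
};

enum status group_switch_packet(void *v)
{
	struct group_data *d = v;

	if (d->next >= d->count)
		return STATUS_EOF;

	/* Here the real medium would open the right file and mmap the
	 * right offset before the next request_bytes call. */
	d->next++;
	return STATUS_OK;
}
```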

The ctf_fs_ds_file_medops medium operations are still necessary for
reading individual ctf_fs_ds_files, when initializing src.ctf.fs
components.  This is needed when building the index from a stream or for
applying tracer fixups.

This also somewhat simplifies the interaction between the src.ctf.fs
iterator and the ctf_msg_iter.  Previously, we would read each data
stream file until the message iterator returned EOF.  If it wasn't the
last data stream file, we would reset the iterator to read the next data
stream file.  The
ctf_msg_iter_set_emit_stream_{beginning,end}_message functions were
necessary to tell the message iterator whether to send the stream
beginning or end messages, because it otherwise had no idea whether the
data stream file it was reading was the first or the last one.  The
ctf_msg_iter_set_medops_data function was necessary to swap the data of
the ctf_fs_ds_file_medops to point to the new data stream file.

With the ctf_fs_ds_group_medops, the CTF message iterator only returns
EOF when the stream is done.  The data passed to the
ctf_fs_ds_group_medops has everything it needs to go through all the
packets, so it doesn't need to change.
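The resulting consumer-side loop can be modeled like this; `iter_next`,
`consume_all` and the types are illustrative stand-ins for the real
ctf_msg_iter API:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the real iterator type and statuses. */
enum it_status { IT_OK = 0, IT_EOF = 1 };

struct group_iter {
	size_t next_packet;
	size_t packet_count; /* everything needed to walk all packets */
};

/* With the group medops, a single iterator walks every packet of the
 * stream; IT_EOF is only returned once the whole stream is done. */
enum it_status iter_next(struct group_iter *it)
{
	if (it->next_packet >= it->packet_count)
		return IT_EOF;

	it->next_packet++;
	return IT_OK;
}

/* Consumer loop: no per-data-stream-file reset, no medops-data swap,
 * no toggling of stream beginning/end message emission. */
size_t consume_all(struct group_iter *it)
{
	size_t n = 0;

	while (iter_next(it) == IT_OK)
		n++;

	return n;
}
```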

Change-Id: I72f6d1e09b87414fb83f68cb57abb1f2dc61b439
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
simark authored and jgalar committed Nov 15, 2019
1 parent c0def27 commit 148f02c
Showing 7 changed files with 375 additions and 157 deletions.
13 changes: 13 additions & 0 deletions src/logging/comp-logging.h
@@ -271,4 +271,17 @@
} \
} while (0)

/*
* Logs error and appends error cause from message iterator context.
*
* There is no BT_SELF_MSG_LOGE yet, so use BT_COMP_LOGE for now.
*/
#define BT_MSG_ITER_LOGE_APPEND_CAUSE(_self_msg_iter, _fmt, ...) \
do { \
BT_COMP_LOG(BT_LOG_ERROR, bt_self_message_iterator_borrow_component(_self_msg_iter), \
_fmt, ##__VA_ARGS__); \
(void) BT_CURRENT_THREAD_ERROR_APPEND_CAUSE_FROM_MESSAGE_ITERATOR( \
_self_msg_iter, _fmt, ##__VA_ARGS__); \
} while (0)

#endif /* BABELTRACE_LOGGING_COMP_LOGGING_H */
35 changes: 26 additions & 9 deletions src/plugins/ctf/common/msg-iter/msg-iter.c
@@ -708,7 +708,7 @@ void release_all_dscopes(struct ctf_msg_iter *msg_it)
static
enum ctf_msg_iter_status switch_packet_state(struct ctf_msg_iter *msg_it)
{
enum ctf_msg_iter_status status = CTF_MSG_ITER_STATUS_OK;
enum ctf_msg_iter_status status;
bt_self_component *self_comp = msg_it->self_comp;

/*
@@ -732,6 +732,30 @@ enum ctf_msg_iter_status switch_packet_state(struct ctf_msg_iter *msg_it)
release_all_dscopes(msg_it);
msg_it->cur_dscope_field = NULL;

if (msg_it->medium.medops.switch_packet) {
enum ctf_msg_iter_medium_status medium_status;

medium_status = msg_it->medium.medops.switch_packet(msg_it->medium.data);
if (medium_status == CTF_MSG_ITER_MEDIUM_STATUS_EOF) {
/* No more packets. */
msg_it->state = STATE_CHECK_EMIT_MSG_STREAM_END;
status = CTF_MSG_ITER_STATUS_OK;
goto end;
} else if (medium_status != CTF_MSG_ITER_MEDIUM_STATUS_OK) {
status = (int) medium_status;
goto end;
}

/*
* After the packet switch, the medium might want to give us a
* different buffer for the new packet.
*/
status = request_medium_bytes(msg_it);
if (status != CTF_MSG_ITER_STATUS_OK) {
goto end;
}
}

/*
* Adjust current buffer so that addr points to the beginning of the new
* packet.
@@ -768,6 +792,7 @@ enum ctf_msg_iter_status switch_packet_state(struct ctf_msg_iter *msg_it)
msg_it->snapshots.end_clock = UINT64_C(-1);
msg_it->state = STATE_DSCOPE_TRACE_PACKET_HEADER_BEGIN;

status = CTF_MSG_ITER_STATUS_OK;
end:
return status;
}
@@ -3149,14 +3174,6 @@ enum ctf_msg_iter_status read_packet_header_context_fields(
return status;
}

BT_HIDDEN
void ctf_msg_iter_set_medops_data(struct ctf_msg_iter *msg_it,
void *medops_data)
{
BT_ASSERT(msg_it);
msg_it->medium.data = medops_data;
}

BT_HIDDEN
enum ctf_msg_iter_status ctf_msg_iter_seek(struct ctf_msg_iter *msg_it,
off_t offset)
22 changes: 17 additions & 5 deletions src/plugins/ctf/common/msg-iter/msg-iter.h
@@ -45,7 +45,8 @@
*/

/**
* Medium operations status codes.
* Medium operations status codes. These use the same values as
* libbabeltrace2.
*/
enum ctf_msg_iter_medium_status {
/**
@@ -64,6 +65,9 @@ enum ctf_msg_iter_medium_status {
/** General error. */
CTF_MSG_ITER_MEDIUM_STATUS_ERROR = -1,

/** Memory error. */
CTF_MSG_ITER_MEDIUM_STATUS_MEMORY_ERROR = -12,

/** Everything okay. */
CTF_MSG_ITER_MEDIUM_STATUS_OK = 0,
};
@@ -94,6 +98,9 @@ enum ctf_msg_iter_status {
/** General error. */
CTF_MSG_ITER_STATUS_ERROR = CTF_MSG_ITER_MEDIUM_STATUS_ERROR,

/** Memory error. */
CTF_MSG_ITER_STATUS_MEMORY_ERROR = CTF_MSG_ITER_MEDIUM_STATUS_MEMORY_ERROR,

/** Everything okay. */
CTF_MSG_ITER_STATUS_OK = CTF_MSG_ITER_MEDIUM_STATUS_OK,
};
@@ -186,6 +193,15 @@ struct ctf_msg_iter_medium_ops {
*/
enum ctf_msg_iter_medium_status (* seek)(off_t offset, void *data);

/**
* Called when the message iterator wishes to inform the medium that it
* is about to start a new packet.
*
* After the iterator has called switch_packet, the following call to
* request_bytes must return the content at the start of the next
* packet.
*/
enum ctf_msg_iter_medium_status (* switch_packet)(void *data);

/**
* Returns a stream instance (weak reference) for the given
* stream class.
@@ -293,10 +309,6 @@ BT_HIDDEN
enum ctf_msg_iter_status ctf_msg_iter_curr_packet_last_event_clock_snapshot(
struct ctf_msg_iter *msg_it, uint64_t *last_event_cs);

BT_HIDDEN
void ctf_msg_iter_set_medops_data(struct ctf_msg_iter *msg_it,
void *medops_data);

BT_HIDDEN
enum ctf_msg_iter_status ctf_msg_iter_seek(
struct ctf_msg_iter *msg_it, off_t offset);
215 changes: 214 additions & 1 deletion src/plugins/ctf/fs-src/data-stream-file.c
@@ -99,7 +99,9 @@ enum ctf_msg_iter_medium_status ds_file_munmap(
* mapping. If the currently mmap-ed region already contains
* `requested_offset_in_file`, the mapping is kept.
*
* Set `ds_file->requested_offset_in_mapping` based on `requested_offset_in_file`
* Set `ds_file->requested_offset_in_mapping` based on `requested_offset_in_file`,
* such that the next call to `request_bytes` will return bytes starting at that
* position.
*
* `requested_offset_in_file` must be a valid offset in the file.
*/
@@ -301,6 +303,217 @@ struct ctf_msg_iter_medium_ops ctf_fs_ds_file_medops = {
.seek = medop_seek,
};

struct ctf_fs_ds_group_medops_data {
/* Weak, set once at creation time. */
struct ctf_fs_ds_file_group *ds_file_group;

/*
* Index (as in element rank) of the index entry of ds_file_groups'
* index we will read next (so, the one after the one we are reading
* right now).
*/
guint next_index_entry_index;

/*
* File we are currently reading. Changes whenever we switch to
* reading another data file.
*
* Owned by this.
*/
struct ctf_fs_ds_file *file;

/* Weak, for context / logging / appending causes. */
bt_self_message_iterator *self_msg_iter;
bt_logging_level log_level;
};

static
enum ctf_msg_iter_medium_status medop_group_request_bytes(
size_t request_sz,
uint8_t **buffer_addr,
size_t *buffer_sz,
void *void_data)
{
struct ctf_fs_ds_group_medops_data *data = void_data;

/* Return bytes from the current file. */
return medop_request_bytes(request_sz, buffer_addr, buffer_sz, data->file);
}

static
bt_stream *medop_group_borrow_stream(
bt_stream_class *stream_class,
int64_t stream_id,
void *void_data)
{
struct ctf_fs_ds_group_medops_data *data = void_data;

return medop_borrow_stream(stream_class, stream_id, data->file);
}

/*
* Set `data->file` to prepare it to read the packet described
* by `index_entry`.
*/

static
enum ctf_msg_iter_medium_status ctf_fs_ds_group_medops_set_file(
struct ctf_fs_ds_group_medops_data *data,
struct ctf_fs_ds_index_entry *index_entry,
bt_self_message_iterator *self_msg_iter,
bt_logging_level log_level)
{
enum ctf_msg_iter_medium_status status;

BT_ASSERT(data);
BT_ASSERT(index_entry);

/* Check if that file is already the one mapped. */
if (!data->file || strcmp(index_entry->path, data->file->file->path->str) != 0) {
/* Destroy the previously used file. */
ctf_fs_ds_file_destroy(data->file);

/* Create the new file. */
data->file = ctf_fs_ds_file_create(
data->ds_file_group->ctf_fs_trace,
self_msg_iter,
data->ds_file_group->stream,
index_entry->path,
log_level);
if (!data->file) {
BT_MSG_ITER_LOGE_APPEND_CAUSE(self_msg_iter,
"failed to create ctf_fs_ds_file.");
status = CTF_MSG_ITER_MEDIUM_STATUS_ERROR;
goto end;
}
}

/*
* Ensure the right portion of the file will be returned on the next
* request_bytes call.
*/
status = ds_file_mmap(data->file, index_entry->offset);
if (status != CTF_MSG_ITER_MEDIUM_STATUS_OK) {
goto end;
}

status = CTF_MSG_ITER_MEDIUM_STATUS_OK;

end:
return status;
}

static
enum ctf_msg_iter_medium_status medop_group_switch_packet(void *void_data)
{
struct ctf_fs_ds_group_medops_data *data = void_data;
struct ctf_fs_ds_index_entry *index_entry;
enum ctf_msg_iter_medium_status status;

/* If we have gone through all index entries, we are done. */
if (data->next_index_entry_index >=
data->ds_file_group->index->entries->len) {
status = CTF_MSG_ITER_MEDIUM_STATUS_EOF;
goto end;
}

/*
* Otherwise, look up the next index entry / packet and prepare it
* for reading.
*/
index_entry = g_ptr_array_index(
data->ds_file_group->index->entries,
data->next_index_entry_index);

status = ctf_fs_ds_group_medops_set_file(
data, index_entry, data->self_msg_iter, data->log_level);
if (status != CTF_MSG_ITER_MEDIUM_STATUS_OK) {
goto end;
}

data->next_index_entry_index++;

status = CTF_MSG_ITER_MEDIUM_STATUS_OK;
end:
return status;
}

BT_HIDDEN
void ctf_fs_ds_group_medops_data_destroy(
struct ctf_fs_ds_group_medops_data *data)
{
if (!data) {
goto end;
}

ctf_fs_ds_file_destroy(data->file);

g_free(data);

end:
return;
}

enum ctf_msg_iter_medium_status ctf_fs_ds_group_medops_data_create(
struct ctf_fs_ds_file_group *ds_file_group,
bt_self_message_iterator *self_msg_iter,
bt_logging_level log_level,
struct ctf_fs_ds_group_medops_data **out)
{
struct ctf_fs_ds_group_medops_data *data;
enum ctf_msg_iter_medium_status status;

BT_ASSERT(self_msg_iter);
BT_ASSERT(ds_file_group);
BT_ASSERT(ds_file_group->index);
BT_ASSERT(ds_file_group->index->entries->len > 0);

data = g_new0(struct ctf_fs_ds_group_medops_data, 1);
if (!data) {
BT_MSG_ITER_LOGE_APPEND_CAUSE(self_msg_iter,
"Failed to allocate a struct ctf_fs_ds_group_medops_data");
status = CTF_MSG_ITER_MEDIUM_STATUS_MEMORY_ERROR;
goto error;
}

data->ds_file_group = ds_file_group;
data->self_msg_iter = self_msg_iter;
data->log_level = log_level;

/*
* No need to prepare the first file: ctf_msg_iter will call
* switch_packet before reading the first packet, so it will be
* done then.
*/

*out = data;
status = CTF_MSG_ITER_MEDIUM_STATUS_OK;
goto end;

error:
ctf_fs_ds_group_medops_data_destroy(data);

end:
return status;
}

void ctf_fs_ds_group_medops_data_reset(struct ctf_fs_ds_group_medops_data *data)
{
data->next_index_entry_index = 0;
}

struct ctf_msg_iter_medium_ops ctf_fs_ds_group_medops = {
.request_bytes = medop_group_request_bytes,
.borrow_stream = medop_group_borrow_stream,
.switch_packet = medop_group_switch_packet,

/*
* We don't support seeking with these medops.  It would probably be
* possible, but it's not needed at the moment.
*/
.seek = NULL,
};

static
struct ctf_fs_ds_index_entry *ctf_fs_ds_index_entry_create(
bt_self_component *self_comp, bt_logging_level log_level)
