Commit
src.ctf.fs: add and use medops to iterate on a ds_file_group using the index

This patch solves the problem of reading multiple snapshots of the same tracing session taken in quick succession. Snapshots taken quickly enough can overlap, in which case some complete, identical packets will be found in different snapshots.

As an example, imagine we have three snapshots, and packets labeled from A to G belonging to the same stream instance in all snapshots. We could have the following sequence of packets in each snapshot:

  - snapshot 1: A B C D
  - snapshot 2: C D E F
  - snapshot 3: D E F G

Babeltrace 1 reads these three snapshots successfully. In fact, it just considers them as three different traces, so it orders events individually. As a result, events from packet D are present three times in the output. So while it works (as in Babeltrace exits with status 0), it's probably not what a user would want.

Babeltrace 2 (before this patch) hits the following assert, which validates that messages produced by iterators never go back in time:

    11-11 15:13:23.874 8329 8329 F LIB/MSG-ITER call_iterator_next_method@iterator.c:872 Babeltrace 2 library postcondition not satisfied; error is:
    11-11 15:13:23.874 8329 8329 F LIB/MSG-ITER call_iterator_next_method@iterator.c:872 Clock snapshots are not monotonic
    11-11 15:13:23.874 8329 8329 F LIB/MSG-ITER call_iterator_next_method@iterator.c:872 Aborting...

This is because Babeltrace 2 groups all CTF traces sharing the same UUID (which is the case for our three snapshots) and gives them to the same src.ctf.fs component to read. The component groups data stream files from the various snapshots by stream instance id, and sorts them according to the timestamp of their first event. It then reads them sequentially, from end to end, assuming that the start of the second data stream file is after the end of the first data stream file. Using our example above, the src.ctf.fs component would therefore try to read packets in this order:

    A B C D C D E F D E F G
            ^
            `- ouch!

In this case, we want to read all packets exactly once, in the right order. The solution brought by this patch is to iterate on the packets by following the index, instead of reading all data files from end to end. Index entries were already de-duplicated by commit

    ctf: de-duplicate index entries

so the index already refers to a single instance of each packet. We can therefore use it as a playlist of the packets to read.

The change mainly revolves around adding a new kind of CTF message iterator medium operations, called ctf_fs_ds_group_medops. Instead of the medium being a single data stream file, like in ctf_fs_ds_file_medops, this new medium is conceptually the sequence of all packets described by the index of a ctf_fs_ds_group, possibly spread out in different data stream files.

A new optional medium operation called `switch_packet` is added. When the CTF message iterator is done reading a packet, it calls this method, indicating to the medium that it is at the frontier of two packets. If the medium is aware of the packets (like ctf_fs_ds_group_medops is) and knows that the following packet is not contiguous with the previous packet, it can reposition its "read head" at the right place (open the new file, go to the right offset). Immediately after calling `switch_packet`, the message iterator calls the `request_bytes` method, which allows the medium to return a buffer containing the bytes of the next packet.

When the packet-aware medium has its `switch_packet` method called but there are no more packets to read, it returns CTF_MSG_ITER_MEDIUM_STATUS_EOF. That puts the message iterator in the STATE_CHECK_EMIT_MSG_STREAM_END state, which closes the stream.

If the `switch_packet` method is not provided by the medium, the message iterator just continues, assuming that the bytes of the next packet are contiguous with those of the previous packet.

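As a rough illustration of the playlist idea described above, here is a minimal, self-contained sketch in C. It is not the code from this patch: every name in it (group_medium, index_entry, read_stream, the MEDIUM_STATUS_* values) is hypothetical, data stream files are stood in for by in-memory strings, and the index is assumed to be already de-duplicated and sorted. As a simplification, request_bytes here also signals the end of the current packet, whereas a real message iterator would learn the packet size from the packet header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not the Babeltrace API. */
enum medium_status { MEDIUM_STATUS_OK, MEDIUM_STATUS_EOF };

/* One index entry: where a single packet lives. */
struct index_entry {
    size_t file;   /* which data stream file holds the packet */
    size_t offset; /* byte offset of the packet in that file */
    size_t size;   /* packet size in bytes */
};

/* Packet-aware medium: the "playlist" plus a read head. */
struct group_medium {
    const char *const *files;        /* in-memory stand-ins for files */
    const struct index_entry *index; /* de-duplicated, sorted by time */
    size_t index_len;
    size_t next;                     /* next index entry to switch to */
    const struct index_entry *cur;   /* packet under the read head */
    size_t consumed;                 /* bytes already returned from cur */
};

/* Move the read head to the next packet of the index, or report EOF. */
static enum medium_status switch_packet(struct group_medium *m)
{
    if (m->next == m->index_len)
        return MEDIUM_STATUS_EOF;
    m->cur = &m->index[m->next++];
    m->consumed = 0;
    return MEDIUM_STATUS_OK;
}

/* Return up to max bytes of the current packet (EOF when it is exhausted). */
static enum medium_status request_bytes(struct group_medium *m, size_t max,
        const char **buf, size_t *len)
{
    assert(max > 0);
    if (!m->cur || m->consumed == m->cur->size)
        return MEDIUM_STATUS_EOF;

    size_t left = m->cur->size - m->consumed;
    *len = left < max ? left : max;
    *buf = m->files[m->cur->file] + m->cur->offset + m->consumed;
    m->consumed += *len;
    return MEDIUM_STATUS_OK;
}

/* Stand-in for the message iterator: copy every packet's bytes to out. */
static size_t read_stream(struct group_medium *m, char *out, size_t cap)
{
    size_t n = 0;
    const char *buf;
    size_t len;

    while (switch_packet(m) == MEDIUM_STATUS_OK) {
        /* Read one byte at a time to exercise partial reads. */
        while (request_bytes(m, 1, &buf, &len) == MEDIUM_STATUS_OK) {
            assert(n + len < cap); /* keep room for the terminator */
            memcpy(out + n, buf, len);
            n += len;
        }
    }
    out[n] = '\0';
    return n;
}
```

With the three snapshots of the example modeled as the strings "AABBCCDD", "CCDDEEFF" and "DDEEFFGG" (two bytes per packet), and an index pointing at A, B, C and D in the first file, E and F in the second and G in the third, read_stream yields each packet exactly once and in order, jumping between files only where the index says so.
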
The ctf_fs_ds_file_medops medium operations are still necessary for reading individual ctf_fs_ds_files when initializing src.ctf.fs components. This is needed when building the index from a stream or for applying tracer fixups.

This change slightly simplifies the interaction between the src.ctf.fs iterator and the ctf_msg_iter. Previously, we would read each data stream file until the message iterator returned EOF. If it wasn't the last data stream file, we would reset the iterator to read the next data stream file. Functions ctf_msg_iter_set_emit_stream_{beginning,end}_message were necessary to indicate to the message iterator whether to send the stream beginning or end messages, because it otherwise had no idea whether the data stream file it was reading was the first one or the last one. The function ctf_msg_iter_set_medops_data was necessary to swap the data of the ctf_fs_ds_file_medops to point to the new data stream file.

With the ctf_fs_ds_group_medops, the CTF message iterator only returns EOF when the stream is done. The data passed to the ctf_fs_ds_group_medops has everything it needs to go through all the packets, so it doesn't need to change.

Change-Id: I72f6d1e09b87414fb83f68cb57abb1f2dc61b439
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>