axsaucedo/rust-io-file-benchmark

Rust benchmarking on Large File IO

This is a short report on the performance metrics obtained when processing large files with small Rust and Python scripts.

Here, "large" means files that won't easily fit in memory (e.g. over 100GB) and therefore require streaming / buffers.

Experiment overview

The experiment is as follows:

  • A 2GB text file containing text information about objects
  • Every 3 consecutive lines describe one object (i.e. 1 line = one attribute)
  • Objects are separated from each other by one blank line

Objective

The objective is to read the file, iterate through its lines, and write the results to a CSV/TSV file with one row per object.

Example

input file example

OBJECT 1 ATTR 1: CONTENT
OBJECT 1 ATTR 2: CONTENT
OBJECT 1 ATTR 3: CONTENT

OBJECT 2 ATTR 1: CONTENT
OBJECT 2 ATTR 2: CONTENT
OBJECT 2 ATTR 3: CONTENT

...etc

expected output file example

OBJECT 1 ATTR 1, OBJECT 1 ATTR 2, OBJECT 1 ATTR 3
OBJECT 2 ATTR 1, OBJECT 2 ATTR 2, OBJECT 2 ATTR 3
...etc
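
The transformation described above can be sketched in Rust roughly as follows. This is a minimal illustration, not the code in main.rs: the function name `objects_to_csv` is hypothetical, the lines are written out unchanged (no CSV quoting or escaping), and any run of non-blank lines is treated as one object rather than enforcing exactly three attributes.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// Joins each run of non-blank lines into one comma-separated row.
/// A blank line (or the end of the input) closes the current row.
/// Returns the number of rows written.
fn objects_to_csv<R: BufRead, W: Write>(reader: R, mut writer: W) -> io::Result<u64> {
    let mut fields: Vec<String> = Vec::with_capacity(3);
    let mut rows = 0u64;

    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            // Blank separator: emit the object collected so far, if any.
            if !fields.is_empty() {
                writeln!(writer, "{}", fields.join(", "))?;
                fields.clear();
                rows += 1;
            }
        } else {
            fields.push(line.trim_end().to_string());
        }
    }

    // The last object may not be followed by a blank line.
    if !fields.is_empty() {
        writeln!(writer, "{}", fields.join(", "))?;
        rows += 1;
    }
    writer.flush()?;
    Ok(rows)
}

fn main() -> io::Result<()> {
    // In-memory stand-in for the 2GB input file.
    let input = "OBJECT 1 ATTR 1: CONTENT\nOBJECT 1 ATTR 2: CONTENT\nOBJECT 1 ATTR 3: CONTENT\n\n\
                 OBJECT 2 ATTR 1: CONTENT\nOBJECT 2 ATTR 2: CONTENT\nOBJECT 2 ATTR 3: CONTENT\n";
    let stdout = io::stdout();
    let out = BufWriter::new(stdout.lock());
    let rows = objects_to_csv(input.as_bytes(), out)?;
    eprintln!("{} rows written", rows);
    Ok(())
}
```

For a real run, a `BufReader` wrapped around `File::open(...)` and a `BufWriter` wrapped around `File::create(...)` would take the place of the in-memory input and stdout.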

Improvements

The Rust code clearly needs optimising, as it currently performs several string operations for every line.

Ideally the file would be processed in Rust as raw bytes (u8) rather than as strings, which should save time. The current implementations do not do this: they read through BufReader and convert each line to a String. Note that BufReader is a struct rather than a class, and that the BufRead trait it implements offers read_until, which reads into a Vec<u8>, so a byte-based version is a possible next step.
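
For reference, the standard library's `BufRead` trait (which `BufReader` implements) includes `read_until`, which fills a `Vec<u8>` without any UTF-8 validation. The sketch below shows how the same line grouping might be done purely on bytes. It has not been benchmarked against the results in this report, and the function name `objects_to_csv_bytes` is hypothetical.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// Byte-oriented variant: lines are never converted to `String`.
/// One `Vec<u8>` is reused as the line buffer for the whole file.
fn objects_to_csv_bytes<R: BufRead, W: Write>(mut reader: R, mut writer: W) -> io::Result<()> {
    let mut line: Vec<u8> = Vec::new();
    let mut row_is_empty = true; // no field written yet for the current row

    loop {
        line.clear();
        // Reads up to and including the next b'\n'; returns 0 at end of input.
        if reader.read_until(b'\n', &mut line)? == 0 {
            break;
        }
        // Strip the trailing line terminator ("\n" or "\r\n").
        while line.last() == Some(&b'\n') || line.last() == Some(&b'\r') {
            line.pop();
        }

        if line.is_empty() {
            // Blank separator line: finish the current row, if one was started.
            if !row_is_empty {
                writer.write_all(b"\n")?;
                row_is_empty = true;
            }
        } else {
            if !row_is_empty {
                writer.write_all(b", ")?;
            }
            writer.write_all(&line)?;
            row_is_empty = false;
        }
    }

    // Close the last row when the input does not end with a blank line.
    if !row_is_empty {
        writer.write_all(b"\n")?;
    }
    writer.flush()
}

fn main() -> io::Result<()> {
    let input = b"OBJECT 1 ATTR 1: CONTENT\nOBJECT 1 ATTR 2: CONTENT\nOBJECT 1 ATTR 3: CONTENT\n\n\
                  OBJECT 2 ATTR 1: CONTENT\nOBJECT 2 ATTR 2: CONTENT\nOBJECT 2 ATTR 3: CONTENT\n";
    let stdout = io::stdout();
    objects_to_csv_bytes(&input[..], BufWriter::new(stdout.lock()))
}
```

Because each field is written straight to the buffered writer, this version needs no per-object concatenation; whether that actually outperforms the String-based variants would have to be measured.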

Results

The results are provided below.

Python:

Simple Python implementation without any explicit buffering, using the native Python file IO readline / write.

real 2m16.087s

user 1m4.397s

sys 0m4.352s

Rust 1.33 main.rs:

Rust implementation using BufReader and BufWriter: each line is converted to a string, appended to the current record, and the record is written out as bytes.

real 7m28.602s

user 7m19.379s

sys 0m5.094s

Rust 1.33 main-vec.rs:

Rust implementation using BufReader and BufWriter, collecting the lines in a vector in an attempt to perform a single string concatenation per object.

real 8m35.463s

user 8m24.227s

sys 0m5.918s

Rust 1.33 main-copy.rs:

Rust implementation using plain File reading in bytes and copying the data to another location without performing any processing.

real 41m12.918s

user 22m55.845s

sys 18m15.949s
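
The large sys time in this last run indicates that much of the time was spent in system calls. One possible cause, which is an assumption since main-copy.rs is not reproduced in this report, is that an unbuffered File is read and written in very small chunks, so that each chunk costs a system call. A buffered copy might look like the sketch below; `copy_buffered` and the 1 MiB buffer size are illustrative choices, not values taken from the repository.

```rust
use std::env;
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};

/// Copies everything from `src` to `dst` through 1 MiB buffers,
/// so the underlying reader and writer see few, large operations.
fn copy_buffered<R: Read, W: Write>(src: R, dst: W) -> io::Result<u64> {
    const BUF_SIZE: usize = 1 << 20; // 1 MiB, an arbitrary illustrative size
    let mut reader = BufReader::with_capacity(BUF_SIZE, src);
    let mut writer = BufWriter::with_capacity(BUF_SIZE, dst);
    let copied = io::copy(&mut reader, &mut writer)?;
    writer.flush()?; // make sure buffered bytes reach `dst`
    Ok(copied)
}

fn main() -> io::Result<()> {
    let args: Vec<String> = env::args().collect();
    if args.len() == 3 {
        // Usage: <program> <source path> <destination path>
        let copied = copy_buffered(File::open(&args[1])?, File::create(&args[2])?)?;
        println!("copied {} bytes", copied);
    } else {
        // No paths given: run a small in-memory demonstration instead.
        let data = vec![b'x'; 4096];
        let mut sink = Vec::new();
        let copied = copy_buffered(&data[..], &mut sink)?;
        println!("copied {} bytes in memory", copied);
    }
    Ok(())
}
```

If the original copy already uses large chunks, the cause lies elsewhere and this sketch would not be expected to change the timing much.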
