Reduce IOS_BUFSIZE to 32kb #34449

Closed
non-Jedi opened this issue Jan 20, 2020 · 6 comments · Fixed by #35618
Labels: io (Involving the I/O subsystem: libuv, read, write, etc.)

Comments

non-Jedi (Contributor) commented Jan 20, 2020

Most other languages that buffer reads from files seem to default to the platform's block size (or sometimes simply 8 kb). Julia, on the other hand, always buffers up to 128 kb no matter the platform. This was introduced back in 2011 in commit e380648 (@JeffBezanson, do you remember the justification for the change?).

There may well be a good reason to have a higher value for this, but I can't find documentation for it, and this buffer allocation does cause much higher memory usage when reading from large files as compared to other languages.
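
For reference, here is one way to see the platform's preferred block size from within Julia; this is a minimal sketch that reads the st_blksize value reported by stat(2) (via Base's stat), which is what the block-size-based defaults mentioned above are keyed to:

# Minimal sketch: compare the filesystem's preferred block size (st_blksize
# from stat(2), exposed by Base.stat) with Julia's fixed 128 KiB buffer.
blk = stat(homedir()).blksize
println("filesystem block size: ", blk, " bytes")
println("Julia IOS_BUFSIZE:     ", 128 * 1024, " bytes")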

Some other language examples:

Python

❯ python3 -c "import io; print(io.DEFAULT_BUFFER_SIZE)"
8192

Rust

8192

https://doc.rust-lang.org/1.40.0/src/std/sys_common/io.rs.html#1

Java

8192

https://stackoverflow.com/questions/16973843/bufferedreader-default-buffer-size

JeffBezanson added the io label Jan 20, 2020
JeffBezanson (Member) commented:

Probably ok to change this. 8k just seems small to me. Why does it cause much higher memory usage --- there's one buffer per file, so it seems it would only be a problem when reading a large number of files at once?
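
(A back-of-envelope sketch of how the per-file buffer adds up when many files are open at once; this is pure arithmetic, not a measurement of the real allocator:)

# Pure arithmetic: total buffer memory if N files are open simultaneously,
# for a few candidate IOS_BUFSIZE values.
for bufsize_kib in (8, 32, 128), nfiles in (1, 100, 10_000)
    total_mib = nfiles * bufsize_kib / 1024
    println(nfiles, " files x ", bufsize_kib, " KiB = ", total_mib, " MiB")
end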

non-Jedi (Contributor, Author) commented:

You're right, of course; a single file isn't a big deal. Here's a mini benchmark from the Benchmarks Game that is usually performance-bound by I/O. There's only a slight (but still detectable) performance difference for this script, which is close to a worst case for being I/O-bound: about 3% when dropping all the way to 8 kb. Going down to 32 kb instead, the penalty is only about 0.6%.

const LINE_LEN = 60

const COMPS = Vector{UInt8}(undef, 256)
for (i, j) in zip("AaCcGgTtUuMmRrWwSsYyKkVvHhDdBbNn",
                  "TTGGCCAAAAKKYYWWSSRRMMBBDDHHVVNN")
    @inbounds COMPS[i % UInt8] = j % UInt8
end

# Reverse the sequence chunk by chunk and print it.
# Looping "chunkwise" elides check for newline characters every loop.
function print_sequence(OUTIO, v)
    l = 1
    # Don't count the last newline
    r = length(v) - 1
    
    # The number of characters on the last line of the sequence determines the
    # left-side chunk-size
    chunkl = r % (LINE_LEN + 1)
    # The right-side chunk-size is everything left in each line
    chunkr = LINE_LEN - chunkl

    # Each loop iteration handles 1 full line from left of vector and
    # end-half + next start-half from right of vector.
    while true
        # Check whether next chunk for l and r is same chunk
        l + chunkl - 1 == r && break
        revcomp_chunks!(v, l, r, chunkl)
        l += chunkl
        # chunkl + 1 skips over a newline index
        r -= chunkl + 1

        l + chunkr - 1 == r && break
        revcomp_chunks!(v, l, r, chunkr)
        l += chunkr + 1
        r -= chunkr
    end

    # revcomp the last chunk between l and r
    revcomp_chunks!(v, l, r, cld(r - l + 1, 2))

    write(OUTIO, v)
end

# revcomp the n elements in v starting with l with the n elements ending with r
function revcomp_chunks!(v, l, r, n)
    for i=0:n-1
        li, ri = l + i, r - i
        @inbounds v[li], v[ri] = COMPS[v[ri]], COMPS[v[li]]
    end
end

function perf_revcomp(INIO, OUTIO)
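    # The input starts with '>'; skip past it so the readuntil below stops at
    # the start of the *next* record. Each iteration then seeks back one byte
    # so the '>' consumed by readuntil is re-included in the header line.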
    seek(INIO, 1)
    while !eof(INIO)
        seek(INIO, position(INIO) - 1)
        write(OUTIO, readline(INIO; keep=true))
        print_sequence(OUTIO, readuntil(INIO, '>' % UInt8; keep=false))
    end
end

isinteractive() || perf_revcomp(stdin, stdout)

A 243 MiB input file, which I can't upload here because of its size, is piped into this script.

With 128 kb IOS_BUFSIZE:

julia> @benchmark open("/dev/null"; write=true) do fo
         open("/home/adam/tmp/fasta25000000.txt"; read=true) do fi
           perf_revcomp(fi, fo)
           end end
BenchmarkTools.Trial: 
  memory estimate:  242.40 MiB
  allocs estimate:  43
  --------------
  minimum time:     258.411 ms (4.92% GC)
  median time:      258.575 ms (4.93% GC)
  mean time:        258.734 ms (4.93% GC)
  maximum time:     259.589 ms (4.91% GC)
  --------------
  samples:          19
  evals/sample:     1

With 8 kb IOS_BUFSIZE:

julia> @benchmark open("/dev/null"; write=true) do fo
         open("fasta25000000.txt"; read=true) do fi
           perf_revcomp(fi, fo)
           end end
BenchmarkTools.Trial: 
  memory estimate:  242.40 MiB
  allocs estimate:  43
  --------------
  minimum time:     265.627 ms (4.52% GC)
  median time:      266.055 ms (4.53% GC)
  mean time:        266.095 ms (4.53% GC)
  maximum time:     266.984 ms (4.54% GC)
  --------------
  samples:          18
  evals/sample:     1

With 32 kb IOS_BUFSIZE:

julia> @benchmark open("/dev/null"; write=true) do fo
         open("/home/adam/tmp/fasta25000000.txt"; read=true) do fi
           perf_revcomp(fi, fo)
           end end
BenchmarkTools.Trial: 
  memory estimate:  242.40 MiB
  allocs estimate:  43
  --------------
  minimum time:     259.904 ms (4.79% GC)
  median time:      260.056 ms (4.78% GC)
  mean time:        260.183 ms (4.79% GC)
  maximum time:     260.902 ms (4.77% GC)
  --------------
  samples:          19
  evals/sample:     1

non-Jedi (Contributor, Author) commented Jan 21, 2020

Would there be a downside to allowing users to specify the buffer size as a keyword argument to open? A default value would still be needed for e.g. stdin and stdout, of course, but most languages seem to allow overriding the default size.

As additional info, apparently Rust made the change to 8 kb at least partially because of performance issues with the associated allocations.

JeffBezanson (Member) commented:

Adding a keyword argument for this is fine. Making the default 32kb also seems like a good compromise (the linked issue points to that as a key threshold in jemalloc at least).

non-Jedi changed the title from "Reduce IOS_BUFSIZE to 8192" to "Reduce IOS_BUFSIZE to 32kb" on Apr 8, 2020
JeffBezanson added a commit that referenced this issue Apr 28, 2020
JeffBezanson added a commit that referenced this issue May 1, 2020
JeffBezanson added a commit that referenced this issue May 1, 2020
garciaes commented:

> Adding a keyword argument for this is fine. Making the default 32kb also seems like a good compromise (the linked issue points to that as a key threshold in jemalloc at least).

The keyword is not present in 1.5... will it be in 1.6?

StefanKarpinski (Member) commented:

The default has been changed for 1.6, but there is no keyword to adjust the buffer size. Someone who wants this has to implement it in order for the feature to get added. The branch for 1.6 has already been forked and shouldn't get any new features, so the next release this feature could appear in is 1.7.
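
In the meantime, a possible workaround is to layer a caller-chosen buffer on top of a stream using the third-party BufferedStreams.jl package (not part of Base; the 32 KiB size and "data.txt" path below are just placeholders for illustration):

# Sketch: user-controlled read buffer via the third-party BufferedStreams.jl
# package. "data.txt" is a placeholder path.
using BufferedStreams

open("data.txt") do raw
    buffered = BufferedInputStream(raw, 32 * 1024)  # 32 KiB buffer
    while !eof(buffered)
        byte = read(buffered, UInt8)
        # ... process `byte` ...
    end
end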
