<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Nerdvania]]></title><description><![CDATA[Nerdvania]]></description><link>https://nerdvania.blog</link><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 16:06:11 GMT</lastBuildDate><atom:link href="https://nerdvania.blog/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Benchmarking basic IO functions on Linux]]></title><description><![CDATA[I am working on a [database](https://github.com/lnikon/tinykvpp), particularly building the storage engine right now, and I assigned myself the task of creating separate file abstractions(e.g. I really like how it is done in leveldb, thus having Rand...]]></description><link>https://nerdvania.blog/benchmarking-basic-io-functions-on-linux</link><guid isPermaLink="true">https://nerdvania.blog/benchmarking-basic-io-functions-on-linux</guid><category><![CDATA[Linux]]></category><category><![CDATA[io]]></category><category><![CDATA[kv]]></category><category><![CDATA[posix]]></category><category><![CDATA[C++]]></category><dc:creator><![CDATA[Vahag Bejanyan]]></dc:creator><pubDate>Wed, 10 Sep 2025 18:24:02 GMT</pubDate><content:encoded><![CDATA[
<h2 id="heading-intro">Intro</h2>
<p>Hi there!</p>
<p>I am working on a database (<a target="_blank" href="https://github.com/lnikon/tinykvpp">https://github.com/lnikon/tinykvpp</a>), and right now I am building its storage engine. I assigned myself the task of creating separate file abstractions (I really like how LevelDB does this, so something like RandomAccessFile, SequentialFile, AppendOnlyFile, etc.).</p>
<h2 id="heading-buffered-io-versus-system-io">Buffered IO versus System IO</h2>
<p>So, while designing my approach, I started evaluating my current way of doing file IO, std::fstream versus plain POSIX write(), with a simple Google Benchmark:</p>
<pre><code class="lang-cpp">#include &lt;benchmark/benchmark.h&gt;

#include &lt;fcntl.h&gt;   // open()
#include &lt;unistd.h&gt;  // write(), close()

#include &lt;cstdlib&gt;
#include &lt;filesystem&gt;
#include &lt;fstream&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;

static void BM_BenchmarkFstreamWrite(benchmark::State &amp;state) {
  const std::string filename("test_stream.txt");
  std::fstream fs(filename, std::fstream::in | std::fstream::out |
                                std::fstream::app | std::fstream::ate);
  if (!fs.is_open()) {
    std::cerr &lt;&lt; "unable to open " &lt;&lt; filename &lt;&lt; '\n';
    exit(EXIT_FAILURE);
  }

  std::string payload("aaaaa");
  for (auto _ : state) {
    // Usually just copies 5 bytes into the stream's user-space buffer.
    benchmark::DoNotOptimize(fs.write(payload.c_str(), payload.size()));
  }

  fs.close();  // flush and close before deleting the file
  std::filesystem::remove(filename);
}
BENCHMARK(BM_BenchmarkFstreamWrite);

static void BM_BenchmarkPosixWrite(benchmark::State &amp;state) {
  const std::string filename("test_stream_2.txt");
  int fd = open(filename.c_str(), O_WRONLY | O_APPEND | O_CREAT, 0644);
  if (fd == -1) {
    std::cerr &lt;&lt; "Unable to open " &lt;&lt; filename &lt;&lt; '\n';
    exit(EXIT_FAILURE);  // don't time write() on an invalid descriptor
  }

  std::string payload("bbbbb");
  for (auto _ : state) {
    // One write() system call per iteration.
    write(fd, payload.c_str(), payload.size());
  }

  close(fd);
  std::filesystem::remove(filename);
}
BENCHMARK(BM_BenchmarkPosixWrite);

BENCHMARK_MAIN();  // omit when linking against benchmark_main
</code></pre>
<p>shows the following result:</p>
<pre><code class="lang-plaintext">Benchmark                     Time       CPU   Iterations
BM_BenchmarkFstreamWrite   24.0 ns   24.0 ns     30455681
BM_BenchmarkPosixWrite     2001 ns   2001 ns       356052
</code></pre>
<h2 id="heading-payload-size-and-stdfstream-buffering">Payload size and std::fstream buffering</h2>
<p>So plain POSIX write() is more than 80 times slower (2001 ns vs. 24.0 ns per call). This result really shocked me. I decided to strace the program and saw that, for sufficiently large inputs, the std::fstream::write() calls got vectorized! What do I mean by "sufficiently large input"?</p>
<p>Let's examine the following small programs and their straces. To simulate the load, I wrapped the stream write in a while loop. The payload size is the same for all test cases; only the iteration count changes.</p>
<p>Iteration count: 3</p>
<pre><code class="lang-cpp">#include &lt;cstddef&gt;
#include &lt;fstream&gt;
#include &lt;string&gt;

int main() {
  // Same open mode as in the benchmark above.
  std::fstream fs("test_stream_3.txt", std::fstream::in | std::fstream::out |
                                           std::fstream::app | std::fstream::ate);
  std::size_t count{3};
  std::string payload("aaa");
  while (count-- != 0) {
    fs &lt;&lt; payload;
  }
}  // fs is flushed and closed here
</code></pre>
<p>Strace:</p>
<pre><code class="lang-plaintext">openat(AT_FDCWD, "test_stream_3.txt", O_RDWR|O_CREAT|O_APPEND, 0666) = 3
lseek(3, 0, SEEK_END) = 40755
write(3, "aaaaaaaaa", 9) = 9
close(3) = 0
</code></pre>
<p>What we see is a single write of 9 characters, whereas I would have expected three writes of 3 characters each. So std::fstream does internal buffering? Maybe that comes from the std::basic_filebuf it uses under the hood?</p>
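<p>The buffering can also be observed without strace. The sketch below (the helper name sizes_before_and_after_flush and the temporary file are hypothetical, chosen for illustration) writes through a std::fstream opened with the same flags as above and checks the file size on disk before and after an explicit flush(). If the stream buffers in user space, nothing should reach the file until the flush.</p>
<pre><code class="lang-cpp">#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;filesystem&gt;
#include &lt;fstream&gt;
#include &lt;string&gt;
#include &lt;utility&gt;

// Hypothetical helper: writes `count` copies of `payload` through a std::fstream
// and reports the on-disk file size before and after an explicit flush().
std::pair&lt;std::uintmax_t, std::uintmax_t&gt;
sizes_before_and_after_flush(const std::string &amp;path,
                             const std::string &amp;payload, std::size_t count) {
  { std::ofstream truncate(path, std::ios::trunc); }  // start from an empty file
  std::fstream fs(path, std::fstream::in | std::fstream::out |
                            std::fstream::app | std::fstream::ate);
  while (count-- != 0) {
    fs &lt;&lt; payload;  // small writes are copied into the stream's buffer
  }
  const std::uintmax_t before = std::filesystem::file_size(path);
  fs.flush();  // hands the buffered bytes to the kernel
  const std::uintmax_t after = std::filesystem::file_size(path);
  fs.close();
  std::filesystem::remove(path);
  return {before, after};
}
</code></pre>
<p>With the three 3-byte writes from this first experiment, I would expect before to be 0 and after to be 9, which lines up with the single 9-byte write() in the trace.</p>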
<p>Okay, next program:</p>
<p>Iteration count: 9 * 1024</p>
<pre><code class="lang-cpp">std::size_t count{9 * 1024};  // same program, only the iteration count differs
std::string payload("aaa");
while (count-- != 0) {
  fs &lt;&lt; payload;
}
</code></pre>
<p>Strace:</p>
<pre><code class="lang-plaintext">openat(AT_FDCWD, "test_stream_3.txt", O_RDWR|O_CREAT|O_APPEND, 0666) = 3
lseek(3, 0, SEEK_END) = 49980
writev(3, [{iov_base="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., iov_len=8190}, {iov_base="aaa", iov_len=3}], 2) = 8193
writev(3, [{iov_base="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., iov_len=8190}, {iov_base="aaa", iov_len=3}], 2) = 8193
writev(3, [{iov_base="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., iov_len=8190}, {iov_base="aaa", iov_len=3}], 2) = 8193
write(3, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 3069) = 3069
close(3) = 0
</code></pre>
<p>So std::fstream seems to be smart enough to vectorize IO on its own: once 8190 bytes have accumulated, the buffered bytes and the next 3-byte chunk go out together in a single writev() call. This really surprised me. After seeing this trace, I dug into libstdc++ to find the logic behind the vectorization, and found... nothing. A search through fstream and the related headers turned up no reference to writev(). So the natural question is: where does the vectorization happen?</p>
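<p>For what it's worth, here is a possible explanation, based on reading the libstdc++ sources rather than on anything verified for this exact GCC 13.3.1 build. basic_filebuf::xsputn() (in the bits/fstream.tcc header) passes the buffered bytes and the incoming chunk to __basic_file&lt;char&gt;::xsputn_2(), and it is that function, compiled into libstdc++.so from config/io/basic_file_stdio.cc, that calls writev(). That would explain why searching the headers finds nothing. The numbers in the trace fit a simple rule: small writes are copied into a buffer of about 8 KiB (BUFSIZ is 8192 on glibc, and I believe libstdc++ keeps one slot in reserve, leaving 8191), and when the next chunk no longer fits into the remaining space, buffer and chunk are written together by one writev(). The function below is a hypothetical model of that rule (simulate_filebuf and Syscall are my own names), not libstdc++ code; it only lists the system calls it would expect.</p>
<pre><code class="lang-cpp">#include &lt;cstddef&gt;
#include &lt;vector&gt;

// One system call the modelled stream buffer would issue.
struct Syscall {
  bool vectored;      // true: writev(buffer, chunk); false: plain write(buffer)
  std::size_t bytes;  // total bytes handed to the kernel
};

// Hypothetical model of a filebuf receiving `count` writes of `chunk` bytes
// each, with `capacity` bytes of usable buffer space.
std::vector&lt;Syscall&gt; simulate_filebuf(std::size_t chunk, std::size_t count,
                                      std::size_t capacity) {
  std::vector&lt;Syscall&gt; calls;
  std::size_t fill = 0;  // bytes currently sitting in the buffer
  for (std::size_t i = 0; i &lt; count; ++i) {
    if (chunk &gt;= capacity - fill) {
      // Chunk does not fit: send buffer and chunk together, start over.
      calls.push_back({true, fill + chunk});
      fill = 0;
    } else {
      fill += chunk;  // only a copy into the buffer, no system call
    }
  }
  if (fill != 0) {
    calls.push_back({false, fill});  // remaining bytes are written on close()
  }
  return calls;
}
</code></pre>
<p>With the parameters of the second experiment (3-byte chunks, 9 * 1024 iterations, 8191-byte buffer), the model yields three writev() calls of 8193 bytes followed by a final write() of 3069 bytes on close, which matches the trace above; with 3 iterations it yields a single 9-byte write(). For the benchmark's 5-byte payload it yields one writev() of 8195 bytes every 1639 iterations, while BM_BenchmarkPosixWrite makes one write() per iteration. If the model is right, that difference in the number of system calls accounts for most of the 24.0 ns vs. 2001 ns gap.</p>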
<p>Next, I wanted to see whether the kernel itself might somehow decide to vectorize the writes.</p>
<p>So I drafted the same logic using plain write() calls.</p>
<pre><code class="lang-cpp">std::string payload("bbbbb");
std::size_t count{9 * 1024};
while (count-- != 0) {
  write(fd, payload.c_str(), payload.size());  // one system call per iteration
}
</code></pre>
<p>Strace:</p>
<pre><code class="lang-plaintext">......
write(3, "bbbbb", 5) = 5
write(3, "bbbbb", 5) = 5
write(3, "bbbbb", 5) = 5
write(3, "bbbbb", 5) = 5
write(3, "bbbbb", 5) = 5
write(3, "bbbbb", 5) = 5
write(3, "bbbbb", 5) = 5
write(3, "bbbbb", 5) = 5
write(3, "bbbbb", 5) = 5
exit_group(0) = ?
</code></pre>
<p>Strace just shows a ton of write() calls, one per iteration: exactly what I had originally expected to see for std::fstream::write().</p>
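<p>So the kernel does not merge separate write() calls; any batching has to happen in user space. For the custom file abstractions, a minimal sketch of an append-only file that does this might look like the following. BufferedAppendFile, its 8 KiB default capacity and the syscalls() counter are hypothetical names and choices, for illustration only.</p>
<pre><code class="lang-cpp">#include &lt;fcntl.h&gt;   // open()
#include &lt;unistd.h&gt;  // write(), close()

#include &lt;cerrno&gt;
#include &lt;cstddef&gt;
#include &lt;cstring&gt;
#include &lt;stdexcept&gt;
#include &lt;string&gt;
#include &lt;string_view&gt;
#include &lt;vector&gt;

// Hypothetical append-only file: small appends are copied into a user-space
// buffer, and write() is only called when the buffer is full or on flush().
class BufferedAppendFile {
 public:
  explicit BufferedAppendFile(const std::string &amp;path, std::size_t capacity = 8192)
      : fd_(::open(path.c_str(), O_WRONLY | O_APPEND | O_CREAT, 0644)),
        capacity_(capacity) {
    if (fd_ == -1) throw std::runtime_error("unable to open " + path);
    buf_.reserve(capacity_);
  }

  ~BufferedAppendFile() {
    try {
      flush();
    } catch (...) {
      // Destructors must not throw; call flush() explicitly to see errors.
    }
    ::close(fd_);
  }

  BufferedAppendFile(const BufferedAppendFile &amp;) = delete;
  BufferedAppendFile &amp;operator=(const BufferedAppendFile &amp;) = delete;

  void append(std::string_view data) {
    if (buf_.size() + data.size() &gt; capacity_) flush();  // make room first
    if (data.size() &gt;= capacity_) {
      write_all(data.data(), data.size());  // too large to buffer
      return;
    }
    buf_.insert(buf_.end(), data.begin(), data.end());
  }

  void flush() {
    if (buf_.empty()) return;
    write_all(buf_.data(), buf_.size());
    buf_.clear();
  }

  std::size_t syscalls() const { return syscalls_; }

 private:
  // Loops until everything is written, retrying on EINTR and short writes.
  void write_all(const char *p, std::size_t n) {
    while (n &gt; 0) {
      const ssize_t w = ::write(fd_, p, n);
      ++syscalls_;
      if (w == -1) {
        if (errno == EINTR) continue;
        throw std::runtime_error(std::strerror(errno));
      }
      p += w;
      n -= static_cast&lt;std::size_t&gt;(w);
    }
  }

  int fd_;
  std::size_t capacity_;
  std::vector&lt;char&gt; buf_;
  std::size_t syscalls_ = 0;
};
</code></pre>
<p>Appending the 5-byte payload 9 * 1024 times with the default capacity should issue six write() calls instead of 9216: five when the buffer fills up and one for the final flush(). The trade-off is the same as with std::fstream: data sits in user space until flush() runs, so whatever is still buffered is lost if the process dies.</p>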
<h2 id="heading-manual-io-vectorizaton-via-writev">Manual IO vectorization via writev()</h2>
<p>The last experiment was to benchmark explicitly vectorized IO against std::fstream:</p>
<pre><code class="lang-cpp"><span class="hljs-built_in">std</span>::<span class="hljs-built_in">vector</span> &lt;iovec&gt; iov;
<span class="hljs-function"><span class="hljs-built_in">std</span>::<span class="hljs-built_in">string</span> <span class="hljs-title">payload</span><span class="hljs-params">(<span class="hljs-string">"bbbbb"</span>)</span></span>;
<span class="hljs-keyword">for</span> (
  <span class="hljs-keyword">auto</span> _: state) {
  iov.emplace_back(iovec {
    .iov_base = payload.data(), .iov_len = payload.size()
  });
}
writev(fd, iov.data(), iov.size());
</code></pre>
<p>And the benchmark showed an interesting result:</p>
<pre><code class="lang-plaintext">Benchmark                          Time       CPU   Iterations
BM_BenchmarkFstreamWrite        24.0 ns   24.0 ns     28733493
BM_BenchmarkPosixWrite          1828 ns   1828 ns       384717
BM_BenchmarkPosixScatterWrite   37.9 ns   37.9 ns     16676420
</code></pre>
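<p>DIY writev() is about 1.6 times slower than std::fstream::write() (37.9 ns vs. 24.0 ns)! Keep in mind, though, that Google Benchmark only times the body of the for (auto _ : state) loop, so this number mostly reflects iov.emplace_back(); the single writev() call after the loop is not part of the measurement.</p>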
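<p>If explicit writev() does end up in the custom file abstractions, two details are worth handling: the writev(2) man page documents EINVAL when iovcnt exceeds the permitted maximum (IOV_MAX, 1024 on Linux), and, like write(), writev() may write fewer bytes than requested. Below is a minimal sketch of a helper (writev_all is a hypothetical name) that submits the iovecs in batches of at most sysconf(_SC_IOV_MAX) entries and resumes after short writes.</p>
<pre><code class="lang-cpp">#include &lt;sys/uio.h&gt;  // writev(), struct iovec
#include &lt;unistd.h&gt;   // sysconf()

#include &lt;algorithm&gt;
#include &lt;cerrno&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Hypothetical helper: writes all buffers described by `iov` to `fd`.
// Returns false if writev() fails with anything other than EINTR.
bool writev_all(int fd, std::vector&lt;iovec&gt; iov) {
  long max_iov = ::sysconf(_SC_IOV_MAX);
  if (max_iov &lt;= 0) max_iov = 16;  // POSIX guarantees at least 16
  std::size_t first = 0;           // first iovec not yet fully written
  while (first &lt; iov.size()) {
    const std::size_t batch =
        std::min(iov.size() - first, static_cast&lt;std::size_t&gt;(max_iov));
    const ssize_t n = ::writev(fd, iov.data() + first, static_cast&lt;int&gt;(batch));
    if (n == -1) {
      if (errno == EINTR) continue;
      return false;
    }
    // Skip the iovecs that were written completely...
    std::size_t left = static_cast&lt;std::size_t&gt;(n);
    const std::size_t start = first;
    while (first &lt; iov.size() &amp;&amp; left &gt;= iov[first].iov_len) {
      left -= iov[first].iov_len;
      ++first;
    }
    if (left &gt; 0) {
      // ...and trim the one that was only partially written.
      iov[first].iov_base = static_cast&lt;char *&gt;(iov[first].iov_base) + left;
      iov[first].iov_len -= left;
    } else if (first == start) {
      return false;  // no progress at all; avoid spinning forever
    }
  }
  return true;
}
</code></pre>
<p>For the benchmark's 9 * 1024 five-byte entries, this would mean nine writev() calls of 1024 entries each, assuming every batch is written in full.</p>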
<p>What are your interpretations of these results? How can I find out where the vectorization happens? Maybe I should abandon the idea of custom file abstractions and just use std::fstream?</p>
<p>I am using Linux kernel 6.10.3-1-default and GCC 13.3.1, with optimization level -O2.</p>
<p>Thank you for reading!</p>
]]></content:encoded></item></channel></rss>