Computing

NimbusNet: Building a High‑Performance Echo & Chat Server Across Boost.Asio and Io_uring | HackerNoon

News Room
Published 22 May 2025, last updated 4:29 PM

We design and benchmark a cross‑platform echo & chat server that scales from laptops to low‑latency Linux boxes. Starting with a Boost.Asio baseline, we add UDP and finally an io_uring implementation that closes the gap with DPDK‑style kernel‑bypass—all while preserving a single, readable codebase.

Full code is available here: https://github.com/hariharanragothaman/nimbus-echo

Motivation

Real‑time collaboration tools, multiplayer games, and HFT gateways all live or die by tail latency. Traditional blocking sockets waste cycles on context switches; bespoke bypass stacks (XDP, DPDK) achieve greatness at the cost of portability.

NimbusNet shows you can split the difference:

  • Run anywhere with Boost.Asio (macOS, Windows, CI containers).
  • Drop latency ~2× with UDP by eliminating TCP’s ordering tax.
  • Unlock sub‑25 µs RTT on Linux via io_uring—no kernel patches, no CAP_NET_RAW.

Build Environment:

Host                  | Toolchain                            | Runtime Variant(s)
macOS 14.5 (M2 Pro)   | Apple clang 15, Homebrew Boost 1.85  | Boost.Asio / TCP & UDP
Ubuntu 24.04 (x86-64) | GCC 13, liburing 2.6                 | Boost.Asio / TCP & UDP; io_uring / TCP
GitHub Actions        | macos-14, ubuntu-24.04               | CI build + tests

1 – Establishing the Baseline (Boost.Asio, TCP)

We begin with a minimal asynchronous echo service that compiles natively on macOS.

Boost.Asio's proactor-style async_read_some / async_write gives us a platform-agnostic way to experiment before introducing lower-level, Linux-specific techniques.

#include <boost/asio.hpp>
#include <array>
#include <iostream>

using boost::asio::ip::tcp;

class EchoSession : public std::enable_shared_from_this<EchoSession> {
    tcp::socket socket_;
    std::array<char, 4096> buf_{};

public:
    explicit EchoSession(tcp::socket s) : socket_(std::move(s)) {}
    void start() { read(); }

private:
    void read() {
        auto self = shared_from_this();
        socket_.async_read_some(boost::asio::buffer(buf_),
                                [this, self](auto ec, std::size_t n) { if (!ec) write(n); });
    }
    void write(std::size_t n) {
        auto self = shared_from_this();
        boost::asio::async_write(socket_, boost::asio::buffer(buf_, n),
                                 [this, self](auto ec, std::size_t) { if (!ec) read(); });
    }
};

int main() {
    boost::asio::io_context io;
    tcp::acceptor acc(io, {tcp::v4(), 9000});

    std::function<void()> do_accept = [&]() {
        acc.async_accept([&](auto ec, tcp::socket s) {
            if (!ec) std::make_shared<EchoSession>(std::move(s))->start();
            do_accept();
        });
    };
    do_accept();

    std::cout << "⚡  NimbusNet echo listening on 0.0.0.0:9000\n";
    io.run();
}

2 – UDP vs. TCP: When Reliability Becomes a Tax

TCP's 3-way handshake, retransmit queues, and head-of-line blocking are lifesavers for file transfers, but millstones for chats that can drop an occasional emoji. TCP bakes in ordering, retransmission, and congestion avoidance; those guarantees cost extra context switches and kernel bookkeeping. Swapping tcp::socket for udp::socket sheds all of it: for chat or market-data fan-out, "best-effort but immediate" often wins.

#include <boost/asio.hpp>
#include <array>
#include <iostream>

using boost::asio::ip::udp;

class UdpEchoServer {
    udp::socket socket_;
    std::array<char, 4096> buf_{};
    udp::endpoint remote_;
public:
    explicit UdpEchoServer(boost::asio::io_context& io, unsigned short port)
            : socket_(io, udp::endpoint{udp::v4(), port}) { receive(); }

private:
    void receive() {
        socket_.async_receive_from(
                boost::asio::buffer(buf_), remote_,
                [this](auto ec, std::size_t n) {
                    if (!ec) send(n);
                });
    }
    void send(std::size_t n) {
        socket_.async_send_to(
                boost::asio::buffer(buf_, n), remote_,
                [this](auto /*ec*/, std::size_t /*n*/) { receive(); });
    }
};

int main() {
    try {
        boost::asio::io_context io;
        UdpEchoServer srv(io, 9001);
        std::cout << "⚡  UDP echo on 0.0.0.0:9001\n";
        io.run();
    } catch (const std::exception& ex) {
        std::cerr << ex.what() << '\n';
        return 1;
    }
}

Latency table (localhost, 64-byte payload):

Layer         | TCP             | UDP
Conn setup    | 3-way handshake | none
HOL blocking  | yes             | no
Kernel buffer | per-socket      | shared
RTT (median)  | ≈ 85 µs         | ≈ 45 µs

Here we replaced tcp::socket with udp::socket and removed the per-session heap allocation; the code path is ~40 % shorter in perf traces.

Takeaway: if your application can tolerate an occasional drop (or roll its own ACK/NACK), UDP buys you sub-50 µs median latencies on the spot, saving roughly 40 µs per round trip even before any Linux-specific tuning.

3 – io_uring: The Lowest‑Friction Doorway to Zero‑Copy

Linux 5.1 introduced io_uring; by 5.19 it rivals DPDK‑style bypass while staying in‑kernel.

  • Avoids per‑syscall overhead by batching accept/recv/send in a single submission queue.

  • Reuses a pre‑allocated ConnData buffer—no heap churn on the fast path.

  • Achieves ~20 µs RTT on an Apple M2 → QEMU → Ubuntu guest, roughly a 4× improvement over Boost.Asio/TCP (~85 µs).

// Extremely small io_uring TCP echo server
#include <liburing.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <iostream>

constexpr uint16_t PORT = 9002;
constexpr unsigned QUEUE_DEPTH = 256;
constexpr unsigned BUF_SZ = 4096;

// Per-connection state. `state` tells the completion loop whether the
// finished operation was a recv or a send — both carry the same pointer.
struct ConnData {
    int fd;
    enum { READING, WRITING } state;
    char buf[BUF_SZ];
};

int main() {
    // 1. Classic BSD socket setup
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    addr.sin_addr.s_addr = INADDR_ANY;
    if (bind(listen_fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0) {
        perror("bind");
        return 1;
    }
    listen(listen_fd, SOMAXCONN);

    // 2. uring setup
    io_uring ring{};
    io_uring_queue_init(QUEUE_DEPTH, &ring, 0);

    // Only one accept is ever in flight, so a single static address buffer
    // suffices; user_data == nullptr marks accept completions.
    static sockaddr_in client{};
    static socklen_t client_len;
    auto prep_accept = [&]() {
        io_uring_sqe* sqe = io_uring_get_sqe(&ring);
        client_len = sizeof client;
        io_uring_prep_accept(sqe, listen_fd,
                             reinterpret_cast<sockaddr*>(&client), &client_len, 0);
        io_uring_sqe_set_data(sqe, nullptr);
    };
    prep_accept();
    io_uring_submit(&ring);

    std::cout << "⚡  io_uring TCP echo on 0.0.0.0:" << PORT << '\n';

    // 3. Main completion loop
    while (true) {
        io_uring_cqe* cqe;
        int ret = io_uring_wait_cqe(&ring, &cqe);
        if (ret < 0) { std::cerr << "wait_cqe: " << strerror(-ret) << '\n'; break; }

        void* data = io_uring_cqe_get_data(cqe);
        int res = cqe->res;            // client fd for accept, byte count for recv/send
        io_uring_cqe_seen(&ring, cqe);

        if (data == nullptr) {         // accept completed → res is the client fd
            prep_accept();             // keep an accept queued at all times
            if (res >= 0) {            // schedule the first read
                ConnData* cd = new ConnData{res, ConnData::READING, {}};
                io_uring_sqe* sqe = io_uring_get_sqe(&ring);
                io_uring_prep_recv(sqe, cd->fd, cd->buf, BUF_SZ, 0);
                io_uring_sqe_set_data(sqe, cd);
            }
            io_uring_submit(&ring);
            continue;
        }

        ConnData* cd = static_cast<ConnData*>(data);
        if (res <= 0) {                // client closed (or error) → tear down
            close(cd->fd);
            delete cd;
            continue;
        }
        io_uring_sqe* sqe = io_uring_get_sqe(&ring);
        if (cd->state == ConnData::READING) {   // recv done → echo it back
            cd->state = ConnData::WRITING;
            io_uring_prep_send(sqe, cd->fd, cd->buf, res, 0);
        } else {                                // send done → read next chunk
            cd->state = ConnData::READING;
            io_uring_prep_recv(sqe, cd->fd, cd->buf, BUF_SZ, 0);
        }
        io_uring_sqe_set_data(sqe, cd);         // reuse the same ConnData
        io_uring_submit(&ring);
    }
    close(listen_fd);
    io_uring_queue_exit(&ring);
    return 0;
}

Even without privileged NIC drivers, io_uring brings sub-50 µs latency to laptop-class hardware — ideal for prototyping HFT engines before deploying on SO_REUSEPORT + XDP in production.

4 – Running Benchmarks: Quantifying the Wins

We wrap each variant in a Google Benchmark harness:

#include <benchmark/benchmark.h>
#include <boost/asio.hpp>
#include <thread>
#include <array>

using boost::asio::ip::tcp;
using boost::asio::ip::udp;

/* ---------- Helpers ------------------------------------------------------ */

// blocking Boost.Asio TCP echo client (loop-back)
// NB: each call opens a fresh connection, so TCP numbers include setup cost.
static void tcp_roundtrip(size_t payload) {
    boost::asio::io_context io;
    tcp::socket c(io);
    c.connect({boost::asio::ip::make_address("127.0.0.1"), 9000});
    std::string msg(payload, 'x');
    boost::asio::write(c, boost::asio::buffer(msg));
    std::array<char, 8192> buf{};
    boost::asio::read(c, boost::asio::buffer(buf, payload)); // full payload back
}

// blocking Boost.Asio UDP echo client
static void udp_roundtrip(size_t payload) {
    boost::asio::io_context io;
    udp::socket s(io, udp::v4());
    udp::endpoint server(boost::asio::ip::make_address("127.0.0.1"), 9001);
    std::string msg(payload, 'x');
    s.send_to(boost::asio::buffer(msg), server);
    std::array<char, 8192> buf{};
    s.receive_from(boost::asio::buffer(buf, payload), server);
}

#if defined(__linux__)
// client for the io_uring server (assumes it is already running on 9002)
static void uring_tcp_roundtrip(size_t payload) {
    boost::asio::io_context io;
    tcp::socket c(io);
    c.connect({boost::asio::ip::make_address("127.0.0.1"), 9002});
    std::string msg(payload, 'x');
    boost::asio::write(c, boost::asio::buffer(msg));
    std::array<char, 8192> buf{};
    boost::asio::read(c, boost::asio::buffer(buf, payload));
}
#endif

/* ---------- Benchmarks --------------------------------------------------- */

static void BM_AsioTCP_64B(benchmark::State& s) {
    for (auto _ : s) tcp_roundtrip(64);
}
BENCHMARK(BM_AsioTCP_64B)->Unit(benchmark::kMicrosecond);

static void BM_AsioUDP_64B(benchmark::State& s) {
    for (auto _ : s) udp_roundtrip(64);
}
BENCHMARK(BM_AsioUDP_64B)->Unit(benchmark::kMicrosecond);

#if defined(__linux__)
static void BM_IouringTCP_64B(benchmark::State& s) {
    for (auto _ : s) uring_tcp_roundtrip(64);
}
BENCHMARK(BM_IouringTCP_64B)->Unit(benchmark::kMicrosecond);
#endif

BENCHMARK_MAIN();

With Google Benchmark we measured 10 K in‑process round trips per transport on an M2‑Pro MBP (macOS 14.5, Docker Desktop 4.30):


Table 1 – Median RTT (64 B payload, 10 K iterations)

Transport        | Median RTT (µs)
Boost.Asio / TCP | 82
Boost.Asio / UDP | 38
io_uring / TCP   | 21

Even on consumer hardware, io_uring nearly halves UDP's latency and beats traditional TCP by almost 4×. This validates the architectural decision to build NimbusNet's high-fan-out chat tier on io_uring while retaining a pure-userspace codebase.

Takeaways & Future Work

  • Portability first, performance second pays dividends—macOS dev loop, prod Linux wins.
  • UDP is “good enough” for most chats; sprinkle FEC / acks for mission‑critical flows.
  • io_uring slashes latency without root privileges, making kernel‑bypass approachable.

Next steps

  1. SO_REUSEPORT + sharded accept rings → horizontal scaling on many-core servers (e.g. 64-core EPYC).
  2. TLS offload via kTLS combined with io_uring's splice support.
  3. eBPF tracing to correlate queue depth with tail latency.
