Theron - C++ concurrency library

The ThreadRing benchmark

The ThreadRing benchmark provided with Theron implements the well-known thread-ring performance test, defined as part of the Computer Language Benchmarks Game and designed to test the raw efficiency of concurrent systems:

  • create 503 linked threads (named 1 to 503)
  • thread 503 should be linked to thread 1, forming an unbroken ring
  • pass a token to thread 1
  • pass the token from thread to thread N times
  • print the name of the last thread (1 to 503) to take the token

An integer token message is passed around a ring of 503 connected actors. Each passing of the token from one actor to the next is called a hop and constitutes the sending of a message. The value N dictates the number of hops; a typical value is 50 million.
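As a sanity check on the rules above, the thread that ends up holding the token can be worked out without any concurrency. The sketch below is not part of Theron and its function names are hypothetical. It assumes the token starts at N on thread 1 and is decremented on each hop until it reaches zero.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; not part of Theron.
static const int RING_SIZE = 503;

// Walks the ring one hop at a time and returns the name (1 to 503) of the
// thread holding the token when it reaches zero.
int LastHolderBySimulation(int hops)
{
    assert(hops >= 0);

    int name = 1;                       // the token is first handed to thread 1
    for (int token = hops; token > 0; --token)
    {
        name = (name % RING_SIZE) + 1;  // thread 503 wraps around to thread 1
    }

    return name;
}

// Closed form of the same walk: each hop advances one place around the ring.
int LastHolder(int hops)
{
    assert(hops >= 0);
    return (hops % RING_SIZE) + 1;
}
```

With N = 1000 both functions return 498, and with the typical N = 50 million they return 292. This gives a cheap way to check the name printed by a real run.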

The key to the ThreadRing benchmark is minimizing the overheads of message sending and of context switching from one actor to another. Theron implements an M:N architecture in which M virtual actors are scheduled and executed by N software threads. This means we don't create 503 software threads, and are instead free to choose the number of worker threads. Additionally, Theron uses a form of tail call optimization: each sending actor uses TailSend() to send the token to the next actor in the ring, which avoids waking a worker thread to process the receiving actor. Instead, the receiving actor is processed by the thread that is already active, once it has finished processing the sending actor. For this reason TailSend() is only useful as the last operation performed by an actor, although that situation is quite common. Because the benchmark as defined asks for 503 linked threads, these optimizations arguably make Theron an interesting alternative implementation from the point of view of the Computer Language Benchmarks Game.
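The effect of a tail send can be illustrated with a deliberately simplified, single-threaded model. This is a sketch under stated assumptions, not Theron's implementation: Message, Result and RunRing are hypothetical names, and a std::deque stands in for the shared work queue that worker threads would otherwise contend on and be woken through. An ordinary send pushes the message onto that queue, while a tail send hands the message straight to the worker that is already running.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical types for illustration only; not Theron's API or internals.
struct Message
{
    int token;
    std::size_t target;     // index of the receiving actor in the ring
};

struct Result
{
    std::size_t lastActor;  // index of the actor that saw the token reach zero
    long queuePushes;       // messages that went through the shared queue
};

Result RunRing(std::size_t ringSize, int hops, bool useTailSend)
{
    assert(ringSize > 0 && hops >= 0);

    std::deque<Message> sharedQueue;
    long queuePushes = 0;
    std::size_t lastActor = 0;

    // The initial message always arrives through the shared queue.
    const Message first = { hops, 0 };
    sharedQueue.push_back(first);
    ++queuePushes;

    // One "worker": take a message from the queue and run the target actor.
    while (!sharedQueue.empty())
    {
        Message current = sharedQueue.front();
        sharedQueue.pop_front();

        bool haveWork = true;
        while (haveWork)
        {
            haveWork = false;
            lastActor = current.target;

            if (current.token > 0)
            {
                const Message next = { current.token - 1,
                                       (current.target + 1) % ringSize };
                if (useTailSend)
                {
                    // Tail send: the same worker processes the receiver next,
                    // so the shared queue is never touched.
                    current = next;
                    haveWork = true;
                }
                else
                {
                    // Ordinary send: the message goes through the shared queue,
                    // where a real scheduler might wake another worker.
                    sharedQueue.push_back(next);
                    ++queuePushes;
                }
            }
        }
    }

    const Result result = { lastActor, queuePushes };
    return result;
}
```

In this model RunRing(503, 1000, false) pushes 1001 messages through the queue, whereas RunRing(503, 1000, true) pushes only the initial one. Both finish on the same actor. A real multi-threaded scheduler adds locking and thread wake-ups to every queue push, which is the cost the tail send is meant to avoid.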

An important point about the ThreadRing benchmark is that it is essentially serial, since there is only one message and, at any point, only one actor is ever sending or receiving the message. Each time the message is passed from one actor to the next, the overlap, or concurrency, between the sending and receiving actors is marginal. This means that the benchmark is a good test of raw message passing and actor scheduling overheads, since it removes all contention for resources except between the sending and receiving actors. In that sense, it measures the peak performance of message passing in the absence of other concurrency.

Here's the source for the ThreadRing benchmark. Some printfs and comments have been omitted for brevity:


static const int NUM_ACTORS = 503;


class Member : public Theron::Actor
{
public:

    inline Member()
    {
        RegisterHandler(this, &Member::InitHandler);
    }

private:

    inline void InitHandler(const Theron::Address &next, const Theron::Address from)
    {
        mNext = next;
        mCaller = from;

        RegisterHandler(this, &Member::TokenHandler);
        DeregisterHandler(this, &Member::InitHandler);
    }

    inline void TokenHandler(const int &token, const Theron::Address /*from*/)
    {
        if (token > 0)
        {
            TailSend(token - 1, mNext);
            return;
        }

        TailSend(token, mCaller);
    }

    Theron::Address mNext;
    Theron::Address mCaller;
};


THERON_REGISTER_MESSAGE(int);
THERON_REGISTER_MESSAGE(Theron::Address);


struct AddressCatcher
{
    inline void Catch(const int &/*message*/, const Theron::Address from) { mAddress = from; }
    Theron::Address mAddress;
};


int main(int argc, char *argv[])
{
    AddressCatcher catcher;

    const int numHops = (argc > 1 && atoi(argv[1]) > 0) ? atoi(argv[1]) : 50000000;
    const int numThreads = (argc > 2 && atoi(argv[2]) > 0) ? atoi(argv[2]) : 16;

    Timer timer;
    timer.Start();

    {
        Theron::Framework framework(numThreads);
        Theron::ActorRef members[NUM_ACTORS];

        Theron::Receiver receiver;
        receiver.RegisterHandler(&catcher, &AddressCatcher::Catch);

        // Create NUM_ACTORS member actors for the ring.
        for (int index = 0; index < NUM_ACTORS; ++index)
        {
            members[index] = framework.CreateActor<Member>();
        }

        // Initialize the actors by passing each one the address of the next actor in the ring.
        for (int index(NUM_ACTORS - 1), nextIndex(0); index >= 0; nextIndex = index--)
        {
            framework.Send(members[nextIndex].GetAddress(), receiver.GetAddress(), members[index].GetAddress());
        }

        // Start the processing by sending the token to the first actor.
        framework.Send(numHops, receiver.GetAddress(), members[0].GetAddress());

        // Wait for the signal message indicating the token has reached zero.
        receiver.Wait();
    }

    timer.Stop();
}
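The Timer class used in main() is not shown in the listing. A minimal stand-in might look like the sketch below. It assumes a C++11 compiler and std::chrono, and only Start() and Stop() come from the listing: the Seconds() accessor is a hypothetical addition, and the real Timer shipped with the benchmark may differ.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the Timer used by the benchmark's main().
class Timer
{
public:

    typedef std::chrono::steady_clock Clock;

    Timer() : mStart(), mStop()
    {
    }

    // Records the start time and resets the stop time to match it.
    void Start()
    {
        mStart = Clock::now();
        mStop = mStart;
    }

    // Records the stop time.
    void Stop()
    {
        mStop = Clock::now();
    }

    // Elapsed time between Start() and Stop(), in seconds (assumed accessor).
    double Seconds() const
    {
        assert(mStop >= mStart);
        return std::chrono::duration<double>(mStop - mStart).count();
    }

private:

    Clock::time_point mStart;
    Clock::time_point mStop;
};
```

Dividing the number of hops by Seconds() gives a rough messages-per-second figure. A steady clock is used here so that adjustments to the system clock cannot distort the measurement.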

Theron also includes a variation of the benchmark called ParallelThreadRing, which splits the total number of hops between 503 separate tokens that are passed around the ring in parallel. This "parallel" version passes the same number of messages, but concurrently, and so measures the effect of contention for resources between the 503 active actors.
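The text does not say exactly how ParallelThreadRing divides the hops between tokens. One plausible scheme, shown purely as an assumption, is an even split in which any remainder is spread over the first few tokens so that the total number of messages is unchanged. SplitHops is a hypothetical helper, not a Theron function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: divides totalHops between numTokens tokens as evenly
// as possible, so that the per-token counts add up to totalHops exactly.
std::vector<int> SplitHops(int totalHops, std::size_t numTokens)
{
    assert(totalHops >= 0 && numTokens > 0);

    const int count = static_cast<int>(numTokens);
    const int base = totalHops / count;
    const std::size_t extra = static_cast<std::size_t>(totalHops % count);

    std::vector<int> hops(numTokens, base);
    for (std::size_t index = 0; index < extra; ++index)
    {
        ++hops[index];  // the first 'extra' tokens each take one more hop
    }

    return hops;
}
```

Under this scheme, 50 million hops split between 503 tokens gives 291 tokens with 99404 hops each and 212 tokens with 99403 hops each.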

Performance results for the ThreadRing benchmark are published here.