Skip to content

endurodave/stl_allocator

Repository files navigation

License MIT conan Ubuntu conan Ubuntu conan Windows

A Custom STL std::allocator Replacement Improves Performance

Protect against heap fragmentation faults and improve execution speed with a fixed block alternative to STL std::allocator.

Table of Contents

Preface

Originally published on CodeProject at: A Custom STL std::allocator Replacement Improves Performance

CMake is used to create the build files. CMake is free and open-source software. Windows, Linux and other toolchains are supported. See the CMakeLists.txt file for more information.

Introduction

This is my third and final article concerning fixed block memory allocators here on Code Project. This time, we'll create an alternate C++ Standard Library std::allocator memory manager using the groundwork laid by the first two articles. 

The Standard Template Library (STL) is powerful C++ software library including container and iterator support. The problem with using the library for mission-critical or time-critical projects isn't with STL itself – the library is very robust. Instead, it's with the unrestricted use of the global heap. The standard STL allocator uses the heap extensively during operation. This is a problem on resource constrained embedded systems. Embedded devices can be expected to operate for months or years where a fault caused by a fragmented heap must be prevented. Fortunately, STL provides a means to replace std::allocator with one of our own design. 

Fixed block memory allocators are a common technique to provide dynamic-like operation yet the memory is provided from memory pools where the blocks doled are of a fixed size. Unlike the global heap, which is expected to universally operate on blocks of any size, a fixed block allocator can be tailored for a narrowly defined purpose. Fixed block allocators can also offer consistent execution times whereas the global heap cannot offer such guarantees. 

This article describes an STL-compatible allocator implementation that relies upon a fixed block allocator for dispensing and reclaiming memory. The new allocator prevents faults caused by a fragmented heap and offer consistent allocation/deallocation execution times. 

std::allocator

The STL std::allocator class provides the default memory allocation and deallocation strategy. If you examine code for a container class, such as std::list, you'll see the default std::allocator template argument. In this case, allocator<_Ty> is template class handling allocation duties for _Ty objects.

template<class _Ty, class _Ax = allocator<_Ty> >
class list : public _List_val<_Ty, _Ax>
{
    // ...
}

Since the template parameter _Ax defaults to allocator<_Ty> you can create a list object without manually specifying an allocator. Declaring myList as shown below creates an allocator class for allocating/deallocating int values. 

std::list<int> myList;

The STL containers rely upon dynamic memory for elements and nodes. An element is the size of an inserted object. In this case, sizeof(int) is the memory required to store one list element. A node is an internal structure required to bind the elements together. For std::list, it is a doubly-linked list storing, at a minimum, pointers to the next and previous nodes. 

Of course, the element size varies depending on the objects stored. However, the node size varies too depending on the container used. A std::list node may have a different size than a std::map node. Because of this, a STL-allocator must be able to handle different sized memory block requests. 

An STL-allocator must adhere to specific interface requirements. This isn't an article on the how's and why's of the std::allocator API – there are many online references that explain this better than I. Instead, I'll focus on where to place the memory allocation/deallocation calls within an existing STL-allocator class interface and provide new "x" versions of all common containers to simplify usage. 

xallocator    

Most of the heavy lifting for the new fixed block STL allocator comes from the underlying xallocator as described within my article "Replace malloc/free with a Fast Fixed Block Memory Allocator". As the title states, this module replaces malloc/free with new fixed block xmalloc/xfree versions. 

To the user, these replacement functions operate in the same way as the standard CRT versions except for the fixed block feature. In short, xallocator has two modes of operation: static pools, where all memory is obtained from pre-declared static memory, or heap blocks, where blocks are obtained from the global heap but recycled for later use when freed. See the aforementioned article for implementation details. 

stl_allocator

The class stl_allocator is the fixed block STL-compatible implementation. This class is used as an alternative to std::allocator

#ifndef _STL_ALLOCATOR_H
#define _STL_ALLOCATOR_H

// @see https://github.com/endurodave/stl_allocator
// David Lafreniere

#include "xallocator.h"
#include <memory>  // For std::allocator and std::allocator_traits

// Forward declaration for stl_allocator<void>
template <typename T>
class stl_allocator;

// Specialization for `void`, but we no longer need to define `pointer` and `const_pointer`
template <>
class stl_allocator<void> 
{
public:
	typedef void value_type;

	template <class U>
	struct rebind { typedef stl_allocator<U> other; };
};

// Define the custom stl_allocator inheriting from std::allocator
template <typename T>
class stl_allocator : public std::allocator<T> 
{
public:
	typedef size_t size_type;
	typedef ptrdiff_t difference_type;
	typedef T* pointer;
	typedef const T* const_pointer;
	typedef T& reference;
	typedef const T& const_reference;
	typedef T value_type;

	// Default constructor
	stl_allocator() {}

	// Copy constructor
	template <class U>
	stl_allocator(const stl_allocator<U>&) {}

	// Rebind struct
	template <class U>
	struct rebind { typedef stl_allocator<U> other; };

	// Override allocate method to use custom allocation function
	pointer allocate(size_type n, typename std::allocator_traits<stl_allocator<void>>::const_pointer hint = 0) 
	{
		return static_cast<pointer>(xmalloc(n * sizeof(T)));
	}

	// Override deallocate method to use custom deallocation function
	void deallocate(pointer p, size_type n) 
	{
		xfree(p);
	}

	// You can inherit other methods like construct and destroy from std::allocator
};

// Comparison operators for compatibility
template <typename T, typename U>
inline bool operator==(const stl_allocator<T>&, const stl_allocator<U>) { return true; }

template <typename T, typename U>
inline bool operator!=(const stl_allocator<T>&, const stl_allocator<U>) { return false; }

#endif 

The code is really just a standard std::allocator interface. There are many examples online. The source attached to this article has been used on many different compilers (GCC, Keil, VisualStudio). The thing we're interested in is where to tap into the interface with our own memory manager. The methods of interest are:

  • allocate()
  • deallocate()

allocate() allocates n number of instances of object type T. xmalloc() is used to obtained memory from the fixed block memory pool as opposed to the global heap.

pointer allocate(size_type n, stl_allocator<void>::const_pointer hint = 0)
{
    return static_cast<pointer>(xmalloc(n*sizeof(T)));
}

deallocate() deallocates a memory block previously allocated using allocate(). A simple call to xfree() routes the request to our memory manager. 

void deallocate(pointer p, size_type n)
{
    xfree(p);
}

Really, that's all there is to it once you have an fixed block memory manager. xallocator is designed to handle any size memory request. Therefore, no matter storage size called for by the C++ Standard Library, elements or nodes, xmalloc/xfree processes the memory request.

Of course, stl_allocator is a template class. Notice that the fixed block allocation duties are delegated to non-template functions xmalloc() and xfree(). This makes the instantiated code for each instance as small as possible. 

"x" Containers

The following STL container classes use fixed sized memory blocks that never change in size while the container is being utilized. The number of heap element/node blocks goes up and down, but block sizes are constant for a given container instantiation.

  • std::list
  • std::map
  • std::multipmap
  • std::set
  • std::multiset
  • std::queue

To make using stl_allocator a bit easier, new classes were created for many of the standard container types. Each new container just inherits from the standard library counterpart and is pre-pended with "x". 

  • xlist
  • xmap
  • xmultimap
  • xset
  • xmultiset
  • xqueue

The following code shows the complete xlist implementation. Notice xlist just inherits from std::list, but the key difference is the _Ax template argument defaults to stl_allocator and not std::allocator

#ifndef _XLIST_H
#define _XLIST_H

#include "stl_allocator.h"
#include <list>

// xlist uses a fix-block memory allocator
template <typename T, typename Alloc = stl_allocator<T>>
using xlist = std::list<T, Alloc>;

#endif

Each of the “x” versions of the STL container is used just like the std version except all allocations are handled by stl_allocator. For instance:

#include  "xlist.h"

xlist<xstring> myList;
myList.push_back("hello world");

The following container types use variable sized memory blocks from the heap that expand or contract in size during operation. 

  • std::string
  • std::vector

A corresponding xvector wasn't implemented because vectors require a contiguous memory region whereas the other container types are node based. A fixed block allocator works well with equally sized blocks, but not so well if a vector continually expands to a larger and larger single block. While stl_allocator would work with a vector, it could be misused and cause runtime problems with the fixed block memory manager. 

A std::string also varies its requested memory size during execution, but typically, strings aren't used in an unbounded fashion. Meaning, in most cases you're not trying to store a 100K-byte string, whereas a vector you might actually do that. Therefore, xstring's are available as explained next. 

"x" Strings

New x-version of the standard and wide versions of the string classes are created.

  • xstring
  • xwstring

Here, we just typedef a new x-version using stl_allocator as the default allocator of char and wchar_t types:

#ifndef _XSTRING_H
#define _XSTRING_H

#include "stl_allocator.h"
#include <string>

typedef std::basic_string<char, std::char_traits<char>, stl_allocator<char> > xstring;
typedef std::basic_string<wchar_t, std::char_traits<wchar_t>, stl_allocator<wchar_t> > xwstring;

#endif 

Usage of an xstring is just like any other std::string

#include "xstring.h"

xwstring s = L"This is ";
s += L"my string.";

"x" Streams

The iostreams C++ Standard Library offers powerful and easy-to-use string formatting by way of the stream classes. 

  • std::stringstream
  • std::ostringstream
  • std::wstringstream
  • std::wostringstream

Like the container classes, iostreams makes use of the custom STL allocator instead of the global heap using these new definitions.

  • xstringstream
  • xostringstream
  • xwstringstream
  • xwostringstream

Again, just typedef new x-versions with the default stl_allocator template arguments makes it easy.

#ifndef _XSSTREAM_H
#define _XSSTREAM_H

#include "stl_allocator.h"
#include <sstream>

typedef std::basic_stringstream<char, std::char_traits<char>, 
stl_allocator<char> > xstringstream;
typedef std::basic_ostringstream<char, std::char_traits<char>, 
stl_allocator<char> > xostringstream;

typedef std::basic_stringstream<wchar_t, std::char_traits<wchar_t>, 
stl_allocator<wchar_t> > xwstringstream;
typedef std::basic_ostringstream<wchar_t, std::char_traits<wchar_t>, 
stl_allocator<wchar_t> > xwostringstream;

#endif

Now, using an xstringstream is a snap.

xstringstream myStringStream;
myStringStream << "hello world " << 2016 << ends;

Benchmarking

In my previous allocator articles, simple tests show the fixed block allocator is faster than the global heap. Now let's check the stl_allocator enabled containers to see they compare against std::allocator on a Windows PC. 

All tests were built with Visual Studio 2008 and maximum speed compiler optimizations enabled. The xallocator is running in heap blocks mode where blocks are initially obtained from the heap but recycled during deallocations. The debug heap is also disabled by setting the Debugging > Environment _NO_DEBUG_HEAP=1. The debug heap is considerably slower because of the extra safety checks. 

The benchmark tests are simplistic and stress insertion/removal of 10000 elements. Each test is run three times. The insert/remove is where the STL library relies upon dynamic storage and that's the focus of these tests, not the underlying algorithmic performance. 

The code snippet below is the std::list test. The other container tests are similarly basic. 

list<int> myList;
for (int i=0; i<MAX_BENCHMARK; i++)
    myList.push_back(123);
myList.clear();

The performance difference between std::allocator and stl_allocator running in heap block mode is shown below: 
 

Container Mode Run Benchmark Time (mS)
std::list Global Heap 1 60
std::list Global Heap 2 32
std::list Global Heap 3 23
xlist Heap Blocks 1 19
xlist Heap Blocks 2 11
xlist Heap Blocks 3 11
std::map Global Heap 1 37
std::map Global Heap 2 34
std::map Global Heap 3 41
xmap Heap Blocks 1 38
xmap Heap Blocks 2 25
xmap Heap Blocks 3 25
std::string Global Heap 1 171
std::string Global Heap 2 146
std::string Global Heap 3 95
xstring Heap Blocks 1 57
xstring Heap Blocks 2 36
xstring Heap Blocks 3 40

The test results shows that, for this benchmark, the stl_allocator enabled versions are approximately 2 to 3 times faster than std::allocator. Now this doesn't mean that STL is now magically overall twice as fast. Again, this benchmark is just concentrating on insertion and removal performance. The underlying container algorithmic performance hasn't changed. Therefore, the overall improvement seen will vary depending on how often your application inserts/removes from containers. 

The stl_allocator operates in fixed time if xallocator is used in static pool mode. If xallocator heap blocks mode is used, the execution time is constant once the free list is populated with blocks obtained from the heap. Notice that the xlist benchmark above has an initial execution time of 19mS, yet the subsequent executions are 11mS each. On the first run, xallocator had to obtain fresh blocks from the global heap using operator new. On runs 2 and 3, the blocks already existed within the xallocator free-list so the heap isn't relied upon making successive runs much faster. 

The global heap has unpredictable execution time performance when the heap fragments. Since xallocator only allocates blocks and recycles them for later use the execution time is more predictable and consistent. 

Game developers go to great lengths to solve heap fragmentation and its myriad of associated problems. The Electronic Arts Standard Template Library (EASTL) white paper goes into detail about the problems with STL and specifically the std::allocator. Paul states in the EASTL -- Electronic Arts Standard Template Library document:

Quote:

Among game developers the most fundamental weakness is the std allocator design, and it is this weakness that was the largest contributing factor to the creation of EASTL.

While I don't write code for games, I can see how the lag associated with fragmentation and erratic allocation/deallocation times would effect game play. 

Reference Articles

Benefits

STL is an extremely useful C++ library. Too often through, for medical devices I can't use it due to the possibility of a heap fragmentation memory fault. This leads to rolling your own custom container classes for each project instead of using an existing, well-tested library. 

My goal with stl_allocator was eliminating memory faults; however, the fixed block allocator does offer faster and more consistent execution times whereas the heap does not. The heap implementation and level of fragmentation leads to unpredictable execution times. Depending on your application, this may or may not be an issue.

As shown in this article, simply employing an STL-compatible fixed block allocator opens up the possibility of using C++ Standard Template Library features on a project that otherwise may not have been possible. 

About

STL std::allocator Fixed Block Memory Allocator

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published