Protect against heap fragmentation faults and improve execution speed with a fixed block alternative to STL std::allocator.
- A Custom STL std::allocator Replacement Improves Performance
- Table of Contents
- Preface
- Introduction
- std::allocator
- xallocator
- stl_allocator
- Benchmarking
- Reference Articles
- Benefits
Originally published on CodeProject at: A Custom STL std::allocator Replacement Improves Performance
CMake is used to create the build files. CMake is free and open-source software. Windows, Linux and other toolchains are supported. See the CMakeLists.txt file for more information.
This is my third and final article concerning fixed block memory allocators here on Code Project. This time, we'll create an alternate C++ Standard Library std::allocator
memory manager using the groundwork laid by the first two articles.
The Standard Template Library (STL) is powerful C++ software library including container and iterator support. The problem with using the library for mission-critical or time-critical projects isn't with STL itself – the library is very robust. Instead, it's with the unrestricted use of the global heap. The standard STL allocator uses the heap extensively during operation. This is a problem on resource constrained embedded systems. Embedded devices can be expected to operate for months or years where a fault caused by a fragmented heap must be prevented. Fortunately, STL provides a means to replace std::allocator
with one of our own design.
Fixed block memory allocators are a common technique to provide dynamic-like operation yet the memory is provided from memory pools where the blocks doled are of a fixed size. Unlike the global heap, which is expected to universally operate on blocks of any size, a fixed block allocator can be tailored for a narrowly defined purpose. Fixed block allocators can also offer consistent execution times whereas the global heap cannot offer such guarantees.
This article describes an STL-compatible allocator implementation that relies upon a fixed block allocator for dispensing and reclaiming memory. The new allocator prevents faults caused by a fragmented heap and offer consistent allocation/deallocation execution times.
The STL std::allocator
class provides the default memory allocation and deallocation strategy. If you examine code for a container class, such as std::list
, you'll see the default std::allocator
template argument. In this case, allocator<_Ty>
is template class handling allocation duties for _Ty
objects.
template<class _Ty, class _Ax = allocator<_Ty> >
class list : public _List_val<_Ty, _Ax>
{
// ...
}
Since the template parameter _Ax
defaults to allocator<_Ty
> you can create a list object without manually specifying an allocator. Declaring myList
as shown below creates an allocator class for allocating/deallocating int
values.
std::list<int> myList;
The STL containers rely upon dynamic memory for elements and nodes. An element is the size of an inserted object. In this case, sizeof(int)
is the memory required to store one list element. A node is an internal structure required to bind the elements together. For std::list
, it is a doubly-linked list storing, at a minimum, pointers to the next and previous nodes.
Of course, the element size varies depending on the objects stored. However, the node size varies too depending on the container used. A std::list
node may have a different size than a std::map
node. Because of this, a STL-allocator must be able to handle different sized memory block requests.
An STL-allocator must adhere to specific interface requirements. This isn't an article on the how's and why's of the std::allocator
API – there are many online references that explain this better than I. Instead, I'll focus on where to place the memory allocation/deallocation calls within an existing STL-allocator class interface and provide new "x" versions of all common containers to simplify usage.
Most of the heavy lifting for the new fixed block STL allocator comes from the underlying xallocator
as described within my article "Replace malloc/free with a Fast Fixed Block Memory Allocator". As the title states, this module replaces malloc
/free
with new fixed block xmalloc
/xfree
versions.
To the user, these replacement functions operate in the same way as the standard CRT versions except for the fixed block feature. In short, xallocator
has two modes of operation: static pools, where all memory is obtained from pre-declared static memory, or heap blocks, where blocks are obtained from the global heap but recycled for later use when freed. See the aforementioned article for implementation details.
The class stl_allocator
is the fixed block STL-compatible implementation. This class is used as an alternative to std::allocator
.
#ifndef _STL_ALLOCATOR_H
#define _STL_ALLOCATOR_H
// @see https://github.com/endurodave/stl_allocator
// David Lafreniere
#include "xallocator.h"
#include <memory> // For std::allocator and std::allocator_traits
// Forward declaration for stl_allocator<void>
template <typename T>
class stl_allocator;
// Specialization for `void`, but we no longer need to define `pointer` and `const_pointer`
template <>
class stl_allocator<void>
{
public:
typedef void value_type;
template <class U>
struct rebind { typedef stl_allocator<U> other; };
};
// Define the custom stl_allocator inheriting from std::allocator
template <typename T>
class stl_allocator : public std::allocator<T>
{
public:
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef T* pointer;
typedef const T* const_pointer;
typedef T& reference;
typedef const T& const_reference;
typedef T value_type;
// Default constructor
stl_allocator() {}
// Copy constructor
template <class U>
stl_allocator(const stl_allocator<U>&) {}
// Rebind struct
template <class U>
struct rebind { typedef stl_allocator<U> other; };
// Override allocate method to use custom allocation function
pointer allocate(size_type n, typename std::allocator_traits<stl_allocator<void>>::const_pointer hint = 0)
{
return static_cast<pointer>(xmalloc(n * sizeof(T)));
}
// Override deallocate method to use custom deallocation function
void deallocate(pointer p, size_type n)
{
xfree(p);
}
// You can inherit other methods like construct and destroy from std::allocator
};
// Comparison operators for compatibility
template <typename T, typename U>
inline bool operator==(const stl_allocator<T>&, const stl_allocator<U>) { return true; }
template <typename T, typename U>
inline bool operator!=(const stl_allocator<T>&, const stl_allocator<U>) { return false; }
#endif
The code is really just a standard std::allocator
interface. There are many examples online. The source attached to this article has been used on many different compilers (GCC, Keil, VisualStudio). The thing we're interested in is where to tap into the interface with our own memory manager. The methods of interest are:
allocate()
deallocate()
allocate()
allocates n
number of instances of object type T
. xmalloc()
is used to obtained memory from the fixed block memory pool as opposed to the global heap.
pointer allocate(size_type n, stl_allocator<void>::const_pointer hint = 0)
{
return static_cast<pointer>(xmalloc(n*sizeof(T)));
}
deallocate()
deallocates a memory block previously allocated using allocate()
. A simple call to xfree()
routes the request to our memory manager.
void deallocate(pointer p, size_type n)
{
xfree(p);
}
Really, that's all there is to it once you have an fixed block memory manager. xallocator
is designed to handle any size memory request. Therefore, no matter storage size called for by the C++ Standard Library, elements or nodes, xmalloc
/xfree
processes the memory request.
Of course, stl_allocator
is a template class. Notice that the fixed block allocation duties are delegated to non-template functions xmalloc()
and xfree()
. This makes the instantiated code for each instance as small as possible.
The following STL container classes use fixed sized memory blocks that never change in size while the container is being utilized. The number of heap element/node blocks goes up and down, but block sizes are constant for a given container instantiation.
std::list
std::map
std::multipmap
std::set
std::multiset
std::queue
To make using stl_allocator
a bit easier, new classes were created for many of the standard container types. Each new container just inherits from the standard library counterpart and is pre-pended with "x
".
xlist
xmap
xmultimap
xset
xmultiset
xqueue
The following code shows the complete xlist
implementation. Notice xlist
just inherits from std::list
, but the key difference is the _Ax
template argument defaults to stl_allocator
and not std::allocator
.
#ifndef _XLIST_H
#define _XLIST_H
#include "stl_allocator.h"
#include <list>
// xlist uses a fix-block memory allocator
template <typename T, typename Alloc = stl_allocator<T>>
using xlist = std::list<T, Alloc>;
#endif
Each of the “x
” versions of the STL container is used just like the std
version except all allocations are handled by stl_allocator
. For instance:
#include "xlist.h"
xlist<xstring> myList;
myList.push_back("hello world");
The following container types use variable sized memory blocks from the heap that expand or contract in size during operation.
std::string
std::vector
A corresponding xvector
wasn't implemented because vectors require a contiguous memory region whereas the other container types are node based. A fixed block allocator works well with equally sized blocks, but not so well if a vector
continually expands to a larger and larger single block. While stl_allocator
would work with a vector
, it could be misused and cause runtime problems with the fixed block memory manager.
A std::string
also varies its requested memory size during execution, but typically, strings aren't used in an unbounded fashion. Meaning, in most cases you're not trying to store a 100K-byte string, whereas a vector
you might actually do that. Therefore, xstring
's are available as explained next.
New x
-version of the standard and wide versions of the string
classes are created.
xstring
xwstring
Here, we just typedef
a new x-version using stl_allocator
as the default allocator of char
and wchar_t
types:
#ifndef _XSTRING_H
#define _XSTRING_H
#include "stl_allocator.h"
#include <string>
typedef std::basic_string<char, std::char_traits<char>, stl_allocator<char> > xstring;
typedef std::basic_string<wchar_t, std::char_traits<wchar_t>, stl_allocator<wchar_t> > xwstring;
#endif
Usage of an xstring
is just like any other std::string
.
#include "xstring.h"
xwstring s = L"This is ";
s += L"my string.";
The iostreams
C++ Standard Library offers powerful and easy-to-use string formatting by way of the stream
classes.
std::stringstream
std::ostringstream
std::wstringstream
std::wostringstream
Like the container classes, iostreams
makes use of the custom STL allocator instead of the global heap using these new definitions.
xstringstream
xostringstream
xwstringstream
xwostringstream
Again, just typedef
new x-versions with the default stl_allocator
template arguments makes it easy.
#ifndef _XSSTREAM_H
#define _XSSTREAM_H
#include "stl_allocator.h"
#include <sstream>
typedef std::basic_stringstream<char, std::char_traits<char>,
stl_allocator<char> > xstringstream;
typedef std::basic_ostringstream<char, std::char_traits<char>,
stl_allocator<char> > xostringstream;
typedef std::basic_stringstream<wchar_t, std::char_traits<wchar_t>,
stl_allocator<wchar_t> > xwstringstream;
typedef std::basic_ostringstream<wchar_t, std::char_traits<wchar_t>,
stl_allocator<wchar_t> > xwostringstream;
#endif
Now, using an xstringstream
is a snap.
xstringstream myStringStream;
myStringStream << "hello world " << 2016 << ends;
In my previous allocator articles, simple tests show the fixed block allocator is faster than the global heap. Now let's check the stl_allocator
enabled containers to see they compare against std::allocator
on a Windows PC.
All tests were built with Visual Studio 2008 and maximum speed compiler optimizations enabled. The xallocator
is running in heap blocks mode where blocks are initially obtained from the heap but recycled during deallocations. The debug heap is also disabled by setting the Debugging > Environment _NO_DEBUG_HEAP=1. The debug heap is considerably slower because of the extra safety checks.
The benchmark tests are simplistic and stress insertion/removal of 10000 elements. Each test is run three times. The insert/remove is where the STL library relies upon dynamic storage and that's the focus of these tests, not the underlying algorithmic performance.
The code snippet below is the std::list
test. The other container tests are similarly basic.
list<int> myList;
for (int i=0; i<MAX_BENCHMARK; i++)
myList.push_back(123);
myList.clear();
The performance difference between std::allocator
and stl_allocator
running in heap block mode is shown below:
Container | Mode | Run | Benchmark Time (mS) |
std::list |
Global Heap | 1 | 60 |
std::list |
Global Heap | 2 | 32 |
std::list |
Global Heap | 3 | 23 |
xlist |
Heap Blocks | 1 | 19 |
xlist |
Heap Blocks | 2 | 11 |
xlist |
Heap Blocks | 3 | 11 |
std::map |
Global Heap | 1 | 37 |
std::map |
Global Heap | 2 | 34 |
std::map |
Global Heap | 3 | 41 |
xmap |
Heap Blocks | 1 | 38 |
xmap |
Heap Blocks | 2 | 25 |
xmap |
Heap Blocks | 3 | 25 |
std::string |
Global Heap | 1 | 171 |
std::string |
Global Heap | 2 | 146 |
std::string |
Global Heap | 3 | 95 |
xstring |
Heap Blocks | 1 | 57 |
xstring |
Heap Blocks | 2 | 36 |
xstring |
Heap Blocks | 3 | 40 |
The test results shows that, for this benchmark, the stl_allocator
enabled versions are approximately 2 to 3 times faster than std::allocator
. Now this doesn't mean that STL is now magically overall twice as fast. Again, this benchmark is just concentrating on insertion and removal performance. The underlying container algorithmic performance hasn't changed. Therefore, the overall improvement seen will vary depending on how often your application inserts/removes from containers.
The stl_allocator
operates in fixed time if xallocator
is used in static pool mode. If xallocator
heap blocks mode is used, the execution time is constant once the free list is populated with blocks obtained from the heap. Notice that the xlist
benchmark above has an initial execution time of 19mS, yet the subsequent executions are 11mS each. On the first run, xallocator
had to obtain fresh blocks from the global heap using operator new
. On runs 2 and 3, the blocks already existed within the xallocator
free-list so the heap isn't relied upon making successive runs much faster.
The global heap has unpredictable execution time performance when the heap fragments. Since xallocator
only allocates blocks and recycles them for later use the execution time is more predictable and consistent.
Game developers go to great lengths to solve heap fragmentation and its myriad of associated problems. The Electronic Arts Standard Template Library (EASTL) white paper goes into detail about the problems with STL and specifically the std::allocator
. Paul states in the EASTL -- Electronic Arts Standard Template Library document:
Quote:Among game developers the most fundamental weakness is the std allocator design, and it is this weakness that was the largest contributing factor to the creation of EASTL.
While I don't write code for games, I can see how the lag associated with fragmentation and erratic allocation/deallocation times would effect game play.
- Replace malloc/free with a Fast Fixed Block Memory Allocator by David Lafreniere
- An Efficient C++ Fixed Block Memory Allocator by David Lafreniere
- EASTL -- Electronic Arts Standard Template Library by Paul Pedriana
STL is an extremely useful C++ library. Too often through, for medical devices I can't use it due to the possibility of a heap fragmentation memory fault. This leads to rolling your own custom container classes for each project instead of using an existing, well-tested library.
My goal with stl_allocator
was eliminating memory faults; however, the fixed block allocator does offer faster and more consistent execution times whereas the heap does not. The heap implementation and level of fragmentation leads to unpredictable execution times. Depending on your application, this may or may not be an issue.
As shown in this article, simply employing an STL-compatible fixed block allocator opens up the possibility of using C++ Standard Template Library features on a project that otherwise may not have been possible.