Move semantics is faster than copy semantics when the compiler can replace expensive copy operations by cheaper move operations, that is, when it can replace a deep copy of a big object by a shallow copy of the pointer to the big object. Hence, classes using the pimpl idiom in combination with move semantics should see a considerable speed-up. As Qt applies the pimpl idiom consistently to every non-trivial Qt class, we should see a speed-up by simply using Qt classes instead of their STL counterparts. I’ll compare the performance of classes that use move semantics with Qt and STL classes, with and without the pimpl idiom.
A Class Using Move Semantics and Pimpl Idiom
We apply the pimpl idiom to the class CTeam from my post Performance Gains Through C++11 Move Semantics.
```cpp
// cteam.h
#ifndef CTEAM_H
#define CTEAM_H

#include <memory>
#include <string>

class CTeam
{
public:
    ~CTeam();                                   // dtor
    CTeam();                                    // default ctor
    CTeam(const std::string &n, int p, int gd); // name ctor

    CTeam(const CTeam &t);                      // copy ctor
    CTeam &operator=(const CTeam &t);           // copy assign

    CTeam(CTeam &&t);                           // move ctor
    CTeam &operator=(CTeam &&t);                // move assign

    std::string name() const;
    int points() const;
    int goalDifference() const;

private:
    struct Impl;                                // defined in cteam.cpp
    std::unique_ptr<Impl> m_impl;
};

#endif // CTEAM_H
```
The public interface of CTeam is the same as before. We replaced the private data members by a unique pointer and moved them into the private implementation class CTeam::Impl. Declaration and definition of CTeam::Impl are located in the source file cteam.cpp. This is one of the big advantages of the pimpl idiom: the header file doesn’t contain any implementation details. Hence, we can change the implementation of our pimpled class without changing the interface (see the post Pimp my Pimpl by Marc Mutz for more advantages of the pimpl idiom).
```cpp
// cteam.cpp
#include "cteam.h"

#include <cstdlib>   // rand, srand
#include <string>
#include <vector>

using namespace std;

struct CTeam::Impl
{
    ~Impl() = default;
    Impl(const std::string &n, int p, int gd);
    Impl(const Impl &t) = default;
    Impl &operator=(const Impl &t) = default;

    std::string m_name;
    int m_points;
    int m_goalDifference;
    static constexpr int statisticsSize = 100;
    std::vector<double> m_statistics;
};

CTeam::Impl::Impl(const std::string &n, int p, int gd)
    : m_name(n)
    , m_points(p)
    , m_goalDifference(gd)
{
    m_statistics.reserve(statisticsSize);
    srand(p);
    for (int i = 0; i < statisticsSize; ++i) {
        // reserve() only allocates capacity; push_back() creates the elements.
        m_statistics.push_back(static_cast<double>(rand() % 10000) / 100.0);
    }
}
```
Note how the C++11 keyword default saves us from spelling out the trivial implementation of the destructor, copy constructor and copy assignment operator of the implementation class CTeam::Impl. We only have to write the code for the special name constructor. The rest is generated by the compiler.
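A minimal sketch of what the defaulted copy operations of CTeam::Impl do, using a hypothetical struct Stats with the same kinds of members: the compiler copies member by member, and std::string and std::vector copy their own contents, so the defaulted copy is already a deep copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for CTeam::Impl with compiler-generated copy operations.
struct Stats
{
    Stats(const std::string &n, std::size_t size) : m_name(n), m_values(size, 0.0) {}
    Stats(const Stats &) = default;            // copies m_name and m_values
    Stats &operator=(const Stats &) = default; // assigns m_name and m_values

    std::string m_name;
    std::vector<double> m_values;
};

void demoDefaultedCopy()
{
    Stats original("Team A", 100);
    Stats copy(original);                      // defaulted copy constructor
    copy.m_values[0] = 42.0;                   // change only the copy
    assert(original.m_values[0] == 0.0);       // the original is untouched
    assert(original.m_values.data() != copy.m_values.data()); // separate buffers
}
```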
We use CTeam::Impl to implement the constructors and assignment operators of the client-facing class CTeam.
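```cpp
// cteam.cpp (continued)
CTeam::~CTeam() = default;

CTeam::CTeam() : CTeam("", 0, 0) {}

CTeam::CTeam(const std::string &n, int p, int gd)
    : m_impl(new Impl(n, p, gd))
{}

// Deep copy. A moved-from source has no Impl, so neither does the copy.
CTeam::CTeam(const CTeam &t)
    : m_impl(t.m_impl ? new Impl(*t.m_impl) : nullptr)
{}

CTeam &CTeam::operator=(const CTeam &t)
{
    if (!t.m_impl)
        m_impl.reset();                    // source was moved from
    else if (m_impl)
        *m_impl = *t.m_impl;               // reuse our own Impl
    else
        m_impl.reset(new Impl(*t.m_impl)); // we were moved from
    return *this;
}

// Shallow: transfer ownership of the Impl, leaving t.m_impl null.
CTeam::CTeam(CTeam &&t) = default;
CTeam &CTeam::operator=(CTeam &&t) = default;

std::string CTeam::name() const
{
    return m_impl ? m_impl->m_name : "";
}
```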
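We let the compiler generate the destructor. We default it in cteam.cpp and not in the header, because std::unique_ptr can only delete a CTeam::Impl where that type is complete. The default constructor delegates to the name constructor. The name constructor creates a CTeam::Impl object with the given arguments. This is all as expected.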
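The copy constructor and copy assignment must perform a deep copy. With a raw pointer, the compiler-generated versions would simply copy the pointer, that is, perform a shallow copy, and two CTeam objects would share (and both delete) the same Impl. With std::unique_ptr, which cannot be copied, the compiler-generated copy operations are deleted altogether. Either way, we must write the code for the copy constructor and copy assignment ourselves. The code simply uses the copy constructor and copy assignment of the implementation class.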
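The following sketch uses two hypothetical classes, NaiveTeam and Team (stand-ins for CTeam, not the post’s code). It shows that with a std::unique_ptr member the compiler-generated copy constructor is deleted rather than shallow, and that a hand-written copy constructor restores value semantics by cloning the Impl.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical pimpl class that leaves copying to the compiler.
class NaiveTeam
{
    struct Impl { std::string m_name; };
    std::unique_ptr<Impl> m_impl;
};

// std::unique_ptr cannot be copied, so NaiveTeam's implicit copy ctor is deleted.
static_assert(!std::is_copy_constructible<NaiveTeam>::value,
              "a unique_ptr member deletes the implicit copy constructor");

// Hypothetical pimpl class with a hand-written deep copy, like CTeam.
class Team
{
public:
    explicit Team(const std::string &n) : m_impl(new Impl{n}) {}

    // Deep copy: clone the Impl (a moved-from source has none).
    Team(const Team &t) : m_impl(t.m_impl ? new Impl(*t.m_impl) : nullptr) {}

    Team &operator=(const Team &t)
    {
        if (this != &t)
            m_impl.reset(t.m_impl ? new Impl(*t.m_impl) : nullptr);
        return *this;
    }

    void setName(const std::string &n) { m_impl->m_name = n; }
    std::string name() const { return m_impl ? m_impl->m_name : ""; }

private:
    struct Impl { std::string m_name; };
    std::unique_ptr<Impl> m_impl;
};

void demoDeepCopy()
{
    Team a("Team A");
    Team b(a);                      // b gets its own Impl
    b.setName("Team B");
    assert(a.name() == "Team A");   // a is unaffected
    assert(b.name() == "Team B");
}
```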
A shallow copy is basically what we want for the move constructor and move assignment. The default implementations simply copy the unique pointer m_impl (shallow copy) and set m_impl to nullptr in the source of the move operation. The move operation transfers the ownership of the Impl object from the source to the target CTeam object. This behaviour is exactly what the class std::unique_ptr implements, which supports moving but not copying.
As the implementation pointer m_impl is null after a CTeam object has been moved from, functions like CTeam::name should check the validity of the pointer before they use it.
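A minimal sketch of this move behaviour, with a hypothetical, stripped-down Team class that uses the same defaulted move operations and the same null check in name(): after the move, the target owns the source’s former Impl, and the source’s pointer is null.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, stripped-down pimpl class with defaulted move operations.
class Team
{
public:
    explicit Team(const std::string &n) : m_impl(new Impl{n}) {}
    Team(Team &&) = default;            // moves the unique_ptr
    Team &operator=(Team &&) = default; // deletes the target's old Impl

    // Guard against a moved-from object, whose m_impl is null.
    std::string name() const { return m_impl ? m_impl->m_name : ""; }
    const void *implAddress() const { return m_impl.get(); }

private:
    struct Impl { std::string m_name; };
    std::unique_ptr<Impl> m_impl;
};

void demoMove()
{
    Team a("Team A");
    const void *impl = a.implAddress();

    Team b(std::move(a));               // shallow: only the pointer moves
    assert(b.implAddress() == impl);    // b now owns a's former Impl
    assert(a.implAddress() == nullptr); // a is left without an Impl
    assert(a.name().empty());           // safe thanks to the null check
    assert(b.name() == "Team A");
}
```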
The Benchmarks
We use the benchmarks ShuffleAndSort and PushBack as shown in the post Performance Gains Through C++11 Move Semantics. We don’t use the benchmark EmplaceBack, because Qt 5.7 (the latest Qt version at the time of this writing) does not support emplace operations on Qt containers.
I ran different experiments, which I mark with the following labels.
- C++98 – Built example code with C++98 compiler
- C++11 – Built example code with C++11 compiler
- Copy – Class CTeam has only copy but no move operations
- Move – Class CTeam has both copy and move operations
- STL – Used std::string and std::vector in example code
- Qt – Used QString and QVector in example code
- Pimpl – Used pimpl idiom for class CTeam
- Opt – Used lambdas for sort and C++11’s random number generation
We measured the performance of each experiment by the number of instruction reads (Ir) counted by callgrind. As relative performance is more telling than absolute numbers of instruction reads, we take C++11/STL/Move as the reference point with value 1.000. Lower values mean fewer instructions, that is, faster code.
Here are the results.
Experiment | ShuffleAndSort | PushBack |
---|---|---|
C++98/STL/Copy | 1.693 | 1.006 |
C++98/Qt/Copy | 1.335 | 1.048 |
C++11/STL/Move | 1.000 | 1.000 |
C++11/Qt/Move | 0.773 | 1.049 |
C++11/STL/Move/Pimpl | 0.730 | 1.011 |
C++11/Qt/Move/Pimpl | 0.724 | 1.071 |
C++11/STL/Move/Pimpl/Opt | 0.597 | 0.308 |
C++11/Qt/Move/Pimpl/Opt | 0.589 | 0.399 |
C++11/STL/Move/Opt | 0.867 | 0.296 |
C++11/Qt/Move/Opt | 0.638 | 0.378 |
For the ShuffleAndSort benchmark, the Qt experiments are consistently faster than the corresponding STL experiments, by a factor between 1.01 and 1.36. The reason is simple. Qt has always used the pimpl idiom for its non-trivial classes like QVector and QString. Copying one of Qt’s implicitly shared classes means copying the pointer to the implementation and incrementing a reference count. The class using pimpl performs a shallow copy instead of a deep copy. This is exactly the situation in which move semantics has a performance advantage over copy semantics.
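To illustrate what copying an implicitly shared class means, here is a hypothetical copy-on-write class SharedDoubles. It is not Qt’s implementation: Qt’s containers manage their own atomically reference-counted data blocks, whereas this sketch uses std::shared_ptr. Copies share one heap block, and only the first write through a shared handle performs the deep copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical implicitly shared (copy-on-write) vector of doubles.
class SharedDoubles
{
public:
    explicit SharedDoubles(std::size_t n)
        : m_d(std::make_shared<std::vector<double>>(n)) {}

    // The compiler-generated copy operations copy the shared_ptr:
    // a shallow copy that bumps the reference count, no element is copied.

    double at(std::size_t i) const { return (*m_d)[i]; }

    void set(std::size_t i, double value)
    {
        detach();
        (*m_d)[i] = value;
    }

    bool sharesDataWith(const SharedDoubles &other) const { return m_d == other.m_d; }

private:
    // Deep copy only if another handle still refers to the same data.
    // (use_count() is not a safe synchronisation point across threads.)
    void detach()
    {
        if (m_d.use_count() > 1)
            m_d = std::make_shared<std::vector<double>>(*m_d);
    }

    std::shared_ptr<std::vector<double>> m_d;
};

void demoImplicitSharing()
{
    SharedDoubles a(100);
    SharedDoubles b = a;            // cheap: only the pointer is copied
    assert(b.sharesDataWith(a));

    b.set(0, 1.5);                  // the write triggers the deep copy
    assert(!b.sharesDataWith(a));
    assert(a.at(0) == 0.0 && b.at(0) == 1.5);
}
```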
But using the pimpl idiom all the time comes at a cost. Whenever we create an object using pimpl, we create the “interface” object (e.g., CTeam), which in turn creates the “implementation” object (e.g., CTeam::Impl) dynamically on the heap. This is why the Qt experiments are consistently slower than the STL experiments for the PushBack benchmark, by a factor between 1.04 and 1.30. The overhead of pimpl shows whenever the code calls a custom constructor (e.g., the name constructor of CTeam), copy constructor or copy assignment operator, that is, whenever the code must create a new implementation object on the heap.
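The cost of the extra heap allocation can be made visible by counting allocations. In this sketch, the hypothetical class PimplTeam gives its Impl a class-specific operator new and operator delete purely for counting. Constructing and copying each allocate a new Impl, while moving does not.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical pimpl class whose Impl counts its own heap allocations.
class PimplTeam
{
public:
    static std::size_t implAllocations;

    PimplTeam() : m_impl(new Impl) {}                              // allocates
    PimplTeam(const PimplTeam &t) : m_impl(new Impl(*t.m_impl)) {} // allocates
    PimplTeam(PimplTeam &&) = default;                             // no allocation

private:
    struct Impl
    {
        int m_points = 0;
        // Class-specific allocation functions, here only to count allocations.
        static void *operator new(std::size_t size)
        {
            ++implAllocations;
            return ::operator new(size);
        }
        static void operator delete(void *p) noexcept { ::operator delete(p); }
    };
    std::unique_ptr<Impl> m_impl;
};

std::size_t PimplTeam::implAllocations = 0;

void demoPimplAllocations()
{
    const std::size_t start = PimplTeam::implAllocations;
    PimplTeam a;                    // constructor: one Impl allocation
    PimplTeam b(a);                 // deep copy: one more
    PimplTeam c(std::move(a));      // move: none
    assert(PimplTeam::implAllocations - start == 2);
    (void)b;
    (void)c;
}
```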
The picture is pretty much the same if we only look at the STL experiments. For ShuffleAndSort, the STL experiments with pimpl are always faster than the ones without pimpl. For PushBack, the situation reverses. STL experiments with pimpl are always slower than the ones without pimpl.
The ShuffleAndSort benchmark is a best case for move semantics and pimpl. It performs 20 copy operations at the beginning to fill the vector of teams. Then, it moves teams 810,000 times while shuffling and sorting. Conversely, the PushBack benchmark is a worst case for move semantics. It calls each of the name constructor, the move constructor and the destructor of CTeam 100,000 times. Calling the name constructor, which creates the implementation object dynamically on the heap, clearly dominates the execution time.
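The following is a hypothetical sketch of the shape of the ShuffleAndSort benchmark, not the author’s benchmark code. An instrumented type Counted records copies and moves; the vector is filled by copying once, and the shuffling and sorting afterwards only move elements. The element count and number of rounds are arbitrary parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical instrumented element that counts its copies and moves.
struct Counted
{
    static int copies;
    static int moves;
    int key = 0;

    explicit Counted(int k) : key(k) {}
    Counted(const Counted &o) : key(o.key) { ++copies; }
    Counted(Counted &&o) noexcept : key(o.key) { ++moves; }
    Counted &operator=(const Counted &o) { key = o.key; ++copies; return *this; }
    Counted &operator=(Counted &&o) noexcept { key = o.key; ++moves; return *this; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Copy n elements into a vector once, then shuffle and sort it repeatedly.
void shuffleAndSort(int n, int rounds)
{
    std::vector<Counted> teams;
    teams.reserve(static_cast<std::size_t>(n)); // no reallocation while filling
    for (int i = 0; i < n; ++i) {
        Counted team(i);
        teams.push_back(team);                  // lvalue: one copy per element
    }
    std::mt19937 generator(42);
    for (int r = 0; r < rounds; ++r) {
        std::shuffle(teams.begin(), teams.end(), generator); // swaps, i.e. moves
        std::sort(teams.begin(), teams.end(),
                  [](const Counted &a, const Counted &b) { return a.key < b.key; });
    }
}

void demoShuffleAndSort()
{
    Counted::copies = Counted::moves = 0;
    shuffleAndSort(20, 1000);
    assert(Counted::copies == 20); // only the initial fill copies
    assert(Counted::moves > 0);    // shuffling and sorting only move
}
```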
When we compare the experiments using pimpl with those not using pimpl (C++11/STL/Move vs. C++11/STL/Move/Pimpl, C++11/STL/Move/Opt vs. C++11/STL/Move/Pimpl/Opt), we see a speed-up by a factor of 1.370 to 1.452 for ShuffleAndSort and a slow-down by a factor of 1.011 to 1.041 for PushBack. The speed-up from using move semantics and pimpl (37 to 45 per cent) is roughly an order of magnitude larger than the slow-down caused by the pimpl overhead (1 to 4 per cent). If our code leans more towards ShuffleAndSort, where shallow copies dominate deep copies, our code will most likely see an overall speed-up from using move semantics in combination with the pimpl idiom.
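The factors quoted in this paragraph can be recomputed from the results table. This sketch stores the table in a std::map and divides the relative instruction counts; the helper names factor and round3 are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Relative instruction reads from the results table (C++11/STL/Move = 1.000).
struct Result { double shuffleAndSort; double pushBack; };

const std::map<std::string, Result> results = {
    {"C++98/STL/Copy",           {1.693, 1.006}},
    {"C++98/Qt/Copy",            {1.335, 1.048}},
    {"C++11/STL/Move",           {1.000, 1.000}},
    {"C++11/Qt/Move",            {0.773, 1.049}},
    {"C++11/STL/Move/Pimpl",     {0.730, 1.011}},
    {"C++11/Qt/Move/Pimpl",      {0.724, 1.071}},
    {"C++11/STL/Move/Pimpl/Opt", {0.597, 0.308}},
    {"C++11/Qt/Move/Pimpl/Opt",  {0.589, 0.399}},
    {"C++11/STL/Move/Opt",       {0.867, 0.296}},
    {"C++11/Qt/Move/Opt",        {0.638, 0.378}},
};

// How many times faster the experiment with value `faster` is (lower is better).
double factor(double slower, double faster) { return slower / faster; }

// Rounds to three decimals, the precision used in the text.
double round3(double x) { return std::round(x * 1000.0) / 1000.0; }

void demoPimplFactors()
{
    const Result &noPimpl = results.at("C++11/STL/Move");
    const Result &pimpl = results.at("C++11/STL/Move/Pimpl");
    // ShuffleAndSort: pimpl is faster; PushBack: pimpl is slower.
    assert(round3(factor(noPimpl.shuffleAndSort, pimpl.shuffleAndSort)) == 1.370);
    assert(round3(factor(pimpl.pushBack, noPimpl.pushBack)) == 1.011);
}
```

Because lower values mean fewer instructions, dividing the slower experiment’s value by the faster one’s gives the speed-up factor; swapping the arguments of factor turns a speed-up into the corresponding slow-down.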
Fortunately, shallow copies dominate deep copies in most cases in real code. This observation was essential when the Qt project decided at its very beginning to use the pimpl idiom for all its non-trivial classes.
If we compare the overhead of using the pimpl idiom between STL and Qt experiments (C++11/STL/Move/Pimpl vs. C++11/Qt/Move/Pimpl, C++11/STL/Move/Pimpl/Opt vs. C++11/Qt/Move/Pimpl/Opt), the following picture emerges. For PushBack, Qt is 1.06 to 1.30 times slower than STL. The reason is that the pure Qt version of CTeam also uses pimpl for the string m_name and the vector of doubles m_statistics, in addition to the pimpl of CTeam itself. For ShuffleAndSort, Qt is only marginally faster than pure C++11/STL (factor: 1.008 to 1.014). This small speed-up may well be eaten up by the bigger slow-down caused by the pimpl overhead.
In pre-C++11 times, using Qt classes gave us a speed advantage over STL classes most of the time. Things have changed with the advent of C++11. STL classes are now on par with Qt classes, thanks to the combination of move semantics and the pimpl idiom. A pure C++11 implementation gives us better control over when to use the pimpl idiom and when not to. With Qt, we always have to use it, no matter whether it yields a speed-up or not.
Conclusion
Move semantics gives us a speed-up over copy semantics when the compiler can replace expensive copy operations by cheaper move operations. So, combining move semantics with the pimpl idiom should be a great fit, as the pimpl idiom replaces expensive deep copies of big objects by much cheaper shallow copies of pointers to these big objects. Our results corroborate this. Compared with C++98/STL/Copy, C++11/STL/Move/Pimpl is faster by a factor of 2.319 for the ShuffleAndSort benchmark, just by using move semantics and the pimpl idiom. Using the pimpl idiom doesn’t come for free, because we must create the pointed-to object dynamically on the heap in an extra step. The PushBack benchmark shows that, in the same comparison, using pimpl can slow things down by a factor of 1.005.
The ShuffleAndSort benchmark is sort of a best case for the pimpl idiom, because almost all operations are move operations (shuffling and sorting). The PushBack benchmark is pretty much the opposite: each of its moves comes with a call of the name constructor, which creates a new object (with pimpl, on the heap) and dominates the execution time, so there are hardly any expensive copies for moves to replace. Real code falls between these two extremes, but with a clear tendency to be closer to the ShuffleAndSort extreme. For such code, we’ll see a speed-up, because the gain from moving instead of copying is much bigger than the slow-down caused by the pimpl overhead.
This reasoning most likely made it easy for the Qt developers to use the pimpl idiom for every non-trivial Qt class. Using the pimpl idiom yields a runtime speed-up most of the time, in addition to providing a stable interface (binary compatibility!) and faster builds. Qt is considerably faster (factor: ~1.25) than pure C++ when move semantics is not available for a class. So, Qt is a good choice for all pre-C++11 compilers (e.g., C++98, C++03). This advantage melts away once move semantics and the pimpl idiom enter the picture with C++11. Even for the best-case scenario of the ShuffleAndSort benchmark, Qt is only marginally faster than pure C++ (factor: ~1.01). This slight advantage may easily be eaten up by the pimpl overhead, where Qt is considerably slower than pure C++ (factor: ~1.17).
The take-away from this post is: in most cases, we’ll see a speed-up from combining C++11’s move semantics with the pimpl idiom. C++11’s new std::unique_ptr makes it easy to implement the pimpl idiom. Using Qt classes instead of their STL counterparts (e.g., QVector and QString instead of std::vector and std::string) doesn’t give us any advantage over the combination of move semantics and the pimpl idiom. Qt may even be at a slight disadvantage, because our code incurs the pimpl overhead with every occurrence of a Qt class and not only when we explicitly decide to use the pimpl idiom.
The (forward declaration of) struct Impl should be made public to facilitate deriving from it. A polymorphic implementation is a truly powerful idiom!
Of course the m_impl pointer itself should remain private.
Risto, can you give any example of why that would be useful?