Tuesday, 28 July 2015

The difference between Efficiency and Performance - It's not all about the Big O

As a software developer you should understand how to make the best use of your hardware in order to give the user the best experience possible with your program.  In order to do this you need to know how to write efficient code and how to write performant code.

(I was watching this cppCon video given by Chandler Carruth which helped crystallise these concepts for me.)

Efficiency - the amount of work you need to do.
Performance -  how fast you can do that work

Efficiency - governed by your algorithm
Performance - governed by your data structures.

Efficiency and performance are not dependant on one another.

It follows you can be efficient without being performant, you do very little work to get the job done but do it slowly e.g. You are a snail and need to travel 10m from point X to Y. You take the most direct route but slither along extremely slowly.

It also follows that you can be performant without being efficient, you do a lot of work to get the the job done but do it very fast e.g. You are superman and need travel 10m from X to Y. You fly right round the globe to get there but go very fast.

What we are of course aiming for is to be efficient and performant at the same time!

The efficiency of our algorithms are usually measured by Big O notation which measures how an algorithm scales with input size. We are taught in computer science classes to aim for the Holy Grail of O(1) and get scared when we see O(n2) or worse.

What we focus on less is how the algorithm actually performs.  One of the biggest issues contributing to performance is how data is stored in the data structure. Generally speaking we want data that is used together to be stored close together in memory so that we don't end up with CPU cache misses.  (See The First Rule of Performance Optimisation where I touch on this issue). If we access data in a predictable manner the hardware will pre-fetch data for us into the cache which can be a huge time saving over randomly accessing data from all over memory and pulling it into the caches on a 'as it is needed' basis.

One of the worst culprits is the good old LinkedList.  It has O(n) for access and search and coupled with O(1) for insertion and deletion, might be considered a very good data structure to use in a program.  Whilst of course I'm not writing off this important data structure, the problem is that the linked nodes are not held together in memory and it is therefore not the most performant data structure to traverse as you cannot help but end up with many expensive cache misses.

Perhaps the best example of the difference and trade-off between efficiency and performance is looking at sort algorithms. 

We are taught that bubble sort is to be avoided at all costs. After all it is O(n^2).  We should rather go for binary sort which comes in at much more acceptable O(n log(n)).

However, to be efficient you sometimes have to pay an overhead penalty which slows you down in the short term. For example if you had to edit a bunch of files you might do them manually, one by one. Alternatively you could spend the time creating a script to do the edits.  At some point the extra work of editing manually would be slower than to take the hit of creating the script. 

Bubble sort is performant, it does its work very quickly, it doesn't pay the penalty that binary sort pays to be more efficient for larger volumes.  So as long as we keep the numbers low (below about 60 items) bubble sort will outperform any other type of search.

In fact if you look at the implementation of java.util.Array.sort() you will find in the class java.util.DualPivotQuicksort a number of sorting strategies which are triggered at different thresholds. Again efficiency vs performance.

/** * If the length of an array to be sorted is less than this * constant, Quicksort is used in preference to merge sort. */private static final int QUICKSORT_THRESHOLD = 286;

/** * If the length of an array to be sorted is less than this * constant, insertion sort is used in preference to Quicksort. */private static final int INSERTION_SORT_THRESHOLD = 47;

/** * If the length of a byte array to be sorted is greater than this * constant, counting sort is used in preference to insertion sort. */private static final int COUNTING_SORT_THRESHOLD_FOR_BYTE = 29;

I wrote a quick JMH benchmark which demonstrates this nicely.

With 5000 elements binary sort with its greater efficiency is much faster:

Benchmark                     Mode  Cnt    Score    Error  Units
Sorter.binarySearchForValue  thrpt    5  915.362 ± 51.438  ops/s
Sorter.bubbleSort            thrpt    5  171.021 ±  9.818  ops/s

With 50 elements bubble sort with it's greater performance is faster:

Benchmark                     Mode  Cnt       Score      Error  Units
Sorter.binarySearchForValue  thrpt    5   90011.861 ± 2051.384  ops/s
Sorter.bubbleSort            thrpt    5  114931.583 ± 6423.543  ops/s


Whilst Big O notation is incredibly useful for predicting the efficiency (amount of work) of an algorithm, we must also pay attention to the performance of the data structures on which those algorithms operate.  In certain situations a less efficient but more performant algorithm may lead to greater speed overall. 

No comments:

Post a Comment