Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Bubble Sort: An Archaeological Algorithmic Analysis (duke.edu)
64 points by DyslexicAtheist on Feb 5, 2019 | hide | past | favorite | 19 comments


I have an amusing archaeological anecdote about the bubble sort.

Around 1990, on my first real job out of college, I was assigned about 100K lines of C code to maintain. I was told that it had problems scaling up to larger networks (it was a network mapping system).

While going through the code, I discovered that it contained a bubble sort that had been commented out. The comments said that it had been replaced with hand-rolled sorting logic in 68K assembly for "additional speed."

So I found the assembly code for the new sorting logic. It was also a bubble sort!

In disbelief, I replaced all this nonsense with a call to the standard library's `qsort` function. The scaling problems disappeared :-)


Saved this quote to show to my future data structures students. Thanks.


I remember when I first took a CS class, I spent an hour in the library attempting to bubble-sort microfiches, much to the annoyance of my supervising librarian, who recommended a radix sort in very strong terms. Unfortunately I had more trust in my incorrect understanding of my CS textbook than I had in her, though she was correct.

The article unfortunately omits actual code for insertion sort and selection sort, except the "jump-down" variant of selection sort it mentions early on.

Bentley gives the "simplest insertion sort" as follows (translated to modern C; the version in the book is pseudocode):

    for (int i = 1; i < n; i++)
        for (int j = i; j > 0 && x[j-1] > x[j]; j--)
            swap(j-1, j);
I think that variant is only faster than bubble sort because it stops iterating once the newly inserted item has reached its final resting place, on average halfway through the sorted part of the list, so it only does about n/4 comparisons and swaps per outer-loop iteration, rather than bubble sort's n or n/2. A better approach — usually the fastest sort for short arrays — stores the element being inserted in an auxiliary variable until its final resting place is found, thus avoiding almost half the work of swapping. Bentley reports from the previous millennium that this is a speedup of about a factor of four:

    for (int i = 1; i < n; i++) {
        data t = x[i];
        int j;
        for (j = i; j > 0 && x[j-1] > t; j--)
            x[j] = x[j-1];
        x[j] = t;
    }
A similar optimization is available for selection sort, which is otherwise equivalent to bubble sort, necessarily examining as it does almost n/2 elements on average to produce each output element.

There's an even simpler single-loop sort algorithm (I forget what it's called) which is unfortunately O(N³):

    for (int i = 1; i < n; i++)
        if (x[i-1] > x[i])
            swap(i-1, i), i = 1;
That won't work in Python, of course, but opinions may vary on whether we should be sorry or glad about that.


>Bentley gives the "simplest insertion sort" as follows

Is that from one of his books (e.g. Programming Pearls, More Programming Pearls, Writing Efficient Programs), or from some paper he published? I've read parts of the first two books and most of the third book, and all are great.

Shaker sort is an interesting variation of bubble sort (and maybe also selection sort, according to its Wikipedia page) that sorts in both directions.

https://en.wikipedia.org/wiki/Cocktail_shaker_sort

Only slightly more complex than bubble sort, which is itself simple.


Sorry, I meant Programming Pearls, 2nd Edition, Column 11 (Sorting), §11.1.

Shaker sort is potentially interesting for cases where you're adding a few out-of-order items to an otherwise sorted set, but I think insertion sort still beats it even in that case.


Thanks. Yes, I didn't mean that shaker sort is good for speed, just interesting for the approach.

Also, bubble sort, though one of the slowest, has this property:

https://en.wikipedia.org/wiki/Bubble_sort#Use

[ In computer graphics bubble sort is popular for its capability to detect a very small error (like swap of just two elements) in almost-sorted arrays and fix it with just linear complexity (2n). ]


Insertion sort has that capability too, without bubble sort's drawbacks.


Interesting, didn't know. Will look again at its Wikipedia page. Had looked recently, may not have noticed that.

Apropos of insertion sort, I think I read (it was years ago) in the Jon Bentley book (which I think I mentioned upthread), called Writing Efficient Programs, that he said insertion sort is used by some people in quicksort, as the alternative algorithm, when the sizes of the subarrays remaining to be sorted (recursively) are down a small size, in which case insertion sort has less overhead than quicksort, so is faster for those small chunks. (Or it could have been selection sort (instead of insertion sort.) In other words, a hybrid sort algorithm.


It's actually slightly cleverer than that.

When quicksort recurses into a subarray that is less than the critical size, (call it k) it does not switch to a simpler O(n^2) sort: it simply returns, leaving the subarray unsorted. The result is that the final array is mostly sorted, where no element is more than k distance from its true position. Once the array is mostly sorted, it then runs insertion sort, which is fast on mostly sorted arrays. Despite the fact that in the general case insertion sort is O(n^2), it is O(nk) on this mostly sorted array.

Selection sort performs no better on mostly sorted arrays than insertion sort. Which is why insertion sort is used instead of selection sort.


Wow, that is clever. Not doing more than you need to - for the same effect - kind of thing. Thanks for sharing.

I've been playing around with various simple algorithms for different tasks, over a period, and been noticing that there is a lot of scope for improving efficiency, even if by small amounts, sometimes by a bit more, by making various tweaks to the logic, often including early termination techniques.

A well-known example of early termination is: to find if a number n is prime, instead of checking whether it is divisible by all the integers from 2 to n - 1, you can just check for those up to the square root of n. (There are more optimizations on those lines possible, of course.)

I've found that there can be many quick wins in this area, often in various algorithms that process arrays. For example, problems involving arrays on Rosetta Code.


> I think that variant is only faster than bubble sort because it stops iterating once the newly inserted item has reached its final resting place,

It's actually faster because of a subtler reason: branch prediction.

In insertion sort, the x[j-1] > x[j] comparison is almost always true. Insertion sort performs that comparison O(n^2) times, and it is false only n-1 times. So the branch predictor quickly identifies that it should just assume the comparison is true.


I just realized the dumbsort algorithm at the bottom has an error; it should read:

    for (int i = 1; i < n; i++)
        if (x[i-1] > x[i])
            swap(i-1, i), i = 0;


How can a single-loop algorithm be O(N^3)?


Try it.


I discovered sorting algorithms (including bubblesort) from SORTDEMO.BAS -- https://www.youtube.com/watch?v=leNaS9eJWqo -- which came with QBasic.


I remember it's the first sorting algorithm taught in my high school computer science class.


It probably was in mine too, for sure it's the one I remembered the most. I also remember trying Shell sort but the only thing I remember about that one is the name. Merge sort is the one I've gotten the most mileage out of.


I swear if <insert language> C's <insert relevant language library> stdlib didn't come with a <insert Quicksort function and or method> qsort() I would have shipped a lot more code with a hand-rolled bubble sort implementation of my own lazy design....





Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: