arr = [5, 4, 3, 2, 1]
This means an array of length n will be sorted in n passes over the array. It also means we can determine how many numbers are already in their correct positions by keeping track of how many passes we've completed. Knowing this, we don't need to go over the entire array each time: each pass needs to cover not all n elements, but n minus the number of passes we've already made, because all of the elements after that point are already sorted. So, the algorithm can be broken down into the following two steps:
[5, 4, 3, 2, 1]
We have 5 numbers, so we run the first (outer) loop 5 times.
JavaScript:
var arr = [5, 4, 3, 2, 1]
const bubble_sort = function (unsorted) {
  for (var i = 0; i < unsorted.length; i++) {
    // Do the second loop
  }
}
Python:
arr = [5, 4, 3, 2, 1]
def bubble_sort(unsorted):
    for i in range(len(unsorted)):
        pass  # Do the second loop
Setting up the next loop requires us to think for a moment about where we need to stop in the array before completing. If we have 5 items, the last item is located at arr[4]. The second loop has us comparing the number at the current index to the one at the next index, so we have to stop before we reach the end (trying to access the nonexistent element at arr[5] will cause problems). In order to compare all of the items, we need to stop the loop at the index before the last item we're going to touch. We also know that the first time we run the loop (when i=0), none of the items are guaranteed to be sorted, meaning we have to sweep through them all. The second time we run the loop (i=1), the very last item is already sorted, so we can skip it. Therefore, the inner loop needs to run n-(1+i) times.
JavaScript:
var arr = [5, 4, 3, 2, 1]
const bubble_sort = function (unsorted) {
  for (var i = 0; i < unsorted.length; i++) {
    for (var j = 0; j < unsorted.length - (1 + i); j++) {
      // compare this number to the next one
    }
  }
}
Python:
arr = [5, 4, 3, 2, 1]
def bubble_sort(unsorted):
    for i in range(len(unsorted)):
        for j in range(len(unsorted) - (1 + i)):
            pass  # Do comparisons
Now, all we need to do is compare the left and right items at each j index, and swap them when appropriate. This can be done effectively in one line in Python, but JavaScript will require us to store one of the values temporarily in a variable, copy one value into the other, then copy the stored value back. Our final basic functions look like this:
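For reference, here is a sketch of what the completed basic nested-loop version looks like in Python, with the comparison and swap filled into the loops we set up above:

```python
arr = [5, 4, 3, 2, 1]

def bubble_sort(unsorted):
    # Outer loop: one pass per element in the array
    for i in range(len(unsorted)):
        # Inner loop: stop short of the sorted tail, n-(1+i) comparisons
        for j in range(len(unsorted) - (1 + i)):
            # Swap whenever the left item is bigger than its right neighbour
            if unsorted[j] > unsorted[j + 1]:
                unsorted[j], unsorted[j + 1] = unsorted[j + 1], unsorted[j]
    return unsorted

print(bubble_sort(arr))  # [1, 2, 3, 4, 5]
```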
def bubble_sort(unsorted):
    swapped = False
    for i in range(len(unsorted) - 1):
        if unsorted[i] > unsorted[i + 1]:
            unsorted[i], unsorted[i + 1] = unsorted[i + 1], unsorted[i]
            swapped = True
    if swapped:
        return bubble_sort(unsorted)
    else:
        return unsorted
This works in a very similar way, just without the nested loops: it iterates over the entire array repeatedly until it is sorted. The advantage of doing this is that each pass doubles as a check of whether the array is already sorted, so it can stop as soon as it is, cutting down on the number of steps taken. One of the disadvantages is that it doesn't keep track of how many times it has iterated through the array, so it doesn't know how many items are already in the right place and can be skipped; this compounds over time into a large number of unnecessary comparisons. (Another disadvantage we won't dwell on too much is the recursive nature of an O(n^2) function, which can quickly grow the call stack too large and cause the program to hang or crash.)
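To make that last disadvantage concrete, here's a small sketch (assuming CPython's default recursion limit of roughly 1000): each full pass over the array adds a stack frame, so handing the recursive version a long reversed list exhausts the stack long before the sort finishes.

```python
def bubble_sort(unsorted):
    # Same recursive version as above
    swapped = False
    for i in range(len(unsorted) - 1):
        if unsorted[i] > unsorted[i + 1]:
            unsorted[i], unsorted[i + 1] = unsorted[i + 1], unsorted[i]
            swapped = True
    if swapped:
        return bubble_sort(unsorted)  # one extra stack frame per pass
    return unsorted

# A reversed list of length n needs about n passes, hence about n stacked
# calls, which blows past the default recursion limit.
try:
    bubble_sort(list(range(3000, 0, -1)))
except RecursionError:
    print("recursion limit exceeded")
```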
Optimizing it (although almost certainly an exercise in futility, as we'll observe below) can then be a simple matter of integrating the missing advantage from one of the above functions into the other. If we start with the function with the nested loops, we can keep track of each pass, and if at any point we complete a pass without swapping any numbers, we know we have sorted the entire array and can return it immediately. Thus, our first completed (Python) function, optimized with this method, looks like this:

arr = [5, 4, 3, 2, 1]
def bubble_sort(unsorted):
    for i in range(len(unsorted)):
        swapped = False
        for j in range(len(unsorted) - (1 + i)):
            if unsorted[j] > unsorted[j + 1]:
                unsorted[j], unsorted[j + 1] = unsorted[j + 1], unsorted[j]
                swapped = True
        if not swapped:
            return unsorted
    return unsorted
Now, optimizing the recursive bubble sort merely requires us to keep track of how many times it's been called, which can easily be done by adding a parameter that gets passed to the function and incremented by 1 on each call. Then, rather than using range(len(unsorted)-1), we can turn the subtracted 1 into (1 + the number of sorted items). The second function, optimized with the advantages of the first, looks like this:

def bubble_sort(unsorted, last_sorted):
    swapped = False
    for i in range(len(unsorted) - (1 + last_sorted)):
        if unsorted[i] > unsorted[i + 1]:
            unsorted[i], unsorted[i + 1] = unsorted[i + 1], unsorted[i]
            swapped = True
    if swapped:
        return bubble_sort(unsorted, last_sorted + 1)
    else:
        return unsorted

bubble_sort(arr, 0)
So what is the time complexity of the Bubble Sort algorithm, and why have I been referencing how bad it is? Since it's usually described as being O(n^2), that means for an array of 8 items we have to perform 64 steps, and for an array of 9 we'd have to perform 81 steps, right? Well, actually, no. So if not, how many steps does it take?
Stack Overflow seems to have multiple similar-looking yet different equations as answers, why do some say n*(n-1), and others n*(n-1)/2 ? And, knowing this, why do we say it takes O(n^2) time?
Let's write out the most inefficient way of implementing this algorithm, in order to start exploring the answers to these questions.

def bad_bubble_sort(unsorted):
    for i in range(len(unsorted)):
        for j in range(len(unsorted) - 1):
            if unsorted[j] > unsorted[j + 1]:
                unsorted[j], unsorted[j + 1] = unsorted[j + 1], unsorted[j]
    return unsorted
This implementation does not check whether the array is sorted, nor does it know how many numbers have already been sorted into place. This means it runs the maximum number of times it can, by design. The outer loop needs to sort n items, so it runs n times. In the inner loop, called by the outer one, each step takes the item currently pointed to and compares it to the next one.
Every element except the last forms a pair with the item after it, which gives us n-1 pairs to compare in any array of length n. Thus, the performance of this worst implementation is accurately calculated as n*(n-1) comparisons. And this is close to n^2, but not quite exactly it.
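We can check that n*(n-1) count empirically by instrumenting the same loops with a comparison counter (count_comparisons is a helper name invented here for illustration):

```python
def count_comparisons(unsorted):
    # The same two loops as bad_bubble_sort, tallying every comparison made
    comparisons = 0
    for i in range(len(unsorted)):
        for j in range(len(unsorted) - 1):
            comparisons += 1
            if unsorted[j] > unsorted[j + 1]:
                unsorted[j], unsorted[j + 1] = unsorted[j + 1], unsorted[j]
    return comparisons

print(count_comparisons([5, 4, 3, 2, 1]))         # 20, i.e. 5 * 4
print(count_comparisons(list(range(10, 0, -1))))  # 90, i.e. 10 * 9
```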
So what about the first version that we made, which keeps track of the number of sorted items at the end? After all, the outer loop stays the same, but the inner loop gets smaller! To calculate that, we need to do a bit more math. For an array of length n, we know the first pass through the entire array performs n-1 comparisons. The next pass does n-2, then n-3... until we get down to 1.
So if we have 10 items, we do 9 comparisons, then 8, and so on: 9+8+7+6+5+4+3+2+1 = 45. This series sum can be described mathematically as (n*(n-1))/2. Plugging 10 in there, we get 10*(10-1)/2 = 10*9/2 = 90/2 = 45.
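That identity is easy to sanity-check with a couple of lines of Python:

```python
n = 10
# Add up the comparisons from each pass: (n-1) + (n-2) + ... + 1
total = sum(range(1, n))
print(total)             # 45
print(n * (n - 1) // 2)  # 45, matching the closed form (n*(n-1))/2
```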
With these two functions, we can now compare the results they yield as the size of n grows larger, which is what Big-O notation is all about. The terribly-implemented bad_bubble_sort() function, with its actual complexity of n*(n-1), would need to do 90 comparisons if passed a 10-item array. If we passed it 20 items, it would need to do 380 comparisons; if we passed it 5 items, 20 comparisons. In fact, it's easy to see that going from 5 items to 10, doubling the input, produces 4.5 times the output, and going from 10 items to 20 produces 4.22 times the output. As we keep doubling the array size, the ratio keeps approaching 4 while always staying slightly above it. The regular bubble_sort() function behaves the same way. With its calculated complexity of (n*(n-1))/2, we can plug in 5 items and get 10, or plug in 10 items and get 45. Once again, jumping from 5 items to 10 increases the output 4.5 times. This means we can observe both functions increasing their output in exactly the same way, proportionally. So despite being different functions, their complexity grows at the exact same rate as the size of the input is increased.
We can also observe that as the input size for n*(n-1) keeps being doubled, the output tends to move closer towards quadrupling; or, more generally, as n increases by a factor of m, the output tends towards increasing by a factor of m^2. We can easily verify this with large values: plugging in 1,000 gives us an output of 999,000. Tripling the input should create a nearly 9-fold increase in output, and indeed plugging in 3,000 gives us 8,997,000, an increase of just over 9 times.
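The same check can be scripted; here's a quick sketch confirming that tripling n multiplies the n*(n-1) step count by just over 9:

```python
def steps(n):
    # Exact comparison count for the worst implementation
    return n * (n - 1)

print(steps(1000))                # 999000
print(steps(3000))                # 8997000
print(steps(3000) / steps(1000))  # 9.006..., just over 3 squared
```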
It is because of this that we say Bubble Sort's complexity is O(n^2) when discussing its performance. We usually aren't interested in calculating the exact number of steps the algorithm needs to take; we want to be able to explain how the runtime grows as the size of the input grows, and that is O(n^2).
While we've already talked about average and worst-case complexities, I'd like to briefly touch on the best case. The absolute best case is O(n), with an exact count of n-1 comparisons. The reason for this lies in the optimization tactic that keeps track of whether or not we've performed any swaps: if we can get through the entire array without making a single swap, we know we have a sorted array.
If passed an already-sorted array, and we've implemented this optimization, then only one pass has to be made over the array (making n-1 comparisons). Of course, how long that pass takes is determined entirely by the size of n, and thus we say its best-case performance is O(n). However, given the average and worst cases observed above, a bubble sort is not a good idea for lists of any appreciable size, and works best on lists expected to be already mostly sorted.
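As a quick sketch of that best case (using a hypothetical bubble_sort_counting variant, invented here, that also reports its comparison count), an already-sorted 5-item list triggers exactly one pass of n-1 = 4 comparisons:

```python
def bubble_sort_counting(unsorted):
    # Optimized bubble sort that also reports how many comparisons it made
    comparisons = 0
    for i in range(len(unsorted)):
        swapped = False
        for j in range(len(unsorted) - (1 + i)):
            comparisons += 1
            if unsorted[j] > unsorted[j + 1]:
                unsorted[j], unsorted[j + 1] = unsorted[j + 1], unsorted[j]
                swapped = True
        if not swapped:
            break  # no swaps this pass: the array is sorted, stop early
    return unsorted, comparisons

print(bubble_sort_counting([1, 2, 3, 4, 5]))  # ([1, 2, 3, 4, 5], 4)
```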