Sunday 30 November 2014

Complexity and Big-O

Previously, we talked about algorithm run-time, but not in detail about how it is determined. Run-time is stated as a function relating the size of the input to the number of steps the algorithm takes. How do we describe that complexity? We use Big-O notation.

Best and Worst

In terms of complexity, we must first account for best and worst cases. The best case is the one where the algorithm takes the minimum running time to complete on an input. For example, in linear search, if the first item in the list is the item we are searching for, we have a best case. The worst case, on the other hand, is when the item we are searching for isn't in the list at all, so the algorithm takes the maximum running time to complete. We typically care about the worst case, because that is what causes algorithms to take a long time to run in the real world. Although it seems pessimistic to look at the worst case all the time, it happens often enough to matter.
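
To make this concrete, here is a minimal sketch of linear search in Python (the function name and sample list are just for illustration):

    def linear_search(lst, target):
        """Return the index of target in lst, or -1 if it is not present."""
        for i, item in enumerate(lst):
            if item == target:  # found it, stop immediately
                return i
        return -1  # scanned the entire list without finding it

    data = [7, 3, 9, 5, 1]
    linear_search(data, 7)  # best case: target is first, only 1 comparison
    linear_search(data, 4)  # worst case: target absent, all n items checked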

There is also an average case, which is the expected running time over random inputs drawn from some probability distribution. This is rather difficult to determine, because it depends on the choice of distribution, and it is often unclear what distribution real inputs actually follow.
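
As a rough sketch, if we assume the target is equally likely to be at any of the n positions, linear search averages about (n + 1) / 2 comparisons; a quick simulation (everything here is made up for illustration) agrees:

    import random

    def comparisons_used(lst, target):
        """Count the comparisons linear search makes before stopping."""
        count = 0
        for item in lst:
            count += 1
            if item == target:
                break
        return count

    n = 1000
    data = list(range(n))
    trials = [comparisons_used(data, random.choice(data)) for _ in range(10000)]
    print(sum(trials) / len(trials))  # hovers around (n + 1) / 2 = 500.5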

Big-O notation

What is this Big-O notation that we have been hearing about? When relating complexity and algorithm run-time, what we really care about is GROWTH. How does the run-time grow as the input gets bigger? Big-O notation is a method for characterizing that growth. Let's break it down.
  • running time is proportional to the number of steps as a function of input size (n). 
  • what we really care about is how this number GROWS as a function of n
  • we don't really care about the absolute number of steps, so 2000 and 2004 are the same.
  • if we compare two functions 2n² and 100n², we only care about the n²
  • so constant factors don't matter
  • what about 2n² + n and n²? we only care about the n² term, because when n is really big, the leftover n is negligible compared to 2n²
  • think about it like this: as input grows, do you care if an algorithm takes 100 years or 200 years to run? Not really... both are too long...
  • we end up using a model of asymptotic growth, which describes how the complexity behaves as the input size grows toward infinity
  • typically done using O(f(n))
So, if we see O(n), it means that the algorithm's running time grows no faster than n. More specifically, if we see f(n) in O(n²), it means that the function f(n) is upper-bounded by n², up to a constant factor, for large enough n.
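
One way to convince yourself that constants and lower-order terms drop out is to watch the ratio between two step counts as n grows; a small sketch:

    def f(n):
        return 2 * n ** 2 + n  # the "messy" step count

    def g(n):
        return n ** 2  # the simple bound

    for n in (10, 100, 1000, 10 ** 6):
        print(n, f(n) / g(n))
    # the ratio settles near the constant 2, so f(n) grows no faster
    # than a constant times n², i.e. f(n) is in O(n²)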

There are many classes of functions, such as linear, logarithmic, and exponential. Linear and logarithmic functions grow at a rate that is manageable in practice. If an algorithm's complexity grows faster than these, say exponentially, its run-time blows up very quickly. In real life, we don't want that!
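
To get a feel for how differently these classes grow, here is a small sketch tabulating a few of them (raw values, not measured running times):

    import math

    print(f"{'n':>4} {'log n':>8} {'n log n':>10} {'n^2':>8} {'2^n':>16}")
    for n in (10, 20, 30, 40):
        print(f"{n:>4} {math.log2(n):>8.2f} {n * math.log2(n):>10.1f} "
              f"{n ** 2:>8} {2 ** n:>16}")

Even at n = 40, the exponential column is already over a trillion, while n² is only 1600.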
