Wednesday 3 December 2014

Induction

Finally, we are introduced to Induction in this class. I don't really understand why we were not introduced to induction when we were first doing proofs, given that it is so useful. Induction seems so simple and straightforward that its power is sometimes underestimated at first; I was certainly guilty of that. However, upon closer examination, I realized that induction actually justifies some of the assumptions in our previous proofs. For example, one of our first proofs was "for all natural numbers n, if n is odd, then n^2 is odd." We proved that by first assuming that n is a natural number and n is odd. How do we know that "if n is odd then n^2 is odd" is true for all natural numbers? Do we plug every natural number in and check the result? We could do that until the end of time and still not reach a conclusion. Therefore, we need a more powerful logical tool to help us out.

Induction is really an axiom, and like all axioms, it is self-evident. Let's break it down.

  • let P(n) be a predicate
  • if P(0) is true and for all natural numbers n, (P(n)=>P(n+1)) is true
  • then for all natural numbers n, P(n) is true!
So if P(0) is true, and P(0) => P(1) is true, then P(1) is true; and since P(1) => P(2) is true, P(2) is true; and since P(2) => P(3) is true, P(3) is true... and so on for every natural number.

Now let's prove the following proposition: for all natural numbers n, 0 + 1 + 2 + ... + n = n(n+1)/2.
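Here is a sketch of the standard inductive argument for this formula (written in LaTeX-style notation):

    P(n): \sum_{i=0}^{n} i = \frac{n(n+1)}{2}

    \text{Base case: } P(0) \text{ holds, since } \sum_{i=0}^{0} i = 0 = \frac{0(0+1)}{2}.

    \text{Inductive step: assume } P(n). \text{ Then }
    \sum_{i=0}^{n+1} i = \frac{n(n+1)}{2} + (n+1) = \frac{(n+1)(n+2)}{2},
    \text{ which is exactly } P(n+1).

    \text{By induction, } P(n) \text{ holds for all natural numbers } n.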
It's interesting to note that I didn't really introduce anything new here... We were already given the equation 0 + 1 + ... + n = n(n+1)/2, and we simply proved that it's true. However, induction did not give us that equation, nor did it give us the reasoning behind it. Is that satisfying?

Visualizing Big-O

When first introduced to the formal definitions of Big-O, Big-Omega, and Big-Theta, it can be a little daunting to absorb everything and understand them. I admit that I was overwhelmed by all of the information. So how does one compartmentalize these definitions and use them for algorithm analysis and proofs?

I don't think there is only one way; different people learn through different techniques. What works for me is to visualize the definitions as graphs rather than focus on the symbols, since they express the same information anyway. For Big-O, I imagine two curves that grow at different rates, and a point on the x-axis as the 'breakpoint', such that beyond that point one curve always stays below the other. It stands to reason that the lower curve is in the Big-O of the faster-growing curve.
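To make this concrete, here is a tiny numerical sketch (with two made-up example curves, f(n) = 100n and g(n) = n^2) that walks along the x-axis until it finds the breakpoint:

    # A tiny sketch of the 'breakpoint' idea, using two made-up curves:
    # f(n) = 100n (slower-growing) and g(n) = n^2 (faster-growing).

    def f(n):
        return 100 * n

    def g(n):
        return n * n

    n = 1
    while g(n) < f(n):              # walk right along the x-axis until g catches up
        n += 1
    print("breakpoint at n =", n)   # beyond this point, g(n) stays at or above f(n)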

Now that we understand Big-O, the scaffold is set, and we can easily understand Big-Omega with just one minor alteration. Instead of saying that the slower curve is in the Big-O of the faster one, we say that the faster curve is in the Big-Omega of the slower one; the slower curve effectively becomes a lower bound. Finally, Big-Theta is simply the conjunction of Big-O and Big-Omega: a function in Big-Theta of another is bounded both above and below by constant multiples of it.
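For reference, here are the symbolic definitions behind those pictures (c is a positive constant and B is the breakpoint on the x-axis):

    f \in O(g) \iff \exists c \in \mathbb{R}^{+},\ \exists B \in \mathbb{N},\ \forall n \geq B,\ f(n) \leq c \cdot g(n)

    f \in \Omega(g) \iff \exists c \in \mathbb{R}^{+},\ \exists B \in \mathbb{N},\ \forall n \geq B,\ f(n) \geq c \cdot g(n)

    f \in \Theta(g) \iff f \in O(g) \wedge f \in \Omega(g)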

As mentioned before, we use Big-O notation to help us describe the complexity of an algorithm. In practice, we want algorithms that run in O(n) or O(log n), and we try to avoid O(n^2) or O(n^3), because that run-time grows far too quickly as n gets large.

Sunday 30 November 2014

Complexity and Big-O

Previously, we just talked about algorithm run-time, but we did not talk in detail about how run-time is determined. Run-time is stated as a function relating the size of the input to the number of steps the algorithm takes, and how that function grows is the algorithm's complexity. How do we describe complexity? We use Big-O notation.

Best and Worst

In terms of complexity, we must first account for best and worst cases. The best case is the one where the algorithm takes the minimum running time to complete over inputs of a given size. For example, in linear search, if the first item in the input is the item we are searching for, then we have a best case. The worst case, on the other hand, is when the item we are searching for isn't even in the list, in which case the algorithm takes the maximum running time to complete. We typically care about the worst case, because that is what causes algorithms to take a long time to run in the real world. Although it seems pessimistic to look at the worst case all the time, we should be aware that it happens often.
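As a sketch of the linear search example (my own quick version, not code from the course), with the best and worst cases marked:

    def linear_search(lst, target):
        """Return the index of target in lst, or -1 if it is not there."""
        for i, item in enumerate(lst):
            if item == target:
                return i            # found it; stop early
        return -1

    data = [7, 3, 9, 4, 1]
    print(linear_search(data, 7))   # best case: target is the first item, one step
    print(linear_search(data, 8))   # worst case: target is absent, we scan all n items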

There is also an average case: the expected running time over random inputs drawn from some probability distribution. This is rather difficult to determine, because it depends on which distribution you choose, and that choice is itself a judgment call.

Big-O notation

What is this Big-O notation that we have been hearing about? When relating complexity and algorithm run-time, what we really care about is GROWTH. How does the run-time grow with the size of the input? Big-O notation is a method for characterizing that growth. Let's break it down.
  • running time is proportional to the number of steps as a function of input size (n)
  • what we really care about is how this number GROWS as a function of n
  • we don't really care about the absolute number of steps, so 2000 and 2004 are the same
  • if we compare two functions 2n^2 and 100n^2, we only care about the n^2
  • so constant factors don't matter
  • what about 2n^2 + n and n^2? we still only care about the n^2, because when n is really big the lone n term is negligible compared to n^2
  • think about it like this: as input grows, do you care if an algorithm takes 100 years or 200 years to run? Not really... both are too long...
  • we end up using a model of asymptotic growth, which deals with how the complexity grows as the input size goes to its limit
  • this is typically written O(f(n))
So, if we see O(n), it means the running time grows no faster than n, up to a constant factor. More specifically, if we see that f(n) is in O(n^2), it means that f(n) is upper-bounded by n^2, again up to a constant factor and beyond some breakpoint.
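As a worked example of why the dropped terms don't matter, here is 2n^2 + n bounded by n^2 with the witnesses c = 3 and B = 1:

    \forall n \geq 1:\quad 2n^{2} + n \;\leq\; 2n^{2} + n^{2} \;=\; 3n^{2},
    \quad\text{so}\quad 2n^{2} + n \in O(n^{2}) \text{ with } c = 3,\ B = 1.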

There are many classes of functions, such as linear, logarithmic, and exponential. Linear and logarithmic functions grow at a rate that is reasonable in practice. If an algorithm's complexity grows faster than these, for instance exponentially, then its run-time blows up very quickly. In real life, we don't want that!
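Here is a rough illustration of how quickly these classes pull apart, just printing a few sample values:

    import math

    # Compare how different growth rates scale as the input size n increases.
    print(f"{'n':>8} {'log2(n)':>10} {'n':>10} {'n^2':>12} {'2^n':>15}")
    for n in (10, 20, 30, 40):
        print(f"{n:>8} {math.log2(n):>10.1f} {n:>10} {n**2:>12} {2**n:>15}")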

Algorithm Run-time

From this point on, the course switches gears a little bit and begins to discuss algorithms, efficiency, and complexity. First, let's talk a little bit about efficiency and orders of growth.

Although it may sound intuitive, when first thinking about efficiency one might ask "why is efficiency important?"
    The answer is that efficiency determines how long our programs take to run, which is called run-time.

Run-time is not to be confused with the time in seconds an algorithm takes to run. For instance, if one algorithm took 2 seconds to run and another took 0.5 seconds, can you say that the second is faster than the first? No, because I did not specify many factors that need to be taken into account. For one, we need to think about the speed of the processor, as well as the implementation of Python. It also depends on the amount of input data; for extremely large data sets, some algorithms might take a millennium to run!

More importantly, efficiency is about algorithms, not about coding structure or clever coding.

So the way we normalize across these factors is to talk about efficiency in terms of the number of steps an algorithm takes to produce a result from an input. A step is both a physical thing and an abstraction: we assume each step takes constant time.
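Here is a rough sketch of counting steps rather than seconds (the exact step-accounting convention below is my own):

    def sum_list(lst):
        """Sum a list while counting the constant-time steps taken."""
        steps = 0
        total = 0
        steps += 1                 # one step for the initialisation
        for item in lst:
            total += item
            steps += 1             # one step per addition
        steps += 1                 # one step for the return
        return total, steps

    for size in (10, 100, 1000):
        _, steps = sum_list(list(range(size)))
        print(size, "items ->", steps, "steps")   # the step count grows linearly with the input size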

Proof by Cases

Some proofs need to be broken down into different cases, which can then be handled one at a time to establish the whole proof. This makes complex proofs easier to manage. In fact, I think we already used this method for proving equivalence. In class, we showed that an implication "P implies Q" is equivalent to its contrapositive "not Q implies not P" by showing:

P | Q | P implies Q | not Q implies not P
T | T |      T      |          T
T | F |      F      |          F
F | T |      T      |          T
F | F |      T      |          T
In each of the four cases above, we showed that "P implies Q" is true if and only if "not Q implies not P" is true. Even when P is True and Q is False, both the implication and its contrapositive are False, so the equivalence still stands. Also note that proof by cases works in much broader settings than propositions involving boolean variables.
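The same four cases can also be checked mechanically; here is a quick sketch that encodes "A implies B" as (not A) or B:

    def implies(a, b):
        """Truth value of 'a implies b'."""
        return (not a) or b

    # Check every case: P implies Q should match not Q implies not P.
    for P in (True, False):
        for Q in (True, False):
            assert implies(P, Q) == implies(not Q, not P)
    print("P implies Q matches its contrapositive in all four cases")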

Proofs and non-Boolean Functions, disproofs, and false proofs!

Non-Boolean Functions

In class, we discussed non-boolean functions, such as the floor function, which takes in a number and returns the largest integer that is less than or equal to that number. Non-boolean functions can be used to set up more interesting predicates for evaluation. For example, prove that n^2 + 3n + 127 is a prime number for all natural numbers n. Ok, let's take the natural number 0: yes, 127 is a prime number. What about 1? If n = 1, the expression equals 131, and yes, that is prime. 2? 137, still prime. But n = 3 gives 145 = 5 × 29, which is not prime, so the proposition is false. So for predicates built from non-boolean functions, some input values work out while others do not. The main takeaway is that just because a predicate seems to be true for some values doesn't mean the whole proposition is true.
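Here is a quick sketch that hunts for the first counterexample (the is_prime helper is my own trial-division version, not code from the course):

    def is_prime(k):
        """Return True if k is a prime number (trial division)."""
        if k < 2:
            return False
        return all(k % d != 0 for d in range(2, int(k ** 0.5) + 1))

    # Find the first natural number n for which n^2 + 3n + 127 is not prime.
    n = 0
    while is_prime(n * n + 3 * n + 127):
        n += 1
    print(n, "->", n * n + 3 * n + 127)   # prints 3 -> 145, which is 5 * 29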

Disproving

Now, how do we disprove something? For example, how do we disprove the statement 2 + 2 = 5? We just need to prove that 2 + 2 does not equal 5. So to disprove any statement S, we need to prove that not S is true. This is very useful, because we will not always be handed true propositions; we need to know how to disprove something that is false. The most important thing is to notice that the proposition is false in the first place; then we can negate it and start proving the negation. If we fail to notice that a proposition is false, we can try to prove it, but it won't go anywhere, and it would be a very bad thing to "prove" something false to be true.

False Proofs

I think this is the funniest topic in this course. We must remember that the real world exists outside of the realm of mathematics, and there are many types of proofs out there, some of which do not actually prove anything! I think the most prevalent false proof out there is the proof by gut feeling. Our intuition is good for rough estimations and predictions based on bad evidence, which was useful in the wild, when we needed to escape predators lurking in the bushes, but not good for establishing truths.

Saturday 29 November 2014

Indirect Proofs - Proof by Contradiction

Proof by contradiction starts with the assumption that the proposition in question is false, followed by a series of logical deductions that lead to a contradiction of something we know to be true; at that point we can conclude that the proposition must be true. In simpler terms... if the proposition were false, then something false would be true, and since something false cannot be true, our proposition had better not be false.

This all sounds very abstract... so let's break it down.
Let P be a proposition.
To prove P is true,
    we assume P is false (not P is true)
        this leads to some falsehood (such as "2 is not an even number")
    then "not P -> falsehood" is true (How? Remember truth tables?)
        the only way for this to work is if not P is false, which means P is true.
This is the same as proving the contrapositive of "True -> P", which is "not P -> False".
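Here is a tiny check of that truth-table fact, again encoding "A implies B" as (not A) or B: an implication with a false conclusion is true only when its hypothesis is false, so if "not P implies falsehood" is true, P must be true.

    def implies(a, b):
        return (not a) or b

    # 'not P implies falsehood' for both possible values of P.
    for P in (True, False):
        print("P =", P, "-> (not P implies False) is", implies(not P, False))
    # Only P = True makes the implication come out true, which is exactly the point.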

Prove using Contrapositive

We already know that an implication "P implies Q" is logically equivalent to its contrapositive "not Q implies not P". Therefore, proving the contrapositive is equivalent to proving the original statement. In fact, this is sometimes easier to do, and when it is, we should use it. In class, we used the example "prove that if x^2 is odd then x is odd, for all natural numbers x". This becomes much simpler if we prove the contrapositive. We just need to remember to indicate that we are proving the contrapositive, and to state the contrapositive in the final proof.
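Here is a sketch of how that contrapositive argument goes (the standard outline in LaTeX-style notation, not the exact wording from class):

    \text{Claim: } \forall x \in \mathbb{N},\ x^{2} \text{ odd} \Rightarrow x \text{ odd}.
    \text{Contrapositive: } x \text{ even} \Rightarrow x^{2} \text{ even}.
    \text{Proof: assume } x \text{ is even, so } x = 2k \text{ for some natural number } k.
    \text{Then } x^{2} = 4k^{2} = 2(2k^{2}), \text{ which is even.} \qquad \blacksquare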

For statements like "if x is odd then x^2 is odd", the antecedent is actually equivalent to the consequent. To prove that two statements are equivalent, we need to prove that each statement implies the other: first we prove P implies Q, then we prove Q implies P. Another way to prove equivalence is to use a chain of equivalences, whereby P is equivalent to a second statement, which is equivalent to a third, and so on through a chain of equivalent statements until we reach Q.

Proofs

How do we communicate truth?

We sometimes need to convince other people that what we know or understand is true. How do we do that? We need proof.

Direct proof

A direct proof essentially shows that there is a link between an assumption and the result. To form this link, we can use previously proven statements and axioms (e.g. DeMorgan's Law). Each assumption in a claim has its own scope. For instance, to prove the claim that for all elements x in a set D, x having property P implies x having property Q, we assume x is a generic element of D (so nothing outside D is under consideration), then assume P(x) is true, and show that Q(x) follows.

Axiomatic Method

Before diving straight into proofs, we should look back at the axiomatic method, a method of establishing truths in mathematics introduced by Euclid around 300 BCE. He began with simple assumptions that seem logically undeniable, which are called axioms, and proved many propositions from them. His proofs consist of a sequence of logical deductions from axioms and previously proved statements, finally concluding with the proposition in question. Theorems are important propositions proved through this deductive process.

Let's go back to direct proofs. A direct proof can prove implications, which have the form "if P, then Q". One way to prove an implication is to first assume P is true, then show that Q is somehow linked to this assumption. Finding this link is not trivial. To visualize it, a Venn diagram can be helpful. For example, to prove P(x) implies Q(x), we can look for results that P(x) implies, and results that imply Q(x). We can visualize the results that P(x) implies as sets that contain P, while the results that imply Q(x) are contained by Q. Now all we need is to find a path of containment from P to Q.

One thing to note is that before a structured proof can be written, one should first do some scratch-work while one figures out the logical links. Just keep in mind to keep the scratch-work separate from the final proof.

Monday 13 October 2014

The Paper Folding Challenge

The Challenge: 

Take a strip of paper and hold it lengthwise, so that the longer side is horizontal, then fold it in half from left to right. Repeat this several times, each time folding from left to right and halving the strip. You will reach a point where you can no longer physically fold the paper in half; 7 folds is the commonly cited limit. Now unfold the paper, and you will notice that the creases sometimes point up and sometimes point down. Wouldn't it be nice to come up with a way to predict the result of each fold without actually folding? That would let us fold past the physical limit in our minds!

Understand the problem:

Okay, first, I immediately recognized this as a pattern recognition problem. Essentially, if I can figure out the pattern by which the paper's creases come to be as a result of each fold, I should be able to use that pattern to predict the next fold... Right?

Devise a Plan:

My plan was to first just take a piece of paper, fold it in half, observe, record the results, and repeat. Hold on, maybe there are some obvious facts we can take into account before we dive into this problem. First, we know that making one fold divides the paper into two equal halves, and a second fold in turn divides each of those halves into two equal halves.

Carry out the Plan:

After some folding, I realized the following:
  • for 2 folds and up, the first crease always points up, while the last crease always points down
  • starting from both ends and moving inwards, each pair of creases (one counted from each end) points in opposite directions
  • each fold does not affect the creases created by the previous folds
  • each fold adds alternating up/down creases around the pre-existing creases: the first old crease gets an up crease before it and a down crease after it, and the new creases keep alternating up and down from there

Look back:

Here is the pattern for folds 1 to 6:
d
udd
uuddudd
uuduuddduuddudd
uuduudduuudduddduuduuddduuddudd
uuduudduuuddudduuuduuddduudduddduuduudduuudduddduuduuddduuddudd

How to do this in Python:

Add code below:
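Here is one possible sketch (my own recurrence, chosen because it reproduces the strings recorded above, rather than anything given in class): each new pattern is the previous pattern reversed and flipped, then a middle 'd' crease, then the previous pattern unchanged.

    def creases(folds):
        """Return the up/down crease pattern after the given number of folds."""
        flip = {"u": "d", "d": "u"}
        pattern = ""
        for _ in range(folds):
            # new pattern = (previous pattern, reversed and flipped) + middle crease + previous pattern
            reflected = "".join(flip[c] for c in reversed(pattern))
            pattern = reflected + "d" + pattern
        return pattern

    for k in range(1, 7):
        print(creases(k))   # matches the patterns recorded above for folds 1 through 6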

Implication VS Conjunction, Truth Tables, and More!

This post covers both week 3 and week 4's materials.

The Difference between Conjunction and Implication

This took some thinking, and I had some trouble understanding the difference between the two. First, let's see how they are similar. Both implication and conjunction involve two properties. An implication, "if P then Q", says that anything that is P must also be Q. A conjunction, "P and Q", says that an element is both P and Q. The difference is that an implication makes a single conditional claim, while a conjunction asserts two claims at once. Another difference is in how their negations are expressed. First, remember that negation simply inverts the truth value of a sentence; that is, for the negation to be true, the original sentence has to be false. Since an implication can only be false when the assumption is true and the conclusion is false, the negation of "P implies Q" is exactly "P and not Q". On the other hand, a conjunction is false if at least one of the sentences is false, so its negation can be written as "not P or not Q".

All You Need to Know About Implications

An implication contains an assumption and a consequence; it says that if the assumption is true, then the consequence is also true. An interesting thing about implications is that the only way to falsify one is to show that the assumption is true while the consequence is false. Every other combination of truth values makes the implication true. So even when the assumption is false and the consequence is true, the implication as a whole is still true. Actually, as I have stated before, even if BOTH the assumption and the consequence are false, the implication is STILL true.

Another interesting thing is that knowing the implication means you automatically know its contrapositive. Huh? What's that? The contrapositive is simply the negation of the consequence implying the negation of the assumption. In other words, reverse the implication and apply negation to both sides. A Venn diagram should suffice as a visual explanation. What's more is that the contrapositive of the contrapositive is the original implication. That means the implication and the contrapositive are equivalent!!

Truth Tables

Now that we know about implication and conjunction, and how to verify or falsify them, we can try to verify more complex claims. I think truth tables are the most efficient tools for these kinds of exercises. To set up a truth table, simply count how many predicates you have, then compute 2 raised to that number; that is how many different truth configurations your predicates have. For example, if you have 3 predicates, then the number of truth configurations is 2^3, which is 8.

DeMorgan's Law

Knowing how truth tables work helped me understand how DeMorgan's Law works. First, a quick summary of DeMorgan's Law:


  • the negation of (S1 and S2) is equivalent to (the negation of S1, or the negation of S2)
  • the negation of (S1 or S2) is equivalent to (the negation of S1, and the negation of S2)
Another way to understand these two laws is through Venn diagrams.
Finally, notice that DeMorgan's Laws are called laws because they are universally true. In a truth table, this shows up as a final column containing all True's. In general, any universal truth can be represented by a truth table in which every truth configuration leads to the claim evaluating to True.
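As a quick sanity check in code, here is a sketch that walks through every truth configuration and confirms that both laws hold in each one:

    from itertools import product

    # Verify DeMorgan's Laws over every possible truth configuration of S1 and S2.
    for s1, s2 in product((True, False), repeat=2):
        law1 = (not (s1 and s2)) == ((not s1) or (not s2))
        law2 = (not (s1 or s2)) == ((not s1) and (not s2))
        print(s1, s2, law1, law2)   # the last two columns are always True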

Tuesday 30 September 2014


Week 1 to 2

What is CSC165?

The course is officially called Mathematical Expression and Reasoning for Computer Science, but what does that mean and what is that all about? By taking this course, we should be able to find the answer.

To keep track of my participation in this course, I will be keeping a course log/blog, which is what you are reading right now. This first post is going to be rather boring, since it is filled with introductory and background info, but that is necessary.

First, what is the relationship between math and computer science?

This is THE question to ask when thinking about how computers are used. There are many applications of mathematics in computing: algorithms, databases, computer graphics, artificial intelligence, programming languages, networking, and cryptography all use mathematical principles such as probability, set theory, combinatorics, and logic in one way or another. Perhaps we should ask the opposite: what COULD a computer do if we had not discovered math beforehand? Let's ponder that for a while... I guess it could be used as a decoration or an art piece, to showcase the endless unrealized possibilities of human engineering... Now you can see that computer science and math are really two sides of the same coin.

How does one talk to a computer?

By talk, I really mean communicate, and by communicate, I really mean command. Yes, computers are our slaves.

I believe that in the near future we will be able to communicate with computers powered by artificial intelligence using natural language. However, computers are currently still poor at processing our natural language; look at how constrained, albeit powerful, Google's search engine is after all these years of R&D. The reason is that, in natural language, the meanings of words sometimes depend on the overall context of the sentence or the subject at hand, which can be ambiguous and requires an understanding of, or some assumptions about, the speakers' culture to interpret. Computers lack culture, or rather, we just haven't programmed it into them. In reality, computers speak in binary; that's just the way they are "wired". Therefore, we communicate with computers through programming languages, which are technical, precise, and constrained. Programming languages act as an intermediary between binary and human languages.

Set Theory

Here we get to the meat of week 1's material... There are lots of math concepts and strict notations that go along with set theory. However, the most basic idea is that a set is a collection of things, and that collection may contain nothing at all. This means that the empty set exists, and the empty set is a subset of every set, even of sets that do have "things" in them!
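Python's built-in sets give a nice illustration of that last point (<= is Python's subset test):

    # The empty set is a subset of every set, including non-empty ones.
    print(set() <= {1, 2, 3})          # True
    print(set() <= set())              # True: the empty set is a subset of itself too
    print(frozenset() in {1, 2, 3})    # False: it is a subset, not an element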