Thursday, November 24, 2011

Analysis of Angry Birds

Group 17
Chiu Kin Kwan, Yu Sheung Hung, Cheung Pik Ying, Jackson Cheung
HKU CCST9003 FA11


Wednesday, November 23, 2011

Presentation: Facebook: Technologies and Societal Impact

Ho Ka Chun Kelvin, Li Ho Hin Kelvin, Lo Ka Ming Terrence, Sze Chi Chun Carrie, Wong Ka Yan Karen
HKU CCST9003 FA11

Tuesday, November 22, 2011

Presentation: Operating Systems

Group 7
Wong Ching Yat, Yau Cheuk Hang, Yu Miaoxia, Pau Wing Hong, Wong Lok Kwan
HKU CCST9003 FA11

Monday, November 14, 2011

Security and cryptography

Comment 1:

Security is a “system” concept.

Follow-up:
Yes, it is very important for us (the users) to understand this so that we do not get a false sense of security when we are “educated” that our data are encrypted. Now you know that data encryption is just one part of the whole process; if anything goes wrong in other parts of the system, security cannot be guaranteed.

Comment 2:

HTTPS protocol?


Follow-up:
This is the so-called “secure” version of the HTTP protocol. Basically, this protocol transports encrypted data instead of sending data in plaintext. The data are usually encrypted using a symmetric key system, for which the shared key has to be agreed upon using a public key approach. Please refer to Problem 3 of Tutorial 5 for the design of such a key set-up protocol.
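To make the idea concrete, here is a toy sketch in Python of the key set-up step, using deliberately tiny, insecure RSA-style numbers purely for illustration (real HTTPS uses standardized algorithms and far larger keys):

# Toy RSA-style public key pair (tiny, insecure numbers, for illustration only)
p, q = 61, 53
n = p * q                 # public modulus: 3233
e = 17                    # public exponent
d = 2753                  # private exponent (e * d = 1 modulo (p-1)*(q-1))

import random
# 1) The browser picks a random symmetric session key...
session_key = random.randrange(2, n)
# 2) ...and sends it encrypted under the server's PUBLIC key.
encrypted_key = pow(session_key, e, n)
# 3) The server recovers the session key with its PRIVATE key.
recovered_key = pow(encrypted_key, d, n)
assert recovered_key == session_key
# 4) From here on, both sides use the shared session key with a fast
#    symmetric cipher to protect the actual page data (see Comment 4 below).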

Comment 3:

Stealing bank account information from the Internet?


Follow-up:
Yes, whether you like it or not, this kind of thing is believed to be happening all the time! The thing is, it is not very difficult to identify the “weakest” link in the system (e.g., a particular e-commerce Web site). It is widely believed that after such a system is broken into, the hacker will not just use the bank account information (e.g., for buying things) but will also hold the bank and/or the e-commerce Web site for ransom.

Comment 4:

What is symmetric key cryptography?


Follow-up:
Symmetric key systems have always been the most important way to achieve data confidentiality, even though public key systems are shown to be more versatile and “strong”. The reason is that symmetric key algorithms are usually much, much faster than public key algorithms. In a typical symmetric key system, a shared key has to be agreed upon through some means (see Comment 2 above). Then, the communicating parties use the shared key for encryption/decryption.
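As a toy illustration in Python (a deliberately weak XOR cipher, not a real algorithm such as AES), note how the very same shared key, and even the very same function, is used for both encryption and decryption:

import random

def xor_cipher(data: bytes, shared_key: int) -> bytes:
    # Derive a key stream from the shared key and XOR it with the data.
    # XOR-ing twice with the same stream gives back the original bytes,
    # so this one function both encrypts and decrypts.
    rng = random.Random(shared_key)
    return bytes(b ^ rng.randrange(256) for b in data)

shared_key = 424242                         # agreed upon beforehand by both parties
ciphertext = xor_cipher(b"meet at noon", shared_key)
plaintext  = xor_cipher(ciphertext, shared_key)
print(plaintext)                            # b'meet at noon'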

Comment 5:

Are there any more sophisticated cryptography techniques?

Follow-up:
One of the most notable sophisticated cryptography techniques is elliptic curve cryptography, which is based on yet another branch of mathematics (also related to number theory) to perform encryption and decryption.

Comment 6:

Public key cryptography. RSA algorithm?

Follow-up:
We have already worked extensively on this in Tutorial 5.


Comment 7:

Difference between public key and symmetric key cryptography.


Follow-up:
The most important difference is NOT the strength, but the way the keys are distributed/shared.

Saturday, November 12, 2011

Greedy algorithm, Google Map vs Map Quest, WiFi vs 3G

Comment 1:

Are there other algorithms to find the shortest path that use recursion or dynamic programming?

Follow-up:

Yes, there is one called the Floyd-Warshall algorithm that uses dynamic programming to find all-pairs shortest paths. This algorithm can also handle graphs with negative link weights.
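For the curious, here is a minimal sketch of the Floyd-Warshall idea in Python; the small graph below is made up, and INF stands for “no direct link”:

import math

INF = math.inf
# dist[u][v] = weight of the direct link u -> v (a small made-up graph)
dist = [
    [0,   3,   INF, 7  ],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1  ],
    [2,   INF, INF, 0  ],
]
n = len(dist)

# Dynamic programming: allow node k as an intermediate stop, one k at a time.
for k in range(n):
    for i in range(n):
        for j in range(n):
            if dist[i][k] + dist[k][j] < dist[i][j]:
                dist[i][j] = dist[i][k] + dist[k][j]

print(dist)   # all-pairs shortest path distances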

Comment 2:

What happens in the shortest path greedy algorithm (i.e., Dijkstra) when two distances surrounding the node are equal?

Follow-up:

Good observation. We will usually just “randomly” choose one to break the tie.

Comment 3:

Any technical and systematic ways to calculate the time-complexity of an algorithm?

Follow-up:

Yes, sure. For more complicated situations, we usually end up with a bunch of summations of series when counting the number of key steps. Then, we need to use some mathematical tools to obtain closed-form expressions. These are computer science topics, though.
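For a concrete (made-up) example, suppose an algorithm has an outer loop over i = 1 to n, and for each i the inner work takes i key steps. The total count is the series 1 + 2 + ... + n = n(n+1)/2, and this closed-form expression tells us the running time grows on the order of n^2. The “mathematical tools” mentioned above are exactly the formulas for such sums.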

Comment 5:

If 3G and Wi-Fi on my smartphone are similar things, why do I notice a significant difference between the speed of loading the same page on 3G and Wi-Fi?

Follow-up:

3G and Wi-Fi are similar in that they are both wireless communication technologies. But the similarity ends there. The wireless communication mechanisms used are highly different in many aspects. For example, 3G is based on cellular communication which is designed for longer range and thus, speed is lower (as electromagnetic signals deteriorate significantly over a distance). Wi-Fi is of a much shorter range and can therefore afford to provide a higher speed. There are many other technical differences, which are topics of a wireless communication and networking course.

Comment 6:

The “shortest path algorithm” can be demo-ed on 9 slides with animation instead of using just one slide. It is a bit too small.

Follow-up:

Thanks for the comment! You are right! Will improve this. Sorry for the inconvenience.

Comment 7:

How come MapQuest/Google-Map is so fast on a map that has billions of nodes?

Follow-up:

One trick is that MapQuest/Google Maps generally do not do the computations “on demand”, i.e., they pre-compute many routes which are then stored in a database. When someone posts a query, the majority of the routes are pulled out from the database.
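As a rough, hypothetical sketch of the idea in Python (the place names and routes below are made up):

# Pre-compute (offline) the routes between popular pairs of places and store them.
route_cache = {
    ("Airport", "Central"): ["Airport", "Tsing Yi", "Kowloon", "Central"],
    ("Central", "Stanley"): ["Central", "Aberdeen", "Repulse Bay", "Stanley"],
}

def answer_query(src, dst):
    # On-line query: first try the database of pre-computed routes.
    if (src, dst) in route_cache:
        return route_cache[(src, dst)]       # served straight from the database
    # In a real system we would only now fall back to an on-demand
    # shortest-path computation.
    return None

print(answer_query("Airport", "Central"))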

Comment 9:


How do we resolve the conflict when there are negative weights in the graph when using Dijkstra’s algorithm?

Follow-up:

Dijkstra’s algorithm fails when there are negative weights in the graph. We will need to use a different technique, e.g., the Bellman-Ford algorithm. See also Comment 1 above.
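For reference, here is a minimal sketch of the Bellman-Ford idea in Python on a small made-up graph; note that it tolerates the negative link weight (an extra pass, not shown, can be added to detect negative cycles):

# Graph as a list of directed links: (from, to, weight); note the negative weight.
edges = [("A", "B", 4), ("A", "C", 5), ("B", "C", -3), ("C", "D", 2)]
nodes = ["A", "B", "C", "D"]

INF = float("inf")
dist = {v: INF for v in nodes}
dist["A"] = 0                      # source node

# Relax every link n-1 times; a shortest path uses at most n-1 links.
for _ in range(len(nodes) - 1):
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w

print(dist)   # {'A': 0, 'B': 4, 'C': 1, 'D': 3}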

Comment 10:

You have mentioned that researchers in the field of computer try to giving everything an unique ID (a unique IP address), for example, a microwave oven. However, I don’t really understand why we are going to do so. What are the purposes? If that applies widely, will there be any privacy problems?

Follow-up:

Yes sure I believe there will be significant privacy problems! Maybe you can consider using this as your survey topic?

Comment 11:

How do we obtain O(e + n log n) for Dijkstra’s algorithm?

Follow-up:

A rough sketch is as follows: we need to examine all the links (during the updating of the estimated distances labelled on the nodes), and that is why we have the e term. We also need to keep the nodes in increasing order of estimated distance so that we can repeatedly pick the closest one, and that is why we have the n log n term.
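To make this concrete, here is a sketch of Dijkstra’s algorithm in Python using a priority queue, on a small made-up graph. Each link is examined once (the e term), and each extraction of the closest node from the priority queue costs about log n (the n log n term). Strictly speaking, with the simple binary heap used here the link updates also cost about log n each, so the exact bound differs slightly; the sketch is only meant to show where each term comes from.

import heapq

# Adjacency list: node -> list of (neighbour, link weight); a made-up graph.
graph = {
    "A": [("B", 2), ("C", 5)],
    "B": [("C", 1), ("D", 4)],
    "C": [("D", 1)],
    "D": [],
}

def dijkstra(source):
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]                 # priority queue keyed by estimated distance
    while heap:
        d, u = heapq.heappop(heap)       # ~log n per extraction -> the n log n term
        if d > dist[u]:
            continue                     # stale entry, skip it
        for v, w in graph[u]:            # every link examined once -> the e term
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

print(dijkstra("A"))   # {'A': 0, 'B': 2, 'C': 3, 'D': 4}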

Comment 12:

Why does the greedy approach usually result in a fast algorithm?

Follow-up:

This is because, as we make a greedy choice in each step, we reduce the problem size by 1. Thus, after making n greedy choices (i.e., n steps), we finish the problem. Consequently, we usually end up with an algorithm that takes O(n) time, plus the time needed for some pre-processing (e.g., sorting, which takes another n log n time).
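As a small illustration, here is a made-up “activity selection” example in Python: the sorting is the pre-processing (n log n), and then a single greedy pass over the n activities does the rest (O(n)):

# Activities as (start, finish) times; pick as many non-overlapping ones as possible.
activities = [(1, 4), (3, 5), (0, 6), (5, 7), (8, 9), (5, 9)]

activities.sort(key=lambda a: a[1])   # pre-processing: sort by finish time, n log n

chosen = []
last_finish = float("-inf")
for start, finish in activities:      # one greedy choice per step, O(n) overall
    if start >= last_finish:          # greedy choice: earliest-finishing compatible activity
        chosen.append((start, finish))
        last_finish = finish

print(chosen)   # [(1, 4), (5, 7), (8, 9)]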

Comment 13:

Knapsack problem. It can be applied to daily life. Is it similar to the linear programming taught in Maths at Cert level?

Follow-up:

Yes it is similar. Wait until Tutorial 3 to find out more details.

Comment 14:

Dynamic programming is a bit difficult to understand. Also the DNA sequence example.

Follow-up:

The theme of Tutorial 3 is dynamic programming. Hopefully you will feel better after going through this tutorial cycle. The DNA sequence example is interesting, and I think you can understand at least 70% of it by studying the Web page mentioned in class.
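If it helps, here is a small dynamic programming sketch in Python in the spirit of the DNA example (the two DNA strings are made up): it computes the length of the longest common subsequence by filling in a table of answers to smaller sub-problems:

def lcs_length(x, y):
    m, n = len(x), len(y)
    # table[i][j] = length of the LCS of the first i letters of x and the first j of y
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1               # letters match: extend
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1]) # reuse smaller answers
    return table[m][n]

print(lcs_length("GATTACA", "GCATGCA"))   # 5 (e.g., "GATCA")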

Comment 15:

I am a bit confused by the graph that shows different values representing running time.

Follow-up:

I think you are talking about the “estimated distances” labelled on the nodes in the example graph showing how Dijkstra’s algorithm works. Those values are NOT running times (of an algorithm). They are used to represent, for example, the time it takes to travel from one place to another geographically (when we use the graph to represent a map).

Comment 16:


Why is “99” used to represent the information that has not been processed?

Follow-up:

Yes, in computer programming we often use such tricks to ease processing: a large “sentinel” value like 99 simply stands for “not yet processed” (or “infinitely far away”) until the algorithm fills in the real value.
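In other words, something like the following hypothetical Python snippet, where a large sentinel value such as 99 (or “infinity”) marks nodes whose distances have not been worked out yet:

UNPROCESSED = 99                      # sentinel: "distance not yet known"
estimated_distance = {node: UNPROCESSED for node in ["A", "B", "C", "D"]}
estimated_distance["A"] = 0           # the source node is the only one we know so far
print(estimated_distance)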

Comment 18:

Google Maps is more developed than MapQuest these days. For example, it can alert you that friends are located close by.

Follow-up:

Thanks!

Comment 21:

I am not quite clear about how the two determinant factors affect the efficiency of the greedy approach.

Follow-up:

The main factors are the “greedy choices”. If you pick the right greedy choice at each step, the algorithm will end up giving the “optimal” result. In terms of speed, please refer to Comment 12 above.

Friday, November 11, 2011

Brief introduction about Wolfram Alpha

Pau Wing Hong
Wolfram|Alpha (also known as Wolfram Alpha) is more than a search engine like Google. Instead, it is an answer engine / computational engine developed by Wolfram Research. Traditional search engines like Google and Yahoo, which are only capable of providing a list of links to information, don’t answer questions. They only take your keywords at face value and don’t always yield good results. What really makes Wolfram Alpha shine is that it can compute, just like a calculator. It computes solutions and responses from a structured knowledge database.

Since the day it started, the Wolfram Alpha knowledge engine has contained 50,000 types of algorithms and models and over 10 trillion pieces of data. It is still in development, and is always adding new information to its database. As Wolfram|Alpha runs on 10,000 CPUs with Mathematica in the background, it is capable of answering complicated mathematical questions.

The service is built on four basic pillars: a massive amount of data, a computational engine built on top of Mathematica, a system for understanding queries, and technology to display results in interesting ways. Wolfram Alpha is also able to answer fact-based questions such as “When did Steve Jobs die?”; it displays its response as a date, the time difference from today, and anniversaries of October 5, 2011.

There are a number of things that make Wolfram Alpha vastly different from Google. First of all, it is capable of answering complex queries. If a complex search query is typed into Google, it gets confused, because Google, unlike Wolfram Alpha, cannot compute. Just like a calculator, Wolfram Alpha does not care how many arguments are given to it; that is why concatenating many arguments in a query often works extremely well. Apart from that, the answers and calculations from Wolfram Alpha are very accurate and precise, so there is little need to worry about the validity of the information. Thirdly, two sets of data can easily be compared with graphs in Wolfram Alpha, which Google cannot do.

Nevertheless, Wolfram Alpha does have its limitations. Since its answers are based on its own software and knowledge database, Wolfram Alpha can only answer a fact-based question that has a specific answer. So it is not able to answer open-ended questions like “Is Wolfram Alpha better than Google?”

As written on its main page, Wolfram Alpha’s goal is to make deep, broad, expert-level knowledge accessible to anyone, anywhere, at any time. Clearly, the “Google Killer” is quite ambitious. However, in my opinion, Wolfram Alpha is not a typical search engine in essence. Therefore it is not the Google Killer people make it out to be, but it can be considered a giant calculating encyclopaedia of statistics and facts. I think the site poses more of a threat to sites like Wikipedia.

Thursday, November 10, 2011

Recursion, randomization, sorting and computations

Comment 1:

Divide-and-conquer vs. recursion?

Follow-up:

Divide-and-conquer is a general technique in which we divide the problem into smaller parts and then solve the smaller parts independently. Recursion is closely related to divide-and-conquer in that it is usually the most concise way to express a divide-and-conquer idea. However, a divide-and-conquer idea does not always need to be realized using recursion. Indeed, sometimes we would like to avoid recursion because it can be very slow, as you have seen (or will see) in Tutorial 1.

Comment 2:

Randomization?

Follow-up:

Let us consider a simple problem in order to illustrate the usefulness of randomization. This problem touches on several important and related concepts: worst-case analysis, average-case analysis, and probabilistic analysis. Consider the following “hiring” algorithm:

1) Set candidate best to be unknown;

2) For each of the n candidates, do the following:

3) Interview candidate i ;

4) If candidate i is better than candidate best, then hire candidate i and set best to be i ;

Assume that interviewing each candidate has a cost of c_In and hiring a candidate has a cost of c_H (where c_H > c_In under normal circumstances).

(a)

Can you give the worst-case total cost of the above hiring algorithm?

(b)

Assume that the candidates come to the interview in a random order, i.e., each candidate is equally likely to be the best. Specifically, candidate i has a probability of 1/i of being the best among the first i candidates. Can you give the average-case total cost of the above hiring algorithm?


Hint: You can consider using the “indicator random variable” X_i, which is equal to 1 if candidate i is hired and 0 otherwise. Hence, the average number of candidates that are actually hired is equal to the “expected value” (i.e., the average) of ∑ X_i.

Answers:

(a)

The worst case is that every interviewed candidate is hired. Thus, the total cost is: c_In · n + c_H · n.

(b)

In this average case, the only change to the total cost is the hiring part, so let’s focus just on that part. Specifically, as given in the Hint, the average number of candidates that will be hired is the expected value of ∑ X_i, which in turn is equal to ∑ (1/i) for i = 1 to n. A good bound for this sum is log n. Thus, the average-case hiring cost is just c_H · log n, which is much smaller than the worst case’s hiring cost.
The lesson we learn is that the average case is sometimes much better than the worst case.

As you can see, it is helpful to assume that all permutations of the input are equally likely so that a probabilistic analysis can be used. Now, here is the power of randomization—instead of assuming a distribution of inputs (i.e., the candidates), we impose a distribution. In particular, before running the algorithm, we randomly permute the candidates in order to enforce the property that every permutation is equally likely. This modification does not change our expectation of hiring a new person roughly log n times. It means, however, that for any input we expect this to be the case, rather than for inputs drawn from a particular distribution.
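For those who like to experiment, here is a small Python simulation of the randomized version: whatever the original input order is, the average number of hires across many trials stays close to the harmonic sum ∑ (1/i), roughly ln n (about 5.2 for n = 100):

import random

def hires_after_shuffle(candidates):
    random.shuffle(candidates)         # impose the random order ourselves
    best = float("-inf")
    hires = 0
    for quality in candidates:         # interview everyone in the shuffled order
        if quality > best:             # better than the best so far: hire
            best = quality
            hires += 1
    return hires

n, trials = 100, 10000
candidates = list(range(n))            # any fixed input, even a worst-case sorted one
average = sum(hires_after_shuffle(candidates[:]) for _ in range(trials)) / trials
print(average)                         # close to the harmonic number for n = 100, about 5.2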

Comment 3:

Quantum computing? Parallel processing? Biological computing?

Follow-up:

These are really exotic computing models that we will elaborate on in the later part of the course. Please be patient. Thanks!

Comment 4:

There are a lot of different mathematical ways of documenting/calculating numbers/codings. How come only a few can be applied to computing processing algorithms?

Follow-up:

Good point. But please note that not many mathematical methods of calculation can be realized, in a mechanical manner, as a computing procedure (i.e., one that can be carried out by a computer). For instance, think about integration in calculus: there are many integration problems that need very good “inspection” or insight to solve. Agree?

Comment 5:

Insertion sort?

Follow-up:

Please find the sketch of a computing procedure using insertion sort below.

(1)Given a list of numbers A[1], A[2], ..., A[n]
(2)for i = 2 to n do:
(3)move A[i] forward to the position j <= i such that
(4)A[i] < A[k] for j <= k < i, and
(5)either A[i] >= A[j-1] or j = 1

Now, it is not difficult to see that the number of checkings/swappings in lines (3) to (5) above cannot be larger than i. Thus, the total number of steps, i.e., the estimated running time, would be at most 1 + 2 + ... + n = n(n+1)/2, i.e., on the order of n^2.
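For reference, here is one way (certainly not the only way) the sketch above could be turned into runnable Python:

def insertion_sort(a):
    # a is a list of numbers a[0], a[1], ..., a[n-1]
    for i in range(1, len(a)):
        current = a[i]
        j = i
        # move a[i] forward past every larger element before it
        while j > 0 and a[j - 1] > current:
            a[j] = a[j - 1]
            j -= 1
        a[j] = current
    return a

print(insertion_sort([5, 2, 9, 1, 7]))   # [1, 2, 5, 7, 9]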

Comment 6:

Quicksort? Randomization?

Follow-up:

The Quicksort algorithm looks very similar to the algorithm that you have worked (or will work) on in Problem 3 of Tutorial 1 (about “searching”). So I leave it to you to write up the computing procedure. You can also prove that the estimated running time is n log n.

On the other hand, I would like to supplement a bit more about the “randomization” part used in Quicksort. Similar to the “hiring problem” in Comment 2 above, we need a certain “distribution” in the input list in order to realize the potential of Quicksort (or divide-and-conquer, for that matter). Specifically, in Quicksort, we would like to choose a pivot so that the resulting two partitions are of more or less equal size. It is reasonable to assume that if the list is somehow “totally random” (we will talk more about generating randomness later on), then a randomly selected number from the list is likely to have a value right in the middle, i.e., it will divide the list into two roughly equal halves. So, just like in the hiring problem, we randomly shuffle the list before sorting; then, statistically, we expect the list to be divided into roughly equal halves when we partition it.
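Here is a short randomized Quicksort sketch in Python; the random pivot choice is the “randomization” part discussed above (this simple version trades some memory for clarity):

import random

def quicksort(a):
    if len(a) <= 1:
        return a
    pivot = random.choice(a)                     # random pivot: likely to split evenly
    smaller = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    larger  = [x for x in a if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([3, 8, 1, 9, 2, 7]))   # [1, 2, 3, 7, 8, 9]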

Comment 7:

P2P should be discussed/elaborated.

Follow-up:

We will spend some time discussing P2P systems in a later part of the course. Please be patient.
Thanks!

Comment 8:

We talked about our actions being monitored, even in P2P because we are accessing the Trackers. But what about the ISPs? They track everything we do. What about VPN (virtual private network)? Can it prevent ISPs from tracking us?

Follow-up:

Yes, it is true that the ISPs keep track of our moves all the time, so when law enforcement needs the information (with a warrant), the ISPs will supply it. Even a VPN (i.e., setting up so-called private links in the form of encrypted channels) cannot fully help, because ultimately your IP address has to be revealed; only the data can be encrypted. We will discuss more about Internet security and privacy in a later part of the course.

Comment 9:

Feasibility of parallel processing? For example, in the Tower of Hanoi problem we are limited by the number of pegs and the rules of the game.

Follow-up:

Yes you are right. How to do things in parallel in a computer has been baffling researchers for decades.

We will discuss more about these difficulties later in the course.

Comment 10:

Isn’t it true that “recursion” is something just like mathematical induction?

Follow-up:

Yes, you are absolutely right! Very good observation. Indeed, recursion, and even divide-and-conquer, is closely related to the “induction” concept: we try to “extrapolate” solutions of smaller problems to larger ones. That is the idea.

Comment 11:

CPUs keep evolving nowadays. Their computational speeds increase exponentially, and this lowers the significance of choosing an effective algorithm to solve a problem, as the CPUs can carry out the tasks equally fast and well. But still, thinking of an effective algorithm remains challenging and worth pursuing.

Follow-up:

Oh, on this one I cannot agree with you. Indeed, as you will find out in this course (and as we will discuss in more detail soon), there are some problems that cannot be solved practically without a smart algorithm, even if you have thousands of processors at your service.
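For a concrete (hypothetical) illustration: a brute-force algorithm that tries all 2^n subsets of n items needs about 2^100 ≈ 1.3 × 10^30 steps when n = 100. Even thousands of processors, each performing a billion (10^9) steps per second, i.e., about 10^12 steps per second in total, would still need roughly 10^18 seconds, which is about 4 × 10^10 years; a smarter algorithm, when one exists, finishes the same instance in a blink.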