Sunday, July 24, 2011

Speed-up expressions for multicore processors

Problem:

In class we learnt about Amdahl’s Law (see Presentation #4, pp. 11–13). In this problem we try to derive speedup expressions for three variations of multicore processors:

1. A symmetric multicore processor composed of n base core equivalents (BCEs); each BCE is the smallest realization of a processor core, as illustrated in Figure 1(a);

2. A symmetric multicore processor composed of n/r cores, each of which is made by combining BCEs, as illustrated in Figure 1(b);

3. An asymmetric multicore processor composed of one r-BCE core and n – r BCEs, as illustrated in Figure 1(c).


Now, we try to derive speedup expressions of a certain program. We use the following notation.

  • f: the fraction of the program that must be executed sequentially
  • perf(r): the normalized performance of an r-BCE core (note: perf(1) = 1), defined as:

perf(r) = (absolute performance of an r-BCE core) / (absolute performance of a single BCE)

where absolute performance can be the processing speed (e.g., Millions of Instructions Per Second, MIPS), or some other performance measure.

Speedup S is defined as:

S = (execution time of the program using a single BCE) / (execution time of the program using the multicore chip)

(a) Derive S for a symmetric multicore processor chip with n BCEs in terms of f and n.

(b) Derive S for a symmetric multicore processor chip with (n/r) r-BCE cores in terms of f, n, r, and perf(r).

(c) Derive S for an asymmetric multicore processor chip with one r-BCE core and n – r BCEs in terms of f, n, r, and perf(r).

Follow-up:

(a) Let T denote the time required to execute the whole program using 1 BCE. Now, the time required to execute the whole program with a symmetric multicore processor chip of n BCEs is given by:

T_a = f × T + (1 − f) × T / n

Consequently we have:

S = T / T_a = 1 / ( f + (1 − f) / n )

(b) Let T denote the time required to execute the whole program using 1 BCE. Now, the time required to execute the whole program with a symmetric multicore processor chip with (n/r) r-BCE cores is given by:

T_b = f × T / perf(r) + (1 − f) × T × r / ( perf(r) × n )

(The sequential part runs on a single r-BCE core at speed perf(r); the parallel part is shared evenly by all n/r cores, each of speed perf(r).) Consequently we have:

S = T / T_b = 1 / ( f / perf(r) + (1 − f) × r / ( perf(r) × n ) )

(c) Let T denote the time required to execute the whole program using 1 BCE. Now, the time required to execute the whole program with an asymmetric multicore processor chip with one r-BCE core and n – r BCEs is given by:

T_c = f × T / perf(r) + (1 − f) × T / ( perf(r) + n − r )

(The sequential part runs only on the r-BCE core; the parallel part uses the r-BCE core together with the n − r BCEs.) Consequently we have:

S = T / T_c = 1 / ( f / perf(r) + (1 − f) / ( perf(r) + n − r ) )
Cryptography using key

Problem:

(a) Suppose that there are n people who want to communicate with each other securely. How many keys are needed when a symmetric key cryptosystem is used? How about public key cryptosystem? Explain.

(b) Digital signatures cannot be done using symmetric key cryptography. Explain why.

(c) Suppose we produce a “digest” of a message by simply adding up the words (e.g., treating each character as an 8-bit number). What is the problem of this approach?


Follow-up:

(a) Using symmetric keys, the number of keys required for pair-wise communication is nC2 = n(n – 1)/2. Using a public key cryptosystem, the number of keys required is just 2n (one public key and one private key for each user).

(b) One of the most important requirements in digital signatures is non-repudiation—the signer cannot deny that a signature is produced by him/her. Thus, we need a “secret” that is bound to and only to each user. Using symmetric key cryptosystem, a secret key is known to at least two users and, therefore, a digital signature produced with a secret key cannot be bound to a unique user.


(c) The problem is that a message can be easily transformed into an entirely different message with the same digest, by just re-arranging the characters or words. For example, “car” and “arc” have the same digest.
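
A tiny Python sketch makes the weakness concrete; the 8-bit additive digest below is just my own toy rendering of the idea described in the problem.

  def additive_digest(message):
      # Sum the character codes and keep only 8 bits -- an intentionally weak digest.
      return sum(ord(c) for c in message) % 256

  print(additive_digest("car"))  # 310 % 256 = 54
  print(additive_digest("arc"))  # same characters, same sum, so the same digest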

Worst-case analysis, average-case analysis, and probabilistic analysis

Problem:

This problem is about several important but related concepts: worst-case analysis, average-case analysis, and probabilistic analysis. Consider the following “hiring” algorithm:

  1. Set candidate best to be unknown;
  2. For each of the n candidates, do the following:
  3. Interview candidate i;
  4. If candidate i is better than candidate best, then hire candidate i and set best to be i;

Assume that interviewing each candidate has a cost of c_i and hiring a candidate has a cost of c_h (where c_h > c_i under normal circumstances).

(a) Can you give the worst case total cost of the above hiring algorithm?

(b) Assume that the candidates come to the interview in a random order, i.e., each candidate is equally likely to be the best. Specifically, candidate i has a probability of 1/i of being the best among the first i candidates. Can you give the average case total cost of the above hiring algorithm?

Hint: You can consider using the “indicator random variable” X_i, which is equal to 1 if candidate i is hired and 0 otherwise. Hence, the average number of candidates that are actually hired is equal to the “expected value” (i.e., the average) of X = X_1 + X_2 + … + X_n.

Follow-Up:

(a) The worst case is that every interviewed candidate is hired. Thus, the total cost is: n × c_i + n × c_h.

(b) In this average case, the only change to the total cost is the hiring part. So let’s focus just on this part. Specifically, as given in the Hint, the average number of candidates that will be hired is: E[X] = E[X_1] + E[X_2] + … + E[X_n], which in turn is equal to: 1 + 1/2 + 1/3 + … + 1/n. A good bound for this sum is log n. Thus, the average case hiring cost is just c_h × log n, which is much smaller than the worst case’s hiring cost of n × c_h.

The lesson we learn is that average case is sometimes much better than the worst case.
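
To see this log n behaviour empirically, here is a small simulation sketch in Python; the candidate qualities and the number of trials are made-up illustrative values.

  import math
  import random

  def hires_in_random_order(n):
      # Count how many times a new best candidate appears, i.e., how many hires.
      best = float("-inf")
      hires = 0
      for quality in random.sample(range(n), n):  # candidates arrive in random order
          if quality > best:
              best = quality
              hires += 1
      return hires

  n, trials = 1000, 2000
  average = sum(hires_in_random_order(n) for _ in range(trials)) / trials
  print(average)      # close to 1 + 1/2 + ... + 1/1000, about 7.5
  print(math.log(n))  # ln(1000) is about 6.9, the same order of magnitude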

Programming using stack

Problem:

A stack is a useful structure for storing data because it has many useful applications. For example, it can be applied in implementing recursive computing methods. A stack is a “last-in-first-out” structure — data items are stored linearly as one item is placed on top of the other. The only two operations are: pop and push. The pop operation retrieves the top data item on the stack, while the push operation puts a new data item on the top of the stack. The figure below shows how a stack works:


(a) Can you write up a computing procedure for DFS using a stack instead of using recursion?

(b) Can you suggest a practical situation where we must use a stack instead of using recursion?

Follow-Up:

(a) The first possible version is as follows.

Stack_DFS1(x):
  1. Mark x; Push(x);
  2. if stack is empty, then end of algorithm; otherwise, do the following:
  3.    y = top of stack (without popping it);
  4.    if there is a neighbor z of y that is not marked, then do:
  5.       Mark z; Push(z);
  6.    else
  7.       Pop;
  8. go to Step 2.

Another possible version is as follows.

Stack_DFS2(x):
  1. Push(x);
  2. if stack is empty, then end of algorithm; otherwise, do the following:
  3.    y <- Pop;
  4.    if y is not marked, then Mark y;
  5.    for every neighbor z of y that is not marked, do the following:
  6.       Push(z);
  7. go to Step 2.
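
For those who would like to try this on a computer, here is a Python sketch of the second version; the dictionary-of-neighbours graph representation and the example graph are mine.

  def stack_dfs(graph, start):
      # Iterative DFS in the spirit of Stack_DFS2: pop a node, mark it if it is
      # new, then push its unmarked neighbours.
      marked = set()
      stack = [start]
      order = []
      while stack:
          y = stack.pop()
          if y in marked:
              continue
          marked.add(y)
          order.append(y)
          for z in graph[y]:
              if z not in marked:
                  stack.append(z)
      return order

  graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
  print(stack_dfs(graph, "a"))  # ['a', 'c', 'd', 'b']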

(b) In the Web crawling application, we need to use a stack. In fact, crawling is done by many computers running the “search” (or explore) simultaneously. Each computer takes the next node to be explored from the top of the stack, downloads the HTML file, and then scans it for further hyperlinks. But when a new HTML document is found (i.e., a “neighbor”), no recursive call is made; instead, the new node is pushed onto the stack.

Yet one interesting question remains: when we see an HTML document, how do we know it is indeed “new” (i.e., not marked) so that we want to push it onto the stack?

Making change using dynamic programming

Problem:

Consider the problem of making change. Assume coins of values 25 cents (i.e., a quarter), 10 cents (dime), 5 cents (nickel), and 1 cent (penny). Suppose now we want to return 63 cents in change. What would you do?

Now suppose that our coins’ values are instead: 1 cent, 5 cents, and 11 cents. We want to make change of 15 cents. Can you apply the same method as what you did above?

Follow-Up:

We can use a greedy method to find the smallest number of coins required for the first case:

63 ÷ 25 = 2         remainder = 13
13 ÷ 10 = 1         remainder = 3
3 ÷ 5 = 0           remainder = 3

Thus, we need two quarters, one dime, and three pennies.

However, it is easy to see that the greedy method does not work for the second case. Indeed, we end up using one 11-cent coin and four 1-cent coins (five coins), while in fact three 5-cent coins would do.

The lesson, again, is that the greedy method does not always work.

Some of you discovered solving the second case using a “recursive” idea:

f(15) = min { f(14), f(10), f(4) } + 1

where f(n) denotes the minimum number of coins needed to make change for n cents.

The rationale of the above “recursive” idea is that the first term inside the “min” represents the case where one 1 cent coin is used so that we still need to make change for 14 cents. Similarly, the second term represents the case where one 5 cents coin is used so that we still need to make change for 10 cents. Finally, the third term represents the case where one 11 cents coin is used so that we still need to make change for 4 cents.

To really compute the answers, we need to further “reduce” the problems on the right side of the above equation (called a recurrence equation), until we get to the trivial cases, e.g., f(1), f(5), f(11) .

In fact, to construct the final answer, we need to build it from “bottom-up”, i.e., at the beginning we use the trivial cases, and then construct the bigger cases, and so on, until we get to the original question.

There is an official name for this “bottom-up” construction of recursively defined solutions—dynamic programming. This is a very powerful algorithm design paradigm that can be applied in solving many problems.
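
Here is a small bottom-up Python sketch of this idea. The coin values 1, 5, 11 and the amount 15 come from the example above; everything else (names, the second test case) is mine.

  def min_coins(amount, coins):
      # f[v] = minimum number of coins needed to make change for v cents.
      INF = float("inf")
      f = [0] + [INF] * amount            # trivial case: f(0) = 0
      for v in range(1, amount + 1):
          for c in coins:
              if c <= v and f[v - c] + 1 < f[v]:
                  f[v] = f[v - c] + 1     # e.g., f(15) = min{f(14), f(10), f(4)} + 1
      return f[amount]

  print(min_coins(15, [1, 5, 11]))      # 3 (three 5-cent coins)
  print(min_coins(63, [1, 5, 10, 25]))  # 6 (two quarters, one dime, three pennies)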

Further Inquiry:

(a) What is the estimated running time of the above dynamic programming method?
(b) Can you find a general relationship for the coin denominations so that the greedy method is optimal?

Communication networks design using graph problem

Problem:

In addition to finding shortest paths, sometimes we are also interested in finding the set of links connecting all the nodes with the minimum cost. A real-life application of this idea is in the design of communication networks. Here, the nodes of the graph represent sites and the links represent the communication connections between sites. The cost of each connection could be the communication delay. Using the following graph as an example, can you design an algorithm similar to Dijkstra’s algorithm to solve this problem?

Follow-Up:

To solve this problem using a greedy method, we can use Kruskal’s algorithm, outlined below (a small code sketch follows the outline).
  1. Sort all links in ascending order of costs
  2. Start with an empty set of links, so that all nodes are disconnected (isolated)
  3. Add the lowest-cost remaining link to the set, provided that doing so does not create a cycle; this link is not to be considered again
  4. Repeat Step (3) until the set has n - 1 links
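
A minimal Python sketch of this outline; the union-find helper and the example links are my own additions, just to make the steps concrete.

  def kruskal(n, links):
      # links is a list of (cost, u, v) tuples; nodes are numbered 0 .. n-1.
      parent = list(range(n))

      def find(x):                       # simple union-find, used to detect cycles
          while parent[x] != x:
              x = parent[x]
          return x

      tree = []
      for cost, u, v in sorted(links):   # Step 1: ascending order of costs
          ru, rv = find(u), find(v)
          if ru != rv:                   # Step 3: skip links that would create a cycle
              parent[ru] = rv
              tree.append((u, v, cost))
          if len(tree) == n - 1:         # Step 4: stop after n - 1 links
              break
      return tree

  links = [(4, 0, 1), (1, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
  print(kruskal(4, links))  # [(0, 2, 1), (1, 3, 2), (1, 2, 3)]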

Further Inquiry:

(a) What is the estimated running time of the above algorithm?
(b) If there are links with negative costs, would this greedy method still work?

Friday, July 22, 2011

Digital signature

Comment:

Still confused about “digital signature”. In particular, can a “digital signature” be stolen?

Follow-up:

A digital signature is just an encrypted output of a message (or, actually, of the digest of the message). The encryption is performed using the private key of the signer. Thus, the signer cannot deny having signed the message because only he/she has the private key. Furthermore, every “signature” is unique in that the encrypted outputs of different messages are different.

Thus, a “digital signature” cannot be stolen.

Cracking using quantum computing

Comment:

What if quantum computing cracks the theory of public key approach?

Follow-up:

Theoretically, based on Peter Shor’s factorization algorithm, a quantum computer can break a public key cryptosystem in a very short time. Thus, the public key cryptosystem as we are using it now will become completely useless.

The critical problem of the current public key cryptosystem (the “RSA” algorithm) is that it relies on computational intractability. In other words, its security is not mathematically proven.

Security scientists are actively researching “provable” public key approaches.

Certificate authority

Comment:

What if the certificate authority discloses private information of users? Safe?

Follow-up:

Yes, it is definitely possible that a CA could inadvertently disclose some private information. For example, a CA’s database could be compromised by hackers, just as credit-card numbers have been disclosed in some hackings of e-business Web sites.

So in a sense, one could argue that our Internet security is quite fragile.

Comment:

Who will certify the public key of the certificate authority?

Follow-up:

A CA’s public key is not “certified” but just published in a widely accessible site so that everyone can verify it.

Are malware and computer virus related to Internet security?

Comment:

Does it mean that we should not use some not-so-famous browsers to surf the net because those will lead to risky results?

Follow-up:

Yes, definitely. In fact, we should all be very careful in using any software in today’s computing environment because of the Internet. Specifically, malicious software (called malware) can extract useful information (e.g., bank statements, etc.) from your computer and then send it off to some remote computers (e.g., the hackers’ computers) for launching further attacks.

We should not trust certificates from unknown sources/companies. We should not trust the “judgment” made by the browsers, which might have been altered by some malware.

Comment:

How about computer virus? (Is computer virus related to Internet security?)

Follow-up:

A computer virus, like the “malware” I mentioned above, is highly related to Internet security in the sense that nowadays a computer virus is not about making fun of an innocent user (as in the past). Instead, a virus will be used for getting sensitive information and/or using your computer to launch further attacks on others (e.g., Distributed Denial-of-Service attacks against popular Web sites such as Amazon.com).

Most importantly, a computer virus spreads by using emails or some bogus links in a Web page.

So we have to be careful in opening email attachments and clicking on some Web links.

Is password enough for authentication?

Comment:

Is password enough for authentication? Any other ways for the authentication (like USB keys)? What else can I do to enhance my Internet security?

Follow-up:

A password is definitely not enough. But we do not know what really is enough. Remember the weakest-link concept? Thus, to enhance security, we really have to do as much as we can, without compromising efficiency or convenience too much. So it is true that in many situations, additional security measures, e.g., a hardware key, are used to couple with the password for authentication.

Biometrics (e.g., retina scan, finger-prints, etc.) are also commonly used as further information for authentication.

Quantum computer

Comment:

What makes a quantum computer impractical?

Follow-up:

For now, the greatest difficulty is in the implementation of a quantum circuit that is stable. In fact, the problems related to accurately and reliably measuring the quantum bits, namely de-coherence and error correction, are still very difficult to solve in practice, even if we assume we can deploy a tremendous amount of resources in the construction of a quantum computer.

Still remember chaos theory? It is also anticipated that if we cannot tackle the accurate-measurement problem effectively, it would be impossible to scale up the size of a quantum circuit without leading to enormous errors.

Skype system

Comment:

Why can the SuperNodes in a Skype network boost the speed of the system?

Follow-up:

Sorry for causing this confusion!

In fact, each SuperNode is mainly responsible for routing calls for other users. In the routing mechanism, sometimes a SuperNode needs to serve as an agent on behalf of another Skype user who is behind a firewall or an NAT router (very commonly used at home or in an organization). Thus, each SuperNode is really helping to enhance the “reachability” of the Skype network, rather than the speed of the system.

Comment:

How can the Skype system enable “computer-to-telephone” calls?

Follow-up:

Good question.

Specifically, there are some Skype servers (not shown in the diagram in the presentation slide) that have connections to some PSTN (Public Switched Telephone Network) gateways. Thus, through such servers we can route our VoIP (voice over IP) calls to reach a landline telephone or a cell phone. As this directly consumes resources in the Skype network (owned and operated by the Skype company), the caller and the callee both need to pay some money to make this happen.

Thursday, July 21, 2011

Sequential processing

Comment 3:

What exactly is “sequential processing”?

Follow-up:

Sequential processing refers to those computations that can only be handled by one single processor. Putting in more processors does not help. For example, in a typical parallel or multi-task program, there are some “managerial” tasks to be carried out. One most commonly encountered task is the division of the whole job into pieces. Specifically, if we consider an image processing task, dividing up the image into pieces is one such managerial task. Obviously, this action is not parallelizable.

Can you think about some daily life examples of sequential processing task?

Parallel processing: data parallel and functional parallel

Comment:

Differences between data parallel and functional parallel approaches?

Follow-up:

Data parallel approach is suitable for situations where exactly the same actions can be taken for different pieces of data. A very good example is filtering or quantization in image processing because these tasks require operating on each pixel independently for the whole image. Another daily life example that we discussed in class is the painting of a large wall (in the same color!) by a group of workers in parallel.

Functional parallel approach is suitable for situations where different actions can be taken on the data (possibly the same) independently and simultaneously. A very good example is the recognition of different objects in the same image. Here, we apply different “actions”, which are the matching of different known object models, over the same image. These actions are usually independent, and they are different because matching different models requires different kinds of processing. Another daily life example that we discussed in class is the re-modeling of an apartment — we usually need to do many different things such as installation of electrical wiring, painting of walls, installation of fixtures, etc. These functions (sometimes) are independent and can be carried out in parallel.

Finally, one very important difference between the two approaches is the “degree of parallelism” — which can be defined as the number of processors that you can use in a productive manner (i.e., having some useful work to do). Obviously, in data parallel approach, if the data involved is large (e.g., a very large image), then we can use many processors (e.g., up to thousands) to enhance the speed-up. On the other hand, usually there will not be many different (and independent) functions required in a task (e.g., re-modeling of an apartment). So the number of processors that can be put into productive use is usually limited (e.g., on the order of 10 or lower).
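
The structural difference can be sketched in a few lines of Python. The toy “image” and the two analysis functions below are stand-ins of mine; threads are used only to show the shape of the two decompositions, not to claim any real speed-up.

  from concurrent.futures import ThreadPoolExecutor

  image = list(range(1000))                   # a toy "image" of pixel values

  def filter_chunk(chunk):
      # Data parallel: the same action applied to different pieces of the data.
      return [min(p, 128) for p in chunk]

  def count_bright(img):
      # Functional parallel: one of several different actions on the same data.
      return sum(1 for p in img if p > 200)

  def average_value(img):
      return sum(img) / len(img)

  with ThreadPoolExecutor() as pool:
      # Data parallel: split the image into pieces and apply the same filter to each.
      chunks = [image[i:i + 250] for i in range(0, len(image), 250)]
      filtered = sum(pool.map(filter_chunk, chunks), [])

      # Functional parallel: run two different analyses on the same image at once.
      bright = pool.submit(count_bright, image)
      average = pool.submit(average_value, image)
      print(len(filtered), bright.result(), average.result())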

Think about “weather forecasting” computing. Which approach do you think is more suitable?

Comment:

Functional parallel computing can only be used when the different functions are independent of each other. So, for example, in the case of “object tracking”, as the different functions are actually dependent on each other, i.e., the object tracking can only be done after background subtracting, so the functions need to be done one after another. Hence, functional parallel computing could not be used in this case. In a sense, this is “completely sequential”?

Follow-up:

This is an excellent observation.

Using optimal method or approximation method?

Comment:

How do we know when to apply an optimal method and when to use an approximation method instead?

Follow-up:

Generally we want to solve a problem fast. Thus, if there is a fast optimal method (for a more precise notion of “fast”, please refer to Homework 5), then we should just use it. Now, a possibly difficult situation arises when we do not have a fast optimal method. For example, the optimal method requires exponential running time. In such a situation, we might need to forgo optimality but use a fast method that will give us an answer, possibly not optimal, but within a short period of time. For example, the “touring” (i.e., gossiping) problem is usually solved by a sub-optimal but fast method.

Time taken for the gossip to come back

Comment:

Time taken for the gossip to come back?

Follow-up:

A circular gossiping loop is a round trip tour visiting each and every node once. The cost of such a tour is defined as the total costs summing up all the link costs. Each link cost can represent the time needed for the gossip message to travel from one node to another node.

Comment:

Why is there no fast algorithm for a round trip tour where each link and node is visited once?

Follow-up:

That is life, I guess. There are in fact many such seemingly simple problems that do not have a fast optimal algorithm for them yet.

High tech, low tech and "no" tech

Comment:

I would like to know clearer distinctions among high, low, and no tech. In which category are the game consoles (e.g., NDS, X-Box, and PS2) classified?

Follow-up:

In my opinion, high tech means the technology has a truly ingenious component so that it is very difficult for others to re-invent. Of course, when the technology is being used, others can somehow “reverse engineer” the underlying mechanisms and then possibly copy it. Thus, such “high tech” solutions are usually patented to prevent copy-cats from “stealing” the treasures! Examples of high tech are: micro-chip fabrication, Google search, Skype, NVidia Graphics Processing, etc.

On the other hand, to me, low tech means that the technology is not really that difficult to design and implement. Nevertheless, low tech does not mean unsuccessful product. Usually the reverse is true. I think the reason is that while the underlying technology might not be difficult, the “application” of the technology might be pioneering (i.e., a first mover) so that users are accustomed to using it.

No tech, again in my judgment, is something that does not really involve much technology but just marketing and sales strategies, which are extremely important in a business, if not the most important components.

In summary, I think game consoles are really high tech stuff because they are really “high performance” hardware built on really difficult designs. We will talk a bit more about this hardware later on.

Computations in Google Map

Comment:

Why can’t Google store the complete map easily? For instance, for 1000 nodes, we need to store just around 500 × 999 units of data.

Follow-up:

While the analysis for 1000 nodes is correct, when you scale it up to 1 million nodes, the data storage becomes very big. Indeed, the file size of the street map for the U.S.A. is around 9GB (for details, please visit: www.esri.com). Thus, for the entire world with all the details (e.g., restaurants, shops, timing data, etc.), the data size would be even larger. Of course, Google still can “easily” handle all these. But to run an algorithm (e.g., Dijkstra) over such a large piece of data is still quite daunting. That is why we really have to use the “pre-compute-then-table-lookup” approach as far as we can.

Comment:

What kind of programming language is used for implementing MapQuest and Google Map?

Follow-up:

For high-performance applications, I strongly believe that their engineers are still relying on the C language. However, for other system components such as user interface, Web, etc., they use a variety of other tools (e.g., Java, Perl, Python, etc.).

Principles of the travel scheduling problem

Comment:

Underlying principles of the travel scheduling problem?

Follow-up:

Frankly speaking we also do not know the complete picture of this problem. Yet from an intuitive angle, the problem enlarges the complexity of the original shortest path problem in at least two dimensions. Firstly, we need to solve multiple instances of the shortest path problem for any single round-trip tour (e.g., one day in the trip). Secondly, we need to combine these shortest path solutions temporally across different tours so as to obtain the minimum overall cost for the whole trip.

Chaotic systems

Comment:

Chaotic systems?

Follow-up:

In simple terms, a chaotic system is one which is highly sensitive to initial conditions, and whose seemingly random behaviors can be actually described by deterministic equations. For more details, please visit:

http://en.wikipedia.org/wiki/Chaos_theory (for a general introduction)
http://en.wikipedia.org/wiki/Logistic_map (for more details about a chaotic system defined by a very simple recurrence equation); Reference [1] in http://www.eee.hku.hk/~ykwok/CCST9003/reading-others.html also has an excellent discussion

Random number generation

Comment:

I want to know more about random number generation.

Follow-up:

Please visit the following sites:

http://www.random.org/randomness/
http://www.fourmilab.ch/hotbits/

Critical thinking about 911

Comment:

Why watch the 911 video?

Follow-up:

Well, there are several reasons for watching this video:

  • On the 9th anniversary of the 911 event, there are many “official” memoirs of what happened on that day. As a “critical thinking” (a pre-requisite of “computational thinking”) exercise, it is useful to stimulate our brain by getting to know something contrary to mainstream information.
  • There are plenty of computing related events in the video:
    (i) The computer simulation done to verify the fact that the debris dropped at free-fall speed
    (ii) The computer simulation done (by the NIST) to prove that WTC-7 was burnt to collapse

In later “meetings”, I will also show some other controversial videos to challenge your “lateral thinking”.

Tuesday, July 19, 2011

Links of Readings

Required Reading

Recommended Reading 

  • Feynman, R., Hey, A., & Allen, R. (2000). Feynman lectures on computation. Cambridge, MA: Perseus Books.
  • Schneier, B. (2004). Secrets and lies: Digital security in a networked world. Indianapolis, Ind.: Wiley.
  • Wing, J. M. (2008). Five deep questions in computing. Communications of the ACM, 51(1), 58-60.

HW: Sorting in the daily life

Problem (10 points):

This problem is about a common daily life application of sorting. Specifically, sorting (e.g., QuickSort) is usually done before we want to search for something. For example, bank accounts are always sorted according to the account numbers in order to facilitate searching a particular account on request.
Consider, as shown above, a sorted (in ascending order) list of n numbers (assume n is a power of 2, i.e., n = (2^k) for some positive integer k ). To search for a number x in the list, we can use the following algorithm. Compare x with the number in the middle of the list (i.e., m shown above). If they are equal, then our job is done. If x is smaller than that middle number, then we repeat this search process using just the left half of the list; otherwise, we repeat this search process using just the right half of the list.

(a) Why can we use just the left half of the list if the first comparison tells us that x is smaller than the middle number?

(b) Please express the above algorithm as a recursive method.

(c) Please give the number of comparisons needed in the worst case. Explain your answer.

HW: Difference between two multiplication methods

Problem (10 points):

In Meeting #1, we discussed two multiplication methods: the traditional method and the lattice method (see slides 16 and 17). Please answer the following questions.

(a) Are the amounts of effort different in the two methods? Explain.

(b) Which method do you prefer to use? Explain.

HW: "Hard" and "Unsolvable"

Problem (20 points):

This problem is aimed at enhancing your understanding of “hard” and “unsolvable” problems.

In 1965, computer chip pioneer Gordon E. Moore noticed that transistor density in chips had doubled every year in the early 1960s, and he predicted that this trend would continue. This prediction, moderated to a doubling (in transistor count) every 18 months, is known as Moore’s Law. Oftentimes, people extend the meaning of Moore’s Law to imply that computer processing speed doubles every 18 months.

(a) According to the speed-doubling interpretation of Moore’s Law, how many times faster is a computer today than one ten years ago?

(b) Suppose we have a computing problem that can be solved in estimated running time on the order of n , where n is the problem size (e.g., the number of nodes in a graph). Using the same amount of time, how much bigger a problem we can solve with a computer today than with one ten years ago?

(c) Suppose we have a computing problem that can be solved in estimated running time on the order of (n^2) , where n is the problem size (e.g., the number of nodes in a graph). Using the same amount of time, how much bigger a problem we can solve with a computer today than with one ten years ago?

(d) Suppose we have a computing problem that can be solved in estimated running time on the order of (n^6), where n is the problem size (e.g., the number of nodes in a graph). Using the same amount of time, how much bigger a problem we can solve with a computer today than with one ten years ago?

(e) Suppose we have a computing problem that can be solved in estimated running time on the order of (2^n), where n is the problem size (e.g., the number of nodes in a graph). Using the same amount of time, how much bigger a problem we can solve with a computer today than with one ten years ago?

(f) For Problem 3 of Tutorial 3, someone suggested that we might solve the problem (i.e., finding integer roots for the equation) by simply testing every possible integer. Please comment.

HW: Dynamic programming

Problem (10 points):

In Tutorial 3 Problem 2, we worked on the knapsack problem in which there is only one unit of every item i. Here we consider the version where there is an unlimited supply of every item. In this version, we can consider a dynamic programming formulation for computing this quantity: K(x), which is defined as the maximum value achievable with a knapsack of capacity x.

(a) Please try to give a recurrence for K(x)?

(b) Please try to write up the computing procedure for K(x).

(c) What is the estimated running time (in terms of n and W)?

(d) Please draw a graph (with directed links) to illustrate the computation (just like the one I showed you in Tutorial 3 and in Tutorial 1 follow-up). How would you describe the computation using this graph?

HW: "Greedy" computing

Problem (10 points):

We just learnt about “greedy” computing method—trying to choose the best move in every step,
without regard to future options.

(a) What do you think is the necessary condition for the greedy computing method to find optimal solutions?

(b) How do you compare greedy computing method with recursive method (or divide-and-conquer)?

(c) Can you find other daily life computing problems where greedy method is good?

Sunday, July 17, 2011

Genetic Algorithm

Comment:

All of the following are related to Genetic Algorithms (GAs).
  • How to ensure that a better solution can always be found in the next generation?
  • How does a computer evaluate the “fitness” of a chromosome?
  • Depending on the encoding scheme chosen (e.g., using a bit string to represent whether or not a link is selected in a path), it is possible that a lot of invalid solutions are included in the chromosome population. Will this actually decrease the efficiency of the GA?
Follow-up:

First of all, there is no way we can guarantee a better solution to be found in a new generation. In a sense, the crossover and mutation operators serve as “random” searching tools in a vast solution space. With luck we might be able to derive a better solution; otherwise, we will end up getting worse solutions in a new generation. Then, using the selection step, the newly obtained worse solutions might get weeded out. Thus, in a practical GA, we usually employ the “elitism” policy, i.e., although the currently best solution is still subject to crossover and mutation, it will be explicitly “protected” in that a copy of it will be retained for the next generation.

The “fitness” of a chromosome is really just the metric we define for a potential solution. For instance, in the case of the shortest path problem, we can define the fitness of a chromosome to be the length of the path (if it is a valid path).

Yes including a lot of invalid solutions can possibly decrease the efficiency (in terms of the
convergence time) to find the best possible solution. However, doing so could also help the “exploratory” side of the GA because the invalid solutions might also contain some “good
components” (i.e., good genes in a bad chromosome) so that through crossover, such good components might get passed on to some valid solutions in new generations.

I have also enclosed a research paper that I co-authored many years ago about using a GA for multiprocessor task scheduling. Hope you would be able to find time to glance through it to see another example of encoding scheme, the design of crossover and mutation operators, and most importantly, the performance of the GA.

Sequential mode or parallel mode?

Problem:

A uniprocessor computer can operate in either sequential mode or parallel mode. In parallel mode, computations can be performed nine times faster than in sequential mode. A certain benchmark program took time T to run on this computer. Furthermore, suppose that 25% of T was spent in parallel mode whereas the remaining portion was in sequential mode.

(a) What is the effective speedup of the above execution as compared with the condition when parallel mode is not used at all?
 
(b) What is the fraction α of parallelized code of the benchmark program?

(c) Suppose we double the speed ratio between parallel mode and the sequential mode by hardware improvements. What is the new effective speedup?

(d) Suppose the speedup you calculated in (c) is to be obtained by software improvement alone in terms of parallelization fraction, instead of by any hardware improvement. What is the new parallelization fraction required?

Follow-up:

(a) If parallel mode is not used at all, the execution time will be 0.75T + 9 × 0.25T = 3T .
Therefore, the speedup of the program is 3T ⁄ T = 3 .

(b) The fraction of parallelized code is given by:

α = (9 × 0.25T) / 3T = 2.25T / 3T = 0.75

(c) The new execution time now becomes:

0.75T + (9 × 0.25T) / 18 = 0.75T + 0.125T = 0.875T

And the new speedup is:

3T / 0.875T ≈ 3.43

(d) Parallelization in general is done by the compiler (an important piece of software for translating your program into machine language). Since the new speedup is to be obtained by software instead of hardware, the speed of parallelized code is still 9 times faster than sequential mode. Let β be the new parallelization fraction. Then, we have:

3T / ( (1 − β) × 3T + β × 3T / 9 ) = 3.43

Thus, we have:

β ≈ 0.80

That is, the compiler has to parallelize 5% more of the benchmark program.
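
The arithmetic above can be reproduced with a few lines of Python (purely a sanity check; T is set to 1 for convenience).

  T = 1.0
  sequential_only = 0.75 * T + 9 * 0.25 * T   # (a) time without parallel mode = 3T
  print(sequential_only / T)                  # speedup = 3
  alpha = (9 * 0.25 * T) / sequential_only    # (b) parallelized fraction
  print(alpha)                                # 0.75
  new_time = 0.75 * T + (9 * 0.25 * T) / 18   # (c) parallel mode now 18x faster
  print(sequential_only / new_time)           # new speedup, about 3.43
  beta = (1 - 7 / 24) * 9 / 8                 # (d) solving 1 - 8*beta/9 = 7/24
  print(beta)                                 # about 0.80, i.e., 5% more than 0.75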

Internet Key: Key generation

Comment:

(I am) Still confused about public key cryptography.

Follow-up:

Simply put, in a public key cryptosystem, each user owns a pair of keys, one made public and the other kept secret. To send a message securely, the public key of the intended recipient is used for the encryption so that only the recipient can decrypt it. On the other hand, for digital signature, the signer uses his/her own private key to encrypt the message (or actually the digest of the message) so that everyone can decrypt it using the signer’s public key to verify it.

For those of you who are interested in the mathematical details, please study the attached lecture notes on public key cryptography (file: module3-PublicKeyCrypto.pdf).

Comment:

How does KDC work?

Follow-up:

The KDC shares a secret key (or symmetric key) with every registered user. Thus, by using this key, the KDC can send a newly generated key (denoted as R1 in Presentation #5) to a user requesting a new key to be shared with another registered user of the KDC. Once getting this R1, the two users can then transform it further using some agreed-upon protocol to generate the actual shared key, which is unknown to the KDC.

Comment:

I want to know if it is possible to combine the use of public and private keys as follows.



Follow-up:

Yes of course your suggestion would work. In fact, it provides not only “confidentiality” but “non-repudiation” as well.

Parallel processing: Amdahl's Law, speed-up and overhead

Comment:

The “speed-up” formula (i.e., Amdahl’s Law) is still a bit confusing?

Follow-up:

First of all, it is important to understand that this formula assumes that a program (or a task, or a job) can be divided into two parts, i.e., sequential part (must be carried out by one single processor only), and parallel part. Most importantly, one key component in the assumption is that the parallel part can be perfectly divided into totally independent (i.e., no communication is required) pieces for different processors to work on.

Consequently, given n processors, we can perfectly speed up the computation of the parallel part by n times. Using the terminology in the presentation slide, the Speedup of enhancement is n. Thus, if the original total execution time of the program using one processor is T and the fraction of the sequential part is x, then the new total execution time is:

T_new = x × T + (1 − x) × T / n

The first term in the above expression is the time required to execute the sequential part. The second term is the time required to execute the parallel part, using n processors simultaneously. Finally, speed-up, which is defined as the ratio of the original time to the new time, is given by:

S = T / T_new = 1 / ( x + (1 − x) / n )

We can see that perfect speed-up, i.e., n, is achievable only when x = 0, which means that the whole program is perfectly parallelizable into totally independent pieces. This is a situation not likely to happen in a typical computer program (or any daily life task, for that matter).

Comment:

Does the parallel processing formula (speed-up) really work?

As we assume the processor count tends to infinity, the time for communication should also grow very large. Should we consider the running time to be larger? Or does the formula just neglect the factor of communication time between processors?

Follow-up:

This is another excellent observation. As mentioned in the Follow-Up to the comment above, in deriving the formula, we assume that the parallel part can be divided up perfectly, without any communication. In a real-life parallel program, of course there will be communications. Thus, in deriving a speed-up formula for such a program, we might need an additional term for the total execution time:

T_new = x × T + (1 − x) × T / n + c × (n × n)

In the above expression, the third term denotes the communication overhead time introduced by parallelization, which grows as (n x n) with a multiplicative constant c. This is not an “artificial” example but really represents some real-life communication overhead that you can find in many practical parallel programs.

What new conclusions can you draw from this new expression?
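
One way to explore this question is to plug in numbers, as in the short Python sketch below; the sequential fraction x, the constant c, and T = 1 are illustrative assumptions.

  T, x, c = 1.0, 0.05, 0.00001

  def speedup(n):
      # Execution time with n processors, including the c * n * n communication term.
      new_time = x * T + (1 - x) * T / n + c * n * n
      return T / new_time

  for n in (1, 8, 32, 64, 128, 256):
      print(n, round(speedup(n), 2))
  # The speed-up no longer grows with n forever: beyond some point the overhead
  # term dominates and the speed-up actually falls.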

Comment:

Still quite confused about Slide 14?

Follow-up:

Slide 14 illustrates the need for extra communication overhead when we perform parallel processing. Specifically, in the Main Memory, there is a variable u, which has a value of 5. This variable is needed for some parallel computations to be carried out by three different processors as shown. Now, each processor has to read this variable from the Main Memory into its own memory in order to perform the computation. The problem here is that after P1 and P3 have read the variable u, P3 subsequently modifies its value (from 5 to 7), so that different copies of u have different values (an inconsistent situation).

Obviously, inconsistency in shared data variables is unacceptable because it will lead to mistakes. Thus, to rectify the inconsistency, some extra communication has to be carried out. For instance, after P3 modifies u, it has to immediately inform every processor that is currently using u (i.e., having u in its own memory).

Can you think of some other way to handle this inconsistent situation?

Xanga/ICQ vs Facebook/MSN

Comment 3:

How does Facebook store our data? Using the listing method?

Follow-up:

The listing method suggested by Stanford is very likely being used by Facebook to store “connectivity” data among users. Of course, such a list is also complemented by a huge user database.


Comment:

Even being a first mover, a Web site may still be replaced by other Web sites with more functions; e.g., XANGA, being the first mover for Web blogs, has generally been replaced by Facebook, which also has a Web blog function and many other functions. Also, ICQ has been replaced by MSN, which has better technologies. This shows that being a first mover is not the only way to succeed; continuous improvement in applications and technologies is needed to achieve consistent success.

Follow-up:

Excellent observation! (That’s why I want to share this with all of you.)

Computing using depth-first-search (DFS)

Problem:

In class today, we learned about a computing method for traversing a graph (see Presentation #2, slide #9), called depth-first-search (DFS).

(a) Can you write up a recursive computing procedure for doing so?

(b) Can you analyze its estimated running time?

(c) Can you name some “everyday computing” applications of this method?

Follow-Up:

(a) A possible recursive algorithm is as follows:

DFS(x):
  1. Mark x;
  2. For every neighbor y of x that is not marked:
  3.    DFS(y);

(b) As discussed in class, the DFS procedure visits each and every node once, also each and every link once. Thus, the estimated running time should be on the order of N + E , where N and E are the number of nodes and number of links, respectively.

(c) We need to use DFS for many useful daily life applications:

  • playing chess: searching for the best move in the game tree;
  • playing puzzle: searching for the exit in a maze;
  • crawling the Web: the crawler program (e.g., in Google) needs to search the Web graph (i.e., visualizing the Web as a graph of connected documents, where the connections are the hyper-links) in a depth-first way.

Tutorial 3

Dijkstra's shortest path, Computation

Comment 1:

How to derive the Dijkstra’s shortest path algorithm’s estimated running time?

Follow-up:

As an approximation, we note that each link needs to be visited once. In the worst case, we can have on the order of (n x n) links, where n is the number of nodes in the graph. Thus, the estimated running time is approximately (n x n) steps.

Comment 3:

Maybe it is true that if we put in more computing/storage resources, we can solve just about any problem.

Follow-up:

If you attended Tutorial 3, you should have learned that there are really difficult problems, in the sense that you will need unreasonably large amount of storage (e.g., every atom in the universe) and computing power to solve. Furthermore, in Problem 3 of Tutorial 3, you will see that there are problems that are simply unsolvable.

Dijkstra's shortest path problem

Problem:

We learned about Dijkstra’s shortest path algorithm in class. Try to apply it to the following graph.



What is the shortest path between s and b?

Follow-Up:

Using Dijkstra and starting from s, the shortest path found is s-a-b. Yet we can easily see that the shortest path should be s-a-c-b. Thus, we can see that the greedy method does not always work. Here, Dijkstra is defeated by the existence of links with negative costs.

However, as some of you discovered during the tutorial, the correct shortest path can be found by starting the Dijkstra work from the destination, i.e., b. Please try this yourself.

Indeed, in real-life usage (e.g., in software systems such as MapQuest), a bi-directional approach is used, i.e., the Dijkstra work is done concurrently from both the starting point and the destination. Conflicts are resolved on the way to construct the final shortest path.

Further Inquiry:

Consider that the link costs are multiplied together (e.g., when the links are associated with probabilities) in a path, would Dijkstra still work?

Comparison between insertion sort and quicksort

Comment:

It will be good to know the details about the running time estimates of (n^2) and n log n for insertion sort and quicksort, respectively.

Follow-up:

Let’s take a look at insertion sort first. We know that insertion sort works by taking each item (e.g., a card) and then comparing it with each of the remaining items on the list so as to find the right position. After the first item is done, we repeat the process for the second item.

Now, suppose we have n items in total; then we need to repeat the above process n times. The next question is: how many comparisons are needed for each item? It is not difficult to see that for the first item, we need to do n – 1 comparisons. For the second item, we need to do n – 2 comparisons. You can go on with this argument until the last item. Thus, the total number of comparisons is:

(n – 1) + (n – 2) + … + 2 + 1 = n (n – 1) / 2

So, for very large n, we can approximate this by (n x n) / 2. As we are mostly interested in the “order of magnitude”, we can further approximate this by just (n x n).

For quicksort, the mathematical derivation requires solving a “recurrence equation”. Thus, the
mathematics needed is really beyond the scope of the course here. Yet, to get a rough idea, it is similar to the recursive exponentiation we tackled in Tutorial 1 (problem 2).
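
If you would like to verify the n (n – 1) / 2 worst case yourself, here is a short Python sketch that counts comparisons on a reverse-sorted list (the instrumentation is my own, not part of the tutorial).

  def insertion_sort_comparisons(items):
      # Sort the list in place and return how many pairwise comparisons were made.
      comparisons = 0
      for i in range(1, len(items)):
          j = i
          while j > 0:
              comparisons += 1
              if items[j - 1] > items[j]:
                  items[j - 1], items[j] = items[j], items[j - 1]
                  j -= 1
              else:
                  break
      return comparisons

  n = 100
  print(insertion_sort_comparisons(list(range(n, 0, -1))))  # 4950 = 100 * 99 / 2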

Computation using recursive method

Problem:

The number n! is defined as n × ( n – 1 ) × ( n – 2 ) × … × 2 × 1 .

(a) Can you write up a recursive method to calculate the value of n! ?

(b) You are given a function called print( n ) which shows the value of n on the screen. Using your idea in (a) above, can you write up a recursive method to print numbers from n to 0 on the screen?

Follow-up:

(a) The number n! is defined as n × ( n – 1 ) × ( n – 2 ) × … × 2 × 1 .
We can express its computation as a recursive method below. Assume n > 1 .

Calculate_Factorial(n)
  1. Is n = 1?
  2. If yes, then answer is: 1
  3. If no, then answer is: n x Calculate_Factorial(n-1)

(b) Using an idea similar to the above, we can solve the “printing” problem as follows. Assume n>0 .


Show_n_downto_0(n)
  1. print(n)
  2. Is n = 0?
  3. If yes, then we are done
  4. If no, then do: Show_n_downto_0(n-1)
Problem:

We observe that a^n = a^(n/2) × a^(n/2). Assume n is a power of 2.

Can you write up a recursive method to calculate the value of a^n?

Follow-up:

We can express this computation as a recursive method below.

Calculate_Power(a, n)
  1. Is n = 1?
  2. If yes, then answer is: a
  3. If no, then:
  4.        b = Calculate_Power(a, n/2)
  5.        answer is then: b x b
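
A direct Python transcription of this procedure (my own wording of the same steps) is:

  def calculate_power(a, n):
      # Recursive exponentiation; assumes n is a power of 2 (n >= 1).
      if n == 1:
          return a
      b = calculate_power(a, n // 2)
      return b * b

  print(calculate_power(3, 8))  # 6561, computed with only 3 multiplications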