
Tuesday, November 8, 2011

Leave the decisions to human beings, not machines

Kayson Wong

I love computers. One of the many reasons is that computers do not lie. It's true: computers are machines; they do not think and they have no feelings. A computer always gives you the exact same output when given the exact same input. In fact, it's all logic.

What is logic? Logic is something that everybody in this world agrees on; you must agree, and you have to agree. That is how logic is defined: when you ask everybody the same question, they always give you the same answer. Mathematics is a field derived from logical rules. For example, if you ask someone "What is 1+1?", they always reply "2". It is not because they think or feel the answer is "2"; it is because, by the definition of numbers and operations, "2" is the only possible answer. Computers are designed around the same idea. At the core of a computer, logic operations are performed by the CPU, and a computer will always produce the exact same output when given the exact same input. This is a characteristic of every computer on this planet.

Today, many computer scientists are trying to push the limits of our computers further, towards something called "Artificial Intelligence". Artificial Intelligence, or AI, is a term used to describe algorithms that simulate tasks normally accomplished only by human beings. As a result, decisions that were made by humans in the past are now handled by computers, or to be more exact, by the programmers who came up with the algorithms. Some people like to refer to this as "smart" technology. Camera manufacturers design AI algorithms so their cameras can "smartly" determine the "appropriate" shutter speed, aperture, focus and ISO speed for taking a photo. Translation software can translate an entire paragraph of text from one language to another. Robots can play table tennis. However, many AI algorithms are applied inappropriately, and in a lot of cases they just make the problem worse. Professional photographers rarely use auto modes to take pictures; in fact, many professional cameras do not even include an auto mode, simply because the shooting settings calculated by the AI are far from ideal. Language translators, to this day, often distort the original meaning of the text, or even produce grammatically odd sentences. As for the robots, I cannot understand why someone would design a robot to play table tennis. These ball games are designed for human beings, for fun and health, not for the sake of the games themselves.

Putting computers into the seat of decision making is even worse when safety is a concern. In the Netherlands, the Maeslantkering is a floodgate designed to protect the city of Rotterdam from storm surges. The gates are run entirely by a computer; no human intervention can override the computer's decision. The computer is programmed to close the gates if the surge height predicted by the weather models exceeds 3 meters. During one storm, the computer predicted the surge height to be 2.99 meters, so it left the gates wide open. Obviously, nobody living in the city would care whether the water level is 1 cm higher, but it matters to the computer!

Human beings must be involved in decision-making processes because, unlike computers, humans can react to a situation creatively and come up with solutions that best resolve the problem. Despite the technology we have today, pilots remain essential in an airplane, because we believe that should an emergency occur, pilots can react creatively and find the best way to solve the problem. Engineers cannot foresee each and every possible situation, so we need human beings in the cockpit to make decisions, despite the fact that they sometimes make mistakes.

Another decision-making process that is more abstract and convoluted is the creation of art. While some may argue that computers will be able to accomplish this job in the future, they cannot. It is not because of technical difficulties; it is because the task is theoretically impossible. Recall that computers operate based on logic: every computer will give us the exact same response when the inputs we feed it are exactly the same. Can we convert the process of creating art into a logical process? No. Every one of us feels differently about art. I may look at a painting in a museum and see nothing; the guy next to me sees everything. We all interpret art in a distinct way, a way that is based on the things we have seen, the skills and talent that we have, our personal character, and even our mood at the time. If we ask two people to write some music, it is very likely (if not certain) that the two pieces will be completely different. Every person in this world gives us a different piece of music, because our DNA is different and our experiences are different. All the computers in this world, on the other hand, behave in exactly the same way. This fundamental disparity distinguishes human beings from computers. So it is not possible to come up with an algorithm that can replace human beings in the creation of art.

But what if we are not trying to come up with an algorithm to write all the music in this world, but instead asking the computer to write the music that Beethoven himself would have written? If we are to simulate such a creative process with a computer, we must answer the question of whether a human being is "computable". That is, can we write an algorithm to simulate a human being?
The answer to this question is also no. To simulate a human being, we would have to simulate every physical and chemical change that occurs inside a human body. This requires simulation at the atomic level. If we could calculate how each and every atom in a human body would behave, we could then work out all the chemical and physical reactions and thus simulate the human body. The truth is that we cannot simulate every atom, because it is impossible to predict how a particular atom will behave in a given situation. It is not even possible to measure the state of each atom. There is a principle in quantum mechanics called the "uncertainty principle". The short version is: if you measure the position of a particle to very high accuracy, then you can only know its momentum with very low accuracy, and vice versa. That is to say, we cannot, even in theory, take perfectly accurate measurements; there is always some error. Because of this error, we cannot exactly predict how each atom will behave. Quantum mechanics is really a study of probabilities: for example, there is a 70% probability that a particle passes through a certain region and a 30% probability that it is bounced back. It is like rolling a die; nobody knows the result in advance. Fortunately, these probabilities are very useful, because we usually deal with a very large number of atoms when applying quantum mechanics. If we roll six million dice, we can be quite sure that about one million of them will show a "1". As we reduce the number of dice, this one-sixth prediction becomes less and less accurate; when it comes to a single die, it is very hard to tell the result. The same applies to atoms. This is why physical simulation down to the atomic level is impossible. Even if we could do it, we would have to ask another question: is a human being a "computer"? That is, if I had two people who are exactly identical, who share the exact same DNA and have the exact same experiences, would they react exactly the same way to a given situation? I doubt it.
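
As a small aside (my own illustration, not part of the original argument), the dice behaviour described above can be checked with a few lines of Python: with six million dice the fraction of ones sits very close to 1/6, while with only a handful it fluctuates wildly.

import random

def fraction_of_ones(num_dice):
    """Roll `num_dice` dice and return the fraction that show a 1."""
    rolls = [random.randint(1, 6) for _ in range(num_dice)]
    return rolls.count(1) / num_dice

for n in (6, 600, 6_000_000):
    print(f"{n:>9} dice: fraction of ones = {fraction_of_ones(n):.4f} (expected ~0.1667)")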

So in the end, what are computers good for, if they can never be as smart as humans? The answer is repetitive tasks. A lot of our economic activity involves mass production, which means producing a large quantity of the same product to minimize cost. In the past, without technology, human workers were hired to do this job. They had to repeat the same tasks over and over again, every day, every week, every year. Unfortunately, human beings get bored with repetitive work; as it turns out, these workers did not enjoy much of their lives. Today, these jobs are handled by machines in factories. Machines can accomplish the same task over and over again, and more importantly, machines never get tired, sick or bored, and they rarely make errors. Machines are ideal for this job.

The decision-making part is the part that should be left to humans. We are not living in this world just for the sake of living; we are living because we want to enjoy life. If machines could create music, make paintings, feed us and talk to each other, why would we need to live? We are not turning computers into humans; we are trying to make them help humans. Machines are merely the tools we use to accomplish our goals. In this context, it is best to let computers and machines take over the dirty, dangerous or boring jobs on this planet, and let humans enjoy the sunshine on our beautiful beaches.

References
  1. Borel, Brooke. A Ping-Pong-Playing Terminator. February 16, 2010. http://www.popsci.com/technology/article/2010-02/ping-pong-playing-terminator.
  2. "Maeslantkering." Wikipedia. n.d. http://en.wikipedia.org/wiki/Maeslantkering.

Friday, November 4, 2011

Grid Computing – Now and future development

There are many problems and questions in academia that need to be solved, and it is time-consuming for scientists to compute the results even with a supercomputer. However, grid computing does a great job of solving some complicated problems that traditionally would have required a supercomputer. Although grid computing is not as popular among ordinary computer users as cloud computing and cluster computing, it is quite popular for solving specialised scientific and mathematical problems.

What is Grid Computing?
Grid computing refers to the combination of computer resources from heterogeneous computing devices to reach a common goal. Although grid computing is a kind of distributed computing, it is different from cluster-based systems: in grid computing, the devices can be widely spread geographically and need not share the same computational architecture. Middleware is needed to divide the job and allocate the pieces that each device is responsible for computing, over a network which is typically the Internet or a local Ethernet. There are several big projects using grid computing to solve problems, for example Genome@Home, Folding@Home and MilkyWay@Home. All these projects contribute a lot to human society.
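
To make the idea of middleware splitting a job concrete, here is a rough Python sketch of my own (real grid middleware is far more elaborate); local processes merely stand in for the volunteers' machines, and the work-unit function is only a dummy computation:

from concurrent.futures import ProcessPoolExecutor

def process_unit(bounds):
    """Stand-in for the scientific kernel run on one volunteer machine."""
    start, end = bounds
    return sum(i * i for i in range(start, end))   # dummy computation

def split_job(n, unit_size):
    """Middleware-style splitting of one big job into independent work units."""
    return [(i, min(i + unit_size, n)) for i in range(0, n, unit_size)]

if __name__ == "__main__":
    units = split_job(1_000_000, 100_000)            # 10 independent work units
    with ProcessPoolExecutor() as volunteers:        # processes stand in for volunteer PCs
        partial_results = list(volunteers.map(process_unit, units))
    print("combined result:", sum(partial_results))  # the middleware recombines the pieces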

Advantages of using Grid Computing
There are many advantages to applying grid computing to scientific, mathematical and academic problems through volunteer computing. Volunteer computing means that the people who share the computing power of their devices do not get any money from the institutes; it is done on a voluntary basis, so an institute does not need to pay a lot to obtain great computing power. Also, the cost of combining multiple processors from ordinary personal computers is far lower than the cost of building a tailor-made supercomputer, because the mass production of ordinary processors keeps the cost of a normal CPU low. With grid computing, multiple devices can provide computing resources similar to those of a multi-processor supercomputer, so the cost is relatively low compared with setting up a tailor-made supercomputer to solve the problems. Besides, the institutes do not need a lot of space for a distributed computing system or a supercomputer, and maintenance costs can also be saved.

Disadvantages of using Grid Computing
Although grid computing can lower the cost of computation, there are some restrictions. Firstly, as the devices in a grid may not have stable, always-on connections to one another, the computation must be divided into parts that are highly parallel and can be solved independently. This increases the difficulty of designing the programme. Moreover, grid computing relies mainly on volunteer computing, which limits the maximum computing power available.

Future Development
In the past few years, grid computing has relied mainly on the CPU as the processing resource. More and more computing power now comes from the Graphics Processing Unit (GPU) and the Cell processor in the PlayStation 3 (PS3). These processors deliver far more floating-point operations per second (FLOPS), which are the operations that dominate scientific problems, than the traditional CPU in an ordinary PC. As the GPU does not always have heavy work, it has more idle time than the CPU. Also, gamers cannot play on their PS3 all day, which means more available computing power. Therefore, computing power from GPUs and the PS3 can be obtained more easily than from CPUs. In the future, more PCs will be equipped with powerful GPUs to fulfil multimedia needs. With some promotion of grid computing, it is believed that grid computing will provide considerable computing power and recycle more idle computing capacity.

Secondly, more and more devices have high computing power, such as mobile phones, tablets and Home Theatre PCs (HTPCs). Their computing power is increasing while their power consumption is decreasing, so they are potential devices for grid computing. In addition, the development of high-speed Internet connections, both wired and wireless, makes grid computing easier than before. It is believed that more computing devices will be able to join grid computing and share their idle computing power. At the same time, institutes can connect their supercomputers and mainframes to speed up the processing. Together this may form an ultimate virtual supercomputer.

In short, grid computing will continue to develop to help scientists study difficult problems in the future. With more computing devices available to join the grid computing model, its potential is enormous. It is also worth encouraging people to join grid computing and share their idle computing power, in order to greatly boost the overall computing power of the grid. Letting devices join grid computing is better than leaving them idle doing nothing; at least they make some contribution to society. There is no doubt that developing grid computing is beneficial to us.

Reference
1.    Grid computing – Wikipedia. Retrieved October 2011 from Wikipedia: http://en.wikipedia.org/wiki/Grid_computing
2.    Folding@home. Retrieved October 2011 from Folding@home http://folding.stanford.edu/English/Main
3.    Folding@home – Wikipedia. Retrieved October 2011 from http://en.wikipedia.org/wiki/Folding@Home

Thursday, November 3, 2011

Applications of the recursive computing paradigm

Recursion is a computing technique that helps solve complex problems in a simple way: a function is defined in terms of its own result for a smaller input, using a relation between the previous result, the current input and the current result. The result we use would itself have been obtained using the answer for an even smaller input, and the cycle goes on until we reach an input for which we already know the answer. This value, referred to as the 'base case', gives us the answer to all the other cases, known as the 'recursive cases'.

For example, let us calculate n! (the factorial of a number n) using recursion. In recursion, we use the value of (n-1)! and multiply it by n. To obtain the value of (n-1)!, we use the value of (n-2)!, and so forth. This continues until we reach a value for which we know the answer; in this example, we know that the value of 1! is 1. Thus we keep reducing the problem from n to n-1 to n-2 ... to 1, for which we know the answer, and then use this answer to find all the unknown successive values: we multiply 1! by 2 to get the factorial of 2, and we keep multiplying by successive numbers to get their factorials. The basic idea of recursion is to reduce a problem step by step until we reach a case whose value we know.
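
In Python, this description translates into a short recursive function (a sketch of my own):

def factorial(n):
    """Compute n! recursively."""
    if n <= 1:                        # base case: we know 1! = 1
        return 1
    return n * factorial(n - 1)       # recursive case: n! = n * (n-1)!

print(factorial(5))   # 120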

Recursion is very useful for solving complicated problems, such as the traditional 'Towers of Hanoi' problem. In this problem, we are given n disks, and our job is to move the disks from tower A to tower C (as shown in the figure). However, we can only move one disk at a time, and we cannot place a bigger disk on a smaller one.
Picture Source: http://www.alper.net/wp-content/uploads/2008/03/hanoi.jpg
The problem is very complicated to solve with iteration, but it becomes small and comparatively easy to understand with the recursive approach. Here is the approach (a short code sketch follows the steps below):

a)    Assume the problem is already solved for n-1 disks, so we can move a pile of n-1 disks at our disposal.
b)    So, we move the top n-1 disks to tower B.
c)    Then we move disk n to tower C.
d)    Now, we move the n-1 disks from tower B to tower C, on top of disk n, and the problem is solved.
e)    The problem for n-1 disks is solved by the same procedure, replacing n by n-1 and n-1 by n-2. We keep reducing the problem until we reach a value for which we know the answer; in this case, for n = 1 we simply move disk 1 directly to tower C.
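
The steps above translate directly into a recursive function; here is a short Python sketch of my own:

def hanoi(n, source, spare, target):
    """Move n disks from `source` to `target`, using `spare` as the helper peg."""
    if n == 1:
        print(f"move disk 1 from {source} to {target}")      # base case
        return
    hanoi(n - 1, source, target, spare)                      # step (b): move n-1 disks to the spare peg
    print(f"move disk {n} from {source} to {target}")        # step (c): move the largest disk
    hanoi(n - 1, spare, source, target)                      # step (d): move the n-1 disks on top of it

hanoi(3, "A", "B", "C")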

Thus in recursion, we just need to find a relation that gives us the answer for n, assuming that we already know the answer for n-1. In reality we do not know the answer for n-1, but we find it by finding the answer for n-2, n-3 and so forth, until we reach a given value for which we do know the answer. Hence, recursion makes problem solving easy compared to other approaches.

However, the disadvantage of recursion shows up when it comes to computation. Recursion is slower and takes more space than iteration, making it less efficient than iteration for the same algorithm. Recursion requires more space because it stores the pending calculations on a 'stack', and it is slower because function calls take more time than calculations within the same function. To understand this, we can compare the computer to our brain. Suppose we are asked to add all the numbers from 1 to 5. The iterative way has us add 1, then add 2 to it, and so on, simply adding all the numbers from 1 to 5 without keeping anything else in mind. A recursive approach, however, has us start from 5 and add to it the sum of the numbers from 1 to 4. This requires us to keep 5 in our brain while finding the sum of 1 to 4, which in turn requires us to keep 4 in our brain while solving for 1 to 3, and so on until we reach the base case of adding the numbers from 1 to 1, which we know is 1. So recursion requires us to keep the numbers from 2 to 5 in our brain, which takes additional space. It also makes the calculation slower than the iterative procedure, because calling the 'function' again and again is more work than simply adding the consecutive numbers inside the same function.
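
The contrast described above looks like this in Python (my own sketch); the recursive version keeps every pending addition on the call stack, which is exactly the extra space mentioned:

def sum_iterative(n):
    total = 0
    for i in range(1, n + 1):    # just keep adding, no extra function calls
        total += i
    return total

def sum_recursive(n):
    if n == 1:                   # base case
        return 1
    return n + sum_recursive(n - 1)   # n is kept "in our brain" (on the call stack)

print(sum_iterative(5), sum_recursive(5))   # 15 15
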
But we cannot forget the importance of recursion in solving complicated problems in daily life. Problems such as the Towers of Hanoi are made so much simpler by using recursion! Also, recursion at times offers solutions that are faster than their iterative equivalents; for example, quicksort, an application of recursion, is one of the fastest sorting techniques in use today!

Tuesday, October 25, 2011

The Wolfram Alpha search engine

A new search engine called Wolfram Alpha was officially launched on May 15, 2009. In fact, it is a computational knowledge engine rather than a search engine. Instead of giving users a list of websites which may contain the information they are looking for, Wolfram Alpha computes answers and relevant visualizations to factual queries from its knowledge base of structured data. The designer of Wolfram Alpha, Stephen Wolfram, believes that it will change the way people use online data.
   
This short survey will focus on the principle behind Alpha's question-answering feature and how it will affect other websites.

With the help of sophisticated natural language processing algorithms, Alpha can understand the meaning of the input fed in by a user and tries to provide answers from its vast repository of data, tailored to the likely cultural stance of the user. For example, even if two users enter the same word "Cambridge", the one with a UK-based IP address will receive data about the Fenland city, while the one with a US-based address will receive data about the Massachusetts town.

The performance of Wolfram Alpha greatly depends on the scale of its curation system, which stores a huge amount of information and algorithms. Mathematica, a powerful tool for scientific computation, is used to manage this large-scale data curation system and to help Wolfram Alpha compute answers.

Mathematica includes not only algorithms for mathematical computation but also the whole spectrum of logical, numerical, graphical, symbolic and other computation. Meanwhile, the fundamentally symbolic nature of the Mathematica language allows an unprecedented degree of interoperability between different parts of the system, and between different algorithms and data sources. For example, if you search for "China GDP", Wolfram Alpha will generate a graph and other relevant economic data for you.

Inside Mathematica, a technology called automatic algorithm selection enables it to select and apply the best algorithm(s) for a given task, so Wolfram Alpha can generate answers with a high degree of accuracy. For example, "lim(x->0) x/sin x" yields the expected result, 1, as well as a possible derivation using L'Hôpital's rule.
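
The same limit can be reproduced locally with a computer algebra system. Here is a tiny sketch of my own using the SymPy library (which, like Mathematica, performs symbolic computation):

import sympy as sp

x = sp.symbols('x')
print(sp.limit(x / sp.sin(x), x, 0))   # prints 1, matching the L'Hôpital result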

In terms of technology, Wolfram Alpha is entirely developed and deployed with Mathematica and Mathematica technologies. It is built on top of about five million lines of Mathematica code, which currently run on roughly 10,000 CPUs. In order to serve the public, software called gridMathematica, which increases the number of parallel processes that Mathematica can run at once, is used. By increasing the number of parallel tasks, some types of problems can be solved in less time.

Wolfram Alpha is an amazing product combining computational capabilities with linguistic processing capabilities. Because of its direct-answering feature, many people compare it with Google Search and call it a "Google killer". Although Wolfram Alpha is powerful, it may not beat Google. Wolfram Alpha can only turn generic information into specific answers for factual queries; if we use it to search for a song called "Sad But True", it will give us information related to being depressed rather than the song. Furthermore, Google is a company, not just a technology: in addition to quality, the success of Google Search also relies on marketing, sales, customers, developers, brand reputation and luck.

However, Wolfram Alpha will probably be a worthy challenger to Wikipedia and to many textbooks and reference works. Since users can get a direct answer to their question, together with a nicely presented set of graphs and other information, when they look for basic encyclopedic information, people may gradually replace those sources with a computational knowledge engine like Wolfram Alpha.

In conclusion, even though Wolfram Alpha is not a perfectly mature technology, it shows the future trend of search engines. It is foreseeable that future search engines will behave like AI: humans will communicate with them naturally through natural language, and they will respond directly with satisfactory answers.

Reference
1.    http://news.bbc.co.uk/2/hi/technology/8052798.stm
2.    http://en.wikipedia.org/wiki/Wolfram_Alpha
3.    http://www.theregister.co.uk/2009/05/18/wolfram_alpha/
4.    http://www.readwriteweb.com/archives/wolframalpha_our_first_impressions.php
5.    http://blog.wolframalpha.com/2009/05/01/the-secret-behind-the-computational-engine-in-wolframalpha/
6.    http://www.wolfram.com/technology/guide/AutomaticAlgorithmSelection/
7.    http://www.wolfram.com/mathematica/how-mathematica-made-wolframalpha-possible.html
8.    http://en.wikipedia.org/wiki/GridMathematica
9.    http://bits.blogs.nytimes.com/2009/03/09/better-search-doesnt-mean-beating-google/

Friday, October 7, 2011

Computing Algorithm used in Google Search

Google Search is a search engine owned by Google Inc. According to the traffic statistics compiled by Alexa, Google Search is the most visited search engine on the Internet. It receives an enormous number of queries every day and carries out each search in a fraction of a second. Since its launch in 1997 by Larry Page and Sergey Brin, Google Search has gone through a series of substantial changes in its algorithms. Every year, Google Inc. invests large sums of money in the research and development of these algorithms, and it is the innovation and novelty of these algorithms that is the basis of Google's rise to success.

Google Inc. has acquired patent rights to the algorithms employed in Google Search. However, it is possible to outline the core logic behind these complex algorithms. Simply put, whenever a query is keyed in, the search algorithm initially accepts it as plain text. It then breaks the query up into a series of search terms, usually words, which are matched against the content of the websites in Google's database, and the websites containing the required words are then displayed. However, every search engine on the Internet uses a search algorithm very similar to Google's, so the question arises: what differentiates Google from other search engines? The answer lies in the algorithm used to rank the web pages found by the search algorithm. The ranking algorithm used by Google Search is called 'PageRank', and it is what renders Google more successful than other search engines. Hence, PageRank will be the main focus of this critique essay.

PageRank is a patented algorithm which aims to rank web pages according to their 'relevance' to the query keyed in. According to researchers at Google Inc., PageRank, unlike other ranking algorithms, tries to list web pages according to the human concepts of relevance and importance. In order to analyse PageRank qualitatively, it is important to develop a general understanding of the basic logic of the computation it carries out. The main assumption underlying PageRank is that web pages linked from other highly ranked pages are likely to be more important. In simple words, the World Wide Web acts as a giant recommendation system in which web pages vote for other pages by sending outgoing links to them; moreover, votes from 'more important' pages carry more weight.

According to Google, PageRank essentially calculates the probability that a person randomly clicking on links will arrive at any particular web page. It first counts the human-generated links on each web page and then weights every link by the number of links going out of the source page. This means that the PageRank carried by any outgoing link is equal to the document's own PageRank divided by the number of outgoing links of that document, and the incoming PageRank of any web page is the sum of the PageRanks carried by all its incoming links.

Furthermore, since the algorithm models the probability of visiting a web page, PageRank assumes that the user who is randomly clicking on links will eventually stop clicking. The probability that the user continues to click after reaching a particular page is called the 'damping factor' (d). Logically, this damping factor multiplies the sum of incoming PageRanks to give the probability of reaching a web page through external links. However, a web page can also be visited directly by typing its URL into the browser; intuitively, the probability of reaching a page this way is obtained by subtracting d from 1. The final PageRank is the sum of these two contributions.
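
In other words, one common (un-normalized) form of the formula is PR(p) = (1 - d) + d * sum of PR(q)/L(q) over the pages q that link to p, where L(q) is the number of outgoing links of q. Below is a toy power-iteration sketch of my own in Python; it only illustrates the idea on a tiny made-up link graph and is not Google's actual implementation:

def pagerank(links, d=0.85, iterations=50):
    """Iteratively compute PageRank on a dict {page: [pages it links to]}."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}                     # initial guess
    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            # sum of PageRank carried into p by its incoming links
            incoming = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new_pr[p] = (1 - d) + d * incoming       # direct visits + link visits
        pr = new_pr
    return pr

toy_web = {"A": ["B"], "B": ["A"], "C": ["A"]}       # hypothetical three-page web
print(pagerank(toy_web))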

A simple analysis of the PageRank algorithm shows that it is relatively easy to implement for practical purposes. It has an optimal substructure, which means that the result generated by PageRank can be processed using a 'greedy method': the pages with higher PageRanks can simply be displayed higher in the list of results, and this greedy method is sure to yield the desired optimal ordering.

Furthermore, the PageRank algorithm is easy to understand. This property means that programmers can easily modify the algorithm to debug it and keep it up to date. This gives PageRank its dynamic nature and ensures that it can cope with technological changes in the future.

In addition to these strategic advantages, PageRank is also much better than traditional ranking algorithms because it carries out 'link analysis'. This means that PageRank considers not only the incoming links but also the importance of the sources of those links, which results in much greater relevance of the ranked results returned to the user compared with traditional methods. Link analysis is also designed to protect users from spammers. Ranking algorithms which only consider the content of web pages can easily be spammed. Spammers usually have financial motives attached to their websites, and since they control the content of their own pages, they can insert meta-tags and special keywords into the HTML, which such algorithms use to evaluate the page, even though the actual content of the page may be very different. In this way they try to mislead these algorithms into conferring an unfairly high rank so that their pages appear at the top of the results list. With link analysis, however, these spammers are far less successful, because they have little or no control over the web pages that link to theirs.

On the other hand, PageRank also has several serious limitations. One major question is whether PageRank is adequately scalable, that is, whether the algorithm can cope efficiently when the size of the data to be processed becomes extremely large. Faced with a very large database of web pages, the PageRank computation requires a very large amount of memory. Furthermore, as the algorithm maps not only page IDs but also separate terms, an increase in the number of web pages results in a much larger memory requirement, which is not only less efficient but also very expensive.

In addition to this, the runtime of PageRank is longer than that of other ranking algorithms. The reason is that the calculation of PageRank involves not only a consideration of all the links (which can be viewed as edges that each have to be visited during the calculation) but also the calculation of the weights carried along those links. This makes the computation more complex, and hence PageRank is expected to be slower than simpler ranking algorithms.

Another problem likely to occur with an algorithm such as PageRank is that the pages appearing at the top of the sorted list are not necessarily relevant to the query keyed in. There can be several reasons for this. Many of the highest-ranked web pages (since they are linked to by numerous other pages), for example Google, Yahoo or the BBC, are thematically inhomogeneous. This means that such websites may be placed highest in the ranked results while being thematically unrelated to the query. Furthermore, links can be bought by spammers in order to obtain higher ranks for their pages. These issues are still unaddressed by the PageRank algorithm.

Careful analysis of the algorithm also reveals another limitation. If there are web pages that have only incoming links and no outgoing links, the PageRank given to them is never redistributed, and these pages act as sinks of PageRank. This sinking of rank creates a disequilibrium and makes the calculation of PageRank much less reliable.

There is another significant reason to doubt the accuracy of PageRank. The links to and from web pages are dynamic and change over time, as hundreds of websites are launched and several are removed every day. Moreover, shifts in interest can regroup links over time; for example, political websites may see an increase in incoming links during polling seasons. This requires the database to be updated constantly in order to remain accurate. However, the PageRank database is only updated on a quarterly basis, which may result in unfair PageRank values!

The above discussion shows that the current version of PageRank is still far from Larry Page's claim that it is able to "understand exactly what you mean and give you back exactly what you want." However, it can be argued that it is still the best of the contemporary ranking algorithms, thanks to its specialised feature of link analysis. Although PageRank is expected to have a longer run time than other algorithms, it is still 'super-fast' by human standards: it completes its calculations in less than human reaction time, which makes it acceptable to users. The problem of rank sinks can be addressed systematically; for example, a simple approach is to create outgoing links from the 'rank-sink' pages to all pages in the database, which would distribute the PageRank evenly and eliminate the sinking of rank.

So, to conclude, if Google makes similar adequate modifications and the PageRank database is updated more frequently, PageRank will come much closer to what Larry Page intends it to be.

References
  1. Wikipedia
  2. http://www.beninbrown.com/2010/09/15/time-reevaluate-google-pagerank/
  3. http://dev.whydomath.org/node/google/math.html
  4. http://www.voelspriet2.nl/PageRank.pdf
  5. http://delivery.acm.org/10.1145/1150000/1145629/p233-desikan.pdf?key1=1145629&key2=5619550921&coll=DL&dl=ACM&CFID=115606932&CFTOKEN=11892472

Thursday, September 22, 2011

Applications of the greedy approaches

The "greedy approach", or more technically the "greedy algorithm", is a useful method for tackling some optimization problems. Optimization problems usually involve step-by-step decision making. The "greedy" sense of the greedy algorithm comes from the idea that we ignore the decisions made in previous stages and simply pick the locally best choice at every single stage; in this way, we attempt to derive the overall optimal solution. Take the "activity selection" problem as an example:

"There are N activities for someone to choose to participate in. Every activity has its starting and ending time, and activities may clash in time. For instance, activity A is from 2pm to 5pm, while activity B is from 3pm to 6pm; in this case, only one of the two can be chosen. How should one choose the activities so as to join the maximum number of activities?"

In this problem, we can see that we need to choose activities one by one, and when choosing each activity, we need to consider which activities are still available at that moment. One may come up with a single rule of thumb to make each decision. For instance, we could repeatedly choose the currently available activity that starts earliest, without creating any time clash. However, a "greedy algorithm" may not always be correct. Let us take the greedy method just mentioned, i.e. keep choosing the earliest-starting activity, and try to use it on the situation below:

Number of activities (N) = 3

If we adopt the above method, we can only choose activity A. But obviously this is not the optimal solution, because we could choose activities B and C instead. What should the correct method be? We observe that if one chooses an activity which ends earlier, then more time is left for choosing other activities. Thus, we can propose another "greedy" method: repeatedly choose the earliest-ending activity, without creating any time clash. Applying it to the situation above, we would first choose activity B, because it ends earliest (2:00 pm), and then choose activity C (which ends at 4:00 pm). This seems to work. But does it always work?

The answer is YES. The explanation is as follows. Consider a list of activities drawn as time intervals in the diagram below:

 
In the diagram, all the activities are sorted in ascending order of ending time, and the red intervals are the activities chosen by the above method.

Suppose, on the contrary, that the above method does not yield an optimal solution, i.e. there exists an activity X (shown in green in the diagram) that is not chosen. Three cases are considered:

Case (i): Activity X ends before the start of the first chosen activity OR starts after the end of the last chosen activity.
This is impossible, because if such an activity X existed, the "greedy" method described above would have chosen it, which contradicts the actual situation.

Case (ii): Activity X lies between the end of one chosen activity and the start of the next chosen activity.
As in case (i), if such an activity X existed, the "greedy" method would have chosen it. Thus case (ii) also leads to a contradiction.

Case (iii): Activity X clashes in time with at least one of the chosen (red) activities.
This is a possible situation, but the question is whether it gives a better solution than the one found by the above "greedy" method. It obviously does not, because if activity X were chosen instead, at least one of the red activities would have to be discarded.

As a result, this "greedy" method is correct.
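
For reference, the earliest-ending-time rule can be written in a few lines of Python (a sketch of my own, with start and end times given as plain numbers):

def select_activities(activities):
    """Greedy activity selection: repeatedly pick the earliest-ending activity
    that does not clash with the ones already chosen."""
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(activities, key=lambda a: a[1]):   # sort by ending time
        if start >= last_end:          # no clash with the previously chosen activity
            chosen.append((start, end))
            last_end = end
    return chosen

# e.g. A = 2pm-5pm, B = 3pm-6pm, C = 5pm-7pm (24-hour numbers)
print(select_activities([(14, 17), (15, 18), (17, 19)]))   # -> [(14, 17), (17, 19)]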

In this example, we learn that "greedy" methods may be very intuitive and efficient, yet wrong. To figure out whether a "greedy" method is correct, one should either provide a logical proof or find a counter-example to the method.

Sometimes, the underlying "greedy" algorithm requires more thorough thinking, and the proof can be quite subtle. Let us move on to another problem below.

[Modified from the problem “Saving Endeavour” (see reference)]
"There are N toys. Each toy needs to pass through two main processes: moulding and painting. Obviously, moulding must be done before painting. The times for moulding and painting (mi and pi respectively for the i-th toy) may be different for every toy. Now we only have one moulding machine and one painting machine. In what order should we produce the N toys so that the total production time is minimized?"

For example, suppose there are 6 toys in total. Their times for moulding and painting are tabulated as shown:

If we manufacture the toys in the order B, E, F, A, D and C,

the total time required would be 28 minutes, and this is minimal for this case.

This problem is much more complicated than the "activity selection" problem, because one can come up with even more possible "greedy" approaches:
  1. Choose the toy with minimum time for moulding first?
  2. Choose the toy with minimum time for painting first?
  3. Choose the toy with minimum sum of times for moulding and painting first?
  4. (And more……)
Unfortunately, the first three approaches are all wrong; many counter-examples can be constructed, and they are skipped here. But why were we able to think of so many incorrect approaches? It is because we have not yet considered the key properties of this problem.

We can observe that we should try our best to make the moulding machine complete each job as soon as possible, so that it becomes available for the production of other toys. In other words, the moulding machine should handle the toys with shorter moulding times first.

But this is NOT enough; the painting time must also be considered. As the start of painting depends solely on the completion of moulding, the painting machine may sometimes need to wait for the moulding machine to finish its job. Therefore, to utilize the resources well, it is not difficult to see that we should minimize the "idle time" of the painting machine. To do this, we should produce the toys with shorter painting times as late as possible, so that the painting machine can keep working while the moulding machine is still busy.

By combining these two factors, we can derive a "greedy" approach: sort the toys in an order such that whenever toy a is placed before toy b, min(ma, pb) < min(mb, pa). This looks complicated, but it makes sense, because as mentioned we should produce the toys with shorter moulding times first and delay the toys with shorter painting times. When two toys are compared using this expression, if either ma or pb is small enough, toy a will be produced earlier.

Take the comparison between toy 2 and toy 4 as an example. Since min(1,4) = 1 is smaller than min(5,2) = 2, toy 2 should be produced before toy 4.
The proof is shown below.

Suppose toy a is produced, and let D be the time required for the painting machine to be available again (counted from the start of moulding toy a). There are two cases:

Case (i): D is smaller than ma. Here the painting machine is already free when the moulding of toy a finishes, so painting starts immediately and takes pa.
Case (ii): D is greater than ma. Here painting must wait a further D - ma after moulding finishes before it can start.
In both cases, the time required to finish painting toy a (again counted from the start of its moulding) is max(D - ma, 0) + pa.

Suppose toy a is produced immediately before toy b, and let D' be the time required for the painting machine to be available again, counted from the start of moulding toy b (i.e., from the moment the moulding of toy a finishes). Then

D' = max(D - ma, 0) + pa
 
As a result, the time required to finish painting toy b would be
max(D' - mb, 0) + pb = max(max(D - ma, 0) + pa - mb, 0) + pb

Suppose instead that toy b is produced prior to toy a; then the time required to finish painting
toy a would be max(max(D - mb, 0) + pb - ma, 0) + pa

If we claim that a shorter total time can be reached when we produce toy a prior to toy b, then

max(max(D-ma, 0) + pa-mb, 0) + pb < max(max(D-mb, 0) + pb-ma, 0) + pa
max(max(D-ma, 0), mb-pa) + pb + pa - mb < max(max(D-mb, 0), ma-pb) + pa + pb - ma
max(max(D-ma, 0), mb-pa) - mb < max(max(D-mb, 0), ma-pb) - ma
max(D-ma, 0, mb-pa) - mb < max(D-mb, 0, ma-pb) - ma
max(D, ma, ma+mb-pa) - mb - ma < max(D, mb, ma+mb-pb) - ma - mb
max(D, ma, ma+mb-pa) < max(D, mb, ma+mb-pb)
max(ma, ma+mb-pa) < max(mb, ma+mb-pb)
max(-mb, -pa) + ma + mb < max(-ma, -pb) + mb + ma
max(-mb, -pa) < max(-ma, -pb)
-min(mb, pa) < -min(ma, pb)
min(ma, pb) < min(mb, pa)

This is exactly what we have proposed.
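
As a quick sanity check of the rule min(ma, pb) < min(mb, pa), here is a small Python sketch of my own: it orders a set of hypothetical toys with this comparison and then simulates the two machines to obtain the total production time.

from functools import cmp_to_key

def order_toys(toys):
    """Sort (moulding, painting) pairs so that toy a precedes toy b
    whenever min(m_a, p_b) < min(m_b, p_a)."""
    def compare(a, b):
        return min(a[0], b[1]) - min(b[0], a[1])
    return sorted(toys, key=cmp_to_key(compare))

def total_time(ordered):
    """Simulate one moulding machine followed by one painting machine."""
    mould_done = paint_done = 0
    for m, p in ordered:
        mould_done += m                                # moulding is strictly sequential
        paint_done = max(paint_done, mould_done) + p   # painting may have to wait
    return paint_done

toys = [(1, 4), (5, 2), (3, 3)]   # hypothetical (moulding, painting) times
print(order_toys(toys), total_time(order_toys(toys)))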

From these two problems, we learn that "greedy" approaches can be useful in our daily lives. The activity selection problem is a very good example: since most people have a limited amount of time, we often encounter clashes between various events, and in order to attend as many events as possible, a "greedy" approach can be used to optimize the utilization of time. The toy manufacturing problem is particularly important because resources are scarce, so we need to make sure that the machines work in the most efficient way in order to reduce the production cost.

Despite the fact that both the "activity selection problem" and the "toy manufacturing problem" oversimplify the real situation (attending different events may bring different satisfaction, and real manufacturing involves more machines and stages), such "greedy" approaches can serve as simple, rough, yet efficient estimates.

References
  1. “Activity Selection Problem” http://en.wikipedia.org/wiki/Activity_selection_problem
  2. “Saving Endeavour” http://poj.org/problem?id=2751

Sunday, July 24, 2011

Worst-case analysis, average-case analysis, and probabilistic analysis

Problem:

This problem is about several important but related concepts: worst-case analysis, average-case analysis, and probabilistic analysis. Consider the following “hiring” algorithm:

  1. Set candidate best to be unknown;
  2. For each of the n candidates, do the following:
  3. Interview candidate i;
  4. If candidate i is better than candidate best, then hire candidate i and set best to be i;

Assume that interviewing each candidate has a cost of c_i and hiring a candidate has a cost of c_h (where c_h > c_i under normal circumstances).

(a) Can you give the worst case total cost of the above hiring algorithm?

(b) Assume that the candidates come to the interview in a random order, i.e., each candidate is equally likely to be the best. Specifically, candidate i has a probability of 1/i of being the best among the first i candidates. Can you give the average-case total cost of the above hiring algorithm?

Hint: You can consider using the "indicator random variable" X_i, which is equal to 1 if candidate i is hired and 0 otherwise. Hence, the average number of candidates that are actually hired is equal to the "expected value" (i.e., the average) of X = X_1 + X_2 + ... + X_n.

Follow-Up:

(a) The worst case is that every interviewed candidate is hired. Thus, the total cost is:

n × c_i + n × c_h = n (c_i + c_h)

(b) In this average case, the only change to the total cost is the hiring part, so let's focus on that part. Specifically, as given in the Hint, the average number of candidates that will be hired is:

E[X] = E[X_1] + E[X_2] + ... + E[X_n],

which in turn is equal to 1 + 1/2 + 1/3 + ... + 1/n. A good bound for this sum is log n. Thus, the average-case hiring cost is just c_h log n, which is much smaller than the worst case's hiring cost of n × c_h.

The lesson we learn is that average case is sometimes much better than the worst case.
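
The log n behaviour of the number of hires is easy to see in a quick simulation (a sketch of my own; it only counts hires and ignores the cost constants):

import math
import random

def hires_in_random_order(n):
    """Run the hiring algorithm on one random candidate ordering and count the hires."""
    best = float("-inf")
    hires = 0
    for quality in random.sample(range(n), n):   # random interview order
        if quality > best:                       # strictly better than the current best
            best = quality
            hires += 1
    return hires

n, trials = 1000, 2000
avg = sum(hires_in_random_order(n) for _ in range(trials)) / trials
print(f"average hires: {avg:.2f}, ln(n) = {math.log(n):.2f}")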

Programming using stack

Problem:

A stack is a useful structure for storing data and has many applications; for example, it can be used to implement recursive computing methods. A stack is a "last-in-first-out" structure: data items are stored linearly, one item placed on top of the other. The only two operations are pop and push. The pop operation retrieves the top data item of the stack, while the push operation puts a new data item on top of the stack. The figure below shows how a stack works:


(a) Can you write up a computing procedure for DFS using a stack instead of using recursion?

(b) Can you suggest a practical situation where we must use a stack instead of using recursion?

Follow-Up:

(a) The first possible version is as follows.

Stack_DFS1(x):
  1. Mark x; Push(x);
  2. if stack is empty, then end of algorithm; otherwise, do the following:
  3.    y = top of stack (without popping it);
  4.    if there is a neighbor z of y that is not marked, then do:
  5.       Mark z; Push(z);
  6.    else
  7.       Pop;
  8. go to Step 2.

Another possible version is as follows.

Stack_DFS2(x):
  1. Push(x);
  2. if stack is empty, then end of algorithm; otherwise, do the following:
  3.    y <- Pop;
  4.    if y is not marked, then Mark y;
  5. for every neighbor z of y that is not marked, do the following:
  6.    Push(z);
  7. go to Step 2.
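
For concreteness, the second version translates into runnable Python roughly as follows (a sketch of my own; the graph is represented as a dictionary of adjacency lists, which the original pseudocode leaves unspecified):

def stack_dfs(graph, start):
    """Iterative depth-first search; `graph` maps each node to its neighbours."""
    marked = set()
    stack = [start]                  # Step 1: Push(x)
    order = []
    while stack:                     # Step 2: stop when the stack is empty
        y = stack.pop()              # Step 3: y <- Pop
        if y in marked:
            continue
        marked.add(y)                # Step 4: Mark y
        order.append(y)
        for z in graph[y]:           # Steps 5-6: push every unmarked neighbour
            if z not in marked:
                stack.append(z)
    return order

g = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
print(stack_dfs(g, "A"))   # e.g. ['A', 'C', 'B', 'D']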

(b) In the Web crawling application, we need to use a stack. In fact, crawling is done by many computers running the “search” (or explore) simultaneously. Each computer takes the next node to be explored from the top of the stack, downloads the HTML file, and then scans it for further hyperlinks. But when a new HTML document is found (i.e., a “neighbor”), no recursive call is made; instead, the new node is pushed onto the stack.

Yet one interesting question remains: when we see an HTML document, how do we know it is indeed "new" (i.e., not marked), so that we should push it onto the stack?

Thursday, July 21, 2011

Parallel processing: data parallel and functional parallel

Comment:

Differences between data parallel and functional parallel approaches?

Follow-up:

The data-parallel approach is suitable for situations where exactly the same action can be taken on different pieces of data. A very good example is filtering or quantization in image processing, because these tasks operate on each pixel of the whole image independently. Another daily-life example that we discussed in class is the painting of a large wall (in the same colour!) by a group of workers in parallel.
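
As a toy illustration of the data-parallel case (my own sketch; a real image filter would of course use an optimized library), the same brightness adjustment is applied to every "pixel" independently by a pool of worker processes:

from multiprocessing import Pool

def brighten(pixel):
    """The same action applied to every pixel independently (data parallelism)."""
    return min(pixel + 40, 255)

if __name__ == "__main__":
    image = list(range(0, 256, 8))      # stand-in for a row of grey-level pixels
    with Pool(4) as pool:               # 4 processors working on different pieces of data
        brightened = pool.map(brighten, image)
    print(brightened)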

The functional-parallel approach is suitable for situations where different actions can be taken on the data (possibly the same data) independently and simultaneously. A very good example is the recognition of different objects in the same image. Here, we apply different "actions", namely the matching of different known object models, over the same image. These actions are usually independent, and they are different because matching different models requires different kinds of processing. Another daily-life example that we discussed in class is the remodelling of an apartment: we usually need to do many different things, such as installing electrical wiring, painting walls, installing fixtures, etc. These functions are (sometimes) independent and can be carried out in parallel.

Finally, one very important difference between the two approaches is the "degree of parallelism", which can be defined as the number of processors that you can use in a productive manner (i.e., that have some useful work to do). Obviously, in the data-parallel approach, if the data involved is large (e.g., a very large image), then we can use many processors (e.g., up to thousands) to enhance the speed-up. On the other hand, there usually are not many different (and independent) functions required in a task (e.g., remodelling an apartment), so the number of processors that can be put to productive use is usually limited (e.g., on the order of 10 or lower).

Think about “weather forecasting” computing. Which approach do you think is more suitable?

Comment:

Functional parallel computing can only be used when the different functions are independent of each other. So, for example, in the case of "object tracking", the different functions are actually dependent on each other: the object tracking can only be done after background subtraction, so the functions need to be done one after another. Hence, functional parallel computing cannot be used in this case. In a sense, this is "completely sequential"?

Follow-up:

This is an excellent observation.

Principles of the travel scheduling problem

Comment:

Underlying principles of the travel scheduling problem?

Follow-up:

Frankly speaking, we do not know the complete picture of this problem either. Yet from an intuitive angle, the problem expands the complexity of the original shortest path problem in at least two dimensions. Firstly, we need to solve multiple instances of the shortest path problem for any single round-trip tour (e.g., one day of the trip). Secondly, we need to combine these shortest path solutions temporally across different tours so as to obtain the minimum overall cost for the whole trip.

Tuesday, July 19, 2011

Links of Readings

Required Reading

Recommended Reading 

  • Feynman, R., Hey, A., & Allen, R. (2000). Feynman lectures on computation. Cambridge, MA: Perseus Books.
  • Schneier, B. (2004). Secrets and lies: Digital security in a networked world. Indianapolis, Ind.: Wiley.
  • Wing, J. M. (2008). Five deep questions in computing. Communications of the ACM, 51(1), 58-60.

HW: Sorting in the daily life

Problem (10 points):

This problem is about a common daily-life application of sorting. Specifically, sorting (e.g., QuickSort) is usually done before we want to search for something. For example, bank accounts are always sorted according to account number in order to facilitate searching for a particular account on request.
Consider, as shown above, a sorted (ascending) list of n numbers (assume n is a power of 2, i.e., n = 2^k for some positive integer k). To search for a number x in the list, we can use the following algorithm. Compare x with the number in the middle of the list (i.e., m shown above). If they are equal, then our job is done. If x is smaller than that middle number, then we repeat the search using just the left half of the list; otherwise, we repeat the search using just the right half of the list.
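
For reference, the halving idea can be written as a short recursive Python sketch (my own illustration, using 0-based indices):

def binary_search(numbers, x, lo=0, hi=None):
    """Search for x in a sorted list; return its index or -1 if it is absent."""
    if hi is None:
        hi = len(numbers) - 1
    if lo > hi:                       # empty range: x is not in the list
        return -1
    mid = (lo + hi) // 2
    if numbers[mid] == x:
        return mid
    if x < numbers[mid]:              # x can only be in the left half
        return binary_search(numbers, x, lo, mid - 1)
    return binary_search(numbers, x, mid + 1, hi)    # otherwise the right half

print(binary_search([1, 3, 5, 7, 9, 11, 13, 15], 9))   # 4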

(a) Why can we use just the left half of the list if the first comparison tells us that x is smaller than the middle number?

(b) Please express the above algorithm as a recursive method.

(c) Please give the number of comparisons needed in the worst case. Explain your answer.

HW: Difference between two multiplication methods

Problem (10 points):

In Meeting #1, we discussed two multiplication methods: the traditional method and the lattice method (see slides 16 and 17). Please answer the following questions.

(a) Are the amounts of effort different in the two methods? Explain.

(b) Which method do you prefer to use? Explain.

HW: "Hard" and "Unsolvable"

Problem (20 points):

This problem is aimed at enhancing your understanding of “hard” and “unsolvable” problems.

In 1965, computer chip pioneer Gordon E. Moore noticed that transistor density in chips had doubled every year in the early 1960s, and he predicted that this trend would continue. This prediction, moderated to a doubling (in transistor count) every 18 months, is known as Moore's Law. Oftentimes, people extend the meaning of Moore's Law to imply that computer processing speed doubles every 18 months.

(a) According to the speed-doubling version of Moore's Law, how many times faster is a computer today than one ten years ago?

(b) Suppose we have a computing problem that can be solved in estimated running time on the order of n, where n is the problem size (e.g., the number of nodes in a graph). Using the same amount of time, how much bigger a problem can we solve with a computer today than with one ten years ago?

(c) Suppose we have a computing problem that can be solved in estimated running time on the order of (n^2), where n is the problem size (e.g., the number of nodes in a graph). Using the same amount of time, how much bigger a problem can we solve with a computer today than with one ten years ago?

(d) Suppose we have a computing problem that can be solved in estimated running time on the order of (n^6), where n is the problem size (e.g., the number of nodes in a graph). Using the same amount of time, how much bigger a problem can we solve with a computer today than with one ten years ago?

(e) Suppose we have a computing problem that can be solved in estimated running time on the order of (2^n), where n is the problem size (e.g., the number of nodes in a graph). Using the same amount of time, how much bigger a problem can we solve with a computer today than with one ten years ago?

(f) For Problem 3 of Tutorial 3, someone suggested that we might solve the problem (i.e., finding integer roots of the equation) by simply testing every possible integer. Please comment.

HW: Dynamic programming

Problem (10 points):

In Tutorial 3 Problem 2, we worked on the knapsack problem in which there is only one unit of every item i. Here we consider the version where there is an unlimited supply of every item. In this version, we can consider a dynamic programming formulation for computing the quantity K(x), defined as the maximum value achievable with a knapsack of capacity x.

(a) Please try to give a recurrence for K(x)?

(b) Please try to write up the computing procedure for K(x).

(c) What is the estimated running time (in terms of n and W)?

(d) Please draw a graph (with directed links) to illustrate the computation (just like the one I showed you in Tutorial 3 and in Tutorial 1 follow-up). How would you describe the computation using this graph?
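
For what it is worth, here is a minimal sketch of my own of the standard unlimited-supply formulation; the weights, values and capacity W used below are hypothetical, and the recurrence it implements is the usual K(x) = max over items i with w_i <= x of (K(x - w_i) + v_i):

def knapsack_unbounded(weights, values, W):
    """K(x) = max value achievable with capacity x, given unlimited copies of each item."""
    K = [0] * (W + 1)
    for x in range(1, W + 1):
        K[x] = max(
            [K[x - w] + v for w, v in zip(weights, values) if w <= x],
            default=0,               # no item fits: K(x) stays 0
        )
    return K[W]

print(knapsack_unbounded([6, 3, 4, 2], [30, 14, 16, 9], 10))   # 48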

HW: "Greedy" computing

Problem (10 points):

We just learnt about the "greedy" computing method: trying to choose the best move in every step, without regard to future options.

(a) What do you think is a necessary condition for the greedy computing method to find optimal solutions?

(b) How would you compare the greedy computing method with the recursive method (or divide-and-conquer)?

(c) Can you find other daily-life computing problems for which the greedy method works well?

Sunday, July 17, 2011

Genetic Algorithm

Comment:

All of the following are related to Genetic Algorithms (GAs).
  • How to ensure that a better solution can always be found in the next generation?
  • How does a computer evaluate the “fitness” of a chromosome?
  • Depending on the encoding scheme chosen (e.g., using a bit string to represent whether or not a link is selected in a path), it is possible that a lot of invalid solutions are included in the chromosome population. Will this actually decrease the efficiency of the GA?
Follow-up:

First of all, there is no way we can guarantee that a better solution will be found in a new generation. In a sense, the crossover and mutation operators serve as "random" search tools in a vast solution space. With luck we might be able to derive a better solution; otherwise, we will end up with worse solutions in the new generation. Then, in the selection step, the newly obtained worse solutions may get weeded out. Thus, in a practical GA, we usually employ an "elitism" policy: although the currently best solution is still subject to crossover and mutation, it is explicitly "protected" in that a copy of it is retained for the next generation.

The “fitness” of a chromosome is really just the metric we define for a potential solution. For instance, in the case of the shortest path problem, we can define the fitness of a chromosome to be the length of the path (if it is a valid path).

Yes, including a lot of invalid solutions can decrease the efficiency (in terms of convergence time) of finding the best possible solution. However, doing so can also help the "exploratory" side of the GA, because the invalid solutions might contain some "good components" (i.e., good genes in a bad chromosome), and through crossover such good components might get passed on to some valid solutions in new generations.

I have also enclosed a research paper that I co-authored many years ago about using a GA for multiprocessor task scheduling. I hope you will find time to glance through it to see another example of an encoding scheme, the design of crossover and mutation operators, and, most importantly, the performance of the GA.

Sequential mode or parallel mode?

Problem:

A uniprocessor computer can operate in either sequential mode or parallel mode. In parallel mode, computations can be performed nine times faster than in sequential mode. A certain benchmark program took time T to run on this computer. Furthermore, suppose that 25% of T was spent in parallel mode whereas the remaining portion was in sequential mode.

(a) What is the effective speedup of the above execution as compared with the condition when parallel mode is not used at all?
 
(b) What is the fraction α of parallelized code of the benchmark program?

(c) Suppose we double the speed ratio between parallel mode and the sequential mode by hardware improvements. What is the new effective speedup?

(d) Suppose the speedup you calculated in (c) is to be obtained by software improvement alone in terms of parallelization fraction, instead of by any hardware improvement. What is the new parallelization fraction required?

Follow-up:

(a) If parallel mode is not used at all, the execution time will be 0.75T + 9 × 0.25T = 3T .
Therefore, the speedup of the program is 3T ⁄ T = 3 .

(b) The fraction of parallelized code is given by:

α = (9 × 0.25T) / (0.75T + 9 × 0.25T) = 2.25T / 3T = 0.75

(c) With the hardware improvement, the parallel mode is now 18 times faster than the sequential mode. The new execution time now becomes:

0.75T + (9 × 0.25T) / 18 = 0.75T + 0.125T = 0.875T

And the new speedup is:

3T / 0.875T ≈ 3.43

(d) Parallelization in general is done by the compiler (an important piece of software for translating your program into machine language). Since the new speedup is to be obtained by software instead of hardware, the speed of parallelized code is still 9 times faster than sequential mode. Let β be the new parallelization fraction. Then, we have:

1 / ((1 - β) + β / 9) ≈ 3.43

Thus, we have

β ≈ 0.80

That is, the compiler has to parallelize 5% more of the benchmark program.

Comparison between insertion sort and quicksort

Comment:

It would be good to know the details of the running time estimates of (n x n) and n log n for insertion sort and quicksort, respectively.

Follow-up:

Let's take a look at insertion sort first. We know that insertion sort works by taking each item (e.g., a card) and then comparing it with the other items on the list so as to find its right position. After the first item is done, we repeat the process for the second item, and so on.

Now, suppose we have n items in total; then we need to repeat the above process n times. The next question is: how many comparisons are needed for each item? It is not difficult to see that for the first item, we need to do n - 1 comparisons; for the second item, n - 2 comparisons; and you can go on with this argument until the last item. Thus, the total number of comparisons is:

(n - 1) + (n - 2) + ... + 1 = n(n - 1) / 2

So, for very large n, we can approximate this by (n x n) / 2. As we are mostly interested in the "order of magnitude", we can further approximate it by just (n x n).
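
A direct transcription of insertion sort (a sketch of my own) shows where those comparisons happen:

def insertion_sort(items):
    """Sort in place by inserting each item into its right position on the left."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        while j >= 0 and items[j] > current:   # each test here is one comparison
            items[j + 1] = items[j]            # shift larger items to the right
            j -= 1
        items[j + 1] = current
    return items

print(insertion_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]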

For quicksort, the mathematical derivation requires solving a "recurrence equation", so the mathematics needed is really beyond the scope of this course. Yet, to get a rough idea, it is similar to the recursive exponentiation we tackled in Tutorial 1 (Problem 2).