
Friday, November 11, 2011

Brief introduction about Wolfram Alpha

Pau Wing Hong
Wolfram|Alpha (also known as Wolfram Alpha) is more than a search engine like Google. It is an answer engine, or computational engine, developed by Wolfram Research. Traditional search engines like Google and Yahoo, which can only provide a list of links to information, don’t answer questions: they take your keywords at face value and don’t always yield good results. What really makes Wolfram Alpha shine is that it can compute, just like a calculator. It derives solutions and responses from a structured knowledge database.

Since its launch, the Wolfram Alpha knowledge engine has contained more than 50,000 types of algorithms and models and over 10 trillion pieces of data. It is still in development and is constantly adding new information to its database. Because Wolfram|Alpha runs on about 10,000 CPUs with Mathematica working in the background, it is capable of answering complicated mathematical questions.

The service is built on four basic pillars: a massive amount of data, a computational engine built on top of Mathematica, a system for understanding queries, and technology to display results in interesting ways. Wolfram Alpha is also able to answer fact-based questions such as “When did Steve Jobs die?” It displays its response as a date, the time difference from today, and anniversaries for October 5, 2011.

There are a number of things that make Wolfram Alpha vastly different from Google. First of all, it is capable of answering complex queries. If a complex search query is typed into Google, it gets confused, because unlike Wolfram Alpha it cannot compute. Just like a calculator, Wolfram Alpha does not care how many arguments are given to it, which is why concatenating many arguments in a query often works extremely well. Secondly, the answers and calculations from Wolfram Alpha are accurate and precise, so there is little need to worry about the validity of the information. Thirdly, two sets of data can easily be compared with graphs in Wolfram Alpha, which Google cannot do.

Nevertheless, Wolfram Alpha does have its limitations. Since its answers are based on its own software and knowledge database, Wolfram Alpha can only answer fact-based questions that have a specific answer. It is not able to answer open-ended questions like “Is Wolfram Alpha better than Google?”

As written on its main page, Wolfram Alpha’s goal is to make deep, broad, expert-level knowledge accessible to anyone, anywhere, anytime. Clearly, this “Google killer” is quite ambitious. In my opinion, however, Wolfram Alpha is not a typical search engine at heart. It is therefore not the Google killer people make it out to be, but rather a giant calculating encyclopaedia of statistics and facts. I think the site poses more of a threat to sites like Wikipedia.

Sunday, October 23, 2011

The Wolfram Alpha Search Engine

Wolfram Alpha is a brand-new search engine. One expert said it "could be as important as Google". What is so special about this new search engine?

Wolfram Alpha was developed by Wolfram Research. Scientists were already using the company’s previous creation, the computer program Mathematica, to help them with their work. Like that earlier program, Wolfram Alpha is mathematical at its core. But this “answer engine” can do more than help you with mathematical work.

The term “answer engine” means that Wolfram Alpha does question answering: the engine understands your question and then answers it. For example, if you type “which is the tallest building in the world?”, Wolfram Alpha will answer “Burj Khalifa”. Searching for “lim(x->infinity) (x^3)/(x+5)^3”, Wolfram Alpha will answer “1”. Compared with Google and other ordinary search engines, which give you websites containing your keywords, Wolfram Alpha’s results are truly amazing.
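The limit in the example query above can be sanity-checked numerically with a few lines of plain Python (no computer-algebra system needed); this is only an illustration of the mathematics, not of how Wolfram Alpha computes it.

```python
# Numerically check lim_{x->infinity} x^3 / (x + 5)^3 = 1,
# the limit from the example Wolfram Alpha query above.
def f(x):
    return x**3 / (x + 5)**3

# As x grows, f(x) gets arbitrarily close to 1.
for x in [10, 1_000, 100_000]:
    print(x, f(x))
```

The ratio of the leading x^3 terms dominates, so the value approaches 1 from below.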

Wolfram Alpha is expected to be successful, perhaps even comparable with Google. Before this search engine appeared, it was far more inconvenient to find the answer to a question: searching on Google gives us a lot of irrelevant results, reading and searching through books is very time-consuming, and asking teachers or lecturers is not always possible since we seldom meet. Wolfram Alpha may therefore be the best choice of search engine when we are looking for the answer to a specific question.

Wolfram Alpha also stands out for its sophisticated knowledge. We can easily look up accurate and detailed physics or chemistry data on it. It can also act as a dictionary, giving several definitions of an English word. Stephen Wolfram, the developer of the engine, also demonstrated searches for nutritional information, weather, and census data. The breadth and flexibility of Wolfram Alpha are astonishing.

Although it has some advantages over Google and other search engines, it is quite different from them. Sometimes what we are looking for is a webpage, some photos, some articles, or a wider variety of information, rather than an exact solution. Therefore, there may not be direct competition between these engines.

With its unique and convenient features, Wolfram Alpha may well become as famous as Google. Its success could even push other companies to develop new search technologies instead of clinging to the old webpage-based format.


Friday, October 7, 2011

Computing Algorithm used in Google Search

Google Search is a search engine owned by Google Inc. According to a statistical analysis carried out by Alexa Traffic Rank, it is the most visited search engine on the Internet. It receives an enormous number of queries every day and carries out each search in a fraction of a second. Since its launch in 1997 by Larry Page and Sergey Brin, Google Search has gone through a series of substantial changes in its algorithms. Every year, Google invests large sums of money in the research and development of these algorithms, and it is their innovation and novelty that underpins Google’s rise to success.

Google holds patents on the algorithms employed in Google Search, but it is still possible to outline the core logic behind them. Simply put, whenever a query is keyed in, the search algorithm initially accepts it as plain text. It then breaks the query into a series of search terms. These search terms are usually words, which are matched against the content of the websites in Google’s database, and the websites containing the required words are then displayed. However, every search engine on the Internet uses a search algorithm very similar to Google’s. So what differentiates Google from other search engines? The answer lies in the algorithm used to rank the web pages generated by the search algorithm. Google’s ranking algorithm, called PageRank, is what makes it more successful than all other search engines, and hence PageRank is the main focus of this critique essay.
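The term-matching step described above can be sketched with a toy inverted index. This is not Google’s actual code; the page texts and identifiers are made up for illustration.

```python
# A minimal sketch of query term matching: break the query into terms,
# then return the pages that contain all of them via an inverted index.
pages = {  # hypothetical toy "database" of page texts
    "p1": "wolfram alpha answer engine",
    "p2": "google search engine ranking",
    "p3": "google pagerank algorithm search",
}

# Build the inverted index: term -> set of page ids containing it.
index = {}
for pid, text in pages.items():
    for term in text.split():
        index.setdefault(term, set()).add(pid)

def search(query):
    """Return ids of pages containing every term of the query."""
    terms = query.lower().split()      # break the query into search terms
    hits = [index.get(t, set()) for t in terms]
    return set.intersection(*hits) if hits else set()

print(search("google search"))         # pages containing both terms
```

A real engine would then hand this candidate set to a ranking algorithm such as PageRank, which is exactly the division of labour the paragraph above describes.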

PageRank is a patented algorithm which aims to rank webpages according to their relevance to the query keyed in. According to researchers at Google, PageRank, unlike other ranking algorithms, tries to list web pages according to the human concepts of relevance and importance. To analyze PageRank qualitatively, it is important to develop a general understanding of the basic logic of its computation. The main assumption underlying PageRank is that webpages linked from other highly ranked pages are likely to be more important. In simple words, the World Wide Web acts as a giant recommendation system in which webpages vote for other pages via their outgoing links, and votes from more important pages carry more weight.

According to Google, PageRank calculates the probability that a person randomly clicking on links will arrive at any particular web page. It first counts the human-generated links on each webpage, then weights every link by the number of links going out from the source page. This means that the PageRank carried by any outgoing link equals the document’s own PageRank divided by its number of outgoing links, and the incoming PageRank of any webpage is the sum of the PageRanks carried by all its incoming links.

Furthermore, since the algorithm models the probability of visiting a webpage, PageRank assumes that the user who is randomly clicking on links will eventually stop. The probability that the user continues to click after reaching a particular page is called the damping factor, d. This damping factor is multiplied by the sum of incoming PageRanks to give the probability of reaching a webpage through external links. However, a webpage can also be visited directly by typing its URL into the browser; the probability of reaching a page this way is taken to be 1 - d. The final PageRank is the sum of these two contributions.

A simple analysis shows that PageRank is relatively easy to implement for practical purposes. It has an optimal substructure, which means the result generated by the PageRank algorithm can be processed using a greedy method: pages with higher PageRank are simply displayed higher in the list, and this greedy ordering yields the desired optimal result.

Furthermore, the PageRank algorithm is easy to understand. This means programmers can readily modify it to debug it and keep it up to date, which gives PageRank its dynamic nature and ensures it can keep pace with future technological changes.

In addition to these strategic advantages, PageRank is much better than traditional ranking algorithms because it carries out link analysis: it considers not only the incoming links but also the importance of the sources of those links. This yields much greater relevance in the ranked results than traditional methods. Link analysis is also designed to protect users from spammers. Ranking algorithms which only consider the content of webpages can easily be spammed. Spammers usually have financial motives, and since they control the content of their webpages, they can attach meta-tags and special keywords to the HTML that such algorithms use to evaluate a page, even though the actual content may be very different. In this way they mislead content-based algorithms into conferring an unfairly high rank so that their webpages appear at the top of the results list. With link analysis, however, spammers are far less successful, because they have little or no control over the webpages that link to their pages.

On the other hand, PageRank also has several grave limitations. One major question is whether PageRank is adequately scalable, that is, whether the algorithm can cope efficiently when the amount of data to be processed becomes extremely large. Faced with a very large database of web pages, the PageRank algorithm would require a very large amount of memory to store them. Furthermore, as the algorithm maps not only page ids but also separate terms, growth in the number of webpages would result in much larger memory requirements, which would be both less efficient and very expensive.

In addition, the runtime of PageRank is longer than that of other ranking algorithms. The calculation not only involves visiting all the links (which can be regarded as edges, each visited once during the calculation) but also requires computing the weights of the PageRank contributions. This added complexity means PageRank is expected to be slower than other ranking algorithms.

Another likely problem with an algorithm such as PageRank is that the pages at the top of the sorted list are not necessarily relevant to the query keyed in. There are several reasons for this. Many of the highest-ranked webpages (being linked from numerous other pages), such as Google, Yahoo, or the BBC, are inhomogeneous in theme: such sites may be placed highest in the ranked results yet be thematically unrelated to the query. Furthermore, links can be bought by spammers to attain higher ranks for their pages. These issues remain unaddressed by the PageRank algorithm.

Careful analysis of the algorithm also reveals another limitation. If there are webpages that have only incoming links and no outgoing links, the PageRank given to them is never redistributed, and these webpages act as sinks of PageRank. This sinking of rank creates a disequilibrium and makes the calculation of PageRank much less reliable.

There is another significant reason to doubt the accuracy of PageRank. The links to and from webpages are dynamic and change over time, as hundreds of websites are launched and several are removed every day. Moreover, shifts in interest can regroup links over time: political websites, for example, may see more incoming links during polling seasons. Accuracy therefore requires the database to be constantly updated, yet the PageRank database is only updated roughly once a quarter. This can make the calculated PageRank unfair.

The discussion above shows that the current version of PageRank is still far from Larry Page’s claim that it is able to “understand exactly what you mean and give you back exactly what you want.” It can nevertheless be considered the best of all contemporary ranking algorithms, thanks to its specialized link analysis. Although PageRank is expected to have a longer run time than other algorithms, it is still “super-fast” by human perception: it completes its calculations in less than human reaction time, which makes it acceptable to users. The problem of rank sink can be addressed systematically. One simplistic approach is to create outgoing links from the rank-sink webpages to all webpages in the database, which would distribute the PageRank evenly and eliminate the sinking of rank.
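The simplistic rank-sink fix proposed above can be sketched as a preprocessing step on the link graph before PageRank runs; the toy graph here is hypothetical.

```python
# A sketch of the rank-sink fix: before computing PageRank, give every
# page with no outgoing links an outgoing link to every page in the
# database, so its rank is redistributed evenly instead of "sinking".
def patch_sinks(links):
    """links: dict page -> list of outlinks; returns a patched copy."""
    all_pages = list(links)
    patched = {}
    for page, outlinks in links.items():
        # A sink page (empty outlink list) gets links to everything.
        patched[page] = list(outlinks) if outlinks else list(all_pages)
    return patched

toy_web = {"a": ["b"], "b": []}   # "b" is a rank sink
print(patch_sinks(toy_web))       # "b" now links to both "a" and "b"
```

After patching, every page has at least one outgoing link, so the per-link division in the PageRank formula is always well defined.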

To conclude, if Google makes adequate modifications along these lines and the PageRank database is updated more frequently, PageRank will come much closer to what Larry Page intends it to be.


Monday, October 3, 2011

Google Search: Financial and Business Aspects

Google is a global technology leader focused on improving the ways people connect with information. Through innovations in web search and advertising, Google is now a top Internet destination and possesses one of the most recognized brands in the world. Available to anyone with an Internet connection, it maintains the world’s largest online index of websites and other content. A report from Hitslink, an analytics firm, states that Google has close to 80% of worldwide search traffic and market share. These numbers indicate the importance of Google in the digital world.

In the late 1990s many new search engines appeared, most of which are still used today, including Yahoo!, Excite, Inktomi, AltaVista, and Google. Among these, Google occupies over half of the entire search engine market. At the user level, search engine operation is very simple: all the user has to do is input the parameters of the search. In other words, the user tells the search engine what he or she wants, e.g. “Happy Valley”, “flight to Hong Kong tonight” or “CCST 9003”. How search engines decide which pages match which query, however, is a real science. Search engines have their own technologies, algorithms, and formulas, which are typically kept secret. The widely known criteria include the occurrence of the query term in the page text, the title, the meta tags, the linking text inside the page, the URL (Uniform Resource Locator), the headings, and bold and italicized text (Nymark & Ramzan, 2008). All search engines are unique, with their own formulas and algorithms, but they perform the same function: they fetch relevant results that match the user’s query. Google uses PageRank to decide which site comes first in the search results.

To build up a huge database of websites, Google uses a crawler named Googlebot. Googlebot is an automated web browser that follows every link it sees: it visits a site and all its internal pages, gathers them, and brings them back to the server for indexing. Google then processes these pages and creates an index, much like the index in the back of a book. The index is parceled into manageable sections and stored across a large network of computers around the world. When a person types a query into the Google search box, the query is sent to Google’s machines and compared with all the documents stored in Google’s index to identify the most relevant matches. In a split second, Google’s system prepares a list of the most relevant pages and also determines the relevant sections and bits of text, images, videos and more. What the person gets is a list of search results with relevant information excerpted beneath each result (Google, 2010).
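The crawl step described above ("follows every link it sees") is essentially a graph traversal. The sketch below uses a hypothetical site graph and a breadth-first walk; it is an illustration of the idea, not of Googlebot’s actual implementation.

```python
# A toy sketch of a Googlebot-style crawl: starting from a seed page,
# follow every link, visiting each reachable page exactly once, and
# hand the fetched pages to the indexer.
from collections import deque

site_links = {   # hypothetical web: page -> pages it links to
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post1", "post2"],
    "post1": [],
    "post2": [],
}

def crawl(seed):
    """Breadth-first crawl returning pages in the order fetched."""
    seen, queue, fetched = {seed}, deque([seed]), []
    while queue:
        page = queue.popleft()
        fetched.append(page)             # "fetch" the page for indexing
        for link in site_links.get(page, []):
            if link not in seen:         # never re-crawl a page
                seen.add(link)
                queue.append(link)
    return fetched

print(crawl("home"))
```

The `seen` set is what keeps the crawler from looping forever on cyclic links such as home → about → home.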

Since search engines are huge projects from a technological point of view, they need high-capacity servers to operate normally, and these servers are expensive assets. To cover these expenses, and to earn tremendous profits as well, Google provides contextual advertising through its program “AdWords”. AdWords is a separate vertical bar that returns text advertisements for a given keyword or phrase. For a search query of “Hong Kong tourism”, 8-11 AdWords advertisements appear to the right of the search results, related to hotels in Hong Kong or other tourist facilities there. This online advertising is very effective because it is targeted and covers a very narrow segment of the market: exactly the segment the advertiser wants. AdWords has received high praise from advertisers, who find it a cost-effective and targeted advertising option, as it allows ads promoting goods and services to be displayed easily on both Google search results pages and collaborating sites. Given Google’s vast popularity, it is easy to reach a wide audience there. It is also worth mentioning that Google generates most of its revenue through its advertising program: of Google’s US$23.6 billion revenue in 2009, US$22.9 billion came from advertising.

The AdWords network operates by allowing advertisers to bid on an endless selection of keywords that they believe may be typed into the Google search engine. When an advertiser bids on a specific keyword, and bids a high enough price, its ad appears in a special segment of the search engine results pages for that keyword, located both above the regular search results and along the right side of the page. If a search user then clicks on the advertisement and travels to the advertiser’s assigned page, Google charges the advertiser a certain amount, which depends on the advertiser’s bid.
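The bid-rank-charge flow described above can be sketched as a deliberately simplified keyword auction. The advertisers and bid amounts are made up, and the charge-your-own-bid rule is a first-price simplification; the real AdWords auction is more elaborate than this.

```python
# A simplified sketch of the AdWords-style bidding flow: advertisers
# bid on a keyword, the highest bids win the ad slots, and a click
# charges the advertiser based on its bid.
bids = {  # hypothetical bids, in dollars, on "hong kong tourism"
    "hotel-a": 1.20,
    "tours-b": 0.90,
    "spa-c":   2.10,
}

def winning_ads(bids, slots=2):
    """Return the advertisers with the highest bids, best first."""
    return sorted(bids, key=bids.get, reverse=True)[:slots]

def charge_for_click(advertiser):
    """First-price simplification: charge the advertiser its own bid."""
    return bids[advertiser]

print(winning_ads(bids))          # highest bidders win the slots
print(charge_for_click("spa-c"))  # charged on click
```

Under this model the lowest bidder simply gets no slot, which matches the paragraph’s point that only sufficiently high bids earn a place on the results page.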

Google has seen continuous growth in its revenue, market share, profits and share value ever since it was founded; even the economic recession that shut down many high-performing companies could not impede its growth. Google has positioned itself as the leader of search-related advertising and has been helped by the fast growth of the search-related ads sector. According to a Piper Jaffray survey, search-related ads are the fastest-growing sector of the online ad business. Furthermore, Google has almost twice as many search ad click-throughs as runner-up Yahoo: in December 2009, Google had 16.5 trillion ad click-throughs, compared with Yahoo’s 9 trillion, according to Nielsen/NetRatings. These figures show Google’s dominance of the online advertising market, but it should not be forgotten that too much dependence on a single source of revenue can be detrimental to a company’s stable future. It is also possible that Yahoo and Microsoft may in future create a technology that gives them a competitive edge over Google in online advertising, and Google would suddenly see a sharp decline in its revenue.

Furthermore, while applauding the increase in Google’s revenue, we should bear in mind that its traffic acquisition cost (TAC) has also increased considerably in the last couple of years. TAC is a direct measure of how much it costs Google to drive visitors from other websites to its search site and other services, so that it can then serve them advertising. TAC includes the money Google pays other companies to ensure that Google is the default search engine on their devices; an excellent example is the US$2 million per month Google pays Apple so that its search engine is the default choice for Mobile Safari on the iPhone. TAC can also be read as an indicator that Google is trying hard to keep its traffic up, and that the strategy appears to be working (Google, 2010).

Nevertheless, there have been concerns from various quarters about the adverse impact Google has had on the business world. Google has close to 80% of worldwide search traffic, and people believe it takes unfair advantage of this situation: it can be a dangerous monopoly. Through various acquisitions Google is strengthening its dominance over the search and online advertising markets and is fast becoming the “information gatekeeper”. Its dominance gives it unprecedented control over information. If Google’s stock price declines in the future, the company could seek to recoup dollars elsewhere by misusing the information it has collected. Marketers are hungry for demographic information, and they are willing to pay for it; Google provides the door, checks who is coming inside, and can pass that information on to the marketing paparazzi. The temptation to mine the information will be huge, and it will only increase as Google matures, its growth slows, and its stock falls to earth (Wilcox, 2010). Google’s core business of search and advertising relies on the content of other people and businesses; people complain that Google does not own the information from which it makes nearly all its revenue, but is merely the middleman of information it takes for free.

It should also be noted that, because of Google’s dominance of the search engine market, many search engines have seen their revenue drop and some have had to go out of business. This may seem fair enough to an economist, who would say that the most efficient company has the right to stay in business in the long run. But let us not forget that it takes a herculean effort to set up a search engine: a great amount of intellectual resource is put in by software engineers, since, as stated before, the algorithms and formulas that conduct search are not available off the shelf but must be ingeniously developed. Google’s dominance has prevented the efforts of these engineers from being justly rewarded. The specialized, dedicated resources of other search engines are not being utilized to their optimum level, and these search engines are unable to collect high revenues.

Google is an excellent example of a company that has harnessed technological superiority to reap financial gains. Google dominates the search and online advertising markets not because it was the first firm in the industry, but because its technological superiority enabled it to outdo the big firms that came before it. With surprising tact and tenacity, Google has managed to dwarf giants such as Microsoft and Yahoo in revenue generation and market share. To ensure its financial dominance, Google has to keep focusing on its technology: it needs to make its search engine more efficient so that people continue to prefer it over other search engines, which in turn will motivate businesses to compete for a place in AdWords advertisements and keep the revenue flowing. The key areas where Google needs to improve its search engine are the relevance of the search, the comprehensiveness of its stored data, the freshness of the data being crawled, and the speed of the search. Furthermore, Google should take into account the overall impact it has around the world and address the negatives being associated with it. It should start a proper campaign to change its image as an “evil monopoly” and take steps to build trust among online users so that they use the Google search engine without fear of being exploited.

References
  1. http://epubl.ltu.se/1402-1773/2008/286/LTU-CUPP-08286-SE.pdf 
  2. http://www.google.com/corporate/tech.html 
  3. http://www.wired.com/culture/culturereviews/magazine/17-06/nep_googlenomics?currentPage=1
  4. http://www.microsoft-watch.com/content/web_services_browser/why_google_succeeds_part_1.html 
  5. http://static.googleusercontent.com/external_content/untrusted_dlcp/www.google.com/en//economicimpact/pdf/google_economicimpact.pdf
  6. http://www.1stpageprophets.com/pay-per-click/benefits-of-google-adwords-ppc.html 
  7. http://www.tehrantimes.com/Index_view.asp?code=220257 
  8. http://www.nybooks.com/articles/archives/2009/dec/17/google-and-the-new-digital-future/?page=1 
  9. http://internetbusinessmodels.org/googlebusinessmodel/ 
  10. http://www.betanews.com/joewilcox/article/Google-is-a-dangerous-monopoly-more-than-Microsoft-ever-was/1266994170

Tuesday, September 20, 2011

Performance Criteria of Google Search

Google was started in 1997 by Larry Page and Sergey Brin, two PhD students at Stanford University, as a research project. It became the leading search engine about a decade ago. Such is its popularity today that it receives several hundred million queries each day through its many different services.

The Google search engine accepts queries, tries to find the answer to each query, and then displays the results on the results page. Other than text, one can also search for images and videos. Beyond its searching capability, Google has many other special features: synonyms, weather forecasts, time zones, stock quotes, maps, earthquake data, movie showtimes, airports, home listings, sports scores, news, shopping, Gmail, and so on. There are further special features for numbers, including ranges (e.g. 70-73), prices, temperatures, money/unit conversion (e.g. 12.3 cm in inches), calculations (e.g. 3*4 - sqrt 6), package tracking, patents, area codes, and language translation of displayed pages.

These features make Google so special that it becomes almost impossible to get by without it. Google’s popularity means it must perform excellently, otherwise people would not be using it, and considering the number of users it has, Google has definitely not disappointed many. According to Alexa Traffic Rank, Google is the most viewed site today. Keeping in mind how central user experience is to Google, this essay focuses on the empirical aspects of Google’s performance.

When judging the performance of Google, one of the most important measures is the run time of the search: the time the program takes to execute a query and return results. Google’s run time has amazed many of its users; it takes very little time to display the results of a search, sometimes even less than human reaction time.

Furthermore, for any query, Google can display the first thousand results, with up to a hundred shown per page. The option to specify the number of results per page is only available if Instant Search is not enabled; with Instant Search enabled, only 10 results are displayed per page regardless of the setting.

Whenever a person starts typing in Google search, a special function displays several related suggestions that match the initial characters typed so far. This makes things simple for users, who can directly choose any suggestion that matches the text they were about to type. However, the suggestions do not always match what the user intended to write in the search box.

If a person does not know the spelling of a word or phrase, or makes a typing error in the search box that is close to the correct word or phrase, another Google function points out the mistake by displaying “Did you mean:” followed by the correction. This saves a lot of time, as the user immediately sees what mistake he or she has made.
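A “Did you mean:” style suggestion can be sketched with Python’s standard library alone, by matching the typed word against a dictionary of known words by string similarity. The word list here is a tiny hypothetical stand-in; Google’s actual correction system is far more sophisticated (it draws on query logs, not a fixed dictionary).

```python
# A sketch of a "Did you mean:" suggestion using difflib from the
# standard library: return the known word closest to what was typed.
import difflib

known_words = ["search", "engine", "algorithm", "google", "ranking"]

def did_you_mean(word):
    """Return the closest known word, or None if nothing is close."""
    matches = difflib.get_close_matches(word, known_words,
                                        n=1, cutoff=0.6)
    return matches[0] if matches else None

print(did_you_mean("algoritm"))   # a likely typo for "algorithm"
```

The `cutoff` parameter controls how similar a word must be before a suggestion is offered, which mirrors the behaviour noted above: when nothing is close enough, no correction is shown.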

Google also displays related queries that have previously been searched on the results page. However, this is not always the case, as Google may not be able to find any related searches for that particular query.


Google compiles information and makes it searchable via the Internet. It not only caches and indexes pages but also takes snapshots of other file types such as PDFs, Word documents, Excel spreadsheets, Flash SWF, plain text files and so on. It allows users to access a file even if they don’t have the corresponding viewer application, as the cached version is converted to XHTML. This saves precious time and resources, since the user does not need to download specialist software.

On the other hand, Google Search may also be a source of dissatisfaction rather than satisfaction. Google places a cookie on each registered user’s computer, which enables Google to track the person’s search history and retain the data for a year. It has been criticized heavily for this, as many people feel it leaves them with no privacy.

Google’s PageRank algorithm has also been criticized, because the ranking of results is not always what the user intends. For example, if one searches for information about Michael Jackson, a member of the faculty of HKU, Google will first display results about the famous American singer Michael Jackson; the order of the results is not always what you want it to be. Ordinary users tend to rely on the first pages of Google results, and even journalists do so, assuming that anything not listed there is unimportant. One source tells us that Google lets us see only a small part of what one could find using other research tools. The same source also says that the PageRank algorithm can be influenced by the individual views of Google staff, since only they can edit the information and decide what goes online and in what form.

In my opinion, Google has many special features and functions that enhance its performance, make it user friendly and keep it popular with users. It also has extremely short response times. However, it has a few drawbacks, as it is not perfect. The information we get from Google can be biased, since it can be influenced by Google staff. The order of the search results is not always what we want. User privacy is also affected, since the specially placed cookies enable Google to track personal information. In addition, snapshots of some SWF and text files cannot be converted into (X)HTML. Despite these drawbacks, Google has become one of the most important applications in our daily lives, and if it overcomes these issues it will be even closer to being the perfect search engine.

References
  1. http://en.wikipedia.org/wiki/Google_search_features
  2. http://en.wikipedia.org/wiki/Criticism_of_Google
  3. http://www.alexa.com/topsites
  4. http://en.wikipedia.org/wiki/Google
  5. http://www.buzzle.com/editorials/2-16-2005-65844.asp
  6. http://www.searchenginehistory.com/#google

Sunday, September 11, 2011

The Key to Success in the Google Search Engine

Introduction
      The Google search engine is a web search engine developed in 1997 by two Stanford computer science graduate students, Larry Page and Sergey Brin (Google, 2010). According to the Alexa rating, Google Search is the most-used web search engine in the world (Wikipedia, 2009).
    In 2008, the Google search engine commanded 57% of web searches in the United States, followed by Yahoo and Microsoft, which accounted for 23% and 11% of searches respectively (Agence France-Presse, 2008). Google Search has already become an essential part of many people’s lives, especially those of students and researchers. This short survey discusses the key factors behind the huge success of the Google search engine and the cultural influence it has brought.

    There are several major reasons why Google has beaten its numerous competitors and substitutes in the market to become the most popular web search engine in the world. These reasons involve Google Inc.’s technology, economics, business model and strategies.

1.    Technology
    A high technology level is an important reason why the Google search engine prevails over its competitors. The Google search engine grew out of a research project by its two founders called “BackRub”, an algorithm that follows the links in a website and analyzes all the connections. It was later developed into the “PageRank” algorithm, which generates a popularity index based on the quantity and quality of incoming links (Morrow, B., 2008).

    In fact, the “PageRank” algorithm, still used by the Google search engine today, is the major reason why Google Search is so successful. The technology reflects users’ views of the importance of web pages by considering more than 500 million variables and 2 billion terms (Nishith, R., 2009). The “more important” pages receive a higher PageRank and appear at the top of the search results, which improves the accuracy of the results.

    PageRank is also affected by the votes that each page casts. Some pages are considered more important and have higher voting power. This technology allows the Google search engine to produce more accurate and relevant search results than other search engines such as Yahoo and Bing.
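The voting idea described above can be sketched in a few lines of code. The following is a minimal, self-contained illustration of the PageRank principle only; the tiny three-page graph and the damping factor of 0.85 are illustrative assumptions, not Google's actual data or implementation.

```python
# Minimal sketch of the PageRank idea: pages "vote" for the pages they
# link to, and votes from higher-ranked pages carry more weight.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal ranks
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # a page with no outgoing links shares its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                # each linked-to page receives an equal share of the vote
                for target in outgoing:
                    new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank

# Hypothetical three-page web: A links to B and C, B links to C, C to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
# "C" receives links from both A and B, so it ends up with the highest rank.
```

Note how the ranks sum to 1 at every step: PageRank can be read as the probability that a random surfer, who follows links and occasionally jumps to a random page, lands on a given page.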

    Besides, Google uses cheap commodity computer parts while ensuring there is always a duplicate in case one fails. Components are attached to the computers with Velcro rather than screws to allow quick swapping and upgrading (Morrow, B., 2008). At present, Google’s search engine and other web applications, such as Gmail and Google Docs, are bundled into the OS on low-cost Linux-based computers (Blankenhorn, 2008). This ensures high performance and fast responses from the Google search engine at comparatively low cost.

2.    Economic
    With the success of its search engine, Google Inc. has expanded drastically in size over the past few years. Currently, Google Inc. has 20 offices in the United States and international locations in over 30 countries, and it offers a localized search engine for more than 115 countries (Google, 2010). Its large scale allows substantial capital to be invested in researching new technology and developing new applications. This maintains Google’s technological advantage and further expands the search engine’s market share through the promotion of side services such as Google Dictionary and weather forecasts, applications such as Gmail and Google Documents, and mobile platforms such as Android.

    Google Inc.’s economies of scale create a large entry barrier for other companies wishing to enter the web search engine market. It would require huge capital and human resources to develop a new search engine that provides better search results at higher speed than Google, which has accumulated many years’ worth of data about the habits of its users. These users are accustomed to Google Search, and it is extremely difficult to make them change their habits.

    In addition, the long history and good reputation of the mature Google search engine lead users in developing countries, who have only recently been able to afford computers, to consider Google first instead of its competitors. This fuels the continuous expansion of the Google kingdom around the world.

3.    Human Resource
    Human resources are among the greatest assets of Google Inc. By employing top talent from all over the world, Google manages to keep innovating and to maintain the leading technology level of its products. This preserves the prevailing position of the Google search engine in the market in terms of technology, creativity and design.

    There are two reasons why Google Inc. has been so successful in attracting and keeping high-quality human resources. First, it offers an extremely good working environment: Google Inc. is famous for its fun yet serious workplace and high pay. In fact, Google Inc. was selected by Fortune magazine as the #1 best place to work in 2008-2009 (Fortune Magazine (CNN), 2009) and the #4 best place to work in 2009-2010 (Fortune Magazine (CNN), 2010). The comfortable working environment, and the pride of being a Google employee, enhance the quality and efficiency of Google workers. Second, Google Inc. has a unique policy that greatly encourages innovation, called “Innovation Time Off”. This policy allows employees to spend 20% of their working time on projects that are not part of their job description, which motivates innovation and diversity at Google. According to Marissa Mayer, Google's Vice President of Search Products and User Experience, half of all new products launched by Google originated from work done during Innovation Time Off (Wikipedia, 2010).

    These two factors mean that Google now has the best talent from all fields in their best shape, which explains why the Google search engine keeps improving all the time.

4.    Strategy
    Google’s mantra “Don’t be Evil” explains its major strategy: respect the users. Unlike many other search engines, which earn profits by mixing search results with sponsored advertisements or cluttering the front interface with ads, Google keeps its layout clean and simple and only places advertisements, in the form of keyword ads, separately from the search results. This respect for users has earned the search engine much support and popularity.

    Besides, Google is never satisfied with its search engine. We can see Google improving it every day, from the accuracy of results and supplementary functions to customizable layouts. This maintains the competitiveness of the world’s most popular web search engine.

    Last but not least, the designers and maintainers of Google Search are creative and have a great sense of humor. On special occasions such as April Fools’ Day and the birthdays of well-known artists, we can always find funny tricks and creativity in Google’s layouts and wording. We can also find Google Search’s sense of humor in its language settings: Google Search can be used in playful languages such as Pirate, Hacker and Pig Latin (Google, 2010). These efforts may seem tiny and insignificant, but in fact they make the Google search engine friendlier and merge it into users’ daily lives.

Limitations and Potential Threats of Google Search Engine

    Although the Google search engine is undoubtedly very successful at present, some limitations and potential threats still exist, which are briefly discussed here.

1.    User experience
    Compared to more localized search engines such as Yahoo HK in Hong Kong and Baidu in Mainland China, Google’s search results are not “local” enough. Therefore, when users want search results that are more local rather than international, they prefer the other search engines. A good example is that Baidu, not Google, is the most popular search engine in Mainland China.

    Besides, when you click on a result, Google opens the new website in the same page instead of in a pop-up. This is quite inconvenient when you want to view the contents of many results; in this respect, the Yahoo search engine is more user-friendly.

2.    PageRank used by companies as a tool to earn profits
    The PageRank algorithm of the Google search engine sorts search results by importance and relevance based on a voting system among websites and the quality and quantity of their incoming links. Some profit-making companies abuse this system by artificially increasing the number of votes for their advertising pages. In fact, there are now companies whose sole purpose is to help other companies push their advertising pages to a higher position in Google’s search results.

    The result is that Google’s search results are mixed with advertisements and the homepages of profit-making companies. This violates Google’s motto of “Don’t be Evil” and hurts the relevance of search results. Google will have to put more effort into developing new strategies against this kind of behavior.

3.    Nature of PageRank
The basic working principle of the PageRank system is that webpages regarded as “more important”, such as CNN.com, have higher voting power than individual users’ pages. It is doubtful whether a link from one of these “important websites” to a search result necessarily means that the result is more relevant than others.

4.    Political Concerns
    As the world’s largest web search engine, Google has faced several political challenges. The most significant recent one is Google’s withdrawal from Mainland China. There were rumors that the withdrawal occurred because Google China’s adherence to China’s Internet censorship policies, which block and filter webpages considered “threats” by the Central People’s Government, contradicted Google’s main thesis of “Don’t be Evil”.

    Apart from the Google China event, the Google search engine also faces criticism and concerns over privacy and copyright in the United States (Google, 2010).

   These politically related issues, especially those concerning the privacy of Google’s users and the search results, will be a major concern for Google in the future.

Conclusion

    Undoubtedly, Google is the most successful web search engine in the world. Its success rests mainly on the four aspects discussed in this paper: prevailing technology, such as the effective PageRank system and fast speed; the economies of scale of Google Inc., which create great entry barriers for new search engines; good management of human resources; and the effective strategies Google has adopted. Without any of these factors, Google would not have been so successful.

    However, Google Search’s popularity still has drawbacks and potential threats, including some minor technical problems, the risk of being exploited for profit at the cost of result relevance, the limitations of the PageRank system and, last but not least, political concerns. In the future, Google needs to be very careful while expanding its web search engine across the world, and it must take these factors into account when forming strategy. Otherwise, failures such as the withdrawal from China are likely to recur through careless expansion.

References
  1. http://www.google.com/corporate/history.html
  2. http://en.wikipedia.org/wiki/Google
  3. http://en.wikipedia.org/wiki/Google_Search
  4. http://en.wikipedia.org/wiki/Internet_censorship_in_the_People's_Republic_of_China
  5. http://www.pluggd.in/google-search-success-story-297/
  6. http://www.scribd.com/doc/16062744/Googles-Success
  7. http://googleblog.blogspot.com/2008/02/our-secret-sauce.html
  8. http://en.wikipedia.org/wiki/Google_search#cite_note-2
  9. http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6TYT3WRC342-2N&_user=10&_coverDate=04/30/1998&_rdoc=1&_fmt=high&_orig=search&_origin=search&_sort=d&_docanchor=&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=26082a6c8990076ec40b2b90d0fa73db&searchtype=a
  10. http://www.successfactors.com/google/

Sunday, July 24, 2011

Programming using stack

Problem:

A stack is a useful structure for storing data and has many applications; for example, it can be used to implement recursive computations. A stack is a “last-in-first-out” structure — data items are stored linearly, with each item placed on top of the previous one. The only two operations are pop and push. The pop operation retrieves the top data item on the stack, while the push operation puts a new data item on top of the stack. The figure below shows how a stack works:
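In Python, for instance, a plain list already provides these two operations — `append` pushes onto the top and `pop` removes the top item (a small illustration with arbitrary values, not part of the original problem):

```python
# A Python list used as a stack: append() pushes, pop() removes the top.
stack = []
stack.append(1)    # push 1
stack.append(2)    # push 2
stack.append(3)    # push 3
top = stack.pop()  # pop returns 3, the most recently pushed item
print(top)         # 3
print(stack)       # [1, 2] — last in, first out
```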


(a) Can you write up a computing procedure for DFS using a stack instead of using recursion?

(b) Can you suggest a practical situation where we must use a stack instead of using recursion?

Follow-Up:

(a) The first possible version is as follows.

Stack_DFS1(x):
  1. Mark x; Push(x);
  2. if stack is empty, then end of algorithm; otherwise, do the following:
  3.    y = top of stack (without popping it);
  4.    if there is a neighbor z of y that is not marked, then do:
  5.       Mark z; Push(z);
  6.    else
  7.       Pop;
  8. go to Step 2.
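The steps of Stack_DFS1 can be translated directly into Python. This is a sketch: the representation of the graph as an adjacency dictionary (node → list of neighbors) and the example graph are assumptions for illustration, not part of the original problem.

```python
# Direct translation of Stack_DFS1: peek at the top of the stack, descend
# to an unmarked neighbor if one exists, otherwise pop (backtrack).
def stack_dfs1(graph, start):
    marked = {start}
    stack = [start]
    order = [start]            # record the order in which nodes are marked
    while stack:
        y = stack[-1]          # top of stack, without popping (Step 3)
        unmarked = [z for z in graph[y] if z not in marked]
        if unmarked:           # Step 4: an unmarked neighbor exists
            z = unmarked[0]
            marked.add(z)      # Step 5: Mark z; Push(z)
            stack.append(z)
            order.append(z)
        else:
            stack.pop()        # Step 7: y is fully explored
    return order

# Hypothetical undirected graph as an adjacency dictionary.
g = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
print(stack_dfs1(g, "a"))  # ['a', 'b', 'd', 'c']
```

Note that the node on top of the stack is inspected without being popped, exactly as in Step 3 of the pseudocode; a node is popped only once all its neighbors are marked.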

Another possible version is as follows.

Stack_DFS2(x):
  1. Push(x);
  2. if stack is empty, then end of algorithm; otherwise, do the following:
  3.    y <- Pop;
  4.    if y is not marked, then Mark y;
  5. for every neighbor z of y that is not marked, do the following:
  6.    Push(z);
  7. go to Step 2.
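The second version marks nodes when they are popped rather than when they are pushed, so the same node may sit on the stack more than once. The Python sketch below follows that scheme (the adjacency-dictionary representation and the example graph are again illustrative assumptions); it only expands the neighbors of a node the first time that node is popped, which avoids redundant work while preserving the behavior of the pseudocode:

```python
# Translation of Stack_DFS2: pop, mark if new, then push unmarked neighbors.
def stack_dfs2(graph, start):
    marked = set()
    stack = [start]            # Step 1: Push(x)
    order = []
    while stack:               # Step 2: loop until the stack is empty
        y = stack.pop()        # Step 3: y <- Pop
        if y not in marked:    # Step 4: mark y on first visit only
            marked.add(y)
            order.append(y)
            for z in graph[y]:           # Step 5: unmarked neighbors
                if z not in marked:
                    stack.append(z)      # Step 6: Push(z)
    return order

g = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
print(stack_dfs2(g, "a"))  # ['a', 'c', 'b', 'd']
```

Notice that the two versions may visit neighbors in different orders: Stack_DFS1 descends into the first unmarked neighbor, while Stack_DFS2 pops the most recently pushed one.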

(b) In the Web crawling application, we need to use a stack. In fact, crawling is done by many computers running the “search” (or exploration) simultaneously. Each computer takes the next node to be explored from the top of the stack, downloads the HTML file, and then scans it for further hyperlinks. When a new HTML document is found (i.e., a “neighbor”), no recursive call is made; instead, the new node is pushed onto the stack.

Yet one interesting question remains: when we see an HTML document, how do we know it is indeed “new” (i.e., not marked), so that we want to push it onto the stack?
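One common answer — a sketch only, not how any particular crawler actually works — is to keep a set of URLs already seen, and treat a document as “new” only if its normalized URL is not yet in the set. The normalization shown here (stripping a trailing slash and lowercasing) is deliberately naive and purely illustrative:

```python
# "Marking" documents in a crawler: a set of already-seen URLs.
seen = set()

def maybe_push(url, stack):
    """Push the URL for crawling only if it has not been seen before."""
    url = url.rstrip("/").lower()   # naive normalization, for illustration
    if url not in seen:
        seen.add(url)               # mark it so no other pass re-adds it
        stack.append(url)

stack = []
maybe_push("http://example.com/a", stack)
maybe_push("http://example.com/A/", stack)  # normalizes to the same URL
print(stack)  # only one entry: ['http://example.com/a']
```

In a real distributed crawler the set would have to be shared (or partitioned) across the many crawling machines, and set membership is typically implemented with hashing so that the “is this new?” test stays fast even for billions of URLs.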