Links & Law - Information about legal aspects of search engines, linking and framing


 Search Engines - Google - Yahoo

Search Engines

A search engine is a program designed to help find files stored on a computer, for example a public server on the World Wide Web, or one's own computer. The search engine allows one to ask for media content meeting specific criteria (typically those containing a given word or phrase) and retrieves a list of files that match those criteria. A search engine often uses a previously built and regularly updated index to look for files after the user has entered search criteria.

In the context of the Internet, the term search engine usually refers to a tool for searching the World Wide Web rather than other protocols or areas. Some search engines also mine data available in newsgroups, large databases, or open directories like DMOZ.org. Because their data collection is automated, search engines are distinguished from Web directories, which are maintained by people.

The vast majority of search engines are run by private companies using proprietary algorithms and closed databases, the most popular currently being Google (with MSN Search and Yahoo! closely behind). There have been several attempts to create open-source search engines, among which are Htdig, Nutch, Egothor, and OpenFTS. [1] (http://www.searchtools.com/tools/tools-opensource.html)

History

The first Web search engine was "Wandex", a now-defunct index collected by the World Wide Web Wanderer, a web crawler developed by Matthew Gray at MIT in 1993. Another very early search engine, Aliweb, also appeared in 1993 and still runs today. One of the first engines to later become a major commercial endeavor was Lycos, which started at Carnegie Mellon University as a research project in 1994.

Soon after, many search engines appeared and vied for popularity. These included WebCrawler, Hotbot, Excite, Infoseek, Inktomi, and AltaVista. In some ways they competed with popular directories such as Yahoo!. Later, the directories integrated or added on search engine technology for greater functionality.

In 2002, Yahoo! acquired Inktomi and in 2003, Yahoo! acquired Overture, which owned AlltheWeb and Altavista. In 2004, Yahoo! launched its own search engine based on the combined technologies of its acquisitions and providing a service that gave pre-eminence to the Web search engine over the directory.

Search engines were also among the brightest stars of the Internet investing frenzy of the late 1990s. Several companies entered the market spectacularly, recording record gains during their initial public offerings. Some have since taken down their public search engines and now market enterprise-only editions, such as Northern Light (http://www.northernlight.com/), which was one of the 8 or 9 early search engines that appeared after Lycos.

Before the advent of the Web, there were search engines for other protocols or uses, such as the Archie search engine for anonymous FTP sites and the Veronica search engine for the Gopher protocol.

Osmar R. Zaïane's From Resource Discovery to Knowledge Discovery on the Internet details the history of search engine technology prior to the emergence of Google.

Recent additions to the list of search engines include a9.com, AlltheWeb, Ask Jeeves, Clusty, Gigablast, Ez2Find, Teoma, WiseNut, GoHook, Walhello, Kartoo, Snap and Mamma.

 

Google

Around 2001, the Google search engine rose to prominence. Its success was based in part on the concept of link popularity and PageRank. How many other web sites and web pages link to a given page is taken into consideration with PageRank, on the premise that good or desirable pages are linked to more than others. The PageRank of linking pages and the number of links on these pages contribute to the PageRank of the linked page. This makes it possible for Google to order its results by how many web sites link to each found page. Google's minimalist user interface was very popular with users, and has since spawned a number of imitators.
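The idea that a page's rank depends on the rank of the pages linking to it, divided by the number of links on those pages, can be illustrated with a small iterative computation. The following is a minimal sketch, not Google's actual implementation: the function name, the damping factor of 0.85, the fixed iteration count and the toy link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate a PageRank-style score for every page in a small link graph.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions, not published settings.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the same score for every page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "a" and "b" both link to "c", and "c" links to "a".
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph the page with two incoming links ends up with the highest score, and the page that nobody links to ends up with the lowest.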

Researchers at NEC Research Institute claim to have improved upon Google's patented PageRank technology by using web crawlers to find "communities" of websites. Instead of ranking pages, this technology uses an algorithm that follows links on a webpage to find other pages that link back to the first one, and so on from page to page. The algorithm "remembers" where it has been, indexes the number of cross-links and relates these into groupings. In this way virtual communities of webpages are found. Teoma's search technology uses a communities approach in its ranking algorithm.

PageRank is based on citation analysis that was developed in the 1950s by Dr. Eugene Garfield at the University of Pennsylvania. Google's founders cite Garfield's work in their original paper. Web link analysis was first developed by Dr. Jon Kleinberg and his team while working on the CLEVER project at IBM's Almaden research lab. Google, like most other search engines, relies not on PageRank alone but on more than 150 criteria to determine relevancy.

 

Challenges faced by search engines

  • The web is growing much faster than any present-technology search engine can possibly index (see distributed web crawling).

  • Many web pages are updated frequently, which forces the search engine to revisit them periodically.

  • The queries one can make are currently limited to searching for key words, which may result in many false positives.

  • Dynamically generated sites may be slow or difficult to index, or may produce excessive results from a single site.

  • Many dynamically generated sites are not indexable by search engines; this phenomenon is known as the invisible web.

  • Some search engines do not order the results by relevance, but rather according to how much money the sites have paid them.

  • Some sites use tricks to manipulate the search engine to display them as the first result returned for some keywords. This can lead to some search results being polluted, with more relevant links being pushed down in the result list.

 

How search engines work

Web search engines work by storing information about a large number of web pages, which they retrieve from the WWW itself. These pages are retrieved by a web crawler (sometimes also known as a spider) — an automated web browser which follows every link it sees. The contents of each page are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called meta tags). Data about web pages is stored in an index database for use in later queries. Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages. This cached page always holds the actual search text since it is the one that was actually indexed, so it can be very useful when the content of the current page has been updated and the search terms are no longer in it. This problem might be considered to be a mild form of linkrot, and Google's handling of it increases usability by satisfying user expectations that the search terms will be on the returned web page.
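The crawling step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any particular search engine works: `crawl`, `LinkExtractor`, the `fetch` callable and the toy "web" are hypothetical names, links are compared as plain strings without resolving relative URLs, and no network access takes place.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(fetch, start_url, max_pages=100):
    """Breadth-first crawl starting at start_url.

    fetch is a hypothetical callable that returns the HTML of a URL,
    or None if the page cannot be retrieved.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html  # stored copy of the page, comparable to a cache
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # visit every page only once
                seen.add(link)
                queue.append(link)
    return pages


# Toy in-memory "web" used in place of real HTTP requests.
toy_web = {
    "a": '<a href="b">B</a> <a href="c">C</a>',
    "b": '<a href="a">back to A</a>',
    "c": "a page without links",
    "d": "a page nobody links to",
}
print(list(crawl(toy_web.get, "a")))
```

The page "d" is never retrieved because no crawled page links to it, which is one reason parts of the web stay invisible to crawlers.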

When a user comes to the search engine and makes a query, typically by giving key words, the engine looks up the index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text.
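The index lookup can be sketched as an inverted index, a table that maps each word to the pages containing it. The example below is a minimal illustration under simple assumptions (lowercase word matching, every query word must appear on the page, no ranking); the function names and the sample pages are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted by URL."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


# Hypothetical page texts keyed by URL.
sample_pages = {
    "example.org/links": "Legal aspects of linking and framing",
    "example.org/search": "Legal aspects of search engines",
    "example.org/news": "Search engine news",
}
index = build_index(sample_pages)
print(search(index, "legal search"))
```

A real engine would additionally rank the matching pages and generate the short summaries mentioned above; this sketch only answers the question of which pages match.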

There is another main type: real-time search engines (such as Orase (http://www.orase.com), which is now defunct). Such search engines do not use an index. The information that the search engine needs is collected only when a new query is started. Compared to the index-based systems of Google-like search engines, this real-time approach has some advantages: the information is always up to date, there are (almost) no dead links, and fewer system resources are needed. (Google uses almost 100,000 computers, Orase only one.) But there are disadvantages, too: a search takes longer to complete, for example.

The usefulness of a search engine depends on the relevance of the results it gives back. While there may be millions of Web pages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines employ methods to rank the results to provide the "best" results first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve.

Most Web search engines are commercial ventures supported by advertising revenue and, as a result, some employ the controversial practice of allowing advertisers to pay money to have their listings ranked higher in search results.

!!! This article is licensed under the GNU Free Documentation License, which means that you can copy and modify it as long as the entire work (including additions) remains under this license. See http://www.gnu.org/copyleft/fdl.html for details. It uses material from the Wikipedia article Search Engines!!!

 

Google

Google, Inc. (NASDAQ: GOOG (http://quotes.nasdaq.com/asp/SummaryQuote.asp?symbol=GOOG&selected=GOOG)), is a U.S.-based corporation, established in 1998, that manages the Google search engine. Google is headquartered at the "Googleplex" in Mountain View, California, and employs over 3,000 workers. Google's CEO Dr. Eric Schmidt, formerly CEO of Novell, took over when co-founder Larry Page stepped down.

 

History

 

Beginnings

Google began as a research project in early 1996 by Larry Page and Sergey Brin, two Stanford Ph.D. students who developed the theory that a search engine based on analysis of the relationships between Web sites would produce better results than the basic techniques then in use. It was originally nicknamed BackRub because the system checked backlinks to estimate a site's importance.

Convinced that the pages with the most links to them from other highly relevant Web pages must be the most relevant ones, Page and Brin decided to test their thesis as part of their studies, and laid the foundation for their search engine. They formally founded their company, Google, Inc., on September 7, 1998 at a friend's garage in Menlo Park, California. In February 1999, the company moved into offices at 165 University Avenue in Palo Alto, home of a number of other noted Silicon Valley technology startups. Google quickly outgrew the University Avenue site, moving to a complex of buildings (known by some as "The Googleplex") on Amphitheatre Parkway in Mountain View later that year.

The Google search engine gained a following among Internet users for its simple, clean design and relevant search results. In 2000, Google began selling advertisements by the keyword so that they would be more relevant to the end user. The ads were text-based in order to keep page design uncluttered and fast-loading. The concept of selling keyword advertising was originally pioneered by Overture [1] (http://www.content.overture.com/d/USm/about/news/mile.jhtml), formerly Goto.com. While many of its dot-com siblings went under, Google quietly rose in stature while turning a profit.

U.S. Patent 6,285,999 (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=6,285,999.WKU.&OS=PN/6,285,999&RS=PN/6,285,999) describing Google's ranking mechanism (PageRank) was granted on September 4, 2001. The patent was officially assigned to Stanford University and lists Lawrence Page as the inventor.

In February 2003, Google acquired Pyra Labs, owner of Blogger, a pioneering and leading weblog-hosting Web site. The acquisition seemed inconsistent with the general mission of Google. However, the move secured the company's ability to use information gleaned from blog postings to improve the speed and relevance of articles contained in Google News.

At its peak in early 2004, Google handled upwards of 80 percent of all search requests on the world wide web through its Web site and clients like Yahoo!, AOL, and CNN.[2] (http://www.onestat.com/html/aboutus_pressbox21.html) Google's share fell in February 2004 when Yahoo! dropped Google's search technology in order to deliver independent results.

Google's declared code of conduct is "Don't be evil". Their site includes humorous features such as cartoon modifications [3] (http://www.google.com/holidaylogos.html) of their logo for special occasions, the option to display the site in fictional or humorous languages such as Klingon and Leet, and April Fools' jokes about the company.

It is conjectured that Google's response to Yahoo will be personalized searches, using the personal data that it is gathering from Orkut, Gmail, and Froogle to give results based on the individual. In fact, there is a Personalized Google Search (http://labs.google.com/personalized) Beta in Google Labs (http://labs.google.com/), the experimental section of Google.com.

 

Etymology

The name "Google" is a play on the word googol, which was coined in 1938 by Milton Sirotta, nephew of U.S. mathematician Edward Kasner, to refer to the number represented by 1 followed by a hundred zeros. Google's use of the term reflects the company's mission to organize the immense amount of information available on the Web.

 

Financing and IPO

Google's major investors are the venture capital firms Kleiner Perkins Caufield & Byers and Sequoia Capital. In October 2003, while discussing a possible IPO (Initial Public Offering of shares), the company was approached by Microsoft about a possible partnership or merger; no such deal ever materialized.

In January 2004, Google announced the hiring of Morgan Stanley and Goldman Sachs Group to arrange an IPO. That IPO (one of the most anticipated in history) was projected to raise as much as $4 billion. According to a banker involved in the transaction, the deal would yield an estimated $12 billion market capitalization for Google.

On April 29, 2004, Google filed an S-1 form with the Securities and Exchange Commission for an IPO to raise as much as USD $2,718,281,828 (a touch of mathematical humor: these are the first ten digits of the constant e). The filing revealed that Google had turned a profit every year since 2001 and earned a profit of $105.6 million on revenues of $961.8 million during 2003.

In May 2004, Google officially cut Goldman Sachs from the IPO, leaving Morgan Stanley and Credit Suisse First Boston as the joint underwriters. They chose the unconventional approach of allocating the initial offering through an auction (specifically a "Dutch auction"), so that "anyone" would be able to participate in the offering. The smallest required account balances at most authorized online brokers that are allowed to participate in an IPO, however, are around $100,000. In the run-up to the IPO the company was forced to slash the price and size of the offering, but the process did not run into any technical difficulties or result in any significant legal challenges. The initial offering of shares was sold for $85 apiece. The public valued the stock at $100.34 at the close of the first day of trading, which saw 22,351,900 shares change hands.

After some initial stumbles, Google's initial public offering took place on August 19, 2004. A total of 19,605,052 shares were offered at a price of $85 per share. Of these, 14,142,135 were floated by Google and 5,462,917 by selling stockholders. The sale raised $1.67 billion, of which approximately $1.2 billion went to Google. The vast majority of Google's 271 million shares remained under Google's control. The IPO gave Google a market capitalization of more than $23 billion. Many of Google's employees became instant paper millionaires. Ironically, Yahoo! also benefited from the IPO because it owns 2.7 million shares of Google. The company was listed on the NASDAQ stock exchange under the ticker symbol GOOG.
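The figures in this paragraph can be checked with simple arithmetic. The short calculation below is only an illustrative check using the share counts and the $85 price quoted above; the variable and function names are invented for this example, and the market capitalization is computed at the offer price, which is an assumption about how the quoted figure was derived.

```python
OFFER_PRICE = 85                      # dollars per share
SHARES_FROM_GOOGLE = 14_142_135       # shares floated by Google itself
SHARES_FROM_STOCKHOLDERS = 5_462_917  # shares sold by existing stockholders
TOTAL_SHARES = 271_000_000            # approximate total number of shares


def ipo_figures():
    """Derive the dollar amounts of the offering from the share counts."""
    shares_offered = SHARES_FROM_GOOGLE + SHARES_FROM_STOCKHOLDERS
    return {
        "shares_offered": shares_offered,
        "total_raised": shares_offered * OFFER_PRICE,
        "raised_by_google": SHARES_FROM_GOOGLE * OFFER_PRICE,
        "market_cap_at_offer": TOTAL_SHARES * OFFER_PRICE,
    }


for name, value in ipo_figures().items():
    print(f"{name}: {value:,}")
```

The results are consistent with the rounded figures in the text: about $1.67 billion raised in total, about $1.2 billion of it by Google, and a valuation of just over $23 billion at the offer price.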

Since the IPO, Google's stock market capitalization has risen to $50 billion as the stock price has doubled. On August 19, 2004, the number of shares outstanding was 172.85 million while the "free float" was 19.60 million (which means 89% was held by insiders). By January 2005 the number of shares outstanding had risen by about 100 million to 273.42 million, of which 53% was held by insiders, making the float 127.70 million (up roughly 110 million shares from the first trading day). The two founders are said to hold almost 30% of the outstanding shares. The company has not reported any treasury stock holdings as of the Q3 2004 report.

 

Corporate culture

Philosophy

Google is known for its relaxed corporate culture, reminiscent of the Dot-com boom. Google's corporate philosophy is based on many casual principles including, "You can make money without doing evil", "You can be serious without a suit" and "work should be challenging and the challenge should be fun." A complete list of corporate fundamentals is available on Google's web site [4] (http://www.google.com/corporate/tenthings.html). The company encourages equality along the corporate levels and tells its employees to work on a personal project one day a week. Twice a week there is a roller hockey game in the company parking lot.

 

Twenty Percent Rule

Each Google employee is allowed to spend 20% of their work week developing new products. Some of these end up as Google services (most notably Google News).

 

Googleplex

The lobby of the Googleplex (Google's headquarters) is decorated with a piano, lava lamps and a real-time projection of current search queries. The hallways are full of exercise balls and bicycles. Each employee has a Linux workstation and access to the corporate recreation center. The recreation center includes a workout room with weights and rowing machines, locker rooms, washers and dryers, a massage room, assorted video games, Foosball, a baby grand piano, a pool table and ping pong. In addition to the rec room, there are snack rooms stocked with various cereals, gummy bears, M&Ms, toffee, licorice, cashews, yogurt, carrots, fresh fruit, and dozens of different drinks including fresh juice, soda and make-your-own cappuccino. The restrooms are equipped with digital toilets similar to those found in Japan.

 

IPO and culture

Many people have suggested that after Google's IPO their culture will not be able to stay so "fun" and focused on the future.[5] (http://www.wired.com/news/business/0,1367,63241,00.html?tw=wn_story_related) [6] (http://www.ciol.com/content/news/2004/104043001.asp) The company may be required to answer to shareholders who will want the company to cut back on employee benefits and to focus on short term advances. Also, it may be hard to maintain a collegial atmosphere when approximately 1,000 (30%) of the employees are paper-millionaires. In a report given to potential investors, co-founders Sergey Brin and Larry Page promised that the IPO would not change the company's culture. Later Mr. Page said, "We think a lot about how to maintain our culture and the fun elements."

 

Criticism and controversy

Despite Google's apparent success it has also managed to become the target of critics.

 

Copyright issues

A number of organizations have used the Digital Millennium Copyright Act to demand that Google remove references to allegedly copyrighted material on other sites. Google typically handles this by removing the link as requested and including a link to the complaint in the search results.

There have also been complaints that Google's web cache feature violates copyright. However, Google provides mechanisms for requesting that caching be disabled, and it respects such requests. It also honors the robots.txt file, another mechanism that allows operators of a website to request that part or all of their site not be included in search engine results.
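How a crawler can honor a robots.txt file may be illustrated with Python's standard urllib.robotparser module. The sketch below is an illustration only and says nothing about Google's own implementation: the robots.txt content, the crawler name "ExampleBot", the URLs and the helper function is_allowed are all made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ("http://example.com/index.html",
            "http://example.com/private/page.html"):
    print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A well-behaved crawler would run such a check before retrieving a page and skip any URL for which the answer is False.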

 

Multinational Corporation

Google is a multinational corporation, having offices in over a dozen countries [7] (http://www.google.com/jobs/positions.html). In order to comply with the varying laws of these countries, several versions of Google restrict very specific keyword searches. According to French and German law, for example, ethnocentrism and historical revisionism are illegal. Google complies with these laws by banning keyword searches related to these terms. China, whose human rights record has been widely criticized by the international community, has in the past restricted citizen access to popular search engines such as Altavista, Yahoo, and Google. This complete ban is currently lifted; however, the government remains proactive in filtering internet content. [8] (http://journalism.berkeley.edu/projects/chinadn/en/archives/002885.html)

 

Partiality

In February 2003, Google banned the ads of Oceana, a two-and-a-half-year-old non-profit organization that was protesting the environmental effects of a major cruise ship operation's sewage treatment practices. Google cited its editorial policy, which states that "Google does not accept advertising if the ad or site advocates against other individuals, groups, or organizations."

 

Offensive search results

In April 2004, Google received complaints that a search for "Jew" on its site listed the anti-Jewish website Jew Watch at or towards the top of the list. Google insisted this was a result of their content-oblivious PageRank algorithm. [10] (http://www.google.com/explanation.html).

 

Privacy

Some have pointed out the privacy implications of having a centrally located, widely popular data warehouse of millions of internet users' searches, and how under existing US law, Google would be required to hand over all such information to the US government.

It has been claimed that Google infringes the privacy of visitors by uniquely identifying them using cookies, which are used to track web users' search histories. The cookies have very distant expiry dates, and it is claimed that users' searches are recorded without permission for advertising purposes. In response, Google claims that cookies are necessary to maintain user preferences between sessions and to offer other search features. The use of cookies with distant expiry dates is not uncommon.

Some users believe the processing of email message content by Google's Gmail service goes beyond proper use. The point is often made that people without Gmail accounts, who have not agreed to the Gmail terms of service but send email to Gmail users, have their correspondence analyzed without permission. Google claims that mail sent to or from Gmail is never read by a human being other than the account holder, and is only used to improve the relevance of advertisements. Other popular email services such as Hotmail also scan incoming email to try to determine whether it is unsolicited email.

Chris Hoofnagle, associate director of the Electronic Privacy Information Center in Washington, DC warned that "As courts become more frequent integrators of electronic records, there is a greater risk of Google ... becoming a serious privacy threat."

 

The PageRank system

Google's central PageRank system has been criticized, some calling it "undemocratic". Common arguments are that the system is unfairly biased towards large web sites, and that the criteria for a page's importance are not subject to peer review. The system is also highly susceptible to manipulation and fraud through the use of dummy sites. See Google bomb.

 

Google Offers Wikimedia Hosting

On February 11, 2005, news [11] (http://news.com.com/Google+may+host+encyclopedia+project/2100-1038_3-5572744.html?tag=nefd.top) [12] (http://meta.wikimedia.org/wiki/Google_hosting) emerged that discussions are in progress over the possibility of Google hosting a section of the Wikimedia Foundation's information on donated servers and internet transit. Early information states that no advertising (such as Google's AdWords) would be necessary on Wikimedia's projects. The Foundation confirmed that the two groups will hold a private IRC chat in March to further discuss possibilities, and stressed that no details have yet been finalised. As a result, some outsiders have coined the unofficial term "Googlepedia".

!!! This article is licensed under the GNU Free Documentation License, which means that you can copy and modify it as long as the entire work (including additions) remains under this license. See http://www.gnu.org/copyleft/fdl.html for details. It uses material from the Wikipedia article Google!!!

 

Yahoo

Yahoo! Inc. (NYSE: YHOO (http://www.nyse.com/about/listed/lcddata.html?ticker=YHOO)) is an American computer services company with a mission to "be the most essential global internet service for consumers and businesses". It operates an Internet portal, a web directory and a host of other services including the popular Yahoo! Mail. It was founded by Stanford graduate students David Filo and Jerry Yang in January 1994 and incorporated on March 2nd, 1995. The company is headquartered in Sunnyvale, California.

According to Alexa Internet, a web trends company, Yahoo is the most visited website on the Internet today. The global network of Yahoo websites received 3 billion page views per day as of October 2004.

 

History

Yahoo started out as "Jerry's Guide to the World Wide Web" but eventually received a new moniker with the help of a dictionary. The name Yahoo is an acronym for "Yet Another Hierarchical Officious Oracle," but Filo and Yang insist they selected the name because they liked the general definition of a yahoo, as in Gulliver's Travels by Jonathan Swift: "rude, unsophisticated, uncouth." Yahoo itself first resided on Yang's student workstation, "Akebono," while the software was lodged on Filo's computer, "Konishiki"—both named after legendary sumo wrestlers. The "yet another" phrasing goes back at least to the Unix utility yacc, whose name is an acronym for "yet another compiler compiler".

Yahoo had its initial public offering on April 12, 1996, selling 2.6 million shares at $13 each.

As Yahoo's popularity has increased, so has the range of features it offers, making it a kind of one-stop shop for all the popular activities of the Internet. These now include: Yahoo! Mail, a web-based e-mail service, an instant messaging client, a very popular mailing list service (Yahoo! Groups), online gaming and chat, various news and information portals, online shopping and auction facilities, and an online payment system (similar to PayPal) called Yahoo! Paydirect. Many of these are based at least in part on previously independent services, which Yahoo has acquired - such as the popular GeoCities free web-hosting service, Rocketmail, and various competing mailing list providers such as eGroups. Many of these take-overs were controversial and unpopular with users of the existing services, as Yahoo often changed the relevant terms of service. An example of this would be their claiming intellectual property over content on their servers, which the old companies had not.

Yahoo has now begun making partnerships with telecommunications and Internet providers - such as BT in the UK, Rogers in Canada and SBC in the US - to create content-rich broadband services to rival those offered by AOL. The company offers a branded credit card, Yahoo! Visa, through a partnership with First USA.

 

Yahoo was one of the few surviving large Internet companies after the dot-com bubble burst. Nevertheless, on September 26, 2001, Yahoo stock closed at an all-time low of $4.06.

Yahoo formed partnerships with telecommunications and Internet providers to create content-rich broadband services to compete with AOL. On 3 June 2002, SBC and Yahoo launched a national co-branded dial service. In July 2003, BT Openworld announced an alliance with Yahoo. On 23 August 2005, Yahoo and Verizon launched an integrated DSL service.

In late 2002, Yahoo began to bolster its search services by acquiring other search engines. In December 2002, Yahoo acquired Inktomi. In February 2003, Yahoo acquired Konfabulator, a desktop application, and rebranded it Yahoo! Widgets; in July 2003, it acquired Overture Services, Inc. and its subsidiaries AltaVista and AlltheWeb. On February 18, 2004, Yahoo dropped Google-powered results and returned to using its own technology to provide search results.

Google then released Gmail, its webmail service offering 1 GB of storage, on 1 April 2004. Yahoo responded by upgrading the storage of all free Yahoo Mail accounts from 4 MB to 1 GB, and all Yahoo Mail Plus accounts to 2 GB. In 2007, Yahoo removed the storage meters and made storage unlimited. On 9 July 2004, Yahoo acquired e-mail provider Oddpost to add an Ajax interface to Yahoo! Mail Beta. Google also released Google Talk, a Voice over IP and instant messaging service, on 24 August 2005. On 13 October 2005, Yahoo and Microsoft announced that Yahoo! Messenger and MSN Messenger would become interoperable.

Yahoo continued acquiring companies to expand its range of services, particularly Web 2.0 services. Yahoo Launch became Yahoo! Music on 9 February 2005. On 20 March 2005, Yahoo purchased photo sharing service Flickr. On 29 March 2005, the company launched its blogging and social networking service Yahoo! 360°. In June 2005, Yahoo acquired blo.gs, a service based on RSS feed aggregation. Yahoo then bought online social event calendar Upcoming.org on 4 October 2005. Yahoo acquired social bookmark site del.icio.us on 9 December 2005 and then playlist sharing community webjay on 9 January 2006.

On 27 August 2007, Yahoo released a new version of Yahoo! Mail that makes it possible for users to send instant messages to the largest combined instant messaging (IM) community, including users of Yahoo! Messenger and Windows Live Messenger, and to send free text messages to mobile phones in the U.S., Canada, India and the Philippines.

 

!!! This article is licensed under the GNU Free Documentation License, which means that you can copy and modify it as long as the entire work (including additions) remains under this license. See http://www.gnu.org/copyleft/fdl.html for details. It uses material from the Wikipedia article Yahoo!!!

 

 


Masthead/Curriculum Vitae
Copyright © 2002-2008 Dr. Stephan Ott 

All Rights Reserved.

 
