UDC The Design and Implementation of a Specialized Search Engine Based on Robot Technology. Xiamen University Doctoral and Master's Thesis Abstract Database



10384 200128011 UDC
The Design and Implementation of a Specialized Search Engine Based on Robot Technology
May 2004


ABSTRACT

This thesis first gives a brief introduction to the history, evolution, and working principles of the World Wide Web, and to the information retrieval problems that the Web raises. Most general-purpose search engines today use robot software to fetch as many Web pages as possible and then build a full-text or partial index. When a query arrives, the search engine retrieves the best-matching URLs from its database according to specific strategies and returns them to the user as a ranked list. With the rapid growth of the WWW and the increasing complexity of search engine systems, it has become much harder to design and implement a satisfactory search engine. At present it is feasible to design and implement a specialized search engine aimed at specific users and specific subject fields, and this is also a research trend. Much valuable research has been done on focused crawling. This thesis applies an efficient automatic topic-term expansion algorithm in a specialized search engine based on focused crawling. The algorithm makes heavy use of the metadata of URLs within Web documents, drawing on Web mining techniques. Under an ordinary software and hardware configuration and limited network resources, the system searches and indexes topic-relevant Web pages quickly and accurately, and it offers specialized users a high-quality specialized information retrieval service.

Key Words: Search Engine; Focused Crawling; Web Mining

…
ABSTRACT
…
…
1.1 Web …
1.2 Web …
1.3 …
1.4 …
1.5 …
…
2.1 …
2.2 …
2.3 …
2.3.1 HTTP/HTTPS …
2.3.2 Robot …
2.3.3 …
2.3.4 …
2.3.5 …
2.3.6 …
2.3.7 …
2.3.8 …
2.3.9 XML …
2.4 Google …
2.4.1 Google …
2.4.2 …
2.5 …
…
3.1 …
3.2 …
3.3 …
3.3.1 …
3.3.2 URL …
3.3.3 …
3.3.4 …
3.4 …
3.5 HITS …
…
4.1 Web …
4.1.1 MetaData …
4.1.2 MetaData …
4.2 Web …
4.3 …
4.4 …
4.5 …
…
5.1 …
5.2 …
5.3 …
…
…
…

WWW (World Wide Web) … March 1989 … CERN (the European Laboratory for Particle Physics) … B/S … Web … 1993 … Internet … Web … Mosaic … Netscape Navigator … Web … Internet … Home Page … WWW …

1.1 Web …

WWW … Web … URL … Web … Hyper Text … Web … HTML (Hyper Text Markup Language) … Web …

… Web … IE, Netscape, Opera … Web … HTML … Web … WWW … B/S … Client … Server …

1.2 Web …

… Search Engine … Web … 1995 … Internet … Yahoo, Alta Vista, Infoseek/Go, Excite, Lycos, Google, Ask Jeeves, Baidu … [1-5] … Yahoo, Infoseek, Excite, Lycos … [6-9] …

1.3 …

… [10] …

1. SINGLE/GENERAL SEARCH ENGINE: … Google, AltaVista, Excite, Infoseek/Go, Lycos …
2. META SEARCH ENGINE: … ALL4ONE, metacrawler, Profusion, 3721 …
3. INTELLIGENT SEARCH ENGINE: … ASK JEEVES … Google …
4. PERSONAL SEARCH ENGINE: …
5. SPECIALIZED SEARCH ENGINE: … AAAFREESTUFF, MAPBLAST … SE4Topic … Web …

1.4 …

… Web … Meta data … Web …

1.5 …

… WWW …

… SEARCH ENGINE … WEB CRAWLERS [11-12] … SPIDER … ROBOT … Internet … Intranet …

2.1 …

… April 1994 … WEBCRAWLER … WWW … Lycos … 1995 … Yahoo, Excite, Infoseek/Go, AltaVista … April 1994 … Internet … Baidu … Yahoo …

2.2 …

… Web … URL … Web … [13] … Web … [14] …

[Figure 2.1: … URL … Robot … WWW …]

2.3 …

2.3.1 HTTP/HTTPS …

… HTTP … TCP/IP … WWW … HTTP … Header Fields … Entity … HTTP/1.1 … Server, Date, Content-type, Last-modified, Content-length …
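The header fields named in 2.3.1 (Server, Date, Content-type, Last-modified, Content-length) can be illustrated with a minimal Python sketch that splits a raw HTTP/1.1 response into its status line, header fields, and entity body. The function name `parse_http_response` and the sample response are hypothetical and are not taken from the thesis; the sketch ignores chunked encoding and repeated or folded headers.

```python
def parse_http_response(raw: bytes):
    """Split a raw HTTP/1.1 response into status line parts, header fields, and entity body."""
    # The header section ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: HTTP-version SP status-code SP reason-phrase
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


# Hypothetical response, built by hand for illustration only.
SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: ExampleServer/1.0\r\n"
    b"Date: Sat, 01 May 2004 08:00:00 GMT\r\n"
    b"Content-Type: text/html\r\n"
    b"Last-Modified: Fri, 30 Apr 2004 12:00:00 GMT\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)

version, status, reason, headers, body = parse_http_response(SAMPLE)
```

A robot could, for example, compare `headers["last-modified"]` with the value stored at the previous visit to decide whether a page needs re-indexing; that use is an assumption here, not something the surviving text states.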

… HTTP … / …

2.3.2 Robot …

… Robot … Internet … URL … Robot …

1. … URL …
2. … URL … HTTP/HTTPS … Internet … HTML … URL …
3. … URL …
4. … 2 … 3 …

2.3.3 …

1. … [15] … NOT, AND, OR …
2. …
3. … [16-17] …
4. …

2.3.4 …

… Internet … Web … 40 … 60 … HTML … 200 … Internet …

2.3.5 …

… [18] … 1 … 2 … n …
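The Robot procedure in 2.3.2 can be sketched in Python as a breadth-first crawl loop. The reading of the four steps used here (start from seed URLs, fetch each page over HTTP/HTTPS and extract the URLs in its HTML, queue the new URLs, repeat) is inferred from the surviving terms and is an assumption. The names `crawl`, `LinkExtractor`, and `FAKE_WEB` are hypothetical, and an in-memory dictionary stands in for the network so the sketch runs without Internet access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text, or None on failure."""
    queue = deque(seeds)           # step 1: start from the seed URLs
    seen = set(seeds)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)          # step 2: download the page
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)          # step 2 (cont.): extract URLs from the HTML
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)     # step 3: queue URLs not seen before
                queue.append(link)
    return pages                   # step 4 is the loop itself


# Hypothetical three-page site used instead of real network access.
FAKE_WEB = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="b.html#top">B</a>',
    "http://example.com/a.html": '<a href="/">home</a>',
    "http://example.com/b.html": '<a href="mailto:x@example.com">mail</a>',
}

pages = crawl(["http://example.com/"], FAKE_WEB.get)
```

A focused crawler of the kind the abstract describes would replace the plain FIFO queue with a priority queue ordered by topic relevance; that scoring step is not shown because its details do not survive in this text.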

Degree papers are held in the Xiamen University Electronic Theses and Dissertations Database. Full texts are available in the following ways: 1. If your library is a CALIS member library, please log on to http://etd.calis.edu.cn/ and submit a request online, or consult the interlibrary loan department of your library. 2. Users of non-CALIS member libraries should email etd@xmu.edu.cn for delivery details.