
Lecture 2: Querying XML Documents, Spring 2005

Contents
Part I: XPath
Part II: XQuery
Part III: XSL (Extensible Stylesheet Language)
Part IV: XML Support in SQL Server
Part V: Parsing XML in Java

Review 1: XPath expressions
The most useful path expressions:
nodename   selects all child elements named nodename
/          selects from the root node
//         selects nodes in the document, starting from the current node, that match the selection no matter where they are
.          selects the current node
..         selects the parent of the current node
@          selects attributes

Review 2: Wildcards
Path wildcards can be used to select unknown XML elements:
*        matches any element node
@*       matches any attribute node
node()   matches any node of any kind

Part I: XPath (XML Path Language)

Example for XPath Queries
<bib>
  <book price="55">
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book>
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>

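Data Model for XPath
[Figure: the XPath data model for the example. The root node has a processing instruction, a comment, and bib (the root element) as children. bib has two book children. The first book carries the attribute price=55 and has the children publisher, author, and so on, with text values such as Addison-Wesley and Serge Abiteboul.]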

XPath: Simple Expressions
/bib/book/year
Result: <year>1995</year> <year>1998</year>
/bib/paper/year
Result: an empty set of nodes (there are no papers)

XPath: Restricted Kleene Closure
//author
Result: <author>Serge Abiteboul</author>
        <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
        <author>Victor Vianu</author>
        <author>Jeffrey D. Ullman</author>
        (a set of 4 nodes)
/bib//first-name
Result: <first-name>Rick</first-name>
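You can try the path expressions above with the XPath 1.0 engine built into the JDK (javax.xml.xpath). The sketch below assumes the bib sample from the earlier slide is held in a string. The class name BibXPath and its helper select() are illustrative; they are not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BibXPath {
    // The sample bib document from the slides, written without indentation whitespace.
    static final String BIB = "<bib>"
        + "<book price='55'><publisher>Addison-Wesley</publisher>"
        + "<author>Serge Abiteboul</author>"
        + "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
        + "<author>Victor Vianu</author>"
        + "<title>Foundations of Databases</title><year>1995</year></book>"
        + "<book><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>"
        + "<title>Principles of Database and Knowledge Base Systems</title>"
        + "<year>1998</year></book>"
        + "</bib>";

    // Parses the XML, evaluates the expression against the root node, and returns
    // the string value of each selected node in document order.
    static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(BIB, "/bib/book/year"));   // [1995, 1998]
        System.out.println(select(BIB, "/bib/paper/year"));  // [] (no papers)
        System.out.println(select(BIB, "//author").size());  // 4
        System.out.println(select(BIB, "/bib//first-name")); // [Rick]
    }
}
```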

XPath: Functions
/bib/book/author/text()
Result: Serge Abiteboul, Victor Vianu, Jeffrey D. Ullman
Rick Hull doesn't appear because his author element has no text-node child; his name sits inside first-name and last-name.
Some functions in XPath:
text() matches text nodes (whose value is the text)
node() matches any node, regardless of type
name() returns the name of the current (context) node, e.g. an element's tag name

XPath: Wildcard
//author/*
Result: <first-name>Rick</first-name> <last-name>Hull</last-name>
* matches any element (but not text or attribute nodes)
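The next sketch covers text(), *, node() and name(), again using the JDK's XPath 1.0 API on the bib sample. BibFunctions and its helpers are hypothetical names. The string holds no indentation, so Rick Hull's author element really has no text-node child. That is why text() skips him.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BibFunctions {
    static final String BIB = "<bib>"
        + "<book price='55'><publisher>Addison-Wesley</publisher>"
        + "<author>Serge Abiteboul</author>"
        + "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
        + "<author>Victor Vianu</author>"
        + "<title>Foundations of Databases</title><year>1995</year></book>"
        + "<book><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>"
        + "<title>Principles of Database and Knowledge Base Systems</title>"
        + "<year>1998</year></book>"
        + "</bib>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Node-set result: the string value of every selected node (elements or text nodes).
    static List<String> nodes(String expr) {
        try {
            NodeList list = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, parse(BIB), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < list.getLength(); i++) {
                out.add(list.item(i).getTextContent());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // String result, e.g. name(...), which returns a node's name.
    static String string(String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, parse(BIB));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nodes("/bib/book/author/text()")); // no entry for Rick Hull
        System.out.println(nodes("//author/*"));              // [Rick, Hull]
        System.out.println(string("name(/bib/*[1])"));        // book
    }
}
```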

XPath: Attribute Nodes
/bib/book/@price
Result: the attribute node price="55" (its value is 55)
@price means that price is an attribute; @* matches any attribute.

XPath: Qualifiers
/bib/book/author[first-name]
Result: <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
[first-name] means that the author must have a first-name child element.

XPath: More Qualifiers
/bib/book/author[first-name][address[zip][city]]/last-name
Result: the last names of all authors that have a first-name child and an address child, where the address has both a zip and a city child.
[ ][ ] means that the author must satisfy both qualifiers.

XPath: More Qualifiers
/bib/book[@price < 60]                   boolean expression
/bib/book[author/first-name = "Rick"]    boolean expression
/bib/book[author/text()]                 existential expression (true if at least one such text node exists)
/bib/book[2]                             positional expression (the second book)
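The attribute, qualifier and predicate examples from the last few slides can be checked the same way. This is a minimal sketch with the JDK's XPath 1.0 API, and the class name BibPredicates is illustrative. The address/zip/city qualifier is included only to show that it selects nothing in this sample, since no author has an address.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BibPredicates {
    static final String BIB = "<bib>"
        + "<book price='55'><publisher>Addison-Wesley</publisher>"
        + "<author>Serge Abiteboul</author>"
        + "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
        + "<author>Victor Vianu</author>"
        + "<title>Foundations of Databases</title><year>1995</year></book>"
        + "<book><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>"
        + "<title>Principles of Database and Knowledge Base Systems</title>"
        + "<year>1998</year></book>"
        + "</bib>";

    // Returns the string values of the nodes selected by expr, in document order.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(BIB)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/bib/book/@price"));                          // [55]
        System.out.println(select("/bib/book/author[first-name]/last-name"));    // [Hull]
        // The second book has no price attribute, so the comparison is false for it.
        System.out.println(select("/bib/book[@price < 60]/title"));
        System.out.println(select("/bib/book[author/first-name = 'Rick']/title"));
        System.out.println(select("/bib/book[2]/title"));
    }
}
```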

XPath: Summary
bib             matches a bib element
*               matches any element
/               matches the root node
/bib            matches a bib element under the root
bib/paper       matches a paper in bib
bib//paper      matches a paper in bib, at any depth

XPath: Summary (cont.)
//paper                                 matches a paper at any depth
paper | book                            matches a paper or a book
@price                                  matches a price attribute
bib/book/@price                         matches a price attribute in a book, in bib
bib/book[@price<55]/author/last-name    matches?

The Root
<?xml version="1.0"?>
<bib>
  <paper>1</paper>
  <paper>2</paper>
</bib>
bib is the document element; the root is above bib.
/bib returns the document element.
/ returns the root.
Why? There may be comments before and after <bib></bib>. These become siblings of the bib element, so the root must sit above all of them. This is advanced XML-ology.
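The next sketch shows the difference between the root and the document element, using JDK DOM plus XPath 1.0 (the class name is illustrative). It parses a variant of the slide's document that has a comment before and after bib, then asks which kind of node / and /bib each select.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RootVsDocumentElement {
    // The slide's document, plus a comment before and after <bib>.
    static final String XML = "<?xml version=\"1.0\"?>"
        + "<!-- before --><bib><paper>1</paper><paper>2</paper></bib><!-- after -->";

    // Parses XML and evaluates expr against the root node, returning the requested type.
    static Object eval(String expr, QName resultType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, resultType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // DOM node type (Node.DOCUMENT_NODE, Node.ELEMENT_NODE, ...) of the first selected node.
    static short nodeType(String expr) {
        return ((Node) eval(expr, XPathConstants.NODE)).getNodeType();
    }

    static double number(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    public static void main(String[] args) {
        System.out.println(nodeType("/") == Node.DOCUMENT_NODE);    // true: / is the root node
        System.out.println(nodeType("/bib") == Node.ELEMENT_NODE);  // true: /bib is the document element
        System.out.println(number("count(/node())"));               // 3.0: comment, bib, comment
    }
}
```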

Part II: XQuery (XML Query Language)

Outline
XML data versus relational data
Query languages for XML
An introduction to XQuery
Path expressions
FLWOR expressions
Examples
Other XQuery syntax
The XQuery data model
Links

XML Data versus Relational Data
A relation with the columns name and phone:
name | phone
John | 3634
Sue  | 6343
Dick | 6363
The same relation as a tree: each row becomes a row node with name and phone children. In nested notation:
{ row: { name: John, phone: 3634 },
  row: { name: Sue,  phone: 6343 },
  row: { name: Dick, phone: 6363 } }

Relational to XML Data
A relation instance is basically a tree with:
Unbounded fan-out at level 1, i.e., any number of rows;
Fixed fan-out at level 2, i.e., a fixed number of fields.
XML data is essentially an arbitrary tree:
Unbounded fan-out at all nodes/levels;
Any number of levels;
A variable number of children at different nodes, with variable path lengths.

Query Language for XML
Must be high-level: "SQL for XML".
Must conform to XML Schema, but also work in the absence of schema information.
Support simple and complex/nested datatypes.
Support universal and existential quantifiers, and aggregation.
Operations on sequences and hierarchies of document structures.
Capability to transform and create XML structures.

XQuery
Influenced by XML-QL, Lorel, Quilt and YATL, and also by XPath and XML Schema.
Reads a sequence of XML fragments or atomic values and returns a sequence of XML fragments or atomic values. Inputs and outputs are objects defined by the XQuery data model, rather than strings in XML syntax.

Overview of XQuery
Path expressions.
Element constructors.
FOR-LET-WHERE-ORDER BY-RETURN (FLWOR, pronounced "flower") expressions. XQuery has several other kinds of expressions as well, including conditional, list and quantified expressions. FLWOR generalizes SQL's SELECT-FROM-WHERE(-HAVING).
Expressions are evaluated with respect to a context: the item (the current node), the position (in the sequence being processed) and the size (of the sequence being processed). The context also includes namespaces, variables, functions, the current date, and so on.

Path Expressions
Examples: Bib/paper, Bib/book/publisher, Bib/paper/author/lastname
Given an XML document, the value of a path expression p is an ordered sequence of objects.

Path Expression Examples
[Figure: an example document graph. Bib (&o1) has paper and book children (&o12, &o24, &o29). Their children include author, title, year, publisher, http and references edges, with leaf values such as Serge Abiteboul, Victor Vianu, 1997, 122 and 133.]
Bib/paper = <&o12,&o29>
Bib/book/publisher = <&o51>
Bib/paper/author/lastname = <&o71,&206>
Note that the order of elements matters!

Element Construction
An XQuery expression can construct new values or structures.
Example: each of the path expressions on the previous slide returns a newly constructed sequence of elements.
The key point is that we do not just return existing structures or atomic values: we can rearrange them into new structures as we wish.

FLWOR Expressions
FOR-LET-WHERE-ORDER BY-RETURN = FLWOR
FOR/LET clauses -> list of tuples
WHERE clause -> filtered list of tuples
ORDER BY/RETURN clause -> instance of the XQuery data model

FOR versus LET
for $x in list-expr
binds $x in turn to each value in the list expression.
let $x := list-expr
binds $x to the entire list expression. This is useful for common sub-expressions and for aggregations.

FOR versus LET
FOR iterates over an input sequence and calculates some value for each item in that sequence. It returns the sequence obtained by concatenating the results of these calculations. In simple cases there is one output item for every input item. So:
for $n in (1 to 10) return $n * $n
returns the sequence (1, 4, 9, 16, 25, 36, 49, 64, 81, 100).

FOR versus LET
The XQuery LET clause simply declares a variable and gives it a value:
let $maxcredit := 3000
let $overdrawncustomers := //customer[overdraft > $maxcredit]
return count($overdrawncustomers)
In this example you can simply replace each variable reference with the expression that provides the variable's value. This means that the result is the same as:
count(//customer[overdraft > 3000])

FOR versus LET: Example
for $x in doc("bib.xml")/bib/book
return <result>{ $x }</result>
Returns: <result><book>...</book></result> <result><book>...</book></result> <result><book>...</book></result> ...
FOR generates a list of bindings of $x, one for each book element in bib.
let $x := doc("bib.xml")/bib/book
return <result>{ $x }</result>
Returns: <result><book>...</book> <book>...</book> <book>...</book> ... </result>
LET generates a single binding of $x to the whole sequence of book elements in bib.

XQuery Example (1)
Find all book titles published after 1995:
for $x in doc("bib.xml")/bib/book
where $x/year > 1995
return $x/title
Result: <title>abc</title> <title>def</title> <title>ghi</title>
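XQuery itself is not part of the JDK, but the for/where/return steps of Example (1) map onto a plain Java loop over an XPath node-set. This is only an analogue. It assumes a hypothetical bib.xml whose titles match the slide's placeholder result. In XPath alone, the whole query would be /bib/book[year > 1995]/title.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FlworTitlesAfter {
    // Hypothetical bib.xml content; only titles and years matter here.
    static final String BIB = "<bib>"
        + "<book><title>abc</title><year>1996</year></book>"
        + "<book><title>xyz</title><year>1990</year></book>"
        + "<book><title>def</title><year>1999</year></book>"
        + "<book><title>ghi</title><year>2001</year></book>"
        + "</bib>";

    static List<String> titlesAfter(String xml, int year) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // for $x in /bib/book
            NodeList books = (NodeList) xpath.evaluate("/bib/book", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                Node x = books.item(i);
                // where $x/year > year  (relative path, evaluated with $x as context node)
                Double y = (Double) xpath.evaluate("number(year)", x, XPathConstants.NUMBER);
                if (y > year) {
                    // return $x/title  (here: the title's string value)
                    result.add(xpath.evaluate("title", x));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titlesAfter(BIB, 1995)); // [abc, def, ghi]
    }
}
```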

XQuery Example (2)
For each author of a book by Morgan Kaufmann, list all books they have published:
for $a in distinct-values(doc("bib.xml")/bib/book[publisher = "Morgan Kaufmann"]/author)
return <result>
         <author>{ $a }</author>
         { for $t in doc("bib.xml")/bib/book[author = $a]/title return $t }
       </result>
distinct-values() is a function that eliminates duplicates (after converting its inputs to atomic values).

Results for Example 2
<result> <author>Jones</author> <title>abc</title> <title>def</title> </result>
<result> <author>Smith</author> <title>ghi</title> </result>
The nesting of the result elements follows the nesting of the query: the inner for $t clause produces the titles inside each result.

WHERE
The WHERE clause performs a very similar function to the WHERE clause in an SQL SELECT statement: it specifies a condition that filters the items we are interested in. It is optional, but if it appears it must appear only once, after all the for and let clauses.
for $genre in //genre/choice
for $video in //video
for $actorrefs in $video/actorref
for $actor in //actor
where $video/genre = $genre and $actor/@id = $actorrefs
return concat($genre, ": ", $actor)
This mirrors SQL. The for clauses play the role of the FROM list, defining the "tables" of interest. The where clause holds both the restriction conditions, which select subsets of the rows in each table, and the join conditions, which show how the tables are related.

XQuery Example (3)
<big_publishers>
{
  for $p in distinct-values(doc("bib.xml")//publisher)
  let $b := doc("bib.xml")/bib/book[publisher = $p]
  where count($b) > 100
  return <publisher>{ $p }</publisher>
}
</big_publishers>
For each publisher p, let b be the list of books published by p. Count the books in b, and return p if there are more than 100.
count() is an (aggregate) function that returns the number of items in a sequence.

XQuery Example (4)
Find books whose price is larger than average:
let $a := avg(doc("bib.xml")/bib/book/price)
for $b in doc("bib.xml")/bib/book
where $b/price > $a
return $b
avg() is an aggregate function.
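The let-then-for pattern of Example (4) can be mirrored in Java the same way. The average is computed once (the let), and each book is then filtered against it (the for and the where). XPath 1.0 has no avg(), so this sketch uses sum() div count(). The prices and titles are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AboveAverage {
    // Hypothetical bib.xml with a price child per book, as in Example (4).
    static final String BOOKS = "<bib>"
        + "<book><title>Cheap</title><price>40</price></book>"
        + "<book><title>Middle</title><price>60</price></book>"
        + "<book><title>Pricey</title><price>80</price></book>"
        + "</bib>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // let $a := avg(/bib/book/price), written as sum() div count() in XPath 1.0.
    static double average(String xml) {
        try {
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                "sum(/bib/book/price) div count(/bib/book/price)",
                parse(xml), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // for $b in /bib/book where $b/price > $a return $b  (here: the title of $b)
    static List<String> aboveAverage(String xml) {
        double a = average(xml); // bound once, like a let clause
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList books = (NodeList) xpath.evaluate("/bib/book", parse(xml), XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                Node b = books.item(i);
                double price = (Double) xpath.evaluate("number(price)", b, XPathConstants.NUMBER);
                if (price > a) {
                    titles.add(xpath.evaluate("title", b));
                }
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```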

Collections in XQuery
Ordered and unordered collections:
/bib/book/author is an ordered collection;
distinct-values(/bib/book/author) is an unordered collection.
Examples:
let $a := /bib/book binds $a to the whole collection of books.
for $b in /bib/book iterates $b over all the books in the collection.
$b/author is also a collection (several authors...).
However:
return <result>{ $b/author }</result>
returns a single result per book, holding that book's whole author collection:
<result> <author>...</author> <author>...</author> <author>...</author> ... </result>

ORDER BY
If a FLWOR expression has no order by clause, the results come out in the order they would have if the for clauses defined a set of nested loops. Often you want the query results in sorted order, and the order by clause achieves this. To sort the videos in ascending order of year, and within each year in decreasing order of user rating:
for $x in //video
order by $x/year ascending, number($x/user-rating) descending
return $x/title
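The two-level sort above corresponds to a Java Comparator chain. This sketch works over hypothetical video data (the titles, years and ratings are invented for the example). It uses DOM to read the child elements and List.sort, which is a stable sort.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OrderByVideos {
    // Hypothetical videos with the year and user-rating children used on the slide.
    static final String VIDEOS = "<videos>"
        + "<video><title>Alpha</title><year>2001</year><user-rating>7.5</user-rating></video>"
        + "<video><title>Bravo</title><year>1999</year><user-rating>6.0</user-rating></video>"
        + "<video><title>Charlie</title><year>2001</year><user-rating>8.2</user-rating></video>"
        + "<video><title>Delta</title><year>1999</year><user-rating>9.1</user-rating></video>"
        + "</videos>";

    // Text of the first descendant element with the given name (assumed to exist).
    static String child(Element e, String name) {
        return e.getElementsByTagName(name).item(0).getTextContent().trim();
    }

    static List<String> sortedTitles(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        NodeList nodes = doc.getElementsByTagName("video"); // for $x in //video
        List<Element> videos = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            videos.add((Element) nodes.item(i));
        }
        // order by $x/year ascending, number($x/user-rating) descending
        Comparator<Element> byYear =
            Comparator.comparingInt(v -> Integer.parseInt(child(v, "year")));
        Comparator<Element> byRatingDesc =
            Comparator.comparingDouble((Element v) -> Double.parseDouble(child(v, "user-rating")))
                      .reversed();
        videos.sort(byYear.thenComparing(byRatingDesc));
        List<String> titles = new ArrayList<>(); // return $x/title
        for (Element v : videos) {
            titles.add(child(v, "title"));
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(sortedTitles(VIDEOS)); // [Delta, Bravo, Charlie, Alpha]
    }
}
```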

RETURN
Every XQuery FLWOR expression has a return clause, and it always comes last. It defines the items that are included in the result. Usually the return clause generates a single item each time it is evaluated, but in general it can produce a sequence. For example:
for $v in //video[genre = "comedy"]
return //actor[@id = $v/actorref]
This selects all the actors for each comedy video.

Sorting in XQuery
<publisher_list>
{
  for $p in distinct-values(doc("bib.xml")//publisher)
  order by $p
  return
    <publisher>
      <name>{ $p }</name>
      {
        for $b in doc("bib.xml")//book[publisher = $p]
        order by xs:decimal($b/price) descending
        return <book>{ $b/title, $b/price }</book>
      }
    </publisher>
}
</publisher_list>

Conditional Expressions: If-Then-Else
for $h in //holding
order by $h/title
return <holding>
         { $h/title,
           if ($h/@type = "Journal") then $h/editor else $h/author }
       </holding>

Existential Quantifiers
XQuery's some operator is the existential quantifier. Testing whether a condition applies to some node within a given node-set is natural in XPath.
Tell me if there exists at least one reserve_price greater than 1000 dollars. The expression returns true if at least one reserve_price has a value greater than 1000:
some $price in doc("data/items.xml")//reserve_price satisfies $price > 1000

Existential Quantifiers
for $b in //book
where some $p in $b//para satisfies contains($p, "sailing") and contains($p, "windsurfing")
return $b/title

Universal Quantifiers
XQuery's every operator is the universal quantifier. It tests whether a condition applies to every node within a node-set:
for $b in //book
where every $p in $b//para satisfies contains($p, "sailing")
return $b/title
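Java streams offer a close analogue of the two quantifiers: anyMatch() plays the role of some ... satisfies, and allMatch() the role of every ... satisfies. This sketch uses two hypothetical books, chosen so that the two queries pick different titles.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class Quantifiers {
    // Hypothetical books: Book One has a para about both sports; every para of Book Two mentions sailing.
    static final String BOOKS = "<books>"
        + "<book><title>Book One</title>"
        + "<para>sailing and windsurfing</para><para>knots</para></book>"
        + "<book><title>Book Two</title>"
        + "<para>sailing basics</para><para>sailing knots</para></book>"
        + "</books>";

    // Titles of the books whose para texts satisfy the given condition.
    static List<String> titlesWhere(String xml, Predicate<List<String>> condition) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        List<String> titles = new ArrayList<>();
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element b = (Element) books.item(i);
            NodeList paras = b.getElementsByTagName("para"); // $b//para (all descendants)
            List<String> texts = new ArrayList<>();
            for (int j = 0; j < paras.getLength(); j++) {
                texts.add(paras.item(j).getTextContent());
            }
            if (condition.test(texts)) {
                titles.add(b.getElementsByTagName("title").item(0).getTextContent());
            }
        }
        return titles;
    }

    // some $p in $b//para satisfies contains($p, "sailing") and contains($p, "windsurfing")
    static List<String> some(String xml) {
        return titlesWhere(xml, ps -> ps.stream()
            .anyMatch(p -> p.contains("sailing") && p.contains("windsurfing")));
    }

    // every $p in $b//para satisfies contains($p, "sailing")
    // (like XQuery's every, allMatch() is true for a book with no para at all)
    static List<String> every(String xml) {
        return titlesWhere(xml, ps -> ps.stream().allMatch(p -> p.contains("sailing")));
    }
}
```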

Other Interesting Things in XQuery Before and After: For dealing with order in the input. Filter: Deletes some edges in the result tree. Recursive functions. Namespaces. References, links Lots more stuff

Part III: XSL (Extensible Stylesheet Language)

What is XSL?
XSL stands for Extensible Stylesheet Language.
CSS was designed for styling HTML pages, and can be used to style XML pages.
XSL was designed specifically to style XML pages, and is much more sophisticated than CSS.
XSL consists of three languages:
XSLT (XSL Transformations) is a language used to transform XML documents into other kinds of documents (most commonly HTML, so they can be displayed).
XPath is a language to select parts of an XML document to transform with XSLT.
XSL-FO (XSL Formatting Objects) is an XML vocabulary for describing formatted (e.g. paged) output, playing a role similar to CSS. We won't cover it here.

How does it work?
The XML source document is parsed into an XML source tree.
You use XPath to define templates that match parts of the source tree.
You use XSLT to transform the matched part and put the transformed information into the result tree.
The result tree is output as a result document.
XSLT's built-in template rules handle the parts of the source document that no template matches. These rules process the children and copy their text content, but not the element tags, to the output.

Simple XPath
Here's a simple XML document:
<?xml version="1.0"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett McLaughlin</author>
  </book>
</library>
XPath expressions look a lot like paths in a computer file system:
/ means the document itself (the root node, not any specific element)
/library selects the root (document) element
/library/book selects every book element
//author selects every author element, wherever it occurs

Simple XSLT
<xsl:for-each select="//book"> loops through every book element, everywhere in the document.
<xsl:value-of select="title"/> outputs the content of the title element at the current location.
<xsl:for-each select="//book">
  <xsl:value-of select="title"/>
</xsl:for-each>
outputs the content of the title element for each book in the XML document.

Using XSL to create HTML
Our goal is to turn this:
<?xml version="1.0"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett McLaughlin</author>
  </book>
</library>
into HTML that displays something like this:
Book Titles:
  XML
  Java and XML
Book Authors:
  Gregory Brill
  Brett McLaughlin
Note that we've grouped titles and authors separately.

What we need to do
We need to save our XML into a file (let's call it books.xml).
We need to create a file (say, books.xsl) that describes how to select elements from books.xml and embed them into an HTML page. We do this by intermixing the HTML and the XSL in the books.xsl file.
We need to add a line to our books.xml file that tells it to refer to books.xsl for formatting information.

books.xml, revised
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett McLaughlin</author>
  </book>
</library>
The xml-stylesheet line tells you where to find the XSL file.

Desired HTML
<html>
  <head>
    <title>Book Titles and Authors</title>
  </head>
  <body>
    <h2>Book titles:</h2>
    <ul>
      <li>XML</li>
      <li>Java and XML</li>
    </ul>
    <h2>Book authors:</h2>
    <ul>
      <li>Gregory Brill</li>
      <li>Brett McLaughlin</li>
    </ul>
  </body>
</html>
The list items (XML, Java and XML, Gregory Brill, Brett McLaughlin) are data extracted from the XML document; everything else is our HTML template.
We don't necessarily know how much data we will have.

XSL outline
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html> ... </html>
  </xsl:template>
</xsl:stylesheet>

Selecting titles and authors
<h2>Book titles:</h2>
<ul>
  <xsl:for-each select="//book">
    <li><xsl:value-of select="title"/></li>
  </xsl:for-each>
</ul>
<h2>Book authors:</h2>
...same thing, replacing title with author
Notice the xsl:for-each loop.
Notice that XSL can rearrange the data: the HTML result can present information in a different order than the XML.

All of books.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett McLaughlin</author>
  </book>
</library>
Note: if you do View Source, this is what you will see, not the resulting HTML.

All of books.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
  <head>
    <title>Book Titles and Authors</title>
  </head>
  <body>
    <h2>Book titles:</h2>
    <ul>
      <xsl:for-each select="//book">
        <li><xsl:value-of select="title"/></li>
      </xsl:for-each>
    </ul>
    <h2>Book authors:</h2>
    <ul>
      <xsl:for-each select="//book">
        <li><xsl:value-of select="author"/></li>
      </xsl:for-each>
    </ul>
  </body>
</html>
</xsl:template>
</xsl:stylesheet>

How to use it
In a modern browser, you can just open the XML file. Some browsers refuse to apply a stylesheet to a local file for security reasons, so serving the files over HTTP may be necessary.
Older browsers will ignore the XSL and just show you the XML contents as continuous text.
You can use a program such as Xalan, MSXML, or Saxon to create the HTML as a file.
This can be done on the server side, so that all the client-side browser sees is plain HTML.
The server can create the HTML dynamically from the information currently in XML.
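Besides Xalan, MSXML and Saxon, the JDK itself ships an XSLT 1.0 processor behind the standard javax.xml.transform API, so books.xsl can be applied from Java without extra libraries. This is a minimal sketch, assuming the documents are held in strings. Turning indentation off just keeps the output predictable.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BooksTransform {
    static final String BOOKS_XML = "<?xml version=\"1.0\"?>"
        + "<?xml-stylesheet type=\"text/xsl\" href=\"books.xsl\"?>"
        + "<library>"
        + "<book><title>XML</title><author>Gregory Brill</author></book>"
        + "<book><title>Java and XML</title><author>Brett McLaughlin</author></book>"
        + "</library>";

    // books.xsl from the slides, without indentation.
    static final String BOOKS_XSL = "<?xml version=\"1.0\"?>"
        + "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\">"
        + "<html><head><title>Book Titles and Authors</title></head><body>"
        + "<h2>Book titles:</h2><ul>"
        + "<xsl:for-each select=\"//book\"><li><xsl:value-of select=\"title\"/></li></xsl:for-each>"
        + "</ul><h2>Book authors:</h2><ul>"
        + "<xsl:for-each select=\"//book\"><li><xsl:value-of select=\"author\"/></li></xsl:for-each>"
        + "</ul></body></html>"
        + "</xsl:template></xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the source document. The stylesheet is
    // passed explicitly, so the xml-stylesheet processing instruction plays no role here.
    // The result's top element is <html>, so XSLT 1.0 picks the html output method.
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            t.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(BOOKS_XML, BOOKS_XSL));
    }
}
```

To produce a file on the server side instead, a StreamResult can also be constructed from a java.io.File.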

Another Example of XSL: Simple.xml, Simple.xsl, Simplexsl.xml

Part IV: XML Support in SQL Server

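The XML data type
Users can associate a collection of XML schemas with an XML-typed variable, parameter, or column. In that case the XML data type instance is called a typed XML instance; otherwise it is called an untyped XML instance.
In SQL Server 2008 the XML data type can be used much like other data types, but it has some restrictions:
An XML data type instance cannot take more than 2 GB of storage.
An XML column cannot be part of a primary key or a foreign key.
It cannot be used as a subtype of a sql_variant instance.
Casting or converting to text or ntext is not supported; use varchar(max) or nvarchar(max) instead.
It cannot be used in a GROUP BY clause.
It cannot be a parameter of any system scalar function except ISNULL, COALESCE and DATALENGTH.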

Example 1: In the PXSCJ database, create a table Xmltable with two columns, Name and Content, which store an XML file's name and content respectively. Then define a variable of type XML and assign it a value.
The T-SQL statement that creates Xmltable:
USE PXSCJ
GO
CREATE TABLE Xmltable
(
    Name    char(20) NOT NULL PRIMARY KEY,
    Content xml NULL
)
The T-SQL statements that define and assign an XML variable:
DECLARE @doc xml
SELECT @doc = '<xmldata><name>note.xml</name><content>hello!</content></xmldata>'

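Importing XML data into SQL Server 2008 (1)
There are generally two ways to import XML data.
1. Inserting directly with an INSERT statement
An INSERT statement can insert XML data, as a string, directly into an XML column.
Example 2: Insert a row containing XML data into the table Xmltable created in Example 1. The sample data is the content of the note.xml file mentioned earlier.
INSERT INTO Xmltable
VALUES('note.xml',
       '<note><to>Wang</to><from age="20">Zhang</from>
        <heading>Reminder</heading><body>Don''t forget me this weekend!</body></note>')
The apostrophe in "Don't" is doubled because the XML sits inside a T-SQL string literal.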

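2. Using the rowset function OPENROWSET
When an XML file is large, inserting its content directly is clearly impractical. In that case the rowset function OPENROWSET can be used instead. OPENROWSET returns a table that can be referenced in the FROM clause of a query just like a table name. If that table is used as the source table of an INSERT or MERGE statement, the data in a data file is imported into a SQL Server table.
Syntax of OPENROWSET:
OPENROWSET ( BULK 'data_file',
    { FORMATFILE = 'format_file_path' [ <bulk_options> ]
    | SINGLE_BLOB | SINGLE_CLOB | SINGLE_NCLOB }
)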

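A FROM clause used with SELECT can call OPENROWSET(BULK...) instead of a table name, with full SELECT functionality. In the FROM clause, OPENROWSET with the BULK option requires an alias, given with an AS clause. Column aliases can also be specified. If no column alias list is given, the format file must have column names; specifying column aliases overrides the column names in the format file. For example:
SELECT ... FROM OPENROWSET(BULK ...) AS table_alias
SELECT ... FROM OPENROWSET(BULK ...) AS table_alias(column_alias, ...n)
Example 3: Assume the file note.xml is saved in the root directory of drive D. Use OPENROWSET to import the file into the table Xmltable with the following statement:
INSERT INTO Xmltable(name, content)
SELECT 'note2.xml' AS name, *
FROM OPENROWSET(BULK 'D:\note.xml', SINGLE_BLOB) AS note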

After the XML data is inserted, a SELECT statement shows what was inserted:
SELECT * FROM Xmltable
The result is shown in Figure 11.

OPENROWSET can also insert the contents of image files, text files, Word files, Excel files and so on. Using an image file as an example, the steps are as follows.
(1) Create a test table:
USE PXSCJ
GO
CREATE TABLE Test
(
    TestID   int IDENTITY(1,1),
    BLOBName varchar(50),
    BLOBData varbinary(max)
)
(2) Use OPENROWSET to import the image file into a table column:
INSERT INTO Test(BLOBName, BLOBData)
SELECT 'picture', BulkColumn
FROM OPENROWSET(BULK 'D:\picture.jpg', SINGLE_BLOB) AS BLOB

(3) Query the imported data. If the script above ran successfully, the rows inserted into table Test can be queried with:
SELECT * FROM Test
Figure 10.2: Querying the image data

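XQuery in SQL Server 2008: XML data type methods
SQL Server 2008 provides several built-in methods for the XML data type. XML data is hierarchical and carries complete structure and metadata, so querying an XML instance differs from querying ordinary data types. The XML data type methods query the XML instances stored in variables or columns of type XML. The commonly used methods are the following.
(1) The query() method
Syntax: query('XQuery')
The method has a single parameter, XQuery. It is a string containing an XQuery expression that specifies the XML nodes (such as elements and attributes) to query in the XML instance. query() returns a result of type XML.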

Example 4: Declare an XML variable and assign XML data about students to it. Then use the query() method with an XQuery expression that selects the <student> child elements.
DECLARE @xmldoc xml
SET @xmldoc = '
<school>
  <class>
    <student><name>王林</name><sex>男</sex><age>20</age></student>
    <student><name>何丽</name><sex>女</sex><age>21</age></student>
  </class>
</school>'
SELECT @xmldoc.query('/school/class/student') AS 学生信息
(The column alias 学生信息 means "student information".)

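The result is shown in the figure (basic use of XQuery).
(2) The value() method
Syntax: value(XQuery, SQLType)
value() runs an XQuery against the XML and returns a scalar value of an SQL type. It is typically used to extract a value from an XML instance stored in an XML-typed column, parameter, or variable. This makes it possible to write SELECT queries that combine or compare XML data with data in non-XML columns.
XQuery: the XQuery expression, a string literal that retrieves data from inside the XML instance. The XQuery must return at most one value; otherwise an error is returned.
SQLType: the preferred SQL data type to return. The return type of value() matches the SQLType parameter.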

Example 5: Use value() to retrieve an attribute value from XML data and assign it to a char variable.
DECLARE @xmldoc xml
DECLARE @number char(6)
SET @xmldoc = '
<school>
  <class>
    <student number="081101"><name>王林</name><sex>男</sex><age>20</age></student>
    <student number="081102"><name>何丽</name><sex>女</sex><age>21</age></student>
  </class>
</school>'
SET @number = @xmldoc.value('(/school/class/student/@number)[1]', 'char(6)')
SELECT @number AS 学号
(The column alias 学号 means "student ID".)

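The result is shown in the figure (use of value()).
(3) The exist() method
Syntax: exist(XQuery)
exist() returns a bit value indicating one of the following:
1 (True), if the XQuery expression returns a nonempty result, i.e. at least one XML node;
0 (False), if it returns an empty result;
NULL, if the XML data type instance the query runs against is NULL.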

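Example 6: Use exist() to check whether an XML variable contains a given attribute.
DECLARE @xmldoc xml
SET @xmldoc = '<student name="王林"></student>'
SELECT @xmldoc.exist('/student/@name') AS 位值
The alias 位值 means "bit value". The result is shown in Figure 10.5: use of exist().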
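Outside SQL Server, you can ask the same two questions with the JDK's XPath 1.0 API: what is the first matching value, and does anything match at all? This is only a rough analogue, not how SQL Server implements the methods. There is no SQLType conversion, and exist() here returns a Java boolean rather than the bit 1/0/NULL. Also, unlike value(), XPath 1.0 silently uses the first node when a path selects several. The sample data is a trimmed, hypothetical version of Example 5.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ValueAndExist {
    // Trimmed version of the student data (names omitted to keep the source ASCII).
    static final String SCHOOL = "<school><class>"
        + "<student number=\"081101\"><age>20</age></student>"
        + "<student number=\"081102\"><age>21</age></student>"
        + "</class></school>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Rough analogue of value(): the string value of the first selected node ("" if none).
    static String value(String xml, String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, parse(xml));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Rough analogue of exist(): true if the expression selects at least one node.
    static boolean exist(String xml, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, parse(xml), XPathConstants.NODESET);
            return nodes.getLength() > 0;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(value(SCHOOL, "(/school/class/student/@number)[1]")); // 081101
        System.out.println(exist("<student name='x'></student>", "/student/@name")); // true
    }
}
```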

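(4) The modify() method
Syntax: modify(XML_DML)
This method modifies the content of an XML document, for example the content of an XML-typed variable or column. The XML_DML parameter is a string in the XML Data Manipulation Language (XML DML), whose statements insert, update, or delete nodes in XML data. modify() is used in the SET clause of an UPDATE statement, or, for a variable, in a SET statement as in the examples below.
XML DML is an extension of the XQuery language that enables it to perform data manipulation. It adds the following case-sensitive keywords to XQuery: insert, delete and replace value of.
In XML DML, the insert keyword inserts one or more nodes into an XML instance as children or siblings of a node. Syntax:
insert Expression1 ( { as first | as last } into | after | before Expression2 )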

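Example 7: Use an XML DML statement to add a node after another node in a piece of XML data.
DECLARE @xmldoc xml
SET @xmldoc = '<student><name>王林</name><sex>男</sex><age>20</age></student>'
SELECT @xmldoc AS 插入节点前数据
SET @xmldoc.modify('insert <birthday>1991-02-10</birthday> after (/student/sex)[1]')
SELECT @xmldoc 插入节点后数据
The aliases mean "data before inserting the node" and "data after inserting the node". The result is shown in Figure 10.6: inserting a node.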

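In XML DML, the delete keyword deletes nodes from an XML instance. Syntax:
delete Expression
Expression cannot be the root node. If the expression returns an empty sequence, nothing is deleted and no error is returned.
Example 8: Delete a node from an XML variable.
DECLARE @xmldoc xml
SET @xmldoc = '<student><name>王林</name><sex>男</sex><age>20</age></student>'
SELECT @xmldoc AS 删除节点前数据
SET @xmldoc.modify('delete (/student/age)[1]')
SELECT @xmldoc 删除节点后数据
The aliases mean "data before deleting the node" and "data after deleting the node". The result is shown in Figure 10.7: deleting a node.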

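In XML DML, the replace value of keyword updates the value of a node in an XML document. Syntax:
replace value of Expression1 with Expression2
Expression1 identifies the node whose value is to be updated, and it must identify exactly one node. Expression2 specifies the node's new value.
Example 9: In the student XML data, replace the value 081101 of the number attribute of the name node with 091101.
DECLARE @xmldoc xml
SET @xmldoc = '<student><name number="081101">王林</name><sex>男</sex><age>20</age></student>'
SELECT @xmldoc AS 更新节点前数据
SET @xmldoc.modify('replace value of (/student/name/@number)[1] with "091101"')
SELECT @xmldoc 更新节点后数据
The aliases mean "data before updating the node" and "data after updating the node".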

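The result is shown in the figure (updating the value of a node).
(5) The nodes() method
nodes() shreds an XML instance into relational data. Its result is a rowset containing logical copies of the original XML instance. In each logical copy, the row's context node is set to one of the nodes identified by the query expression, so later queries can navigate relative to these context nodes.
Syntax: nodes(XQuery) as Table(Column)
The XQuery parameter is an XQuery expression, given as a string. If the query expression constructs nodes, the constructed nodes appear in the resulting rowset. Table(Column) specifies the table name and column name of the resulting rowset.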

Example 10: Use nodes() to find the sibling <student> nodes.
DECLARE @xmldoc xml
SET @xmldoc = '<class>
  <student number="081101"><name>王林</name><sex>男</sex><age>20</age></student>
  <student number="081102"><name>王燕</name><sex>女</sex><age>21</age></student>
</class>'
SELECT T.a.query('.') AS 结果
FROM @xmldoc.nodes('/class/student') T(a)
(The alias 结果 means "result".)

The result is shown in Figure 10.9: use of nodes().

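3. XQuery queries
Besides XPath path expressions, the basic XQuery syntax supported by SQL Server 2008 includes a general standard form: the FLWOR expression. FLWOR is short for For, Let, Where, Order by, Return. The following example illustrates FLWOR (assuming book is the root element):
for $x in doc("note.xml")/book/note
let $y := /book/note/to
where $x/number < 20
order by $x/brand
return $x/brand
This is generic XQuery. Inside SQL Server, a query runs against the xml instance the method is called on, so doc() is not available there.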

Example 11: Query the students whose age element is less than 20.
DECLARE @xmldoc xml
SET @xmldoc = '<class>
  <student number="081101"><name>王林</name><sex>男</sex><age>20</age></student>
  <student number="081102"><name>王燕</name><sex>女</sex><age>19</age></student>
  <student number="081103"><name>程明</name><sex>男</sex><age>18</age></student>
</class>'
SELECT @xmldoc.query('/class/student[age<20]')
The result is shown in the figure (querying the XML data whose age element is less than 20).

Example 12: Use a FLWOR expression to query XML data.
DECLARE @x XML
SET @x = '<ManuInstructions ProductModelID="1" ProductModelName="SomeBike">
  <Location LocationID="L1">
    <Step>Manu step 1 at Loc 1</Step>
    <Step>Manu step 2 at Loc 1</Step>
    <Step>Manu step 3 at Loc 1</Step>
  </Location>
  <Location LocationID="L2">
    <Step>Manu step 1 at Loc 2</Step>
    <Step>Manu step 2 at Loc 2</Step>
    <Step>Manu step 3 at Loc 2</Step>
  </Location>
</ManuInstructions>'
SELECT @x.query('
  for $step in /ManuInstructions/Location[1]/Step
  return string($step)
')

The result is shown in Figure 11. Figure 10.11: use of the FLWOR expression in Example 10.12.

Using the FOR XML clause
A FOR XML clause in a SELECT statement retrieves table data from SQL Server 2008 and automatically formats it as XML. Syntax:
FOR XML
{
  { RAW [ ('ElementName') ] | AUTO }
    [ <CommonDirectives>
      [ , { XMLDATA | XMLSCHEMA [ ('TargetNameSpaceURI') ] } ]
      [ , ELEMENTS [ XSINIL | ABSENT ] ] ]
| EXPLICIT
    [ <CommonDirectives> [ , XMLDATA ] ]
| PATH [ ('ElementName') ]
    [ <CommonDirectives> [ , ELEMENTS [ XSINIL | ABSENT ] ] ]
}
<CommonDirectives> ::= [ , BINARY BASE64 ] [ , TYPE ] [ , ROOT [ ('RootName') ] ]

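1. FOR XML RAW
FOR XML RAW is the simplest of the FOR XML modes. It takes the query result and turns each row of the result set into an XML element tagged with the generic identifier <row />. By default the element name in RAW mode is <row>. Each non-NULL column value in the result set becomes an attribute of the <row> element, named after the column name or column alias. To use a different element name, specify it with ElementName. The following options can be used in RAW mode:
BINARY BASE64: returns binary data in base64-encoded format.
TYPE: returns the result as the xml type.
ROOT [('RootName')]: adds a single root element to the resulting XML. RootName optionally names it; the default is <root>.
XMLDATA: returns an inline XDR schema, but does not add a root element to the result. This option will be removed in a future version of SQL Server and is not recommended.
XMLSCHEMA [('TargetNameSpaceURI')]: returns an inline XSD schema. With this option you can optionally specify a target namespace URI, which is returned as the namespace in the schema.
ELEMENTS: returns columns as subelements. ELEMENTS XSINIL creates, for a NULL column value, an element whose xsi:nil attribute is set to True. ELEMENTS ABSENT adds no XML element to the result for NULL column values.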

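Example 13: Query the students in table XSB of the PXSCJ database whose total credits (总学分) exceed 50, and return the result as XML elements.
USE PXSCJ
GO
SELECT 学号, 姓名, 性别, 出生时间
FROM XSB
WHERE 总学分 > 50
FOR XML RAW
The selected columns are student ID, name, sex and birth date. Run the statement and check the results window, as shown in Figure 10.12: generating XML elements from query results with RAW mode.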

Example 14: Use RAW mode and return the result as the xml type.
DECLARE @x XML
SET @x = (SELECT * FROM KCB FOR XML RAW('course'), TYPE)
SELECT @x
The result is shown in Figure 13. Figure 10.13: returning a result of the xml type.

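2. FOR XML AUTO
FOR XML AUTO mode also returns XML, but shapes the query result as a nested XML tree. Unlike RAW mode, AUTO mode uses table names as element names. Each table in the FROM clause that has at least one column listed in the SELECT clause becomes an XML element, and column names become attribute names. AUTO mode accepts the same options as RAW mode.
Example 15: Use AUTO mode to retrieve students' IDs, course names and scores.
SELECT CJB.学号, 课程名, 成绩
FROM CJB JOIN KCB ON CJB.课程号 = KCB.课程号
FOR XML AUTO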

The result is shown in Figure 14. Figure 10.14: generating XML elements with AUTO mode.

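3. FOR XML EXPLICIT
RAW and AUTO modes give little control over the shape of the XML generated from a query result. FOR XML EXPLICIT mode, by contrast, lets users define the shape of the resulting XML tree explicitly. EXPLICIT mode produces a hierarchy of arbitrary shape, independent of the tables.
Simply adding a FOR XML EXPLICIT clause to an ordinary SELECT statement causes an error. To use FOR XML EXPLICIT correctly, two columns must be added right after the SELECT keyword: Tag and Parent.
The first column, named Tag, must supply the tag number (an integer) of the current element. The query must provide a unique tag number for each element constructed from the rowset.
The second column, named Parent, must supply the tag number of the parent element. For a top-level element, NULL or 0 can be used.
Together, the Tag and Parent columns carry the hierarchy information. For example, a row with Tag 1 and Parent NULL becomes a top-level element. Rows with Tag 2 and Parent 1 become children of that element.
Besides Tag and Parent, the SELECT clause must contain at least one data column, whose name has the format:
[ElementName!TagNumber!AttributeName!Directive]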

Example 16: Use EXPLICIT mode to retrieve the student ID (学号), name (姓名) and score (成绩) columns.
SELECT DISTINCT 1 AS Tag, NULL AS Parent,
       XSB.学号 AS [学生信息!1!学号],
       姓名 AS [学生信息!1!姓名],
       NULL AS [成绩信息!2!成绩]
FROM XSB, CJB
WHERE XSB.学号 = CJB.学号
UNION ALL
SELECT 2 AS Tag, 1 AS Parent, XSB.学号, 姓名, 成绩
FROM XSB, CJB
WHERE XSB.学号 = CJB.学号
ORDER BY [学生信息!1!学号], [成绩信息!2!成绩]
FOR XML EXPLICIT

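The statement above combines two queries with UNION ALL. The first query makes <学生信息> (student information) the parent element and gives it the attributes 学号 and 姓名. The second query sets the Tag of the <成绩信息> (score information) element to 2 and its Parent to 1, which makes <成绩信息> a child of <学生信息>. FOR XML EXPLICIT needs the ORDER BY clause shown: the rowset must be sorted first by student ID and then by score. This puts the NULL score of each parent row first, so each parent row precedes its children. After running the statement, click the displayed result to open the window shown in the figure (generating XML elements with EXPLICIT mode).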

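4. FOR XML PATH
FOR XML PATH mode provides a simpler way to mix elements and attributes. It is also a simpler way to introduce extra nesting for representing complex properties, and it can replace many queries that would otherwise be written in EXPLICIT mode.
Syntax:
FOR XML PATH [ ('ElementName') ] [ <CommonDirectives> [ , ELEMENTS [ XSINIL | ABSENT ] ] ]
In PATH mode, column names or column aliases are treated as XPath expressions that say how values map to XML. Each is a relative XPath that gives the item type (attribute, element, or scalar value) and the name and hierarchy of the node generated relative to the row element. If a result column has an ordinary name, the column name becomes a child element of the <row> element, and the column value becomes that element's content. For example:
SELECT * FROM XSB WHERE 学号 = '081101' FOR XML PATH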

The result of the statement above is:
<row>
  <学号>081101</学号>
  <姓名>王林</姓名>
  <性别>1</性别>
  <出生时间>1990-02-10</出生时间>
  <专业>计算机</专业>
  <总学分>60</总学分>
</row>
If a column alias starts with the @ sign and contains no / mark, the column value becomes an attribute of the <row> element. For example:
SELECT 学号 AS '@编号', 姓名, 出生时间, 总学分
FROM XSB
WHERE 学号 = '081101'
FOR XML PATH

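The result of the statement above is:
<row 编号="081101">
  <姓名>王林</姓名>
  <出生时间>1990-02-10</出生时间>
  <总学分>60</总学分>
</row>
The / mark specifies the element hierarchy. For example, 学生信息/学号 makes <学生信息> the parent element and <学号> its child.
Example 17: Find the students with more than 50 total credits. Rename the <row> element to <学生管理> (student management), and make 备注 (remarks) an attribute of 学生管理. Under 学生管理 goes a 学生信息 element, whose children are 学号, 姓名 and 总学分.
SELECT 备注 AS '@备注',
       学号 AS '学生信息/学号',
       姓名 AS '学生信息/姓名',
       总学分 AS '学生信息/总学分'
FROM XSB
WHERE 总学分 > 50
FOR XML PATH('学生管理')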

The result is shown in Figure 10.16: generating XML elements with PATH mode.

Part V: Parsing XML Documents in Programming Languages (Parsing XML in Java)

Outline
Introduction to XML parsers
Tree-based parsers and event-based parsers
DOM
DOM4J

Introduction to parsers The word parser comes from compilers In a compiler, a parser is the module that reads and interprets the programming language.

Introduction to Parsers In XML, a parser is a software component that sits between the application and the XML files.

Introduction to parsers It reads a text-formatted XML file or stream and converts it to a document to be manipulated by the application.

Well-formedness and validity Well-formed documents respect the syntactic rules. Valid documents not only respect the syntactic rules but also conform to a structure as described in a DTD.

Validating vs. Non-validating Parsers
Both kinds of parser enforce the syntactic (well-formedness) rules; only validating parsers know how to validate documents against their DTDs.

Tree-based parsers These map an XML document into an internal tree structure, and then allow an application to navigate that tree. Ideal for browsers, editors, XSL processors.

Operation of a Tree-based Parser
[Figure: an XML document and its DTD are read by a tree-based parser, which produces a (valid) document tree that the application logic then works on.]

Event-based An event-based API reports parsing events (such as the start and end of elements) directly to the application through callbacks. The application implements handlers to deal with the different events

<?xml version="1.0"?>
<doc>
  <para>Hello, world!</para>
</doc>
Events reported:
start document
start element: doc
start element: para
characters: Hello, world!
end element: para
end element: doc
end document
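The event sequence above can be reproduced with SAX, the event-based API that ships with the JDK. This is a sketch under that assumption, since the slides do not name an API. The class SaxEvents and its handler are hypothetical. The handler records each callback as a line instead of building a tree, and it skips text that is only whitespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEvents {
    static final String DOC = "<?xml version=\"1.0\"?><doc><para>Hello, world!</para></doc>";

    /** Records one line per parser callback; no tree is built. */
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override public void startDocument() { events.add("start document"); }
        @Override public void endDocument() { events.add("end document"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start element: " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end element: " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may split one text run into several calls; each chunk is logged.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) events.add("characters: " + text);
        }
    }

    /** Parses xml and returns the callbacks in the order they were received. */
    static List<String> events(String xml) {
        try {
            EventLogger handler = new EventLogger();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String event : events(DOC)) System.out.println(event);
    }
}
```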

Event-based vs. Tree-based Parsers
Tree-based parsers are generally used for small documents, because they hold the whole document tree in memory. Event-based parsers are generally used for large documents.
Tree-based parsers are generally easier to program against. Event-based parsers are more complex and give the programmer a harder time.

What is DOM?
The Document Object Model (DOM) is an application programming interface (API) for HTML and XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated.

Properties of DOM
Programmers can build documents, navigate their structure, and add, modify, or delete elements and content.
It provides a standard programming interface that can be used in a wide variety of environments and applications.
Structural isomorphism: any two DOM implementations used to represent the same document create the same structure model.

The DOM Identifies
The interfaces and objects used to represent and manipulate a document.
The semantics of these interfaces and objects, including both behavior and attributes.
The relationships and collaborations among these interfaces and objects.

What DOM Is Not
The Document Object Model is not a binary specification.
The Document Object Model is not a way of persisting objects to XML or HTML.
The Document Object Model does not define "the true inner semantics" of XML or HTML.
The Document Object Model is not a set of data structures; it is an object model that specifies interfaces.
The Document Object Model is not a competitor to the Component Object Model (COM).

DOM at Work
<?xml version="1.0"?>
<products>
  <product>
    <name>XML Editor</name>
    <price>499.00</price>
  </product>
  <product>
    <name>DTD Editor</name>
    <price>199.00</price>
  </product>
  <product>
    <name>XML Book</name>
    <price>19.99</price>
  </product>
  <product>
    <name>XML Training</name>
    <price>699.00</price>
  </product>
</products>
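As a tree-based example, the products document can be loaded into a DOM tree and then navigated and modified in memory. This sketch uses the JDK's org.w3c.dom implementation. The class ProductsDom and its helper methods are hypothetical. It finds the most expensive product, then deletes the products cheaper than 100, which should leave 3 of the 4 products.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ProductsDom {
    static final String PRODUCTS = "<products>"
        + "<product><name>XML Editor</name><price>499.00</price></product>"
        + "<product><name>DTD Editor</name><price>199.00</price></product>"
        + "<product><name>XML Book</name><price>19.99</price></product>"
        + "<product><name>XML Training</name><price>699.00</price></product>"
        + "</products>";

    /** Reads the whole document into an in-memory DOM tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Text of the first descendant element with the given tag name. */
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    /** Navigates the tree and returns the name of the most expensive product. */
    static String mostExpensive(Document doc) {
        NodeList products = doc.getElementsByTagName("product");
        String best = null;
        double bestPrice = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < products.getLength(); i++) {
            Element product = (Element) products.item(i);
            double price = Double.parseDouble(childText(product, "price"));
            if (price > bestPrice) {
                bestPrice = price;
                best = childText(product, "name");
            }
        }
        return best;
    }

    /** Modifies the tree: deletes every product cheaper than the limit. */
    static void removeCheaperThan(Document doc, double limit) {
        NodeList products = doc.getElementsByTagName("product");
        // The NodeList is live, so iterate backwards while deleting.
        for (int i = products.getLength() - 1; i >= 0; i--) {
            Element product = (Element) products.item(i);
            if (Double.parseDouble(childText(product, "price")) < limit) {
                product.getParentNode().removeChild(product);
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parse(PRODUCTS);
        System.out.println("Most expensive: " + mostExpensive(doc));
        removeCheaperThan(doc, 100.0);
        System.out.println("Products left: " + doc.getElementsByTagName("product").getLength());
    }
}
```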


dom4j
An open-source XML framework for Java. It allows you to read, write, navigate, create and modify XML documents. It integrates with DOM and SAX, and offers full XPath support and XSLT support.

Download and Use
Go to http://dom4j.org.
Go to http://dom4j.org/download.html and download the latest release (current = 1.4).
Unzip.
Don't forget the classpath.
When working in an IDE, don't forget to add the log4j.jar library.
Javadoc: http://dom4j.org/apidocs/index.html.
Quick start guide: http://dom4j.org/guide.html.

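Opening an XML Document
import org.dom4j.*;
import org.dom4j.io.SAXReader;   // SAXReader lives in org.dom4j.io, not org.dom4j

public class TestDom4j {
    public Document parse(String id) throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = reader.read(id);   // id is a file name or URL (system ID)
        return document;
    }
}
SAXReader can read from a File, a URL, an InputStream, or a String holding a system ID.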

Example XML File
<?xml version="1.0" encoding="utf-8"?>
<salesdata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="C:\Documents and Settings\eran\My Documents\Academic\Courses\XML\xpath_ass_schema.xsd">
  <year>
    <theyear>1997</theyear>
    <region><name>central</name><sales unit="millions">34</sales></region>
    <region><name>east</name><sales unit="millions">34</sales></region>
    <region><name>west</name><sales unit="millions">32</sales></region>
  </year>
  <year>
    <theyear>1998</theyear>
    <region><name>east</name><sales unit="millions">35</sales></region>
    <region><name>west</name><sales unit="millions">42</sales></region>
  </year>
</salesdata>

Accessing XML Elements
// requires: import java.util.Iterator;
public void dump(Document document) throws DocumentException {
    Element root = document.getRootElement();                 // accessing the root element
    for (Iterator i = root.elementIterator(); i.hasNext(); ) { // retrieving child elements
        Element element = (Element) i.next();
        System.out.println(element.getQualifiedName());       // retrieving the element name
        System.out.println(element.getTextTrim());            // retrieving the element text
        System.out.println(element.elementText("theyear"));   // retrieving the text of the child element theyear
    }
}

Accessing XML Elements (cont'd)
What will be the output of dump()?
year 1997 year 1998
Why?

Accessing XML Elements Recursively
public void go(Element element, int depth) {
    for (int d = 0; d < depth; d++) {
        System.out.print(" ");                 // indent according to the depth
    }
    System.out.print(element.getQualifiedName());
    System.out.println(" " + element.getTextTrim());
    for (Iterator i = element.elementIterator(); i.hasNext(); ) {
        Element son = (Element) i.next();
        go(son, depth + 1);                     // recurse into each child element
    }
}
What will be the output?

Accessing Recursively (cont'd)
salesdata year theyear 1997 region name central sales 34 region name east sales 34 region name west sales 32 year theyear 1998 region name east sales 35 region name west sales 42
The whole XML tree: element names plus values.
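For comparison, the same depth-first traversal as go() can be written against the JDK's built-in DOM instead of dom4j. This is a hedged sketch with a few assumptions. The class TreeWalk is hypothetical. ownText() only approximates dom4j's getTextTrim(), because it trims but does not normalize inner whitespace. The indent is two spaces per level, and the input is a trimmed copy of the sample sales data.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TreeWalk {
    // A trimmed copy of the sample sales data (first region of 1997 only).
    static final String SALES = "<salesdata><year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "</year></salesdata>";

    /** Concatenates the element's own text children, trimmed; child elements are skipped. */
    static String ownText(Element element) {
        StringBuilder sb = new StringBuilder();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) sb.append(n.getNodeValue());
        }
        return sb.toString().trim();
    }

    /** Depth-first: indent, print name and own text, then recurse into child elements. */
    static void go(Element element, int depth, StringBuilder out) {
        for (int d = 0; d < depth; d++) out.append("  ");
        out.append(element.getTagName());
        String text = ownText(element);
        if (!text.isEmpty()) out.append(' ').append(text);
        out.append('\n');
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) go((Element) n, depth + 1, out);
        }
    }

    /** Parses xml and returns the indented listing of the whole tree. */
    static String walk(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            StringBuilder out = new StringBuilder();
            go(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(walk(SALES));
    }
}
```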

Creating an XML document
public Document createDocument() {
    Document document = DocumentHelper.createDocument();
    Element root = document.addElement("phonebook");          // creating the root element
    Element address1 = root.addElement("address")              // adding elements
        .addAttribute("name", "Yuval")
        .addAttribute("category", "family")
        .addText("Ehud 3, Jerusalem");
    Element address2 = root.addElement("address")
        .addAttribute("name", "Ortal")
        .addAttribute("category", "friends")
        .addText("Kibbutz Givaat Haim");
    return document;
}
What will we get when running go()?

Creating an XML document (cont'd)
XML tree structure of the new document (output of go()):
phonebook address Ehud 3, Jerusalem address Kibbutz Givaat Haim
Writing the XML document to a file:
FileWriter out = new FileWriter("addresses.xml");
XMLWriter output = new XMLWriter(out);   // org.dom4j.io.XMLWriter
output.write(document);
output.close();
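The same phonebook can be built and serialized without dom4j. This sketch uses org.w3c.dom for the tree and an identity javax.xml.transform.Transformer in place of XMLWriter. It is minimal, and the class PhonebookWriter and its address() and build() helpers are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PhonebookWriter {
    /** Builds the same phonebook tree as createDocument(), using org.w3c.dom. */
    static Document createDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("phonebook");   // creating the root element
        doc.appendChild(root);
        root.appendChild(address(doc, "Yuval", "family", "Ehud 3, Jerusalem"));
        root.appendChild(address(doc, "Ortal", "friends", "Kibbutz Givaat Haim"));
        return doc;
    }

    /** Creates one address element with name and category attributes and a text value. */
    static Element address(Document doc, String name, String category, String text) {
        Element address = doc.createElement("address");
        address.setAttribute("name", name);
        address.setAttribute("category", category);
        address.setTextContent(text);
        return address;
    }

    /** Serializes a DOM tree with an identity Transformer. */
    static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        // To write a file instead: new StreamResult(new java.io.File("addresses.xml"))
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Builds and serializes the phonebook, rethrowing failures as unchecked exceptions. */
    static String build() {
        try {
            return toXml(createDocument());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

The serializer decides the attribute order in the output, so callers should not depend on it.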

Client Program
public static void main(String[] args) {
    Foo foo = new Foo();
    try {
        Document doc = foo.parse("C:\\sales.xml");        // opening the file
        foo.dump(doc);                                    // dumping
        foo.go(doc.getRootElement(), 0);                  // printing recursively
        foo.xpath(doc);
        Document newDoc = foo.createDocument();           // creating a new document
        foo.go(newDoc.getRootElement(), 0);
        FileWriter out = new FileWriter("C:\\addresses.xml");
        newDoc.write(out);
        out.close();                                      // flush and close, or the file may stay empty
    } catch (Exception e) {
        System.out.println(e);
    }
}

XPath: Introduction
XML Path Language. XPath is a language for addressing parts of an XML document.
It enables locating and retrieving nodes, much like accessing directories in a file system.
It has limited (but not bad) filtering and querying abilities.
It retrieves the actual PCDATA or node sets.

XPath: Simple Path Selection
XPath expression: /salesdata/year/theyear
Result: <theyear>1997</theyear> <theyear>1998</theyear>
/ signifies child-of.
/salesdata/year[2]/theyear
Result: <theyear>1998</theyear>
Filtering the level: getting only the second year element.

XPath: Conditions
/salesdata/year/region[sales > 34]
Result: <region> <name>east</name> <sales unit="millions">35</sales> </region> <region> <name>west</name> <sales unit="millions">42</sales> </region>
Going down to region, and filtering according to the sales element.
/salesdata/year/region[sales >?

XPath: Traveling Up the Tree
/salesdata/year/region[sales > 34]/parent::year/theyear
Result: <theyear>1998</theyear>
Going up the XML tree (and then down again).

XPath: Traveling Down Fast
/descendant::sales
Result: <sales unit="millions">34</sales> <sales unit="millions">34</sales> <sales unit="millions">32</sales> <sales unit="millions">35</sales> <sales unit="millions">42</sales>
Going all the way down to the sales elements. The abbreviated form //sales selects the same nodes.
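The queries on the last few slides can be tried with the JDK's javax.xml.xpath API, which implements XPath 1.0. This sketch assumes that API rather than dom4j. The class SalesXPath and its select() helper are hypothetical, and the embedded sample data omits the schema attributes. select() returns the text of each selected node in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SalesXPath {
    // The sample sales data from the slides, without the schema attributes.
    static final String SALES = "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region></year>"
        + "</salesdata>";

    /** Evaluates an XPath 1.0 expression; returns the text of each selected node. */
    static List<String> select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SALES)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] queries = {
            "/salesdata/year/theyear",
            "/salesdata/year[2]/theyear",
            "/salesdata/year/region[sales > 34]/name",
            "/salesdata/year/region[sales > 34]/parent::year/theyear",
            "/descendant::sales",
            "//sales",
        };
        for (String q : queries) {
            System.out.println(q + " -> " + select(q));
        }
    }
}
```

Running main should print [1997, 1998], [1998], [east, west], [1998], and then the five sales values twice, once for /descendant::sales and once for //sales.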

XPath: Advanced Queries
The years in which the west region's sales, measured in millions, exceeded 32:
//region[name="west" and sales > 32]/sales[@unit='millions']/ancestor::year/theyear
Result: <theyear>1998</theyear>
The query combines a logical operator (and) with attribute access (@unit). The ancestor axis is like parent, but it can reach any level above the node; here it selects the enclosing year element.

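XPath: Advanced Queries (cont'd)
The years (text nodes) in which the west region's sales were higher than the east region's sales, where sales may be expressed in thousands or in millions. Each sales value is first converted to thousands. In XPath 1.0 a boolean used in arithmetic counts as 1 or 0, so (1 + 999 * (@unit = 'millions')) is 1000 for millions and 1 for thousands. The expression assumes that each year has at most one west and one east region:
/salesdata/year[region[name='west']/sales * (1 + 999 * (region[name='west']/sales/@unit = 'millions')) > region[name='east']/sales * (1 + 999 * (region[name='east']/sales/@unit = 'millions'))]/theyear/text()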
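Both advanced queries can be checked the same way. This sketch again assumes the JDK's javax.xml.xpath API, and the class AdvancedXPath is hypothetical. The 1999 entry, with west sales given in thousands, is invented test data and is not part of the slides. The unit-normalizing query relies on XPath 1.0 converting a boolean to 1 or 0 in arithmetic, and it assumes at most one west and one east region per year.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AdvancedXPath {
    // Sample data plus a HYPOTHETICAL 1999 entry whose west sales are in thousands.
    static final String SALES = "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region></year>"
        + "<year><theyear>1999</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">40</sales></region>"
        + "<region><name>west</name><sales unit=\"thousands\">45000</sales></region></year>"
        + "</salesdata>";

    // Logical operator, attribute test and the ancestor axis.
    static final String WEST_OVER_32 =
        "//region[name='west' and sales > 32]/sales[@unit='millions']/ancestor::year/theyear";

    // Each sales value scaled to thousands; a boolean counts as 1 or 0 in arithmetic.
    static final String WEST_BEATS_EAST =
        "/salesdata/year["
        + "region[name='west']/sales * (1 + 999 * (region[name='west']/sales/@unit = 'millions'))"
        + " > region[name='east']/sales * (1 + 999 * (region[name='east']/sales/@unit = 'millions'))"
        + "]/theyear/text()";

    /** Evaluates an XPath 1.0 expression; returns the text of each selected node. */
    static List<String> select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SALES)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(WEST_OVER_32));
        System.out.println(select(WEST_BEATS_EAST));
    }
}
```

With this data the first query should give [1998], because the 1999 west sales are not in millions. The second should give [1998, 1999], because 45000 thousand beats 40 million in 1999.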

XPath in dom4j
XPath queries can be used in dom4j:
// requires: import java.util.Iterator; import java.util.List;
public void xpath(Document document) {
    // the XPath expression is fed to the xpathSelector
    XPath xpathSelector = DocumentHelper.createXPath("/salesdata/year/theyear");
    // the nodes are selected from the document according to the XPath query
    List results = xpathSelector.selectNodes(document);
    for (Iterator iter = results.iterator(); iter.hasNext(); ) {
        Element element = (Element) iter.next();
        System.out.println(element.asXML());
    }
}

Links
For more information, see:
The current draft of the spec itself: http://www.w3.org/TR/xquery
A list of implementations: http://www.w3.org/XML/Query#products
Summaries and links to all sorts of XQuery resources: http://xml.coverpages.org/xmlquery.html
Software AG's XML server implementing XQuery: http://www.xmlstarterkit.com/
Tutorials, home of Bumblebee: http://www.xquery.com
Saxon, the most highly regarded open-source XQuery implementation: http://www.saxonica.com