http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html - Chapter 9 of XML in a nutshell
http://www-xray.ast.cam.ac.uk/~jgraham/mozilla/xpath-tutorial.html - Mozilla XPath documentation
http://www.zvon.org/xxl/XPathTutorial/General/examples.html - XPath tutorial by examples
document.evaluate('XPATH HERE', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
document.evaluate('XPATH HERE', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.innerHTML;
/html/body/table // all <table> elements that are direct children of <body>
/html/body/table[1] // the first <table> that is a direct child of <body>
//*[@id="firstname"] // select the element with id="firstname"
//a[@id="blah"] // find the <a> with a certain attribute value
//*[text()="Advanced Search"] // find any element that has a text node equal to the specified text
//a[.="Advanced Search"] // find the <a> whose string value equals the specified text
//@id // selects all id attributes of any element in the document
person//@id // selects all id attributes of any element contained in the <person> element
//profession[.="physicist"] // find all <profession> elements whose value is physicist
Attributes are also part of XPath. To select a particular attribute of an element, use an at sign @ followed by the name of the attribute you want.
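As a rough illustration of attribute selection, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. ElementTree implements only a subset of XPath and cannot return attribute nodes, so the sketch selects the elements that carry the attribute and reads the value from each. The sample document and its names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
doc = ET.fromstring(
    "<people>"
    "<person id='p1'><name id='firstname'>Alan</name></person>"
    "<person><name>Richard</name></person>"
    "</people>"
)

# ElementTree has no //@id, so select every descendant element that has
# an id attribute and read the attribute value from the element.
ids = [elem.get("id") for elem in doc.findall(".//*[@id]")]

# Rough equivalent of //*[@id="firstname"], relative to the root element.
first = doc.find(".//*[@id='firstname']")
```

Here ids holds the attribute values in document order, and first is the <name> element whose id is firstname.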
Selecting sibling elements:
//middle_initial/../first_name // select <first_name> elements that are siblings of <middle_initial> elements
//input[following-sibling::text()="label1199734151"] // select the input box that has the text .... next to it
//label[text()="Enter Email Address"]/../following-sibling::div/input // select the <input> child of the <div> that follows the label's parent
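The parent-then-child trick for reaching siblings can be tried in Python with xml.etree.ElementTree, which supports the .. step. This is a minimal sketch; the sample document is hypothetical, and the sibling axes (following-sibling::, preceding-sibling::) are not available in ElementTree, so only the first expression is shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: only the first person has a middle initial.
doc = ET.fromstring(
    "<people>"
    "<person><first_name>Richard</first_name>"
    "<middle_initial>P</middle_initial></person>"
    "<person><first_name>Alan</first_name></person>"
    "</people>"
)

# Go up from each <middle_initial> to its parent, then down to <first_name>.
siblings = doc.findall(".//middle_initial/../first_name")
names = [elem.text for elem in siblings]
```

Only the <first_name> that shares a parent with a <middle_initial> is selected.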
Selecting a parent or child element:
//person[profession="physicist"] // find all <person> elements that have a <profession> child element with value "physicist"
//span[text()="My Templates"]/ancestor::a // select the link whose display text is "My Templates", using the ancestor axis
//planet[name="Venus"] // find the <planet> elements that have a <name> child with text equal to "Venus"
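Predicates that test a child element's value also work in Python's xml.etree.ElementTree. The following is a small sketch with an invented sample document; the ancestor:: axis is not part of ElementTree's XPath subset and is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
doc = ET.fromstring(
    "<people>"
    "<person><name>Feynman</name><profession>physicist</profession></person>"
    "<person><name>Turing</name><profession>mathematician</profession></person>"
    "</people>"
)

# Select <person> elements that have a <profession> child whose text
# content is exactly "physicist".
physicists = doc.findall(".//person[profession='physicist']")
names = [person.findtext("name") for person in physicists]
```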
//*[text()="01/10- Test Campaign"]/../..
The above XPath looks for any element whose text is "01/10- Test Campaign" and then traverses two levels up the parent hierarchy to reach the grandparent node.
The .. shortcut is an abbreviation for parent::node(). parent::* is narrower: it matches only the principal node type of the axis (elements), so it does not match the root node.
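Climbing to a grandparent with ../.. can be sketched in Python with xml.etree.ElementTree (Python 3.7 or later for the [.='text'] predicate). Two assumptions are worth flagging: the sample markup is invented, and ElementTree has no text() test, so the sketch matches on the element's full text content with [.='...'] and names the element (span) to avoid also matching its ancestors.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the text sits in a <span> inside a table cell.
doc = ET.fromstring(
    "<table><tr>"
    "<td><span>01/10- Test Campaign</span></td><td>x</td>"
    "</tr></table>"
)

# Find the <span> by its text content, then go two levels up:
# span -> td (parent) -> tr (grandparent).
grandparent = doc.find(".//span[.='01/10- Test Campaign']/../..")
```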
Using the contains() function:
//*[contains(@href,"followMeVCR.php")] // select an element whose href attribute contains followMeVCR.php
//img[contains(@src,"sma/images/products")] // select <img> elements whose src attribute contains sma/images/products
The contains() function takes two string arguments and returns true if the first argument contains the second.
Using and / or on attributes:
//input[@value="out" and @name="inOrOut"] // select an input element based on both the value attribute and the name attribute
//input[@type="hidden" and @name="contactID"]
//input[@type="radio" and @value="team" and @name="fromMainOption"]
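Python's xml.etree.ElementTree does not support the and operator, but for non-positional tests like these, chaining predicates gives the same result: [a and b] selects the same nodes as [a][b]. The sketch below uses that substitution; the form markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical form, loosely modeled on the expressions above.
doc = ET.fromstring(
    "<form>"
    "<input type='radio' name='inOrOut' value='in'/>"
    "<input type='radio' name='inOrOut' value='out'/>"
    "<input type='hidden' name='contactID' value='42'/>"
    "</form>"
)

# Chained predicates stand in for: //input[@value="out" and @name="inOrOut"]
matches = doc.findall(".//input[@value='out'][@name='inOrOut']")

# Stand-in for: //input[@type="hidden" and @name="contactID"]
hidden = doc.find(".//input[@type='hidden'][@name='contactID']")
```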
//@id/.. // identifies all elements in the document that have id attributes
count(//person) // count <person> elements
Locate table cell with XPath
Given a table with columns headerA and headerB, where column headerA contains cellA (which has a unique value) and column headerB contains cellB in the same row, we can locate cellB because the horizontal distance between cellA and cellB is equal to the horizontal distance between headerA and headerB.
To calculate the distance between headerA and headerB, we count how far each of them is from the first (oldest) sibling and subtract:
count(//span[contains(text(),"#headerB#")]/ancestor::td/preceding-sibling::td) - count(//span[contains(text(),"#headerA#")]/ancestor::td/preceding-sibling::td)
Assuming that we can find cellA using:
//td[contains(text(),"#email#")]
Then we can find cellB using:
//td[contains(text(),"#email#")]/following-sibling::td[count(//span[contains(text(),"#headerB#")]/ancestor::td/preceding-sibling::td) - count(//span[contains(text(),"#headerA#")]/ancestor::td/preceding-sibling::td)]
In the above example, we assume that the element for the table cell is td. In other cases it can be a span or div, so every td has to be replaced with span or div. The #email#, #headerA#, and #headerB# tokens are placeholders and have to be replaced with appropriate values.
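The same column-offset idea can be sketched in plain Python on top of xml.etree.ElementTree, which cannot evaluate count() or the sibling axes itself. Everything here is an assumption made for illustration: the table markup, the helper names column_index and find_cell, and the choice to treat the first row as the header row.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: header cells wrap their text in <span>.
table = ET.fromstring(
    "<table>"
    "<tr><td><span>Name</span></td><td><span>Email</span></td>"
    "<td><span>Phone</span></td></tr>"
    "<tr><td>Ann</td><td>ann@example.com</td><td>555-0100</td></tr>"
    "<tr><td>Bob</td><td>bob@example.com</td><td>555-0199</td></tr>"
    "</table>"
)

def column_index(header_text):
    """Count the <td> siblings that precede the matching header cell."""
    header_row = table.find("tr")  # assumes the first row holds the headers
    for index, cell in enumerate(header_row.findall("td")):
        if header_text in "".join(cell.itertext()):
            return index
    raise ValueError("header not found: " + header_text)

def find_cell(cell_a_text, header_a, header_b):
    """Locate cellB from cellA using the distance between the two headers."""
    offset = column_index(header_b) - column_index(header_a)
    for row in table.findall("tr"):
        cells = row.findall("td")
        for i, cell in enumerate(cells):
            if cell_a_text in "".join(cell.itertext()):
                target = i + offset
                if 0 <= target < len(cells):
                    return cells[target]
    return None

phone = find_cell("bob@example.com", "Email", "Phone").text
```

Unlike the following-sibling form, this sketch also works when headerB is to the left of headerA, because the offset may be negative.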
The comment(), text(), and processing-instruction() Location Steps
Since comments and text nodes don't have names, the comment() and text() functions match any comment or text node that's an immediate child of the context node. Each comment is a separate comment node. Each text node contains the maximum possible contiguous run of text not interrupted by a tag. Entity references and CDATA sections are resolved into text and markup and do not interrupt text nodes.
With no arguments, the processing-instruction() function selects all the context node's processing instruction children. If it has an argument, it selects only the processing instruction children with the specified target. For example, XPath expression processing-instruction('xml-stylesheet') selects all processing instruction children of the context node whose target is xml-stylesheet.
Wildcards allow you to match different element and node types at the same time. There are three wildcards: *, node(), and @*.
The asterisk * matches any element node, regardless of type. The * does not match attribute, text, comment, or processing instruction nodes. You can put a namespace prefix in front of the asterisk. In this case, only elements in that namespace are matched. For example, svg:* matches all elements whose namespace URI is the one the svg prefix is mapped to. As usual, the URI, not the prefix, matters. The prefix may differ in the stylesheet and the source document, as long as the namespace URI is the same.
The node() wildcard matches all nodes: element, text, attribute, processing instruction, namespace, and comment.
The @* wildcard matches all attribute nodes. As with elements, you can attach a namespace prefix to the wildcard to match only attributes in a specific namespace. For instance, @xlink:* matches all XLink attributes, provided that the prefix xlink is mapped to the http://www.w3.org/1999/xlink namespace. Again, the URI, not the prefix, matters.
Multiple Matches with |
You may want to match more than one type of element or attribute, but not all types. You can combine individual location steps with a vertical bar | to indicate that you want to match any of the named elements. For instance: object|img|embed.
*|@* matches elements and attributes, but does not match text, comment, or processing instruction nodes
Compound Location Paths
The XPath expressions you've seen so far (element names, @ plus an attribute name, /, comment(), text(), node(), and processing-instruction()) are all single location steps. You can combine these location steps with the forward slash to move down the hierarchy from the matched node to other nodes. You can also use a period to refer to the current node, a double period to refer to the parent node, and a double forward slash to refer to descendants of the context node.
A double forward slash // selects from all descendants of the context node as well as the context node itself. At the beginning of an XPath expression, it selects from all descendants of the root node. For example, the XPath expression //name selects all name elements in the document.
XPath supports a full complement of relational operators, including =, <, >, >=, <=, and !=. Note that if < or <= is used inside an XML document, you still must escape the less-than sign as &lt;.
XPath also provides boolean 'and' and 'or' operators to combine expressions logically. For example, the XPath expression //person[@born<=1920 and @born>=1910] selects all person elements with born attribute values between 1910 and 1920, inclusive.
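Python's xml.etree.ElementTree does not evaluate relational operators inside predicates, so the sketch below applies the same range test on the Python side after selecting the elements. It is an approximation of the XPath expression rather than an evaluation of it; the sample data is invented, and elements without a born attribute are skipped, which mirrors the XPath comparison being false for them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data for illustration.
doc = ET.fromstring(
    "<people>"
    "<person born='1905'/>"
    "<person born='1912'/>"
    "<person born='1918'/>"
    "<person born='1921'/>"
    "<person/>"
    "</people>"
)

# Python-side stand-in for: //person[@born<=1920 and @born>=1910]
selected = [
    person
    for person in doc.iter("person")
    if person.get("born") is not None
    and 1910 <= float(person.get("born")) <= 1920
]
years = [person.get("born") for person in selected]
```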
General XPath Expressions
XPath expressions can also return numbers, booleans ( true(), false(), not() ), and strings. XPath provides the five basic arithmetic operators: +, -, *, div, and mod.
Every XPath function returns one of these four types: boolean, number, node set, or string. There are no void functions in XPath. XPath is not as strongly typed as Java or C. You can often use any of these types as a function argument, regardless of which type the function expects, and the processor will convert it as best it can. For example, if a boolean is passed where a string is expected, the processor will substitute one of the two strings true and false for the boolean. The one exception is functions that expect to receive node sets as arguments. XPath cannot convert strings, booleans, or numbers to node sets.
An XPath processor can, however, convert a node set to its string value (the text content of the first node in the set, in document order).
The position() function returns the current node's position in the context node list as a number.
The last() function returns the number of nodes in the context node set, which is the same as the position of the last node in the set.
The count() function is similar to last(), except that it returns the number of nodes in its node set argument rather than in the context node list.
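Positional selection can be tried in Python with xml.etree.ElementTree, which supports numeric positions, last(), and last()-N after an element name. It has no count() function, so the length of the result list is used as a stand-in. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with three <person> children.
doc = ET.fromstring(
    "<people>"
    "<person>A</person><person>B</person><person>C</person>"
    "</people>"
)

first = doc.find("person[1]").text              # positions start at 1
last = doc.find("person[last()]").text          # position of the last node
second_last = doc.find("person[last()-1]").text
total = len(doc.findall("person"))              # stand-in for count(person)
```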
The id() function takes a string containing one or more IDs separated by whitespace as an argument and returns a node set containing all nodes in the document that have those IDs. These are nodes with attributes declared to have type ID in the DTD, not necessarily nodes with attributes named ID or id.
XPath includes functions for basic string operations, such as finding a string's length or changing letters from uppercase to lowercase. It does not have the full power of the string libraries in Python or Perl. For example, there's no regular expression support.
The concat(s1, s2, s3, …) function call takes two or more strings as arguments and concatenates them.
The string() function converts any type of argument to a string in a reasonable fashion.
The starts-with(str1,str2) function call returns true if str1 starts with str2.
The substring-before() function takes two string arguments and returns the substring of the first argument string that precedes the second argument's initial appearance. If the second string doesn't appear in the first string, then substring-before() returns the empty string. For example, substring-before('MM/DD/YYYY','/') is 'MM'.
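The substring-after() function is similar to substring-before(), but returns the part of the first argument that follows the second argument's initial appearance. This is equivalent to post-match in Perl.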
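To make the semantics concrete, here is a Python analogue of the two functions. It is not an XPath evaluation: the helper names substring_before and substring_after are made up, and the empty-separator handling follows the XPath 1.0 rules as I understand them.

```python
def substring_before(text, separator):
    """Mimic XPath substring-before(): part of text before the first match."""
    if separator == "":
        return ""  # nothing precedes an empty match at the start
    head, found, _tail = text.partition(separator)
    return head if found else ""

def substring_after(text, separator):
    """Mimic XPath substring-after(): part of text after the first match."""
    if separator == "":
        return text  # an empty separator matches at the very start
    _head, found, tail = text.partition(separator)
    return tail if found else ""

month = substring_before("MM/DD/YYYY", "/")
rest = substring_after("MM/DD/YYYY", "/")
```

Both helpers return the empty string when the separator does not occur in the text.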
The substring() function takes three arguments: the string from which the substring is copied, the position in the string from which to start extracting (the first character is at position 1), and the number of characters to copy. The third argument may be omitted, in which case the substring runs to the end of the string.
The string-length() function returns a number giving the length of the string value of its argument, or of the context node if no argument is given. For example: string-length(//name[position()=1]).
The normalize-space() function strips leading and trailing whitespace and collapses each run of whitespace into a single space. For example: normalize-space(string(//name[position()=1])).
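The effect of normalize-space() can be imitated in Python as follows. This is an analogue written for illustration, not an XPath call; the helper name normalize_space is made up, and whitespace is taken to mean the XML whitespace characters (space, tab, carriage return, line feed).

```python
import re

def normalize_space(text):
    """Collapse runs of XML whitespace to one space and trim both ends."""
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)
    return collapsed.strip(" ")

cleaned = normalize_space("  John \n   Smith\t")
```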
The number() function can take any type as an argument and convert it to a number. If the argument is omitted, it converts the context node. Booleans are converted to 1 if true, 0 if false. Strings are converted in a plausible fashion. Node sets are converted to number by first converting them to their string values and then converting them to numbers. If the object you convert can't be reasonably interpreted as a single number, then NaN is returned.
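The round(), floor(), and ceiling() functions each take a single number as an argument: round() returns the nearest integer, floor() returns the largest integer not greater than the argument, and ceiling() returns the smallest integer not less than it.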
The sum() function takes a node set as an argument. It converts each node in the set to its string value, then converts each of those strings to a number. Finally, it adds the numbers and returns the result.
Different types of nodes:
There are seven kinds of node: root node, element nodes, text nodes, attribute nodes, comment nodes, processing instruction nodes, namespace nodes.
/ The root node for XPath
/html The <html> root element
/html/body/p Select all <p> elements that are direct children of <body>