XPath Intro


XPath is a non-XML language used to identify particular parts of an XML or XHTML document, such as elements, attributes, and text.

XPath indicates nodes by position, relative position, type, content, and several other criteria. XSLT uses XPath expressions to match and select particular elements in the input document for copying into the output document or for further processing. XPointer uses XPath expressions to identify the particular point in, or part of, an XML document that an XLink links to.
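To make selection by position, relative position, and content concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. ElementTree's `find`/`findall` implement only a limited subset of XPath 1.0, so treat this as an illustration of the idea rather than a full XPath engine; the `book`/`chapter`/`appendix` document is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with several sibling elements.
BOOK = (
    "<book>"
    "<chapter id='intro'><title>Introduction</title></chapter>"
    "<chapter id='setup'><title>Setup</title></chapter>"
    "<chapter id='usage'><title>Usage</title></chapter>"
    "<appendix id='faq'><title>FAQ</title></appendix>"
    "</book>"
)

root = ET.fromstring(BOOK)

# By position: the second <chapter> child (XPath positions start at 1).
second = root.find("chapter[2]")

# By relative position: the last <chapter> child.
last = root.find("chapter[last()]")

# By attribute value: any child element whose id attribute is 'faq'.
faq = root.find("*[@id='faq']")

# By content: the <chapter> whose <title> child has the text 'Setup'.
setup = root.find("chapter[title='Setup']")

print(second.get("id"), last.get("id"), faq.tag, setup.get("id"))
```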

XPath expressions can also represent numbers, strings, or booleans, so XSLT stylesheets can carry out simple arithmetic for numbering and cross-referencing figures, tables, and equations. String manipulation in XPath lets XSLT perform tasks such as making the title of a chapter uppercase in a headline but mixed case in a reference in the body text.

An XML document is a tree made up of nodes. Some nodes contain other nodes, and one root node ultimately contains all of them. XPath is a language for picking nodes and sets of nodes out of this tree. From the perspective of XPath, there are seven kinds of nodes: the root node, element nodes, text nodes, attribute nodes, comment nodes, processing instruction nodes, and namespace nodes.
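The tree of typed nodes can be explored with Python's standard-library DOM, `xml.dom.minidom`. The DOM is a different data model from XPath's, but most DOM node types line up with XPath's node kinds, so this sketch tallies the nodes of a small document by kind. The `XPATH_KIND` mapping and `count_kinds` helper are hypothetical names for illustration; the DOM has no namespace nodes, so that seventh kind is not represented.

```python
from collections import Counter
from xml.dom import Node, minidom

# Approximate mapping from DOM node types to XPath node kinds.
# The DOM has no namespace nodes, so that kind is not represented here.
XPATH_KIND = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}


def count_kinds(node, counts=None):
    """Walk the tree below node and tally each node by its XPath kind."""
    if counts is None:
        counts = Counter()
    counts[XPATH_KIND.get(node.nodeType, "other")] += 1
    if node.nodeType == Node.ELEMENT_NODE:
        # In the DOM, attributes hang off the element rather than being children.
        counts["attribute"] += node.attributes.length
    for child in node.childNodes:
        count_kinds(child, counts)
    return counts


doc = minidom.parseString('<doc id="d1"><?page-break?><!--note--><p>Hello</p></doc>')
print(dict(count_kinds(doc)))
```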

Constructs not included in this list are CDATA sections, entity references, and document type declarations. XPath operates on an XML document after these items have been merged into the document. For instance, XPath cannot identify the first CDATA section in a document or tell whether a particular attribute value was included directly in the source element start tag or merely defaulted from the declaration of the attribute in the DTD.

The XPath data model has several non-obvious features. First, the tree's root node is not the same as its root element. The root node contains the entire document, including the root element and any comments and processing instructions that occur before the root element's start tag or after its end tag.
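The distinction between the root node and the root element shows up directly in Python's `xml.dom.minidom`, where the `Document` object plays a role similar to XPath's root node and `documentElement` is the root element. This is a sketch using the DOM as a stand-in for the XPath data model; the `book` document and `style.css` stylesheet reference are hypothetical.

```python
from xml.dom import Node, minidom

SOURCE = (
    '<?xml version="1.0"?>'
    "<!--before the root element-->"
    '<?xml-stylesheet href="style.css" type="text/css"?>'
    "<book><title>XPath</title></book>"
    "<!--after the root element-->"
)

doc = minidom.parseString(SOURCE)   # Document node: analogous to the root node
root_element = doc.documentElement  # the root element, <book>

# The Document's children include the comments and the processing instruction
# outside <book>; the XML declaration itself does not appear as a node.
kinds = [child.nodeType for child in doc.childNodes]
print(kinds)
print(root_element.tagName, root_element.parentNode is doc)
```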

The XPath data model does not include everything in the document. In particular, the XML declaration and the DTD are not addressable via XPath. However, if the DTD provides default values for any attributes, XPath recognizes those attributes. Similarly, any references to parsed entities are resolved. Entity references, character references, and CDATA sections are not individually identifiable, though any data they contain is addressable. For example, XSLT cannot make all the text in CDATA sections bold, because XPath doesn't know which text is and isn't part of a CDATA section.
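The same merging happens in ordinary parsers, which is why the boundaries are gone by the time an XPath-style query runs. The sketch below uses Python's `xml.etree.ElementTree`; it is not an XPath engine, but it shows parser output of the kind XPath's data model assumes: a CDATA section, an entity reference, and a character reference all collapse into one plain string.

```python
import xml.etree.ElementTree as ET

# One paragraph whose text mixes plain text, a CDATA section,
# a predefined entity reference, and a character reference.
SOURCE = "<p>Tag: <![CDATA[<b>]]>, ampersand: &amp;, letter: &#65;</p>"

p = ET.fromstring(SOURCE)

# After parsing, the pieces form a single run of character data; nothing
# records where the CDATA section or the references began and ended.
print(repr(p.text))

# Re-serializing escapes the characters instead of restoring the CDATA section.
print(ET.tostring(p, encoding="unicode"))
```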

Finally, xmlns attributes are reported as namespace nodes. They are not considered attribute nodes, though a parser that is not namespace-aware will see them as such. Furthermore, these nodes are attached to every element for which that declaration has scope, not just to the single element where the namespace is declared.
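The "attached to every element in scope" rule can be approximated by tracking declarations while parsing. This sketch uses `ElementTree.iterparse` with its `start-ns` events; the helper name `in_scope_namespaces` is hypothetical, and it ignores edge cases such as undeclaring a default namespace with `xmlns=""`. It also shows that a namespace-aware parser does not keep `xmlns` as an ordinary attribute.

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def in_scope_namespaces(xml_text):
    """Return (tag, {prefix: uri}) for every element, in document order.

    Approximates XPath namespace nodes: each element gets an entry for every
    declaration in scope, not only the element carrying the xmlns attribute.
    """
    scopes = [{"xml": XML_NS}]  # the xml prefix is always in scope
    pending = {}  # declarations reported just before the next start tag
    result = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item  # prefix is "" for a default namespace
            pending[prefix] = uri
        elif event == "start":
            scope = {**scopes[-1], **pending}  # inherit, then add new declarations
            pending = {}
            scopes.append(scope)
            result.append((item.tag, scope))
        else:  # "end": leave this element's scope
            scopes.pop()
    return result


DOC = (
    '<book xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>XPath</dc:title><chapter/></book>"
)
for tag, namespaces in in_scope_namespaces(DOC):
    print(tag, sorted(namespaces))
```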

Root Location Path:

XPath syntax was deliberately chosen to resemble the path syntax used in the Unix shell. Just as / is the root of a Unix filesystem, / is the root node of an XML document.

The forward slash / is an absolute location path because, no matter what the context node is (that is, no matter where in the input document the processor is when the expression is evaluated), it always means the same thing: the root node of the document. It is relative to the document being processed, but not to anything within that document.
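One way to see why / does not depend on the context node is to emulate it: from any node, follow parent links until there are none left. This sketch uses Python's `xml.dom.minidom`, where the `Document` object stands in for XPath's root node; the `absolute_root` helper is a hypothetical name, not part of any XPath API.

```python
from xml.dom import minidom


def absolute_root(context_node):
    """Sketch of what the location path "/" selects: climb parent links from
    any context node until reaching the node that has no parent."""
    node = context_node
    while node.parentNode is not None:
        node = node.parentNode
    return node


doc = minidom.parseString("<book><chapter><para>Text</para></chapter></book>")

# Whichever element we start from, the climb ends at the same Document node.
for element in doc.getElementsByTagName("*"):
    print(element.tagName, absolute_root(element) is doc)
```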

Child Element Location Steps:

The second simplest location path is a single element name. This selects all child elements of the context node with the specified name. Exactly which elements they are depends on what the context node is, so this is a relative location path.
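The dependence on the context node is easy to demonstrate with Python's `xml.etree.ElementTree`, where calling `findall` with a bare element name selects matching child elements of the element it is called on, which plays the role of the context node. The `library`/`book`/`chapter` document is a hypothetical example.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring(
    "<library>"
    "<book><title>A</title><chapter><title>A.1</title></chapter></book>"
    "<book><title>B</title></book>"
    "</library>"
)

first_book = library.find("book")
chapter = first_book.find("chapter")

# The same relative path, "title", evaluated from three different context nodes.
print([t.text for t in first_book.findall("title")])
print([t.text for t in chapter.findall("title")])
print([t.text for t in library.findall("title")])  # <library> has no <title> child
```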

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License