XPath Syntax and examples
  • 24 May 2021

There are several ways to write an XPath expression that captures information from an HTML element. Adopting different strategies, depending on the structure of the web page, can help you capture the data you want more reliably.

XPath is a query language made up of location steps that help you and your agent find the data you need. Each location step, such as child::div[@class="price"], has up to three components:

  1. Axis
  2. Node
  3. Predicates or filters

Common axis names

The axis represents a relationship to the current node and is used to locate nodes relative to that node on the tree.

| Axis name | Abbreviated syntax | Description |
|---|---|---|
| ancestor:: | (none) | Selects all ancestors (parent, grandparent, etc.) of the current node. |
| ancestor-or-self:: | (none) | Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself. |
| attribute:: | @ | Selects all attributes of the current node. |
| child:: | / (child:: is the default axis and may be omitted) | Selects all children of the current node. |
| descendant:: | (none) | Selects all descendants (children, grandchildren, etc.) of the current node. |
| descendant-or-self:: | // | Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself. |
| following-sibling:: | (none) | Selects all siblings that appear after the current node. |
| parent:: | .. | Selects the parent of the current node. Use .. as shorthand for parent::node(). |
| preceding-sibling:: | (none) | Selects all siblings that appear before the current node. |
| self:: | . | Selects the current node. Use . as shorthand for self::node(). |

The normalize-space( [string] ) function is not an axis, but is listed here for reference: it removes leading and trailing white-space from a string and replaces each internal run of white-space with a single space.
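
The abbreviated axes can be tried out with a short script. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath (the abbreviated forms, not the full axis:: names). The product markup is hypothetical and kept well-formed so that an XML parser accepts it.

```python
import xml.etree.ElementTree as ET

# Hypothetical product listing used only for illustration.
DOC = """
<html>
  <body>
    <div id="tv" class="product">
      <span class="name">Television</span>
      <span class="price">499</span>
    </div>
    <div id="radio" class="product">
      <span class="name">Radio</span>
      <span class="price">25</span>
    </div>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# child axis: "body/div" is short for "child::body/child::div".
children = root.findall("body/div")

# descendant-or-self axis: ".//span" finds span elements at any depth.
descendants = root.findall(".//span")

# parent axis: ".." steps from each matched span up to its parent div.
parents = root.findall(".//span[@class='price']/..")

# self axis: "." is the context node itself.
self_node = root.find(".")

# attribute axis: ElementTree exposes attributes through .get() instead of
# returning attribute nodes, so the value of @id is read like this.
ids = [div.get("id") for div in children]

print(len(children), len(descendants), [p.get("id") for p in parents], ids)
```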

Common filters/predicates:

To apply a filter to your XPath expression, add it in square brackets immediately after the name of the HTML element, as in element[filter]. Multiple filters can be applied to one element.

Attribute selector: Use an element's attributes to identify it. For example, //div[@class="price"] selects every div whose class attribute is exactly price.
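
An attribute selector can be sketched in Python as follows, assuming the standard-library xml.etree.ElementTree and a hypothetical, well-formed snippet of markup. The element names and attribute values are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, kept well-formed so the XML parser accepts it.
DOC = """
<div>
  <span class="name">Television</span>
  <span class="price">499</span>
  <span class="price" id="sale">25</span>
</div>
"""
root = ET.fromstring(DOC)

# [@class='price'] keeps spans whose class attribute equals "price" exactly.
prices = root.findall(".//span[@class='price']")
print([p.text for p in prices])

# [@id] keeps spans that have an id attribute at all, whatever its value.
with_id = root.findall(".//span[@id]")
print([p.text for p in with_id])
```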

Attribute contains function: Match on part of an attribute's value. This is helpful when the value is long or when you only need a specific substring. For example, //div[contains(@class, "price")] selects every div whose class attribute contains the text price.

Text contains function: Instead of using an attribute, use the text on the website to identify the element you want. For example, to select sponsored items, match the elements whose text contains the word Sponsored, as in //li[contains(text(), "Sponsored")]. The match is case-sensitive.

Not contains: To exclude specific elements from your capture, wrap the contains function in not(). For example, //li[not(contains(text(), "Sponsored"))] selects the list items whose text does not contain Sponsored.
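
The three contains variants can be compared side by side. Python's standard-library xml.etree.ElementTree does not implement contains() or not(), so the sketch below shows each XPath expression in a comment and emulates its matching rule with an ordinary Python filter. A full XPath 1.0 engine would evaluate the expressions directly. The list markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical search results; one item is marked by class, two by text.
DOC = """
<ul>
  <li class="result-item sponsored-listing">Sponsored: Deluxe TV</li>
  <li class="result-item">Budget TV</li>
  <li class="result-item">Sponsored: Soundbar</li>
</ul>
"""
root = ET.fromstring(DOC)
items = root.findall(".//li")

# XPath: //li[contains(@class, 'sponsored')]
# Emulated: substring test on the class attribute (case-sensitive).
attr_contains = [li for li in items if "sponsored" in li.get("class", "")]

# XPath: //li[contains(text(), 'Sponsored')]
# Emulated: substring test on the element's leading text.
text_contains = [li for li in items if "Sponsored" in (li.text or "")]

# XPath: //li[not(contains(text(), 'Sponsored'))]
# Emulated: the same test, negated.
text_not_contains = [li for li in items if "Sponsored" not in (li.text or "")]

print([li.text for li in attr_contains])
print([li.text for li in text_contains])
print([li.text for li in text_not_contains])
```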

And - or: Combine multiple conditions to identify your HTML element. For example, if the prices you want carry both id="tv" and class="price", requiring both with and, as in //span[@id="tv" and @class="price"], narrows the results to only the elements that match both tv and price. Using or instead matches elements that satisfy either condition.
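
Combining conditions can be sketched as follows, assuming Python's standard-library xml.etree.ElementTree. That module does not accept the and / or keywords, so the sketch uses two chained predicates, which select the same elements as and when neither predicate is positional, and emulates or with a plain Python filter. The markup reuses the id="tv" and class="price" values from the text and is otherwise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; real pages normally keep id values unique.
DOC = """
<div>
  <span id="tv" class="price">499</span>
  <span id="tv" class="name">Television</span>
  <span id="radio" class="price">25</span>
</div>
"""
root = ET.fromstring(DOC)

# XPath: //span[@id='tv' and @class='price']
# Chained predicates keep only spans that satisfy both conditions.
both = root.findall(".//span[@id='tv'][@class='price']")

# XPath: //span[@id='tv' or @class='price']
# Emulated: keep spans that satisfy at least one condition.
either = [
    span for span in root.iter("span")
    if span.get("id") == "tv" or span.get("class") == "price"
]

print([s.text for s in both])
print([s.text for s in either])
```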

Number operator: Adding a number inside brackets selects an HTML element by its position among matching siblings, counting from 1. For example, li[1] selects the first li under its parent. Selecting by position is less reliable than specifying an element by an attribute such as class, because the order of HTML elements can change.
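
The fragility of positional selection can be seen in a short sketch, assuming Python's standard-library xml.etree.ElementTree and a hypothetical list. After a new item is inserted at the top, the same expression returns a different element.

```python
import xml.etree.ElementTree as ET

# Hypothetical list used only for illustration.
DOC = """
<ul>
  <li>First</li>
  <li>Second</li>
  <li>Third</li>
</ul>
"""
root = ET.fromstring(DOC)

# Positions are 1-based: li[1] is the first li child, li[last()] the final one.
first_before = root.find("li[1]").text
last_before = root.find("li[last()]").text

# Simulate a page change: a new item appears at the top of the list.
ad = ET.Element("li")
ad.text = "Ad"
root.insert(0, ad)

# The same expression now points at the new item.
first_after = root.find("li[1]").text

print(first_before, last_before, first_after)
```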

