XPath expression best practices
  • 25 May 2021

Every time you select an element in the Agent Builder, Mozenda generates an XPath expression to guide the agent to the proper location. These auto-generated expressions are generally reliable, but learning to write your own XPath expressions can increase the speed, reliability, and flexibility of your agents.

More than anything else in your Mozenda toolbox, XPath skills can take your agent building to the next level.

Best practices:

  • Choose a reliable filter (predicate) that is specific, easy to read, and based on a clear label for the action. This removes ambiguity and makes the agent more reliable at gathering the data you need.
  • Use the contains function, the normalize-space function, or a nested XPath expression for more reliable and dynamic agents.
  • Begin item lists require a different form of XPath expression, called a relative XPath. A relative XPath anchors your other actions to the list item, so the data points for each item are located relative to that item.
  • If your ultimate goal is to scrape a complex site, start with a simple site to build confidence in your skills and a foundation in writing expressions and solving problems.
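
To illustrate the relative XPath idea from the list above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which supports only a limited XPath subset. It does not use Mozenda's engine, and the sample markup, class names, and the scrape_items helper are hypothetical. Each list item is located once, and the name and price are then read with paths relative to that item.

```python
import xml.etree.ElementTree as ET

# Hypothetical product list; the markup and class names are made up for illustration.
PAGE = """
<ul>
  <li class="item"><span class="name">Widget</span><span class="price">9.99</span></li>
  <li class="item"><span class="name">Gadget</span><span class="price">19.50</span></li>
</ul>
"""

def scrape_items(markup):
    root = ET.fromstring(markup.strip())
    rows = []
    # The item list path anchors every capture made inside the loop.
    for item in root.findall(".//li[@class='item']"):
        # Paths starting with "./" are evaluated from the current item,
        # so each name stays paired with the price from the same item.
        name = item.findtext("./span[@class='name']")
        price = item.findtext("./span[@class='price']")
        rows.append((name, price))
    return rows

print(scrape_items(PAGE))  # [('Widget', '9.99'), ('Gadget', '19.50')]
```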

Helpful hints

  • Create an outline of the steps, actions and data your agent will collect. Refer to this as you build your agent.
  • Test your XPath expression in your browser's DevTools.
  • Open Inspect and press Ctrl+F to write and test your XPath expression.
  • Hover over the elements in DevTools HTML to highlight and confirm the item you want to select on the website.
  • Build using two monitors or windows: one with the Agent Builder open and the other with your target URL open in a browser.

Common XPath expressions

The list below shows recommended functions to use in the Agent Builder for dynamic and reliable XPath expressions.

Contains function

Use the contains function to find a partial match rather than an exact match. It is well suited to dynamic websites where part of an attribute value or text changes.

//elementname[contains(@attribute,"value")]
//elementname[contains(.,"text")]
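
The difference between an exact match and a partial match can be sketched in Python. The standard library's ElementTree does not evaluate XPath functions such as contains, so the partial match is written here as a plain Python filter that mirrors //span[contains(@class,"price")]. The markup, the changing class suffixes, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the class value carries a suffix that changes between page loads.
PAGE = """
<div>
  <span class="price-8f3a">9.99</span>
  <span class="price-c21d">19.50</span>
  <span class="label">Sale</span>
</div>
"""

def exact_match(root, value):
    # Mirrors //span[@class="value"]: the whole attribute must be equal.
    return [el.text for el in root.iter("span") if el.get("class") == value]

def partial_match(root, value):
    # Mirrors //span[contains(@class,"value")]: a substring is enough.
    return [el.text for el in root.iter("span") if value in el.get("class", "")]

root = ET.fromstring(PAGE.strip())
print(exact_match(root, "price"))    # []
print(partial_match(root, "price"))  # ['9.99', '19.50']
```

The exact match finds nothing because no class is literally "price", while the partial match survives the changing suffix.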

Normalize-space function

The normalize-space function removes leading and trailing whitespace from a string, replaces each sequence of whitespace characters with a single space, and returns the resulting string. This keeps matching consistent even when the whitespace on the webpage is not.

//div[normalize-space(@class)='foobar']
//div[contains(normalize-space(@class),'foobar')]
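
To see what normalize-space does to a value before the comparison, here is a small Python imitation of its behavior. It is a sketch of the semantics described above, not an XPath engine; the normalize_space helper and the sample strings are hypothetical.

```python
import re

def normalize_space(value):
    # XPath 1.0 treats space, tab, carriage return, and line feed as whitespace.
    collapsed = re.sub(r"[ \t\r\n]+", " ", value)
    # After collapsing, at most one space remains at each end.
    return collapsed.strip(" ")

raw_class = "  foobar \n"
print(raw_class == "foobar")                    # False
print(normalize_space(raw_class) == "foobar")   # True
print(normalize_space("product   name\t one"))  # product name one
```

The raw attribute value fails an exact comparison because of the stray whitespace, and the normalized value passes.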

Nested XPath expressions

Use nested XPath expressions to select your target element. The nested expression acts as a condition, like an if-then statement: the outer element is selected only if the inner expression matches.

For example, suppose you want to select the <h4> element that has a child <p> element containing the text "productname". If other <h4> elements are found, the nested expression ensures that an <h4> element is selected only if it contains a <p> element with the text "productname".

//h4[./child::p[contains(.,"productname")]]

Shorthand:

//h4[./p[contains(.,"productname")]]
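
The condition expressed by the nested expression can be spelled out in Python. ElementTree in the standard library cannot evaluate contains, so this sketch walks the <h4> elements and applies the inner test by hand, mirroring //h4[./p[contains(.,"productname")]]. The markup and the headings_with_product helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with three <h4> elements; only the first satisfies the condition.
PAGE = """
<div>
  <h4><p>productname: Widget</p></h4>
  <h4><p>description</p></h4>
  <h4>No paragraph here</h4>
</div>
"""

def headings_with_product(root, needle):
    matches = []
    for h4 in root.iter("h4"):
        # Inner condition: at least one direct child <p> whose text contains the needle.
        if any(needle in "".join(p.itertext()) for p in h4.findall("./p")):
            matches.append(h4)
    return matches

root = ET.fromstring(PAGE.strip())
found = headings_with_product(root, "productname")
print(len(found))               # 1
print(found[0].find("p").text)  # productname: Widget
```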

Adding an alternate location

An alternate location gives the agent an additional place to find the needed data. The location paths are evaluated sequentially. Use alternate locations to build dynamic captures and avoid optional settings.

In the Agent Builder:

  1. Select an action.
  2. Select the XPaths window.
  3. Place the cursor at the end of the XPath string.
  4. Press Enter to create a new line.
  5. Write the alternate XPath location.

For example:

//elementname[contains(@attribute,"value")]
//elementname[contains(.,"text")]

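The sequential lookup can be pictured with a short Python sketch. It assumes that the first location that finds an element is the one used, which this article does not state explicitly, and it relies on the standard library's ElementTree rather than Mozenda's engine. The first_match helper, the id values, and the sample pages are hypothetical.

```python
import xml.etree.ElementTree as ET

# One path per line, in priority order, like the lines in the XPaths window.
PRICE_PATHS = [
    ".//span[@id='sale-price']",  # preferred location
    ".//span[@id='price']",       # alternate location
]

def first_match(root, paths):
    # Assumption: try each location in order and stop at the first hit.
    for path in paths:
        element = root.find(path)
        if element is not None:
            return element.text
    return None

page_a = ET.fromstring(
    "<div><span id='sale-price'>7.99</span><span id='price'>9.99</span></div>"
)
page_b = ET.fromstring("<div><span id='price'>9.99</span></div>")

print(first_match(page_a, PRICE_PATHS))  # 7.99
print(first_match(page_b, PRICE_PATHS))  # 9.99
```

The second page has no sale price, so the lookup falls through to the alternate location instead of returning nothing.
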
For more information

Refer to the XPath 1.0 specification.

