- 25 May 2021
- 2 Minutes to read
XPath expression best practices
- Updated on 25 May 2021
Every time you select an element in the Agent Builder, Mozenda generates an XPath expression to guide the agent to the proper location. These auto-generated XPath expressions are generally reliable, but learning to write your own XPath expressions can increase the speed, reliability, and flexibility of your agents.
More than anything else in your Mozenda toolbox, XPath skills can take your agent building to the next level.
- Choose a reliable filter/predicate that is specific, easy to read, and contains clear labels for the action. This removes ambiguity and creates an agent that is more reliable at gathering the data you need.
- Use a contains function, a normalize-space function, or a nested XPath expression for more reliable and dynamic agents.
- Begin item lists require a different format of XPath expression, called a relative XPath. A relative XPath makes the list item the central location for your other actions, anchoring the data points for each item.
- If your ultimate goal is scraping a complex site, start with a simple site to build confidence in your skills and a foundation in writing expressions and solving problems.
- Create an outline of the steps, actions and data your agent will collect. Refer to this as you build your agent.
- Test your XPath expression in the DevTools.
- Open Inspect and press CTRL+F to write and test your XPath expression.
- Hover over the elements in DevTools HTML to highlight and confirm the item you want to select on the website.
- Build using two monitors or windows: one with the Agent Builder open and the other with your target URL open in a browser.
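To see why a relative XPath matters for item lists, it can help to try the idea outside the Agent Builder. The sketch below is a minimal illustration in Python, assuming the third-party lxml library is installed; the sample markup, class names, and variable names are hypothetical, and the loop only imitates the general idea of an item list rather than how Mozenda works internally.

```python
from lxml import etree  # third-party library (assumption: installed via pip)

# Hypothetical product listing; the markup and class names are made up.
PAGE = """<div id="results">
  <div class="product"><h4>Blue Kettle</h4><span class="price">$24.00</span></div>
  <div class="product"><h4>Red Toaster</h4><span class="price">$31.50</span></div>
</div>"""

root = etree.fromstring(PAGE)

# The item list XPath selects one node per item.
items = root.xpath('//div[@class="product"]')

rows = []
for item in items:
    # A leading "./" makes the path relative to the current item,
    # so each lookup stays inside that item's own markup.
    name = item.xpath('./h4/text()')[0]
    price = item.xpath('./span[@class="price"]/text()')[0]
    rows.append((name, price))

# For contrast: a path starting with "//" ignores the current item
# and searches the whole document again.
all_names = items[0].xpath('//h4/text()')

print(rows)       # [('Blue Kettle', '$24.00'), ('Red Toaster', '$31.50')]
print(all_names)  # ['Blue Kettle', 'Red Toaster']
```

The relative paths return one name and one price per item, while the path starting with `//` returns every name on the page no matter which item it is run from.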
Common XPath expressions
The list below shows recommended functions to use in the Agent Builder for dynamic and reliable XPath expressions.
Use a contains function to find a partial match rather than an exact match. The contains function works well on dynamic websites where a few characters of an attribute or value change from page to page.
The normalize-space function removes leading and trailing whitespace from a string, replaces sequences of whitespace characters with a single space, and returns the resulting string. This builds consistency into how text from the webpage is matched.
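As a rough illustration of partial versus exact matching, the Python sketch below evaluates a few XPath expressions with the third-party lxml library (assumed to be installed). The sample markup and class values are hypothetical.

```python
from lxml import etree  # third-party library (assumption: installed via pip)

# Hypothetical list where each class value carries a changing item number.
PAGE = """<ul>
  <li class="item item-1042 featured">Blue Kettle</li>
  <li class="item item-2077">Red Toaster</li>
  <li class="ad">Sponsored</li>
</ul>"""

root = etree.fromstring(PAGE)

# Exact match: finds nothing, because no class value is exactly "item".
exact = root.xpath('//li[@class="item"]')

# Partial match on the attribute value.
partial = root.xpath('//li[contains(@class,"item")]/text()')

# Partial match on the element's text.
by_text = root.xpath('//li[contains(.,"Toaster")]/text()')

print(len(exact))  # 0
print(partial)     # ['Blue Kettle', 'Red Toaster']
print(by_text)     # ['Red Toaster']
```

Keep in mind that contains matches any substring, so a short value such as "item" would also match a class like "menuitem". Choose a value that is specific enough for the page you are working with.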
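The effect of normalize-space is easiest to see on text with untidy whitespace. The following Python sketch assumes the third-party lxml library is installed and uses hypothetical sample markup; it compares an exact text match with a normalized one.

```python
from lxml import etree  # third-party library (assumption: installed via pip)

# Hypothetical markup with line breaks and extra spaces inside the first label.
PAGE = """<div>
  <span class="label">
      In   Stock
  </span>
  <span class="label">Sold Out</span>
</div>"""

root = etree.fromstring(PAGE)

# Exact text comparison fails because of the surrounding and repeated whitespace.
exact = root.xpath('//span[.="In Stock"]')

# normalize-space trims both ends and collapses inner runs of whitespace first.
normalized = root.xpath('//span[normalize-space(.)="In Stock"]')

# The function can also be used to return the cleaned-up string itself.
cleaned = root.xpath('normalize-space(//span[1])')

print(len(exact))       # 0
print(len(normalized))  # 1
print(cleaned)          # In Stock
```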
Nested XPath expressions
Use nested XPath expressions to select your target element. These apply conditions that create an if-then statement. For example, you want to select the <h4> element with a nested <p> element that has id="productname". If any other <h4> element is found, the nested expression ensures that the element is only selected if it contains that <p> element.
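One way to write the nested condition described here is `//h4[p[@id="productname"]]`. The Python sketch below checks that expression against hypothetical sample markup, assuming the third-party lxml library is installed; treat it as an illustration rather than the only valid form.

```python
from lxml import etree  # third-party library (assumption: installed via pip)

# Hypothetical markup with three <h4> elements; only the first one
# contains a <p> element with id="productname".
PAGE = """<div>
  <h4><p id="productname">Blue Kettle</p></h4>
  <h4><p id="category">Kitchen</p></h4>
  <h4>Related items</h4>
</div>"""

root = etree.fromstring(PAGE)

# Outer step: every <h4>. Nested predicate: keep it only if it has
# a child <p> whose id attribute equals "productname".
matches = root.xpath('//h4[p[@id="productname"]]')

name = matches[0].xpath('normalize-space(.)')

print(len(matches))  # 1
print(name)          # Blue Kettle
```

The selected node is the <h4> itself, not the <p> inside it. The nested <p> only acts as the condition.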
Adding an alternate location
An alternate location gives the agent an additional place to find the needed data. The location paths are evaluated sequentially. Use this to build dynamic captures and avoid optional settings.
In the Agent Builder:
- Select an action.
- Select the XPaths window.
- Select at the end of the XPath string.
- Press Enter to create a new line.
- Write the alternate XPath location.
For example:
//elementname[contains(@attribute,"value")]
//elementname[contains(.,"text")]
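The idea of trying location paths one after another can be sketched in Python with the third-party lxml library (assumed to be installed). The helper `first_match`, the `LOCATIONS` list, and the sample pages are hypothetical, and the sketch assumes that the first location that matches is the one used; it is not a description of Mozenda's internals.

```python
from lxml import etree  # third-party library (assumption: installed via pip)

def first_match(root, xpaths):
    """Try each XPath in order; return (xpath, text) for the first that matches."""
    for xp in xpaths:
        nodes = root.xpath(xp)
        if nodes:
            return xp, nodes[0].xpath('normalize-space(.)')
    return None  # no location matched

# Hypothetical primary location followed by an alternate location.
LOCATIONS = [
    '//span[contains(@class,"sale-price")]',
    '//span[contains(@class,"price")]',
]

# Two hypothetical page layouts: one with a sale price, one without.
page_a = etree.fromstring('<div><span class="price sale-price">$19.00</span></div>')
page_b = etree.fromstring('<div><span class="price">$24.00</span></div>')
page_c = etree.fromstring('<div><span class="note">Call for pricing</span></div>')

print(first_match(page_a, LOCATIONS))  # ('//span[contains(@class,"sale-price")]', '$19.00')
print(first_match(page_b, LOCATIONS))  # ('//span[contains(@class,"price")]', '$24.00')
print(first_match(page_c, LOCATIONS))  # None
```

Because the paths are tried in order, list the most specific location first and the more general fallback after it.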
Refer to the XPath 1.0 specification.