Agent building helpful hints
  • 25 May 2021

Every time you select an element in the Agent Builder, Mozenda generates an XPath expression to guide the agent to the proper location. These auto-generated XPath expressions are generally reliable, but by learning how to customize or create your own XPath expressions, you can increase the speed, reliability and flexibility of your agents.

More than anything else in your toolbox, XPath skills can take your agent building to the next level.

  • When you write an XPath expression, the goal is to make it specific and remove any ambiguity.
  • Load the URL in a browser outside of the Agent Builder. This allows you to maneuver through the website without affecting your agent.
  • It helps to have two monitors or windows: one with the URL loaded in a browser and the other with the Agent Builder. The Agent Builder view might jump around as you work in it, so you can refer to the browser window, which stays where you left it.
  • When you inspect an element, look at the nodes above the highlighted one to see whether any of them gives more information about the area you inspected. If so, use that node as an anchor point for your XPath expression.
  • In DevTools, as you move your mouse over the HTML, the corresponding element on the page is highlighted. Use this to confirm that the HTML node you are hovering over is associated with the item you want on the page.
  • Choose expression filters that are reliable, easy to read, and make it clear what the action is targeting.
  • Use DevTools to test your XPath expression in real time. Open DevTools (right-click > Inspect), click in the HTML, and press CTRL+F. A search bar appears where you can write and test your XPath.
  • Start with a simple website to get the hang of the syntax. Even if your ultimate goal is scraping a complex site, practicing on simple, readable sites builds confidence and gives you a foundation for analyzing more complex ones.
  • Set up capture definitions to confirm you are getting the right kind of information. For example, if you capture a phone number, refine the capture text or add a capture definition that recognizes only the characters a phone number can contain. If the capture contains anything outside those parameters, an error is thrown because the value is not a phone number.
  • Avoid making capture actions optional: if an optional capture fails, you are not notified of the error. If an action must stay optional, set a reminder to manually check that you are getting the expected data.
  • The ancestor axis is more flexible than ../, which steps up exactly one level to the parent. The ancestor axis searches every level above the current node, so it is more reliable for navigating up and better suited to universal XPath expressions. This works especially well for lists.
  • Look for tags that are easily identifiable rather than random combinations of words, IDs, or letters. Tags are less likely to change, while random words, letters, and IDs are often auto-generated and can change.
  • Write out full axis names rather than only the shorthand (for example, parent::node() instead of .., or attribute::class instead of @class). This helps you understand why each item is located where it is. Learn what each shorthand means before using it in your XPath expressions.
  • Learn how to nest XPath expressions inside other XPath expressions (as predicates in square brackets) to narrow your search. Nested expressions are great for error handling. For example, if the format changes or there are no results on the page, you can target the parent element and check whether it has any children.
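
The tips about writing specific expressions and anchoring on an informative parent node can be sketched in Python. This is a minimal illustration, not Mozenda's own behavior: it uses the standard library's xml.etree.ElementTree, which understands only a subset of XPath, on a hypothetical, well-formed product page whose class names (product-details, related-products, price) are invented. In DevTools, the anchored expression would be written as //div[@class='product-details']/span[@class='price'].

```python
import xml.etree.ElementTree as ET

# Hypothetical product page with one main price and two related-product prices.
PAGE = """
<html><body>
  <div class="product-details">
    <h1>Desk Lamp</h1>
    <span class="price">$24.99</span>
  </div>
  <div class="related-products">
    <div class="card"><span class="price">$9.99</span></div>
    <div class="card"><span class="price">$14.50</span></div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Ambiguous: matches every price on the page, including related products.
loose = root.findall(".//span[@class='price']")

# Specific: anchored on the parent node that identifies the main product.
anchored = root.findall(".//div[@class='product-details']/span[@class='price']")

print(len(loose))                   # 3
print([e.text for e in anchored])   # ['$24.99']
```

The loose expression picks up every price on the page; anchoring on the container narrows the result to the single value you actually want.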
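
Capture definitions are configured inside Mozenda itself; the Python sketch below only illustrates the same idea of rejecting captures that are not phone numbers. It assumes US-style numbers such as (801) 555-0142, and the pattern and validate_phone() helper are hypothetical, not part of Mozenda.

```python
import re

# Assumed format for illustration only: US-style numbers such as "(801) 555-0142".
PHONE_PATTERN = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

def validate_phone(captured: str) -> str:
    """Return the trimmed capture, or raise if it is not a phone number."""
    value = captured.strip()
    if not PHONE_PATTERN.fullmatch(value):
        # Fail loudly, like a capture definition that rejects unexpected text.
        raise ValueError(f"Not a phone number: {captured!r}")
    return value

print(validate_phone("  (801) 555-0142 "))  # (801) 555-0142

try:
    validate_phone("Call us today!")
except ValueError as err:
    print(err)  # Not a phone number: 'Call us today!'
```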
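
The ancestor tip can also be sketched with ElementTree, with one caveat: ElementTree has no ancestor:: axis and its elements do not store their parents, so this sketch builds a parent lookup table. The up() helper stands in for a fixed chain of ../ steps, and nearest_ancestor() stands in for an expression such as //span[@class='price']/ancestor::li[@class='listing'][1]. The markup, helper names, and data-sku attribute are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing before and after the site adds an extra wrapper div.
OLD = "<ul><li class='listing' data-sku='A1'><div><span class='price'>$5</span></div></li></ul>"
NEW = ("<ul><li class='listing' data-sku='A1'><div><div class='wrap'>"
       "<span class='price'>$5</span></div></div></li></ul>")

def parent_map(root):
    # ElementTree elements do not know their parents, so build a child -> parent table.
    return {child: parent for parent in root.iter() for child in parent}

def up(elem, parents, levels):
    # Rough equivalent of a fixed chain of "../" steps.
    for _ in range(levels):
        elem = parents.get(elem)
        if elem is None:
            return None
    return elem

def nearest_ancestor(elem, parents, tag, cls):
    # Rough equivalent of ancestor::tag[@class='cls'][1]: check every level above.
    elem = parents.get(elem)
    while elem is not None:
        if elem.tag == tag and elem.get("class") == cls:
            return elem
        elem = parents.get(elem)
    return None

def compare(markup):
    root = ET.fromstring(markup)
    parents = parent_map(root)
    price = root.find(".//span[@class='price']")
    fixed = up(price, parents, 2)  # like span/../..
    flexible = nearest_ancestor(price, parents, "li", "listing")
    return fixed.tag, flexible.get("data-sku")

print(compare(OLD))  # ('li', 'A1')
print(compare(NEW))  # ('div', 'A1')
```

When the site inserts an extra wrapper element, the fixed ../.. path lands on the wrong element, while the ancestor search still finds the listing.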
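
To illustrate the tip about preferring identifiable markup over random-looking values, the sketch below compares two hypothetical releases of the same page in which an auto-generated class name (css-1q2w3e) has changed but a descriptive attribute (itemprop='name') has not. It again uses xml.etree.ElementTree's limited XPath subset; the markup and the title() helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical releases of the same page: the generated class name changed,
# but the descriptive attribute did not.
RELEASE_1 = "<div><h2 class='css-1q2w3e' itemprop='name'>Desk Lamp</h2></div>"
RELEASE_2 = "<div><h2 class='css-9z8x7c' itemprop='name'>Desk Lamp</h2></div>"

BRITTLE = ".//h2[@class='css-1q2w3e']"  # relies on an auto-generated value
STABLE = ".//h2[@itemprop='name']"      # relies on a meaningful attribute

def title(markup, path):
    # Return the matched element's text, or None if the path finds nothing.
    node = ET.fromstring(markup).find(path)
    return None if node is None else node.text

for markup in (RELEASE_1, RELEASE_2):
    print(title(markup, BRITTLE), title(markup, STABLE))
# Desk Lamp Desk Lamp
# None Desk Lamp
```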
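
The nested-expression tip maps onto ElementTree's [tag] predicate, which keeps an element only if it has a child with that tag. ElementTree supports only this one-level form, whereas browsers accept full paths inside predicates, such as //div[@class='search'][ul/li]. This minimal sketch, using hypothetical markup and a hypothetical check_results() helper, tells a missing results container (the layout probably changed) apart from an empty one (the page genuinely has no results).

```python
import xml.etree.ElementTree as ET

WITH_RESULTS = "<body><div class='search'><ul><li>Lamp</li><li>Chair</li></ul></div></body>"
NO_RESULTS = "<body><div class='search'><ul></ul></div></body>"
NEW_LAYOUT = "<body><div class='grid'><p>Lamp</p></div></body>"

def check_results(markup):
    """Return (status, items) so an empty page is not mistaken for a broken one."""
    root = ET.fromstring(markup)
    container = root.find(".//div[@class='search']/ul")
    if container is None:
        # The parent itself is missing: the page format has probably changed.
        return "layout changed", []
    # Nested condition: only a ul that has at least one li child matches.
    if root.find(".//div[@class='search']/ul[li]") is None:
        # The parent exists but has no children: a genuine "no results" page.
        return "no results", []
    return "ok", [li.text for li in container.findall("li")]

for page in (WITH_RESULTS, NO_RESULTS, NEW_LAYOUT):
    print(check_results(page))
# ('ok', ['Lamp', 'Chair'])
# ('no results', [])
# ('layout changed', [])
```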
