Given the increased reliance on XML in scripts and for exchanging data, a number of different solutions leverage XML traversal options to get things done. We frequently use a path to bring a file into a script or program, or accept input from STDIN. The most basic task that we then perform is simply selecting an item from that file or STDIN and then storing it in a variable. One common tool that we use here is XPath. XPath calls the objects in an XML document nodes, and uses path expressions to select these nodes. A path expression is the path along the XML input that is followed to find a piece of data.
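To make "select an item and store it in a variable" concrete, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree module, which understands only a subset of XPath, and a made-up sample document whose element names simply mirror the examples used later in this article. To read from STDIN instead of a string, ET.parse(sys.stdin).getroot() should work in place of ET.fromstring.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element names mirror this article's examples.
XML = """<computers>
  <computer>
    <general>
      <name>alpha</name>
      <mac_address>00:11:22:33:44:55</mac_address>
    </general>
  </computer>
</computers>"""

root = ET.fromstring(XML)

# Follow the path computer -> general -> mac_address from the root element
# and keep the text of the first match in a variable.
mac_address = root.findtext("./computer/general/mac_address")

print(mac_address)
```

The path starts with ./ because ElementTree evaluates paths relative to the element you call it on, in this case the root element.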
There are some pretty standard wildcards that can be used with xpath to help constrain output: node() matches any node, * matches any element node, and @* matches any attribute node.
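Here is a rough sketch of the * wildcard in Python, again assuming the standard library's xml.etree.ElementTree and a made-up sample document (the id attribute is purely for illustration). ElementTree's XPath subset does not include node() or an @* step, so the sketch reads the attrib dictionary as a stand-in for @*; that part is plain Python rather than XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the id attribute is only here for illustration.
XML = """<computers>
  <computer id="1">
    <general>
      <name>alpha</name>
      <mac_address>00:11:22:33:44:55</mac_address>
    </general>
  </computer>
</computers>"""

root = ET.fromstring(XML)

# "*" matches any element: here, every child element of <general>.
general_children = [el.tag for el in root.findall("./computer/general/*")]

# ".//*" matches every element below the root, at any depth.
all_descendants = [el.tag for el in root.findall(".//*")]

# Stand-in for "@*": ElementTree exposes attributes as a dict instead.
computer_attributes = dict(root.find("./computer").attrib)

print(general_children)
print(all_descendants)
print(computer_attributes)
```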
Supported expressions include:
- node: This is a text input that identifies the name of a node to start a relative search from – for example, site would select all child elements of the current node with the name site
- . Identifies the current node (kinda’ like pwd in a shell)
- .. Starts at the parent of the current node – for example,
- / Starts traversal from the root node – for example, /computer would select the top-level computer element, and anything after it is searched for underneath /computer, meaning that these are absolute paths
- // Identifies the nodes in an XML structure that match a selection wherever they may be – for example, //computer would select all computer elements anywhere in the document, and you can then search for other expressions below those, such as: ‘xpath //computer/general/mac_address’
- //* Selects every element in the document
- //computer/* Selects all the child element nodes of every computer element
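- @ Selects an attribute in an XML structure – for example, ‘xpath //computer/general/@mac_address’ returns the mac_address attribute of each general element (an @ needs an attribute name, or *, after it)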
- [1] This predicate selects the first item (or whatever number is identified), so ‘xpath //computer[3]/general/mac_address’ would return the mac address of the third computer
- [@PATTERN] Constrains found sets, so ‘xpath //computer/general[@mac_address]’ identifies all general elements with an actual mac_address attribute (the predicate goes directly after the element name, with no / in between)
- [@PATTERN=VALUE] Constrains a found set to all items where the attribute equals the value, so ‘xpath //computer/general[@mac_address="00"]’ identifies all general elements with a mac_address attribute that has the value of 00
- [@*] Selects only items that have at least one attribute, so ‘xpath //computer/general[@*]’ identifies every general element that carries any attribute (btw, in a value comparison such as [@PATTERN=VALUE], rather than use an =, you can use > or <)
- | creates compound matches. So ‘xpath //computer/general/mac_address | //computer/general/name’ would grab the mac_address and name of every computer
- [last()] Identifies the last item, so ‘xpath //computer[last()]/general/mac_address’ would return the last computer’s mac address
- [last()-1] Subtracting a number from last() counts back from the end of a found set – for example, //computer[last()-1] selects the second to last computer, and //computer[last()-2] selects the third to last
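Several of the predicates in the list above can be tried out with a short Python sketch. It assumes the standard library's xml.etree.ElementTree, which supports positional, last() and attribute predicates but not the | operator, and a made-up four-computer document. The names() helper is hypothetical, and the list concatenation at the end is only a stand-in for |, not real XPath union behaviour (a real union removes duplicates and returns document order).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: four computers, one without a mac_address attribute.
XML = """<computers>
  <computer><general mac_address="00"><name>alpha</name></general></computer>
  <computer><general mac_address="01"><name>bravo</name></general></computer>
  <computer><general><name>charlie</name></general></computer>
  <computer><general mac_address="03"><name>delta</name></general></computer>
</computers>"""

root = ET.fromstring(XML)


def names(path):
    """Return the <name> text under every <general> element the path selects."""
    return [general.findtext("name") for general in root.findall(path)]


# "//" finds computer elements at any depth below the root.
all_names = names(".//computer/general")

# Positional predicates: a fixed position, the last item, and one before the last.
third = names("./computer[3]/general")
last = names("./computer[last()]/general")
second_to_last = names("./computer[last()-1]/general")

# Attribute predicates: attribute present, and attribute equal to a value.
with_mac = names("./computer/general[@mac_address]")
mac_is_00 = names("./computer/general[@mac_address='00']")

# Stand-in for "|": run two queries and join the results in Python.
first_and_last = names("./computer[1]/general") + names("./computer[last()]/general")

for label, result in [
    ("all", all_names),
    ("third", third),
    ("last", last),
    ("second to last", second_to_last),
    ("has mac_address", with_mac),
    ("mac_address is 00", mac_is_00),
    ("first and last", first_and_last),
]:
    print(label, result)
```

A full XPath 1.0 engine would also accept absolute paths, @ attribute steps and | directly, so treat this as a way to experiment with the predicates rather than a complete implementation.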
Overall, as you can see, xpath really makes traversing XML structures simple. Other tools and languages have their own ways of doing this, but most are similar in syntax.