In the previous part of the series, we introduced the XML External Entity (XXE) injection attack and demonstrated it on a sample application, but left the internal workings of the vulnerability unexplained.

In this part, we will look at the source code of the application to get to the root cause of the vulnerability and build a basic understanding of the payload used.

First, let’s talk about the basics of the XML file format.

Understanding the XML format

XML (eXtensible Markup Language) is syntactically similar to HTML. In HTML, every tag has a predefined meaning (such as head or body), whereas in XML the tags are simply names that describe the data they contain. A simple XML file looks as follows:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <book>
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
        <price>10.99</price>
    </book>
    <book>
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
        <price>8.99</price>
    </book>
</catalog>

The above XML data states that a catalog (the root XML element) contains two books titled “The Great Gatsby” and “1984”, along with other information such as the author, year of publication, and price.
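
To make the idea of tags as named data concrete, here is a minimal sketch that parses a shortened version of the catalog with the standard Java DOM parser and lists the titles. The class name CatalogTitles and the helper titles are hypothetical names chosen for this illustration; they are not part of the sample application.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the sample application.
public class CatalogTitles {

    // Returns the text of every <title> element found in the XML string.
    static List<String> titles(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("title");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<catalog>"
                + "<book><title>The Great Gatsby</title><year>1925</year></book>"
                + "<book><title>1984</title><year>1949</year></book>"
                + "</catalog>";
        System.out.println(titles(xml)); // [The Great Gatsby, 1984]
    }
}
```

The parser attaches no meaning to catalog, book, or title; the application decides what those names stand for.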

Question

How can a different representation of data, such as this catalog, result in a vulnerability?

The answer lies not in the representation of the data but in how the application parses it.

XML has an additional feature for sanity-checking documents, named Document Type Definition (DTD), and it is this feature that facilitates the XXE attack.

A DTD declares the structure of the XML document, i.e. the elements and attributes it may contain. It is declared with a DOCTYPE declaration at the top of the XML document.

An example of a DTD declaration:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catalog [
    <!ELEMENT catalog (book+)>
    <!ELEMENT book (title, author, year, price)>
    <!ATTLIST book id ID #REQUIRED>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT year (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
]>
<catalog>
    <book id="b1">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
        <price>10.99</price>
    </book>
    <book id="b2">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
        <price>8.99</price>
    </book>
</catalog>

  • ELEMENT: An element declaration defines the elements and their structure within the XML document.
  • ATTLIST: An attribute declaration defines the attributes that an element can have, along with their types and default values.
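
To see a DTD acting as a sanity check, the following sketch turns on DTD validation in the standard Java parser and collects the validation errors for a valid and an invalid document. It uses a shortened DTD with only title and year, and the names DtdValidationDemo and validationErrors are hypothetical, chosen only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the sample application.
public class DtdValidationDemo {

    // Shortened DTD (assumption for this sketch): each book needs a title and a year.
    static final String DTD =
            "<!DOCTYPE catalog ["
            + "<!ELEMENT catalog (book+)>"
            + "<!ELEMENT book (title, year)>"
            + "<!ELEMENT title (#PCDATA)>"
            + "<!ELEMENT year (#PCDATA)>"
            + "]>";

    // Parses the XML with DTD validation enabled and returns the validation errors.
    static List<String> validationErrors(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = DTD + "<catalog><book><title>1984</title><year>1949</year></book></catalog>";
        String invalid = DTD + "<catalog><book><title>1984</title></book></catalog>"; // year is missing
        System.out.println("valid document errors:   " + validationErrors(valid));
        System.out.println("invalid document errors: " + validationErrors(invalid));
    }
}
```

Validation is optional and off by default in this parser, which is why the DTD in a document usually goes unnoticed until someone abuses it.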

An example of a DTD declaration that defines an entity, which works much like a variable:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catalog [ <!ENTITY author "F. Scott Fitzgerald"> ]>
<catalog>
    <book>
        <title>The Great Gatsby</title>
        <author>&author;</author>
        <year>1925</year>
        <price>10.99</price>
    </book>
    <book>
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
        <price>8.99</price>
    </book>
</catalog>

In the above example, we have declared an entity named author with the value F. Scott Fitzgerald. While the document is parsed, every instance of &author; is replaced with that value.
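
The substitution can be observed directly from Java. The sketch below parses a shortened version of the document and reads the text of the author element; InternalEntityDemo and authorOf are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the sample application.
public class InternalEntityDemo {

    // Returns the text of the first <author> element after entity expansion.
    static String authorOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("author").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE catalog [ <!ENTITY author \"F. Scott Fitzgerald\"> ]>"
                + "<catalog><book><author>&author;</author></book></catalog>";
        System.out.println(authorOf(xml)); // F. Scott Fitzgerald
    }
}
```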

A DTD entity can also be used to fetch a web page or load a file from the system and assign its contents to the entity, as follows:

  1. Loading a file into an entity named content
<!DOCTYPE root [ <!ENTITY content SYSTEM "file:///some/secret/file"> ]>
  2. Loading a web page into an entity named content
<!DOCTYPE root [ <!ENTITY content SYSTEM "http://google.com/"> ]>

NOTE: The SYSTEM keyword declares an external entity. The quoted value that follows it is the system identifier, a URI that tells the parser where to fetch the content of the entity from.
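
The following sketch shows an external entity pulling file content into a document, using a harmless temporary file instead of a real secret. It assumes Java 11 or later and a parser left at its default JDK configuration, which on typical JDK installations resolves external entities; ExternalEntityDemo and readViaEntity are hypothetical names for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the sample application.
public class ExternalEntityDemo {

    // Writes secretText to a temp file, then reads it back through an external entity.
    // secretText should be plain text without '<' or '&' so it stays well-formed.
    static String readViaEntity(String secretText) {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            secret.toFile().deleteOnExit();
            Files.write(secret, secretText.getBytes(StandardCharsets.UTF_8));

            // The system identifier is the file: URI of the temp file.
            String xml = "<!DOCTYPE root [ <!ENTITY content SYSTEM \"" + secret.toUri() + "\"> ]>"
                    + "<root>&content;</root>";

            // Default factory settings (assumption: external entities are not restricted).
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (IOException | ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("external entity demo failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readViaEntity("top-secret-value"));
    }
}
```

If the parser is unrestricted, the returned text is the content of the file, even though the XML string itself never contained it.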

Let’s revisit the payload used in the demonstration:

<!DOCTYPE root [
    <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<input>
    <username>
        &xxe;
    </username>
    <password>
        asura
    </password>
</input>

As you can see in the payload, we declare an external entity named xxe that points to the file at /etc/passwd, and then reference that entity inside the username element.

When the parser expands the entity, the contents of /etc/passwd become the value of username. Since the application displays the submitted username in the browser, it ends up displaying the contents of the /etc/passwd file.

Let’s dive into the source code to check where the vulnerability exists.

Source Code Evaluation

To evaluate the source code of the application and find the cause of the vulnerability, we need to evaluate the /login endpoint.

To open an interactive shell inside the Docker container, use the following command:

docker exec -it xxe /bin/bash

Once inside the shell, let’s find the file in which the /login endpoint is defined.

grep -r "/login" .

The above command matches two files, index.html and XmlController.java; here we are interested in the .java file. The truncated contents of src/main/java/com/example/controllers/XmlController.java are as follows:

import com.example.services.XmlService;
...
@PostMapping("/login")
public String parseXml(@RequestParam("xmlInput") String xmlInput, Model model) {
	try {
		Asura result = xmlService.parseXml(xmlInput);
		logger.info("result: {}", result);
		model.addAttribute("xmlOutput", result);
	} catch (Exception e) {
		model.addAttribute("error", "Error parsing XML: " + e.getMessage());
	}
	return "result";
}

The function used to parse the XML document is parseXml. It is defined in src/main/java/com/example/services/XmlServiceImpl.java as follows:

@Override
public Asura parseXml(String xmlInput) throws Exception {
	Asura output = new Asura("", "", "");
	
	// XML parsing
	DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
	DocumentBuilder builder = factory.newDocumentBuilder();
	Document document = builder.parse(new InputSource(new StringReader(xmlInput)));
 
	NodeList itemList = document.getElementsByTagName("input");
 
	if (itemList.getLength() > 0) {
		Element item = (Element) itemList.item(0);
		String username = getElementTextContent(item, "username");
		String password = getElementTextContent(item, "password");
 
		output = getJsonByKey(username);
 
		if (output == null) {
			output = new Asura("", "", "");
		}
 
		output.setUsername(username);
	}
 
	return output;
}

In the above code snippet, the developer has not applied any restrictions to the XML parser factory held in the variable named factory. With its default configuration, the parser accepts DOCTYPE declarations and resolves external entities, so any XML supplied by the user is processed without checks.

Bug

No restrictions are configured on the factory variable (an instance of DocumentBuilderFactory), so DTDs and external entities in user input are processed.

The library provides various options to control validation, disable DTDs, and make parsing namespace aware. The following lines can be added after the declaration of the factory variable to reject any XML input that contains a DOCTYPE declaration:

factory.setNamespaceAware(true);
factory.setValidating(false);
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

Now, let’s try executing the same payload again and check whether we can access the contents of the file /etc/passwd.

Voila! We have successfully mitigated the vulnerability: the parser now rejects any document that contains a DOCTYPE declaration, so we are unable to view the contents of the /etc/passwd file.
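
The same check can be done outside the browser. The sketch below applies the three mitigation lines to a factory and confirms that the payload is refused while ordinary XML is still parsed. HardenedParserCheck, hardenedBuilder, and isRejected are hypothetical names for this illustration, and the username alice is made-up test data.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the sample application.
public class HardenedParserCheck {

    // Builds a parser with the same mitigation settings as shown above.
    static DocumentBuilder hardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory.newDocumentBuilder();
    }

    // Returns true when the hardened parser refuses the document.
    static boolean isRejected(String xml) {
        try {
            hardenedBuilder().parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true; // the parser raised an error, e.g. because of the DOCTYPE
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser setup failed", e);
        }
    }

    public static void main(String[] args) {
        String payload = "<!DOCTYPE root [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>"
                + "<input><username>&xxe;</username><password>asura</password></input>";
        String benign = "<input><username>alice</username><password>asura</password></input>";

        System.out.println("payload rejected: " + isRejected(payload));
        System.out.println("benign rejected:  " + isRejected(benign));
    }
}
```

Because the document is refused as soon as the DOCTYPE declaration is seen, the entity is never resolved and the file is never read. In the sample application, the resulting exception would be caught by the controller and reported as a parsing error.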

For reference, the full code of the updated parseXml function is as follows:

@Override
public Asura parseXml(String xmlInput) throws Exception {
	Asura output = new Asura("", "", "");
	
	// XML parsing
	DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
 
	// Mitigation
	factory.setNamespaceAware(true);
	factory.setValidating(false);
	factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); // Prevent DTD processing
 
	DocumentBuilder builder = factory.newDocumentBuilder();
	Document document = builder.parse(new InputSource(new StringReader(xmlInput)));
 
	NodeList itemList = document.getElementsByTagName("input");
 
	if (itemList.getLength() > 0) {
		Element item = (Element) itemList.item(0);
		String username = getElementTextContent(item, "username");
		String password = getElementTextContent(item, "password");
 
		output = getJsonByKey(username);
 
		if (output == null) {
			output = new Asura("", "", "");
		}
 
		output.setUsername(username);
	}
 
	return output;
}

In this part of the series, we have successfully mitigated the XXE vulnerability. But there is more to the XXE vulnerability. In the above demonstration, the username element is displayed on the front end of the website, which is why we can see the contents of the /etc/passwd file. But what if the content of the username element is not shown back to us?

This is a special type of XXE attack named Blind XXE, where the payload is processed on the server but no output is returned to the user to verify that the exploitation succeeded.

There are various methods to detect it, one of which is to use an InteractSH server together with a payload that makes the vulnerable server send a request to the InteractSH domain. You can get a free test InteractSH instance from the official InteractSH website.
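
The idea behind this out-of-band check can be tried locally without any external service. In the sketch below, a small HTTP listener on 127.0.0.1 stands in for the InteractSH server (an assumption made purely for the demo), and an unrestricted parser is given a payload whose entity points at that listener. BlindXxeCallbackDemo, countCallbacks, and the /ping path are hypothetical names for this illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.StringReader;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the sample application.
public class BlindXxeCallbackDemo {

    // Parses a payload pointing at a local listener and returns how many requests it received.
    static int countCallbacks() {
        AtomicInteger hits = new AtomicInteger();
        HttpServer listener = null;
        try {
            // Local stand-in for the callback server; port 0 picks any free port.
            listener = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            listener.createContext("/ping", exchange -> {
                hits.incrementAndGet();
                byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            listener.start();

            int port = listener.getAddress().getPort();
            String payload = "<!DOCTYPE root [ <!ENTITY xxe SYSTEM \"http://127.0.0.1:" + port + "/ping\"> ]>"
                    + "<input><username>&xxe;</username><password>asura</password></input>";

            try {
                // Default factory settings (assumption: external entities are not restricted).
                DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(payload)));
            } catch (SAXException ignored) {
                // In a blind scenario the parse result is irrelevant; only the callback matters.
            }
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("demo setup failed", e);
        } finally {
            if (listener != null) {
                listener.stop(0);
            }
        }
        return hits.get();
    }

    public static void main(String[] args) {
        System.out.println("callbacks received: " + countCallbacks());
    }
}
```

A non-zero count means the parser reached out to the listener while resolving the entity, which is the signal an attacker looks for when the response itself reveals nothing.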