pboyd.io

What is UDP?

Imagine a holiday dinner with the Internet Protocol family: IP is at the head of the table, TCP is to the right, and UDP is to the left. ICMP is in the kitchen, SMTP couldn’t make it (IMAP delivered the apology), and Uncle ARP sits quietly after the introductions. But otherwise, there’s a lot of chatter (protocols, am I right?). TCP entertains everyone with the latest accomplishments of HTTP/2, just coming of age. IP listens with pride, knowing that the “TCP/IP Family” has a bright future with its undisputed scion: TCP. During a lull in the conversation, TCP looks over to UDP and asks with a smirk, “So, how’s DNS?”

UDP bristled, “Quite well, no thanks to you. How’s DNS over HTTPS?”

“Oh, you’ve heard about DoH. What a splendid little protocol it’s becoming. Name resolution is such a trifle. How are you holding up with HTTP/3? That must be a strain for a protocol like you.”

And, so it goes. Apparently, not even anthropomorphic networking protocols can have a civil family dinner. The rivalry between TCP and UDP has been fierce and one-sided. TCP is reliable. TCP runs the web. TCP runs e-mail. Meanwhile, UDP comes home after a piddling job with DNS to play video games.

Even so, during an (interesting!) career building software, you’ll probably encounter a few good reasons to use UDP, but you won’t recognize them unless you get to know this overshadowed little sibling.

UDP defined

If you’ve heard anything about UDP, you’ve probably heard it described as connectionless and unreliable. That’s true, but it’s like describing Elon Musk as a wingless human: technically accurate, yet completely unhelpful. We usually define things by what they are, not what they aren’t. So, what’s UDP? Well, not much, it turns out. You can read RFC 768 for yourself, but to summarize, UDP is layered on IP and adds port numbers:

 0      7 8     15 16    23 24    31
+--------+--------+--------+--------+
|     Source      |   Destination   |
|      Port       |      Port       |
+--------+--------+--------+--------+
|                 |                 |
|     Length      |    Checksum     |
+--------+--------+--------+--------+
|
|          data octets ...
+---------------- ...

     User Datagram Header Format

If you’re developing a network stack, you should dig into the details of that checksum, but the rest of us can just remember that UDP adds ports to an IP packet. Of course, this is only a helpful definition if you know what IP is. Lucky for us, RFC 791 has a wonderfully succinct description:

The internet protocol implements two basic functions: addressing and fragmentation.

Fragmentation is a worthwhile topic for another time. Addressing is what we need to understand right now. It’s so easy to take this for granted today that I fear you will tune me out, but please bear with me: hosts connected on IP networks have addresses, and if you know another host’s address, you can send it a datagram. It’s amazing, actually. It works all over the world. We usually say packet today, but datagram is a nice word. Think of it like an old-time telegram for data on a packet-switched network. IP is concerned about your datagram’s payload size, but besides that, it doesn’t care what you send. IP only delivers datagrams to other hosts.

But IP doesn’t actually try very hard to deliver your datagram. In fact, it makes no guarantees. Your datagram could be lost, sent to the wrong address, or mangled beyond recognition. Even if it does arrive, nothing may be there to receive it, and if you sent multiple messages, they may be out of order. IP doesn’t care. Networks aren’t reliable, and IP won’t let you pretend that they are.

Lobbing off datagrams and hoping for the best may be good enough for IP, but humans usually need stronger guarantees than that. If these were telegrams instead of datagrams, we might layer in a human protocol, like, “Please confirm that you received this message at once.” At the risk of over-simplifying, TCP is a more formal and complete extension of that idea. That makes TCP reliable, but it comes with trade-offs. Here are a few:

The trade-offs are often acceptable, but sometimes they aren’t, and when they aren’t, you may wish you could send IP datagrams. The problem is that IP datagrams go to hosts, not programs on a host. Multiple programs attempting to communicate by raw IP datagrams would be a mess: programs would have to recognize and filter messages intended for another program. UDP, if you remember, adds ports to IP datagrams. So now, instead of listening for datagrams on an IP address, a program can listen for datagrams on a port. You could say it’s a protocol for delivering IP datagrams to users (or, at least, user-level programs) instead of hosts. Or, if you like, the User Datagram Protocol.

What’s it for?

This has been more theoretical than I usually like, so let me wrap up with some of my favorite reasons to use UDP, with some real-world examples: