Writing a Wayland Client... Without libwayland

A little under two years ago, I tried to start learning graphics programming by following the LearnOpenGL tutorial. I quickly ran into an issue with GLFW where I could not get the Wayland backend to work. I had to build it for X11 and rely on Xwayland for the program to work. This led me to want to understand how Wayland works, so I started to write a Wayland client following the Wayland book. Still unsatisfied, I began writing my own program to generate code from Wayland protocols, with which I would write a Wayland client without using libwayland. I got the project working, but not well, and the code was a mess that I was very unhappy with. I recently returned to the project with more experience and a clearer picture of how to achieve my goals. I now want to share what I've learned for any others who may be interested in either bypassing libwayland, or simply building a deeper understanding of what libwayland does internally.

Wayland Basics

To begin, it is important to understand what Wayland is and how it (roughly) works. Wayland is a collection of protocols, together with a wire format that describes how the binary data for those protocols is encoded and decoded. A Wayland compositor acts as a server that client programs connect to in order to invoke remote method calls on 'objects' -- this includes submitting rendered frames for the client's graphical representation. The compositor receives rendered frames from all active client programs and determines how and where to put these images on screen.

Wayland's Wire Format

Wayland's wire format specifies encodings for the following:

uint: an unsigned 32-bit integer.
int: a signed 32-bit integer.
fixed: a signed 24.8 fixed-point number (24 integer bits, including the sign, and 8 fractional bits), stored in 32 bits.
object: an unsigned 32-bit integer, representing the id of an object.
new_id: an unsigned 32-bit integer, representing the id of an object to be allocated.
string: a NUL-terminated string, prefixed with an unsigned 32-bit integer representing its length (including the NUL terminator), padded to 4-byte (32-bit) alignment.
array: a series of arbitrary bytes, prefixed with an unsigned 32-bit integer representing its length, padded to 4-byte (32-bit) alignment.
fd: a file descriptor associated with the message being sent -- NOT sent in a regular message, but instead in ancillary data in a control message sent alongside the regular socket data. In standard wire messages, fd arguments can be considered 0-sized.
enum: a single value or bitset value representing some known enum type, encoded as a 32-bit unsigned integer.

All are to be encoded in host-native endianness, which in my case is little-endian.
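As a rough illustration of the two less obvious encodings above, here is a minimal sketch of writing a string argument and converting a float to fixed. The helper names (pad4, encodedStringLen, encodeString, fixedFromF64) are made up for this example and are not taken from any library; it assumes a reasonably recent Zig.

```zig
const std = @import("std");

/// Round `n` up to the next multiple of 4 (the wire format's word size).
fn pad4(n: usize) usize {
    return (n + 3) & ~@as(usize, 3);
}

/// Bytes a string argument occupies on the wire:
/// 4-byte length prefix + contents + NUL terminator, padded to 4 bytes.
fn encodedStringLen(s: []const u8) usize {
    return 4 + pad4(s.len + 1);
}

/// Write `s` into `out` as a wire-format string and return the bytes used.
/// The length prefix counts the NUL terminator but not the padding.
fn encodeString(out: []u8, s: []const u8) error{NoSpace}!usize {
    const total = encodedStringLen(s);
    if (out.len < total) return error.NoSpace;
    const len_prefix: u32 = @intCast(s.len + 1);
    @memcpy(out[0..4], std.mem.asBytes(&len_prefix)); // host-native endianness
    @memcpy(out[4..][0..s.len], s);
    @memset(out[4 + s.len .. total], 0); // NUL terminator plus padding
    return total;
}

/// Convert a float to the signed 24.8 `fixed` representation.
fn fixedFromF64(v: f64) i32 {
    return @intFromFloat(@round(v * 256.0));
}

test "string and fixed encoding" {
    var buf: [32]u8 = undefined;
    const n = try encodeString(&buf, "wl_shm");
    // 4 (prefix) + 6 (chars) + 1 (NUL) + 1 (padding) = 12
    try std.testing.expectEqual(@as(usize, 12), n);
    try std.testing.expectEqualSlices(u8, "wl_shm\x00\x00", buf[4..12]);
    try std.testing.expectEqual(@as(i32, 384), fixedFromF64(1.5));
}
```

The uint, int, object, new_id and enum types need no helper at all, since each is a single 32-bit word copied in host byte order.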

Ok, so we know how individual bits of data are to be encoded, now how do we actually go about interacting with the compositor?

Well, the wire format also requires that we send a header preceding each message. The header is two 32-bit words. The first word is the ID of the object the message is addressed to. The second word packs two 16-bit values: the high 16 bits hold the length of the message in bytes, including the header, and the low 16 bits hold the opcode of the message.

In my Zig client code, I have defined it as follows:

const WireHeader = packed struct (u64) {
  id: u32,
  op: u16,
  len: u16,
};
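To see why the fields are declared in the order id, op, len, here is a small sketch that splits the packed struct into the two words that go on the wire. It relies on Zig placing the first field of a packed struct in the lowest bits; headerWords is a hypothetical helper written for this illustration.

```zig
const std = @import("std");

// Same layout as the struct above: the first field occupies the lowest bits.
const WireHeader = packed struct(u64) {
    id: u32,
    op: u16,
    len: u16,
};

/// Split a header into the two 32-bit words that go on the wire.
fn headerWords(h: WireHeader) [2]u32 {
    const bits: u64 = @bitCast(h);
    const lo: u32 = @truncate(bits); // object ID
    const hi: u32 = @truncate(bits >> 32); // (len << 16) | op
    return .{ lo, hi };
}

test "header words" {
    const words = headerWords(.{ .id = 1, .op = 1, .len = 12 });
    try std.testing.expectEqual(@as(u32, 1), words[0]);
    try std.testing.expectEqual((@as(u32, 12) << 16) | 1, words[1]);
}
```

On a little-endian host, copying the u64 to the socket as-is produces the same byte sequence as writing these two words one after the other.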

An annoying little detail about `new_id`

I should probably note, before proceeding any further, that new_id is not always to be encoded simply as a 32-bit id. When one is passing a new_id in a message that is defined as having an explicit interface for the to-be-allocated object, nothing more is needed. However, in cases where there is no specified interface for the to-be-allocated object, we are required to send both the string name of the interface and the version of the interface we want to receive, in that order, preceding the 32-bit new_id value.

Wayland Protocols

The primary feature of Wayland is its nature of being a collection of protocols. This means that in place of a set ABI for things such as a window, we instead get a description of operations that produce what we think of as a window. This description comes in the form of an XML specification. These XML documents are what is referred to when one mentions Wayland protocols.

The core Wayland protocol is defined in wayland.xml which can be found here. This protocol specifies most of what is needed to create a window on a modern Wayland compositor. I say 'most' as it does not cover everything. Inherent to the design of Wayland is the idea that protocols are extensible and thus the userspace is not locked into any one implementation, should a better design come around in the future. In practice, what this means is that to have a functioning graphical program under Wayland, we need more than just wayland.xml. As of writing, the protocols you will need to create a window and put things on screen are, at minimum, the core Wayland protocol, the xdg-shell protocol, and the xdg-decoration protocol. Should you desire to use GPU rendering for your program, you will also need the likes of the linux-dmabuf protocol (usage of which I may cover in a follow-up blog post).

These extra protocols can be found here.

In short, to realistically put a window on screen, you will need:

wayland.xml: core protocol.
xdg-shell.xml: standard shell protocol.
xdg-decoration-unstable-v1.xml: standard decoration protocol (but not supported by GNOME's compositor, mutter).

Turning XML Into Code

Having the XML specifications for these protocols is all well and good, but we can't exactly compile XML into a program. The standard approach, when using libwayland, is to use the 'wayland-scanner'. This is a program that will take a given Wayland protocol XML file, and generate a client or server header, or implementation code, depending on what you specify when you run the program.

As I sought to write my program without libwayland, and in Zig, not C, I opted to write my own code generation tool to produce a client code structure that I find reasonably nice to work with. I won't go into it in this blog, but for those who are interested, the tool can be found here.

Writing the Client Program

Now that we have an understanding of the communication format, and the basic protocols necessary to produce a functioning graphical client program, we're ready to properly begin.

Establishing a Connection

The first step is to establish a connection to the host compositor. This will typically be advertised through two environment variables.

XDG_RUNTIME_DIR
WAYLAND_DISPLAY

Combined, these will give the path to the socket, through which we connect to the compositor. The path will be something like /run/user/1000/wayland-1. (In the case that WAYLAND_DISPLAY is unset, we should try wayland-0 as a default.)
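A minimal sketch of building that path is shown here. socketPath is a hypothetical helper; it takes the two values as parameters, so reading the environment variables themselves is left to the caller.

```zig
const std = @import("std");

/// Build "$XDG_RUNTIME_DIR/$WAYLAND_DISPLAY" into `buf`.
/// `display` is null when WAYLAND_DISPLAY is unset, in which case
/// "wayland-0" is used as the default.
fn socketPath(buf: []u8, runtime_dir: []const u8, display: ?[]const u8) ![]u8 {
    return std.fmt.bufPrint(buf, "{s}/{s}", .{ runtime_dir, display orelse "wayland-0" });
}

test "socket path" {
    var buf: [108]u8 = undefined;
    const with_display = try socketPath(&buf, "/run/user/1000", "wayland-1");
    try std.testing.expectEqualStrings("/run/user/1000/wayland-1", with_display);
    const fallback = try socketPath(&buf, "/run/user/1000", null);
    try std.testing.expectEqualStrings("/run/user/1000/wayland-0", fallback);
}
```

The returned slice is not NUL-terminated, so the length check against the sockaddr path field still has to leave room for the terminator.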

So we initialize a UNIX socket connection at this path.

In my own client code this looks like:

 // `i32_(x)` is a value-cast to i32 as a convenience function in my codebase
 const socket_fd = i32_(linux.socket(
   linux.AF.UNIX,
   linux.SOCK.STREAM | linux.SOCK.CLOEXEC,
   0,
 ));

 const socket_addr = socket_addr: {
   var addr: linux.sockaddr.un = .{
     .family = linux.AF.UNIX,
     .path = @splat(0),
   };

   // `socket_path` being: "$XDG_RUNTIME_DIR/$WAYLAND_DISPLAY"
   if (socket_path.len + 1 > addr.path.len) @panic("Socket Path Too Long");
   @memcpy(addr.path[0..socket_path.len], socket_path);
   break :socket_addr addr;
 };

 // `u32_(x)` is a value-cast to u32 as a convenience function in my codebase
 const connect_rc = linux.connect(
   socket_fd,  &socket_addr,  u32_(@sizeOf(@TypeOf(socket_addr))));

 // `transmute(T,x)` is a bitwise-reinterpret convenience function in my codebase
 if (transmute(isize, connect_rc) < 0) {
   @panic("Failed to connect to Wayland socket!");
 }

Once this connection is initialized successfully we can begin sending requests to, and receiving events from, the compositor.

Communicating With the Compositor

To actually communicate with the compositor, we need to read and write over the socket we've connected to. As we also need to send and receive file descriptors, which can only be sent as ancillary data via control messages, we cannot simply use the read and write system calls. Instead we use sendmsg and recvmsg, as these allow us to write iov buffers and send ancillary data via control messages.

To construct our messages, we'll write the standard data in the form of a message body (sent in iov buffers), and any file descriptors will be sent as the control data alongside the message.

As for managing these incoming and outgoing messages in the client program, I opted to use four ringbuffers -- one each for standard data in/out, and one each for file descriptors in/out, as the file descriptors have to be encoded separately anyway. For ease of use, I only use power-of-two-sized ringbuffers and 32-bit indices for read/write, allowing them to wrap around, and masking for actual reads and writes.
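The indexing scheme can be sketched roughly as follows. This is an illustrative RingBuffer type written for this post's explanation, not the one from my client: the indices run freely and are only masked when touching the storage, so wrapping subtraction always gives the number of stored bytes.

```zig
const std = @import("std");

/// Fixed-capacity byte ring buffer. `size` must be a power of two so that
/// `index & (size - 1)` maps a free-running index to a buffer position.
fn RingBuffer(comptime size: u32) type {
    if (size == 0 or (size & (size - 1)) != 0) {
        @compileError("size must be a power of two");
    }
    return struct {
        data: [size]u8 = undefined,
        read_idx: u32 = 0, // total bytes consumed, wraps at 2^32
        write_idx: u32 = 0, // total bytes produced, wraps at 2^32

        const Self = @This();

        /// Bytes currently stored. Wrapping subtraction stays correct even
        /// after the indices overflow.
        fn len(self: *const Self) u32 {
            return self.write_idx -% self.read_idx;
        }

        fn push(self: *Self, bytes: []const u8) error{Full}!void {
            if (bytes.len > size - self.len()) return error.Full;
            for (bytes) |b| {
                self.data[self.write_idx & (size - 1)] = b;
                self.write_idx +%= 1;
            }
        }

        fn pop(self: *Self) ?u8 {
            if (self.len() == 0) return null;
            const b = self.data[self.read_idx & (size - 1)];
            self.read_idx +%= 1;
            return b;
        }
    };
}

test "indices may overflow" {
    var rb = RingBuffer(8){};
    // Start just below the u32 limit to show that overflow is harmless.
    rb.read_idx = 0xFFFF_FFFC;
    rb.write_idx = 0xFFFF_FFFC;
    try rb.push("abcdef");
    try std.testing.expectEqual(@as(u32, 6), rb.len());
    try std.testing.expectEqual(@as(?u8, 'a'), rb.pop());
}
```

Because the buffer size divides 2^32, masking a wrapped index still lands on the right slot, which is the reason for restricting sizes to powers of two.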

Now to actually send and receive data, let's look at the definitions for the sendmsg and recvmsg system calls. In C, they are defined as follows:

// function signatures
ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags);
ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags);

// msghdr struct definition
struct msghdr {
 void         *msg_name;       /* Optional address */
 socklen_t     msg_namelen;    /* Size of address */
 struct iovec *msg_iov;        /* Scatter/gather array */
 size_t        msg_iovlen;     /* Number of elements in msg_iov */
 void         *msg_control;    /* Ancillary data, see cmsghdr below. */
 size_t        msg_controllen; /* Ancillary data buffer size */
 int           msg_flags;      /* Flags on received message */
};

// cmsghdr
struct cmsghdr {
 size_t cmsg_len;   /* Data byte count, including header */
 int    cmsg_level; /* Originating protocol */
 int    cmsg_type;  /* Protocol-specific type */
};
/* cmsghdr is immediately followed by:
     unsigned char cmsg_data[]; */

So to send and receive messages, we'll need to construct a msghdr with an iovec for the message data, and another buffer of control message data. We do not need the msg_name field, so we'll set that to NULL and the associated length to zero.

To send ancillary data, we'll need to encode the file descriptors as control messages in a flat buffer. I'll return to this shortly.

Constructing the iovec buffers from a ringbuffer is rather simple. I'll illustrate it here with a rough code outline.

const head = write_idx & (buffer_len-1); // write position
const tail = read_idx & (buffer_len-1); // read position
// for outgoing data
if (write_idx != read_idx) {
  if (tail < head) {
    // pass ringbuffer[tail..head]
  } else if (head == 0) {
    // pass ringbuffer[tail..END]
  } else {
    // pass ringbuffer[tail..END] AND ringbuffer[0..head]
  }
}

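// for incoming data (skip when the buffer is full: head == tail then, too)
if (write_idx -% read_idx != buffer_len) {
  if (tail > head)  {
    // pass ringbuffer[head..tail]
  } else if (tail == 0) {
    // pass ringbuffer[head..END]
  } else {
    // pass ringbuffer[head..END] AND ringbuffer[0..tail]
  }
}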
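For the outgoing direction, the same case analysis can be written as a small function that returns up to two slices. This is a sketch with a hypothetical readableSegments helper, assuming the buffer length is a power of two; each non-empty slice would become one iovec entry (base pointer plus length).

```zig
const std = @import("std");

/// Return the readable region of a power-of-two ring buffer as up to two
/// contiguous slices, in order. An unused segment has length zero.
fn readableSegments(buf: []const u8, read_idx: u32, write_idx: u32) [2][]const u8 {
    const size: u32 = @intCast(buf.len);
    const count = write_idx -% read_idx; // bytes waiting to be sent
    const tail = read_idx & (size - 1);
    // The first segment runs from the read position to the end of the
    // storage, or stops early if there is less data than that.
    const first_len = @min(count, size - tail);
    return .{
        buf[tail..][0..first_len],
        buf[0 .. count - first_len],
    };
}

test "wrapped data yields two segments" {
    const segs = readableSegments("ABCDEFGH", 6, 11);
    try std.testing.expectEqualStrings("GH", segs[0]);
    try std.testing.expectEqualStrings("ABC", segs[1]);
}
```

Computing the byte count first avoids the ambiguity between an empty and a full buffer, since both have equal masked head and tail positions.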

Encoding the control messages is a little trickier considering, at least as of writing this, Zig's standard library provides no implementation for the typical C macros one would use to work with control messages.

To write a file descriptor as a control message, we'll pull a file descriptor from the outgoing fd ringbuffer, and place it into some separate buffer that will be passed as the msg_control field in the msghdr struct. The fd will need to be preceded by a cmsghdr of level SOL_SOCKET (available in Zig's stdlib as std.os.linux.SOL.SOCKET) and type SCM_RIGHTS (available in Zig's stdlib as std.os.linux.SCM.RIGHTS). The cmsg_len field will be the result of the C macro CMSG_LEN(len), which I have implemented in Zig as:

 pub inline fn cmsg_len(len: usize) usize {
   return cmsg_align(@sizeOf(cmsghdr)) + len;
 }
 pub inline fn cmsg_align(len: usize) usize {
   return (((len) + @sizeOf(usize) - 1) & ~@as(usize, (@sizeOf(usize) - 1)));
 }

I wrote this as a translation of the CMSG_LEN and CMSG_ALIGN macros as found in musl libc's <sys/socket.h>.

#define CMSG_ALIGN(len) (((len) + sizeof (size_t) - 1) & (size_t) ~(sizeof (size_t) - 1))
#define CMSG_LEN(len)   (CMSG_ALIGN (sizeof (struct cmsghdr)) + (len))

Here, the len passed to cmsg_len is the size (in bytes) of the data to follow. In our case, this will only ever be a file descriptor, which is 4 bytes.
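Putting those pieces together, a control buffer carrying a single file descriptor could be built roughly like this. It is a sketch under a few assumptions: Cmsghdr is a local mirror of the C struct rather than a stdlib type, the SOL_SOCKET and SCM_RIGHTS values are the Linux ones written out by hand, and cmsgSpace and encodeFd are hypothetical helpers (cmsgSpace follows the C macro CMSG_SPACE).

```zig
const std = @import("std");

// Local mirror of the C `struct cmsghdr`, declared here as an assumption
// so the sketch does not depend on a particular stdlib version.
const Cmsghdr = extern struct {
    len: usize,
    level: c_int,
    kind: c_int, // `cmsg_type` in C
};

const SOL_SOCKET: c_int = 1; // Linux value
const SCM_RIGHTS: c_int = 1; // Linux value

fn cmsgAlign(len: usize) usize {
    return (len + @sizeOf(usize) - 1) & ~@as(usize, @sizeOf(usize) - 1);
}

fn cmsgLen(len: usize) usize {
    return cmsgAlign(@sizeOf(Cmsghdr)) + len;
}

/// Equivalent of CMSG_SPACE: header plus payload, padded for alignment.
fn cmsgSpace(len: usize) usize {
    return cmsgAlign(@sizeOf(Cmsghdr)) + cmsgAlign(len);
}

/// Write one SCM_RIGHTS control message carrying `fd` into `out` and return
/// the slice to pass as msg_control (its length is msg_controllen).
/// A real buffer should be aligned like `usize`.
fn encodeFd(out: []u8, fd: i32) error{NoSpace}![]u8 {
    const space = cmsgSpace(@sizeOf(i32));
    if (out.len < space) return error.NoSpace;
    @memset(out[0..space], 0);
    const hdr = Cmsghdr{
        .len = cmsgLen(@sizeOf(i32)),
        .level = SOL_SOCKET,
        .kind = SCM_RIGHTS,
    };
    @memcpy(out[0..@sizeOf(Cmsghdr)], std.mem.asBytes(&hdr));
    const data_off = cmsgAlign(@sizeOf(Cmsghdr));
    @memcpy(out[data_off..][0..@sizeOf(i32)], std.mem.asBytes(&fd));
    return out[0..space];
}

test "one fd in a control buffer" {
    var storage: [64]u8 align(@alignOf(usize)) = undefined;
    const ctl = try encodeFd(&storage, 7);
    if (@sizeOf(usize) == 8) {
        // 16-byte header + 4-byte fd, padded up to 24 bytes
        try std.testing.expectEqual(@as(usize, 20), cmsgLen(4));
        try std.testing.expectEqual(@as(usize, 24), ctl.len);
    }
}
```

Note the difference between the two sizes: cmsg_len in the header is the unpadded CMSG_LEN value, while the buffer length handed to sendmsg is the padded CMSG_SPACE value.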

A Note On Object IDs In Wayland

One extra detail I feel is important to note before continuing, is how object ID allocation works under Wayland. Generally, the compositor requires that any new_id sent be no greater than the current greatest ID + 1. A client may reuse an ID of an object that the compositor has acknowledged as deleted, but ID reuse is not strictly required. An ID of 0 is to be treated as NULL.
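One way to satisfy these rules is a counter plus a small free list, sketched below. IdAllocator is a hypothetical type for illustration; the fixed free-list capacity of 64 is an arbitrary choice for the example, and IDs that do not fit are simply never reused.

```zig
const std = @import("std");

/// Hands out client-side object IDs. Freed IDs are reused first; otherwise
/// the next ID is one greater than the highest ID handed out so far.
const IdAllocator = struct {
    next: u32 = 2, // 0 is NULL and 1 is always wl_display
    free: [64]u32 = undefined, // small fixed-size free list for the sketch
    free_len: usize = 0,

    fn alloc(self: *IdAllocator) u32 {
        if (self.free_len > 0) {
            self.free_len -= 1;
            return self.free[self.free_len];
        }
        const id = self.next;
        self.next += 1;
        return id;
    }

    /// Call only after the compositor has acknowledged the deletion
    /// (for example through the wl_display.delete_id event).
    fn release(self: *IdAllocator, id: u32) void {
        if (self.free_len < self.free.len) {
            self.free[self.free_len] = id;
            self.free_len += 1;
        }
    }
};

test "ids are sequential and reusable" {
    var ids = IdAllocator{};
    try std.testing.expectEqual(@as(u32, 2), ids.alloc());
    try std.testing.expectEqual(@as(u32, 3), ids.alloc());
    ids.release(2);
    try std.testing.expectEqual(@as(u32, 2), ids.alloc());
    try std.testing.expectEqual(@as(u32, 4), ids.alloc());
}
```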

Actually Sending & Receiving Messages

Now that we have our methods for sending and receiving messages sorted out, we can finally begin exchanging messages with the compositor. The first message to send will be wl_display.get_registry. Let's look at the XML for this request.

    <request name="get_registry">
      <description summary="get global registry object">
	This request creates a registry object that allows the client
	to list and bind the global objects available from the
	compositor.

	It should be noted that the server side resources consumed in
	response to a get_registry request can only be released when the
	client disconnects, not when the client side proxy is destroyed.
	Therefore, clients should invoke get_registry as infrequently as
	possible to avoid wasting memory.
      </description>
      <arg name="registry" type="new_id" interface="wl_registry"
	   summary="global registry object"/>
    </request>

We can see that the get_registry request takes a single argument of type new_id and with the associated interface of wl_registry. This request is the second request defined in the wl_display interface, so we assign it opcode 1. The wl_display object is guaranteed to be a global object of ID 1. We'll write the header followed by the data. Here the data is a single new_id with a known interface, so it will be a single uint of the ID to allot for the wl_registry, which will be 2. The header for this message will be:

ID=1 (wl_display)
OP=1 (get_registry is the second request available)
LEN=12 (size of WireHeader, 8 bytes, + size of uint, 4 bytes)

So the data to write will be:

const get_registry_msg = [_]u32{
 1, (@as(u32, 12)<<16) | 1, 2
};

After writing this message to the socket, the compositor will respond by writing back a bunch of wl_registry.global events, defined in the spec as:

    <event name="global">
      <description summary="announce global object">
	Notify the client of global objects.

	The event notifies the client that a global object with
	the given name is now available, and it implements the
	given version of the given interface.
      </description>
      <arg name="name" type="uint" summary="numeric name of the global object"/>
      <arg name="interface" type="string" summary="interface implemented by the object"/>
      <arg name="version" type="uint" summary="interface version"/>
    </event>

This series of events will tell us what global interfaces are supported by the compositor. These events will come in individually, each with a wire header of:

ID=2 (the wl_registry object's ID)
OP=0 (this is the first event defined in the wl_registry interface)
LEN=N (size of WireHeader + size of the data to follow)

N here will vary between events, as it is data-dependent. The length will be: 8 (header) + 4 (name is uint) + (4 + strlen(interface) + 1 for the NUL terminator + padding) + 4 (version is uint).
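Decoding the body of one of these events could look roughly like this. It is a sketch with hypothetical names (Global, readU32, parseGlobal) that assumes the 8-byte header has already been consumed and that the string length on the wire includes the NUL terminator.

```zig
const std = @import("std");

const Global = struct {
    name: u32,
    interface: []const u8,
    version: u32,
};

/// Read a host-endian u32 at byte offset `off`.
fn readU32(bytes: []const u8, off: usize) u32 {
    var v: u32 = undefined;
    @memcpy(std.mem.asBytes(&v), bytes[off..][0..4]);
    return v;
}

/// Decode the body of a wl_registry.global event (the bytes after the
/// 8-byte header). The returned `interface` slice points into `body`.
fn parseGlobal(body: []const u8) error{Malformed}!Global {
    if (body.len < 12) return error.Malformed;
    const name = readU32(body, 0);
    const str_len = readU32(body, 4); // includes the NUL terminator
    if (str_len == 0) return error.Malformed;
    const padded = (@as(usize, str_len) + 3) & ~@as(usize, 3);
    if (body.len < 8 + padded + 4) return error.Malformed;
    return .{
        .name = name,
        .interface = body[8..][0 .. str_len - 1], // drop the NUL
        .version = readU32(body, 8 + padded),
    };
}

test "parse wl_registry.global" {
    var body: [20]u8 = undefined;
    const name: u32 = 5;
    const str_len: u32 = 7; // "wl_shm" plus NUL
    const version: u32 = 2;
    @memcpy(body[0..4], std.mem.asBytes(&name));
    @memcpy(body[4..8], std.mem.asBytes(&str_len));
    @memcpy(body[8..16], "wl_shm\x00\x00");
    @memcpy(body[16..20], std.mem.asBytes(&version));

    const g = try parseGlobal(&body);
    try std.testing.expectEqual(@as(u32, 5), g.name);
    try std.testing.expectEqualStrings("wl_shm", g.interface);
    try std.testing.expectEqual(@as(u32, 2), g.version);
}
```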

In my case running Hyprland, I get the following:

wl_registry#2.global: name=1, interface="wl_seat", version=9
wl_registry#2.global: name=2, interface="wl_data_device_manager", version=3
wl_registry#2.global: name=3, interface="wl_compositor", version=6
wl_registry#2.global: name=4, interface="wl_subcompositor", version=1
wl_registry#2.global: name=5, interface="wl_shm", version=2
/* -- Truncated -- */
wl_registry#2.global: name=61, interface="wp_color_manager_v1", version=1
wl_registry#2.global: name=62, interface="wp_drm_lease_device_v1", version=1
wl_registry#2.global: name=63, interface="wp_linux_drm_syncobj_manager_v1", version=1
wl_registry#2.global: name=64, interface="wl_drm", version=2
wl_registry#2.global: name=65, interface="zwp_linux_dmabuf_v1", version=5
wl_registry#2.global: name=66, interface="wl_output", version=4

For a minimal graphical client, we only care about a small number of these. We specifically care about:

wl_seat: This will tell us the seat's name and capabilities, and is how we access input devices, such as the pointer and keyboard.
wl_compositor: The global compositor object, which we need to get a wl_surface object.
wl_shm: The interface we will use to allocate shared memory that the compositor will be able to read and load our framebuffer image from.
xdg_wm_base: The base of xdg-shell, through which we can create an xdg_surface, which in turn will let us get an xdg_toplevel -- the "window".
zxdg_decoration_manager_v1: If you're on the likes of KDE, you cannot complete creation of a window unless you also assign it a decoration object before first commit.

Next up, we need to bind the interfaces we want. We do this with the wl_registry.bind request, defined in the spec as:

    <request name="bind">
      <description summary="bind an object to the display">
	Binds a new, client-created object to the server using the
	specified name as the identifier.
      </description>
      <arg name="name" type="uint" summary="unique numeric name of the object"/>
      <arg name="id" type="new_id" summary="bounded object"/>
    </request>

You may notice that, unlike the wl_display.get_registry request, the id parameter of type new_id does not have an associated interface. This means we have to prefix the new_id with the string name of the interface, and a uint of the version at which we want to bind the interface.

So for each bind request, the data will be:

name: the uint name value with which the interface was advertised.
interface: the string name of the interface, encoded as a 32-bit integer denoting the string's length (including the NUL terminator) followed by the NUL-terminated string, plus however many bytes of padding are needed to get to 32-bit alignment.
version: the uint version we wish to bind the interface at. To keep it simple, one can just bind with the advertised version.
id: the new_id to be assigned to the bound object.

And the associated header will be:

ID=2 (wl_registry)
OP=0 (opcode of the bind request)
LEN= (8 + size of data)

We do not have to flush each of these messages to the socket individually. All of these can be written to the outgoing buffer and then sent with a single sendmsg call.
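A complete bind message can be assembled along these lines. This is a minimal sketch with hypothetical putU32 and encodeBind helpers; all IDs and values are supplied by the caller, with name and version taken from the matching wl_registry.global event.

```zig
const std = @import("std");

/// Write a host-endian u32 into `out` at `off.*` and advance the offset.
fn putU32(out: []u8, off: *usize, v: u32) void {
    const o = off.*;
    @memcpy(out[o..][0..4], std.mem.asBytes(&v));
    off.* = o + 4;
}

/// Encode wl_registry.bind into `out` and return the message bytes.
fn encodeBind(
    out: []u8,
    registry_id: u32,
    name: u32,
    interface: []const u8,
    version: u32,
    new_id: u32,
) error{NoSpace}![]u8 {
    // String contents plus NUL, rounded up to a multiple of 4.
    const padded = (interface.len + 1 + 3) & ~@as(usize, 3);
    // header + name + (length prefix + padded string) + version + new_id
    const total = 8 + 4 + (4 + padded) + 4 + 4;
    if (out.len < total or total > std.math.maxInt(u16)) return error.NoSpace;

    var off: usize = 0;
    putU32(out, &off, registry_id);
    putU32(out, &off, (@as(u32, @intCast(total)) << 16) | 0); // bind is opcode 0
    putU32(out, &off, name);
    putU32(out, &off, @intCast(interface.len + 1)); // length counts the NUL
    @memcpy(out[off..][0..interface.len], interface);
    @memset(out[off + interface.len .. off + padded], 0);
    off += padded;
    putU32(out, &off, version);
    putU32(out, &off, new_id);
    return out[0..total];
}

test "bind wl_shm" {
    var buf: [64]u8 = undefined;
    const msg = try encodeBind(&buf, 2, 5, "wl_shm", 2, 3);
    // 8 (header) + 4 (name) + 4 + 8 (string) + 4 (version) + 4 (new_id)
    try std.testing.expectEqual(@as(usize, 32), msg.len);
    try std.testing.expectEqualSlices(u8, "wl_shm\x00\x00", msg[16..24]);
}
```

Several such messages can be appended back to back in the outgoing buffer before flushing.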

With the basic global objects bound, we can finally go about creating our window.

Creating a Window

The Wayland protocols do not lay out a single 'window' object description anywhere. What we consider a 'window' is instead a surface with the role of toplevel as defined by the desktop shell. This is where the xdg-shell protocol comes into play. To create our window we will need the following:

wl_surface: created by invoking the create_surface request on the global wl_compositor object.
xdg_surface: created by invoking the get_xdg_surface request on the global xdg_wm_base object, passing in the wl_surface as a parameter.
xdg_toplevel: created by invoking the get_toplevel request on the created xdg_surface object.
zxdg_toplevel_decoration_v1: created by invoking the get_toplevel_decoration request on the global zxdg_decoration_manager_v1 object, passing in the xdg_toplevel as a parameter.

I'll note again: GNOME's compositor, mutter, does NOT support the xdg-decoration protocol, so the zxdg_toplevel_decoration_v1 object should be omitted when running on GNOME.

Once we have created each of these objects, we can invoke the wl_surface.commit and zxdg_toplevel_decoration_v1.set_mode requests to finalize creation of the window and set the decoration mode to either server or client.

It is a protocol error to attempt to attach anything to the surface before the compositor sends an xdg_surface.configure event, to which the client program should respond by sending an xdg_surface.ack_configure request containing the uint serial value received in the configure event.
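The acknowledgement itself is a tiny message. A sketch, assuming the opcode is taken from the order of requests in xdg-shell.xml; ackConfigure is a hypothetical helper, and the object ID and serial are supplied by the caller.

```zig
const std = @import("std");

/// xdg_surface.ack_configure is the fifth request of xdg_surface
/// (destroy, get_toplevel, get_popup, set_window_geometry, ack_configure),
/// so its opcode is 4.
const xdg_surface_ack_configure: u32 = 4;

/// Build the ack_configure message as three 32-bit words.
fn ackConfigure(xdg_surface_id: u32, serial: u32) [3]u32 {
    const len: u32 = 8 + 4; // header + one uint
    return .{ xdg_surface_id, (len << 16) | xdg_surface_ack_configure, serial };
}

test "ack_configure" {
    const msg = ackConfigure(7, 1234);
    try std.testing.expectEqual(@as(u32, 7), msg[0]);
    try std.testing.expectEqual((@as(u32, 12) << 16) | 4, msg[1]);
    try std.testing.expectEqual(@as(u32, 1234), msg[2]);
}
```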

We now have a window... but nothing to draw to it.

Drawing to the Window

To actually put something on screen, as is typically the point of a graphical program, we need to create an image, create a wl_buffer object that is associated with this image, and attach the wl_buffer object to our wl_surface.

The simplest option is to map some shared memory, draw to it from the CPU, and present a software rendered frame to the compositor. To do this, we will need to use the wl_shm global object we bound earlier. We will use it to create a wl_shm_pool object, and from that create a wl_buffer, which we will attach to the wl_surface.

Creating a wl_shm_pool first requires having a shmfile available. We can use the memfd_create system call to get the fd, call ftruncate on the fd to size it appropriately for the output window, and mmap it in to be able to write the pixels of the image.

Next up is invoking wl_shm.create_pool with the fd and the size of the shmfile. As this involves sending a file descriptor, the next outgoing message will have to contain a control message buffer to deliver this fd. The standard message data will simply be the new_id for the wl_shm_pool followed by the size of the shmfile (prefixed with a WireHeader), since the fd argument takes up no space in the message body. Accompanying it will be a control message buffer containing the cmsghdr followed by the 4 bytes of the file descriptor.
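The regular (non-ancillary) part of that request could be sketched as below. createPool is a hypothetical helper; the wl_shm and wl_shm_pool IDs are whatever the client allocated, and the fd itself travels separately in the control data.

```zig
const std = @import("std");

/// wl_shm.create_pool(id: new_id, fd: fd, size: int) is opcode 0.
/// The fd argument contributes no bytes to the message body.
fn createPool(shm_id: u32, pool_id: u32, size: i32) [4]u32 {
    const len: u32 = 8 + 4 + 4; // header + new_id + size
    const size_word: u32 = @bitCast(size);
    return .{ shm_id, (len << 16) | 0, pool_id, size_word };
}

test "create_pool body" {
    // A 960x540 XRGB8888 image needs 960 * 540 * 4 bytes.
    const msg = createPool(5, 8, 960 * 540 * 4);
    try std.testing.expectEqual(@as(u32, 5), msg[0]);
    try std.testing.expectEqual(@as(u32, 16) << 16, msg[1]);
    try std.testing.expectEqual(@as(u32, 8), msg[2]);
    try std.testing.expectEqual(@as(u32, 2073600), msg[3]);
}
```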

Next we create the wl_buffer object. This will use the wl_shm_pool.create_buffer request, which is defined as follows:

    <request name="create_buffer">
      <description summary="create a buffer from the pool">
	Create a wl_buffer object from the pool.

	The buffer is created offset bytes into the pool and has
	width and height as specified.  The stride argument specifies
	the number of bytes from the beginning of one row to the beginning
	of the next.  The format is the pixel format of the buffer and
	must be one of those advertised through the wl_shm.format event.

	A buffer will keep a reference to the pool it was created from
	so it is valid to destroy the pool immediately after creating
	a buffer from it.
      </description>
      <arg name="id" type="new_id" interface="wl_buffer" summary="buffer to create"/>
      <arg name="offset" type="int" summary="buffer byte offset within the pool"/>
      <arg name="width" type="int" summary="buffer width, in pixels"/>
      <arg name="height" type="int" summary="buffer height, in pixels"/>
      <arg name="stride" type="int" summary="number of bytes from the beginning of one row to the beginning of the next row"/>
      <arg name="format" type="uint" enum="wl_shm.format" summary="buffer pixel format"/>
    </request>

Here we can pass in the new_id of the wl_buffer, offset, width, height, stride, and format of the image this wl_buffer will represent. For this the data may look something like:

const width = 960; const height = 540;
const format: wl_shm.format = .xrgb8888;
const stride = 4 * width;

const wl_buffer_id = get_next_id();
const msg_len: u32 = 8 + (@sizeOf(i32) * 4 + @sizeOf(u32) * 2);
// create_buffer is opcode 0
const msg_bytes = [_]u32 {
 wl_shm_pool_id, (msg_len<<16), wl_buffer_id,
 0, width, height, stride, u32_(format),
};

Putting a Window on Screen

Now, at long last, we are ready to present a frame to the window surface. To attach our frame to the window, we need to:

damage the surface to make sure the compositor knows to update what is being displayed -- we'll do this with wl_surface.damage_buffer(x_offset, y_offset, width, height).
attach the frame's buffer to the surface -- this, we'll do with wl_surface.attach(wl_buffer, x_offset, y_offset).
commit the changes to the surface -- which is a simple wl_surface.commit().
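The three requests above can be batched into one write. Here is a sketch with hypothetical header and presentFrame helpers; the opcodes are counted from the order of requests in wayland.xml (attach is 1, commit is 6, damage_buffer is 9), and damage_buffer assumes wl_compositor was bound at version 4 or later.

```zig
const std = @import("std");

// wl_surface request opcodes, counted from the order of requests in wayland.xml.
const op_attach: u32 = 1;
const op_commit: u32 = 6;
const op_damage_buffer: u32 = 9;

/// Build the two header words for a message with `body_words` 32-bit arguments.
fn header(id: u32, opcode: u32, body_words: u32) [2]u32 {
    const len = 8 + 4 * body_words;
    return .{ id, (len << 16) | opcode };
}

/// Encode damage_buffer + attach + commit for one frame as 32-bit words.
fn presentFrame(surface_id: u32, buffer_id: u32, width: i32, height: i32) [13]u32 {
    const w: u32 = @bitCast(width);
    const h: u32 = @bitCast(height);
    const damage = header(surface_id, op_damage_buffer, 4);
    const attach = header(surface_id, op_attach, 3);
    const commit = header(surface_id, op_commit, 0);
    return .{
        damage[0], damage[1], 0, 0, w, h, // damage_buffer(x, y, width, height)
        attach[0], attach[1], buffer_id, 0, 0, // attach(buffer, x, y)
        commit[0], commit[1], // commit()
    };
}

test "one frame" {
    const words = presentFrame(3, 9, 960, 540);
    try std.testing.expectEqual((@as(u32, 24) << 16) | 9, words[1]);
    try std.testing.expectEqual((@as(u32, 20) << 16) | 1, words[7]);
    try std.testing.expectEqual((@as(u32, 8) << 16) | 6, words[12]);
}
```

All three messages are addressed to the wl_surface, so the same object ID appears at the start of each header.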

The Pipeline

In short, here are all the steps to complete, in order, to put a software-rendered image on screen under Wayland.

Locate and connect to the Wayland socket.
Bind desired global interfaces.
Create a wl_surface, an xdg_surface, an xdg_toplevel, and, if not on GNOME, a zxdg_toplevel_decoration_v1.
Commit the wl_surface.
Wait for the xdg_surface.configure event, and ACK the configure using the serial value from the configure event.
Open and map in a shmfile and create the wl_shm_pool from the shmfile.
Create a wl_buffer from the wl_shm_pool.
Damage the wl_surface, attach the wl_buffer, and commit the wl_surface once more.

If we want to instead render our images using the GPU, we will not use the wl_shm->wl_shm_pool->wl_buffer path. We would instead use the linux-dmabuf protocol. I will cover this in a, hopefully shorter, future post.

I have also published a public sample repository showing a basic client that will open a blank window, wait for a close event, and exit once it's received. This basic client does not depend on libc at all, and can be built as a static executable. This repository can be found here.