Debugging Type-Based Alias Analysis optimizations in BPF

2026-02-28

During a seemingly routine update to an existing BPF program, my colleague and I noticed some of our internal integration tests began to fail due to packet drops. Maybe a transient network issue? Rerunning didn't help. Try sending different types of packets? Same issue.

We need to repro this ourselves. After manually sending some ICMP packets inside a VM, tcpdump pointed us right to the issue: bad L3 checksums.

16:32:50.160124 ext0  In  IP (tos 0x88, ttl 62, id 12290, offset 0, flags [DF], proto GRE (47), length 84, bad cksum 5944 (->58bc)!)
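For reference, the check tcpdump performs amounts to recomputing the RFC 1071 one's-complement sum over the IPv4 header. Below is a minimal userspace sketch of that check. The function names are hypothetical and are not part of our BPF program. It assumes an even header length, which always holds for IPv4 because the header length is IHL * 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over an IPv4 header of `len` bytes (hypothetical
 * helper). Bytes are summed as big-endian 16-bit words. To compute a fresh
 * checksum, the caller must zero the checksum field (bytes 10-11) first. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    /* Fold carries out of the upper 16 bits back into the lower 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* A header is valid when summing it *including* the stored checksum gives
 * 0xffff, i.e. the complemented result is 0. */
static int ipv4_header_checksum_ok(const uint8_t *hdr, size_t len)
{
    return ipv4_header_checksum(hdr, len) == 0;
}
```

Changing any header byte without fixing the checksum field makes `ipv4_header_checksum_ok` fail. That is what tcpdump reports as `bad cksum`.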

The update itself was nothing of significance. We merely switched the program to use BTF-generated vmlinux.h definitions instead of depending on specific kernel headers:

$ diff old.bpf.c new.bpf.c
1,5c1
< #include "linux/bpf.h"
< #include "linux/in.h"
< #include "linux/ip.h"
< #include "linux/ipv6.h"
< #include "linux/udp.h"
---
> #include "vmlinux.h"

Ok, let's inspect the code to see if we're modifying the IP header anywhere without recomputing the checksum. The only occurrence is an update to the Type of Service (ToS) field, but we do recompute the checksum immediately after:

/*
 * Get the current value (from) before writing the new value (to) in order to
 * correctly recompute the L3 checksum of the ip header. We need to use 16-bit
 * values in order to include the adjacent byte together with the ToS as the
 * checksum is computed using two bytes.
 */
uint16_t from = *((uint16_t *)iph);
iph->tos = 0x88;
uint16_t to = *((uint16_t *)iph);
int ret = bpf_l3_csum_replace(skb, (ETH_HLEN + offsetof(struct iphdr, check)),
                              from, to, 2);
if (ret < 0) {
  return TC_ACT_PIPE;
}

I wrote this code recently, and a quick read of the preceding comment reminded me why we read the from and to values in this manner. The IPv4 checksum algorithm operates on 16-bit words (2 bytes), not single bytes. Therefore, any incremental update to the checksum must account for the difference (delta) between the old 16-bit word and the new one. The ToS value is stored in the second byte of the header, so we use *((uint16_t *)iph) to read bytes 0 and 1.
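The word-based delta can be illustrated with RFC 1624's incremental update, HC' = ~(~HC + ~m + m'), where m and m' are the old and new 16-bit words. The sketch below is hypothetical and runs in userspace, not the kernel's `bpf_l3_csum_replace` implementation. It works on host-order values of big-endian header words, whereas the helper in the program above receives raw network-order words.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'). `from` and `to` are the old and
 * new values of the single 16-bit header word that changed (hypothetical
 * helper, host-order values). */
static uint16_t csum_update16(uint16_t check, uint16_t from, uint16_t to)
{
    uint32_t sum = (uint16_t)~check;
    sum += (uint16_t)~from;
    sum += to;
    return (uint16_t)~csum_fold32(sum);
}
```

When `to` equals `from`, the checksum is returned unchanged. That is exactly what happens if the compiler hands the helper a stale copy of the word. The tcpdump output earlier is consistent with this arithmetic, assuming the original ToS byte was 0x00 (with a 0x45 version/IHL byte): `csum_update16(0x5944, 0x4500, 0x4588)` gives 0x58bc, the value tcpdump expected.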

We know the old program didn't exhibit this problem, so we need to check whether there's a difference in the bytecode. First, we'll compile the new BPF program and disassemble the object file so we can inspect the generated instructions:

$ clang -target bpf -c -O2 -g -nostdinc prog.bpf.c -o prog.o
$ llvm-objdump --disassemble --source prog.o > prog.asm

If we search for our update to the ToS field, something looks strange: we never reread to after writing the new value to the IP header. Instead, the compiler simply copies r3 (from) into r4 (to) without touching memory again:

;   uint16_t from = *((uint16_t *)iph);
      43:	69 13 00 00 00 00 00 00	r3 = *(u16 *)(r1 + 0x0)
      44:	b7 02 00 00 88 00 00 00	r2 = 0x88
;   iph->tos = 0x88;
      45:	73 21 01 00 00 00 00 00	*(u8 *)(r1 + 0x1) = r2
      46:	b7 02 00 00 0a 00 00 00	r2 = 0xa
;   int ret = bpf_l3_csum_replace(skb, (ETH_HLEN + offsetof(struct iphdr, check)),
      47:	07 02 00 00 0e 00 00 00	r2 += 0xe
      48:	bf 71 00 00 00 00 00 00	r1 = r7
      49:	bf 34 00 00 00 00 00 00	r4 = r3
      50:	b7 05 00 00 02 00 00 00	r5 = 0x2
      51:	85 00 00 00 0a 00 00 00	call 0xa
      52:	18 01 00 00 00 00 00 80 00 00 00 00 00 00 00 00 r1 = 0x80000000 ll

Ok, the compiler is optimizing away the second 16-bit read because it thinks the value hasn't changed. This suggests the cause is C's strict aliasing rules, which Clang relies on when optimizing. Because uint16_t * and struct iphdr * are completely different types, Clang assumes that modifying iph->tos does absolutely nothing to the memory referenced by ((uint16_t *)iph). Since Clang thinks the underlying 16-bit integer hasn't been altered by the struct assignment (which it has), it optimizes away the second memory read and simply reuses the cached value of from.
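The textbook version of this pattern is shown below. It is a generic illustration, not code from our program. Under strict aliasing, an int object may not be modified through a float lvalue, so an optimizing compiler is allowed to assume the store to `*f` leaves `*i` untouched.

```c
#include <assert.h>

/* Because int and float are incompatible types, the compiler may assume the
 * store through `f` cannot modify `*i`. At -O2 (without -fno-strict-aliasing)
 * it is therefore free to return the constant 1 instead of reloading `*i`. */
static int store_then_reload(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return *i;
}
```

Called with two distinct objects, as in the usage below, the function is well-defined and returns 1. The optimization only becomes observable when a caller makes both pointers refer to the same memory, which is the situation our ToS update created.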

We can verify this by passing the -fno-strict-aliasing flag to Clang, which disables these type-based optimizations. This works, and the L3 checksum is now correctly recomputed. But we don't want to give up these optimizations across the whole program, as some of them are in fact useful!

Instead, we can use READ_ONCE (which, in its usual definitions, performs the load through a volatile-qualified pointer) to tell the compiler it must reread the memory:

uint16_t from = READ_ONCE(*((uint16_t *)iph));
iph->tos = 0x88;
uint16_t to = READ_ONCE(*((uint16_t *)iph));

And as we can see in the assembly code, we no longer reuse the cached value of r3 (from) for r4 (to):

;   uint16_t from = *((uint16_t *)iph);
      43:	69 13 00 00 00 00 00 00	r3 = *(u16 *)(r1 + 0x0)
      44:	b7 02 00 00 88 00 00 00	r2 = 0x88
;   iph->tos = 0x88;
      45:	73 21 01 00 00 00 00 00	*(u8 *)(r1 + 0x1) = r2
;   uint16_t to = *((uint16_t *)iph);
      46:	69 14 00 00 00 00 00 00	r4 = *(u16 *)(r1 + 0x0)
      47:	b7 02 00 00 0a 00 00 00	r2 = 0xa
;   int ret = bpf_l3_csum_replace(skb, (ETH_HLEN + offsetof(struct iphdr, check)),
      48:	07 02 00 00 0e 00 00 00	r2 += 0xe
      49:	bf 71 00 00 00 00 00 00	r1 = r7
      50:	b7 05 00 00 02 00 00 00	r5 = 0x2
      51:	85 00 00 00 0a 00 00 00	call 0xa
      52:	18 01 00 00 00 00 00 80 00 00 00 00 00 00 00 00 r1 = 0x80000000 ll
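The core idea behind READ_ONCE can be sketched as a volatile-qualified load. The macro below is a simplified, hypothetical stand-in named `READ_ONCE_SKETCH` to avoid clashing with real definitions. It assumes GCC/Clang's `__typeof__` and omits the type checks and qualifier stripping of the kernel's macro. In this userspace example the byte store goes through an `unsigned char` pointer, which is always allowed to alias, so the volatile reads only demonstrate the mechanism of forcing two separate loads.

```c
#include <assert.h>
#include <stdint.h>

/* Read `x` exactly once through a volatile-qualified pointer, so the compiler
 * can neither cache the value from an earlier load nor elide this one. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical helper mirroring the from/write/to sequence in the program. */
static void reread_after_byte_store(uint16_t *word, uint8_t second_byte,
                                    uint16_t *from, uint16_t *to)
{
    *from = READ_ONCE_SKETCH(*word);
    /* Overwrite the second byte, like iph->tos = 0x88 does. */
    ((unsigned char *)word)[1] = second_byte;
    *to = READ_ONCE_SKETCH(*word);
}
```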

At this point we have a fix, but still need to understand why this happened. My hunch points me toward LLVM's Alias Analysis. From the docs:

Alias Analysis (aka Pointer Analysis) is a class of techniques which attempt to determine whether or not two pointers ever can point to the same object in memory.

Because this relates to updating the iphdr struct, we need to compare the original definition with the one in the BTF-generated vmlinux.h. Surely it can't have changed? This struct is very much set in stone.

Immediately after opening vmlinux.h, I see the following:

#ifndef BPF_NO_PRESERVE_ACCESS_INDEX
#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
#endif

As per the docs, this attribute is specific to BPF:

This attribute may be attached to a struct or union declaration, where if -g is specified, it enables preserving struct or union member access debuginfo indices of this struct or union

This must be changing the way Clang generates LLVM IR.

Without this attribute, we read 2 bytes at offset 0 (bytes 0 and 1) to get the from value, then write the new ToS value at byte offset 1. Clang can calculate that these read and write offsets overlap, so it knows the second read may observe the new value; it can't optimize that read away.
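When both offsets are visible constants relative to the same base pointer, the question boils down to an interval-overlap test. The toy sketch below is not LLVM's alias analysis code. `struct iphdr_prefix` is a hypothetical stand-in for the start of struct iphdr, with version and IHL packed into one byte instead of bitfields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout mirroring the first four bytes of struct iphdr. */
struct iphdr_prefix {
    uint8_t version_ihl;
    uint8_t tos;
    uint16_t tot_len;
};

/* Accesses [a_off, a_off + a_size) and [b_off, b_off + b_size) from the same
 * base pointer overlap iff each one starts before the other one ends. */
static bool ranges_overlap(size_t a_off, size_t a_size,
                           size_t b_off, size_t b_size)
{
    return a_off < b_off + b_size && b_off < a_off + a_size;
}
```

Applied to our case, the 2-byte load at offset 0 overlaps the 1-byte store to `tos` at offset 1. It would not overlap a 2-byte store to `tot_len` at offset 2.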

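With CO-RE, the address of iph->tos is no longer computed by a plain GEP (getelementptr) instruction with a constant offset. Instead, it goes through the LLVM intrinsic llvm.preserve.struct.access.index, and the field offset becomes something resolved via a CO-RE relocation rather than a compile-time constant. Alias analysis can therefore no longer see that the two accesses overlap. That leaves only the type-based rules, under which a uint16_t load and a store to a struct iphdr member are assumed not to alias.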

Upon reflection, I think the original code is technically a strict aliasing violation. The effective type (struct iphdr) and the access type (uint16_t) don't meet the criteria the standard allows for aliasing. But reading the C standard is a task for another day.
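For completeness, the usual standards-conforming way in C to read an object's bytes as a different type is memcpy, which copies the object representation. Below is a minimal userspace sketch. `struct hdr_prefix` and `first_word` are hypothetical, and a BPF program would typically spell the copy `__builtin_memcpy`. Whether this also sidesteps the CO-RE behaviour described above has not been checked here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the first two bytes of an IPv4 header. */
struct hdr_prefix {
    uint8_t version_ihl;
    uint8_t tos;
};

/* Read the first 16-bit word without type punning: memcpy into a local
 * uint16_t is well-defined for any source object type. */
static uint16_t first_word(const void *hdr)
{
    uint16_t w;
    memcpy(&w, hdr, sizeof(w));
    return w;
}
```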
