A Deep Dive into AES67 Packet Structure and Data Flow Mechanics

March 16, 2026

By: Audio Scene

AES67 is an interoperability standard for professional audio-over-IP networks. Rather than inventing a new transport, it specifies a common set of existing protocols so that equipment from different manufacturers can exchange audio streams. Understanding its packet structure and data flow mechanics is essential for audio engineers and network administrators.

Overview of AES67 Packet Structure

At its core, AES67 packets are based on the RTP (Real-time Transport Protocol) over UDP/IP. Each packet contains headers and payload data that facilitate synchronized audio streaming across networks.

Packet Header Components

  • Ethernet Header: Handles physical addressing and framing.
  • IP Header: Manages source and destination IP addresses.
  • UDP Header: Facilitates connectionless data transfer.
  • RTP Header: Carries the payload type, sequence number, timestamp, and synchronization source (SSRC) identifier.
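
To make the RTP layer concrete, here is a minimal sketch in Python that unpacks the 12-byte fixed RTP header defined in RFC 3550. The function name, the RtpHeader type, and the example packet values (payload type 96, sequence number 1000) are assumptions chosen for illustration; they do not come from any particular AES67 device.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header from a UDP payload."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Hypothetical packet: RTP version 2, payload type 96, six bytes of silence.
example = struct.pack("!BBHII", 0x80, 96, 1000, 48000, 0x12345678) + b"\x00" * 6
header = parse_rtp_header(example)
print(header.version, header.payload_type, header.sequence_number)
```

Everything after the fixed header (and any CSRC entries or header extension, which this sketch does not skip over) is the audio payload.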

Payload Data

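The payload carries uncompressed linear PCM audio samples. AES67 requires support for 16-bit (L16) and 24-bit (L24) encoding at a 48 kHz sampling rate; other sample rates are optional. The payload size does not adapt to network conditions. It is fixed for a given stream by the sample rate, bit depth, channel count, and packet time (the duration of audio carried in each packet, with 1 ms being the value every device must support).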
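
As a rough worked example, the payload size of a linear PCM stream can be estimated from four parameters: sample rate, the amount of audio carried per packet (the packet time), channel count, and bit depth. The sketch below is plain arithmetic under those assumptions; the function name and the example stream settings are illustrative only.

```python
def payload_bytes(sample_rate_hz: int, packet_time_us: int,
                  channels: int, bits_per_sample: int) -> int:
    """Return the PCM payload size in bytes for one RTP packet."""
    # Number of sample frames that fit into one packet time.
    samples_per_packet = sample_rate_hz * packet_time_us // 1_000_000
    return samples_per_packet * channels * (bits_per_sample // 8)


# Stereo, 24-bit, 48 kHz audio with 1 ms of audio per packet:
# 48 samples x 2 channels x 3 bytes.
print(payload_bytes(48_000, 1_000, 2, 24))

# The same stream with 125 microseconds of audio per packet.
print(payload_bytes(48_000, 125, 2, 24))
```

The RTP (12 bytes), UDP (8 bytes), and IPv4 (20 bytes) headers add a further 40 bytes to each packet, so shorter packet times lower latency at the cost of proportionally more header overhead.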

Data Flow Mechanics in AES67

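The data flow in AES67 networks relies on precise timing and synchronization. The standard uses the IEEE 1588-2008 Precision Time Protocol (PTPv2) to synchronize all devices to a common clock. Each device derives its media clock from that shared time, so RTP timestamps from different senders refer to the same timeline and receivers can align playout with a small, predictable buffer.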
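
One way to picture the role of the common clock is to compute an RTP timestamp directly from PTP time. The following is a simplified sketch, assuming the media clock advances at the sample rate from the PTP epoch and that any fixed offset is the value a sender would signal in SDP as a=mediaclk:direct (RFC 7273). The function name and the input values are hypothetical.

```python
def rtp_timestamp(ptp_time_ns: int, sample_rate_hz: int, offset: int = 0) -> int:
    """Map a PTP time (nanoseconds since the PTP epoch) to a 32-bit RTP timestamp."""
    # Count whole sample periods elapsed since the epoch.
    samples_since_epoch = ptp_time_ns * sample_rate_hz // 1_000_000_000
    # RTP timestamps are 32 bits wide, so the value wraps around.
    return (samples_since_epoch + offset) % 2**32


# One second after the epoch at 48 kHz corresponds to 48000 samples.
print(rtp_timestamp(1_000_000_000, 48_000))

# At 48 kHz the 32-bit timestamp wraps roughly every 24.9 hours,
# so a later time maps to a wrapped value.
print(rtp_timestamp(100_000 * 1_000_000_000, 48_000))
```

Because every device evaluates the same relationship against the same clock, two senders that sample audio at the same instant produce matching timestamps, which is what lets a receiver line up streams from different sources.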

Stream Initialization

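Senders describe each stream with SDP (Session Description Protocol), which lists the destination address, port, encoding, packet time, and clock reference. AES67 itself does not mandate a discovery method; in practice, devices often announce these descriptions using SAP (Session Announcement Protocol) or SAP-like mechanisms. Once a receiver has the description, it joins the stream and aligns playback to the shared clock.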
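
Session announcements of this kind typically carry an SDP (Session Description Protocol) text description of the stream. The sketch below parses a hand-written, hypothetical description for the few fields a receiver needs first. The addresses, port, session name, and clock identity are made up for illustration, and the parser is deliberately minimal rather than a complete SDP implementation.

```python
# Hypothetical SDP description of a stereo L24 stream at 48 kHz.
SDP_EXAMPLE = """\
v=0
o=- 1 1 IN IP4 192.0.2.10
s=Hypothetical AES67 stream
c=IN IP4 239.69.1.1/32
t=0 0
m=audio 5004 RTP/AVP 96
a=rtpmap:96 L24/48000/2
a=ptime:1
a=ts-refclk:ptp=IEEE1588-2008:00-11-22-FF-FE-33-44-55:0
a=mediaclk:direct=0
"""


def parse_sdp_audio(sdp: str) -> dict:
    """Extract port, payload type, encoding, rate, channels, and packet time."""
    info = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio"):
            # m=audio <port> <transport> <payload type>
            parts = line.split()
            info["port"] = int(parts[1])
            info["payload_type"] = int(parts[3])
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding>/<rate>[/<channels>]
            _, fmt = line[len("a=rtpmap:"):].split(" ", 1)
            fields = fmt.split("/")
            info["encoding"] = fields[0]
            info["sample_rate_hz"] = int(fields[1])
            info["channels"] = int(fields[2]) if len(fields) > 2 else 1
        elif line.startswith("a=ptime:"):
            info["packet_time_ms"] = float(line[len("a=ptime:"):])
    return info


print(parse_sdp_audio(SDP_EXAMPLE))
```

The ts-refclk and mediaclk attributes, defined in RFC 7273, tell the receiver which PTP clock the sender follows and how RTP timestamps relate to it; this sketch leaves them unparsed.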

Packet Transmission and Reception

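Audio data packets are transmitted at a fixed interval, with sequence numbers and timestamps embedded in the RTP headers. Because UDP guarantees neither delivery nor ordering, receivers use the sequence numbers to detect lost or reordered packets and the timestamps to place each block of samples at the correct point in the output stream.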
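
As an illustration of how a receiver might use sequence numbers, the sketch below counts missing packets while allowing for the 16-bit sequence counter wrapping from 65535 back to 0. The function names are hypothetical and the logic is intentionally simple: it counts gaps as packets arrive and ignores late or duplicate packets rather than crediting them back, which a production jitter buffer would handle more carefully.

```python
from typing import Iterable


def seq_delta(current: int, previous: int) -> int:
    """Signed distance from previous to current in 16-bit sequence space."""
    return ((current - previous + 0x8000) & 0xFFFF) - 0x8000


def count_lost(sequence_numbers: Iterable[int]) -> int:
    """Count packets skipped between consecutive in-order arrivals."""
    lost = 0
    previous = None
    for seq in sequence_numbers:
        if previous is not None:
            delta = seq_delta(seq, previous)
            if delta <= 0:
                # Late or duplicate packet: do not move the reference point.
                continue
            lost += delta - 1
        previous = seq
    return lost


# The counter wraps from 65535 to 0; packets 2 and 3 never arrive.
print(count_lost([65534, 65535, 0, 1, 4]))
```

The same modular arithmetic applies to RTP timestamps, which are 32 bits wide and also wrap during long-running streams.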

Conclusion

Understanding the AES67 packet structure and data flow mechanics is vital for deploying reliable audio-over-IP systems. Proper synchronization, packet management, and network configuration ensure high-quality, low-latency audio transmission across complex network environments.