Overview

The Gossip protocol in Agave implements a distributed, eventually-consistent communication layer that allows validators to share cluster-wide information. It uses a Cluster Replicated Data Store (CRDS) with push and pull mechanisms to propagate data efficiently across thousands of nodes without centralized coordination.

Architecture

Core Components

  • ClusterInfo (cluster_info.rs:150-200): The main gossip coordinator; manages the local CRDS, handles incoming and outgoing messages, and maintains network topology.
  • CRDS (crds.rs:66-85): The replicated data store mapping CrdsValueLabel -> CrdsValue, supporting partial updates and concurrent access.
  • CrdsGossip: Coordinates the push and pull gossip strategies.
  • CrdsGossipPull (crds_gossip_pull.rs:1-150): Implements anti-entropy pull requests using bloom filters.
  • CrdsGossipPush: Implements eager push dissemination to a subset of peers.
  • ContactInfo (contact_info.rs): Contains node identity and network endpoints (gossip, TVU, TPU, etc.).

Network Topology

Gossip organizes the network in layers:
  • Layer 0: Leader nodes
  • Layer 1: As many nodes as possible (efficient fan-out)
  • Layer 2: Remaining nodes (can fit 2^20 nodes if layer 1 has 2^10)
This layered approach enables efficient message propagation even in large clusters.
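To make the layer sizes above concrete, here is an illustrative calculation (not Agave code) of how a fixed fan-out bounds the number of nodes reachable at each layer; the treatment of layer 0 as a single origin is a simplifying assumption of this sketch.

```rust
// Illustrative only: with a fixed fan-out, each layer multiplies the
// number of reachable nodes. Constants here are examples, not Agave's.
fn layer_capacity(fanout: usize, layer: u32) -> usize {
    fanout.pow(layer)
}

fn main() {
    // With a fan-out of 2^10, layer 1 holds 1024 nodes and layer 2
    // can cover 2^20 nodes, matching the figures in the text.
    assert_eq!(layer_capacity(1024, 1), 1 << 10);
    assert_eq!(layer_capacity(1024, 2), 1 << 20);
    println!("layer 2 capacity: {}", layer_capacity(1024, 2));
}
```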

CRDS (Cluster Replicated Data Store)

Data Model

From crds.rs:66-85, the CRDS stores versioned values:
pub struct Crds {
    table: IndexMap<CrdsValueLabel, VersionedCrdsValue>,
    cursor: Cursor,              // Next insert ordinal
    shards: CrdsShards,          // Sharded for parallel access
    nodes: IndexSet<usize>,      // Indices of ContactInfo entries
    votes: BTreeMap<u64, usize>, // Vote entries by insert order
    epoch_slots: BTreeMap<u64, usize>,
    duplicate_shreds: BTreeMap<u64, usize>,
    records: HashMap<Pubkey, IndexSet<usize>>, // All data per node
    entries: BTreeMap<u64, usize>,
    purged: VecDeque<(Hash, u64)>, // Recently purged values
}
Key characteristics:
  • Partial Updates: Each CrdsValueLabel maps to one CrdsValue
  • Non-Atomic: Full record updates are not atomic
  • Versioned: Each value has wallclock timestamp and local metadata
  • Sharded: CRDS_SHARDS_BITS = 12 (4096 shards) for parallelism
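The sharding scheme can be sketched as routing each value to a shard by the leading bits of its hash; the `shard_index` function below is a hypothetical illustration of that idea, not Agave's implementation.

```rust
// Hypothetical sketch: route a value to one of 2^SHARD_BITS shards by
// the leading bits of its 64-bit hash prefix (CRDS_SHARDS_BITS = 12
// gives 4096 shards, enabling parallel filter construction).
const SHARD_BITS: u32 = 12;

fn shard_index(hash_prefix: u64) -> usize {
    (hash_prefix >> (64 - SHARD_BITS)) as usize
}

fn main() {
    assert_eq!(1usize << SHARD_BITS, 4096);
    assert_eq!(shard_index(0), 0);          // zero hash -> first shard
    assert_eq!(shard_index(u64::MAX), 4095); // all-ones hash -> last shard
}
```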

CrdsValue Types

CRDS stores various cluster information:
  • ContactInfo: Node identity and network endpoints
  • Vote: Validator vote state
  • LowestSlot: Oldest slot a node has available
  • SnapshotHashes: Available snapshot information
  • EpochSlots: Slots the node has observed
  • DuplicateShred: Evidence of slashing conditions

Merge Strategy

From crds.rs:187-200, values are merged using the overrides function:
fn overrides(value: &CrdsValue, other: &VersionedCrdsValue) -> bool {
    // ContactInfo uses special override logic
    if let CrdsData::ContactInfo(value) = value.data()
        && let CrdsData::ContactInfo(other) = other.value.data()
        && let Some(out) = value.overrides(other)
    {
        return out;
    }
    match value.wallclock().cmp(&other.value.wallclock()) {
        Ordering::Less => false,
        Ordering::Greater => true,
        // On tie, compare hashes
    }
}
Newer wallclock times win; ties broken by hash comparison.
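The full rule, including the tie-breaking arm elided from the excerpt, can be sketched with a stand-in type; `Candidate` and its `hash` field are simplifications, not Agave's `CrdsValue`.

```rust
use std::cmp::Ordering;

// Minimal sketch of the wallclock-then-hash merge rule described above.
struct Candidate {
    wallclock: u64,
    hash: u64, // stand-in for the value's content hash
}

fn overrides(new: &Candidate, old: &Candidate) -> bool {
    match new.wallclock.cmp(&old.wallclock) {
        Ordering::Less => false,
        Ordering::Greater => true,
        // Tie: break deterministically by hash so every node in the
        // cluster converges on the same winner.
        Ordering::Equal => new.hash > old.hash,
    }
}

fn main() {
    let old = Candidate { wallclock: 100, hash: 7 };
    assert!(overrides(&Candidate { wallclock: 101, hash: 0 }, &old)); // newer wins
    assert!(!overrides(&Candidate { wallclock: 99, hash: 9 }, &old)); // older loses
    assert!(overrides(&Candidate { wallclock: 100, hash: 8 }, &old)); // tie -> hash
}
```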

Versioned CRDS Values

From crds.rs:121-132:
pub struct VersionedCrdsValue {
    ordinal: u64,                // Insert order
    pub value: CrdsValue,        // Actual data
    pub(crate) local_timestamp: u64,
    num_push_recv: Option<u8>,   // Tracking push duplicates
}
The num_push_recv field tracks message origin:
  • None: Local message or pull request
  • Some(0): Received via pull response
  • Some(k) where k > 0: Received via push with k-1 duplicates
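A hypothetical decoder for this convention makes the three cases explicit; the `origin` function below is illustrative and not part of Agave.

```rust
// Decode the num_push_recv convention listed above into an origin
// label plus the number of duplicate push copies observed.
fn origin(num_push_recv: Option<u8>) -> (&'static str, u8) {
    match num_push_recv {
        None => ("local or pull request", 0),
        Some(0) => ("pull response", 0),
        Some(k) => ("push", k - 1), // k-1 duplicates after the first copy
    }
}

fn main() {
    assert_eq!(origin(None).0, "local or pull request");
    assert_eq!(origin(Some(0)).0, "pull response");
    assert_eq!(origin(Some(3)), ("push", 2)); // pushed, 2 duplicates seen
}
```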

Gossip Pull Protocol

Anti-Entropy Mechanism

The pull protocol implements anti-entropy to ensure eventual consistency:
  1. Construct Bloom Filter: Create bloom filter of local CRDS data
  2. Send Pull Request: Ask random peer for data not in bloom filter
  3. Receive Response: Peer sends values not matching bloom filter
  4. Merge Data: Integrate new values into local CRDS
From crds_gossip_pull.rs:1-13:
Basic strategy:
1. Construct a bloom filter of the local data set
2. Randomly ask a node on the network for data not in the bloom filter

Bloom filters have a false positive rate. Each request uses a different
bloom filter with random hash functions for different false positive
distributions.
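The request/response exchange can be illustrated with a toy bloom filter built on the standard library's hasher; the structure, sizes, and string-valued items below are simplifications for illustration, not Agave's `CrdsFilter`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy bloom filter illustrating the pull strategy quoted above: the
// requester sends a filter of what it already has; the peer replies
// only with values that don't match the filter.
struct Bloom {
    bits: Vec<bool>,
    keys: Vec<u64>, // one seed per hash function
}

impl Bloom {
    fn new(num_bits: usize, keys: Vec<u64>) -> Self {
        Bloom { bits: vec![false; num_bits], keys }
    }
    fn pos(&self, item: &str, key: u64) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        item.hash(&mut h);
        (h.finish() as usize) % self.bits.len()
    }
    fn add(&mut self, item: &str) {
        let keys = self.keys.clone();
        for k in keys {
            let p = self.pos(item, k);
            self.bits[p] = true;
        }
    }
    fn contains(&self, item: &str) -> bool {
        self.keys.iter().all(|&k| self.bits[self.pos(item, k)])
    }
}

fn main() {
    let mut filter = Bloom::new(1024, vec![1, 2, 3]);
    filter.add("value-a"); // requester already holds value-a
    assert!(filter.contains("value-a"));

    // Peer side: send back only values the requester (probably) lacks.
    let peer_data = ["value-a", "value-b"];
    let response: Vec<&str> = peer_data
        .iter()
        .copied()
        .filter(|v| !filter.contains(v))
        .collect();
    assert!(!response.contains(&"value-a")); // known value filtered out
    println!("peer sends: {:?}", response);
}
```

False positives cause the peer to withhold a value the requester actually lacks, which is why each round uses fresh random hash functions: a value suppressed by one filter is unlikely to be suppressed by the next.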

Bloom Filter Configuration

From crds_gossip_pull.rs:53-57:
const CRDS_GOSSIP_PULL_CRDS_TIMEOUT_MS: u64 = 15000;
const FAILED_INSERTS_RETENTION_MS: u64 = 20_000;
const FALSE_RATE: f64 = 0.1;  // 10% false positive rate
const KEYS: f64 = 8.0;        // 8 hash functions

CrdsFilter

The CrdsFilter structure (crds_gossip_pull.rs:60-83):
pub struct CrdsFilter {
    pub filter: Bloom<Hash>,
    mask: u64,        // Mask for filtering subset of data
    mask_bits: u32,   // Number of mask bits
}
The mask enables partial CRDS synchronization:
  • Higher mask_bits = smaller subset of data
  • Allows gradual sync of large datasets
  • Computed from ratio of items to max capacity
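The mask idea can be sketched as partitioning the hash space into 2^mask_bits slices, with each filter covering only values whose hash prefix matches its mask; `matches_mask` below is an illustrative assumption about the scheme, not Agave's exact code.

```rust
// With `mask_bits` bits, the hash space splits into 2^mask_bits
// partitions; a filter only covers values whose hash prefix equals
// its mask. More mask bits -> each filter covers a smaller slice.
fn matches_mask(hash_prefix: u64, mask: u64, mask_bits: u32) -> bool {
    if mask_bits == 0 {
        return true; // a single filter covers everything
    }
    (hash_prefix >> (64 - mask_bits)) == (mask >> (64 - mask_bits))
}

fn main() {
    // With 2 mask bits there are 4 partitions; a hash whose top bits
    // are `01` only matches the filter for partition 1.
    let hash = 0b01u64 << 62;
    assert!(matches_mask(hash, 0b01u64 << 62, 2));
    assert!(!matches_mask(hash, 0b10u64 << 62, 2));
    assert!(matches_mask(hash, 0, 0)); // zero bits: everything matches
}
```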

Pull Request Flow

From crds_gossip_pull.rs:67-73:
pub struct PullRequest {
    pub pubkey: Pubkey,      // Remote node's identity
    pub addr: SocketAddr,    // Request source address
    pub wallclock: u64,      // Remote node's wallclock
    pub filter: CrdsFilter,  // Bloom filter of remote's data
}
Pull interval: Every 5 gossip rounds (PULL_REQUEST_PERIOD = 5)

Gossip Push Protocol

Eager Push Dissemination

The push protocol eagerly sends new data to peers:
  1. New Local Value: Node creates or receives new CRDS value
  2. Select Peers: Choose subset of cluster based on stake weight
  3. Push Message: Send value to selected peers
  4. Prune Messages: Peers can request to be removed from push path
Push messages have bounded size (PUSH_MESSAGE_MAX_PAYLOAD_SIZE).

Push Deduplication

The CRDS tracks push duplicates in VersionedCrdsValue.num_push_recv:
  • First push: num_push_recv = Some(1)
  • Subsequent pushes: Increment counter
  • Metrics track redundant pushes for monitoring

Stake-Weighted Selection

Peers are selected for push based on stake weight:
  • Higher stake nodes more likely to be selected
  • Ensures critical validators receive updates quickly
  • Uses WeightedShuffle for stake-proportional randomization
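A minimal sketch of stake-proportional sampling (not Agave's `WeightedShuffle`): pick an index by walking cumulative stakes with a caller-supplied random value in `[0, total_stake)`.

```rust
// Sample an index with probability proportional to its stake.
fn weighted_pick(stakes: &[u64], r: u64) -> usize {
    let mut acc = 0u64;
    for (i, &s) in stakes.iter().enumerate() {
        acc += s;
        if r < acc {
            return i;
        }
    }
    stakes.len() - 1 // r out of range: fall back to the last index
}

fn main() {
    let stakes = [10, 30, 60]; // total = 100
    // r in [0,10) -> node 0; [10,40) -> node 1; [40,100) -> node 2,
    // so higher-stake nodes are proportionally more likely.
    assert_eq!(weighted_pick(&stakes, 5), 0);
    assert_eq!(weighted_pick(&stakes, 15), 1);
    assert_eq!(weighted_pick(&stakes, 99), 2);
}
```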

Cluster Discovery

Entrypoints

From cluster_info.rs:156:
pub struct ClusterInfo {
    pub gossip: CrdsGossip,
    keypair: ArcSwap<Keypair>,
    entrypoints: RwLock<Vec<ContactInfo>>,  // Bootstrap nodes
    my_contact_info: RwLock<ContactInfo>,
    // ...
}
Entrypoints are well-known bootstrap nodes:
  • Used for initial cluster discovery
  • Node sends pull requests to entrypoints
  • Learns about other cluster members from responses

Contact Info Propagation

Each node publishes its ContactInfo containing:
  • Pubkey: Node identity
  • Gossip Socket: Where to send gossip messages
  • TVU Socket: Transaction Validation Unit endpoint
  • TVU Forwards: For forwarding shreds
  • TPU Socket: Transaction Processing Unit endpoint
  • TPU Forwards: For forwarding transactions
  • RPC Socket: JSON RPC endpoint (if enabled)
  • Wallclock: Timestamp for versioning
ContactInfo is updated when endpoints change or node restarts.

Gossip Timing

Core Intervals

From cluster_info.rs:97-102:
const GOSSIP_SLEEP_MILLIS: u64 = 100;  // 100ms between gossip rounds
const PULL_REQUEST_PERIOD: usize = 5;   // Pull every 5 rounds (500ms)
Gossip loop executes every 100ms:
  • Process incoming messages
  • Send push messages (every round)
  • Send pull requests (every 5 rounds)
  • Prune inactive peers
  • Update metrics
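The round cadence can be sketched directly from the two constants above: push work happens every 100ms round, while pull requests fire only on every fifth round.

```rust
// Sketch of the gossip round cadence using the constants from the text.
const GOSSIP_SLEEP_MILLIS: u64 = 100;
const PULL_REQUEST_PERIOD: usize = 5;

/// Returns (send_push, send_pull) for a given round number.
fn actions_for_round(round: usize) -> (bool, bool) {
    let push = true;                             // push every 100ms round
    let pull = round % PULL_REQUEST_PERIOD == 0; // pull every 500ms
    (push, pull)
}

fn main() {
    // Over one second (10 rounds), pulls fire on rounds 0 and 5.
    let pulls = (0..10).filter(|&r| actions_for_round(r).1).count();
    assert_eq!(pulls, 2);
    assert_eq!(GOSSIP_SLEEP_MILLIS * PULL_REQUEST_PERIOD as u64, 500);
}
```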

Timeouts and Cleanup

  • CRDS Timeout: 15 seconds (CRDS_GOSSIP_PULL_CRDS_TIMEOUT_MS)
  • Ping Cache TTL: 1280 seconds (GOSSIP_PING_CACHE_TTL)
  • Failed Inserts: 20 seconds (FAILED_INSERTS_RETENTION_MS)

Network Layer

Socket Management

ClusterInfo manages multiple UDP sockets:
  • Gossip Socket: Primary gossip communication
  • TVU Sockets: Multiple sockets for receiving shreds
    • Minimum: 1 (MINIMUM_NUM_TVU_RECEIVE_SOCKETS)
    • Default: 1 (DEFAULT_NUM_TVU_RECEIVE_SOCKETS)
  • TVU Retransmit: 12 sockets default (DEFAULT_NUM_TVU_RETRANSMIT_SOCKETS)

Channel Capacity

From cluster_info.rs:106-121:
const CHANNEL_CONSUME_CAPACITY: usize = 1024;  // Batches per iteration
const GOSSIP_CHANNEL_CAPACITY: usize = 4096;   // 262k packets (64/batch)
High capacity handles burst traffic without dropping packets.

Ping/Pong Protocol

Nodes exchange pings to verify liveness:
  • Ping Cache: 126,976 capacity (GOSSIP_PING_CACHE_CAPACITY)
  • Rate Limiting: 1280/64 = 20 second delay between pings
  • Token-based: Pings contain random tokens, verified in pongs
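The token check can be sketched as a challenge-response: the pong must echo a hash of the ping's token, proving the peer actually received the ping at the claimed address. This toy version uses the standard library hasher; Agave's actual hashing and domain separation differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// An honest peer derives its pong from the token it received.
fn pong_for(token: u64) -> u64 {
    let mut h = DefaultHasher::new();
    token.hash(&mut h);
    h.finish()
}

// The pinger verifies the reply against the token it sent.
fn verify_pong(sent_token: u64, pong: u64) -> bool {
    pong == pong_for(sent_token)
}

fn main() {
    let token = 0xdead_beef_u64; // random per-ping in practice
    let pong = pong_for(token);
    assert!(verify_pong(token, pong));      // honest reply accepted
    assert!(!verify_pong(token, pong ^ 1)); // tampered reply rejected
}
```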

Data Propagation

Push Fanout

Push messages are sent to a calculated subset:
  • Prune Data: Max 32 nodes per prune message (MAX_PRUNE_DATA_NODES)
  • Payload Limits:
    • Push: PUSH_MESSAGE_MAX_PAYLOAD_SIZE
    • Pull Response: PULL_RESPONSE_MAX_PAYLOAD_SIZE

Pull Response Sizing

Pull responses are sized to fit in packets:
  • Minimum serialized size tracked (PULL_RESPONSE_MIN_SERIALIZED_SIZE)
  • Responses split across multiple messages if needed
  • Ensures efficient network utilization

Duplicate Shred Handling

Duplicate shreds (slashing evidence) have special handling:
  • Tracked separately in CRDS (duplicate_shreds BTreeMap)
  • Maximum payload size: DUPLICATE_SHRED_MAX_PAYLOAD_SIZE
  • Prioritized propagation for security

Metrics and Monitoring

GossipStats

From cluster_info.rs:160, comprehensive metrics:
pub(crate) stats: GossipStats,
Tracks:
  • Messages sent/received
  • Push/pull counts
  • Duplicate detection
  • Timing information
  • Signature sampling

CRDS Statistics

From crds.rs:104-118:
pub(crate) struct CrdsStats {
    pub(crate) pull: CrdsDataStats,
    pub(crate) push: CrdsDataStats,
    pub(crate) num_redundant_pull_responses: u64,
    pub(crate) num_duplicate_push_messages: u64,
}
Per-type counters track message counts and failures.

Signature Sampling

From crds.rs:60-64, rare signatures are logged:
const SIGNATURE_SAMPLE_LEADING_ZEROS: u32 = 19;
// Target: 1 signature reported per minute
// log2(680k messages/min) ≈ 19.375
Helps detect unusual gossip activity patterns.
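The sampling rule amounts to a leading-zeros test: a uniformly random signature passes with probability 2^-19, giving roughly one report per minute at ~680k messages/min. The sketch below applies the test to a 32-bit signature prefix; the exact word Agave inspects is an assumption here.

```rust
// Report a signature only when its leading 32-bit word has at least
// SIGNATURE_SAMPLE_LEADING_ZEROS zero bits (~1 in 2^19 pass rate).
const SIGNATURE_SAMPLE_LEADING_ZEROS: u32 = 19;

fn should_report(sig_prefix: u32) -> bool {
    sig_prefix.leading_zeros() >= SIGNATURE_SAMPLE_LEADING_ZEROS
}

fn main() {
    assert!(should_report(0));       // all-zero prefix always passes
    assert!(should_report(1 << 12)); // exactly 19 leading zeros: passes
    assert!(!should_report(1 << 13)); // only 18 leading zeros: fails
}
```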

Configuration

Capacity Limits

From cluster_info.rs:128:
const CRDS_UNIQUE_PUBKEY_CAPACITY: usize = 8192;
Limits unique pubkeys in CRDS to bound memory usage.

Contact Info Management

const DEFAULT_CONTACT_DEBUG_INTERVAL_MILLIS: u64 = 10_000;
const DEFAULT_CONTACT_SAVE_INTERVAL_MILLIS: u64 = 60_000;
  • Debug: Log contact info every 10 seconds
  • Save: Persist contact info every 60 seconds

Key Files

  • cluster_info.rs:1-200 - Main ClusterInfo coordination
  • crds.rs:1-200 - CRDS data store implementation
  • crds_gossip_pull.rs:1-150 - Pull protocol with bloom filters
  • crds_gossip_push.rs - Push protocol implementation
  • contact_info.rs - Node contact information
  • crds_value.rs - CRDS value types
  • crds_data.rs - CRDS data variants

Related Components

  • ClusterInfoVoteListener: Monitors votes from gossip
  • TVU: Uses gossip for shred reception coordination
  • RepairService: Uses cluster info for repair requests
  • ServeRepair: Responds to repair requests using cluster topology