Distributed caching systems dramatically improve Jekyll site performance by serving content from edge locations worldwide. By combining Ruby's processing power with Cloudflare Workers' edge execution, you can build sophisticated caching systems that intelligently manage content distribution, invalidation, and synchronization. This guide explores advanced distributed caching architectures that leverage Ruby for cache management logic and Cloudflare Workers for edge delivery, creating a performant global caching layer for static sites.

Distributed Cache Architecture and Design Patterns

A distributed caching architecture for Jekyll involves multiple cache layers and synchronization mechanisms to ensure fast, consistent content delivery worldwide. The system must handle cache population, invalidation, and consistency across edge locations.

The architecture uses a hierarchical cache structure: an origin cache managed by Ruby, an edge cache served by Cloudflare Workers, and the visitor's browser cache. Cache keys are derived from content hashes, so changed content gets a new key and outdated entries can be identified by key alone. Event-driven synchronization propagates cache updates across regions with eventual consistency. Ruby code manages the cache logic, while Cloudflare Workers serve cached responses from the edge location nearest each visitor.


# Distributed Cache Architecture:
# 1. Origin Layer (Ruby):
#    - Content generation and processing
#    - Cache key generation and management
#    - Invalidation triggers and queue
#
# 2. Edge Layer (Cloudflare Workers):
#    - Global cache storage (KV + R2)
#    - Request routing and cache serving
#    - Stale-while-revalidate patterns
#
# 3. Synchronization Layer:
#    - WebSocket connections for real-time updates
#    - Cache replication across regions
#    - Conflict resolution mechanisms
#
# 4. Monitoring Layer:
#    - Cache hit/miss analytics
#    - Performance metrics collection
#    - Automated optimization suggestions

# Cache Key Structure:
# - Content: content_{md5_hash}
# - Page: page_{path}_{locale}_{hash}
# - Fragment: fragment_{type}_{id}_{hash}
# - Asset: asset_{path}_{version}
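
The key structure listed above can be sketched as a small Ruby helper. This is a minimal illustration, not the guide's actual implementation: the `CacheKeys` module, its method names, and the path normalization rule are assumptions made for this example.

```ruby
require 'digest'

# Hypothetical helper; the method names are illustrative, not part of any library.
module CacheKeys
  module_function

  # Short content digest used as the trailing component of most keys.
  def digest(content)
    Digest::MD5.hexdigest(content.to_s)
  end

  # content_{md5_hash}
  def content(body)
    "content_#{digest(body)}"
  end

  # page_{path}_{locale}_{hash}
  def page(path, locale, body)
    "page_#{normalize(path)}_#{locale}_#{digest(body)}"
  end

  # fragment_{type}_{id}_{hash}
  def fragment(type, id, body)
    "fragment_#{type}_#{id}_#{digest(body)}"
  end

  # asset_{path}_{version}
  def asset(path, version)
    "asset_#{normalize(path)}_#{version}"
  end

  # Replace runs of non-alphanumeric characters (slashes, dots) with a dash
  # and trim dashes from both ends. This rule is an assumption of the sketch.
  def normalize(path)
    path.to_s.gsub(/[^A-Za-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
  end
end

puts CacheKeys.content('hello')
puts CacheKeys.page('/docs/intro/', 'en', '<h1>Intro</h1>')
puts CacheKeys.asset('/assets/css/main.css', 'v3')
```

Because the hash is part of the key, editing a page produces a new key and the old entry simply stops being requested, which is what makes hash-derived keys convenient for invalidation.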

Ruby Cache Manager with Intelligent Invalidation

The Ruby cache manager orchestrates cache operations, implements sophisticated invalidation strategies, and maintains cache consistency. It integrates with Jekyll's build process to optimize cache population.


# lib/distributed_cache/manager.rb
require 'digest'
require 'json'

module DistributedCache
  class Manager
    def initialize(config)
      @config = config
      @stores = {}
      @invalidation_queue = InvalidationQueue.new
      @metrics = MetricsCollector.new
    end
    
    def store(key, value, options = {})
      # Determine storage tier based on options
      store = select_store(options[:tier])
      
      # Generate cache metadata
      metadata = {
        stored_at: Time.now.utc,
        expires_at: expiration_time(options[:ttl]),
        version: options[:version] || 'v1',
        tags: options[:tags] || []
      }
      
      # Store with metadata
      store.write(key, value, metadata)
      
      # Track in metrics
      @metrics.record_store(key, value.bytesize)
      
      value
    end
    
    def fetch(key, options = {}, &generator)
      # Try to fetch from cache
      cached = fetch_from_cache(key, options)
      
      if cached
        @metrics.record_hit(key)
        return cached
      end
      
      # Cache miss - generate and store
      @metrics.record_miss(key)
      value = generator.call
      
      # Store asynchronously to not block response
      Thread.new do
        store(key, value, options)
      end
      
      value
    end
    
    def invalidate(tags: nil, keys: nil, pattern: nil)
      if tags
        invalidate_by_tags(tags)
      elsif keys
        invalidate_by_keys(keys)
      elsif pattern
        invalidate_by_pattern(pattern)
      end
    end
    
    def warm_cache(site_content)
      # Pre-warm cache with site content
      warm_pages_cache(site_content.pages)
      warm_assets_cache(site_content.assets)
      warm_data_cache(site_content.data)
    end
    
    private
    
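    def select_store(tier)
      # Fall back to the in-memory store when no tier (or an unknown tier) is given
      tier = :memory unless %i[memory disk redis].include?(tier)

      @stores[tier] ||= case tier
                        when :memory
                          MemoryStore.new(@config.memory_limit)
                        when :disk
                          DiskStore.new(@config.disk_path)
                        when :redis
                          RedisStore.new(@config.redis_url)
                        end
    end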
    
    def invalidate_by_tags(tags)
      tags.each do |tag|
        # Find all keys with this tag
        keys = find_keys_by_tag(tag)
        
        # Add to invalidation queue
        @invalidation_queue.add(keys)
        
        # Propagate to edge caches
        propagate_invalidation(keys) if @config.edge_invalidation
      end
    end
    
    def propagate_invalidation(keys)
      # Use Cloudflare API to purge cache
      client = Cloudflare::Client.new(@config.cloudflare_token)
      client.purge_cache(keys.map { |k| key_to_url(k) })
    end
  end
  
  # Intelligent invalidation queue
  class InvalidationQueue
    def initialize
      @queue = []
      @processing = false
      @mutex = Mutex.new # guards @queue and @processing across threads
    end

    def add(keys, priority: :normal)
      start_worker = false

      @mutex.synchronize do
        @queue << {
          keys: Array(keys),
          priority: priority,
          added_at: Time.now.utc
        }

        # Sort by priority, then by insertion time
        @queue.sort_by! { |item| [priority_score(item[:priority]), item[:added_at]] }

        # Start processing if not already running
        start_worker = !@processing
        @processing = true
      end

      process_queue if start_worker
    end

    private

    def priority_score(priority)
      case priority
      when :critical then 0
      when :high then 1
      when :normal then 2
      when :low then 3
      else 2
      end
    end

    def process_queue
      Thread.new do
        loop do
          # Take the next item, or mark the worker as stopped when the queue is empty
          item = @mutex.synchronize do
            @processing = false if @queue.empty?
            @queue.shift
          end
          break unless item

          process_invalidation(item[:keys])
          sleep(0.1) # Throttle invalidation
        end
      end
    end
  end
  
  # Jekyll integration
  class JekyllCacheGenerator < Jekyll::Generator
    def generate(site)
      cache_manager = DistributedCache::Manager.new(site.config['cache'])

      # Generate cache keys for all content
      site.pages.each do |page|
        cache_key = generate_cache_key(page)
        # Generators run before rendering, so page.output is still nil here;
        # fall back to the unrendered content in that case
        cache_manager.store(cache_key, page.output || page.content.to_s,
          ttl: page.data['cache_ttl'] || 3600,
          tags: page.data['tags'] || []
        )
      end

      # Warm API data cache
      warm_api_data_cache(site, cache_manager)
    end
    
    def generate_cache_key(page)
      # Generate deterministic cache key
      hash_input = [
        page.path,
        page.content,
        page.data.to_json,
        page.site.config['version']
      ].join('|')
      
      "page_#{Digest::MD5.hexdigest(hash_input)}"
    end
  end
end
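
The manager calls `find_keys_by_tag` without showing how tags map back to keys. One possible approach is a reverse index maintained whenever an entry is stored. The `TagIndex` class below is a hypothetical, in-memory sketch of that idea; a multi-process deployment would more likely keep the same structure in a shared store.

```ruby
require 'set'

# Hypothetical in-memory tag index. A shared store (for example Redis sets)
# would be needed if several processes write to the cache.
class TagIndex
  def initialize
    @keys_by_tag = Hash.new { |hash, tag| hash[tag] = Set.new }
    @tags_by_key = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Record that `key` was stored with the given tags.
  def add(key, tags)
    Array(tags).each do |tag|
      @keys_by_tag[tag] << key
      @tags_by_key[key] << tag
    end
  end

  # All keys currently associated with `tag` (empty array if none).
  def keys_for(tag)
    @keys_by_tag.key?(tag) ? @keys_by_tag[tag].to_a : []
  end

  # Forget a key after it has been invalidated, dropping tags left empty.
  def remove(key)
    tags = @tags_by_key.delete(key) || []
    tags.each do |tag|
      @keys_by_tag[tag].delete(key)
      @keys_by_tag.delete(tag) if @keys_by_tag[tag].empty?
    end
  end
end

index = TagIndex.new
index.add('page_a', %w[ruby caching])
index.add('page_b', %w[caching])

# Invalidate everything tagged "caching"
index.keys_for('caching').each { |key| index.remove(key) }
puts index.keys_for('ruby').inspect
```

Keeping both directions of the mapping makes removal cheap: invalidating a key does not require scanning every tag.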

Cloudflare Workers Edge Cache Implementation

Cloudflare Workers run at Cloudflare's edge locations, so cached responses can be served close to the visitor without a round trip to the origin. The Worker implements stale-while-revalidate, partitions cache keys by language and device type, and splits storage between KV and R2 according to response size.


// workers/edge-cache.js
// Global edge cache implementation

export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url)
    const cacheKey = generateCacheKey(request)
    
    // Check if we should bypass cache
    if (shouldBypassCache(request)) {
      return fetch(request)
    }
    
    // Try to get from cache
    let response = await getFromCache(cacheKey, env)
    
    if (response) {
      // Cache hit - check if stale
      if (isStale(response)) {
        // Serve stale content while revalidating
        ctx.waitUntil(revalidateCache(request, cacheKey, env))
        return markResponseAsStale(response)
      }
      
      // Fresh cache hit
      return markResponseAsCached(response)
    }
    
    // Cache miss - fetch from origin
    response = await fetch(request.clone())
    
    // Cache the response if cacheable
    if (isCacheable(response)) {
      ctx.waitUntil(cacheResponse(cacheKey, response, env))
    }
    
    return response
  }
}

async function getFromCache(cacheKey, env) {
  // KV holds either the full entry or, for large bodies, metadata pointing at R2
  const cached = await env.EDGE_CACHE_KV.get(cacheKey, { type: 'json' })

  if (!cached) {
    return null
  }

  if (cached.storage !== 'r2') {
    return new Response(cached.content, {
      headers: cached.headers,
      status: cached.status
    })
  }

  // Large assets: the body lives in R2, headers and status in the KV metadata
  const object = await env.EDGE_CACHE_R2.get(`cache/${cacheKey}`)

  if (object) {
    return new Response(object.body, {
      headers: cached.headers,
      status: cached.status
    })
  }

  return null
}

async function cacheResponse(cacheKey, response, env) {
  const responseClone = response.clone()
  const headers = Object.fromEntries(responseClone.headers.entries())
  const status = responseClone.status
  
  // Get response body based on size
  const body = await responseClone.text()
  const size = body.length
  
  const cacheData = {
    content: body,
    headers: headers,
    status: status,
    cachedAt: Date.now(),
    ttl: calculateTTL(responseClone)
  }
  
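  // KV requires an expirationTtl of at least 60 seconds (ttl is assumed to be in seconds)
  const expirationTtl = Math.max(60, Number(cacheData.ttl) || 0)

  if (size > 1024 * 1024) { // 1MB threshold
    // Store large response bodies in R2; httpMetadata accepts a Headers object
    await env.EDGE_CACHE_R2.put(`cache/${cacheKey}`, body, {
      httpMetadata: responseClone.headers
    })

    // Store metadata (headers, status, timestamps) in KV with the same expiry
    await env.EDGE_CACHE_KV.put(cacheKey, JSON.stringify({
      ...cacheData,
      content: null,
      storage: 'r2'
    }), { expirationTtl })
  } else {
    // Store in KV
    await env.EDGE_CACHE_KV.put(cacheKey, JSON.stringify(cacheData), {
      expirationTtl
    })
  }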
}

function generateCacheKey(request) {
  const url = new URL(request.url)
  
  // Create cache key based on request characteristics
  const components = [
    request.method,
    url.hostname,
    url.pathname,
    url.search,
    request.headers.get('accept-language') || 'en',
    request.headers.get('cf-device-type') || 'desktop'
  ]
  
  // Hash the components
  const keyString = components.join('|')
  return hashString(keyString)
}

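function hashString(str) {
  // Simple 32-bit string hash. It is fast but not collision-resistant: two
  // different requests can map to the same key, so consider a SHA-256 digest
  // (crypto.subtle.digest) if a collision would serve the wrong content.
  let hash = 0
  for (let i = 0; i < str.length; i++) {
    const char = str.charCodeAt(i)
    hash = ((hash << 5) - hash) + char
    hash |= 0 // Truncate to a 32-bit integer
  }
  return Math.abs(hash).toString(36)
}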

// Cache invalidation handler (Durable Object style class: the constructor receives state and env)
export class CacheInvalidationWorker {
  constructor(state, env) {
    this.state = state
    this.env = env
  }
  
  async fetch(request) {
    const url = new URL(request.url)
    
    if (url.pathname === '/invalidate' && request.method === 'POST') {
      return this.handleInvalidation(request)
    }
    
    return new Response('Not found', { status: 404 })
  }
  
  async handleInvalidation(request) {
    const { keys, tags, pattern } = await request.json()
    
    let keysToInvalidate = []
    
    if (keys) {
      keysToInvalidate = keys
    } else if (tags) {
      keysToInvalidate = await this.findKeysByTags(tags)
    } else if (pattern) {
      keysToInvalidate = await this.findKeysByPattern(pattern)
    }
    
    // Invalidate each key
    await Promise.all(
      keysToInvalidate.map(key => this.invalidateKey(key))
    )
    
    // Propagate to other edge locations
    await this.propagateInvalidation(keysToInvalidate)
    
    return new Response(JSON.stringify({
      invalidated: keysToInvalidate.length
    }), {
      headers: { 'content-type': 'application/json' }
    })
  }
  
  async invalidateKey(key) {
    // Delete from KV
    await this.env.EDGE_CACHE_KV.delete(key)
    
    // Delete from R2 if exists
    await this.env.EDGE_CACHE_R2.delete(`cache/${key}`)
  }
}
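
The Worker relies on an `isStale` helper that is not shown. The decision it has to make can be sketched in a few lines; Ruby is used here to keep the added examples in one language, and the same comparison translates directly to JavaScript. The function name and parameter names are assumptions for this sketch, with all durations in seconds.

```ruby
# Classify a cached entry for stale-while-revalidate handling.
# All durations are in seconds; the parameter names are assumptions.
def cache_state(cached_at:, max_age:, stale_while_revalidate:, now: Time.now)
  age = now - cached_at

  if age <= max_age
    :fresh    # serve directly
  elsif age <= max_age + stale_while_revalidate
    :stale    # serve the cached copy and refresh in the background
  else
    :expired  # treat as a miss and fetch from origin
  end
end

cached_at = Time.now - 4000
puts cache_state(cached_at: cached_at, max_age: 3600, stale_while_revalidate: 7200)
```

The three-way result matters: a stale entry is still served immediately, while an expired one should not be, because it is outside the window in which outdated content is considered acceptable.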

Jekyll Build-Time Cache Optimization

Jekyll build-time optimization involves generating cache-friendly content, adding cache headers, and creating cache manifests for intelligent edge delivery.


# _plugins/cache_optimizer.rb
require 'digest'
require 'json'
require 'time' # for Time#iso8601

module Jekyll
  class CacheOptimizer
    def optimize_site(site)
      # Add cache headers to all pages
      site.pages.each do |page|
        add_cache_headers(page)
      end
      
      # Generate cache manifest
      generate_cache_manifest(site)
      
      # Optimize assets for caching
      optimize_assets_for_cache(site)
    end
    
    def add_cache_headers(page)
      cache_control = generate_cache_control(page)
      expires = generate_expires_header(page)
      
      page.data['cache_control'] = cache_control
      page.data['expires'] = expires
      
      # Add to page output
      if page.output
        page.output = inject_cache_headers(page.output, cache_control, expires)
      end
    end
    
    def generate_cache_control(page)
      # Determine cache strategy based on page type
      if page.data['layout'] == 'default'
        # Static content - cache for longer
        "public, max-age=3600, stale-while-revalidate=7200"
      elsif page.path.to_s.include?('_posts')
        # Blog posts - moderate cache. Posts live under _posts in the source
        # tree; their URLs never contain that directory name.
        "public, max-age=1800, stale-while-revalidate=3600"
      else
        # Default cache
        "public, max-age=300, stale-while-revalidate=600"
      end
    end
    
    def generate_cache_manifest(site)
      manifest = {
        version: '1.0',
        generated: Time.now.utc.iso8601,
        pages: {},
        assets: {},
        invalidation_map: {}
      }
      
      # Map pages to cache keys
      site.pages.each do |page|
        cache_key = generate_page_cache_key(page)
        manifest[:pages][page.url] = {
          key: cache_key,
          # Jekyll pages have no built-in content hash, so compute one
          hash: Digest::SHA256.hexdigest(page.content.to_s),
          dependencies: find_page_dependencies(page)
        }
        
        # Build invalidation map
        add_to_invalidation_map(page, manifest[:invalidation_map])
      end
      
      # Save manifest
      File.write(File.join(site.dest, 'cache-manifest.json'), 
                 JSON.pretty_generate(manifest))
    end
    
    def generate_page_cache_key(page)
      components = [
        page.url,
        page.content,
        page.data.to_json
      ]
      
      Digest::SHA256.hexdigest(components.join('|'))[0..31]
    end
    
    def add_to_invalidation_map(page, map)
      # Map tags to pages for quick invalidation
      tags = page.data['tags'] || []
      categories = page.data['categories'] || []
      
      (tags + categories).each do |tag|
        map[tag] ||= []
        map[tag] << page.url
      end
    end
  end
  
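  # Hook into Jekyll's build process.
  # Page output can only be changed before Jekyll writes files, so cache
  # metadata is added on :post_render. The manifest is written on :post_write,
  # after Jekyll's cleanup step, so it is not removed from the destination.
  Jekyll::Hooks.register :site, :post_render do |site|
    optimizer = CacheOptimizer.new
    site.pages.each { |page| optimizer.add_cache_headers(page) }
  end

  Jekyll::Hooks.register :site, :post_write do |site|
    optimizer = CacheOptimizer.new
    optimizer.generate_cache_manifest(site)
    optimizer.optimize_assets_for_cache(site)
  end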
end

# Rake task for cache warm-up
namespace :cache do
  desc 'Warm cache for entire site'
  task :warm do
    require 'net/http'
    require 'uri'
    
    site_url = ENV['SITE_URL'] || 'https://yourdomain.com'
    urls_file = '_site/urls.txt'
    
    # Read URLs from sitemap or generate list
    urls = if File.exist?(urls_file)
             File.readlines(urls_file).map(&:chomp)
           else
             generate_urls_from_sitemap
           end
    
    puts "Warming cache for #{urls.size} URLs..."
    
    # Warm cache with concurrent requests, ten at a time
    urls.each_slice(10) do |batch|
      threads = batch.map do |url|
        Thread.new do
          uri = URI.parse("#{site_url}#{url}")
          Net::HTTP.get_response(uri)
          puts "Warmed: #{url}"
        rescue StandardError => e
          # A single failed URL should not abort the whole warm-up
          warn "Failed: #{url} (#{e.message})"
        end
      end
      threads.each(&:join)
      sleep(0.5) # Rate limiting
    end
    
    puts "Cache warming completed!"
  end
end
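
The `invalidation_map` written to `cache-manifest.json` can drive targeted purges. The sketch below resolves tags to URLs and builds requests for Cloudflare's `purge_cache` endpoint without sending them. The function names, the `ZONE_ID` and `API_TOKEN` placeholders, and the default batch size are assumptions; check the per-request URL limit for your plan before relying on the default.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Resolve tags to absolute URLs using the manifest's invalidation_map.
def urls_for_tags(manifest, tags, site_url)
  map = manifest.fetch('invalidation_map', {})
  paths = tags.flat_map { |tag| map.fetch(tag, []) }.uniq
  paths.map { |path| "#{site_url.chomp('/')}#{path}" }
end

# Build (but do not send) purge requests for Cloudflare's purge_cache endpoint.
# batch_size is an assumption; adjust it to the limit that applies to your plan.
def build_purge_requests(urls, zone_id:, api_token:, batch_size: 30)
  uri = URI("https://api.cloudflare.com/client/v4/zones/#{zone_id}/purge_cache")

  urls.each_slice(batch_size).map do |batch|
    request = Net::HTTP::Post.new(uri)
    request['Authorization'] = "Bearer #{api_token}"
    request['Content-Type'] = 'application/json'
    request.body = JSON.generate(files: batch)
    request
  end
end

manifest = JSON.parse(<<~MANIFEST)
  {"invalidation_map": {"ruby": ["/posts/a/", "/posts/b/"], "caching": ["/posts/b/"]}}
MANIFEST

urls = urls_for_tags(manifest, %w[ruby caching], 'https://example.com/')
requests = build_purge_requests(urls, zone_id: 'ZONE_ID', api_token: 'API_TOKEN')
puts urls
puts requests.size
```

Each request could then be sent with `Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }`. Building the requests separately from sending them keeps the batching logic easy to test.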

Multi-Region Cache Synchronization Strategies

Multi-region cache synchronization ensures consistency across global edge locations. The system uses a combination of replication strategies and conflict resolution.


# lib/distributed_cache/synchronizer.rb
module DistributedCache
  class Synchronizer
    def initialize(config)
      @config = config
      @regions = config.regions
      @connections = {}
      @replication_queue = ReplicationQueue.new
    end
    
    def synchronize(key, value, operation = :write)
      case operation
      when :write
        replicate_write(key, value)
      when :delete
        replicate_delete(key)
      when :update
        replicate_update(key, value)
      end
    end
    
    def replicate_write(key, value)
      # Primary region write
      primary_region = @config.primary_region
      write_to_region(primary_region, key, value)
      
      # Async replication to other regions
      (@regions - [primary_region]).each do |region|
        @replication_queue.add({
          type: :write,
          region: region,
          key: key,
          value: value,
          priority: :high
        })
      end
    end
    
    def ensure_consistency(key)
      # Check consistency across regions
      values = {}
      
      @regions.each do |region|
        values[region] = read_from_region(region, key)
      end
      
      # Find inconsistencies
      unique_values = values.values.uniq.compact
      
      if unique_values.size > 1
        # Conflict detected - resolve
        resolved_value = resolve_conflict(key, values)
        
        # Replicate resolved value
        replicate_resolution(key, resolved_value, values)
      end
    end
    
    def resolve_conflict(key, regional_values)
      # Implement conflict resolution strategy
      case @config.conflict_resolution
      when :last_write_wins
        resolve_last_write_wins(regional_values)
      when :priority_region
        resolve_priority_region(regional_values)
      when :merge
        resolve_merge(regional_values)
      else
        resolve_last_write_wins(regional_values)
      end
    end
    
    private
    
    def write_to_region(region, key, value)
      connection = connection_for_region(region)
      connection.write(key, value)
      
      # Update version vector
      update_version_vector(key, region)
    end
    
    def connection_for_region(region)
      @connections[region] ||= begin
        case region
        when /cf-/
          CloudflareConnection.new(@config.cloudflare_token, region)
        when /aws-/
          AWSConnection.new(@config.aws_config, region)
        else
          RedisConnection.new(@config.redis_urls[region])
        end
      end
    end
    
    def update_version_vector(key, region)
      vector = read_version_vector(key) || {}
      vector[region] = Time.now.utc.to_i
      write_version_vector(key, vector)
    end
  end
  
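  # Region-specific connections
  # Note: Cloudflare::Client and its put_kv/get_kv methods stand in for your own
  # API wrapper. Workers KV replicates globally on its own, so `region` is best
  # treated as a namespace identifier rather than a physical location.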
  class CloudflareConnection
    def initialize(api_token, region)
      @client = Cloudflare::Client.new(api_token)
      @region = region
    end
    
    def write(key, value)
      # Write to Cloudflare KV in specific region
      @client.put_kv(@region, key, value)
    end
    
    def read(key)
      @client.get_kv(@region, key)
    end
  end
  
  # Replication queue with backoff
  class ReplicationQueue
    def initialize
      @queue = []
      @failed_replications = {}
      @max_retries = 5
      @processing = false
    end

    def add(item)
      @queue << item

      # Process queue if not already processing
      process_queue unless @processing
    end

    def process_queue
      @processing = true

      Thread.new do
        while (item = @queue.shift)
          begin
            # Honor the backoff delay set by handle_replication_failure
            delay = item.delete(:retry_delay)
            sleep(delay) if delay
            execute_replication(item)
          rescue => e
            handle_replication_failure(item, e)
          end
        end

        @processing = false
      end
    end
    
    def execute_replication(item)
      case item[:type]
      when :write
        replicate_write(item)
      when :delete
        replicate_delete(item)
      when :update
        replicate_update(item)
      end
      
      # Clear failure count on success
      @failed_replications.delete(item[:key])
    end
    
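    def replicate_write(item)
      # connection_for_region is defined on Synchronizer, not on this class;
      # the queue needs a reference to it (for example, passed to the constructor)
      connection = connection_for_region(item[:region])
      connection.write(item[:key], item[:value])
    end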
    
    def handle_replication_failure(item, error)
      failure_count = @failed_replications[item[:key]] || 0
      
      if failure_count < @max_retries
        # Retry with exponential backoff
        @failed_replications[item[:key]] = failure_count + 1
        
        # Requeue with delay
        item[:retry_delay] = 2 ** failure_count
        @queue << item
        
        log("Replication failed for #{item[:key]}, retrying in #{item[:retry_delay]}s")
      else
        log("Replication permanently failed for #{item[:key]}: #{error.message}")
        @failed_replications.delete(item[:key])
      end
    end
  end
end
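
The synchronizer dispatches to `resolve_last_write_wins` without defining it. A minimal sketch of that strategy follows, assuming each regional read returns a hash with `:value` and `:written_at` (or `nil` when the region has no copy). The region names and the data shape are hypothetical. Last-write-wins also assumes reasonably synchronized clocks across regions.

```ruby
# Each regional read is assumed to return { value:, written_at: } or nil.
def resolve_last_write_wins(regional_values)
  candidates = regional_values.compact # drop regions with no copy
  return nil if candidates.empty?

  # Ties on timestamp are broken by region name so every node picks the same winner.
  region, entry = candidates.max_by { |name, e| [e[:written_at], name.to_s] }
  { region: region, value: entry[:value], written_at: entry[:written_at] }
end

# Regions whose copy is missing or differs from the winner and needs repair.
def regions_to_repair(regional_values, winner)
  regional_values.reject { |_, e| e && e[:value] == winner[:value] }.keys
end

values = {
  'cf-weur' => { value: 'v2', written_at: 1_700_000_200 },
  'cf-enam' => { value: 'v1', written_at: 1_700_000_100 },
  'cf-apac' => nil
}

winner = resolve_last_write_wins(values)
puts winner[:value]
puts regions_to_repair(values, winner).inspect
```

The deterministic tie-break is the important design choice: without it, two nodes resolving the same conflict could pick different winners and keep overwriting each other.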

Cache Performance Monitoring and Analytics

Cache monitoring provides insights into cache effectiveness, hit rates, and performance metrics for continuous optimization.


# lib/distributed_cache/monitoring.rb
module DistributedCache
  class Monitoring
    def initialize(config)
      @config = config
      @metrics = {
        hits: 0,
        misses: 0,
        writes: 0,
        invalidations: 0,
        regional_hits: Hash.new(0),
        response_times: []
      }
      @start_time = Time.now
    end
    
    def record_hit(key, region = nil)
      @metrics[:hits] += 1
      @metrics[:regional_hits][region] += 1 if region
    end
    
    def record_miss(key, region = nil)
      @metrics[:misses] += 1
    end
    
    def record_response_time(milliseconds)
      @metrics[:response_times] << milliseconds
      
      # Keep only last 1000 measurements
      if @metrics[:response_times].size > 1000
        @metrics[:response_times].shift
      end
    end
    
    def generate_report
      uptime = Time.now - @start_time
      total_requests = @metrics[:hits] + @metrics[:misses]
      hit_rate = total_requests > 0 ? (@metrics[:hits].to_f / total_requests * 100).round(2) : 0
      
      avg_response_time = if @metrics[:response_times].any?
        # Convert to Float so integer samples are not truncated by integer division
        (@metrics[:response_times].sum.to_f / @metrics[:response_times].size).round(2)
      else
        0
      end
      
      {
        general: {
          uptime_hours: (uptime / 3600).round(2),
          total_requests: total_requests,
          hit_rate_percent: hit_rate,
          hit_count: @metrics[:hits],
          miss_count: @metrics[:misses],
          write_count: @metrics[:writes],
          invalidation_count: @metrics[:invalidations]
        },
        performance: {
          avg_response_time_ms: avg_response_time,
          p95_response_time_ms: percentile(95),
          p99_response_time_ms: percentile(99),
          min_response_time_ms: @metrics[:response_times].min || 0,
          max_response_time_ms: @metrics[:response_times].max || 0
        },
        regional: @metrics[:regional_hits],
        recommendations: generate_recommendations
      }
    end
    
    def generate_recommendations
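      recommendations = []
      total_requests = @metrics[:hits] + @metrics[:misses]
      # Avoid dividing by zero before any request has been recorded
      hit_rate = total_requests > 0 ? (@metrics[:hits].to_f / total_requests * 100).round(2) : 0

      if total_requests > 0 && hit_rate < 70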
        recommendations << "Low cache hit rate (#{hit_rate}%). Consider increasing cache TTLs or implementing more aggressive caching."
      end
      
      if @metrics[:response_times].any? && percentile(95) > 100
        recommendations << "High p95 response time (#{percentile(95)}ms). Consider optimizing cache lookup or reducing cache key complexity."
      end
      
      if @metrics[:invalidations] > @metrics[:writes] * 0.1
        recommendations << "High invalidation rate. Review cache key strategy to reduce unnecessary invalidations."
      end
      
      recommendations
    end
    
    private
    
    def percentile(p)
      return 0 if @metrics[:response_times].empty?
      
      sorted = @metrics[:response_times].sort
      index = (p / 100.0 * (sorted.length - 1)).ceil
      sorted[index]
    end
  end
  
  # Integration with monitoring services
  class MetricsExporter
    def initialize(monitoring, exporters = [])
      @monitoring = monitoring
      @exporters = exporters
      @export_interval = 60 # seconds
      @export_thread = nil
    end
    
    def start
      @export_thread = Thread.new do
        loop do
          export_metrics
          sleep @export_interval
        end
      end
    end
    
    def stop
      @export_thread&.kill
      @export_thread = nil
    end
    
    private
    
    def export_metrics
      metrics = @monitoring.generate_report
      
      @exporters.each do |exporter|
        begin
          exporter.export(metrics)
        rescue => e
          log("Failed to export metrics to #{exporter.class}: #{e.message}")
        end
      end
    end
  end
  
  # Cloudflare Analytics exporter
  class CloudflareAnalyticsExporter
    def initialize(api_token, zone_id)
      @client = Cloudflare::Client.new(api_token)
      @zone_id = zone_id
    end
    
    def export(metrics)
      # Format for Cloudflare Analytics
      analytics_data = {
        cache_hit_rate: metrics[:general][:hit_rate_percent],
        cache_requests: metrics[:general][:total_requests],
        avg_response_time: metrics[:performance][:avg_response_time_ms],
        timestamp: Time.now.utc.iso8601
      }
      
      @client.send_analytics(@zone_id, analytics_data)
    end
  end
end
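
To make the report's numbers concrete, the following sketch computes a hit rate and a percentile from sample data using the same index rule as the `percentile` method in the monitoring class. The standalone function names and the sample values are made up for illustration.

```ruby
# Hit rate as a percentage, guarding against an empty sample.
def hit_rate(hits, misses)
  total = hits + misses
  return 0.0 if total.zero?

  (hits.to_f / total * 100).round(2)
end

# Percentile using the index rule ceil(p / 100 * (n - 1)) on the sorted samples.
def percentile(samples, p)
  return 0 if samples.empty?

  sorted = samples.sort
  sorted[(p / 100.0 * (sorted.length - 1)).ceil]
end

response_times = [12, 8, 95, 10, 140, 9, 11, 13, 10, 9] # milliseconds
puts hit_rate(870, 130)
puts percentile(response_times, 50)
puts percentile(response_times, 95)
```

With only ten samples the p95 lands on the largest value, which is a reminder that high percentiles need a reasonably large window to be meaningful.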

This distributed caching system combines Ruby-side cache management with Cloudflare's global edge network to give Jekyll sites fast content delivery worldwide, consistent caches across regions, and monitoring data to guide further tuning.