
Prime Your Cache with a Cron Job


Properly configuring caching on a Drupal site is one of the most important things that a site administrator can do to improve performance. Caching represents a tradeoff between speed and "freshness" of content, however. There are some tools, such as the Boost crawler, that can prime your cache.

(See also: Load Page Cache after Cron Runs.)

Sometimes, though, you need more granular control over when a cached version of a page is regenerated. Often there are a few pages on a site that are edge cases. Maybe it's a complex calendar that gets frequent updates, or a Panels page with multiple Views. You can set your cache expiration to a low value (e.g., 1 hour), and if you're using Boost, you can set up these exclusions on a per-page basis.

Once the cached version of that page expires, the next user to hit that page has to sit and wait for Drupal to build the entire page from scratch. Only subsequent users reap the benefits of caching.

It would be great if we could have a machine generate the cached version so that all users get the speed of a cached page.

For this example, our complex page is at http://foo.stanford.edu/foo, and we've set our cache lifetime to 2 hours.

We want to set up a cron job to "prime" our cache by hitting that page every hour.

[Screenshot: the Scheduling Service form]

  1. Go to https://tools.stanford.edu/cgi-bin/scheduler and click "Create new job."
  2. Enter the following in the Command field:
    curl -sS http://foo.stanford.edu/foo > /dev/null
  3. For this example, the Principal field does not matter too much, but for consistency's sake you should use the same principal that the site uses (e.g., group-foo/cgi). For the shell-script variant described below, however, the principal will matter.
  4. Make the job Active? Yes
  5. Mail command output? No, email only errors
  6. Send email to: your email address
  7. Description: "Primes the cache"
  8. Schedule: Every hour
  9. Click Save Job

That's it!
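If your site is hosted somewhere without access to the Scheduling Service, an ordinary crontab entry can do the same job. This is a sketch using the example URL from above; adjust the schedule and URL for your own site:

```
# Run at minute 0 of every hour: prime the cache for the complex page.
# Fields: minute hour day-of-month month day-of-week command
0 * * * * curl -sS http://foo.stanford.edu/foo > /dev/null
```

As with the Scheduling Service, curl's -s flag suppresses the progress output and -S makes curl still report errors, so cron only emails you when something goes wrong.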

If you have a number of pages, you can create a shell script with multiple curl commands and call the shell script using the scheduling service.

For instance, if foo.stanford.edu is hosted at /afs/ir/group/foo/cgi-bin/drupal, create a file called cacheprime.sh and save it at /afs/ir/group/foo/cgi-bin/cacheprime.sh. An example of the cacheprime.sh file would be as follows:

#!/bin/bash
# Request each page to regenerate its cached copy.
# -s hides curl's progress output; -S still reports errors,
# so the scheduler can email you only when something fails.
curl -sS http://foo.stanford.edu/foo > /dev/null
curl -sS http://foo.stanford.edu/bar > /dev/null
curl -sS http://foo.stanford.edu/baz > /dev/null

(Be sure to make the script executable by using "chmod +x cacheprime.sh")

Then you would simply enter "/afs/ir/group/foo/cgi-bin/cacheprime.sh" in the Command field of the Scheduling Service form.
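If the list of pages grows, a loop keeps the script easier to maintain. Here is one possible sketch of cacheprime.sh, using the same example URLs as above; the --max-time flag is an addition that caps each request so one hung page can't stall the whole job:

```shell
#!/bin/bash
# Loop variant of cacheprime.sh: keep all the URLs in one list.
urls=(
  "http://foo.stanford.edu/foo"
  "http://foo.stanford.edu/bar"
  "http://foo.stanford.edu/baz"
)

for url in "${urls[@]}"; do
  # -s hides progress output; -S still reports errors so the
  # scheduler can mail them to you. --max-time caps each request
  # at 30 seconds so a hung page can't stall the cron job.
  curl -sS --max-time 30 "$url" > /dev/null
done
```

Adding a page then means adding one line to the urls array rather than repeating the whole curl invocation.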