Viper2110: Optimizing SQUID in Linux

cache_mem

This is one of the top two options as far as performance impact goes. The default value here is 8 MB, which is a safe option for just about any system size. But since your box is not lacking in memory it can safely be raised to 32 MB, and even 48 if you have stripped all other services off of your server. If your box is lacking in memory, go back to the hardware section and reread the part about memory...you need a bunch of memory for a fast web cache. You should be aware that cache_mem does not limit the process size of squid. This sets how much memory squid is allowed to set aside for "hot objects" which are the most recently used objects. Having said all that, keep in mind that the buffer cache in Linux is also quite good, so the gains you'll see by raising this are very small, but still measurable.

cache_dir

This is where you set the directories you will be using. You should have already mkreiserfs'd your cache directory partitions, so you'll have an easy time deciding the values here. First, you will want to use about 60% or less of each cache directory for the web cache. If you use any more than that you will begin to see a slight degradation in performance. Remember that cache size is not as important as cache speed, since for maximum effectiveness your cache needs only store about a weeks worth of traffic. You'll also need to define the number of directories and subdirectories. The formula for deciding that is this:

x=Size of cache dir in KB (i.e. 6GB=~6,000,000KB) y=Average object size

(just use 13KB z=Number of directories per first level directory

`(((x / y) / 256) / 256) * 2 = # of directories

As an example, I use 6GB of each of my 13GB drives, so:

6,000,000 / 13 = 461538.5 / 256 = 1802.9 / 256 = 7 * 2 = 14

So my cache_dir line would look like this:

cache_dir 6000 14 256

Those are really the two important ones. Everything else is just picking nits. You may wish to turn off the store log since there isn't much one can do with it anyway:

cache_store_log none

If, after all of this you still find your squid being overloaded at times, try setting the max_open_disk_fds to some low number. For a single disk cache, 900 is a good choice. For a dual disk cache, try 1400. To get the perfect number for your cache you'll need to keep an eye on the info page from the cache manager during a particularly heavy load period. Watch the file descriptor usage, and then set this number about 10%below it. This can help your cache ride out the heaviest loads without getting too bogged down.

Also, the following may be set to improve performance marginally:

half_closed_clients off
maximum_object_size 1024 KB

--------------------------------------------------------

Disk space and memory

A cache can always use more disk space, but as the size of your disk-cache grows, you will need more memory to index it. There's a straightforward rule for memory.

Divide the size of your disk cache by 13 Kbytes, and multiply that by 130 bytes. Add the size of cache_mem, and add about 2.5 Mbytes more for executable files, libraries, and other sundry overhead. For example: We have a 10-Gbyte drive, and a cache_mem of 8 Mbytes.

10 Gbytes/13 Kbytes = 769,230 769,230 x 130 bytes = 99,999,900 bytes (or 97,656 Kbytes) 97,656 Kbytes + 2.5 Mbytes + 8 Mbytes = 10,849,656 Kbytes or about 108 Mbytes

The example server needs 108 Mbytes available to Squid to support 10 Gbytes of cache_dir.

Provide as much disk space as you can provide RAM to support it. Squid performs very badly when it starts to swap. Remember to set aside memory for anything else on the machine (DNS, cron, operating system, etc.).

Refresh patterns

Refresh patterns determine the lifetime of the object. Within an object lifetime, Squid will serve the object without requesting an IMS ("if modified since") request. Once the lifetime is exceeded, Squid will keep the object but will send an IMS request to the origin server. If the object has been modified since it was first cached, Squid requests the new copy. If not, it keeps the old copy. Either way, the object is marked as fresh again.

Here's our (default) basic refresh pattern:

refresh_pattern . 0 20% 4320

The dot (.) is the the regular expression pattern, and matches anything. It uses POSIX regular expressions. (See man 7 regex).

The zero (0) is the minimum freshness time. If it's anything other than zero, it will override any expiration headers given with the object. If the content provider actually provided an expiration header, we should usually honor it.

The last term (4320) is the maximum freshness time. The object becomes stale after this many minutes in the cache.

The 20 percent is used for our default case, for when there's no information from the content provider about the lifetime of the object. Squid takes x percent (20 percent in this example) of the difference between the last-modified time of the object and the current time, and uses that as the object lifetime. If the object lifetime is less than the minimum set by the refresh_pattern, it is increased to at least that. If it's greater than the supplied maximum, it's reduced to that.

Non-standard files

Some kinds of files can be maintained much longer than others. Zip, tar.gz, tgz, and .exe files rarely change content without also changing name. Using regular expressions, we can create a set of refresh patterns like this:

refresh_pattern -i exe$ 0 50% 999999 refresh_pattern -i zip$ 0 50% 999999 refresh_pattern -i tar\.gz$ 0 50% 999999 refresh_pattern -i tgz$ 0 50% 999999

Refresh pattern options

Note that these options violate the HTTP standard. Do not use them lightly.

override-expire pretends there is no expiration header on the object and calculates purely based on last-modified times. This permits you to cache sites that abuse the use of expiration headers, but also inhibits updates of frequently changed content (such as news sites).

ignore-reload prevents the object being refreshed when the user presses the refresh button on their browser. This does not perform well when the object has no content length -- you may wind up with a broken object that the users cannot reload.

reload-into-ims transforms reloads into validations. Beware: Web servers may permit an object to be updated without the last-modified time being altered. The server may then insist that the object is still valid when it actually is not.

DO increase maximum_object_size. 40 Mbytes is not too large. 800 Mbytes might cache the large downloads.

DO increase the ipcache_size and fqdncache_size.

Source: http://www.linux-faqs.com/squid.php

Source: http://www.linuxdevcenter.com/pub/a/linux/2001/07/26/squid.html?page=2