System Tuning Info for Linux Servers

(borrowed from RH, alikins@redhat.com)

This page is about optimizing and tuning Linux-based systems for server-oriented tasks. Most of the info presented here I've used myself, and have found it to be beneficial. I've tried to avoid the well-trodden ground (hdparm, turning off hostname lookups in apache, etc.) as that info is easy to find elsewhere.

Some cases where you might want to apply these tunings: benchmarking, high-traffic web sites, or any sudden load spike (say, a web-transmitted virus is pegging your servers with bogus requests).

Disk Tuning
File system Tuning
SCSI Tuning
Disk I/O Elevators
File limits
Process limits
Threads
Benchmarks
System Tuning Links

Disk Tuning

Benchmark performance is often heavily dependent on disk I/O performance, so squeezing as much as possible out of disk I/O is the real key.

Depending on the array, the disks used, and the controller, you may want to try software RAID. It is tough to beat software RAID performance on a modern CPU with a fast disk controller.

The easiest way to configure software RAID is to do it during the install. If you use the GUI installer, there are options in the disk partition screen to create an "md" or multiple-device partition, Linux talk for a software RAID partition. You will need to make partitions of type "linux raid" on each of the drives, and then, after creating all these partitions, create a new partition, say "/test", and select md as its type. Then you can select all the partitions that should be part of it, as well as the RAID type. For pure performance, RAID 0 is the way to go.

Note that, by default, you are (I believe) limited to 12 drives in an MD device. If the drives are fast enough, that should be sufficient to get >100 MB/s pretty consistently.

One thing to keep in mind is that the position of a partition on a hard drive does have performance implications. Partitions stored at the very outer edge of a drive tend to be significantly faster than those on the inside. A good benchmarking trick is to use RAID across several drives, but only use a very small partition on the outside of each disk. This gives both consistent performance and the best performance. On most modern drives, or at least drives using ZCAV (Zoned Constant Angular Velocity), this tends to be the sectors with the lowest addresses, aka the first partitions. For a way to see the differences illustrated, see the ZCAV page.

This is just a summary of software RAID configuration. More detailed info can be found elsewhere, including the Software-RAID-HOWTO and the docs and man pages from the raidtools package.
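
As a rough sketch (device names, chunk size, and partition choices here are just placeholders, not recommendations), a two-drive RAID 0 set described in /etc/raidtab for the raidtools package might look something like:

raiddev /dev/md0
    raid-level            0
    nr-raid-disks         2
    persistent-superblock 1
    chunk-size            64
    device                /dev/sda1
    raid-disk             0
    device                /dev/sdb1
    raid-disk             1

After that, something like `mkraid /dev/md0` followed by `mke2fs /dev/md0` should give you a filesystem you can mount as /test. See the Software-RAID-HOWTO for the authoritative details.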

File System Tuning

Some of the default kernel parameters for system performance are geared more towards workstation performance than file server/large disk I/O types of operations. The most important of these is the "bdflush" value in /proc/sys/vm/bdflush.

These values are documented in detail in /usr/src/linux/Documentation/sysctl/vm.txt.

A good set of values for this type of server is:

echo 100 5000 640 2560 150 30000 5000 1884 2 > /proc/sys/vm/bdflush

(You change these values by just echoing the new values to the file. This takes effect immediately, but it needs to be redone after each kernel boot. The simplest way to do this is to put the command at the end of /etc/rc.d/rc.local.)
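
For instance, a minimal rc.local addition (paths as in a stock Red Hat layout) might look like:

# appended to /etc/rc.d/rc.local
# re-apply bdflush tuning for heavy file serving at every boot
echo 100 5000 640 2560 150 30000 5000 1884 2 > /proc/sys/vm/bdflush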

Also, for pure file server applications like web and Samba servers, you probably want to disable atime updates on the filesystem. This disables updating the "atime" value for a file, which records the last time the file was accessed. Since this info isn't very useful in this situation and causes extra disk hits, it's typically disabled. To do this, just edit /etc/fstab and add "noatime" as a mount option for the filesystem.

for example:

/dev/rd/c0d0p3    /test   ext2  noatime    1 2
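
To apply the option to an already-mounted filesystem without rebooting, a remount should do it (assuming /test is the mount point from the fstab line above):

mount -o remount,noatime /test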

With these file system options, a good raid setup, and the bdflush values, filesystem performance should be sufficient.

Disk I/O elevators are another kernel tunable that can be tweaked for improved disk I/O in some cases.

SCSI Tuning

SCSI tuning is highly dependent on the particular SCSI cards and drives in question. The most effective variable when it comes to SCSI card performance is tagged command queueing.

For the Adaptec aic7xxx series cards (2940's, 7890's, *160's, etc) this can be enabled with a module option like:

aic7xxx=tag_info:{{0,0,0,0}}

This enables the default tagged command queueing on the first adapter, for the first four SCSI IDs.
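
If the driver is built as a module, the same string can (I believe) be passed via /etc/modules.conf, along these lines:

options aic7xxx aic7xxx=tag_info:{{0,0,0,0}}

For a driver compiled into the kernel, the option would instead go on the kernel boot command line (e.g. in the append= line of lilo.conf).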

You probably want to check the driver documentation for your particular scsi modules for more info.

Disk I/O Elevators

On systems that are consistently doing a large amount of disk I/O, tuning the disk I/O elevators may be useful. This is a 2.4 kernel feature that allows some control over latency vs throughput by changing the way disk io elevators operate.

This works by changing how long the I/O scheduler will let a request sit in the queue before it has to be handled. Since the I/O scheduler can collapse some requests together, having a lot of items in the queue means more requests can be coalesced, which can increase throughput.

Changing the max latency on items in the queue allows you to trade disk i/o latency for throughput, and vice versa.

The tool " /sbin/elvtune " (part of util-linux) allows you to change these max latency values. Lower values means less latency, but also less thoughput. The values can be set for the read and write queues separately.

To determine what the current settings are, just issue /sbin/elvtune /dev/hda1, substituting the appropriate device, of course. Default values are 8192 for reads and 16384 for writes.

To set new values of, for example, 2000 for reads and 4000 for writes: /sbin/elvtune -r 2000 -w 4000 /dev/hda1. Note that these values are for example purposes only and are not recommended tuning values; that depends on the situation.
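
Like the bdflush settings, elvtune changes don't survive a reboot, so once you settle on values you like, something along these lines at the end of /etc/rc.d/rc.local would reapply them (device and numbers are just placeholders):

# re-apply elevator tuning at boot
/sbin/elvtune -r 2000 -w 4000 /dev/hda1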

The units of these values are basically "sectors of writes before reads are allowed". The kernel attempts to do all reads, then all writes, etc., in an attempt to prevent disk I/O mode switching, which can be slow. These settings let you alter how long it waits before switching.

One way to get an idea of the effectiveness of these changes is to monitor the output of `iostat -d -x DEVICE`. The "avgrq-sz" and "avgqu-sz" values (average size of request and average queue length; see the iostat man page) should be affected by these elevator changes. Lowering the latency should cause the "avgrq-sz" to go down, for example.
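
A rough way to watch this while a benchmark is running (assuming a reasonably recent sysstat package is installed) is something like:

iostat -d -x 5

which reprints the extended per-device stats every 5 seconds; watch the row for the device in question and compare avgrq-sz and avgqu-sz before and after an elvtune change.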

See the elvtune man page for more info. Some info from when this feature was introduced is also at LWN.net.

This info contributed by Arjan van de Ven.

Process Limits

For heavily used web servers, or machines that spawn off lots and lots of processes, you probably want to up the limit of processes for the kernel.

Also, the 2.2 kernel itself has a max process limit. The default value for this is 2560, but a kernel recompile can take it as high as about 4000. This is a limitation of the 2.2 kernel, and it has been removed in 2.3/2.4.

If you are running into the limit on how many tasks the kernel can handle by default, you may have to rebuild the kernel after editing /usr/src/linux/include/linux/tasks.h and changing:

#define NR_TASKS 2560 /* On x86 Max 4092, or 4090 w/APM configured.*/

to:

#define NR_TASKS 4000 /* On x86 Max 4092, or 4090 w/APM configured.*/

and also change:

#define MAX_TASKS_PER_USER (NR_TASKS/2)

to:

#define MAX_TASKS_PER_USER (NR_TASKS)

Then recompile the kernel.

also run: ulimit -u 4000
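
If the box uses PAM (a stock Red Hat install does), the per-user process limit can also be set persistently in /etc/security/limits.conf rather than relying on a ulimit call in a startup script. A rough sketch, reusing the 4000 figure from above as an example value:

*    soft    nproc    4000
*    hard    nproc    4000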

Note: This process limit is gone in the 2.4 kernel series.
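
On 2.4, the overall task limit is instead (I believe) exposed at runtime via /proc/sys/kernel/threads-max, so it can be raised without a rebuild; as an example only (the number is arbitrary):

echo 16384 > /proc/sys/kernel/threads-max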

Threads

Limitations on threads are tightly tied to both file descriptor limits, and process limits.

Under Linux, threads are counted as processes, so any limits on the number of processes also apply to threads. In a heavily threaded app like a threaded TCP engine or a Java server, you can quickly run out of threads.

For starters, you want to get an idea how many threads you can open. The `thread-limit` util mentioned in the Tuning Utilities section is probably as good as any.

The first step to increasing the possible number of threads is to make sure you have boosted any process limits as mentioned before.

There are a few things that can limit the number of threads, including process limits, memory limits, mutex/semaphore/shm/IPC limits, and compiled-in thread limits.

For most cases, the process limit is the first one to run into, then the compiled in thread limits, then the memory limits.

To increase the limits, you have to recompile glibc. Oh fun! And the patch is essentially two lines! Woohoo!

--- ./linuxthreads/sysdeps/unix/sysv/linux/bits/local_lim.h.akl	Mon Sep  4 19:37:42 2000
+++ ./linuxthreads/sysdeps/unix/sysv/linux/bits/local_lim.h	Mon Sep  4 19:37:56 2000
@@ -64,7 +64,7 @@
 /* The number of threads per process.  */
 #define _POSIX_THREAD_THREADS_MAX	64
 /* This is the value this implementation supports.  */
-#define PTHREAD_THREADS_MAX	1024
+#define PTHREAD_THREADS_MAX	8192
 
 /* Maximum amount by which a process can decrease its asynchronous I/O
    priority level.  */
--- ./linuxthreads/internals.h.akl	Mon Sep  4 19:36:58 2000
+++ ./linuxthreads/internals.h	Mon Sep  4 19:37:23 2000
@@ -330,7 +330,7 @@
    THREAD_SELF implementation is used, this must be a power of two and
    a multiple of PAGE_SIZE.  */
 #ifndef STACK_SIZE
-#define STACK_SIZE  (2 * 1024 * 1024)
+#define STACK_SIZE  (64 * PAGE_SIZE)
 #endif
 
 /* The initial size of the thread stack.  Must be a multiple of PAGE_SIZE.  */

Now just patch glibc, rebuild, and install it. ;-> If you have a package based system, I seriously suggest making a new package and using it.

Two references on how to do this are Jlinux.org and Volano. Both describe how to increase the number of threads so Java apps can use them.

A good resource on this is Tuning The Linux Kernel's Memory.

Benchmarks

Lies, damn lies, and statistics.

But aside from that, a good set of benchmarking utilities are often very helpful in doing system tuning work. It is impossible to duplicate "real world" situations, but that isn't really the goal of a good benchmark. A good benchmark typically tries to measure the performance of one particular thing very accurately. If you understand what the benchmarks are doing, they can be very useful tools.

Some of the common and useful benchmarks include:

Bonnie

Bonnie has been around forever, and the numbers it produces are meaningful to many people. If nothing else, it's a good tool for producing info to share with others.

This is a pretty common utility for testing drive performance. Its only drawback is that it sometimes requires the use of huge datasets on large-memory machines to get useful results, but I suppose that goes with the territory.

Check Doug Ledford's list of benchmarks for more info on Bonnie. There is also a somewhat newer version of Bonnie called Bonnie++ that fixes a few bugs and includes a couple of extra tests.
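
As a starting point (the directory, size, and user below are just examples), a Bonnie++ run might look like:

bonnie++ -d /test -s 2048 -u nobody

where -s is the dataset size in MB; as noted above, it generally needs to be well over the machine's RAM size to defeat caching.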

Dbench

My personal favorite disk io benchmarking utility is `dbench`. It is designed to simulate the disk io load of a system when running the NetBench benchmark suite. It seems to do an excellent job at making all the drive lights blink like mad. Always a good sign.

Dbench is available at The Samba ftp site and mirrors
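
Running it is about as simple as it gets; the argument is just the number of simulated clients (20 here is an arbitrary choice), run from a directory on the filesystem you want to exercise:

dbench 20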

dt

dt does a lot: disk I/O, process creation, async I/O, etc.

dt is available at The dt page

Autobench

Autobench is a simple Perl script for automating the process of benchmarking a web server (or for conducting a comparative test of two different web servers). The script is a wrapper around httperf. Autobench runs httperf a number of times against each host, increasing the number of requested connections per second on each iteration, and extracts the significant data from the httperf output, delivering a CSV or TSV format file which can be imported directly into a spreadsheet for analysis/graphing.

Info: http://www.xenoclast.org/autobench/
Download: http://www.xenoclast.org/autobench/downloads

Info provided by Bill Hilf.
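
Since autobench is just driving httperf underneath, a single manual httperf run (host, URI, and rates below are placeholders) is a reasonable sanity check before scripting a full sweep:

httperf --server www.example.com --port 80 --uri /index.html --rate 50 --num-conn 5000 --num-call 10 --timeout 5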

General benchmark Sites

Doug Ledford's page

ReiserFS benchmark page

System Tuning Links

http://www.kegel.com Check out the "c10k problem" page in particular, but the entire site has _lots_ of useful tuning info.

http://linuxperf.nl.linux.org/ Site organized by Rik van Riel and a few other folks. Probably the best Linux-specific system tuning page.

http://www.citi.umich.edu/projects/citi-netscape/ Linux Scalability Project at Umich.

NFS Performance Tuning Info on tuning Linux kernel NFS in particular, and Linux network and disk I/O in general.

http://home.att.net/~jageorge/performance.html Linux Performance Checklist. Some useful content.

http://www.linux.com/enhance/tuneup/ Miscellaneous performance tuning tips at linux.com.

http://www.psc.edu/networking/perf_tune.html#Linux