PostgreSQL Database Tuning

posted Sep 17, 2010, 2:35 PM by Sachchida Ojha
When there are performance problems, you need to determine the sources of the problems and your goals in resolving them. The steps for analyzing performance problems are:

1.Collect performance data to get baseline measurements. For example, you might use one or more of the following tools:

Benchmark tests developed in-house or industry-standard third-party tests.
sp_sysmon, a system procedure that monitors Adaptive Server performance and provides statistical output describing the behavior of your Adaptive Server system.
See Performance and Tuning Guide: Monitoring and Analyzing for Performance for information on using sp_sysmon.
Adaptive Server Monitor provides graphical performance and tuning tools and object-level information on I/O and locks.
Any other appropriate tools.

2.Analyze the data to understand the system and any performance problems. Create and answer a list of questions to analyze your Adaptive Server environment. The list might include questions such as:
What are the symptoms of the problem?
What components of the system model affect the problem?
Does the problem affect all users or only users of certain applications?
Is the problem intermittent or constant?

3.Define system requirements and performance goals:
How often is this query executed?
What response time is required?
  
4.Define the Adaptive Server environment–know the configuration and limitations at all layers.
5.Analyze application design – examine tables, indexes, and transactions.
6.Formulate a hypothesis about possible causes of the performance problem and possible solutions, based on performance data.
7.Test the hypothesis by implementing the solutions from the last step:
Adjust configuration parameters.
Redesign tables.
Add or redistribute memory resources.
8.Use the same tests used to collect baseline data in step 1 to determine the effects of tuning. Performance tuning is usually a repetitive process.
If the actions taken based on step 7 do not meet the performance requirements and goals set in step 3, or if adjustments made in one area cause new performance problems, repeat this analysis starting with step 2. You might need to reevaluate system requirements and performance goals.
9.If testing shows that your hypothesis is correct, implement the solution in your development environment.

PostgreSQL Performance Tuning


Tweak your hardware to get the most from this open-source database.

PostgreSQL is an object-relational database developed on the Internet by a group of developers spread across the globe. It is an open-source alternative to commercial databases like Oracle and Informix.

PostgreSQL was originally developed at the University of California, Berkeley. In 1996, a group began development of the database on the Internet. They used e-mail to share ideas and file servers to share code. PostgreSQL is now comparable to proprietary databases in terms of features, performance and reliability. It has transactions, views, stored procedures and referential integrity constraints. It supports a large number of programming interfaces, including ODBC, Java (JDBC), Tcl/Tk, PHP, Perl and Python. PostgreSQL continues to improve at a tremendous pace thanks to a talented pool of internet developers.
Performance Concepts

There are two aspects of database-performance tuning. One is improving the database's use of the CPU, memory and disk drives in the computer. The second is optimizing the queries sent to the database. This article talks about the hardware aspects of performance tuning. The optimization of queries is done using SQL commands like CREATE INDEX, VACUUM, VACUUM ANALYZE, CLUSTER and EXPLAIN. These are discussed in my book, PostgreSQL: Introduction and Concepts at http://www.postgresql.org/docs/awbook.html [see also Stephanie Black's review on page 76].

To understand hardware performance issues, it is important to understand what is happening inside the computer. For simplicity, a computer can be thought of as a central processing unit (CPU) surrounded by storage. On the same chip with the CPU are several CPU registers, which store intermediate results and various pointers and counters. Surrounding this is the CPU cache, which holds the most recently accessed information. Beyond the CPU cache is a large amount of random-access main memory (RAM), which holds executing programs and data. Beyond this main memory are disk drives, which store even larger amounts of information. Disk drives are the only permanent storage area, so anything to be kept when the computer is turned off must be placed there (see Table 1). Figure 1 shows the storage areas surrounding the CPU.


Table 1. Types of Computer Storage
You can see that storage areas increase in size as they get farther from the CPU. Ideally, a huge amount of permanent memory could be placed right next to the CPU, but this would be too slow and expensive. In practice, the most frequently used information is stored next to the CPU, and less frequently accessed information is stored farther away and brought to the CPU as needed.
Keeping Information Near the CPU

Moving information between various storage areas happens automatically. Compilers determine which information should be stored in registers. CPU chip logic keeps recently used information in the CPU cache. The operating system controls which information is stored in RAM and shuttles it back and forth from the disk drive.

CPU registers and the CPU cache cannot be tuned effectively by the database administrator. Effective database tuning involves increasing the amount of useful information in RAM, thus preventing disk access where possible.


You might think this is easy to do, but it is not. A computer's RAM contains many things, including executing programs, program data and stack, PostgreSQL shared buffer cache and kernel disk buffer cache. Proper tuning involves keeping as much database information in RAM as possible while not adversely affecting other areas of the operating system.
PostgreSQL Shared Buffer Cache

PostgreSQL does not directly change information on disk. Instead, it requests data be read into the PostgreSQL shared buffer cache. PostgreSQL backends then read/write blocks, and finally flush them back to disk. Backends that need to access tables first look for needed blocks in this cache. If they are already there, they can continue processing right away. If not, an operating system request is made to load the blocks. The blocks are loaded either from the kernel disk buffer cache or from disk. These can be expensive operations.

The default PostgreSQL configuration allocates 64 shared buffers. Each buffer is eight kilobytes. Increasing the number of buffers makes it more likely that backends will find the information they need in the cache, thus avoiding an expensive operating system request.
How Big Is Too Big?

You may think, ``I will just give all my RAM to the PostgreSQL shared buffer cache.'' However, if you do that, there will be no room for the kernel, or for any programs, to run. The proper size for the PostgreSQL shared buffer cache is the largest useful size that does not adversely affect other activity.

To understand adverse activity, you need to understand how UNIX operating systems manage memory. If there is enough memory to hold all programs and data, little memory management is required. However, if everything doesn't fit in RAM, the kernel starts forcing memory pages to a disk area called swap. It moves pages that have not been used recently. This operation is called a swap pageout. Pageouts are not a problem because they happen during periods of inactivity. What is bad is when these pages have to be brought back in from swap, meaning an old page that was moved out to swap has to be moved back into RAM. This is called a swap pagein. This is bad because while the page is moved from swap, the program is suspended until the pagein is complete.

Pagein activity is shown by system analysis tools like vmstat and sar and indicates there is not enough memory available to function efficiently. Do not confuse swap pageins with ordinary pageins, which may include pages read from the filesystem as part of normal system operation. If you can't find swap pageins, many pageouts is a good indicator you are also doing swap pageins.
Effects of Cache Size

You may wonder why cache size is so important. First, imagine the PostgreSQL shared buffer cache is large enough to hold an entire table. Repeated sequential scans of the table will require no disk access because all the data is already in the cache. Now imagine the cache is one block smaller than the table. A sequential scan of the table will load all table blocks into the cache until the last one. When that block is needed, the oldest block is removed, which in this case is the first block of the table. When another sequential scan happens, the first block is no longer in the cache, and to load it in, the oldest block is removed, which in this case is now the second block in the table. This pushing out of the next needed block continues to the end of the table. This is an extreme example, but you can see that a decrease of one block can change the efficiency of the cache from 100% to 0%. It shows that finding the right cache size can dramatically affect performance.
Proper Sizing of Shared Buffer Cache

Ideally, the PostgreSQL shared buffer cache will be large enough to hold most commonly accessed tables and small enough to avoid swap pagein activity. Keep in mind that the postmaster allocates all shared memory when it starts. This area stays the same size even if no one is accessing the database. Some operating systems pageout unreferenced shared memory, while others lock shared memory into RAM. The PostgreSQL 7.2 Administrator's Guide has information about kernel configuration for various operating systems (http://www.postgresql.org/devel-corner/docs/admin/kernel-resources.html).
Sort Memory Batch Size

Another tuning parameter is the amount of memory used for sort batches. When sorting large tables or results, PostgreSQL will sort them in parts, placing intermediate results in temporary files. These files are then merged and resorted until all rows are sorted. Increasing the batch size creates fewer temporary files and often allows faster sorting. However, if the sort batches are too large, they cause pageins because parts of the sort batch get paged out to swap during sorting. In these cases, it is much faster to use smaller sort batches and more temporary files, so again, swap pageins determine when too much memory has been allocated. Keep in mind that this parameter is used for every backend performing a sort, either for ORDER BY, CREATE INDEX or for a merge join. Several simultaneous sorts will use several times this amount of memory.
Cache Size and Sort Size

Both cache size and sort size affect memory usage, so you cannot maximize one without affecting the other. Keep in mind that cache size is allocated on postmaster startup, while sort size varies depending on the number of sorts being performed. Generally, cache size is more significant than sort size. However, certain queries that use ORDER BY, CREATE INDEX or merge joins may see increases in speed with larger sort batch sizes.

Also, many operating systems limit how much shared memory can be allocated. Increasing this limit requires operating system-specific knowledge to either recompile or reconfigure the kernel. More information can be found in the PostgreSQL 7.1 Administrator's Guide (http://www.postgresql.org/docs/admin/kernel-resources.html).
Disk Locality

The physical nature of disk drives makes their performance characteristics different from the other storage areas mentioned in this article. The other storage areas can access any byte with equal speed. Disk drives, with their spinning platters and moving heads, access data near the head's current position much faster than data farther away.

Moving the disk head to another cylinder on the platter takes quite a bit of time. UNIX kernel developers know this. When storing a large file on disk, they try to place the pieces of the file near each other. For example, suppose a file requires ten blocks on disk. The operating system may place blocks 1-5 on one cylinder and blocks 6-10 on another cylinder. If the file is read from beginning to end, only two head movements are required--one to get to the cylinder holding blocks 1-5, and another to get to blocks 6-10. However, if the file is read non-sequentially, e.g., blocks 1,6,2,7,3,8,4,9,5,10; ten head movements are required. As you can see, with disks, sequential access is much faster than random access. This is why PostgreSQL prefers sequential scans to index scans if a significant portion of the table needs to be read. This also highlights the value of the cache.
Multiple Disk Spindles

The disk head moves around quite a bit during database activity. If too many read/write requests are made, the drive can become saturated, causing poor performance (Vmstat and sar can provide information on the amount of activity on each disk drive).

One solution to disk saturation is to move some of the PostgreSQL data files to other disks. Remember, moving the files to other filesystems on the same disk drive does not help. All filesystems on a drive use the same disk heads. Database access can be spread across disk drives in several ways:

*Moving Databases--initlocation allows you to create databases on different drives.
*Moving Tables--symbolic links allow you to move tables and indexes to different drives. Movement should only be done while PostgreSQL is shut down. Also, PostgreSQL doesn't know about the symbolic links, so if you delete the table and recreate it, it will be created in the default location for that database. In 7.1, pg_database.oid and pg_class.relfilenode map database, table and index names to their numeric filenames.
*Moving Indexes--symbolic links allow moving indexes to different drives from their heap tables. This allows an index scan to be performed on one disk while a second disk performs heap lookups.
*Moving Joins--symbolic links allow the movement of joined tables to separate disks. If tables A and B are joined, lookups of table A can be performed on one drive while lookups of table B can be done on a second drive.
*Moving Log--symbolic links can be used to move the pg_xlog directory to a different disk drive. (Pg_xlog exists in PostgreSQL releases 7.1 and later.) Unlike other writes, PostgreSQL log writes must be flushed to disk before completing a transaction. The cache cannot be used to delay these writes. Having a separate disk for log writes allows the disk head to stay on the current log cylinder so writes can be performed without head movement delay. You can use the postgres -F parameter to prevent log writes from being flushed to disk, but an operating system crash may require a restore from backup.


Other options include the use of RAID features to spread a single filesystem across several drives.
Conclusion

Fortunately, PostgreSQL doesn't require much tuning. Most parameters are automatically adjusted to maintain optimum performance. Cache size and sort size are two parameters administrators can control to make better use of available memory. Disk access can also be spread across drives. Other parameters may be set in share/postgresql.conf.sample. You can copy this file to data/postgresql.conf to experiment with some of PostgreSQL's even more exotic parameters.


PostgreSQL 8.0 Performance Checklist

This is a set of rules of thumb for setting up your PostgreSQL 8.0 server. A lot of the below is based on anecdotal evidence or practical scaling tests; there's a lot about database performance that we, and OSDL, are still working out. However, this should get you started. All information below is useful as of January 12, 2005 and will likely be updated later. Discussions of settings below supercede the recommendations I've made on General Bits.
Five Hardware Principles of Setting Up Your PostgreSQL Server

1. Disks > RAM > CPU
If you're going to throw money at a PostgreSQL server, throw it at high-performance disk arrays and get average processors and adequate RAM. If you have a little more money, get more RAM. PostgreSQL, like other ACID-compliant RDBMSes, is very I/O intensive, and it's a rare application that taxes the CPU more than the SCSI card (some do exist, though). This applies to small servers as well as it applies to eight-ways with NetApps; get the cost-effective CPU if it allows you to buy a high-end RAID card and many disks.
2. More Spindles == Better
Given multiple disks, PostgreSQL and most operating systems will parallelize read and write requests on the database. This makes an enormous difference on transaction-processing systems, and a significant improvement on any application where the entire database does not fit in RAM. Given today's minimum disk sizes (72GB) you might be tempted to use just one disk, or a single RAID 1 mirrored pair; however, you'll find that using 4, 6, or even 14 disks will yield performance boosts. Oh, and SCSI is still signifiacantly higher DB throughput than IDE, even with Serial ATA.
3. Seperate the Transaction Log from the Database:
Assuming that you've already forked out the money for a decent-sized array, there are more intelligent options than throwing everything on a single RAID. For one thing, putting the database transaction log (pg_xlog) on its own, dedicated disk resource (an array or plain disk), makes as much as a 12% difference in performance on databases with high write activity. This is especially vital on small systems with slow SCSI or IDE disks: even in a two-disk server, you can put the transaction log onto the operating system disk and reap some benefits.
4. RAID 1 0/0 1 > RAID 5:
RAID 5 with 3 disks has become an unfortunate standard among large vendor's economy servers. This is possibly the slowest array configuration possible for PostgreSQL; you can expect as little as 50% of the query speed as you would get with a plain SCSI disk. Instead, focus on RAID 1 or 1 0 or 0 1 for any set of 2, 4 or 6 disks. Over 6 disks, RAID 5 starts to perform acceptably again, and the comparison tends to be a lot more on the basis of your individual controller. Perhaps more importantly, a cheap RAID card can be a liability; often it is better to use software RAID than the Adaptec built-in card that came with your server.
5. Applications must play nice together:
The other big mistake I see a lot of organizations making is putting PostgreSQL together on a server with several other applications which compete for the same resources. Worst among these is putting PostgreSQL and other RDBMSes on the same machine; both database systems will fight over disk bandwith and the OS disk cache, and both will perform poorly. Document servers and security logging programs are almost as bad. PostgreSQL can share a machine with applications which are mainly CPU-and-RAM-intensive, such as Apache, provided that there is enough RAM.

Twelve Settings You'll Want To Adjust in Your PostgreSQL.Conf File

There's a truly frightening amount of new options in the PostgreSQL.conf file. Even once-familiar options from the last 5 versions have changed names and parameter formats. It is intended to give you, the database administrator, more control, but can take some getting used to.

What follows are the settings that most DBAs will want to change, focused more on performance than anything else. There are quite a few "specialty" settings which most users won't touch, but those that use them will find indispensable. For those, you'll have to wait for the book.

Remember: PostgreSQL.conf settings must be uncommented to take effect, but re-commenting them does not necessarily restore the default values!
Connecting

listen_addresses: Replaces both the tcp_ip and virtual_hosts settings from 7.4. Defaults to localhost in most installations, allowing only connections on the console. Many DBAs will want to set this to "*", meaning all available interfaces, after setting proper permissions in the pg_hba.conf file, in order to make PostgreSQL accessable to the network. As an improvment over previous versions, the "localhost" default does permit connections on the "loopback" interface, 127.0.0.1, enabling many server browser-based utilities.

max_connections: exactly like previous versions, this needs to be set to the actual number of simultaneous connections you expect to need. High settings will require more shared memory (shared_buffers). As the per-connection overhead, both from PostgreSQL and the host OS, can be quite high, it's important to use connection pooling if you need to service a large number of users. For example, 150 active connections on a medium-end single-processor 32-bit Linux server will consume significant system resources, and 600 is about the limit on that hardware.  Of course, beefier hardware will allow more connections.
Memory

shared_buffers: As a reminder: This figure is NOT the total memory PostgreSQL has to work with. It is the block of dedicated memory PostgreSQL uses for active operations, and should be a minority of your total RAM on the machine, since PostgreSQL uses the OS disk cache as well. Unfortunately, the exact amount of shared buffers required is a complex calculation of total RAM, database size, number of connections, and query complexity. Thus it's better to go with some rules of thumb in allocating, and monitor the server (particuarly pg_statio views) to determine adjustments.
On dedicated servers, useful values seem to be between between 8MB and 400MB (between 1000 and 50,000 for 8K page size). Factors which raise the desired shared buffers are larger active portions of the database, large complex queries, large numbers of simultaneous queries, long-running procedures or transactions, more available RAM, and faster/more CPUs. And, of course, other applications on the machine. Contrary to some expectations, allocating much too much shared_buffers can actually lower peformance, due time required for scanning. Here's some examples based on anecdotes and TPC tests on Linux machines:

* Laptop, Celeron processor, 384MB RAM, 25MB database: 12MB/1500
* Athlon server, 1GB RAM, 10GB decision-support database: 120MB/15000
* Quad PIII server, 4GB RAM, 40GB, 150-connection heavy transaction processing database: 240MB/30000
* Quad Xeon server, 8GB RAM, 200GB, 300-connection heavy transaction processing database: 400MB/50000

Please note that increasing shared_buffers, and a few other memory parameters, will require you to modify your operating system's System V memory parameters. See the main PostgreSQL documentation for instructions on this.

work_mem: used to be called sort_mem, but has been re-named since it now covers sorts, aggregates, and a few other operations. This is non-shared memory, which is allocated per-operation (one to several times per query); the setting is here to put a ceiling on the amount of RAM any single operation can grab before being forced to disk. This should be set according to a calculation based on dividing the available RAM (after applications and shared_buffers) by the expected maximum concurrent queries times the average number of memory-using operations per query.
Consideration should also be paid to the amount of work_mem needed by each query; processing large data sets requires more. Web database applications generally set this quite low, as the number of connections is high but queries are simple; 512K to 2048K generally suffices. Contrawise, decision support applications with their 160-line queries and 10 million-row aggregates often need quite a lot, as much as 500MB on a high-memory server. For mixed-use databases, this parameter can be set per connection, at query time, in order to give more RAM to specific queries.

maintenance_work_mem: formerly called vacuum_mem, this is the quantity of RAM PostgreSQL uses for VACUUM, ANALYZE, CREATE INDEX, and adding foriegn keys. You should raise it the larger your database tables are, and the more RAM you have to spare, in order to make these operations as fast as possible. A setting of 50% to 75% of the on-disk size of your largest table or index is a good rule, or 32MB to 256MB where this can't be determined.
Disk and WAL

checkpoint_segments: defines the on-disk cache size of the transaction log for write operations. You can ignore this in mostly-read web database, but for transaction processing databases or reporting databases involving large data loads, raising it is performance-critical. Depening on the volume of data, raise it to between 12 and 256 segments, starting conservatively and raising it if you start to see warning messages in the log. The space required on disk is equal to (checkpoint_segments * 2 1) * 16MB, so make sure you have enough disk space (32 means over 1GB).


max_fsm_pages: sizes the register which tracks partially empty data pages for population with new data; if set right, makes VACUUM faster and removes the need for VACUUM FULL or REINDEX. Should be slightly more than the total number of data pages which will be touched by updates and deletes between vacuums. The two ways to determine this number are to run VACUUM VERBOSE ANALYZE, or if using autovacuum (see below) set this according to the -V setting as a percentage of the total data pages used by your database. fsm_pages require very little memory, so it's better to be generous here.

vacuum_cost_delay: If you have large tables and a significant amount of concurrent write activity, you may want to make use of a new feature which lowers the I/O burden of VACUUMs at the cost of making them take longer. As this is a very new feature, it's a complex of 5 dependant settings for which we have only a few performance tests. Increasing vacuum_cost_delay to a non-zero value turns the feature on; use a reasonable delay, somewhere between 50 and 200ms. For fine tuning, increasing vacuum_cost_page_hit and decreasing vacuum_cost_page_limit will soften the impact of vacuums and make them take longer; in Jan Wieck's tests on a transaction processing test, a delay of 200, page_hit of 6 and limit of 100 decreased the impact of vacuum by more than 80% while tripling the execution time.
Query Planner

These settings allow the query planner to make accurate estimates of operation costs, and thus pick the best possible query plan. There are two global settings worth bothering with:

effective_cache_size: tells the query planner the largest possible database object that could be expected to be cached. Generally should be set to about 2/3 of RAM, if on a dedicated server. On a mixed-use server, you'll have to estimate how much of the RAM and OS disk cache other applications will be using and subtract that.

random_page_cost: a variable which estimates the average cost of doing seeks for index-fetched data pages. On faster machines, with faster disks arrays, this should be lowered, to 3.0, 2.5 or even 2.0. However, if the active portion of your database is many times larger than RAM, you will want to raise the factor back towards the default of 4.0. Alternatively, you can base adjustments on query performance. If the planner seems to be unfairly favoring sequential scans over index scans, lower it; if it's using slow indexes when it shouldn't, raise it. Make sure you test a variety of queries. Do not lower it below 2.0; if that seems necessary, you need to adjust in other areas, like planner statistics.
Logging

log_destination: this replaces the unintuitive syslog setting in prior versions. Your choices are to use the OS's administrative log (syslog or eventlog) or to use a seperate PostgreSQL log (stderr). The former is better for system monitoring; the latter, better for database troubleshooting and tuning.

redirect_stderr: If you decide to go with a seperate PostgreSQL log, this setting allows you to log to a file using a native PostgreSQL utility instead of command-line redirection, allowing automated log rotation. Set it to True, and then set log_directory to tell it where to put the logs. The default settings for log_filename, log_rotation_size, and log_rotation_age are good for most people.
Autovacuum and You

As you tumble toward production on 8.0, you're going to want to set up a maintenance plan which includes VACUUMs and ANALYZEs. If your database involves a fairly steady flow of data writes, but does not require massive data loads and deletions or frequent restarts, this should mean setting up pg_autovacuum. It's better than time-scheduled vacuums because:

* Tables are vacuumed based on their activity, avoiding vacuuming read-only tables.
* The frequency of vacuums grows automatically with increasing database activity.
* It's easier to calculate free space map requirements and avoid database bloat.

Setting up autovacuum requires an easy build of the module in the contrib/pg_autovacuum directory of your PostgreSQL source (Windows users should find autovaccuum included in the PGInstaller package). You turn on the stats configuration settings detailed in the README. Then you start autovacuum after PostgreSQL is started as a seperate process; it will shut down automatically when PostgreSQL shuts down.

The default settings for autovacuum are very conservative, though, and are more suitable for a very small database. I generally use something aggressive like:
-D -v 400 -V 0.4 -a 100 -A 0.3
This vacuums tables after 400 rows 40% of the table has been updated or deleted, and analyzes after 100 rows 30% of the table has been inserted, updated or deleted. The above configuration also lets me set my max_fsm_pages to 50% of the data pages in the database with confidence that that number won't be overrun, causing database bloat. We are currently testing various settings at OSDL and will have more hard figures on the above soon.

Note that you can also use autovacuum to set the Vacuum Delay options, instead of setting them in PostgreSQL.conf. Vacuum Delay can be vitally important for systems with very large tables or indexes; otherwise an untimely autovacuum call can halt an important operation.


There are, unfortunately, a couple of serious limitations to 8.0's autovacuum which will hopefully be eliminated in future versions:

* Has no long-term memory: autovacuum forgets all activity tracking when you restart the database. So if you do regular restarts, you should do a full-database VACUUM ANALYZE right before or after.
* Pays no attention to how busy the server is: there were plans to check system load before vacuuming, but that's not a current feature. So if you have extreme load peaks, autovacuum may not be for you.
Comments