Lots O’ Conferences

Update (2012-09-02): My apologies for omitting Paris, which I have just added.
Archive for August 2012
I previously explained the ability to set Postgres configuration variables at different levels. In this blog entry, I would like to explain how changes at the top level, postgresql.conf, propagate to running sessions.
The postgresql.conf file is usually stored at the top of the PGDATA directory, though it can be relocated. The simplest way to modify the file is to open it in a text editor. (Tools like pgAdmin allow file modifications via a GUI.)
Once the file has been modified, you must signal that the configuration file should be reloaded and your modifications applied. There are three methods to signal this:
- send a SIGHUP signal to the postmaster process, or SIGHUP individual backends
- run pg_ctl reload from the command line
- call the SQL function pg_reload_conf()
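A minimal sketch of all three methods, assuming PGDATA points at the data directory and the server is running; log_connections is just an example setting:

```shell
# Change a setting (log_connections is only an example) ...
echo "log_connections = on" >> "$PGDATA/postgresql.conf"

# ... then signal the reload with any one of the three methods:
pg_ctl reload -D "$PGDATA"                       # 1: pg_ctl from the command line
kill -HUP "$(head -1 "$PGDATA/postmaster.pid")"  # 2: SIGHUP the postmaster (its pid is the first line of postmaster.pid)
psql -c 'SELECT pg_reload_conf();'               # 3: the SQL function
```

Whichever method you use, the postmaster re-reads postgresql.conf and passes the changes on to its backends.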
Upcoming Conferences

My conference schedule has solidified, and I am presenting at events in Philadelphia, Chicago, Moscow, and Prague during the next two months.
Unix Domain Socket Location

All Postgres servers support TCP/IP connections, including localhost connections that allow clients to connect to servers on the same machine. Unix-like operating systems also support local or Unix-domain socket connections. These connections do not use the TCP/IP stack but rather a more efficient stack for local connections. (I previously showed that Unix-domain socket communication is measurably faster.)

Unix-domain socket connections require a socket file in the local file system. These are not normal files but more like entry points to listening servers. By default, Postgres places these socket files in the /tmp directory:

srwxrwxrwx 1 postgres postgres 0 Jul 30 20:27 .s.PGSQL.5432=
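Libpq treats a host value beginning with a slash as a socket directory, so you can pick either connection type from psql (a sketch; the database name and port are assumptions):

```shell
# Unix-domain socket connection: the "host" is the socket directory
psql -h /tmp -p 5432 postgres

# TCP/IP connection to the same local server
psql -h localhost -p 5432 postgres
```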
Centralizing Connection Parameters

Hard-coding database connection parameters in applications has several down-sides:

- changes require application modifications
- changes are hard to deploy and customize
- central connection parameter management is difficult

Libpq does support setting connection parameters via environment variables, and this often avoids many of the down-sides of hard-coding database connection parameters. (I already covered the importance of libpq as the common Postgres connection library used by all client interfaces except JDBC.)

However, there is another libpq feature that makes connection parameter sharing even easier: pg_service.conf. This file allows you to name a group of connection parameters and reference them by specifying the name when connecting. By placing this file on a network storage device, you can easily control application connections centrally. Change the file, and every new database connection sees the changes. While you can store passwords in pg_service.conf, everyone who can access the file can see those passwords, so you are probably better off using libpq’s password file.
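A sketch of such a service file (the service name, host, and user are hypothetical); libpq looks for ~/.pg_service.conf, or the file named by the PGSERVICEFILE environment variable:

```shell
# Create a service definition (names and host are hypothetical)
cat >> ~/.pg_service.conf <<'EOF'
[myapp]
host=db.example.com
port=5432
dbname=appdb
user=appuser
EOF

# Reference the group by name when connecting:
psql "service=myapp"
# or via the PGSERVICE environment variable:
PGSERVICE=myapp psql
```

Any libpq-based client interface accepts the same service= connection parameter, so one shared file can serve every application.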
Monitoring Postgres from the Command Line

postgres  2544  2543  0 10:47 ?  00:00:00 /u/pgsql/bin/postmaster -i
postgres  2546  2544  0 10:47 ?  00:00:00 postgres: checkpointer process
postgres  2547  2544  0 10:47 ?  00:00:00 postgres: writer process
postgres  2548  2544  0 10:47 ?  00:00:00 postgres: wal writer process
postgres  2558  2544  0 10:47 ?  00:00:01 postgres: autovacuum launcher process
postgres  2575  2544  0 10:47 ?  00:00:02 postgres: stats collector process
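A listing like the one above can be produced with ps; the bracketed pattern is a common trick that keeps the grep process itself out of the results (exact columns vary by platform):

```shell
# Show the postmaster and its helper processes
ps -ef | grep '[p]ostgres'
```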
Turn off the drive write cache! Turn on the cache! Huh?

During a recent conference, it was pointed out to me that there are contradictory recommendations about the use of caching on storage devices. For magnetic disk drives, it is pretty clear: if the cache is volatile (which it usually is on magnetic disk drives), switch the drive to write-through mode so all writes go to durable storage, i.e. the magnetic platters. The Postgres manual has a detailed description of how to do this.

For solid-state drives (SSDs), things are more complicated. If the cache is volatile, you should still switch the drive to write-through mode, but be aware that this will slow writes and decrease the lifetime of the drive; it is, however, the only durable option. If the cache is non-volatile, you definitely do not want write-through mode, because it gives you all the negative effects listed above without improving durability, which the non-volatile cache already guarantees.

So, turn on the drive write cache? Turn it off? Hopefully this helps explain which setting is appropriate.
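On Linux, the write-cache setting discussed above can typically be inspected and changed with hdparm for ATA drives (a sketch; the device name is an assumption, and the commands require root):

```shell
# Show the current write-cache setting
hdparm -W /dev/sda

# Write-through mode: disable the volatile write cache
hdparm -W 0 /dev/sda

# Re-enable the cache (appropriate when the cache is non-volatile)
hdparm -W 1 /dev/sda
```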
250 TPS

Have you ever run benchmarks that topped out at 250 transactions per second (TPS)? What about 120 or 90? If so, you might have been stumped about why you consistently hit these numbers and could never go higher.

These numbers are common rotational latency limits for hard disks. Specifically, 250, 120, and 90 TPS correspond to 15k, 7.2k, and 5.4k RPM drives, respectively. Basically, if you have no durable cache between the CPU and your magnetic disk drives, this is the fastest you can durably perform write transactions, because each commit must wait for the platter to rotate back under the head. Of course, you can reduce durability requirements, and sometimes multiple writes can be combined into a single disk write, but in general, disk drive rotational latency is a significant limiting factor for any production server that lacks a durable cache layer, e.g. a BBU (battery-backed unit) RAID controller or a non-volatile drive cache. Having a durable cache layer can improve write transaction performance 10- to 100-fold.
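The arithmetic behind those ceilings: with at most one durable commit per platter rotation, the limit is simply the drive's rotations per second, i.e. RPM divided by 60:

```shell
# Max durable commits per second ~= rotations per second = RPM / 60
for rpm in 15000 7200 5400; do
    echo "$rpm RPM -> $((rpm / 60)) TPS max"
done
# prints 250, 120, and 90 TPS for the three drive speeds
```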