The idea behind the SQL-dump method is to generate a text file with SQL
commands that, when fed back to the server, will recreate the
database in the same state as it was at the time of the dump.
PostgreSQL provides the utility program
pg_dump for this purpose. The basic usage of this
command is:
pg_dump dbname > outfile
As you see, pg_dump writes its results to the
standard output. We will see below how this can be useful.
pg_dump is a regular PostgreSQL
client application (albeit a particularly clever one). This means
that you can do this backup procedure from any remote host that has
access to the database. But remember that pg_dump
does not operate with special permissions. In particular, it must
have read access to all tables that you want to back up, so in
practice you almost always have to run it as a database superuser.
To specify which database server pg_dump should
contact, use the command line options -h
host and -p port. The
default host is the local host or whatever your
PGHOST environment variable specifies. Similarly,
the default port is indicated by the PGPORT
environment variable or, failing that, by the compiled-in default.
(Conveniently, the server will normally have the same compiled-in
default.)
Like any other PostgreSQL client application,
pg_dump will by default connect with the database
user name that is equal to the current operating system user name. To override
this, either specify the -U option or set the
environment variable PGUSER. Remember that
pg_dump connections are subject to the normal
client authentication mechanisms (which are described in Chapter 20).
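
As an illustration of the connection options described above, the following sketch wraps pg_dump in a small shell function. The function name dump_from_server and the host, port, user, and file names in the example call are hypothetical; only the -h, -p, and -U options and the PGHOST, PGPORT, and PGUSER variables come from the text.

```shell
# Hypothetical helper: dump one database from an explicitly named server.
# Arguments: host, port, user name, database name, output file.
dump_from_server() {
    host=$1
    port=$2
    user=$3
    dbname=$4
    outfile=$5
    # -h, -p and -U take precedence over PGHOST, PGPORT and PGUSER.
    pg_dump -h "$host" -p "$port" -U "$user" "$dbname" > "$outfile"
}

# Example call (all values are placeholders):
#   dump_from_server db.example.com 5432 backup_user mydb mydb.sql
#
# The same connection settings expressed through the environment:
#   PGHOST=db.example.com PGPORT=5432 PGUSER=backup_user pg_dump mydb > mydb.sql
```

Whether the connection succeeds still depends on the server's client authentication configuration for that user and host.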
Dumps created by pg_dump are internally consistent;
that is, updates made to the database while pg_dump is
running will not appear in the dump. pg_dump does not
block other operations on the database while it is working.
(Exceptions are those operations that need to operate with an
exclusive lock, such as VACUUM FULL.)
Important: If your database schema relies on OIDs (for instance, as foreign
keys), you must instruct pg_dump to dump the OIDs
as well. To do this, use the -o command line
option.
The text files created by pg_dump are intended to
be read in by the psql program. The
general command form to restore a dump is
psql dbname < infile
where infile is what
you used as outfile
for the pg_dump command. The database dbname will not be created by this
command; you must create it yourself from template0 before executing
psql (e.g., with createdb -T template0
dbname).
psql supports options similar to pg_dump
for controlling the database server location and the user name. See
psql's reference page for more information.
Not only must the target database already exist before starting to
run the restore, but so must all the users who own objects in the
dumped database or were granted permissions on the objects. If they
do not exist, the restore will fail to recreate the objects with the
original ownership and/or permissions. (Sometimes this is what you want,
but usually it is not.)
Once restored, it is wise to run ANALYZE on each database so the optimizer has
useful statistics. An easy way to do this is to run
vacuumdb -a -z to
VACUUM ANALYZE all databases; this is equivalent to
running VACUUM ANALYZE manually in each database.
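
The restore steps described above can be sketched as a single shell function. The name restore_dump and the names in the example call are hypothetical. The sketch assumes that every user who owns or has permissions on the dumped objects already exists, and it analyzes only the restored database instead of using -a for all databases.

```shell
# Hypothetical helper: restore a plain-text dump into a new database.
# Arguments: database name, dump file produced by pg_dump.
restore_dump() {
    dbname=$1
    infile=$2
    # Create the empty target database from template0.
    createdb -T template0 "$dbname" || return 1
    # Replay the SQL commands in the dump.
    psql "$dbname" < "$infile" || return 1
    # Refresh optimizer statistics for the restored database only.
    vacuumdb -z "$dbname"
}

# Example call (names are placeholders):
#   restore_dump mydb mydb.sql
```

Note that psql by default continues after an SQL error and still exits with status 0, so the checks in this sketch catch only failures such as a refused connection, not individual statements that failed during the restore.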
The ability of pg_dump and psql to
write to or read from pipes makes it possible to dump a database
directly from one server to another; for example:
pg_dump -h host1 dbname | psql -h host2 dbname
Important: The dumps produced by pg_dump are relative to
template0. This means that any languages, procedures,
etc. added to template1 will also be dumped by
pg_dump. As a result, when restoring, if you are
using a customized template1, you must create the
empty database from template0, as in the example
above.
For advice on how to load large amounts of data into
PostgreSQL efficiently, refer to Section 13.4.
The above mechanism is cumbersome and inappropriate when backing
up an entire database cluster. For this reason the pg_dumpall program is provided.
pg_dumpall backs up each database in a given
cluster, and also preserves cluster-wide data such as users and
groups. The basic usage of this command is:
pg_dumpall > outfile
The resulting dump can be restored with psql:
psql -f infile postgres
(Actually, you can specify any existing database name to start from,
but if you are reloading in an empty cluster then postgres
should generally be used.) It is always necessary to have
database superuser access when restoring a pg_dumpall
dump, as that is required to restore the user and group information.
Because PostgreSQL allows tables larger
than the maximum file size on your system, dumping such a table
to a file can be problematic: the resulting file will likely
exceed the maximum size your system allows. Because
pg_dump writes to the standard output, you can
use standard Unix tools to work around this problem.
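
As one possible illustration, the sketch below pipes the dump through gzip or split so that no single output file grows too large. The function names, file names, and chunk size are hypothetical, and gzip and split are only two of the tools that could be used here.

```shell
# Hypothetical helpers showing two ways to keep dump files small.

# Compress the dump on the fly.
# Arguments: database name, output file.
dump_compressed() {
    pg_dump "$1" | gzip > "$2"
}

# Restore a compressed dump by decompressing it into psql.
# Arguments: database name, compressed dump file.
restore_compressed() {
    gunzip -c "$2" | psql "$1"
}

# Split the dump into pieces of at most $2 bytes each,
# named <prefix>aa, <prefix>ab, and so on.
# Arguments: database name, chunk size in bytes, file name prefix.
dump_split() {
    pg_dump "$1" | split -b "$2" - "$3"
}

# Restore from the pieces by concatenating them in name order.
# Arguments: database name, file name prefix used for the dump.
restore_split() {
    cat "$2"* | psql "$1"
}

# Example calls (names and sizes are placeholders):
#   dump_compressed mydb mydb.sql.gz
#   dump_split mydb 1000000000 mydb.sql.part_
```

The prefix passed to restore_split should match only the pieces of one dump, since every file whose name begins with it is fed to psql.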