One of the most frequent topics I see related to PostgreSQL in forums and on support lines is being able to establish simple remote connections to the database server. While all the elements are well documented, I haven’t always seen good explainers related to how these elements related to one another nor systematic walkthroughs related to troubleshooting problems with PostgreSQL connectivity.
This is a shame because it truly is the world’s most advanced open source database and really the only one that is well suited for enterprise business class applications. This article will make a concerted effort to be a primer that describes the different elements of PostgreSQL networking and security configuration and how they relate to one another. What this article will not do is make you a security expert generally, or with PostgreSQL, nor will it give you a one size fits all formula that can be applied to any situation. If you are unfamiliar with information technology security practices or how the information presented here may apply to your specific situation, I would strongly recommend that you engage a professional to address your needs.
This article will focus on PostgreSQL versions in the 8.X series and is written specifically with versions 8.3 and 8.4 in mind (please be sure to check the xTuple Supported PostgreSQL Versions Page for more information about which version is right for you). Other versions may work, but there may be some minor differences as well. This article will also assume that you have basic computer administration skills, enough so that looking at a configuration file or two isn’t a daunting thought. PostgreSQL is a simple to manage database compared others such as Oracle, but it is an enterprise class database with some complexity.
The 40,000 Foot, High Level Overview
When a client application makes an attempt to connect to a PostgreSQL instance there is a checklist of sorts which must be satisfied before an application can make use of a database in the following order (conceptually rather than technically):
- Should the PostgreSQL instance accept network connections? If so on which network interface(s)/IP address(es) & port?
- May remote computers talk to a PostgreSQL instance over the network? Similar to, but fundamentally different than point 1.
- Assuming the simple answers to 1 and 2 are ‘yes’, should your specific computer and network be allowed to talk to the PostgreSQL instance with the database you’ve asked for, using the user you want to login as, using the protocol that you’ve asked to use and verifying that you are who you say you are in the correct fashion?
- Do you have permission to use the databases and the PostgreSQL instances the way you have requested?
Each one of the points above is handled by some aspect of PostgreSQL configuration with the exception of point number 2 which is a reference to an external firewall or similar application/device. To answer these questions PostgreSQL will look at its currently running configuration and make a determination on whether or not you’ve satisfied each point along the way. To overlay the configuration points with the checklist above I might re-write it thus:
- Should the PostgreSQL instance accept network connections? If so on which network interface(s)/IP address(es) & port? (configured in postgresql.conf file)
- May remote computers talk to a PostgreSQL instance over the network? Similar to, but fundamentally different than point 1. (firewall configuration external to PostgreSQL)
- Assuming the simple answers to 1 and 2 are ‘yes’, should your specific computer and network be allowed to talk to the PostgreSQL instance with the database you’ve asked for, using the user you want to login as, using the protocol that you’ve asked to use, and verifying that you are who you say you are in the correct fashion? (configured in pg_hba.conf file)
- Do you have permission to use the databases and the PostgreSQL instances the way you have requested? (configured using PostgreSQL database roles within the instance)
There absolutely can be more to it than this, but these are the basics and satisfying these conditions will generally get you up and running with a networked PostgreSQL instance. So from here forward we’ll dive into some of the specifics of this configuration.
PostgreSQL Configuration Files
There are two files that specifically deal with the ability to connect to a PostgreSQL instance over the network. Assuming PostgreSQL has been compiled normally, they are:
- postgresql.conf: The main PostgreSQL configuration parameters file. Generally speaking and for our purposes, this file determines global and instance-wide configurations for how the database operates on your system.
- pg_hba.conf: This is the PostgreSQL Host Based Authentication file. Whereas the postgresql.conf determines how your server instance works, this configuration file determines who can connect to the databases in that instance and from which networks or computers they may connect from. This is a bit of a simplification, but it’s a good way to think about this file.
The default location of these files can vary depending on how your PostgreSQL instance was compiled or packaged and which platform you’re running. There are also ways to change the postgresql.conf location at server runtime. Because of the wide variety of locations I can’t give you a location where you’ll find these files. For that you’ll have to rely on the documentation for your specific instance or simply search for them. (Yes, even the names of these files can be changed, but that’s pretty rare in my experience!)
postgresql.conf
The postgresql.conf file controls many settings related to the operation of the server. The formal documentation for this file can be found at: http://www.postgresql.org/docs/current/static/runtime-config.html. For this discussion, we’ll look at a section labeled ‘CONNECTIONS AND AUTHENTICATION’. Below is an example of this section from an 8.4.1 instance:
#------------------------------------------------------------------------------ # CONNECTIONS AND AUTHENTICATION #------------------------------------------------------------------------------ # - Connection Settings - listen_addresses = '*' # what IP address(es) to listen on; # comma-separated list of addresses; # defaults to 'localhost', '*' = all # (change requires restart) port = 5432 # (change requires restart) max_connections = 100 # (change requires restart) # Note: Increasing max_connections costs ~400 bytes of shared memory per # connection slot, plus lock space (see max_locks_per_transaction). #superuser_reserved_connections = 3 # (change requires restart) #unix_socket_directory = '' # (change requires restart) #unix_socket_group = '' # (change requires restart) #unix_socket_permissions = 0777 # begin with 0 to use octal notation # (change requires restart) #bonjour_name = '' # defaults to the computer name # (change requires restart) # - Security and Authentication - #authentication_timeout = 1min # 1s-600s ssl = on # (change requires restart) ssl_ciphers = 'ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH' # allowed SSL ciphers # (change requires restart) #password_encryption = on #db_user_namespace = off # Kerberos and GSSAPI #krb_server_keyfile = '' #krb_srvname = 'postgres' # (Kerberos only) #krb_caseins_users = off # - TCP Keepalives - # see "man 7 tcp" for details #tcp_keepalives_idle = 0 # TCP_KEEPIDLE, in seconds; # 0 selects the system default #tcp_keepalives_interval = 0 # TCP_KEEPINTVL, in seconds; # 0 selects the system default #tcp_keepalives_count = 0 # TCP_KEEPCNT; # 0 selects the system default
The most important settings for basic TCP/IP connectivity are the listen_address and port settings. By default, the listen_address setting is configured to ‘localhost’, usually meaning that PostgreSQL is only listening on the loopback address (127.0.0.1) which effectively means that you can only connect to the PostgreSQL instance from the same computer that the instance is installed on. If you want remote computers to be able to connect to the instance, you must change this setting. In the most basic circumstances this would simply be the IP address of the computer that the PostgreSQL instance is installed on or ‘*’, which means listen on all IP addresses that are configured on the computer where PostgreSQL is installed. If the server has more than one IP address or more than one NIC (and therefore more than one IP address), this setting gives you to flexibility to pick and choose. The next setting is the port setting. The portsetting determines which TCP port the PostgreSQL instance will listen for remote connections on. The default value of 5432 is usually sufficient and is the recognized port for PostgreSQL instances to be listening on. The times when it would make sense to change the port number are primarily when you have more than one PostgreSQL instance on the same server.
Some may advocate that you should change the default port number setting on internet accessible servers in order ‘hide’ the instance from those looking to exploit the server. I personally don’t find this a compelling argument because firstly security through obscurity, in this case making the listening port obscure, is not security and second there are any number of tools that can scan a computer on the internet for listening ports and determine what program is listening on those open ports. The fact is that if you must have your PostgreSQL instance available to arbitrary computers on the open internet, you must use other techniques to securely offer the service other than simple port renumbering; to be fair, port renumbering won’t hurt, just keep your expectations about the technique in check.
This should be sufficient to get the PostgreSQL instance on the network and listening for connections. One topic we haven’t covered and is outside the scope of this article to cover in depth is that you should also be sure that if you have a firewall, be it a software firewall on the computer that will run PostgreSQL or an external firewall device, that will also govern connectivity to the PostgreSQL server, this must be configured to allow TCP/IP connections to both the listen_address and the port settings that you identified in the postgresql.conf file. Some Linux distributions can be pretty tight about this especially if you don’t use their pre-built packages, so be aware that connectivity can be blocked by external factors outside of PostgreSQL configuration.
pg_hba.conf
The pg_hba.conf file controls configuration related to authentication, the process of making sure you are who you say you are. PostgreSQL is somewhat unique amongst RDBMSs in that others will not have this intermediate level for defining connectivity permissions, relying either on networking elements (like firewalls, VLANS, etc) and simply on the internal database user constructs which is why many are thrown off by the pg_hba.conf file. The formal documentation for this file can be found at: http://www.postgresql.org/docs/current/static/auth-pg-hba-conf.html. The way the pg_hba.conf file enforces security is that it sets a number of hurdles that must be overcome, similar to firewall rules, based on what you are trying to do and from where you are trying to do it. The hurdles that each pg_hba.conf entry must define can be summarized as follows:
- The type of connection being allowed (TYPE). This setting sets the ground rules for the nature of the connection. It may specify that the entry allows you to connect from the same computer that the PostgreSQL instance is installed on using UNIX sockets (‘local’) or that you can connect from other computers but you have to use SSL to encrypt the connection (‘hostssl’).
- Which database does the entry allow you to access (DATABASE). This setting determines which specific database(s) the entry will allow connections to. This can be a single database name, a comma separated list of database names, ‘all’ for all databases in the instance or even a file whose contents are a list of database names allowed by this entry.
- Which database users are allowed by this entry (USER). Similar to the database setting, this setting defines the users that are allowed to connect by this entry. Just like the database names there are multiple ways you can define this setting, it can be a single name, a comma separated list of user names, a file containing a list of user names, or the members of a database role. When we speak of user in this context we mean PostgreSQL database roles, not OS users. Also remember this is only one of two places where a user must have permission to connect; inside of the database (item 4 from our 40,000 foot over view above) the user must also have the ‘login’ privilege in addition to pg_hba.conf permissions.
- Which networks of origin are allowed to connect (CIDR-ADDRESS). This setting allows us to say that we will only trust connections coming from specific computers or specific networks. This is really useful in situations where all the client IPs are either static (i.e. another server like a web server) or are easily defined (like on a private network without NAT). Of course you can set this to allow connections from all computers and all networks which can be necessary if you must, say, connect to a Virtual Private Server over the internet from your home DSL or Cable connection.
- How must you prove you are who you say you are (METHOD). The method setting allows you to define how the ‘USER’ that is allowed to connect must prove they are who they say they are. This can vary from the least secure (‘trust’) which means they don’t have to prove it at all, to the most typical (‘md5’) which uses password hashes stored in the database, all the way to more complex mechanisms such as LDAP, GSSAPI, client side certificates (8.4) and even Active Directory (with some configuration). You can even set up a method with ‘reject’ which means that if the other parameters are matched you specifically want to stop the connection rather than allow it. There are more and some have additional connection parameters that go beyond the scope of a basic configuration that are set here, too. We’ll stick with ‘md5’ for this article which won’t include these other METHOD parameters.
It is worth nothing that between PostgreSQL versions 8.3 and 8.4 that there were some pretty significant changes (http://www.postgresql.org/docs/8.4/static/release-8-4.html#RELEASE-8-4-PG-HBA-CONF) for more advanced configurations, but for a basic configuration using MD5 method connections, these changes won’t be very relevant in that it won’t change what we do here for pre-8.4 versions or the 8.4 version.
The order of the rules in pg_hba.conf are important. The first rule that matches any given connection attempt wins. So when you’re troubleshooting connectivity problems, you must not only look to see that you have an appropriate pg_hba.conf entry for your specific situation, but that you also don’t have a conflicting earlier entry in the file that would deny access.
Below are some example pg_hba.conf entries:
# TYPE DATABASE USER CIDR-ADDRESS METHOD # "local" is for Unix domain socket connections only local all all md5 # IPv4 local connections: host all all 127.0.0.1/32 md5 hostssl xtuple_data fred 192.168.0.0/24 md5 hostssl all all 10.0.0.0/8 md5 host all all 0.0.0.0/0 reject # IPv6 local connections: host all all ::1/128 md5
You can see the layout of the different options that were discussed earlier in the example; TYPE, DATABASE, USER, CIDR-ADDRESS, and METHOD all on one line separated by whitespace. While this is not a comprehensive list of scenarios that can crop up in pg_hba.conf (see the formal docs for that), it should be enough to cover many, if not most, circumstances. Let’s take a look at this line by line.
The first entry: local all all <blank> md5. This ‘local’ value for TYPE means connections made locally using UNIX sockets. DATABASE is set to ‘all’ meaning this rule matches all databases in the PostgreSQL instance. USER is also set to ‘all’ meaning that this rule applies to all database users attempting to connect. There is no CIDR-ADDRESS value since UNIX sockets don’t use IP networking. Finally METHOD is ‘md5’ which means that users will authenticate with passwords stored in the database as part of the database user definition, rather than, say, some external method like LDAP. So if a user connects to the instance where this pg_hba.conf file is running, and they are using a UNIX sockets based connection from the same server where the PostgreSQL instance is running, this rule will apply to them and allow them to connect so long as they supply a valid password. Even if there was another rule for ‘local’ later in the file, it would never apply since this rule applies to all database and all users using the ‘local’ method (first matching rule wins).
The second entry: host all all 127.0.0.1/32 md5. This entry applies to all databases and all users just like the first line, but here the connection TYPE is ‘host’, which means that this is expecting a connection over the network. Even so, this will only allow connections to the same computer that PostgreSQL is installed on because the CIDR_ADDRESS parameter is 127.0.0.1/32 which is the loopback address in the networking stack (for more about CIDR network addresses check this link out); remote connections cannot be made to 127.0.0.1 so it must be local. And, again just like the first example, the user must supply a password which will be checked against the database user set up in the PostgreSQL instance due to the METHOD of ‘md5’. So the only difference is that the first example uses UNIX sockets to connect to the instance and this line allows the local computer to connect to the PostgreSQL instance using TCP/IP.
The third entry: hostssl xtuple_data fred 192.168.0.0/24 md5. This time the TYPE is ‘hostssl’ which means that connections matching this rule must be encrypted SSL connections through the network. Unlike our previous two examples, we identify a specific DATABASE (‘xtuple_data’) and a specific database USER (‘fred’) which to grant access. Fred will still need to supply the password stored with his account in the database (METHOD = ‘md5’). Finally, it’s not just enough that Fred uses SSL and only tries to connect to the ‘xtuple_data’ database, he may only do so from computers connected to the 192.168.0.0/24 network as defined by the CIDR-ADDRESS.
The fourth entry: hostssl all all 10.0.0.0/8 md5. By now you should be catching on. This entry lets any database user connect to any database in the PostgreSQL instance using an SSL connection across the network, so long as they are connecting from the 10.0.0.0/8 network and have the right password for the user they are using.
The fifth entry: host all all 0.0.0.0/0 reject. This entry has a different purpose than the others, rather than giving permission to connect to the databases in the PostgreSQL instance, this line identifies who to keep out. By using METHOD ‘reject’ we’re saying that anyone (USER = ‘all’) matching the criteria of a non-SSL connection (TYPE = ‘host’), to any database (DATABSE = ‘all’), using any computer or network (CIDR-ADDRESS = ‘0.0.0.0/0’) will not be allowed to connect. Remember, though, that the order of the rules are important; if this fifth entry were before the second entry in the file, then users connecting from 127.0.0.1/32 would not be able to connect, but that entry is first so people logging in from the loopback address will be able to connect. Another small point is that 0.0.0.0/0 is the catch-all CIDR-ADDRESS for all networks and computers, so all TCP/IP connections will match this CIDR-ADDRESS. Finally, I must admit that I don’t see METHOD ‘reject’ used that often. The reason is that PostgreSQL will default to reject any connection that isn’t matched by one the entries that allow access, so the circumstances that require METHOD ‘reject’ aren't common.
The sixth entry: host all all ::1/128 md5. Lastly all this line does is show that if your system is setup to support IPv6, that pg_hba.conf can manage those connections as well by using standard IPv6 addresses in the CIDR-ADDRESS position. All other options retain their same meaning.
Remember that this is a pretty limited view of an otherwise very rich set of pg_hba.conf options. If you want to stretch out a bit, I strongly recommend taking a look at the formal documentation (http://www.postgresql.org/docs/current/static/auth-pg-hba-conf.html). It also deserves repeating that just because the pg_hba.conf file permits a user to connect to the PostgreSQL instance, that the authorization for that user account inside of the database may still deny the user’s connection attempt: it takes both pg_hba.conf and the user account login privileges to successfully connect to a given PostgreSQL instance.
Finally, when you do change the pg_hba.conf you must reload the database instance before the new configuration will take effect. This can be done with the postgresql program pg_ctl:
pg_ctl reload
from the command line, though you may have to specify the path to PostgreSQL’s bin directory if you don’t already have it in your PATH environment variable.
PostgreSQL Database Authorization and Database Roles
Database permissions and user management cover a broad range of topics and a small book could be written covering the options in depth. These permissions run the gamut from allowing a user to connect to the instance, to defining which tables and views the user can access, to can they insert, delete, update or select them and even defines these permissions down to the individual table column level as of PostgreSQL 8.4. Naturally, for this article we’ll stick to those permissions needed to establish a simple connection to the instance and the database. Links to the formal documentation will be listed in context below.
Database Roles
PostgreSQL ‘roles’ are the heart of PostgreSQL’s user management capabilities. In PostgreSQL a database role can be either a specific user or user group; a database role can be a ‘member’ of another role. From there, authorization for the different activities in the database are granted to the roles. Roles that represent individual users can inherit the permissions assigned to the roles for which that the user role is a member. After installing PostgreSQL, are two roles that are (typically) automatically created: ‘PUBLIC’ and ‘postgres’. The PUBLIC role is a default role of which every other role becomes a member automatically. The ‘postgres’ user (and it might be named differently) is the built-in superuser account; other roles may be granted superuser privileges, but ‘postgres’ is created at installation and is the automatic owner of some of the key databases that are created at install. The formal documentation for roles can be found at: http://www.postgresql.org/docs/8.4/static/user-manag.html.
To create a role that can login, assuming you are logged in as the pre-existing ‘postgres’ user, at an SQL prompt you would type:
CREATE ROLE fred WITH LOGIN INHERIT ENCRYPTED PASSWORD ‘lemmein’;
or optionally from a shell command line you could use the special ‘createuser’ script found in the ‘bin’ directory of the of the PostgreSQL instance installation. Then the example would look like:
createuser --username postgres --login --encrypted --pwprompt --no-createrole --no-createdb --no-superuser --inherit fred
when you hit enter, the createuser script would prompt you for the new user’s password and then again to confirm it. The createuser script is just a wrapper for the SQL commands in the CREATE ROLE example, except you must be more specific or the script will ask you what the correct setting will be. More on the createuser script can be found at this link: http://www.postgresql.org/docs/8.4/static/app-createuser.html
What we said with both of these commands is create a new role with the name ‘fred’, and give that role the right to login to the instance (LOGIN). We want this new role to automatically inherit the authorizations of the other roles of which it might be member (INHERIT), and we want to give it an md5 hashed password of ‘lemmein’ (ENCRYPTED PASSWORD). Remember that the LOGIN right at this stage is not enough to let the user connect to the PostgreSQL instance by itself, the user still must meet the criteria defined in the pg_hba.conf in order to connect regardless of the LOGIN permission here.
To prevent ‘fred’ from logging into the PostgreSQL instance (without deleting his role), we would issue the follow SQL command at the SQL prompt:
ALTER ROLE fred WITH NOLOGIN;
Database Connect Authorization
Now we’re ready to grant permissions to connect to a specific database in the PostgreSQL instance. In this case we’ll presume you want user ‘fred’ to be able to connect to a database called ‘xtuple_data’. If ‘fred’ was assigned as the owner of the database when it was created, he would already have permission to connect to it; but we’ll assume that this is not the case for the following example. To allow ‘fred’ to connect to database ‘xtuple_data’ you would enter the following SQL commands at the SQL prompt as superuser:
GRANT CONNECT ON DATABASE xtuple_data TO fred;
This is pretty self descriptive in that it does precisely what it sounds like it does: it grants ‘fred’ the right to connect to the database ‘xtuple_data’. And, just as with the LOGIN permission, this grant only matters if first a pg_hba.conf entry is satisfied to allow such a connection. If we wanted to make it so ‘fred’ could no longer connect to this database we would issue the following SQL command at the SQL prompt:
REVOKE CONNECT ON DATABASE xtuple_data FROM fred;
By this point you should have a PostgreSQL instance that you can connect to via the network, the ability to get logged in and the permission to connect to a database. As you can see this is all pretty simple stuff, it just requires an understanding of the flow described in the High Level Overview.
Steven C. Buttgereit is Manager of Consulting & Development Services for Muse Systems, a San Francisco, CA based business systems consultancy and xTuple Solution Provider Partner.