Planet PostgreSQL

Syndicate content
Planet PostgreSQL
Updated: 44 weeks 12 hours ago

Shaun M. Thomas: PG Phriday: A Postgres Persepctive on MongoDB

15 July, 2016 - 23:06

I’ve been almost exclusively a Postgres DBA for a seemingly interminable length of time. While this is good for specializing, nobody wants to be a One-Trick Pony. And aside from the occasional bit of Python to write more advanced tools when Bash isn’t up to the job, it’s All Postgres All The Time. While few things would make me happier, it pays to branch out occasionally.

When NoSQL databases hit the scene a few years ago, I pretty much ignored them wholesale. I read about what they did and how they did it, and while intriguing, none of that seemed particularly better than what Postgres delivered. Cassandra could scale outwards, but has no real concept of NULL, and limits searches to “indexed” columns. Yet sites like Reddit prove how well it can scale when properly leveraged. MongoDB is basically a native JavaScript filesystem, which I ignored because Postgres support of JSON and JSONB effectively deprecated it before it was even on my radar. There are others of course, in CouchDB, Redis, HBase, and a plethora of alternatives.

There’s only so much time in a day though, and being as we use MongoDB for a couple of projects, it only makes sense to investigate how it really works. What can a Postgres user, who eats, sleeps, and breathes Postgres, hope to learn from a NoSQL database? I guess we can start with some simple and basic operations. Can I make a table, fill it with 1-million rows, update a few, delete a few, add an extra index for future searches, and perform a query or two?

Let’s start with the Postgres schema I use for practically everything:

CREATE TABLE sensor_log ( id SERIAL NOT NULL PRIMARY KEY, location VARCHAR NOT NULL, reading BIGINT NOT NULL, reading_date TIMESTAMP NOT NULL );   \timing   INSERT INTO sensor_log (id, location, reading, reading_date) SELECT, % 1000, % 100, CURRENT_DATE - (( * 10) || 's')::INTERVAL FROM generate_series(1, 1000000) s(id);   TIME: 7236.906 ms   CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);   TIME: 804.044 ms

Perhaps the most appropriate question to ask at this junction would be: what is the MongoDB equivalent? Well, as strange as this might sound, there isn’t one. When I described MongoDB as a native JavaScript filesystem, I was being literal. There really are no tables, schemas, or databases as we might think of them. A MongoDB “collection” can be loosely interpreted as a table, in that it stores documents that can be thought of as rows, but that’s where the similarity ends.

A MongoDB collection has no structure at all. Let that sink in a bit. MongoDB collections can not be declared, and as such, literally anything can be stored in one. Check this out:

use mydb   db.myCol.insert( {animal: "cow", says: "moo"} ) db.myCol.insert( {length: 12, seven: [ 8, 9 ]} )   db.myCol.find()   { "_id" : ObjectId("5788fc361b81473a2b3d1998"), "animal" : "cow", "says" : "moo" } { "_id" : ObjectId("5788fc3a1b81473a2b3d1999"), "length" : 12, "seven" : [ 8, 9 ] }

While it’s in our best interests to keep similar documents restricted to certain collections, there’s no kind of constraint that enforces this. This makes it easy to keep tangentially related objects together, with dozens or hundreds of non-contiguous attributes, so long as there is some common expected core. Yet, we don’t have to. As a consequence, we can’t actually declare a predefined structure, required keys, or any other kind of constraint.

In fact, we can learn quite a lot from that previous code block. First of all, the “mydb” database was never defined. Neither was “myCol”, the collection used to store our bogus documents. Simply storing a document made them exist. It might be better to think of MongoDB databases and collections as root folders and subdirectories in a filesystem. When a document is stored, it triggers the creation of any necessary parent elements. Once instantiated, the database and collection will persist after being emptied unless explicitly dropped.

Another thing that’s hard to miss, is the presence of the _id field in our output. All MongoDB documents require one of these, and if we don’t specify one, MongoDB gracefully provides it. Of course, the default MongoDB injects is of limited utility due to its arbitrary nature, but that means we can override that behavior.

With all of that in mind, let’s create the same table and fill it with sample data:

use mydb   var currentDate = new Date(); currentDate.setHours(0,0,0,0) var batch = new Array()   var start = new Date() for (var i = 1; i <= 1000000; i++) { batch[i] = { _id: i, location: i % 1000, reading: i % 100, readingDate: new Date(currentDate - i * 10000) } }   db.sensorLog.insert(batch)   (new Date() - start) / 1000   31.791   start = new Date() db.sensorLog.ensureIndex( { readingDate: 1 } ) (new Date() - start) / 1000   3.2

We can see that for small data like this, Postgres is a bit more time-efficient for allocating new data. If we examine the filesystem, it’s also apparent this fake sensor data requires 448MB on MongoDB, while Postgres stored everything in 93MB. This is mostly because MongoDB manages storage by preallocating large segments under the assumption the empty space will soon be filled.

In any case, we learn even more from all of the above code. One benefit of MongoDB being a JavaScript native engine, is that we can directly manipulate objects using standard JavaScript syntax. It’s also convenient the insert method allows passing document arrays, as inserting them one-by-one was orders of magnitude slower. Unfortunately, that means the mongo shell doesn’t provide automatic timing output like psql provides.

On the other hand, MongoDB allows us to define whole libraries of local functions that would only be valid for the current session, giving us macro potential Postgres users can only imagine. The necessity of wrapping everything in some kind of SQL-valid statement can often be a hindrance.

Of course, a natural implication of using a JavaScript Engine is that object names are case sensitive. This is a given for nearly any language aside from SQL, but some of us will need to note the context change or we can get ourselves in trouble. This is another good reason to always name Postgres database objects in lowercase and use underscores. We don’t want to learn any inconsistent capitalization habits that may adversely affect us in other systems.

Let’s check out a few statements using Postgres and our sensor_log table.

\timing   UPDATE sensor_log SET reading = reading + 1 WHERE reading_date >= CURRENT_DATE - INTERVAL '8 days' AND reading_date < CURRENT_DATE - INTERVAL '7 days';   TIME: 114.671 ms   DELETE FROM sensor_log WHERE reading_date >= CURRENT_DATE - INTERVAL '9 days' AND reading_date < CURRENT_DATE - INTERVAL '8 days';   TIME: 16.248 ms   SELECT COUNT(*) FROM sensor_log;   COUNT -------- 991360 (1 ROW)   TIME: 111.820 ms   SELECT * FROM sensor_log WHERE reading_date < CURRENT_DATE - INTERVAL '2 week' ORDER BY reading_date ASC LIMIT 5 OFFSET 20;   id | location | reading | reading_date --------+----------+---------+--------------------- 999980 | 980 | 80 | 2016-03-21 06:16:40 999979 | 979 | 79 | 2016-03-21 06:16:50 999978 | 978 | 78 | 2016-03-21 06:17:00 999977 | 977 | 77 | 2016-03-21 06:17:10 999976 | 976 | 76 | 2016-03-21 06:17:20 (5 ROWS)   TIME: 1.034 ms

Let’s examine each MongoDB equivalent, with the timings in milliseconds. This first one is a translation of our UPDATE statement:

var currentDate = new Date() currentDate.setHours(0,0,0,0)   start = new Date() db.sensorLog.update({ readingDate: { $gte: new Date(currentDate.getTime() - 8*86400000), $lt: new Date(currentDate.getTime() - 7*86400000) } }, { $inc: { reading: 1 } }, { multi: true } ) (new Date() - start)   WriteResult({ "nMatched" : 8640, "nUpserted" : 0, "nModified" : 8640 }) 77

It’s a bit faster than our Postgres version, but a bit more inconvenient to write. One universal element with MongoDB is the increased complexity of statements. The primary tradeoff for having such convenient access to JavaScript in MongoDB, is that everything must be a JavaScript object. Much like Postgres forces us to wrap everything in a SQL-based statement, MongoDB demands even the simplest query invokes a JavaScript function.

In the case of the update function, the first parameter is the search, and the second is the list of changes we want to make. It’s here where we encounter constructs like $inc to increment the value of a field. There are several of these specialized operators we need for manipulating our data.

The last parameter to update is a series of flags that modify how the update operates. For instance, we enabled multi, which updates all matching rows, overriding the default of updating a single row. That seems fairly nonintuitive to a longtime user of SQL-based systems.

Let’s examine how deletes work:

start = new Date() db.sensorLog.remove({ readingDate: { $gte: new Date(currentDate.getTime() - 9*86400000), $lt: new Date(currentDate.getTime() - 8*86400000) } } ) (new Date() - start)   WriteResult({ "nRemoved" : 8640 }) 145

While the update was a bit faster than Postgres, the delete was much slower. Beyond that, this is very similar to the update we did earlier. Often in MongoDB, the first parameter to a function will be some kind of query in the form of a JSON object. We should also point out that the default operation of the remove function is to act on all matches, the exact opposite of how update works. There is an argument to only remove the first match, but we didn’t use it here.

Let’s keep going and check out how queries themselves work:

start = new Date() db.sensorLog.count() (new Date() - start)   2   start = new Date() db.sensorLog.find({ readingDate: { $lt: new Date(currentDate.getTime() - 14*86400000) } } ).sort({readingDate: 1}).skip(20).limit(5) (new Date() - start)   { "_id" : 999980, "location" : 980, "reading" : 80, "readingDate" : ISODate("2016-03-21T11:16:40Z") } { "_id" : 999979, "location" : 979, "reading" : 79, "readingDate" : ISODate("2016-03-21T11:16:50Z") } { "_id" : 999978, "location" : 978, "reading" : 78, "readingDate" : ISODate("2016-03-21T11:17:00Z") } { "_id" : 999977, "location" : 977, "reading" : 77, "readingDate" : ISODate("2016-03-21T11:17:10Z") } { "_id" : 999976, "location" : 976, "reading" : 76, "readingDate" : ISODate("2016-03-21T11:17:20Z") }   2

Obtaining counts is refreshingly simple. One of the cool things about functional approaches like this, is that operations are easily chainable. We can either use the count function directly, or tack it to the end to return the total matches for a long query. We can see such combinations in the second query with the inclusion of skip and limit where Postgres would use OFFSET and LIMIT.

There are a lot of differences here. Diverse tools are a great resource for varying project requirements. It should be fairly clear that MongoDB is geared toward tight integration into JavaScript projects. The incredibly loose document structure requirements make it a perfect target for storing polymorphic data that consists of highly varying fields or otherwise convoluted structure. Usually we would prefer to abstract such things into a relational schema suitable for a standard database like Postgres. Yet in the case of highly nested object representations, that might not be practical.

Despite this, there is some small resemblance between Postgres and MongoDB. Mongo even provides a mapping to relate similar concepts. While multiple methodologies have a suitable equivalent, others do not. We already mentioned that collections can be treated like tables, but since their structure is dictated by their contents, that’s not strictly true.

Other concepts like JOIN didn’t even exist until the latest MongoDB incarnation of 3.2. Even then, that functionality is only available during aggregation. The only other alternative is to embed nested queries within a foreach loop of a base result set. There’s actually a good reason for that, too. MongoDB is a document store, and as such, one could argue that elements we’re trying to look up are essential attributes of the underlying document. This is a world that operates in the land of denormalization, where doing otherwise is an impediment to the platform.

There’s always a bit of a “not quite” caveat when comparing elements between these two engines. I certainly learned a lot while digging around in a default MongoDB instance. While some features were cool and extremely convenient, others frustrated and confounded me. It’s certainly a different world over there in NoSQL land, yet the need for it is undeniable. This and many other reasons contributed to JSON and JSONB becoming integral parts of Postgres. With them, we get the best of both worlds; loose structure amorphic documents and relational logic to cement relationships and enforce at least some integrity within our datasets.

It’s a bit of a foreign concept and technically breaks normalization rules, but I’m willing to consider the potential benefits. Rules are, after all, made to be broken.

Ivan Lezhnjov IV: Upgrading Ubuntu LTS and PostgreSQL

15 July, 2016 - 15:57

If your PostgreSQL instance is running on an Ubuntu LTS system that you need to upgrade to the most recent release, say from precise to trusty – because, well, sooner or later you must – you need to consider what is going to happen to your database.

The upgrade process described in this article is similar to what you would have to do if you were upgrading from Trusty to Xenial, the newest Ubuntu LTS release.

Ubuntu attempts to make the process of upgrading to the newest distribution release easy and hassle-free. In fact, it is the case in many situations but not when there is PostgreSQL running in the system. If you just go ahead and try to run do-release-upgrade command, which is the officially recommended way to upgrade your Ubuntu LTS distribution to the newest release, you will end up seeing this error message: 


Get:72 trusty-backports/universe Translation-en [34.6 kB]
Fetched 23.3 MB in 6s (0 B/s)

Checking package manager
Reading package lists... Done
Building dependency tree
Reading state information... Done
Building data structures... Done
Calculating the changes
Calculating the changes
Could not calculate the upgrade

An unresolvable problem occurred while calculating the upgrade.

This can be caused by:

* Upgrading to a pre-release version of Ubuntu
* Running the current pre-release version of Ubuntu
* Unofficial software packages not provided by Ubuntu

If none of this applies, then please report this bug using the
command 'ubuntu-bug ubuntu-release-upgrader-core' in a terminal.

Not very helpful, is it? Well, clearly you need to troubleshoot. Where do you start?

Examine /var/log/dist-upgrade/main.log and look for ERROR messages. This is what you will see: 

2016-02-09 07:19:01,392 DEBUG blacklist expr '^postgresql-.*[0-9]\.[0-9].*' matches 'postgresql-plperl-9.3'
2016-02-09 07:19:01,393 DEBUG The package 'postgresql-plperl-9.3' is marked for removal but it's in the removal blacklist
2016-02-09 07:19:01,462 ERROR Dist-upgrade failed: 'The package 'postgresql-plperl-9.3' is marked for removal but it is in the removal blacklist.'

That's more specific. Basically, it says if any postgresql-* package is installed abort the mission. This means that what was supposed to be an easy and hassle-free, one-command-only upgrade has become your duty to figure out how to do it. Yeah, the promise of simplicity that Ubuntu broadcasts far and wide in its marketing campaigns clearly has its terms and conditions that may apply. Anyway, roll up your sleeves. We gotta do some manual work here.

It is not a problem with do-release-upgrade per se. APT itself is configured on Ubuntu LTS to not do anything with PostgreSQL packages, like, for example, automatically removing postgresql-* packages:

// File installed by postgresql-common. Currently not updated automatically,
// but might be in future releases.
// We mark all PostgreSQL packages as NeverAutoRemove because otherwise apt
// would remove the old postgresql-x.y package when the "postgresql" meta
// package changes its dependencies to a new version, rendering the old
// database cluster inaccessible. As access to the cluster might depend on
// other modules (like datatypes), we use a pretty wide pattern here. We might
// tighten this to match only actually used PostgreSQL versions in the future.


It's a good thing, if you think about it. You don't want a OS upgrade tool, or even a package manager to mess with your database.

So, before you even attempt to run do-release-upgrade and hope it lives up to Ubuntu's promise of making your life easier, you need to plan upgrade of your PostgreSQL. As a new Ubuntu LTS release will most likely mean a new version of PostgreSQL, you need to carefully put together a plan for the upgrade.

It means you should really start by reading release notes of the new PostgreSQL version (available on the main page of and see if you will need to do anything special to make sure your data isn't lost during the upgrade, and that any other requirement is met.

Once you have an idea of what's coming in the new version and what actions you may need to take to ensure that your upgrade is completed successfully, you can start working on getting your PostgreSQL server ready for the do-release-upgrade.

In most simple cases that don't require you to do anything special about PostgreSQL itself, it will come down to a number of steps:

  • Back up your data
  • apt-get remove postgresql-*
  • do-release-upgrade
  • Update PGDG repository configuration (precise->trusty)
  • apt-get update
  • apt-get -s dist-upgrade
  • Restore your data 
Back Up Your Data

You need to back up your cluster/databases and globals. You can read more about backups here. This time we're taking a look at how to use pg_dump. You can also upgrade by using pg_upgrade utility which we'll cover next time. For example, using pg_dumpall something like this will create a backup for globals and for every database in your cluster:


/usr/lib/postgresql/9.3/bin/pg_dumpall -g -Upostgres -p 5434 --file=globals.sql;
/usr/lib/postgresql/9.3/bin/psql -p 5434 -AtU postgres -c "SELECT datname FROM pg_database \
                         WHERE NOT datistemplate"| \
while read f;
  do /usr/lib/postgresql/9.3/bin/pg_dump -Upostgres -p 5434 --format=c --file=$f.sqlc $f;


In this example, we use PostgreSQL 9.3 that was installed on Ubuntu Precise from PGDG repository. Upon successful completion of do-release-upgrade Ubuntu Trusty will have PostgreSQL 9.3 installed as its default PostgreSQL version from official Ubuntu repositories. In our simple test setup all data survived do-release-upgrade just fine and PostgreSQL works as expected, without any problems, after the upgrade of operating system. However, in most real-life scenarios you will probably be making a transition from a lower to a higher PostgreSQL version. In which case you would need to upgrade by dumping and reloading your data.

Remove PostgreSQL Packages

This must sound kinda scary, but the way to successfully run do-release-upgrade is by making sure PostgreSQL doesn't get in its way. You literally need to remove PostgreSQL packages before attempting to run do-release-upgrade.

Just to be safe, make sure you first back up postgresql.conf, pg_hba.conf and other important configuration files. In fact, create a copy of entire /etc/postgresql/9.3/ and /var/lib/postgresql/9.3. This will include data directory, which may be in gigabytes or terabytes. Well, you must have backups of your database anyway. Just make sure the backups are recent, in working state and there's a way to restore them in case APT messes up your data.

Once you're absolutely positive that you have backups of your entire database cluster and configuration, remove all PostgreSQL packages from the system:

$ sudo apt-get remove postgresql-* 

Note that apt-get remove will not delete your configuration, or data. At least it shouldn't. That's what apt-get purge does. The only reason we recommend to take backups is because it is a good practice and you don't really want to rely on APT and learn one day that an obscure bug or a change in its policy of doing the 'remove' action results in data loss.

Run do-release-upgrade

Here's a checklist to go over before running do-release-upgrade:

Ensure /boot has enough disk space.

If it doesn't, do-release-upgrade will fail and display this error message:

"Not enough free disk space
The upgrade has aborted. The upgrade needs a total of 54.2 M free
space on disk '/boot'. Please free at least an additional 13.9 M of
disk space on '/boot'. Empty your trash and remove temporary packages
of former installations using 'sudo apt-get clean'. " 3rd party software may break do-release-upgrade

PostgreSQL is one example. You will see do-release-upgrade notify you that PGDG APT source list was disabled in a typically non-specific fashion: 

"Third party sources disabled
Some third party entries in your sources.list were disabled. You can
re-enable them after the upgrade with the 'software-properties' tool
or your package manager." Open additional SSH session on port 1022 before do-release-upgrade starts upgrading packages

If you lose your current SSH session you can retain access to your system in the second window.

Run do-release-upgrade

When you're ready run: 

$ sudo do-release-upgrade Update PGDG Repository Configuration

Assuming do-release-upgrade went fine and you're using PGDG repositories, you will need to uncomment a line with repository source in /etc/apt/sources.list.d/pgdg.list and change precise-pgdg to trusty-pgdg. Your pgdg.list file would look then like this:

deb trusty-pgdg main Resynchronize The Package Index Files

At this point you should be running Ubuntu Trusty LTS and you need to run:

$ sudo apt-get update Install Updates and Upgrades

You will probably realize that postgresql-* packages in PGDG are newer version than those installed during do-release-upgrade. So, in case you want to make sure you're using the most recent PGDG version of PostgreSQl run: 

$ sudo apt-get -s dist-upgrade

Drop -s after making sure that proposed changes look good.

Restore Your Data

Finally restore your data using the newer version of PostgreSQL. For the sake of example let's assume that you wanted to upgrade to PostgreSQL 9.5 once you're running trusty.

Essentially, to start using the new PostgreSQL version you would:

  • apt-get install postgresql-9.5
  • Restore your globals

 Here an example of you could restore globals:

/usr/lib/postgresql/9.5/bin/psql -U postgres -p 5435 < /home/admin/do-release-upgrade/db-backups/globals.sql 
  • Restore databases:

Consider this example

for i in databaseA databaseB databaseC;do /usr/lib/postgresql/9.5/bin/psql -U postgres -p 5435 -c "CREATE DATABASE $;"; done  for i in databaseA databaseB databaseC;do /usr/lib/postgresql/9.5/bin/pg_restore -U postgres -p 5435 --dbname=$i $i.sqlc;done

Port number may be different on your system and these examples are merely a guideline.

Although it's a complete set of actions required to upgrade your database various factors unique to your setup may introduce other steps that need to be carried out in order to successfully upgrade PostgreSQL. There's no one-size-fits-all upgrade strategy and this post demonstrates just one of the simplest scenarios you may encounter.

If your case is particularly complex to the point where you would rather have us double-check your upgrade plan or actually perform it for you, feel free to contact us and we'll figure out the best path to upgrade your system running PostgreSQL to the latest release.

Stefan Petrea: Analyzing PostgreSQL logs using pgBadger

15 July, 2016 - 10:15
Table of Contents Intro

Most web applications make use of databases. When deploying your app in production, you will need, among other things, monitoring and analysis/reports of log data generated by your application.

This comes in handy when troubleshooting problems, in situations involving bottlenecks in large applications, where we need to find low-performing SQL queries.


This is a simplified diagram that shows the interactions between the main components of a typical web application production environment.

The users make requests to the web application. In order to serve the requests, the web application needs information from a database, in this case, the databased being used is PostgreSQL, so it makes queries to it.

The PostgreSQL database server computes the queries, returns back data, but at the same time, it also writes to log files on disk.

The log files are being rotated on a daily basis. So for each day, there is one log file. The log files start out empty. Each time they are being used again, they're truncated and new log data will be written to them.

User 1 User 2 User 3 PostgreSQL database server Web application PostgreSQL
rotating logs
  • Monday
  • Tuesday
  • Wednesday
  • Thursday
  • Friday
  • Saturday
  • Sunday
    • PostgreSQL logging and log rotation

      Most of the settings described here are documented in the 18.8 Error reporting and logging section of the PostgreSQL manual.

      We're turning on the logging collector, we ensure that we're have %w in the daily log filenames (this is the 0-based weekday represented as a decimal number, with 0 being sunday, 1 is monday and so on).

      Once the logging collector is enabled, the logs are no longer stored in /var/log/postgresql/ , but instead, they will be located at /var/lib/postgresql/9.4/main/pg_log/ .

      We're also aiming to truncate the file on rotation.

      logging_collector = on log_filename = 'postgresql-%w.log' log_file_mode = 0640 log_truncate_on_rotation = on log_rotation_size = 600MB log_min_duration_statement = 0 log_checkpoints = on log_connections = on log_disconnections = on log_duration = on log_lock_waits = on log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h ' Running pgbadger on a schedule

      In order to run pgbadger periodically, we need to schedule it in cron.

      After running, pgBadger 8.1 seems to leave its pidfile behind, but we can easily clean up the pidfile for it.

      We only process/analyze log files that were recently modified (at the latest, 180 minutes ago).

      The -R 5 parameter tells pgbadger that it should only retain the last 5 weeks of binary data and reports.

      And finally, we delete PostgreSQL log files older than 2 days. In combination with the Pg 7-day rotating logs, this means we only keep 2 PostgreSQL log files (since there's one Pg log file per day, given the way we've set up PostgreSQL logging).

      We may have multiple applications using the same database. Since we've set up PostgreSQL to log the application name as well, we can build separate pgbadger reports for each application.

      For example, if you're running the queries in a Bash script, you may use the PGAPPNAME environment variable, which can be passed on by psql when it makes queries to the PostgreSQL server and therefore, the queries will be logged with the application name.

      The application name is usually passed in the connection string at application level. For example, if you're using a language like Python and the psycopg2 PostgreSQL driver, you can pass application_name in the connection string. The Java Pg driver pgjdbc-ng allows you to pass the applicationName parameter in the jdbc connection string.

      #!/bin/bash PG_LOG_PATH=/var/lib/postgresql/9.4/main/pg_log/ PGBADGER="/home/user/bin/pgbadger --anonymize -q -R 5 " ROOT=/home/user/pgbadger check_pgbadger_pid() { if [[ -e '/tmp/' ]]; then PAST_PID=$(cat /tmp/ CURRENT_PIDS=$(pgrep -f pgbadger) if [[ $CURRENT_PIDS =~ $PAST_PID ]]; then echo "[ERROR] pgbadger still running" exit -1 else rm '/tmp/' fi fi } RECENTLY_MODIFIED_LOGS=$(find $PG_LOG_PATH -mmin -180 -name "postgresql*") mkdir -p "$ROOT" "$ROOT/bot" "$ROOT/metrics" "$ROOT/collector" "$ROOT/web" $PGBADGER -l "$ROOT/.pgbadger_last_state_file" -d test1 -X -I -O "$ROOT" $RECENTLY_MODIFIED_LOGS check_pgbadger_pid $PGBADGER -l "$ROOT/bot/.pgbadger_last_state_file" -d test1 -X -I --appname "upstats-bot" -O "$ROOT/bot" $RECENTLY_MODIFIED_LOGS check_pgbadger_pid $PGBADGER -l "$ROOT/metrics/.pgbadger_last_state_file" -d test1 -X -I --appname "upstats-metrics" -O "$ROOT/metrics" $RECENTLY_MODIFIED_LOGS check_pgbadger_pid $PGBADGER -l "$ROOT/collector/.pgbadger_last_state_file" -d test1 -X -I --appname "upstats-collector" -O "$ROOT/collector" $RECENTLY_MODIFIED_LOGS check_pgbadger_pid $PGBADGER -l "$ROOT/web/.pgbadger_last_state_file" -d test1 -X -I --appname "upstats-web" -O "$ROOT/web" $RECENTLY_MODIFIED_LOGS check_pgbadger_pid OLD_LOGS=$(find $PG_LOG_PATH -type f -mtime +2) if [ ! -z "$OLD_LOGS" ]; then rm $OLD_LOGS; fi

      After analyzing the logs, pgBadger will create a number of reports including:

      • Queries that took up the most time
      • Histogram of query times
      • Overall statistics
      • The most frequent waiting queries
      • Queries per user and total duration per user
      • Distribution of queries type per database/application
      • Queries by type (select/insert/update/delete)
      • SQL queries statistics

      We've covered the usage of a log analysis tool called pgBadger in a PostgreSQL setup configured with rotating logs.

Craig Ringer: How to check the lock level taken by operations in PostgreSQL

15 July, 2016 - 04:24

PostgreSQL’s manual is generally pretty clear about the locks taken by various operations – but nothing’s perfect. If you need to, you can also ask PostgreSQL directly.

You can check lock levels trivially with psql or PgAdmin.

For example, to see what lock alter table some_table disable trigger some_trigger; takes:

test=> BEGIN; BEGIN test=> ALTER TABLE some_table DISABLE TRIGGER some_trigger; ALTER TABLE test=> SELECT locktype, mode FROM pg_locks WHERE pid = pg_backend_pid() AND relation = 'some_table'::regclass; locktype | mode ----------+----------------------- relation | ShareRowExclusiveLock (1 row)

That’s for a lock on a table. It’s easy to see from this that we take a SHARE ROW EXCLUSIVE lock, which according to the manual:

… protects a table against concurrent data changes, and is self-exclusive so that only one session can hold it at a time.

Other uses of pg_locks

You can also filter on pg_locks in other ways to see other locks held by a transaction. You’ll have to do some joins on pg_class etc to decode the relation OIDs to names – which is why we really need a pg_stat_locks view in PostgreSQL to make this easier.

People use a variety of canned queries of varying quality for looking to see which processes block others at the moment. With the addition of more detailed lock wait information and pg_blocking_pids() in 9.6, this will get a lot easier, though 9.6 doesn’t add a helper view yet.

What are the virtualxid and transactionid locks?

One important and possibly confusing thing you’ll see in pg_locks is that every transaction holds a special lock on its self, called the virtualxid lock:

test=> BEGIN; BEGIN test=> SELECT * FROM pg_locks WHERE pid = pg_backend_pid(); locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+-----------------+---------+---------- relation | 16386 | 11673 | | | | | | | | 2/3983 | 24250 | AccessShareLock | t | t virtualxid | | | | | 2/3983 | | | | | 2/3983 | 24250 | ExclusiveLock | t | t (2 rows) test=> select '11673'::regclass; regclass ---------- pg_locks (1 row)

As you can see, the relation AccessShareLock is just the lock we take on pg_locks when we query it, so you can ignore that.

The virtualxid lock is special. It’s a exclusive lock on the transaction’s own virtual transaction ID (the “2/3983″, above) that every transaction always holds. No other transaction can ever acquire it while the transaction is running. The purpose of this is to allow one transaction to wait until another transaction commits or rolls back using PostgreSQL’s locking mechanism, and it’s used internally. You don’t normally need to use it yourself, but it’s useful to understand what it is when you see it in pg_locks.

There’s a similar entry for transactions that get a real read/write transaction ID that other transactions can use to wait until they commit or roll back:

test=> select txid_current(); txid_current -------------- 2774 (1 row) test=> SELECT * FROM pg_locks WHERE pid = pg_backend_pid(); locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+-----------------+---------+---------- relation | 16386 | 11673 | | | | | | | | 2/3984 | 24250 | AccessShareLock | t | t virtualxid | | | | | 2/3984 | | | | | 2/3984 | 24250 | ExclusiveLock | t | t transactionid | | | | | | 2774 | | | | 2/3984 | 24250 | ExclusiveLock | t | f (3 rows)

so if you’ve ever wondered what they are, now you know.

You’ll see a transactionid lock wait at when two concurrent transactions try to insert the same key into a unique index (or primary key), among other things. In that case the second transaction must wait until the first commits or rolls back to know whether it should fail with an error or continue to insert.

Hans-Juergen Schoenig: Inheritance – One more reason to love PostgreSQL

14 July, 2016 - 11:31

After doing full-time PostgreSQL consulting for over 16 years now, I actually don’t remember a time without inheritance anymore. Of course things were improved over time, but in my head it has always been there and it has always just worked as expected. After so many years I still love the feature because it offers […]

The post Inheritance – One more reason to love PostgreSQL appeared first on Cybertec - The PostgreSQL Database Company.

Greg Sabino Mullane: Disabling Postgres constraints for pg_dump

14 July, 2016 - 06:46

Photo by Jocelyn Kinghorn

Constraints in Postgres are very powerful and versatile: not only are foreign keys, primary keys, and column uniqueness done internally via constraints, but you may create your own quite easily (at both the column and table level). Most of the time constraints are simply set and forget, but there is one time constraints may become a problem: copying the database using the pg_dump program.

The issue is that constraints are usually added *before* the data is copied to the new table via the COPY command. This means the constraint fires for each added row, to make sure that the row passes the conditions of the constraint. If the data is not valid, however, the COPY will fail, and you will not be able to load the output of your pg_dump into a new database. Further, there may be a non-trivial performance hit doing all that validation. Preventing the constraint from firing may provide a significant speed boost, especially for very large tables with non-trivial constraints.

Let's explore one way to work around the problem of pg_dump failing to work because some of the data is not valid according to the logic of the constraints. While it would be quicker to make some of these changes on the production system itself, corporate inertia, red tape, and the usual DBA paranoia means a better way is to modify a copy of the database instead.

For this example, we will first create a sample "production" database and give it a simple constraint. This constraint is based on a function, to both emulate a specific real-world example we came across for a client recently, and to allow us to easily create a database in which the data is invalid with regards to the constraint:

dropdb test_prod; createdb test_prod pgbench test_prod -i -n creating tables... 100000 of 100000 tuples (100%) done (elapsed 0.82 s, remaining 0.00 s) set primary keys... done. psql test_prod -c 'create function valid_account(int) returns bool language sql immutable as $$ SELECT $1 > 0$$;' CREATE FUNCTION psql test_prod -c 'alter table pgbench_accounts add constraint good_aid check ( valid_account(aid) )' ALTER TABLE

Note that the constraint was added without any problem, as all of the values in the aid column satisfy the function, as each one is greater than zero. Let's tweak the function, such that it no longer represents a valid, up to date constraint on the table in question:

## Verify that the constraint is working - we should get an error: psql test_prod -c 'update pgbench_accounts set aid = -1 where aid = 1' ERROR: new row for relation "pgbench_accounts" violates check constraint "good_aid" DETAIL: Failing row contains (-1, 1, 0, ...). ## Modify the function to disallow account ids under 100. No error is produced! psql test_prod -c 'create or replace function valid_account(int) returns bool language sql volatile as $$ SELECT $1 > 99$$' CREATE FUNCTION ## The error is tripped only when we violate it afresh: psql test_prod -c 'update pgbench_accounts SET aid=125 WHERE aid=125' UPDATE 1 psql test_prod -c 'update pgbench_accounts SET aid=88 WHERE aid=88' ERROR: new row for relation "pgbench_accounts" violates check constraint "good_aid" DETAIL: Failing row contains (88, 1, 0, ...).

The volatility was changed from IMMUTABLE to VOLATILE simply to demonstrate that a function called by a constraint is not bound to any particular volatility, although it *should* always be IMMUTABLE. In this example, it is a moot point, as our function can be immutable and still be "invalid" for some rows in the table. Owing to our function changing its logic, we now have a situation in which a regular pg_dump cannot be done:

dropdb test_upgraded; createdb test_upgraded pg_dump test_prod | psql test_upgraded -q ERROR: new row for relation "pgbench_accounts" violates check constraint "good_aid" DETAIL: Failing row contains (1, 1, 0, ...). CONTEXT: COPY pgbench_accounts, line 1: "1 1 0 " ## Ruh roh!

Time for a workaround. When a constraint is created, it may be declared as NOT VALID, which simply means that it makes no promises about the *existing* data in the table, but will start constraining any data changed from that point forward. Of particular importance is the fact that pg_dump can dump things into three sections, "pre-data", "data", and "post-data". When a normal constraint is dumped, it will go into the pre-data section, and cause the problems seen above when the data is loaded. However, a constraint that has been declared NOT VALID will appear in the post-data section, which will allow the data to load, as it will not be declared until after the "data" section has been loaded in. Thus, our workaround will be to move constraints from the pre-data to the post-data section. First, let's confirm the state of things by making some dumps from the production database:

pg_dump test_prod --section=pre-data -x -f test_prod.pre.sql pg_dump test_prod --section=post-data -x -f ## Confirm that the constraint is in the "pre" section: grep good_aid test*sql test_prod.pre.sql: CONSTRAINT good_aid CHECK (valid_account(aid))

There are a few ways around this constraint issue, but here is one that I like as it makes no changes at all to production, and produces valid SQL files that may be used over and over.

dropdb test_upgraded; createdb test_upgraded ## Note that --schema-only is basically the combination of pre-data and post-data pg_dump test_prod --schema-only | psql test_upgraded -q ## Save a copy so we can restore these to the way we found them later: psql test_upgraded -c "select format('update pg_constraint set convalidated=true where conname=%L and connamespace::regnamespace::text=%L;', \ conname, nspname) from pg_constraint c join pg_namespace n on (n.oid=c.connamespace) \ where contype ='c' and convalidated" -t -o restore_constraints.sql ## Yes, we are updating the system catalogs. Don't Panic! psql test_upgraded -c "update pg_constraint set convalidated=false where contype='c' and convalidated" UPDATE 3 ## Why 3? The information_schema "schema" has two harmless constraints pg_dump test_upgraded --section=pre-data -x -o test_upgraded.pre.sql pg_dump test_upgraded --section=post-data -x -o ## Verify that the constraint has been moved to the "post" section: grep good test*sql test_prod.pre.sql: CONSTRAINT good_aid CHECK (valid_account(aid)) Name: good_aid; Type: CHECK CONSTRAINT; Schema: public; Owner: greg ADD CONSTRAINT good_aid CHECK (valid_account(aid)) NOT VALID; ## Two diffs to show the inline (pre) versus ALTER TABLE (post) constraint creations: $ diff -u1 test_prod.pre.sql test_upgraded.pre.sql --- test_prod.pre.sql 2016-07-04 00:10:06.676766984 -0400 +++ test_upgraded.pre.sql 2016-07-04 00:11:07.978728477 -0400 @@ -54,4 +54,3 @@ abalance integer, - filler character(84), - CONSTRAINT good_aid CHECK (valid_account(aid)) + filler character(84) ) $ diff -u1 --- 2016-07-04 00:11:48.683838577 -0400 +++ 2016-07-04 00:11.57.265797869 -0400 @@ -17,2 +17,10 @@ +-- +-- Name: good_aid; Type: CHECK CONSTRAINT; Schema: public; Owner: greg +-- + +ALTER TABLE pgbench_accounts + ADD CONSTRAINT good_aid CHECK (valid_account(aid)) NOT VALID; + + SET default_tablespace = '';

Now we can simply sandwich our data load between the new pre and post files, and avoid having the constraints interfere with the data load portion at all:

dropdb test_upgraded; createdb test_upgraded psql test_upgraded -q -f test_upgraded.pre.sql pg_dump test_prod --section=data | psql test_upgraded -q psql test_upgraded -q -f ## As the final touch, make all the constraints we changed exactly how each were before: psql test_upgraded -f restore_constraints.sql

A final sanity check is always a good idea, to make sure the two databases are identical, despite our system catalog tweaking:

diff -s <(pg_dump test_prod) <(pg_dump test_upgraded) Files /dev/fd/63 and /dev/fd/62 are identical

Although we declared a goal of having the upgraded database match production as closely as possible, you can always not apply that final restore_constraints.sql file and leave the constraints as NOT VALID, which is a better reflection of the reality of things. It also means you will not have to go through this rigmarole again, as those constraints shall forevermore be put into the post-data section when doing a pg_dump (unless someone runs the ALTER TABLE ... VALIDATE CONSTRAINT ... command!).

While there is no direct way to disable constraints when loading data, using this pre-data to post-data trick can not only boost data load times, but get you out of a potential jam when your data is invalid!

Craig Kerstiens: When to use unstructured datatypes in Postgres–Hstore vs. JSON vs. JSONB

14 July, 2016 - 03:00

Since Postgres started supporting NoSQL (via hstore, json, and jsonb), the question of when to use Postgres in relational mode vs NoSQL mode has come up a lot. Do you entirely abandon traditional table structures, and go with documents all the way? Or do you intermingle both? The answer unsurprisingly is: it depends. Each newer model including hstore, JSON, and JSONB has their ideal use cases. Here we’ll dig deeper into each and see when you should consider using them.


If you exclude XML, this was the first truly unstructured datatype to arrive in Postgres. Hstore arrived way back in Postgres 8.3, before upsert, before streaming replication, and before window functions. Hstore is essentially a key/value store directly in Postgres. With hstore you’re a little more limited in terms of the datatypes you have: you essentially just get strings. You also don’t get any nesting; in short it’s a flat key/value datatype.

The upside of hstore is you don’t have to define any of your keys ahead of time. You can simply insert the record and it’ll save everything. Let’s say you’ve got an example table:

CREATE TABLE products ( id serial PRIMARY KEY, name varchar, attributes hstore );

From here you can insert whatever you want into the attributes column. And then query based on those various keys or values.

INSERT INTO products (name, attributes) VALUES ( 'Geek Love: A Novel', 'author => "Katherine Dunn", pages => 368, category => fiction' ); SELECT name, attributes->'author' as author FROM products WHERE attributes->'category' = 'fiction'

The obvious benefit here is flexibility, but where it really shines is being able to leverage various index types. In particular, a GIN or GiST index will index every key and value within the hstore. This way when you filter on something it’ll use the index if it makes sense to the planner within Postgres.

As hstore isn’t a full document equivalent, it’s a stretch to consider using it as such. If you have relational data as well as some data that may not always exist on a column: it can be a great fit. In the most basic case attributes of a product catalog can be a great candidate. In certain categories such as books you’d have things like whether it’s fiction or not; but in others such as clothes you might have things like size, and color. Having columns for every possible attribute for a product can at times very much be overkill.


When Postgres 9.2 arrived it was well received as the JSON release. Finally, Postgres can now complete against Mongo. (Although the JSON functionality in Postgres 9.2 was probably a little oversold.)

The JSON datatype in Postgres is under the covers still largely just a text field. With the JSON datatype what you do get is validation on it as it comes in though. Postgres does enforce that it’s actually JSON. One small potential benefit of it over JSONB (which we’ll get to next) is that it preserves the indentation of the data coming in. So if you are extremely particular about the formatting of your JSON, or have some need for it in a particular structure, JSON can be useful.

Furthermore, over time Postgres has picked up a number of niceties in the form of functions that can help. So, the question is: should you use JSON? At the end of the day, Postgres’ JSON type simply provides JSON validation on a text field. If you’re storing some form of log data you rarely need to query, JSON can work fine. Because it’s so simple, it will have a lot higher write throughput. For anything more complex, I’d recommend using JSONB, which is covered below.


Finally in Postgres 9.4 we got real and proper JSON in the form of JSONB. The B stands for better. JSONB is a binary representation of JSON, this means it’s compressed and more efficient for storage than just text. It also has a similar plumbing of hstore underneath. In fact, once upon a time there was almost hstore2 and a separate JSON type, but the two converged into the JSONB we have today.

JSONB is largely what you’d expect from a JSON datatype. It allows nested structures, use of basic datatypes, and has a number of built in functions for working with it. Though the best part similar to hstore is the indexing. Creating a GIN index on a JSONB column will create an index on every key and value within that JSON document. That with the ability to nest within the document means JSONB is the superior to hstore in most cases.

That still leaves a bit of question of when to use only JSONB though. If you want a document database, instead of one of the other options out there you could go directly to Postgres. With a package like MassiveJS this can become quite seamless as well But even then, there are some clear examples where going more document heavy does make most sense. Some of the most common examples include:

  • Event tracking data, where you may want to include the payload in the event which might vary
  • Gaming data is especially common, especially where you have single player games and have a changing schema based on the state of the user
  • Tools that integrate multiple data sources, an example here may be a tool that integrates customers databases to Salesforce to Zendesk to something else. The mix of schemas makes doing this in a multitenant fashion more painful than it has to be.

Let’s take a quick look at how the third example might work with JSONB. First lets create a table, and insert some example data:

CREATE TABLE integrations (id UUID, data JSONB); INSERT INTO integrations VALUES ( uuid_generate_v4(), '{ "service": "salesforce", "id": "AC347D212341XR", "email": "", "occurred_at": "8/14/16 11:00:00", "added": { "lead_score": 50 }, "updated": { "updated_at": "8/14/16 11:00:00" } }') INSERT INTO integrations ( uuid_generate_v4(), '{ "service": "zendesk", "email": "", "occurred_at": "8/14/16 10:50:00", "ticket_opened": { "ticket_id": 1234, "ticket_priority": "high" } } ")

In the above case I could easily search for all events that have happened to, then do some action. This could be some form of behavioral analytics such as find users that have done foo then bar, or a simple report such as find me all high scoring leads that have opened a ticket. By adding a GIN index: CREATE INDEX idx_integrations_data ON integrations USING gin(data); all of the data within my JSONB field is automatically indexed.

In conclusion

In most cases JSONB is likely what you want when looking for a NoSQL, schema-less, datatype. Hstore and JSON can have their place as well but it’s less common. More broadly, JSONB isn’t always a fit in every data model. Where you can normalize there are benefits, but if you do have a schema that has a large number of optional columns (such as with event data) or the schema differs based on tenant id then JSONB can be a great fit. In general you want:

  • JSONB - In most cases
  • JSON - If you’re just processing logs, don’t often need to query, and use as more of an audit trail
  • hstore - Can work fine for text based key-value looks, but in general JSONB can skill work great here

Are you using any of Postgres schema-less datatypes? What type of workloads are you using it for? We’d love to hear about them @citusdata.

Joshua Drake: What is good for the community is good for the company (profit is the reward)

13 July, 2016 - 22:25

As the PostgreSQL community continues down its path of world domination I can't help but wonder whether the various PostgreSQL companies are going to survive the changes. Once upon a time there was an undercurrent of understanding that what was good for the community was good for the company. Whatever company that may be. However, over the last few years it seems that has changed. It seems there is more prevalance toward: What is good for the company is good for the community, or in other words, "The goal is profit."

That is a flawed discipline to follow in the Open Source world. A truly beneficial, strong and diverse community has to eliminate that thought entirely. The goal is not profit; profit is the reward.

That isn't to say that profit is bad. That would be stupid. It is profit that allows Command Prompt to sponsor my activities with United States PostgreSQL and Software in the Public Interest. It is to say that my contributions to the community as a whole drive Command Prompt's profit. It is symbiotic; a constant ebb and flow of the relationship between community and commerce.

I would invite other PostgreSQL companies to consider this. I would challenge them to upend their profiteering motive and focus on community building with profit being the reward. The profit will follow. How do you do this? How do you continue to derive profit from community without sacrificing the community or your standing within the community? Here are some practical ideas:

  • If you aren't going to help, offer to find someone that will help. The answer, "Why would I help you" is never appropriate.
  • If you release something to the community, support it as the community would. Otherwise, keep it to yourself (open source or not).
  • Avoid language such as, "It's free, funding for new features or requirements is welcome." It is embarrassing and arrogant. It doesn't provide a solution and has no place on a community support list. If that is your approach contact the person directly. The community isn't your advertising platform. Instead try something like, "It is Free/Open Source, we are open to patches or feedback." 

Lastly, this isn't a post suggesting that you abandon good business practice. I am not suggesting that you shouldn't contact someone if you are looking for funding, only that you contact them directly. I am not suggesting you take losses every month for the community, that is bad for the community and bad for business. I am only suggesting, nay declaring that community works for business when business puts community first.

US PostgreSQL Association: Autonomous employment

13 July, 2016 - 00:02
Did you know that there is not a single major contributor to PostgreSQL that is not paid to be a contributor to the project?

There is nothing wrong with that. I would argue that it is a sign of a healthy project. However, it does come with downsides. The most important one being that no matter how autonomous your employer says that your work is, at any point your priorities are driven by what is best for the company. That means, it is driven by profit. If the work being done is not going to lend itself to the bottom line of the sponsoring company, those priorities may change.

read more

Tomas Vondra: On the benefits of sorted paths

12 July, 2016 - 23:25

I had the pleasure to attend PGDay UK last week – a very nice event, hopefully I’ll have the chance to come back next year. There was plenty of interesting talks, but the one that caught my attention in particular was Performace for queries with grouping by .

I have given a fair number of similar performance-oriented talks in the past, so I know how difficult it is to present benchmark results in a comprehensible and interesting way, and Alexey did a pretty good job, I think. So if you deal with data aggregation (i.e. BI, analytics, or similar workloads) I recommend going through the slides and if you get a chance to attend the talk on some other conference, I highly recommend doing so.

But there’s one point where I disagree with the talk, though. On a number of places the talk suggested that you should generally prefer HashAggregate, because sorts are slow.

I consider this a bit misleading, because an alternative to HashAggregate is GroupAggregate, not Sort. So the recommendation assumes that each GroupAggregate has a nested Sort, but that’s not quite true. GroupAggregate requires sorted input, and an explicit Sort is not the only way to do that – we also have IndexScan and IndexOnlyScan nodes, that eliminate the sort costs and keep the other benefits associated with sorted paths (especially IndexOnlyScan).

Let me demonstrate how (IndexOnlyScan+GroupAggregate) performs compared to both HashAggregate and (Sort+GroupAggregate) – the script I’ve used for the measurements is here. It builds four simple tables, each with 100M rows and different number of groups in the “branch_id” column (determining the size of the hash table). The smallest one has 10k groups

-- table with 10k groups create table t_10000 (branch_id bigint, amount numeric); insert into t_10000 select mod(i, 10000), random() from generate_series(1,100000000) s(i);

and three additional tables have 100k, 1M and 5M groups. Let’s run this simple query aggregating the data:

SELECT branch_id, SUM(amount) FROM t_10000 GROUP BY 1

and then convince the database to use three different plans:

1) HashAggregate SET enable_sort = off; SET enable_hashagg = on; EXPLAIN SELECT branch_id, SUM(amount) FROM t_10000 GROUP BY 1; QUERY PLAN ---------------------------------------------------------------------------- HashAggregate (cost=2136943.00..2137067.99 rows=9999 width=40) Group Key: branch_id -> Seq Scan on t_10000 (cost=0.00..1636943.00 rows=100000000 width=19) (3 rows) 2) GroupAggregate (with a Sort) SET enable_sort = on; SET enable_hashagg = off; EXPLAIN SELECT branch_id, SUM(amount) FROM t_10000 GROUP BY 1; QUERY PLAN ------------------------------------------------------------------------------- GroupAggregate (cost=16975438.38..17725563.37 rows=9999 width=40) Group Key: branch_id -> Sort (cost=16975438.38..17225438.38 rows=100000000 width=19) Sort Key: branch_id -> Seq Scan on t_10000 (cost=0.00..1636943.00 rows=100000000 ...) (5 rows) 3) GroupAggregate (with an IndexOnlyScan) SET enable_sort = on; SET enable_hashagg = off; CREATE INDEX ON t_10000 (branch_id, amount); EXPLAIN SELECT branch_id, SUM(amount) FROM t_10000 GROUP BY 1; QUERY PLAN -------------------------------------------------------------------------- GroupAggregate (cost=0.57..3983129.56 rows=9999 width=40) Group Key: branch_id -> Index Only Scan using t_10000_branch_id_amount_idx on t_10000 (cost=0.57..3483004.57 rows=100000000 width=19) (3 rows) Results

After measuring timings for each plan on all the tables, the results look like this:

For small hash tables (fitting into L3 cache, which is 16MB in this case), HashAggregate path is clearly faster than both sorted paths. But pretty soon GroupAgg+IndexOnlyScan gets just as fast or even faster – this is due to cache efficiency, the main advantage of GroupAggregate. While HashAggregate needs to keep the whole hash table in memory at once, GroupAggregate only needs to keep the last group. And the less memory you use, the more likely it’s to fit that into L3 cache, which is roughly an order of magnitude faster compared to regular RAM (for the L1/L2 caches the difference is even larger).

So although there’s a considerable overhead associated with IndexOnlyScan (for the 10k case it’s about 20% slower than the HashAggregate path), as the hash table grows the L3 cache hit ratio quickly drops and the difference eventually makes the GroupAggregate faster. And eventually even the GroupAggregate+Sort gets on par with the HashAggregate path.

You might argue that your data generally have fairly low number of groups, and thus the hash table will always fit into L3 cache. But consider that the L3 cache is shared by all processes running on the CPU, and also by all parts of the query plan. So although we currently have ~20MB of L3 cache per socket, your query will only get a part of that, and that bit will be shared by all nodes in your (possibly quite complex) query.


While HashAggregate is probably faster than GroupAggregate with an explicit Sort (I’m hesitant to say it’s always the case, though), using GroupAggregate with IndexOnlyScan faster can easily make it much faster than HashAggregate.

Of course, you don’t get to pick the exact plan directly – the planner should do that for you. But you affect the selection process by (a) creating indexes and (b) setting work_mem. Which is why sometimes lower work_mem (and maintenance_work_mem) values result in better performance.

Additional indexes are not free, though – they cost both CPU time (when inserting new data), and disk space. For IndexOnlyScans the disk space requirements may be quite significant because the index needs to include all the columns referenced by the query, and regular IndexScan would not give you the same performance as it generates a lot of random I/O against the table (eliminating all the potential gains).

Another nice feature is the stability of the performance – notice how the HashAggregate timings chance depending on the number of groups, while the GroupAggregate paths perform mostly the same.

Hubert 'depesz' Lubaczewski: Getting list of unique elements in table, per group

12 July, 2016 - 18:12
Today, on irc, someone asked interesting question. Basically she ran a query like: SELECT a, b, c, d, e, f FROM TABLE ORDER BY a then, she processed the query to get, for each a array of unique values of b, c, d, e, and f, and then he inserted it back to database, to […]

Magnus Hagander: Locating the recovery point just before a dropped table

11 July, 2016 - 16:36

A common example when talking about why it's a good thing to be able to do PITR (Point In Time Recovery) is the scenario where somebody or some thing (operator or buggy application) dropped a table, and we want to do a recover to right before the table was dropped, to keep as much valid data as possible.

PostgreSQL comes with nice functionality to decide exactly what point to perform a recovery to, which can be specified at millisecond granularity, or that of an individual transaction. But what if we don't know exactly when the table was dropped? (Exactly being at the level of specific transaction or at least millisecond).

On way to handle that is to "step forward" through the log one transaction at a time until the table is gone. This is obviously very time-consuming.

Assuming that DROP TABLE is not something we do very frequently in our system, we can also use the pg_xlogdump tool to help us find the exact spot to perform the recovery to, in much less time. Unfortunately, the dropping of temporary tables (implicit or explicit) is included in this, so if your application uses a lot of temporary tables this approach will not work out of the box. But for applications without them, it can save a lot of time.

Let's show an example. This assumes you have already set up the system for log archiving, you have a base backup that you have restored, and you have a log archive.

The first thing we do is try to determine the point where a DROP TABLE happened. We can do this by scanning for entries where rows have been deleted from the pg_class table, as this will always happen as part of the drop.

Francesco Canovai: PostgreSQL 9.6: Parallel Sequential Scan

11 July, 2016 - 11:25

For a long time, one of the most known shortcomings of PostgreSQL was the ability to parallelise queries. With the release of version 9.6, this will no longer be an issue. A great job has been done on this subject, starting from the commit 80558c1, the introduction of parallel sequential scan, which we will see in the course of this article.

First, you must take note: the development of this feature has been continuous and some parameters have changed names between a commit and another. This article has been written using a checkout taken on June 17 and some features here illustrated will be present only in the version 9.6 beta2.

Compared to the 9.5 release, new parameters have been introduced inside the configuration file. These are:

  • max_parallel_workers_per_gather: the number of workers that can assist a sequential scan of a table;
  • min_parallel_relation_size: the minimum size that a relation must have for the planner to consider the use of additional workers;
  • parallel_setup_cost: the planner parameter that estimates the cost of instantiate a worker;
  • parallel_tuple_cost: the planner parameter that estimates the cost of transferring a tuple from one worker to another;
  • force_parallel_mode: parameter useful for testing, strong parallelism and also a query in which the planner would operate in other ways.

Let’s see how the additional workers can be used to speed up our queries. We create a test table with an INT field and one hundred million records:

postgres=# CREATE TABLE test (i int); CREATE TABLE postgres=# INSERT INTO test SELECT generate_series(1,100000000); INSERT 0 100000000 postgres=# ANALYSE test; ANALYZE

PostgreSQL has max_parallel_workers_per_gather set to 2 by default, for which two workers will be activated during a sequential scan.

A simple sequential scan does not present any novelties:

postgres=# EXPLAIN ANALYSE SELECT * FROM test; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Seq Scan on test (cost=0.00..1442478.32 rows=100000032 width=4) (actual time=0.081..21051.918 rows=100000000 loops=1) Planning time: 0.077 ms Execution time: 28055.993 ms (3 rows)

In fact, the presence of a WHERE clause is required for parallelisation:

postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Gather (cost=1000.00..964311.60 rows=1 width=4) (actual time=3.381..9799.942 rows=1 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on test (cost=0.00..963311.50 rows=0 width=4) (actual time=6525.595..9791.066 rows=0 loops=3) Filter: (i = 1) Rows Removed by Filter: 33333333 Planning time: 0.130 ms Execution time: 9804.484 ms (8 rows)

We can go back to the previous action and observe the differences setting max_parallel_workers_per_gather to 0:

postgres=# SET max_parallel_workers_per_gather TO 0; SET postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=1; QUERY PLAN -------------------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..1692478.40 rows=1 width=4) (actual time=0.123..25003.221 rows=1 loops=1) Filter: (i = 1) Rows Removed by Filter: 99999999 Planning time: 0.105 ms Execution time: 25003.263 ms (5 rows)

A time 2.5 times greater.

The planner does not always consider a parallel sequential scan to be the best option. If a query is not selective enough and there are many tuples to transfer from worker to worker, it may prefer a “classic” sequential scan:

postgres=# SET max_parallel_workers_per_gather TO 2; SET postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE i<90000000; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..1692478.40 rows=90116088 width=4) (actual time=0.073..31410.276 rows=89999999 loops=1) Filter: (i < 90000000) Rows Removed by Filter: 10000001 Planning time: 0.133 ms Execution time: 37939.401 ms (5 rows)

In fact, if we try to force a parallel sequential scan, we get a worse result:

postgres=# SET parallel_tuple_cost TO 0; SET postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE i<90000000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..964311.50 rows=90116088 width=4) (actual time=0.454..75546.078 rows=89999999 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on test (cost=0.00..1338795.20 rows=37548370 width=4) (actual time=0.088..20294.670 rows=30000000 loops=3) Filter: (i < 90000000) Rows Removed by Filter: 3333334 Planning time: 0.128 ms Execution time: 83423.577 ms (8 rows)

The number of workers can be increased up to max_worker_processes (default: 8). We restore the value of parallel_tuple_cost and we see what happens by increasing max_parallel_workers_per_gather to 8.

postgres=# SET parallel_tuple_cost TO DEFAULT ; SET postgres=# SET max_parallel_workers_per_gather TO 8; SET postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Gather (cost=1000.00..651811.50 rows=1 width=4) (actual time=3.684..8248.307 rows=1 loops=1) Workers Planned: 6 Workers Launched: 6 -> Parallel Seq Scan on test (cost=0.00..650811.40 rows=0 width=4) (actual time=7053.761..8231.174 rows=0 loops=7) Filter: (i = 1) Rows Removed by Filter: 14285714 Planning time: 0.124 ms Execution time: 8250.461 ms (8 rows)

Even though PostgreSQL could use up to 8 workers, it has instantiated only six. This is because Postgres also optimises the number of workers according to size of the table and the min_parallel_relation_size. The number of workers made available by postgres is based on a geometric progression with 3 as common ratio 3 and min_parallel_relation_size as scale factor. Here is an example. Considering the 8MB of default parameter:

Size Worker <8MB 0 <24MB 1 <72MB 2 <216MB 3 <648MB 4 <1944MB 5 <5822MB 6 … …

Our table size is 3458MB, so 6 is the maximum number of available workers.

postgres=# \dt+ test List of relations Schema | Name | Type | Owner | Size | Description --------+------+-------+----------+---------+------------- public | test | table | postgres | 3458 MB | (1 row)

Finally, I will give a brief demonstration of the improvements achieved by through this patch. Running our query with a growing number of growing workers, we obtain the following results:

Workers Time 0 24767.848 ms 1 14855.961 ms 2 10415.661 ms 3 8041.187 ms 4 8090.855 ms 5 8082.937 ms 6 8061.939 ms

We can see that the times dramatically improve, until you reach a third of the initial value. It is also simple to explain the fact that we do not see improvements between the use of 3 and 6 workers: the machine on which the test was run has 4 CPUs, so the results are stable after having added 3 more workers to the original process.

Finally, PostgreSQL 9.6 has set the stage for query parallelisation, in which parallel sequential scan is only the first great result. We will also see that in 9.6, aggregations have been parallelised, but that is information for another article which will be released in the upcoming weeks!

Gulcin Yildirim: Evolution of Fault Tolerance in PostgreSQL: Replication Phase

11 July, 2016 - 11:00

PostgreSQL is an awesome project and it evolves at an amazing rate. We’ll focus on evolution of fault tolerance capabilities in PostgreSQL throughout its versions with a series of blog posts. This is the second post of the series and we’ll talk about replication and its importance on fault tolerance and dependability of PostgreSQL.

If you would like to witness the evolution progress from the beginning, please check the first blog post of the series: Evolution of Fault Tolerance in PostgreSQL

PostgreSQL Replication

Database replication is the term we use to describe the technology used to maintain a copy of a set of data on a remote system.  Keeping a reliable copy of a running system is one of the biggest concerns of redundancy and we all like maintainable, easy-to-use and stable copies of our data.

Let’s look at the basic architecture. Typically, individual database servers are referred to as nodes. The whole group of database servers involved in replication is known as a cluster. A database server that allows a user to make changes is known as a master or primary, or may be described as a source of changes. A database server that only allows read-only access is known as a Hot Standby, or sometimes, a slave server. (Hot Standby term is explained in detailed under Standby Modes title.)

The key aspect of replication is that data changes are captured on a master, and then transferred to other nodes. In some cases, a node may send data changes to other nodes, which is a process known as cascading or relay. Thus, the master is a sending node but not all sending nodes need to be masters. Replication is often categorized by whether more than one master node is allowed, in which case it will be known as multimaster replication.

Let’s see how PostgreSQL is handling replication over time and what is the state-of-art for fault tolerance by the terms of replication.

PostgreSQL Replication History

Historically (around year 2000-2005), Postgres only concentrated in single node fault tolerance/recovery which is mostly achieved by the WAL, transaction log. Fault tolerance is handled partially by MVCC (multi-version concurrency system), but it’s mainly an optimisation.

Write-ahead logging was and still is the biggest fault tolerance method in PostgreSQL. Basically, just having WAL files where you write everything and can recover in terms of failure by replaying them. This was enough for single node architectures and replication is considered to be the best solution for achieving fault tolerance with multiple nodes.

Postgres community used to believe long time that replication is something that Postgres should not provide and should be handled by external tools, this is why tools like Slony and Londiste became existing. (We’ll cover trigger-based replication solutions at the next blog posts of the series.)

Eventually it became clear that, one server tolerance is not enough and more people demanded proper fault tolerance of the hardware and proper way of switching, something in built-in in Postgres. This is when physical (then physical streaming) replication came to life.

We’ll go through all of the replication methods later in the post but let’s see the chronological events of PostgreSQL replication history by major releases:

  • PostgreSQL 7.x (~2000)
    • Replication should not be part of core Postgres
    • Londiste – Slony (trigger based logical replication)
  • PostgreSQL 8.0 (2005)
    • Point-In-Time Recovery  (WAL)
  • PostgreSQL 9.0 (2010)
    • Streaming Replication (physical)
  • PostgreSQL 9.4 (2014)
    • Logical Decoding (changeset extraction)
 Physical Replication

PostgreSQL solved the core replication need with what most relational databases do; took the WAL and made possible to send it over network. Then these WAL files are applied into a separate Postgres instance that is running read-only.

The read-only standby instance just applies the changes (by WAL) and the only write operations come again from the same WAL log. This is basically how streaming replication mechanism works. In the beginning, replication was originally shipping all files -log shipping-, but later it evolved to streaming.

In log shipping, we were sending whole files via the archive_command. The logic is pretty simple there: you just send the archive and log it to somewhere – like the whole 16MB WAL file –  and then you apply it to somewhere,  and then you fetch the next one and apply that one and it goes like that. Later on, it became streaming over network by using libpq protocol in PostgreSQL version 9.0.

The existing replication is more properly known as Physical Streaming Replication, since we are streaming a series of physical changes from one node to another. That means that when we insert a row into a table we generate change records for the insert plus all of the index entries.

When we VACUUM a table we also generate change records.

Also, Physical Streaming Replication records all changes at the byte/block level, making it very hard to do anything other than just replay everything

Fig.1 Physical Replication

 Fig.1 shows how physical replication works with just two nodes. Client execute queries on the master node, the changes are written to a transaction log (WAL) and copied over network to WAL on the standby node. The recovery on the standby process on the standby then reads the changes from WAL and applies them to the data files just like during recovery. If the standby is in hot standby mode, clients may issue read-only queries on the node while this is happening

Note: Physical Replication simply refers sending WAL files over network from master to standby node. Files can be send by different protocols like scp, rsync, ftp… The difference between Physical Replication and Physical Streaming Replication is Streaming Replication uses an internal protocol for sending WAL files (sender and receiver processes)

Standby Modes

Multiple nodes provide High Availability. For that reason modern architectures usually have standby nodes. There are different modes for standby nodes (warm and hot standby). The list below explains the basic differences between different standby modes, and also shows the case of multi-master architecture.

Warm Standby

Can be activated immediately, but cannot perform useful work until activated. If we continuously feed the series of WAL files to another machine that has been loaded with the same base backup file, we have a warm standby system: at any point we can bring up the second machine and it will have a nearly-current copy of the database. Warm standby does not allow read-only queries, Fig.2 simply represents this fact.

Fig.2 Warm Standby

 Recovery performance of a warm standby is sufficiently good that the standby will typically be only moments away from full availability once it has been activated. As a result, this is called a warm standby configuration which offers high availability.

Hot Standby

Hot standby is the term used to describe the ability to connect to the server and run read-only queries while the server is in archive recovery or standby mode. This is useful both for replication purposes and for restoring a backup to a desired state with great precision.

Fig.3 Hot Standby

The term hot standby also refers to the ability of the server to move from recovery through to normal operation while users continue running queries and/or keep their connections open. Fig.3 shows that standby mode allows read-only queries.


All nodes can perform read/write work.  (We’ll cover multi-master architectures at the next blog posts of the series.)

WAL Level parameter

There is a relation between setting up wal_level parameter in postgresql.conf file and what is this setting is suitable for. I created a table for showing the relation for PostgreSQL version 9.6.

Quick Note:  wal_level parameter determines how much information is written to the WAL. The default value is  minimal, which writes only the information needed to recover from a crash or immediate shutdown. replica adds logging required for WAL archiving as well as information required to run read-only queries on a standby server. Finally, logical adds information necessary to support logical decoding. Each level includes the information logged at all lower levels.

In releases prior to 9.6, this parameter also allowed the values archive and hot_standby. These are still accepted but mapped to replica.

Failover and Switchover

In single-master replication, if the master dies, one of the standbys must take its place (promotion). Otherwise, we will not be able to accept new write transactions. Thus, the term designations, master and standby, are just roles that any node can take at some point. To move the master role to another node, we perform a procedure named Switchover.

If the master dies and does not recover, then the more severe role change is known as a Failover. In many ways, these can be similar, but it helps to use different terms for each event.  (Knowing the terms of failover and switchover will help us with the understanding of the timeline issues at the next blog post.)


In this blog post we discussed PostgreSQL replication and its importance for providing fault tolerance and dependability. We covered Physical Streaming Replication and talked about Standby Modes for PostgreSQL. We mentioned Failover and Switchover. We’ll continue with PostgreSQL timelines at the next blog post.


PostgreSQL Documentation

Logical Replication in PostgreSQL 5432…MeetUs presentation by Petr Jelinek

PostgreSQL 9 Administration Cookbook – Second Edition

Michael Paquier: Postgres 9.6 feature highlight - pg_blocking_pids

11 July, 2016 - 09:40

pg_blocking_pids is one of those things that makes the life of analysis tasks easier in Postgres. It has been introduced in 9.6 with the following commit:

commit: 52f5d578d6c29bf254e93c69043b817d4047ca67 author: Tom Lane <> date: Mon, 22 Feb 2016 14:31:43 -0500 Create a function to reliably identify which sessions block which others. This patch introduces "pg_blocking_pids(int) returns int[]", which returns the PIDs of any sessions that are blocking the session with the given PID. Historically people have obtained such information using a self-join on the pg_locks view, but it's unreasonably tedious to do it that way with any modicum of correctness, and the addition of parallel queries has pretty much broken that approach altogether. (Given some more columns in the view than there are today, you could imagine handling parallel-query cases with a 4-way join; but ugh.) [...]

You can refer to the commit text in full to get more details regarding why this function is better than a join on the system catalogs pg_locks (self join with one portion being the waiter, and the other the holder, doing field-by-field comparisons), from which is a short summary:

  • Better understanding of which lock mode blocks the other.
  • When multiple sessions are queuing to wait for a lock, only the one at the head is reported.
  • With parallel queries, all the PIDs of the parallel sessions are reported. Note that it is possible in this case that duplicated PIDs are reported here because for example multiple waiters are blocked by the same PID.

Note that the primary reason for its introduction is to simplify the isolation testing facility that has been querying directly pg_locks to get information on the lock status between lock holders and waiters.

This function takes in input the PID of a session, and returns a set of PIDS taking a lock that this session whose PID is used in input is waiting for. So let’s take an example, here is a session 1:

=# CREATE TABLE tab_locked (a int); CREATE TABLE =# SELECT pg_backend_pid(); pg_backend_pid ---------------- 68512 (1 row)

And a session 2:

=# BEGIN; BEGIN =# LOCK tab_locked IN ACCESS EXCLUSIVE MODE; LOCK TABLE =# SELECT pg_backend_pid(); pg_backend_pid ---------------- 69223 (1 row)

Finally by coming back to session 1, let’s stuck it:

=# INSERT INTO tab_locked VALUES (1); -- Hey I am stuck here

Then here comes pg_blocking_pids, one can fetch the following result, reporting that the session taking the lock on table ‘tab_locked’ is blocking the session trying to insert a tuple:

=# SELECT pg_blocking_pids(68512); pg_blocking_pids ------------------ {69223} (1 row)

… Which is not something complicated in itself, but it is surely going to save a lot of typing or simplify a couple of extensions that have been doing the same kind of work. Now, looking at the code in lockfuncs.c, this code is actually far faster because it does directly lookups of the PGPROC entries to gather the information regarding what the blocking information.

An even more interesting thing is the introduction of GetBlockerStatusData(), which allows fetching the locking status data of a blocked PID to be able to use that in a reporting function or any other facility. This is definitely useful for people working on monitoring facilities aimed to track activity of Postgres instances.

Simon Riggs: Oracle’s rising open source problem

10 July, 2016 - 23:28

A salesman from Oracle recently spoke to my sales colleague at a conference, teasing him that he should “come make some money”. That was quite surprising, given Oracle’s well documented problems in gaining new sales, most especially what they say in their own public filings. The reply was unsurprisingly: “No, thanks – I’d like a permanent job.”

Oracle’s rising open source problem

Oracle Sales Erode as Startups Embrace Souped-Up Free Software

Of course, its well known that squeezing existing customers is one of the things they’re doing to avoid a catastrophic slump in sales. The only thing I can add from my own experience is the level of pure anger that the current policies are generating with their current customers. More than that, avoiding database vendor lock-in for their next purchase is the number #1 requirement, so the 2ndQuadrant story around PostgreSQL sits well with soon-to-be ex-Oracle customers.

Shaun M. Thomas: PG Phriday: All in a Name

9 July, 2016 - 00:29

Naming objects in a database can sometimes be an exercise in frustration. What kind of guidelines should we follow for schemas and tables? What about columns or variables? Should the same rules apply to indexes, constraints, and sequences? Functions and triggers are much different than all of those elements, yet still exist within the same namespace. Then we have to contend with Postgres reserved words, many of which are probably only obvious to someone well versed in database lingo.

Luckily, a few short rules essentially address of most of these questions directly. For objects that defy description or are otherwise elusive, we can still converge on something workable. Let’s start with a short list of basics:

  • Don’t quote.
  • Underscores separate words.
  • Motive counts.
  • Prefixes prevent problems.
  • Specific as possible.

Not only are these rules simple to apply, they make a great mnemonic acronym: DUMPS! Let’s explore each of these in depth and consider why they help make the database a place everyone can share and enjoy.

Don’t Quote

This is a rarity, but not uncommon in systems facilitated by an ORM or some kind of design tool that directly translates user inputs into object names:


Quoting allows us to override both case insensitivity and any reserved words Postgres might have—heck, we can even use spaces. While some developers and applications interpret this as consent, it greatly complicates actually using the objects. Quoted labels that contain mixed capitalization, special characters, or reserved words must be quoted forever. Whether it’s a DBA hand-writing a query, a reporting team trying to mine a series of tables for correlations, or an ORM.

The Postgres list of reserved words reveals that Postgres is actually more permissive than the standards dictate. The word “value” for example, might not be accepted in other database platforms, while Postgres will allow using it without quotes. This becomes a problem in mixed environments where various tools transmit data back and forth between database engines. The inconsistent behavior regarding reserved words means potential for unexpected results. There’s also the potential for Postgres to disallow common reserved words in the future.

Generally it’s best to simply avoid any database-esque words for object names and nullify the issue outright. Luckily following the specificity guideline pretty much solves syntax conflicts automatically, since our specific object name is very unlikely to be reserved by Postgres.


Unlike most languages, SQL is case insensitive. Let’s take a look at what that means in practice:

CREATE OR REPLACE FUNCTION doSomethingCool(myVar INT) RETURNS INT AS $$ BEGIN RETURN myVar * myVar; END; $$ LANGUAGE plpgsql;   CREATE OR REPLACE FUNCTION DoSomethingCool(myVar INT) RETURNS INT AS $$ BEGIN RETURN myVar; END; $$ LANGUAGE plpgsql;   SELECT doSomethingCool(5);   dosomethingcool ----------------- 5

While Postgres allows function overloading by specifying different arguments, that’s not what happened here. Instead, our carefully crafted CamelCase distinctions are discarded upon evaluation. As a consequence, the function names are identical, and the second definition overwrites the first one.

But it gets even more insidious than that. Check this out:

CREATE OR REPLACE FUNCTION check_me_out(nEst INT) RETURNS INT AS $$ DECLARE nest BOOLEAN; -- Should we nest the output? BEGIN -- ... other code here. RETURN nEst; END; $$ LANGUAGE plpgsql;   SELECT check_me_out(5);   dosomethingcool -----------------

Wait, what? This time, “nEst” is the same as “nest”, opening the potential for unexpected or undefined behavior. While this could arguably be called a Postgres bug since we shouldn’t be able to declare a variable with the same name as a functional parameter, it’s just one example.

Otherwise, the rule is simple: use underscores to split words. We don’t want onegiantunreadablestring, so it’s the only reliable way to retain user readability. Postgres doesn’t preserve mixed case at all in object names unless they’re quoted. As a consequence, it’s up to us to avoid those situations.

Here’s another illustration to drive the point home:

CREATE TABLE UserFormsAndOtherStuff (stuff INT);   \dt UserFormsAndOtherStuff   List OF relations Schema | Name | TYPE | Owner --------+------------------------+-------+---------- public | userformsandotherstuff | TABLE | postgres

My eyes!

Motive Counts

This is essentially an extension of our specificity rule. This time, we’re incorporating the use case along with any necessary context and intent behind our objects. What does this mean? Take a look at these structures:

CREATE TABLE toll_sensor (...); CREATE OR REPLACE VIEW v_all_active_sensors AS ...;   CREATE INDEX idx_sensor_ignore_inactive ON toll_sensor (sensor_id) WHERE is_online;   ALTER TABLE toll_sensor_log ADD CONSTRAINT chk_sensor_log_reject_invalid CHECK ...;   CREATE OR REPLACE FUNCTION f_log_traffic_encounter(...); CREATE OR REPLACE FUNCTION f_nullify_logged_transponder(...);   CREATE TRIGGER t_sensor_log_advisory_audit_a_iud ...;

In a database context, it’s assumed tables will store multiple things. In fact, it’s probably easier to consider the name of the table as a description for a single row it contains. In our above toll_sensor table, each row within the table is a sensor. This suggests table names should be singular. Simultaneously, view names are an instantiation of a query, meaning they’re a result set and therefore inherently plural.

When we think about indexes, why is the index necessary? Is it just a basic index for a column, or is it necessary to facilitate a subset of searches? Views are essentially reports, so why does the view exist, and what is it returning? Constraints define how a table acts in many cases, and we should list those conditions or some summary of them. With functions, we want to know what the function does, and if possible, to what. Triggers can do a lot of things, and since they add quite a bit of overhead, we really want to explain their presence.

Just consider the context, and this should be easy.


Database schemas often become a literal dumping ground of database objects. Sometimes from multiple different competing applications. Imagine we just wrote an application to track automated highway tolling. We know in the future that there could be companion applications that use the same data, but might rely on tables of their own.

Here’s how that might look:

CREATE SCHEMA toll_system; SET search_path TO toll_system;   CREATE TABLE toll_sensor ( sensor_id SERIAL NOT NULL, ... );   ALTER TABLE toll_sensor ADD CONSTRAINT pk_toll_sensor PRIMARY KEY (sensor_id);   CREATE TABLE toll_transponder ( transponder_id SERIAL PRIMARY KEY NOT NULL, ... );   CREATE TABLE toll_traffic_log ( traffic_log_id SERIAL PRIMARY KEY NOT NULL, ... );   CREATE UNIQUE INDEX udx_traffic_log_encounter (...);   CREATE VIEW v_transponder_history AS SELECT ...;   CREATE FUNCTION f_log_transponder(...) RETURNS BOOLEAN AS ...;   CREATE TRIGGER t_traffic_log_audit_a_iud AFTER INSERT OR UPDATE OR DELETE ON toll_traffic_log ...;

Things like tables, sequences, functions, views, indexes, and triggers all exist in the same namespace, and many even share the same context. We can avoid confusion as well as collisions by prefixing. The “rules” for this are straight-forward:

  • Label tables with the primary intent. For a tolling system, this would be “toll”, while an inventory system might prefer “inv”.
  • Use “v” for views.
  • Use “f” for functions. (Or “sp” for “stored procedure” if you prefer.)
  • Use “t” for triggers. (Optionally, suffix the trigger firing conditions: “b” for before, “a” for after, and any one of “i”, “u”, and “d” for insert, update, and delete.)
  • Indexes can be “idx” for a regular index, or “udx” for a unique index.
  • Indexes should also be prefixed with the parent table name, sans the table’s prefix.
  • Primary keys should use “pk”.
  • Foreign keys should use “fk”.
  • Boolean columns should be prefixed with “is”, “can”, “has”, etc.

Complicated? Maybe, but only slightly. It’s easy to summarize: abbreviate, shorten, and add. It’s hard to miss that we’re just using the object type as a letter prefix. This way, anyone looking at the schema can tell which things are views, tables, sequences, and so on. Further, examining the table will immediately show which indexes are unique, which constraints are a foreign key or a check, and so on. The point is consistency; this isn’t a bible.

These prefixes give us structure visibility, as well as some peace of mind in avoiding naming collisions in the giant blob of a namespace.


Motive explains how and why, while being specific tells us what. Every once in a while, I come across a table like this:


Databases can contain hundreds or thousands of tables, each of which may encapsulate dozens or hundreds of rows. What is “storage”? What are we storing? Data? What kind of data? From where? This kind of table is a nightmare to decipher. There are no clues to track down which app might be using it, and no way to tell how the table is related to other tables in the schema.

A column name such as “data” may seem like an acceptable idea for a single application, but when there are a dozen applications which have their own “data”, it quickly starts to lose meaning. Sure, the application developer probably knows what all of these extremely generic terms refer to, but does anyone else? In this case, the intent was probably to use “key” as some kind of lookup value as often seen with key/value pairs in associative arrays or noSQL-based composite text keys. And likewise the “data” column is probably the associated JSON object for that lookup key.

But why should anyone have to guess? This is only one example, but there are many like it. Let’s assume this table is used by an application that stores user testimonies. The application captures a lot of varying data from polymorphic forms, so the JSON can’t really be transformed into a reliable set of columns. With those constraints, we might try this instead:

CREATE TABLE user_testimony ( testimony_id SERIAL NOT NULL PRIMARY KEY, lookup_key VARCHAR NOT NULL, comment_type VARCHAR NOT NULL, user_feedback JSON NOT NULL, date_created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now() );

Isn’t that better? Now a quick glance reveals everything we need to know about the table and its contents. Tables are a back-end storage element, and are often shared by multiple applications and user groups. Shared resources should be as descriptive as possible to prevent confusion and potential misinterpretation or misuse.

This applies to any database object. Views should distinctly describe their purpose. Columns need to differentiate themselves from columns in other tables to prevent join ambiguity. Functions, sequences, everything should have a clearly distinguishable name.


This is a lot to consider. The DUMPS acronym helps, but an executive summary might be a simple: consistent and descriptive is best. A consistent naming scheme will prevent a lot of future headaches, and being descriptive forces us to think about the data we’re storing and why. Not only can this help us consider the architecture before setting it in stone, but it conveys meaning by simply existing.

Anyone who joins the company in the future can make sense of data that is at least partially self-documented. Queries can be more consistent and precise, with less ambiguity in every facet from column names to join conditions. Applications can be extended, or new APIs written to leverage old data. Why wade through ancient and probably undocumented code to determine original intent or business logic?

Just remember that access vectors are myriad and ephemeral, but data is forever. No matter how hard it is to think of good names, don’t let it get you down in the DUMPS.

Jobin Augustine: MySQL Foreign Data Wrapper : A quick tour

8 July, 2016 - 16:00

Data centers are no longer dominated by a single DBMS. Many companies have heterogeneous environments and may want their Postgres database to talk to other database systems. Foreign Data Wrappers can be the right solution for many scenarios. The BigSQL Project provides a well tested, ready to use MySQL FDW with Postgres. This makes life easy for a DevOps or DataCenter person.

Here is a quick tour on how to configure Foreign Data Wrappers for MySQL, so that Postgres can query a MySQL table. For this quick guide, I use a CentOS Linux machine. This, or a similar setup, should work fine on all other operating systems.

Setting up a test MySQL server for the test

In this demo I’m going to create a table in MySQL  which should be available to Postgres though the FDW.
The FDW can talk to any MySQL distribution including Oracle’s MySQL, Percona Server or MariaDB. I’m going to use MariaDB, which is more community friendly.

Install MariaDB Server and Start the service

$ sudo yum install mariadb-server.x86_64
$ sudo systemctl start mariadb

Connect as root user of mariadb and create a database

$ mysql -uroot
MariaDB [(none)]> create database postgres;

Connect to Database and create a table


MariaDB [(none)]> use postgres;
MariaDB [postgres]> create table t1m(id int,name varchar(30));

Insert some data in the table:

MariaDB [postgres]> insert into t1m values (1,'abc');
Query OK, 1 row affected (0.04 sec)

MariaDB [postgres]> insert into t1m values (2,’def’);
Query OK, 1 row affected (0.00 sec)

MariaDB [postgres]> insert into t1m values (3,’hij’);
Query OK, 1 row affected (0.03 sec)

Setting up Postgres Database Install Postgres

For this test, I’m going to use the Postgres DevOps Sandbox from the BigSQL project.
Download the Sandbox from BigSQL
Since this is a sandbox, you just need to unpack it
$ tar -xvf bigsql-9.5.3-5-linux64.tar.bz2

Install MySQL FDW

Go to the unpacked directory and invoke the bigsql command line tool to install MySQL FDW

$ cd bigsql

$ ./pgc list
Category | Component | Version | Status | Port | Updates
PostgreSQL pg92 9.2.17-5 NotInstalled
PostgreSQL pg93 9.3.13-5 NotInstalled
PostgreSQL pg94 9.4.8-5 NotInstalled
PostgreSQL pg95 9.5.3-5 NotInitialized
Extensions cassandra_fdw3-pg95 3.0.0-1 NotInstalled
Extensions hadoop_fdw2-pg95 2.5.0-1 NotInstalled
Extensions mysql_fdw2-pg95 2.1.2-1 NotInstalled
Extensions oracle_fdw1-pg95 1.4.0-1 NotInstalled
Extensions orafce3-pg95 3.3.0-1 NotInstalled
Extensions pgtsql9-pg95 9.5-1 NotInstalled
Extensions pljava15-pg95 1.5.0-1 NotInstalled
Extensions plv814-pg95 1.4.8-1 NotInstalled
Extensions postgis22-pg95 2.2.2-2 NotInstalled
Extensions slony22-pg95 2.2.5-2 NotInstalled
Extensions tds_fdw1-pg95 1.0.7-1 NotInstalled
Servers bam2 1.5.0 NotInstalled
Servers cassandra30 3.0.6 NotInstalled
Servers hadoop26 2.6.4 NotInstalled
Servers hive2 2.0.1 NotInstalled
Servers pgbouncer17 1.7.2-1 NotInstalled
Servers pgha2 2.1b NotInstalled
Servers pgstudio2 2.0.1-2 NotInstalled
Servers spark16 1.6.1 NotInstalled
Servers tomcat8 8.0.35 NotInstalled
Servers zookeeper34 3.4.8 NotInstalled
Applications backrest 1.02 NotInstalled
Applications birt 4.5.0 NotInstalled
Applications ora2pg 17.4 NotInstalled
Applications pgbadger 8.1 NotInstalled
Frameworks java8 8u92 NotInstalled

$ ./pgc install mysql_fdw2-pg95
Get:1 mysql_fdw2-pg95-2.1.2-1-linux64
Unpacking mysql_fdw2-pg95-2.1.2-1-linux64.tar.bz2

Note:- We can use the same command line tool to initalize a new postgres cluster
$ ./pgc init pg95

## Initializing pg95 #######################

Superuser Password [password]:
Confirm Password:
Giving current user permission to data dir

Initializing Postgres DB at:
-D “/home/vagrant/bigsql/data/pg95″

Using PostgreSQL Port 5432

Password securely remembered in the file: /home/vagrant/.pgpass

to load this postgres into your environment, source the env file:



Create the extension in the postgres database

create extension mysql_fdw;

Create foreign server

postgres=# CREATE SERVER mysql_svr
OPTIONS (host ‘localhost’, port ‘3306’);

Create foreign table

postgres=# CREATE FOREIGN TABLE mysql_tab (
postgres(# id int,
postgres(# name varchar(30)
postgres(# )
postgres-# SERVER mysql_svr
postgres-# OPTIONS (dbname 'postgres', table_name 't1m');

Create user mapping

postgres-# SERVER mysql_svr
postgres-# OPTIONS (username 'root');

(if your user is having password authentication to mysql, you have to pass that also in the format (username ‘username’, password ‘password’))

Now everything is set, You can test by querying the table.

postgres=# select * from mysql_tab;
id | name
1 | abc
2 | def
3 | hij
(3 rows)

Note:- MySQL FDW for Postgres requires MySQL Client Libraries. Please make sure that is there in the LD_LIBARY_PATH. if this file name is something different like “″, you may have to create a softlink with name “”

gabrielle roth: PDXPUG: July meeting – It’s Our 10th Anniversary!

8 July, 2016 - 07:08

When: 6-8:30pm Thursday July 21, 2016
Where: iovation
What: 10th Anniversary Social

I can’t believe it’s been 10 years since OSCON 2006, when Selena and I spoke to Josh B about starting a user group.

And here we are.

This meeting is 100% social and will take the place of our regular July meeting. iovation will provide a light dinner and beverages. We will provide the conversation and reminiscing. There will not be any long speeches; there might be cupcakes.

iovation is on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry! Elevators open at 5:45 and building security closes access to the floor at 6:30.

See you there!

Craig Kerstiens: PG Conf SV - Call For Papers Extended

8 July, 2016 - 03:00

PG Conf Silicon Valley is happening again this year in November and we’re looking to make it even better and more informative than last year. To do that we’re looking to you, both as an attendee and to come speak. We’ve already received a lot of great submissions, but we want more. Already we’ve gotten a number of great talks on:

  • What’s New in PostgreSQL
  • High-Availability
  • Backup and Restore

From attendees last year we heard that a number of other topics we equally as interesting. As such we’re extending out call for papers to 7/27 to give even more of you a chance to submit.

  • Use cases
  • Migration to PostgreSQL
  • Performance Tuning
  • Big Data / Internet of Things / Data warehousing
  • App dev’s perspectives on Postgres

Additionally, as we work to make the Postgres community a more inclusive one we understand that not everyone has the means to attend a conference. To help with that we’re offering a limited number of scholarship tickets this year to the conference. We’ll have more details on this in the coming weeks, but if you’re interested in attending or speaking but need assistance please reach out to us

Finally, its not just the speakers that make up the quality of the conference it’s the attendees as well. Last year we had a great collection of people that used Postgres in a variety of interesting ways. If you’re using Postgres or even just considering it we expect you’ll find something useful at the conference. Early bird tickets are still on sale so get yours today.