Planet PostgreSQL


Chris Travers: PostgreSQL vs Hadoop

19 August, 2016 - 08:43
So one of the folks I do work with is moving a large database from PostgreSQL to Hadoop.  The reasons are sound -- volume and velocity are major issues for them, and while PostgreSQL is not going away in their data center, in their industry there is a lot more Hadoop usage and tooling than there is PostgreSQL tooling for life science analytics (Hadoop is likely to replace both PostgreSQL and, hopefully, a massive amount of data on NFS).  This has provided an opportunity to think about big data problems and solutions and their implications.  At the same time, I have seen as many people moving from Hadoop to PostgreSQL as the other way around.  No, LedgerSMB will likely never use Hadoop as a backend.  It is definitely not the right solution to any of our problems.

Big data problems tend to fall into three categories, namely managing ever increasing volume of data, managing increasing velocity of data, and dealing with greater variety of data structure.  It's worth noting that these are categories of problems, not specific problems themselves, and the problems within the categories are sufficiently varied that there is no solution for everyone.  Moreover these solutions are hardly without their own significant costs.  All too often I have seen programs like Hadoop pushed as a general solution without attention to these costs and the result is usually something that is overly complex and hard to maintain, may be slow, and doesn't work very well.

So the first point worth noting is that big data solutions are specialist solutions, while relational database solutions for OLTP and analytics are generalist solutions.  Usually those who are smart start with the generalist solutions and move to the specialist solutions unless they know out of the box that the specialist solutions address a specific problem they know they have.  No, Hadoop does not make a great general ETL platform.....

One of the key things to note is that Hadoop is built to solve all three problems simultaneously.  This means that you effectively buy into a lot of other costs if you are trying to solve only one of the V problems with it.

The single largest cost comes from the solutions to the variety of data issues.  PostgreSQL and other relational data solutions provide very good guarantees on the data because they enforce a lack of variety.  You force a schema on write and if that is violated, you throw an error.  Hadoop enforces a schema on read, and so you can store data and then try to read it, and get a lot of null answers back because the data didn't fit your expectations.  Ouch.  But that's very helpful when trying to make sense of a lot of non-structured data.
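The schema-on-write guarantee is easy to demonstrate with a minimal sketch (the table and column names here are hypothetical, invented for illustration):

```sql
CREATE TABLE readings (
    sensor_id int     NOT NULL,
    reading   numeric NOT NULL
);

-- Schema on write: data that violates the declared types is rejected
-- at insert time, never silently stored.
INSERT INTO readings VALUES (1, 'not a number');
-- ERROR:  invalid input syntax for type numeric: "not a number"
```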

Now, if you are faced with volume and velocity problems, solutions to check out first include Postgres-XL and similar shard/clustering solutions, but these really require good data partitioning criteria.  If your data set is highly interrelated, they may not be a good fit, because cross-node joins are expensive.  You also wouldn't use them for smallish data sets, certainly not under a TB, since the complexity cost of these solutions is not lightly undertaken either.

Premature optimization is the root of all evil and big data solutions have their place.  However don't use them just because they are cool or new, or resume-building.  They are specialist tools and overuse creates more problems than underuse.

Simon Riggs: Postgres-BDR: 2 Years in Production

18 August, 2016 - 18:30

Postgres-BDR has now reached 1.0 production status.

Over the last 2 years, Postgres-BDR has been used daily for mission critical production systems.

As you might imagine, it’s been improved by both bug fixes and feature enhancements that allow it to be used smoothly, so it’s mature, robust and feature-rich.

The BDR Project introduced logical replication for PostgreSQL, now available as pglogical. In addition, it introduced replication slots, background workers and many other features. But is it still relevant?

Postgres-BDR delivers all of these features that aren’t yet in PostgreSQL 9.6, and likely won’t all be in PostgreSQL 10.0 either:

  • Automatic replication of DDL, reducing maintenance costs for DBAs
  • Automatic replication of sequences
  • Conflict resolution and logging

Ernst-Georg Schmid: Hexastores are easy

18 August, 2016 - 15:10
Did you know that you can make a Hexastore from an RDF triple in just one line of SQL? (This needs PostgreSQL 9.4 or better, because of the multi-array unnest.)

CREATE OR REPLACE FUNCTION hexify(
    IN sub text,
    IN pred text,
    IN obj text)
  RETURNS TABLE(ord text, a text, b text, c text) AS
$$select A.t || B.t || C.t as ord, A.v, B.v, C.v
  from (select * from unnest(ARRAY[sub, pred, obj], ARRAY['s', 'p', 'o'])) as A(v, t)
  cross join (select * from unnest(ARRAY[sub, pred, obj], ARRAY['s', 'p', 'o'])) as B(v, t)
  cross join (select * from unnest(ARRAY[sub, pred, obj], ARRAY['s', 'p', 'o'])) as C(v, t)
  where a.v != b.v and a.v != c.v and b.v != c.v
  order by ord desc$$
  LANGUAGE sql IMMUTABLE
  COST 100
  ROWS 6;

SELECT * FROM hexify('subject','predicate','object');
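If the logic holds, the call should return six rows, one per permutation of the triple, keyed by the concatenated position tags:

```
 ord |     a     |     b     |     c
-----+-----------+-----------+-----------
 spo | subject   | predicate | object
 sop | subject   | object    | predicate
 pso | predicate | subject   | object
 pos | predicate | object    | subject
 osp | object    | subject   | predicate
 ops | object    | predicate | subject
(6 rows)
```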

Sometimes, PostgreSQL SQL is just awesome...

More on Hexastores here and here.

Colin Copeland: Postgres Present and Future (PyCon 2016 Must-See Talk: 6/6)

18 August, 2016 - 15:00

Part six of six in our annual PyCon Must-See Series, a weekly highlight of talks our staff especially loved at PyCon. With so many fantastic talks, it’s hard to know where to start, so here’s our short list.

Coming from a heavy database admin background, I found Craig Kerstiens’s “Postgres Present and Future” to be incredibly well organized and engaging. Of particular interest to me, because I am somewhat new to Postgres (while having more background history with MS SQL), was the deep dive into indexes in Postgres.

Check out 5:44-8:39 to find out when to use different types of indexes, outside of the standard B-Tree. For instance, GIN indexes are helpful when searching multiple values for a single column, i.e. an array field or a JSONB field.

Click over to 17:22-19:33 to learn about the new Bloom Filter in Postgres 9.6, which is coming out in a few months. This extension seems like it will be incredibly useful to speed up queries on wide tables with a bunch of different options.

More in the annual PyCon Must-See Talks Series.

Gulcin Yildirim: Evolution of Fault Tolerance in PostgreSQL: Synchronous Commit

17 August, 2016 - 12:22

PostgreSQL is an awesome project and it evolves at an amazing rate. We’ll focus on evolution of fault tolerance capabilities in PostgreSQL throughout its versions with a series of blog posts. This is the fourth post of the series and we’ll talk about synchronous commit and its effects on fault tolerance and dependability of PostgreSQL.

If you would like to witness the evolution progress from the beginning, please check the first three blog posts of the series below. Each post is independent, so you don’t actually need to read one to understand another.

  1. Evolution of Fault Tolerance in PostgreSQL 
  2. Evolution of Fault Tolerance in PostgreSQL: Replication Phase 
  3. Evolution of Fault Tolerance in PostgreSQL: Time Travel
Synchronous Commit

By default, PostgreSQL implements asynchronous replication, where data is streamed out whenever convenient for the server. This can mean data loss in case of failover. It’s possible to ask Postgres to require one (or more) standbys to acknowledge replication of the data prior to commit; this is called synchronous replication (synchronous commit).

With synchronous replication, the replication delay directly affects the elapsed time of transactions on the master. With asynchronous replication, the master may continue at full speed.

Synchronous replication guarantees that data is written to at least two nodes before the user or application is told that a transaction has committed.

The user can select the commit mode of each transaction, so that it is possible to have both synchronous and asynchronous commit transactions running concurrently.

This allows flexible trade-offs between performance and certainty of transaction durability.

Configuring Synchronous Commit

For setting up synchronous replication in Postgres, we need to configure the synchronous_commit parameter in postgresql.conf.

The parameter specifies whether transaction commit will wait for WAL records to be written to disk before the command returns a success indication to the client. Valid values are on, remote_apply, remote_write, local, and off. We’ll discuss how things work in terms of synchronous replication when we set the synchronous_commit parameter to each of the defined values.
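Before going through each value, note that the parameter can also be inspected and changed per session, without touching postgresql.conf at all (a quick sketch):

```sql
-- show the current commit mode (the default is on)
SHOW synchronous_commit;

-- change it for the current session only
SET synchronous_commit = remote_write;
```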

Let’s start with Postgres documentation (9.6):

The default, and safe, setting is on. When off, there can be a delay between when success is reported to the client and when the transaction is really guaranteed to be safe against a server crash. (The maximum delay is three times wal_writer_delay.) Unlike fsync, setting this parameter to off does not create any risk of database inconsistency: an operating system or database crash might result in some recent allegedly-committed transactions being lost, but the database state will be just the same as if those transactions had been aborted cleanly. So, turning synchronous_commit off can be a useful alternative when performance is more important than exact certainty about the durability of a transaction.

Here we see the concept of synchronous commit, as described in the introduction of the post: you’re free to set up synchronous replication, but if you don’t, there is always a risk of losing data, though without the risk of creating database inconsistency, unlike turning fsync off (that is a topic for another post). Lastly, we conclude that if we don’t want to lose any data to replication delays, and want to be sure that the data is written to at least two nodes before the user/application is informed that the transaction has committed, we need to accept losing some performance.

Let’s see how the different settings work for different levels of synchronisation. Before we start, let’s look at how a commit is processed by PostgreSQL replication. Clients execute queries on the master node; the changes are written to a transaction log (WAL) and copied over the network to the WAL on the standby node. The recovery process on the standby node then reads the changes from the WAL and applies them to the data files just as during crash recovery. If the standby is in hot standby mode, clients may issue read-only queries on the node while this is happening. For more details about how replication works, you can check out the replication blog post in this series.


Fig.1 How replication works

synchronous_commit = off

When we set synchronous_commit = off, the COMMIT does not wait for the transaction record to be flushed to the disk. This is highlighted in Fig.2 below.

Fig.2 synchronous_commit = off

synchronous_commit = local

When we set synchronous_commit = local,  the COMMIT waits until the transaction record is flushed to the local disk. This is highlighted in Fig.3 below.

Fig.3 synchronous_commit = local

synchronous_commit = on (default)

When we set synchronous_commit = on, the COMMIT will wait until the server(s) specified by synchronous_standby_names confirm reception of the transaction record (meaning that they have the data in memory). This is highlighted in Fig.4 below.

Note: When synchronous_standby_names is empty, this setting behaves the same as synchronous_commit = local.

Fig.4 synchronous_commit = on

synchronous_commit = remote_write

When we set synchronous_commit = remote_write, the COMMIT will wait until the server(s) specified by synchronous_standby_names confirm write of the transaction record to the disk. This is highlighted in Fig.5 below.

Fig.5 synchronous_commit = remote_write

synchronous_commit = remote_apply

When we set synchronous_commit = remote_apply, the COMMIT will wait until the server(s) specified by synchronous_standby_names confirm that the transaction record was applied to the database. This is highlighted in Fig.6 below.

Fig.6 synchronous_commit = remote_apply

Now, let’s look at the synchronous_standby_names parameter in detail, which is referred to above when setting synchronous_commit to on, remote_apply or remote_write.

synchronous_standby_names = ‘standby_name [, …]’

The synchronous commit will wait for a reply from one of the standbys, listed in order of priority. This means that if the first standby is connected and streaming, the synchronous commit will always wait for its reply, even if the second standby has already replied. The special value * can be used as a standby_name and will match any connected standby.

synchronous_standby_names = ‘num (standby_name [, …])’

The synchronous commit will wait for replies from at least num standbys, listed in order of priority. The same rules as above apply. So, for example, setting synchronous_standby_names = '2 (*)' will make synchronous commit wait for replies from any 2 standby servers.
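Putting the two forms together, a postgresql.conf fragment for a hypothetical three-standby cluster might look like this (the standby names are made up for illustration):

```
synchronous_commit = on
# wait for the highest-priority connected standby
synchronous_standby_names = 'standby1, standby2, standby3'

# or: wait for any two of the listed standbys
# synchronous_standby_names = '2 (standby1, standby2, standby3)'
```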

synchronous_standby_names is empty

If this parameter is empty, it changes the behaviour of synchronous_commit = on, remote_write or remote_apply to be the same as local (i.e., the COMMIT will only wait for the flush to local disk).

Note: synchronous_commit parameter can be changed at any time; the behaviour for any one transaction is determined by the setting in effect when it commits. It is therefore possible, and useful, to have some transactions commit synchronously and others asynchronously. For example, to make a single multi-statement transaction commit asynchronously when the default is the opposite, issue SET LOCAL synchronous_commit TO OFF within the transaction.
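As a concrete sketch of that note (the table name is hypothetical), a single transaction can opt out of synchronous commit while the rest of the system keeps the default:

```sql
BEGIN;
SET LOCAL synchronous_commit TO OFF;  -- affects only this transaction
INSERT INTO audit_log (event) VALUES ('low-value event');
COMMIT;  -- returns without waiting for standby confirmation
```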


In this blog post, we discussed synchronous replication and described different levels of protection which are available in Postgres. We’ll continue with logical replication in the next blog post.


Special thanks to my colleague Petr Jelinek for giving me the idea for illustrations.

PostgreSQL Documentation
PostgreSQL 9 Administration Cookbook – Second Edition

Joshua Drake: Rich in the Jungle: An AWS to Softlayer comparison for PostgreSQL

16 August, 2016 - 21:28

I have updated my Rich in the Jungle presentation with new pricing for AWS vs. Softlayer. Things haven't changed much: in terms of raw performance per dollar (which is not the only qualifier), Softlayer is clearly the winner.

Kaarel Moppel: Insert-only data modeling with PostgreSQL?

16 August, 2016 - 12:28

During recent years there has been quite a lot of buzz about “insert-only” approaches, and some database-like products (Datomic, Apache Samza) have emerged, inspired by the idea of having an immutable datastore. In light of cheap storage and powerful hardware, I see the idea definitely having potential for certain use cases. So why not […]

The post Insert-only data modeling with PostgreSQL? appeared first on Cybertec - The PostgreSQL Database Company.

Tatsuo Ishii: Importing PostgreSQL 9.6's SQL parser

16 August, 2016 - 11:42
In almost every release of Pgpool-II, we import the latest version of PostgreSQL's SQL parser (more precisely, the raw parser). This time, our new Pgpool-II developer ran into an interesting problem while importing PostgreSQL 9.6's parser.

In PostgreSQL, the SQL parser is written in bison, a general-purpose parser generator. The particular file containing the SQL grammar rules is named "gram.y". gram.y used to include "scan.l", which is a lexical scanner written in flex.

In reality, gram.y is translated into a C source file by bison and then compiled; the same can be said of scan.l, which is translated by flex. So the main part of the SQL parser source was a single big file consisting of gram.y and scan.l.

From PostgreSQL 9.6, however, PostgreSQL developers decided to keep gram.y and scan.l separate:

Build backend/parser/scan.l and interfaces/ecpg/preproc/pgc.l standalone.

This caused enough confusion for the developer in charge of the work, and it took some time to realize the change. I would say that's a fun part of working on an OSS project :-) However, I cannot help saying that it would be nice if the SQL parser were exported as a separate library, so that we would not need this kind of work in every release of Pgpool-II.

Chris Travers: Forthcoming new scalable job queue extension

14 August, 2016 - 10:48
So for those of you who don't know, I now spend most of my time doing more general PostgreSQL consulting and a fair bit of time still on LedgerSMB.  One of my major projects lately has been on a large scientific computing platform currently run on PostgreSQL but, due to the volume and velocity of data, being moved to Hadoop (the client maintains other fairly large PostgreSQL instances with no intention of moving them, btw).

With this client's permission, I have decided to take a lot of the work I have done optimizing their job queue system and create a PostgreSQL extension from it.  The job queue currently runs tens of millions of jobs per day (meaning twice that number of write queries, and a fair number of read queries too) and is one of the most heavily optimized parts of the system, so this will be based on a large number of lessons learned about what is a surprisingly hard problem.

It is worth contrasting this to pg_message_queue, of which I am also the author.  pg_message_queue is intended as a light-weight, easy to use message queue extension that one can plug into other programs to solve common problems where notification and message transfer are the main concerns.  This project will be an industrial scale job queuing system aimed at massive concurrency.  As a result, simplicity and ease of use take second place to raw power and performance under load.  In other words, here I am not afraid to assume the DBA and programming teams know what they are doing and have the expertise to read the manual and implement appropriately.

The first version (1.x) will support all supported versions of PostgreSQL and make the following guarantees:

  1. Massively multiparallel, non-blocking performance (we currently run with 600+ worker-process connections to PostgreSQL)
  2. Partitioning, coalescing, and cancelling of jobs, similar in some ways to TheSchwartz
  3. Exponential pushback based on the number of times a job has failed
  4. Jobs may be issued again after deletion, but this can always be detected and bad jobs pruned
  5. Optional job table partitioning
The first client written will rely on hand-coded SQL along with DBIx::Class's schema objects.  This client will guarantee that:
  1. Work done by modules always succeeds or fails in a transaction
  2. A job notifier class will be shown
  3. Pruning of completed jobs will be provided via the perl module and a second query.
The history here is that this came from a major client's use of TheSchwartz, which they outgrew for scalability reasons.  While the basic approach is thus compatible, the following changes have been made:
  1. Job arguments are in JSON format rather than Storable format in bytea columns
  2. Highly optimized performance on PostgreSQL
  3. Coalesce is replaced by a single integer cancellation column
  4. Jobs may be requested in batches of various sizes
2.x will support 9.5+ and dispense with the need for both advisory locks and rechecking.  I would like to support some sort of graph management as well (i.e. a graph link that goes from one job type to another which specifies "for each x create a job for y" semantics).  That is still all in design.
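The post doesn't show the queue's actual SQL, but the 9.5+ feature that makes advisory locks unnecessary is almost certainly FOR UPDATE SKIP LOCKED. A minimal sketch of claiming a job that way, with a hypothetical table layout:

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE jobs (
    job_id    bigserial PRIMARY KEY,
    job_type  text  NOT NULL,
    args      jsonb NOT NULL,
    run_after timestamptz NOT NULL DEFAULT now()
);

-- Claim one due job; concurrent workers skip rows that are
-- already locked instead of blocking on them.
UPDATE jobs
   SET run_after = now() + interval '1 hour'  -- lease: job re-runs if the worker dies
 WHERE job_id = (SELECT job_id
                   FROM jobs
                  WHERE run_after <= now()
                  ORDER BY run_after
                    FOR UPDATE SKIP LOCKED
                  LIMIT 1)
RETURNING job_id, job_type, args;
```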

Shaun M. Thomas: PG Phriday: Inevitable Interdiction

12 August, 2016 - 19:48

“Hey! That row shouldn’t be in that table! How the heck did that get there!? Alright, who wrote the application client filters, because you’re fired!”

Good application developers know never to trust client input, but not all realize that a single app is rarely the only vector into a database. Databases don’t just preserve data with various levels of paranoia, they’re also the central nexus of a constellation of apps, scripts, APIs, GUIs, BMIs, HMOs, and STDs. As such, unless every single one of those share a common ancestor that sanitizes, boils, renders, and formats content before storage, there’s bound to be inconsistencies. That’s just how things work.

One of the major benefits of using an RDBMS, is that engines like Postgres provide several mechanisms for ensuring data integrity beyond crash durability and transaction control. Continuing our discussion on database objects from last week, we vaguely referred to other types of constraint. So, what other mechanisms are available aside from primary keys and unique constraints?

Let’s start with foreign keys, since they illustrate how tables are related and enforce that relation to reject invalid content. Here are two simply related tables and a couple of rows:

CREATE TABLE pet_type (
  type_id SERIAL PRIMARY KEY,
  animal VARCHAR UNIQUE NOT NULL
);

CREATE TABLE pet (
  pet_id SERIAL PRIMARY KEY,
  type_id INT NOT NULL,
  pet_name VARCHAR NOT NULL,
  owner_name VARCHAR NOT NULL
);

ALTER TABLE pet
  ADD CONSTRAINT fk_pet_pet_type
      FOREIGN KEY (type_id) REFERENCES pet_type (type_id);

INSERT INTO pet_type (animal) VALUES ('cat'), ('dog');

INSERT INTO pet (type_id, pet_name, owner_name) VALUES
  (1, 'Meow Meow Fuzzyface', 'Cedric'),
  (2, 'Mr. Peanutbutter', 'Paul');

Foreign keys provide a buffer between pending modifications by enforcing the relationship. In this case, we can’t remove “dog” as a pet type because at least one pet references it. We also can’t insert a pet fish, because there’s no corresponding type.

The other job the foreign key fills is to normalize the type names. If we had used a VARCHAR column in the pet table instead, we could have types of “Dog”, “dog”, “GDo”, or any number of typos preserved for eternity and forever complicating searches. This helps illustrate and sanitize the relationship and all affected data. Let’s see it in action:

DELETE FROM pet_type WHERE animal = 'dog';

ERROR:  update or delete on table "pet_type" violates foreign key constraint "fk_pet_pet_type" on table "pet"

INSERT INTO pet (type_id, pet_name, owner_name) VALUES (3, 'Wanda', 'Lisa');

ERROR:  insert or update on table "pet" violates foreign key constraint "fk_pet_pet_type"

Foreign keys do have other modes of operation. We could for instance, declare the constraint as ON DELETE CASCADE. That would enable us to delete from the parent table, but a delete cascade would mean every row in any table that referenced the deleted value would also be removed. That’s a fairly dangerous operation, and a pretty good reason it’s not the default in Postgres.
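For illustration (this sketch is not from the original post), the cascading variant would be declared like so, with exactly the danger just described:

```sql
ALTER TABLE pet DROP CONSTRAINT fk_pet_pet_type;

ALTER TABLE pet
  ADD CONSTRAINT fk_pet_pet_type
      FOREIGN KEY (type_id) REFERENCES pet_type (type_id)
      ON DELETE CASCADE;

-- This now succeeds, but silently deletes Mr. Peanutbutter too.
DELETE FROM pet_type WHERE animal = 'dog';
```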

Foreign keys are fairly limited in application, however. Beyond describing a relationship and enforcing its integrity, there isn’t much left. If we wanted to impose actual rules on the data itself, we need to go a bit further into the toolbox.

For example, suppose we want to ensure pet birth dates are firmly established in the past. None of the previously mentioned constraints will let us apply arbitrary rules on column values, right? That’s where CHECK constraints come in!

ALTER TABLE pet ADD birth_date DATE;

ALTER TABLE pet
  ADD CONSTRAINT ck_pet_no_future
      CHECK (CURRENT_DATE - birth_date > 0);

INSERT INTO pet (type_id, pet_name, owner_name, birth_date)
VALUES (1, 'Princess Carolyn', 'Amy', '2017-08-12');

ERROR:  new row for relation "pet" violates check constraint "ck_pet_no_future"

Not bad, eh? Check constraints are exceptionally useful when there are very simple rules we want to enforce. Maybe prices should always be positive. Perhaps invoice line items should be positive unless they’re a credit. There is a lot of potential here, but there’s also room for abuse. There’s technically no limit on the amount of conditionals a check constraint enforces, or the number of checks we prescribe, so we must be judicious or risk performance degradation.
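Those two hypothetical rules translate directly into check constraints (table and column names invented for the sketch):

```sql
-- Prices should always be positive.
ALTER TABLE product
  ADD CONSTRAINT ck_product_price_positive
      CHECK (price > 0);

-- Line items should be positive unless they're a credit.
ALTER TABLE invoice_line
  ADD CONSTRAINT ck_line_amount_positive
      CHECK (amount > 0 OR is_credit);
```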

Still, preventing critical data flaws prior to insert is a stupendous capability. Can we go further, though? Of course we can! The final constraint type is for data exclusion. Imagine in our examples that pets can change owners, but can’t be owned by two people simultaneously. Well, we can’t use check constraints for that since they only operate on the current row, and a unique constraint won’t work either.

Let’s watch EXCLUDE handle the situation with ease:

CREATE EXTENSION btree_gist;

ALTER TABLE pet ADD owner_range TSRANGE;

ALTER TABLE pet
  ADD CONSTRAINT ex_owner_overlap
      EXCLUDE USING gist (pet_name WITH =, owner_range WITH &&);

INSERT INTO pet_type (animal) VALUES ('horse');

INSERT INTO pet (type_id, pet_name, owner_name, birth_date, owner_range)
VALUES (3, 'Bojack', 'Will', '1964-06-12', '[1964-06-12,2014-09-07)');

INSERT INTO pet (type_id, pet_name, owner_name, birth_date, owner_range)
VALUES (3, 'Bojack', 'Arnett', '1964-06-12', '[2013-09-07,)');

ERROR:  conflicting key value violates exclusion constraint "ex_owner_overlap"

There are a couple prerequisites for using exclusion this way, of course. Since the gist index type wasn’t designed to handle types like INT or VARCHAR natively, we need to give it B-Tree capability first with an extension.

Beyond that oddity, we merely added the new ownership date range and then added the constraint itself. The exclusion syntax is to list the column and then the type of operator that should be applied. For ranges, that operator is && to indicate overlap. For our particular example, no pet with the same name can have overlapping ownership ranges. This is a tiny universe indeed!

Exclusion constraints work better for things like scheduling, preventing archive overlaps, and other operations that would be awkward or impossible otherwise. With other database engines, an application might have to search for a date range and other parameters and self-verify that a record is safe to insert. Under this paradigm, any tertiary data vector that isn’t so diligent would be free to ignore scheduling conflicts.

But no application, script, or API can ignore rules the database itself enforces. That’s what constraints are for: those times when being a shared resource is a liability. Judicious application of various constraints can protect as well as describe the data, and make things easier (and safer) for everyone.

And if that isn’t the Postgres motto, maybe it should be.

Craig Ringer: BDR 1.0

12 August, 2016 - 08:22

I’m pleased to say that we’ve just released Postgres-BDR 1.0, based on PostgreSQL 9.4.9.

This release contains significant improvements to DDL replication locking, global sequences, documentation, performance, and more. It also removes the deprecated UDR component in favour of pglogical.

It’s taken a lot of work to get to this point. This release sets the foundation to port BDR to PostgreSQL 9.6 and to enhance its high-availability capabilities, and I’m excited to be pushing BDR forward.

gabrielle roth: PDXPUG: August meeting in two days

12 August, 2016 - 02:10

When: 6-8pm Thursday August 18, 2016
Where: iovation
Who: Brian Panulla, Marty Zajac, Gabrielle Roth
What: ETL Throwdown (or up)

For August, we’re having a panel discussion on various ETL tools we’ve tried.
Brian: moving from bulk ETL to near real time ETL with Tungsten Replicator to replicate from MySQL to PostgreSQL and Solr
Marty: Pentaho
Gabrielle: CloverETL, home-grown solution using postgres_fdw, and a brief rant about Informatica if we have time.

If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

iovation provides us a light dinner (usually sandwiches or pizza).

Elevators open at 5:45 and building security closes access to the floor at 6:30.

See you there!

Kaarel Moppel: Logging of data modifications and the “log_statement” configuration parameter

11 August, 2016 - 09:36

PostgreSQL has a bagful of server configuration parameters (249 according to my count for version 9.5) at your disposal, which is mostly a good thing, as it enables you to take the maximum out of your hardware if you’re willing to put in the necessary time. But some of the parameters might leave the door open […]

The post Logging of data modifications and the “log_statement” configuration parameter appeared first on Cybertec - The PostgreSQL Database Company.

Paul Ramsey: Your Broken PostGIS Upgrade

10 August, 2016 - 19:05

Since the Dawn of Time, people have found PostGIS upgrades difficult and confusing, and this is entirely to be expected, because a PostGIS upgrade consists of a number of interlocking parts. Sometimes, they “upgrade” their version of PostGIS and find out they’ve bricked their system. What gives?

What Makes PostGIS Work?

Before talking about upgrades, it’s important to understand how PostGIS works at all, because that understanding is key to seeing how upgrade scenarios go bad.

PostGIS is a “run-time loadable library” for PostgreSQL. That means we have a block of C code that is added to a running PostgreSQL database. That C code sits in a “library file” which is named (for the current 2.2 version) postgis-2.2.so.

Just to add to the confusion: for Windows, the name of the library file is postgis-2.2.dll. For every rule, there must be an exception. For users of Apple OSX, yes, there’s a further exception for you: even though most dynamic libraries on OSX are suffixed .dylib, the PostgreSQL modules on OSX are suffixed .so, just like their Linux counterparts.

The location of the file will vary from system to system.

The presence of the library file alone is not sufficient to “PostGIS enable” a database. PostGIS consists of a large collection of SQL functions in the database.

The SQL functions are created when you run the CREATE EXTENSION postgis command. Until that time your database knows nothing about the existence or definition of the PostGIS functions.

Once the extension is installed, you can see the definitions of the PostGIS functions in the system tables.

The use of dynamic function and type management catalogs is one of the things which makes PostgreSQL so incredibly flexible for extensions like PostGIS.

SELECT * FROM pg_proc WHERE proname = 'st_pointonsurface';

-[ RECORD 1 ]---+--------------------
proname         | st_pointonsurface
pronamespace    | 2200
proowner        | 10
prolang         | 13
procost         | 100
prorows         | 0
provariadic     | 0
protransform    | -
proisagg        | f
proiswindow     | f
prosecdef       | f
proleakproof    | f
proisstrict     | t
proretset       | f
provolatile     | i
pronargs        | 1
pronargdefaults | 0
prorettype      | 667466
proargtypes     | 667466
proallargtypes  |
proargmodes     |
proargnames     |
proargdefaults  |
prosrc          | pointonsurface
probin          | $libdir/postgis-2.2
proconfig       |
proacl          |

Lots to see here, but most important bit is the entry for the probin column: $libdir/postgis-2.2. This function (like all the other PostGIS functions) is bound to a particular version of the PostGIS C library.
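One way to see the full extent of that binding (a quick sketch that should work on any PostGIS-enabled database) is to list every function that references the library:

```sql
-- Every PostGIS function bound to the C library, with its library path.
SELECT proname, probin
  FROM pg_proc
 WHERE probin LIKE '%postgis%'
 ORDER BY proname;
```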

Those of you thinking forward can now begin to see where upgrades could potentially go wrong.

How Things Go Wrong

Package Managers

The most common way for things to go wrong is to upgrade the library on the system without upgrading the database.

So, in Red Hat Linux terms, perhaps running:

yum upgrade postgresql94-postgis

This seems straight-forward, but think about what a package manager does during an upgrade:

  • Downloads a new version of the software
  • Removes the old version
  • Copies in the new version

So, if we had PostGIS 2.1.3 installed, and the latest version is 2.2.2, what has happened?

  • The postgis-2.1 library file has been removed
  • The postgis-2.2 library file has been added
  • So, the pg_proc entries in every PostGIS-enabled database now point to a library file that does not exist

Fortunately this mismatch between the pg_proc entries and the system state is usually solved during the very next step of the upgrade. But it’s a manual step, and if the DBA and system administrator are different people with different schedules, it might not happen.

Your next step should be to go and update the SQL function definitions by running an extension update on all your databases:

ALTER EXTENSION postgis UPDATE;

If you don’t, you’ll find that none of the PostGIS functions work. In fact, you cannot even dump your database: the very act of outputting a representation of the geometry data requires the PostGIS C library file, and until you run ALTER EXTENSION the database doesn’t know where the new library file is.


Since the use of CREATE EXTENSION postgis (available since PostgreSQL 9.1+ and PostGIS 2.0+) became commonplace, migrations now almost always “just work”, which is excellent news.

  • When you dump a modern PostGIS-enabled database, that was created using the CREATE EXTENSION postgis command, the dump file just includes a CREATE EXTENSION postgis command of its own at the top.
  • When you load the dump file into a new version of PostgreSQL even with a new version of PostGIS, the extension is created and the data magically loads.

However, there are still some old databases around that were created before the PostgreSQL extension system was invented, and when you dump them you get not only the data, but all the “custom” function and type definitions, including the definitions for PostGIS. A function definition looks like this:

CREATE OR REPLACE FUNCTION ST_PointOnSurface(geometry) RETURNS geometry AS '$libdir/postgis-2.2', 'pointonsurface' LANGUAGE 'c' IMMUTABLE STRICT;

And look what is hiding inside of it: a reference to a particular version of the PostGIS library! So you cannot simply dump your old PostGIS 1.5 database on PostgreSQL 8.4 and load it into a fresh PostGIS 2.2 database on PostgreSQL 9.5: the function definitions won’t reference the right library file.

The best bet for a really old database that was created without the extension mechanism is to use the “hard upgrade” process. The hard upgrade works by:

  • Taking a special “custom-format” back-up that includes an object catalog;
  • Filtering the back-up to clean out all the PostGIS-specific function and object definitions; and then
  • Loading the “cleaned” back-up into a new database with the desired version of PostGIS already installed (using CREATE EXTENSION postgis this time, so you never have to hard upgrade again).

In the case of upgrades that change out the underlying library and other situations that result in a mismatch between the SQL definitions in the database and the state of the system, there are a couple hacks that provide short-term fixes for emergencies:

  • Symlink the library name the database is looking for to the library name you have. So if your database wants postgis-2.1 and all you have is postgis-2.2, you can ln -s the 2.2 library file to the 2.1 name and your database will “work” again.
  • Update the PostgreSQL catalog definitions for the functions. As a super-user, you can do all kinds of dangerous things, and one of them is to just UPDATE pg_proc SET probin = '$libdir/postgis-2.2' WHERE probin ~ 'postgis-2.1'

Both hacks “work” because the PostGIS project doesn’t change underlying function names often, and inter-version changes mostly involve adding functions to the C library, not removing old ones.

However, there’s no guarantee that an underlying function name hasn’t changed between versions; it’s just unlikely. In the worst case, the function name hasn’t changed but the parameters have, so calling the function may crash your database.

All this to say: linking and SQL catalogue hacks should be used temporarily only until you can properly upgrade your database using a hard upgrade.
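However you patch things up, a quick sanity check that the library can actually be loaded again is to call a function that exercises the C code, for example:

```sql
-- Errors out if the referenced library file still cannot be found.
SELECT postgis_full_version();
```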

Raghavendra Rao: How to rotate PgBouncer logs in Linux/Windows ?

9 August, 2016 - 00:10
Before doing a deep dive into the subject, a short outline about PgBouncer: it is a lightweight connection pooler for PostgreSQL that dramatically reduces the processing time and resources needed to maintain a large number of client connections to one or more databases. It is typically used to increase the number of user connections that can be handled in a high-performance environment. For more details on installing and configuring PgBouncer, refer to the documentation here.

Like other tools, PgBouncer has a stderr/syslog logging architecture to record connections, disconnections, and pooler errors with different verbosity levels. As of now, the greater part of the logging goes to one single file, "pgbouncer.log", which grows endlessly. Sometimes this becomes a risk of making a system unresponsive due to lack of disk space on the log file location. At present, PgBouncer has no built-in configuration to rotate logs on the basis of age or size, which forces users to choose alternative methods. IMO, there are two approaches to handle it:
  1. Configure PgBouncer in "syslog" method to rely on OS log rotation or
  2. Configure log rotation using OS utilities on "pgbouncer.log" file.
Method 1: It's pretty straightforward to configure syslog in PgBouncer: set "syslog" to 1 (default 0), give a name to begin the log line in OS logs in "syslog_ident" (default 'pgbouncer'), and specify the facility details in "syslog_facility" (default daemon). A sample output from my OS logs (/var/log/messages):

Aug 5 16:54:27 raghavt pgbouncer[62549]: C-0x1cdfe60: postgres/postgres@unix(62621):6432 login attempt: db=postgres user=postgres tls=no
Aug 5 16:54:27 raghavt pgbouncer[62549]: S-0x1ce4b10: postgres/postgres@ new connection to server (from
Aug 5 16:54:27 raghavt pgbouncer[62549]: C-0x1cdfe60: postgres/postgres@unix(62621):6432 closing because: client close request (age=0)

Note: If "syslog" is enabled, comment out or blank the "logfile" parameter, else you will get duplicate logging.

Method 2: Logrotate is an OS utility that can rotate logs systematically and archive them to reduce the operating system's disk space requirement. Each log file may be handled daily, weekly, monthly, or when it grows too large. A default configuration file "/etc/logrotate.conf" defines the log rotation age/size/interval. Using this tool, logs can be kept longer with less disk space. Many people have written about the usage of this utility, which you can find on the net, so I am jumping directly into the implementation phase.

First, create a configuration file in the /etc/logrotate.d/ directory for pgbouncer logs. I have named it "/etc/logrotate.d/pgbouncer" with the below details (note the kill command must sit inside a postrotate/endscript block to be valid logrotate syntax):

/var/log/pgbouncer/pgbouncer.log {
      rotate 10
      size 10m
      create 0640 postgres postgres
      postrotate
           /bin/kill -HUP `cat /var/pgbouncer-postgres/ 2> /dev/null` 2>/dev/null || true
      endscript
}

About the configuration file: the first line indicates the pgbouncer log file location (the "logfile" parameter value in pgbouncer.ini), and the rest are parameters that control rotation, such as: how many log files to keep (rotate); issue no error and move on to the next log if the file is missing (missingok); which script should be executed before/after rotation (prerotate/postrotate); run the pre/post scripts once rather than for each rotated file (sharedscripts); do not rotate the log if it is empty (notifempty); whether an old log file should be compressed with the gzip utility after rotation (compress/nocompress); at what size log rotation should be performed (size); how often to rotate a particular log (daily); and what permissions the new log file should have (create).
Now we can see new log files rotated with 10M size. (We can even force the rotation with command "logrotate -f /etc/logrotate.conf")
[root@ pgbouncer]# ls -lrth
total 16K
-rw-r-----. 1 postgres postgres 10M Jul 27 15:30 pgbouncer.log-20160727
-rw-r-----. 1 postgres postgres 11K Jul 27 18:32 pgbouncer.log

That was simple, right? Now let's check the same in a Windows environment.

On Windows: I know very little about Windows utilities, so I did some googling and found a Windows version utility called "LogRotateWin" which works just like the Linux version of logrotate. For more details, refer to the detailed documentation on installation/configuration/usage here.

Let's see how it works. First, download the ".msi" version of LogRotateWin available on the site as a "logrotateSetup*.zip" file. Extract and execute the ".msi" file; it will install the utility to the "c:\Program Files (x86)\LogRotate" location. You can find the default configuration file (logrotate.conf) under "c:\Program Files (x86)\LogRotate\Content". Next, edit the "c:\Program Files (x86)\LogRotate\Content\logrotate.conf" file and specify the full path of the "pgbouncer.log" file with the same rotation parameters. A sample copy of my configuration file tested on Windows 10 (note: the below parameter values are only used to test the utility):

c:\Program Files (x86)\LogRotate\Content>more logrotate.conf
"c:\Program Files (x86)\PgBouncer\log\pgbouncer.log" {
rotate 10
size 200k
}

To verify, I have forced the log rotation with the "-f" option:

c:\Program Files (x86)\LogRotate>logrotate.exe -f Content\logrotate.conf
logrotate: Force option set to true

Here's the result:

C:\Program Files (x86)\PgBouncer\log>dir
Volume in drive C has no label.
Volume Serial Number is F226-9FFB

Directory of C:\Program Files (x86)\PgBouncer\log

08/08/2016 01:31 PM <DIR> .
08/08/2016 01:31 PM <DIR> ..
08/08/2016 01:31 PM 0 pgbouncer.log
08/08/2016 01:31 PM 6,626 pgbouncer.log.1
08/08/2016 01:31 PM 13,252 pgbouncer.log.2
3 File(s) 19,878 bytes
2 Dir(s) 26,905,051,136 bytes free

Nice, right?! On most Linux distributions, logrotate runs daily as part of a cron job using "logrotate.conf"; similarly on Windows, we can schedule a task in Windows Task Scheduler to rotate the logs daily. FYI, I have only explored the "LogRotateWin" utility at a basic level. In case you encounter any issue, please post it on the logrotate General Discussion forum.

Thank you for reading. --Raghav

Tomas Vondra: PostgreSQL vs. Linux kernel versions

8 August, 2016 - 18:05

I’ve published multiple benchmarks comparing different PostgreSQL versions, as for example the performance archaeology talk (evaluating PostgreSQL 7.4 up to 9.4), and all those benchmarks assumed a fixed environment (hardware, kernel, …). Which is fine in many cases (e.g. when evaluating the performance impact of a patch), but in production those things do change over time – you get hardware upgrades, and from time to time you get an update with a new kernel version.

For hardware upgrades (better storage, more RAM, faster CPUs, …), the impact is usually fairly easy to predict, and moreover people generally realize they need to assess the impact by analyzing the bottlenecks on production and perhaps even testing the new hardware first.

But what about kernel updates? Sadly we usually don’t do much benchmarking in this area. The assumption is mostly that new kernels are better than older ones (faster, more efficient, scale to more CPU cores). But is it really true? And how big is the difference? For example, what if you upgrade a kernel from 3.0 to 4.7 – will that affect performance, and if so, will performance improve or not?

From time to time we get reports about serious regressions with a particular kernel version, or sudden improvements between kernel versions. So clearly, kernel versions may affect performance.

I’m aware of a single PostgreSQL benchmark comparing different kernel versions, made in 2014 by Sergey Konoplev in response to recommendations to avoid 3.0 – 3.8 kernels. But that benchmark is fairly old (the last kernel version available ~18 months ago was 3.13, while nowadays we have 3.19 and 4.6), so I’ve decided to run some benchmarks with current kernels (and PostgreSQL 9.6beta1).

PostgreSQL vs. kernel versions

But first, let me discuss some significant differences between policies governing commits in the two projects. In PostgreSQL we have the concept of major and minor versions – major versions (e.g. 9.5) are released roughly once a year, and include various new features. Minor versions (e.g. 9.5.2) only include bugfixes, and are released about every three months (or more frequently, when a serious bug is discovered). So there should be no major performance or behavior changes between minor versions, which makes it fairly safe to deploy minor versions without extensive testing.

With kernel versions, the situation is much less clear. The Linux kernel also has branches (e.g. 2.6, 3.0 or 4.7), but those are by no means equal to “major versions” in the PostgreSQL sense, as they continue to receive new features and not just bugfixes. I’m not claiming that the PostgreSQL versioning policy is somehow automatically superior, but the consequence is that updating between minor kernel versions may easily and significantly affect performance or even introduce bugs (e.g. 3.18.37 suffers from OOM issues due to one such non-bugfix commit).

Of course, distributions realize these risks, and often lock the kernel version and do further testing to weed out new bugs. This post however uses vanilla longterm kernels, as available on


There are many benchmarks we might use – this post presents a suite of pgbench tests, i.e. a fairly simple OLTP (TPC-B-like) benchmark. I plan to do additional tests with other benchmark types (particularly DWH/DSS-oriented), and I’ll present them on this blog in the future.

Now, back to the pgbench – when I say “collection of tests” I mean combinations of

  • read-only vs. read-write
  • data set size – active set does (not) fit into shared buffers / RAM
  • client count – single client vs. many clients (locking/scheduling)

The values obviously depend on the hardware used, so let’s see what hardware this round of benchmarks was running on:

  • CPU: Intel i5-2500k @ 3.3 GHz (3.7 GHz turbo)
  • RAM: 8GB (DDR3 @ 1333 MHz)
  • storage: 6x Intel SSD DC S3700 in RAID-10 (Linux sw raid)
  • filesystem: ext4 with default I/O scheduler (cfq)

So it’s the same machine I’ve used for a number of previous benchmarks – a fairly small machine, not exactly the newest CPU etc. but I believe it’s still a reasonable “small” system.

The benchmark parameters are:

  • data set scales: 30, 300 and 1500 (so roughly 450MB, 4.5GB and 22.5GB)
  • client counts: 1, 4, 16 (the machine has 4 cores)

For each combination there were 3 read-only runs (15-minute each) and 3 read-write runs (30-minute each). The actual script driving the benchmark is available here (along with results and other useful data).

Note: If you have significantly different hardware (e.g. rotational drives), you may see very different results. If you have a system that you’d like to test, let me know and I’ll help you with that (assuming I’ll be allowed to publish the results).

Kernel versions

Regarding kernel versions, I’ve tested the latest versions in all longterm branches since 2.6.x (2.6.39, 3.0.101, 3.2.81, 3.4.112, 3.10.102, 3.12.61, 3.14.73, 3.16.36, 3.18.38, 4.1.29, 4.4.16, 4.6.5 and 4.7). There’s still a lot of systems running on 2.6.x kernels, so it’s useful to know how much performance you might gain (or lose) by upgrading to a newer kernel. But I’ve been compiling all the kernels on my own (i.e. using vanilla kernels, no distribution-specific patches), and the config files are in the git repository.


As usual, all the data is available on bitbucket, including

  • kernel .config file
  • benchmark script (
  • PostgreSQL config (with some basic tuning for the hardware)
  • PostgreSQL logs
  • various system logs (dmesg, sysctl, mount, …)

The following charts show the average tps for each benchmarked case – the results for the three runs are fairly consistent, with ~2% difference between min and max in most cases.


For the smallest data set, there’s a clear performance drop between 3.4 and 3.10 for all client counts. The results for 16 clients (4x the number of cores), however, more than recover in 3.12.

For the medium data set (fits into RAM but not into shared buffers), we can see the same drop between 3.4 and 3.10 but not the recovery in 3.12.

For large data sets (exceeding RAM, so heavily I/O-bound), the results are very different – I’m not sure what happened between 3.10 and 3.12, but the performance improvement (particularly for higher client counts) is quite astonishing.


For the read-write workload, the results are fairly similar. For the small and medium data sets we can observe the same ~10% drop between 3.4 and 3.10, but sadly no recovery in 3.12.

For the large data set (again, significantly I/O bound) we can see similar improvement in 3.12 (not as significant as for the read-only workload, but still significant):


I don’t dare to draw conclusions from a single benchmark on a single machine, but I think it’s safe to say:

  • The overall performance is fairly stable, but we can see some significant performance changes (in both directions).
  • With data sets that fit into memory (either into shared_buffers or at least into RAM) we see a measurable performance drop between 3.4 and 3.10. On read-only test this partially recovers in 3.12 (but only for many clients).
  • With data sets exceeding memory, and thus primarily I/O-bound, we don’t see any such performance drops but instead a significant improvement in 3.12.

As for the reasons why those sudden changes happen, I’m not quite sure. There are many possibly-relevant commits between the versions, but I’m not sure how to identify the correct one without extensive (and time consuming) testing. If you have other ideas (e.g. are aware of such commits), let me know.

Andrew Dunstan: Using EXPLAIN json format output in plpgsql

8 August, 2016 - 16:55
It's possible to use EXPLAIN output in plpgsql. In the default text format, the result comes back as a set of text values, so you can process them in a loop like this:

   exp text;
   for exp in explain myquery loop
      raise notice '%', exp;
   end loop;

If you use json format output, however, the result comes back as a single json document rather than a set of lines. You still might need a loop - I haven't found another way yet of getting the output from EXPLAIN into a variable - but the loop will only have one iteration. Here is an example taken from a function I wrote the other day that lets you get the estimated number of rows from the plan:

   exp json;
   for exp in explain (format json) myquery loop
      raise notice 'rows: %', exp#>>'{0,Plan,Plan Rows}';
   end loop;
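A complete helper along these lines might look like the following sketch. The function name `plan_rows` and the use of dynamic EXECUTE (so the query can be passed in as text) are my own choices, not necessarily identical to the function from the post:

```sql
create or replace function plan_rows(query text) returns bigint
language plpgsql as $$
declare
   exp json;
begin
   -- explain returns a single json document; the loop runs exactly once
   for exp in execute 'explain (format json) ' || query loop
      -- the top-level json value is an array; the first plan node
      -- carries the planner's row estimate in "Plan Rows"
      return (exp #>> '{0,Plan,Plan Rows}')::bigint;
   end loop;
end;
$$;

-- Usage:
-- select plan_rows('select * from pg_class');
```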

Hans-Juergen Schoenig: Exploding runtime: How nested loops can destroy speed

8 August, 2016 - 11:21

Sometimes it happens that a query slows down dramatically for no obvious reason. In other cases queries go south in case a certain set of input parameters is used. In some of those situations wrong optimizer estimates can be the root cause of all evil. One of my “favorite” issues with wrong planner estimates: Underestimated […]

The post Exploding runtime: How nested loops can destroy speed appeared first on Cybertec - The PostgreSQL Database Company.

Oleg Bartunov: Wildspeed for Postgres - a fast wildcard search for LIKE operator

7 August, 2016 - 23:12
This is a technical post !

What if you want to search %asd% pattern and do it fast ? Wildspeed extension provides GIN index support for wildcard search for LIKE operator.


Oleg Bartunov, Moscow University, Moscow, Russia
Teodor Sigaev, Moscow University, Moscow, Russia


Stable version, included into PostgreSQL distribution, released under BSD license. Development version, available from this site, released under the GNU General Public License, version 2 (June 1991)


Latest version of wildspeed is available from
git clone git://

Wildspeed provides an opclass (wildcard_ops) and uses the partial match feature of GIN, available since 8.4. It also supports full index scans.

The size of the index can be very big, since it contains entries for all permutations of the original word; see [1] for details. For example, the word hello will be indexed along with all its permutations:

=# select permute('hello');
               permute
--------------------------------------
 {hello$,ello$h,llo$he,lo$hel,o$hell}

Notice, the symbol '$' is used only for visualization; in the actual implementation the null symbol '\0' is used.
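The rotation scheme can be reproduced in plain SQL. This is an illustrative query of my own, not part of the extension; it generates all rotations of 'hello' || '$' except the one starting with the separator (which the suffix-search rewrite makes unnecessary), matching the list shown above:

```sql
-- Rotate the word by i positions for i = 1 .. length-1
with w(s) as (values ('hello' || '$'))
select array_agg(substr(s, i) || substr(s, 1, i - 1) order by i)
from w, generate_series(1, length(s) - 1) as i;
```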

A search query is rewritten as a prefix search:

*X  -> X$*
X*Y -> Y$X*
*X* -> X*

For example, search for 'hel*o' will be rewritten as 'o$hel'.

Special function permute(TEXT), which returns all permutations of argument, provided for test purposes.

The performance of wildspeed depends on the search pattern. Basically, wildspeed is less effective than a btree index with text_pattern_ops for prefix searches (the difference shrinks greatly for long prefixes) and much faster for wildcard searches.

Wildspeed by default uses an optimization (skip short patterns if there are long ones), which can be turned off in the Makefile by removing the define -DOPTIMIZE_WILDCARD_QUERY.


1., see also,


The table words contains 747358 records; the w1 and w2 columns contain the same data, in order to compare the performance of the btree (w1) and GIN (w2) indexes:

Table "public.words" Column | Type | Modifiers --------+------+----------- w1 | text | w2 | text | words=# create index bt_idx on words using btree (w1 text_pattern_ops); CREATE INDEX Time: 1885.195 ms words=# create index gin_idx on words using gin (w2 wildcard_ops); vacuum analyze; CREATE INDEX Time: 530351.223 ms

words=# select pg_relation_size('words');
 pg_relation_size
------------------
         43253760

words=# select pg_relation_size('gin_idx');
 pg_relation_size
------------------
        417816576
(1 row)

words=# select pg_relation_size('bt_idx');
 pg_relation_size
------------------
         23437312
(1 row)

words=# select count(*) from words where w1 like 'a%';
 count
-------
 15491
(1 row)
Time: 7.502 ms

words=# select count(*) from words where w2 like 'a%';
 count
-------
 15491
(1 row)
Time: 31.152 ms

Wildcard search:

words=# select count(*) from words where w1 like '%asd%';
 count
-------
    26
(1 row)
Time: 147.308 ms

words=# select count(*) from words where w2 like '%asd%';
 count
-------
    26
(1 row)
Time: 0.339 ms

Full index scan:

words=# set enable_seqscan to off;
words=# explain analyze select count(*) from words where w2 like '%';
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=226274.98..226274.99 rows=1 width=0) (actual time=2218.709..2218.709 rows=1 loops=1)
   ->  Bitmap Heap Scan on words  (cost=209785.73..224406.77 rows=747283 width=0) (actual time=1510.516..1913.430 rows=747358 loops=1)
         Filter: (w2 ~~ '%'::text)
         ->  Bitmap Index Scan on gin_idx  (cost=0.00..209598.91 rows=747283 width=0) (actual time=1509.358..1509.358 rows=747358 loops=1)
               Index Cond: (w2 ~~ '%'::text)
 Total runtime: 2218.747 ms
(6 rows)

Pavel Stehule: Prototype of XMLTABLE function is done

7 August, 2016 - 12:32
Parsing XML is not simple in Postgres. For more complex data I prefer external procedures in PL/Perlu or PL/Pythonu based on XMLReader. I hope this will soon be history. With the ANSI SQL XMLTABLE function, transforming any XML document to a table is simple and fast:

postgres=# SELECT * FROM xmldata;
│ data │
│ <ROWS> ↵│
│ <ROW id="1"> ↵│
│ </ROW> ↵│
│ <ROW id="2"> ↵│
│ </ROW> ↵│
│ <ROW id="3"> ↵│
│ </ROW> ↵│
│ <ROW id="4"> ↵│
│ </ROW> ↵│
│ <ROW id="5"> ↵│
│ </ROW> ↵│
│ <ROW id="6"> ↵│
│ <REGION_ID>3</REGION_ID><SIZE unit="km">791</SIZE> ↵│
│ </ROW> ↵│
│ </ROWS> │
(1 row)

postgres=# SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL xmltable('/ROWS/ROW'
COLUMNS id int PATH '@id',
country_name text PATH 'COUNTRY_NAME',
country_id text PATH 'COUNTRY_ID',
region_id int PATH 'REGION_ID',
size float PATH 'SIZE',
unit text PATH 'SIZE/@unit',
premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');

│ id │ country_name │ country_id │ region_id │ size │ unit │ premier_name │
│ 1 │ Australia │ AU │ 3 │ ¤ │ ¤ │ not specified │
│ 2 │ China │ CN │ 3 │ ¤ │ ¤ │ not specified │
│ 3 │ HongKong │ HK │ 3 │ ¤ │ ¤ │ not specified │
│ 4 │ India │ IN │ 3 │ ¤ │ ¤ │ not specified │
│ 5 │ Japan │ JP │ 3 │ ¤ │ ¤ │ Sinzo Abe │
│ 6 │ Singapore │ SG │ 3 │ 791 │ km │ not specified │
(6 rows)

link to patch