Planet PostgreSQL

Syndicate content
Planet PostgreSQL
Updated: 1 year 21 hours ago

Shaun M. Thomas: PG Phriday: Elephantary, My Dear

5 August, 2016 - 20:38

Occasionally with a lot of similar-sounding terminology, there’s ample room for misunderstandings. This is especially relevant with overloaded terms like ‘index’, which can be anything from a data lookup to a term used in mutual funds. This is further complicated if a developer’s first experience with databases is with another variant with sufficiently divergent syntax, methodology, or assumptions. To prevent future code refactors born of misunderstandings, let’s build a basic Postgres glossary to act as an executive summary for app devs.

Let’s consider this a continuation of our exploration of databases and schemas. We need a solid glossary of terms centered around frequently conflated concepts.


At its core, a Postgres table is just a container for columns of various types. Each record in a table is stored as a row of one or more columns. We can see that easily enough here:

CREATE TABLE animal_sound ( animal VARCHAR, sound VARCHAR );   INSERT INTO animal_sound (animal, sound) VALUES ('Cow', 'Moo'); INSERT INTO animal_sound (animal, sound) VALUES ('Cat', 'Meow');   SELECT * FROM animal_sound;   animal | sound --------+------- Cow | Moo Cat | Meow

Nothing groundbreaking here, right? Even the most rank database newbie should know this much. It’s when we start adding elements on top that things start to go sideways. Let’s take this one step at a time to prevent that from happening.


The next thing a developer will probably want is an auto-incrementing field of some kind. MySQL does this with their AUTO_INCREMENT decorator. MS SQL uses a function called IDENTITY that needs parameters to determine operation. Oracle requires something called a SEQUENCE that must either be used explicitly or tied to the table with a TRIGGER. Postgres uses sequences too, but they tend to be hidden behind the SERIAL data type.

DROP TABLE animal_sound;   CREATE TABLE animal_sound ( id SERIAL, animal VARCHAR, sound VARCHAR );   INSERT INTO animal_sound (animal, sound) VALUES ('Cow', 'Moo'); INSERT INTO animal_sound (animal, sound) VALUES ('Cat', 'Meow');   SELECT * FROM animal_sound;   id | animal | sound ----+--------+------- 1 | Cow | Moo 2 | Cat | Meow   \d animal_sound   COLUMN | TYPE | Modifiers --------+-------------------+------------------------------------------------ id | INTEGER | NOT NULL DEFAULT NEXTVAL('animal_sound_id_seq') animal | CHARACTER VARYING | sound | CHARACTER VARYING |

Behind the scenes, Postgres is transforming the SERIAL type into a SEQUENCE with a DEFAULT value for the column. This makes the Postgres approach something of a middle-ground between the strict Oracle approach, and the much looser MySQL decorator. We saw this explicitly by asking Postgres to tell us the structure of the table.

If we truly desired to use a manual approach, it would have looked like this:

DROP TABLE animal_sound;   CREATE SEQUENCE animal_sound_id_seq;   CREATE TABLE animal_sound ( id INT DEFAULT NEXTVAL('animal_sound_id_seq'), animal VARCHAR, sound VARCHAR );   ALTER SEQUENCE animal_sound_id_seq OWNED BY;

So that SERIAL type really saved us quite a bit of work. We didn’t need to explicitly create the sequence, and we didn’t need to marry it to the table column that uses it. That last ALTER SEQUENCE statement is necessary so Postgres knows the sequence is associated with the table. This way, if we drop the table, or export it into a dump file, the sequence (and its most recent value!) is included. Basically, It becomes integral to the table’s operation.

Since a sequence is a separate database object, it has its own attributes. Beyond just the starting value and increment size, we can tweak wraparound, caching, and a couple of other things. There really is very little magic in a Postgres database, and anything that seems magical is probably just a wrapper for something we could have done by hand. This makes for a very powerful hybrid approach that covers both the common and advanced use cases.

Keep in mind that sequences being a separate entity from the table means the value is not affected by underlying operations on the table or its data. What happens to a sequence if we delete all data from a table it’s associated with?

DELETE FROM animal_sound;   SELECT last_value FROM animal_sound_id_seq;   last_value ------------ 3

So far as the sequence is concerned, the next ‘id’ value will be 3 unless it’s explicitly set to a different value. If this were MySQL or SQL Server, we’d start back at 1. This may seem like a limitation, but it means sequences can be used for things other than incrementing data for a single table column. One sequence can be shared by many tables, used to generate other values, or just act as a seed for some other process. These are all uncommon or otherwise advanced techniques, but they remain possible due to sequences remaining distinct entities.


At its most basic, an index is just a functional calculation of some kind, performed on the value of one or more table columns. The result is then processed into some kind of storage structure that is optimized for instant (or at least very fast) lookup. The Postgres default is to use a B-tree, but there are numerous other options.

Let’s add an index to the id column of our fake table:

CREATE INDEX idx_animal_sound_id ON animal_sound (id);   \d animal_sound   COLUMN | TYPE | Modifiers --------+-------------------+------------------------------------------------ id | INTEGER | NOT NULL DEFAULT NEXTVAL('animal_sound_id_seq') animal | CHARACTER VARYING | sound | CHARACTER VARYING | Indexes: "idx_animal_sound_id" btree (id)

Just like other database-level objects, indexes exist distinctly and separately from every other object in the database. They utilize storage, and incur overhead. After all, new or updated data necessarily invokes the function that maintains indexed versions of database columns. This is why we don’t simply index every column in a table.

Due to this, there are only two cases where Postgres will automatically create an index: for primary keys and unique constraints. We’ll discuss these special cases next. For now though, consider that indexes exist separately as objects we can manipulate independently as with other base-level objects. They do not replace the column data, nor even represent it. Indexes are just a lookup structure bolted onto the table, with implicit routines to maintain themselves for fast lookup.

Indexes come into use when dereferencing the lookup is faster than just reading the entire table. As such, our example table is far too small to see an index in action. But we can get around that:

CREATE TABLE huge_blob AS SELECT id FROM generate_series(1, 1000) id;   CREATE INDEX idx_huge_blob_id ON huge_blob (id);   EXPLAIN SELECT * FROM huge_blob WHERE id = 17;   QUERY PLAN ----------------------------------------------------- INDEX ONLY Scan USING idx_huge_blob_id ON huge_blob (cost=0.28..8.29 ROWS=1 width=4) INDEX Cond: (id = 17)

See that? Instead of reading the entire table, Postgres calculated the hash for the lookup value of 17, and found that position in the B-tree to obtain the actual location of the row. There’s a longer discussion to have regarding good index candidates, optimal index types, index creation parameters and other minutiae, but we’ll save that for a future article.


Constraints are rules we want to apply to our table columns. Maybe they should only represent specific values, or reject certain combinations. The most common however, are PRIMARY KEY and UNIQUE constraints. And why not? A primary key effectively represents the main lookup value for a row in our table, and the need for unique combinations is easily understood.

Let’s start with primary keys, since it’s very rare to encounter a table without one. Let’s modify our sample table a bit and show one in action:

DROP TABLE animal_sound;   CREATE TABLE animal_sound ( id SERIAL, animal VARCHAR, color VARCHAR, sound VARCHAR );   ALTER TABLE animal_sound ADD CONSTRAINT animal_sound_pkey PRIMARY KEY (id);   INSERT INTO animal_sound (id, animal, color, sound) VALUES (1, 'Cow', 'Brown', 'Moo'); INSERT INTO animal_sound (id, animal, color, sound) VALUES (1, 'Cow', 'Spotted', 'Moo?');   ERROR: duplicate KEY VALUE violates UNIQUE CONSTRAINT "animal_sound_pkey"   \d animal_sound   COLUMN | TYPE | Modifiers --------+-------------------+------------------------------------------------ id | INTEGER | NOT NULL DEFAULT NEXTVAL('animal_sound_id_seq') animal | CHARACTER VARYING | color | CHARACTER VARYING | sound | CHARACTER VARYING | Indexes: "animal_sound_pkey" PRIMARY KEY, btree (id)

We can see here that the primary key prevented a duplicate value in the id column of our table. Adding the primary key also gave us a “free” index on the column as well. This should be expected, since we need to check candidates to prevent utilizing existing id values. Without this constraint, both insert statements would have succeeded, and we’d have no easy way to differentiate the data at a later date without using every column.

With the primary key in place, we can always rely on the id column being unique, and acting as the direct representative of each table row. Do not confuse a primary key with an index however! A primary key is a constraint that uses an index.

This is a statement that also applies to unique constraints. Let’s add one and test it out:

DROP TABLE animal_sound;   CREATE TABLE animal_sound ( id SERIAL PRIMARY KEY, animal VARCHAR, color VARCHAR, sound VARCHAR );   ALTER TABLE animal_sound ADD CONSTRAINT udx_animal_sound_by_animal_color UNIQUE (animal, color);   INSERT INTO animal_sound (animal, color, sound) VALUES ('Cow', 'Brown', 'Moo'); INSERT INTO animal_sound (animal, color, sound) VALUES ('Cow', 'Spotted', 'Moo?'); INSERT INTO animal_sound (animal, color, sound) VALUES ('Cow', 'Brown', 'MooOOOoOo');   ERROR: duplicate KEY VALUE violates UNIQUE CONSTRAINT "udx_animal_sound_by_animal_color"   \d animal_sound   COLUMN | TYPE | Modifiers --------+-------------------+------------------------------------------------ id | INTEGER | NOT NULL DEFAULT NEXTVAL('animal_sound_id_seq') animal | CHARACTER VARYING | color | CHARACTER VARYING | sound | CHARACTER VARYING | Indexes: "animal_sound_pkey" PRIMARY KEY, btree (id) "udx_animal_sound_by_animal_color" UNIQUE CONSTRAINT, btree (animal, color)

There are a couple new pieces here. First, notice that we took advantage of some shorthand to create the primary key this time around. By adding the PRIMARY KEY decorator after a column definition, Postgres will automatically add the primary key constraint and choose a name for us following a fairly simple naming scheme. It’s still the same constraint, but like SERIAL, this method saves us a lot of work.

Next, we can see that our unique constraint automatically created an index on our behalf. This works the same way as the primary key; Postgres prevents non-unique values by checking new rows against the existing record of unique combinations. In our case, we’re allowed to have a brown or spotted cow, but not two separate brown cows. Again, a unique constraint is not an index, but uses a unique index to enforce the constraint.

There’s a lot to learn about constraints, but for now, they’re beyond the scope of this article. We’ll probably cover them in the future, of course!


In the end, we only really need to remember that all of the base elements discussed here are individually distinct. A column is not an index. A sequence is not a primary key. An index is not a constraint. A primary key is not an index. But a column can be indexed. Sequences can provide default values to a column. Constraints can utilize an index to enforce themselves.

Tables, sequences, and indexes can be tied together with syntactical glue, but exist independently. Don’t get these things mixed up because other database engines use them differently or deviate conceptually. While invoking the handy SERIAL and PRIMARY KEY shorthand, keep in mind what’s really going on under the hood. There may be occasions where it matters, or we need to leverage the distinction in our favor for advanced use cases.

At least Postgres gives us that option.

Chris Travers: use lib '.' considered harmful (exploits discussed)

5 August, 2016 - 19:22
Which the discussion of CVE-2016-1238, a quick and easy fix for broken code that has been suggested is to add the following line to the top of broken Perl scripts:  Note this applies to Perl as run anywhere, whether pl/perlU, plain perl, or something else.

use lib '.';

In some corners, this has become the goto solution for the problem (pun quite deliberate).  It works, gets the job done, and introduces subtle, hidden, and extremely dangerous problems in the process.

For those interested in the concerns specific to PostgreSQL, these will be discussed near the end of this article.

Security and the Garden Path
I am borrowing an idea here from linguistics, the idea of the garden path, as something that I think highlights a lot of subtle security problems.  Consider the newspaper headline "Number of Lothian patients made ill by drinking rockets."  Starts off simple enough and you get to the end, realizing you must have misread it (and you did, probably, since the number of patients increased who were made ill by drinking, not that some people got sick because they drank hydrazine).  The obvious reading and the logical reading diverge and this leads to a lot of fun in linguistic circles.

The same basic problem occurs with regard to security problems.  Usually security problems occur because of two problems. Either people do something obviously insecure (plain text authentication for ftp users where it matters) or they do something that looks on the surface like it is secure but behind the scenes does something unexpected.

Perl here has a few surprises here because parsing and running a script is a multi-pass process but we tend to read it as a single pass. Normally these don't cause real problems but in certain cases there are very subtle dangers lurking.  Here, with use lib '.', it is possible to inject code into a running program as long as an attacker can get a file placed in the current working directory of the program.

Relative paths, including the current working directory, do have legitimate use cases, but the problems and pitfalls must be understood before selecting this method.

What the lib pragma does
Perl looks for files to require or include based on an array of paths, globally defined, called @INC.  Use lib stores a copy of the original lib at first use, and then ensures that the directory specified occurs at the start of the search order.  So directories specified with use lib are searched before the default library directories.  This becomes important as we look at how Perl programs get executed.

How Perl runs a program
Perl runs a program in two passes.  First it creates the parse tree, then it runs the program.  This is a recurive process and because of how it works, it is possible for malicious code that gets accidently run in this process to transparently inject code into this (and other portions) of the Perl process.

Keep in mind that this makes Perl a very dynamic language which, on one hand, has serious garden path issues, but on the other ensures that it is an amazingly flexible language.

During the parse stage, Perl systematically works through the file, generating a parse tree, and running any "BEGIN" blocks, "use" statements, and "no" statements.  This means that injected code can be run even if later sytnax errors appear to prevent the bulk of the program from running at all or if earlier errors cause run-time exception.

After this process finishes, Perl executes the parse tree that results.  This means that Perl code can rewrite the parse tree before your code is ever really written and that code can be inserted into that part o the process.

Transparent code injection during 'use'
Consider a simple Perl script:


use lib '.';
use Cwd;
use 5.010;
use strict;
use warnings;

say getcwd();

Looks straight-forward.  And in most cases it will do exactly what it looks like it does.  It loads the standard Cwd module and prints out the current working directory.
However, suppose I run it in a different directory, where I add two additional files: contains:
package Cwd;
use Injected;

hmmmm  that doesn't look good. What does do?

package Injected;use strict;
sub import {   local @INC = @INC;   my ($module) = caller;;   warn $module;   delete $INC{''};   delete $INC{"$"};   @INC = grep { $_ ne '.' } @INC;   eval "require $module";   warn "Got you!, via $module";

So when Cwd imports Injected, it deletes itself from the memory of having been included, deletes its caller too, reloads the correct caller (not from the current working directory) and then executes some code (here a harmless warning). then returns success runs Cwd->import() which is now the correct one, but we have already run unintended code that could in theory do anything.

Any program capable of being run in arbitrary directories, written in Perl, which has this line in it (use lib '.') is subject to arbitrary code injection attacks using any module in the dependency tree, required or not.

Instead, do the opposite (where you can)
As a standard partof boilerplate in any secure Perl program, I strongly recommend adding the following line to the top of any script.  As long as modules don't add it back in behind your back (would be extremely rare that they would), adding the following line:

no lib '.';

Note that this strips out the current working directory even if it si supplied as a command-line argument.  So it may not be possible in all cases.  So use common sense, and do some testing, and document this as desired behavior.  Note one can still invoke with perl -I./. in most cases so it is possible to turn this safety off.....  Additionally if you put that at the start of your module, something you include could possibly put it back in.

Safer Alternatives
In a case where you need a path relative to the script being executed, FindBin is the ideal solution.  It gives you a fixed path relative to the script being run, which is usually sufficient for most cases of an application being installed on a system as a third party.  So instead you would do:

use FindBin;
use lib $FindBin::Bin;

Then the script's directory will be in the include path.

PL/PerlU notes:
I always add the explicit rejection of cwd in my plperlu functions.  However if someone has a program that is broken by CVE-2016-1238 related fixes, it is possible that someone would add a use lib '.' to a perl module, which is a bad idea.  As discussed in the previous post, careful code review is required to be absolutely safe.  Additionally, it is a very good idea to periodically check the PostgreSQL data directory for perl modules which would indicate a compromised system.

Simon Riggs: PostgreSQL Solutions Roadmap

5 August, 2016 - 09:30

In a recent blog, I described features for PostgreSQL Core that we’ve been working on

Many people have asked for a similar roadmap for BDR and Postgres-XL. I can confirm that both are under active development and in active use.

Postgres-XL (link)

  • XL 9.5 v1.2 is now available, with more updates coming.
  • XL 9.6 has begun development work, together with active consideration of how to merge parts of that back into Postgres Core and/or minimize the overall set of changes.

BDR (link)

  • BDR 9.4 v0.9.3 is current version. We’re continuing to work on BDR 9.4 and will be publishing v1.0 sometime soon
  • BDR 9.6 is the next objective.
  • We’re also working to merge Logical Replication into PostgreSQL 10 and hopefully the rest of BDR functionality into PostgreSQL 10 or 11.

More details on the contents of those releases will be available separately.

Hans-Juergen Schoenig: paginators – The story about voluntary pain

3 August, 2016 - 13:17

It happens on a quite regular basis that people contact the Cybertec PostgreSQL Support Desk to get  support for a pretty common thing: Paginators on websites. What is the main issue? How hard can be it to display simple tables after all? A typical scenario with paginators Suppose you want to display a table on […]

The post paginators – The story about voluntary pain appeared first on Cybertec - The PostgreSQL Database Company.

Chris Travers: PostgreSQL, PL/Perl, and CVE-2016-1238

3 August, 2016 - 07:39
This post is about the dangers in writing user defined functions in untrusted languages, but it is also specifically about how to avoid CVE-2016-1238-related problems when writing PL/PerlU functions.  The fix is simple and straight-forward and it is important for it to be in pl/perlU stored procedures and user defined functions for reasons I will discuss.  This discusses actual exploits and the severity of being able to inject abritrary Perl code into the running database backend is a good reason to be disciplined and careful about the use of this language.

It is worth saying at the outset that I have been impressed by how well sensible design choices in PostgreSQL generally mitigate problems like this.  In essence you have to be working in an environment where a significant number of security measures have been bypassed either intentionally or not.  This speaks volumes on the security of PostgreSQL's design since it is highly unlikely that this particular attack vector was an explicit concern.  In other words these decisions make many attacks even against untrusted languages far more difficult than they would be otherwise.

The potential consequences are severe enough, however, that secure coding is particularly important in this environment even with the help of the secure design.  And in any language it is easy to code yourself into corners where you aren't aware you are introducing security problems until they bite you.

The current implementation, we will see, already has a fair bit of real depth of defense behind it.  PostgreSQL is not, by default, vulnerable to the problems in the CVE noted in the title.  However, with a little recklessness, it is possible to open the door to the possibility of real problems and it is possible for these problems to be hidden from the code reviewer by error rather than actual malice.  Given the seriousness of what can happen if you can run arbitrary code in the PostgreSQL back-end, my view is that all applicable means should be employed to prevent problems.

PL/PerlU can be used in vulnerable ways, but PL/Perl (without the U) is by design safe.  Even with PL/PerlU it is worth noting that multiple safety measures have to be bypassed before vulnerability becomes a concern.  This is not about any vulnerabilities in PostgreSQL, but what vulnerabilities can be added through carelessly writing stored procedures or user-defined functions.

There are two lessons I would  like people to take away from this.  The first is how much care has been taken with regard to PostgreSQL regarding security in design.  The second is how easily one can accidentally code oneself into a corner.  PL/PerlU often is the right solution for many problem domains and it can be used safely but some very basic rules need to be followed to stay out of trouble.

Extending SQL in PostgreSQL using Arbitrary Languages
PostgreSQL has a language handler system that allows user defined functions and stored procedures to be written in many different languages.  Out of the box, Python, TCL, C, and Perl come supported out of the box.  Perl and TCL come in trusted and untrusted variants (see below), while Python and C are always untrusted.

There are large numbers of external language handlers as well, and it is possible to write ones own.  Consequently, PostgreSQL allows, effectively, SQL to be extended by plugins written in any language.  The focus here will be on untrusted Perl, or PL/perlU.  Untrusted languages have certain inherent risks in their usage and these become important, as here, when there is concern about specific attack vectors.

Trusted vs Untrusted Languages, and what PL/Perl and PL/PerlU can do
PostgreSQL allows languages to be marked as 'trusted' or 'untrusted' by the RDBMS.  Trusted languages are made available for anyone to write functions for while untrusted languages are restricted to database superusers.  Trusted languages are certified by the developers not to interact with file handles, not to engage in any other activity other than manipulating data in the database.

There is no way to 'use' or 'require' a Perl module in pl/perl (trusted) and therefore it is largely irrelevant for our discussion.  However PL/PerlU can access anything else on the system and it does so with the same permissions as the database manager itself.  Untrusted languages, like PL/PerlU make it possible to extend SQL in arbitrary ways by injecting (with the database superuser's permission!) code into SQL queries written in whatever langauges one wants.  Untrusted languages make PostgreSQL one of the most programmable relational databases in the world, but they also complicate database security in important ways which are well beyond the responsibility of PostgreSQL as a project.

Like test harnesses, untrusted languages are insecure by design (i.e. they allow arbitrary code injection into the database backend) and this issue is as heavily mitigated as possible, making PostgreSQL one of the most security-aware databases in the market.

To define a function  in an untrusted language, one must be a database superuser, so the PostgeSQL community places primary trust in the database administration team not to do anything stupid, which is generally a good policy.  This article is largely in order to help that policy be effective.

Paths, Permissions, and CVE-2016-1238
The CVE referenced in the title of this article and section is one which allows attackers to inject code into a running Perl routine by placing it in the current working directory of a Perl process. If an optional dependency of a dependency is placed i the current working directory, it may be included in the current Perl interpreter without the understanding of the user.  This is a problem primarily because Perl programs are sufficiently complex that people rarely understand the full dependency tree of what they are running.

If the current working directory ends up being one which includes user-writeable data of arbitrary forms, the problem exists.  If not, then one is safe.

The problem however is that changing directories changes the include path.  You can demonstrate this by doing as follows:

In ./ create a file that contains:

use test;
use test2;

./ is:

chdir test;
test/ is:

warn 'haxored again';

What happens is that the use test statement includes ./ which changes directory, so when you use test2, you are including from test/   This means that if any period during this cycle of the Perl interpreter's life you are in a world-writable directory, and including or requring files, you are asking for trouble.

Usual Cases for PL/PerlU
The usual use case of PL/PerlU is where you need a CPAN module to process or handle information..  For example you may want to return json using, or you may want to write to a (non-transactional log).

For example, the most recent PL/PerlU function I wrote used basic regular expressions to parse a public document record stored in a database to extract information relating to patents awarded on protein sequences.  It was trivial but was easier to use JSON than to write json serialization inline (and yes, performed well enough given the data sizes operate on).

Are usual cases safe?  Deceptively so!
Here are a few pl/perlU functions which show the relevant environment that PL/PerlU functions run in by default:

postgres=# create or replace function show_me_inc() returns setof text language plperlu as
return \@INC;
postgres=# select show_me_inc();
(7 rows)

postgres=# create or replace function get_cwd() returns text language plperlu as
postgres-# $$
postgres$# use Cwd;
postgres$# return getcwd();
postgres$# $$;
postgres=# select get_cwd();
(1 row)

Wow.  I did not expect such a sensible, secure implementation.  PostgreSQL usually refuses to start if non-superusers of the system have write access to the data directory.  So this is, by default, a very secure configuration right up until the first chdir() operation......

Now, in the normal use case of a user defined function using PL/PerlU, you are going to have no problems.  The reason is that most of the time, if you are doing sane things, you are going to want to write immutable functions which have no side effects and maybe use helpers like to format data.  Whether or not there are vulnerabilities exploitable via, they cannot be exploited in this manner.

However, sometimes people do the wrong things in the database and here, particularly, trouble can occur.

What gets you into trouble?
To have an exploit in pl/perlU code, several things have to happen:

  1. Either a previous exploit must exist which has written files to the PostgreSQL data directory, or one must change directories
  2. After changing directories, a vulnerable module must be loaded while in a directory the attacker has write access to.
It is possible, though unlikely, for the first to occur behind your back.  But people ask me all the time how to send email from the database backend and so you cannot always guarantee people are thinking through the consequences of their actions.
So vulnerabilities can occur when people are thinking tactically and coding in undisciplined ways.  This is true everywhere but here the problems are especially subtle and they are as dangerous as they are rare.

An attack scenario.
So suppose company A receives a text file via an anonymous ftp drop box in X12 format (the specific format is not relevant, just that it comes in with contents and a file name that are specified by the external user).  A complex format like X12 means they are highly unlikely to do the parsing themselves so they implement a module loader in PostgreSQL.  The module loader operates on a file handle as such:

On import, the module loader changes directories to the incoming directory.  In the actual call to get_handle, it opens a file handle, and creates an iterator based on that, and returns it.  Nobody notices that the directory is not changed back because this is then loaded into the db and no other file access is done here.  I have seen design problems of this magnitude go undetected for an extended period, so coding defensively means assuming they will exist.

Now, next, this is re-implemented in the database as such:

CREATE OR REPLACE FUNCTION load_incoming_file(fiilename text) 
language plperlu as
use CompanyFileLoader '/var/incoming'; # oops, we left the data directory
use CompanyConfig 'our_format'; # oops optional dependency that falls back to .
                                                    # in terms of exploit, the rest is irrelevant
                                                    # for a proof of concept you could just return 1 here
use strict;
use warnings;

my $filename = shift;
my $records = CompanyFileLoader->get_handle($filename);
while ($r = $records->next){
    # logic to insert into the db omitted

return $records->count;


The relevant poart of is (the rest could be replaced with stubs for a proof of concept):

package CompanyFileLoader;
use strict;
use warnings;

my @dirstack;
sub import {
    my $dir = pop;
    push @dirstack, $dir;
    chdir $dir;

Now, in a real-world published module, this would cause problems but in a company's internal operations it might not pose discovered problems in a timely fashion.

The relevant part of CompanyConfig is:

package CompanyConfig;
use strict;
use warnings;

eval { require 'SupplementalConfig' };

Now, if a is loaded into the same directory as the text files, it will get loaded and run as part of the pl/perlU function.

Now to exploit this someone with knowledge of the system has to place this in that directory.  It could be someone internal due to failure to secure the inbound web service properly.  It could be someone external who has inside knowledge (a former employee for example).  Or a more standard exploit could be tried on the basis that some other shared module might have a shared dependency.

The level of inside knowledge required to pull that off is large but the consequences are actually pretty dire.  When loaded, the perl module interacts with the database with the same permissions as the rest of the function, but it also has filesystem access as the database server.  This means it could do any of the following:

  1. Write perl modules to exploit this vulnerability in other contexts to the Pg data directory
  2. delete or corrupt database files
  3. possibly alter log files depending on setup.
  4. Many other really bad things.

These risks are inherent with the use of untrusted languages, that you can write vulnerable code and introduce security problems into your database.  This is one example of that and I think the PostgreSQL team has done an extremely good job of making the platform secure.

Disciplined coding to prevent problems
The danger can be effectively prevented by following some basic rules:

All user defined functions and stored procedures in PL/PerlU should include the line:

no lib '.';

It is possible that modules could add this back in behind your back, but for published modules this is extremely unlikely.  So local development projects should not use lib '.' in order to prevent this.

Secondly, never use chdir in a pl/perl function.  Remember you can always do file operations with absolute paths.   Without chdir, no initial exploit against the current working directory is possible through pl/perlu.  Use of chdir circumvents important safety protections in PostgreSQL.

Thirdly it is important that one sticks to well maintained modules.  Dangling chdir's in a module's load logic are far more likely to be found and fixed when lots of other people are using a module, and a dangling chdir is a near requirement to accidental vulnerability.  For internal modules, they need to be reviewed both for optional dependencies usage and dangling chdir's in the module load logic.

Dinesh Kumar: pgBucket beta2 is ready

3 August, 2016 - 05:49
Hi Everyone,

I would like to inform to you all that, pgBucket beta2[Simple concurrent job scheduler for postgresql] version is ready with more stability.

Thank you all in advance for your inputs/comments/suggestions.


Robert Haas: Uber's move away from PostgreSQL

2 August, 2016 - 22:32
Last week, a blog post by an Uber engineer explained why Uber chose to move from PostgreSQL to MySQL. This article was widely reported and discussed within the PostgreSQL community, with many users and developers expressing the clear sentiment that Uber had indeed touched on some areas where PostgreSQL has room for improvement. I share that sentiment. I believe that PostgreSQL is a very good database, but I also believe there are plenty of things about it that can be improved. When users - especially well-known names like Uber - explain what did and did not work in their environment, that helps the PostgreSQL community, and the companies which employ many of its active developers, figure out what things are most in need of improvement.  I'm happy to see the PostgreSQL community, of which I am a member, reacting to this in such a thoughtful and considered way.
Read more »

Simon Riggs: Thoughts on Uber’s List of Postgres Limitations

2 August, 2016 - 18:25

An Uber technical blog of July 2016 described the perception of “many Postgres limitations”. Regrettably, a number of important technical points are either not correct or not wholly correct because they overlook many optimizations in PostgreSQL that were added specifically to address the cases discussed. In most cases, those limitations were actually true in the distant past of 5-10 years ago, so that leaves us with the impression of comparing MySQL as it is now with PostgreSQL as it was a decade ago. This is no doubt because the post was actually written some time/years? ago and only recently published.

This document looks in detail at those points to ensure we have detailed information available for a wider audience, so nobody is confused by PostgreSQL’s capabilities.

Having said that, I very much welcome the raising of those points and also wish to show that the PostgreSQL project and 2ndQuadrant are responsive to feedback. To do this, detailed follow-ups are noted for immediate action.

These points were noted in the blog
* Poor replica MVCC support
* Inefficient architecture for writes
* Inefficient data replication
* Difficulty upgrading to newer releases

Poor replica MVCC support

“If a streaming replica has an open transaction, updates to the database are blocked if they affect rows held open by the transaction. In this situation, Postgres pauses the WAL application thread until the transaction has ended.”

This is true, though misses the point that a parameter exists to control that behaviour, so that when
hot_standby_feedback = on
the described behaviour does not occur in normal circumstances. This is supported from PostgreSQL 9.1 (2011) and above. If you’re not using it, please consider doing so.

Later, this comment leads to the conclusion “Postgres replicas … can’t implement MVCC” which is wholly incorrect and a major misunderstanding. PostgreSQL replicas certainly allow access to data with full MVCC semantics.

Inefficient architecture for writes

“If old transactions need to reference a row for the purposes of MVCC MySQL copies the old row into a special area called the rollback segment.”

“This design also makes vacuuming and compaction more efficient. All of the rows that are eligible to be vacuumed are available directly in the rollback segment. By comparison, the Postgres autovacuum process has to do full table scans to identify deleted rows.”

Moving old rows to a rollback segment adds time to the write path for UPDATEs, but that point isn’t mentioned. PostgreSQL is more efficient architecture for writes in relation to MVCC because it doesn’t need to do as many push-ups.

Later, if the workload requires that we access old rows from the rollback segment that is also more expensive. That is not always needed, yet it is very common for longer running queries to need to access older data. However, if all transactions are roughly the same short duration access to the rollback segment is seldom needed, which just happens to make benchmark results appear good while real-world applications suffer.

By contrast, PostgreSQL has multiple optimizations that improve vacuuming and compaction. First, an optimization called HOT improves vacuuming in heavily updated parts of a table (since 2007), while the visibility map ensures that VACUUM can avoid full table scans (since 2008).

Whether rollback segments help or hinder an application depend on the specific use case and it’s much more complex than this first appears.

Next, we discuss indexes…

“With Postgres, the primary index and secondary indexes all point directly to the on-disk tuple offsets.”

This point is correct; PostgreSQL indexes currently use a direct pointer between the index entry and the heap tuple version. InnoDB secondary indexes are “indirect indexes” in that they do not refer to the heap tuple version directly, they contain the value of the Primary Key (PK) of the tuple.

Comparing direct and indirect indexes we see
* direct indexes have links that go index → heap
* indirect indexes have links that go index → PK index → heap

Indirect indexes store the PK values of the rows they index, so if the PK columns are wide or contain multiple columns the index will use significantly more disk space than a direct index, making them even less efficient for both read and write (as stated in MySQL docs). Also indirect indexes have index search time >=2 times worse than direct indexes, which slows down both reads (SELECTs) and searched writes (UPDATEs and DELETEs).
Performance that is >=100% slower is understated as just a “slight disadvantage” [of MySQL].

“When a tuple location changes, all indexes must be updated.”

This is misleading, since it ignores the important Heap Only Tuple (HOT) optimization that was introduced in PostgreSQL 8.3 in 2007. The HOT optimization means that in the common case, a new row version does not require any new index entries, a point which effectively nullifies the various conclusions that are drawn from it regarding both inefficiency of writes and inefficiency of the replication protocol.

“However, these indexes still must be updated with the creation of a new row tuple in the database for the row record. For tables with a large number of secondary indexes, these superfluous steps can cause enormous inefficiencies.”

As a result of ignoring the HOT optimization this description appears to discuss the common case, rather than the uncommon case. It is currently true that for direct indexes if any one of the indexed columns change then new index pointers are required for all indexes. It seems possible for PostgreSQL to optimize this further and I’ve come up with various designs and will be looking to implement this best fairly soon.

Although they have a higher read overhead, indirect indexes have the useful property that if a table has multiple secondary indexes then an update of one secondary index does not affect the other secondary indexes if their column values remain unchanged. This makes indirect indexes useful only for the case where an application needs indexes that would be infrequently used for read, yet with a high update rate that does not touch those columns.

Thus, it is possible to construct cases in which PostgreSQL consistently beats InnoDB, or vice versa. In the “common case” PostgreSQL beats InnoDB on reads and is roughly equal on writes for btree access. What we should note is that PostgreSQL has the widest selection of index types of any database system and this is an area of strength, not weakness.

The current architecture of PostgreSQL is that all index types are “direct”, whereas in InnoDB primary indexes are “direct” and secondary indexes “indirect”. There is no inherent architectural limitation that prevents PostgreSQL from also using indirect indexes, though it is true that has not been added yet.

We’ve done a short feasibility study and it appears straightforward to implement indirect indexes for PostgreSQL, as an option at create index time. We will pursue this if the HOT optimizations discussed above aren’t as useful or possible, giving us a second approach for further optimization. Additional index optimizations have also been suggested.

Inefficient data replication

“However, the verbosity of the Postgres replication protocol can still cause an overwhelming amount of data for a database that uses a lot of indexes.”

Again, these comments discuss MySQL replication which can be characterized as Logical Replication. PostgreSQL provides both physical and logical replication. All of the benefits discussed for MySQL replication are shared by PostgreSQL’s logical replication. There are also benefits for physical replication in many cases, which is why PostgreSQL provides both logical and physical replication as options.

PostgreSQL physical replication protocol itself is not verbose – this comment is roughly the same as the “inefficient writes” discussion: if PostgreSQL optimizes away index updates then they do not generate any entries in the transaction log (WAL), so there is no inefficiency. Also, the comment doesn’t actually say what we mean by “overwhelming”. What this discussion doesn’t consider is the performance of replication apply. Physical replication is faster than logical replication because including the index pointers in the replication stream allows us to insert them directly into the index, rather than needing to search the index for the right point for insertion. Including the index pointers actually increases not decreases performance, even though the replication bandwidth requirement is higher.

PostgreSQL Logical Replication is available via 2ndQuadrant’s pglogical and will be available in PostgreSQL 10.0 in core.

MySQL “Statement-based replication is usually the most compact but can require replicas to apply expensive statements to update small amounts of data. On the other hand, row-based replication, akin to the Postgres WAL replication, is more verbose but results in more predictable and efficient updates on the replicas.”

Yes, statement-based replication is more efficient in terms of bandwidth, but even less efficient in terms of the performance of applying changes to receiving servers. Most importantly, it leads to various problems and in various cases replication may not work as expected, involving developers in diagnosing operational problems. PostgreSQL probably won’t adopt statement-based replication.

Difficulty upgrading to newer releases

“the basic design of the on-disk representation in 9.2 hasn’t changed significantly since at least the Postgres 8.3 release (now nearly 10 years old).”

This is described as if it were a bad thing, but actually it’s a good thing and is what allows major version upgrades to occur quickly without unloading and reloading data.

“We started out with Postgres 9.1 and successfully completed the upgrade process to move to Postgres 9.2. However, the process took so many hours that we couldn’t afford to do the process again. By the time Postgres 9.3 came out, Uber’s growth increased our dataset substantially, so the upgrade would have been even lengthier.”

The pg_upgrade -k option provides an easy and effective upgrade mechanism. Pg_upgrade does require some downtime, which is why 2ndQuadrant has been actively writing logical replication for some years, focusing on zero-downtime upgrade.

Although the logical replication upgrade is only currently available from 9.4 to 9.5, 9.4 to 9.6 and 9.5 to 9.6, there is more good news coming. 2ndQuadrant is working on highly efficient upgrades from earlier major releases, starting with 9.1 → 9.5/9.6. When PostgreSQL 9.1 is desupported later in 2016 this will allow people using 9.1 to upgrade to the latest versions. This is available as a private service, so if you need zero-downtime upgrade from 9.1 upwards please get in touch.

In 2017, upgrades from 9.2 and 9.3 will also be supported, allowing everybody to upgrade efficiently with zero-downtime prior to the de-supporting of those versions.

Simon Riggs: PostgreSQL 10 Roadmap

2 August, 2016 - 17:50

At the developer meeting we discussed putting up everybody’s roadmap projects in one place

2ndQuadrant PostgreSQL Roadmap Projects

Logical Replication
* Use Case: Upgrade
* Use Case: Partial database replication
* Aiming for a full and complete implementation for 10.0, allowing other use cases to be supported in later releases.

Column Storage & Executor Efficiency
* Indexes
* Columnar Indexes (for Column Store)
* Parallel BRIN scan
* Row insertion using index (BRIN)
* HOT optimizations

Optimizer & Statistics
* Our calculations suggest that more statistics will help query planning

Transactions & Programmability
* Procedures
* Autonomous Transactions

Management features for Replication & Backup
* Bring recovery.conf parameters into postgresql.conf, since it’s v10.0 next
* Allow snapshots on standby servers, allowing parallel backup etc..

Locking & Data Availability
* Further lock reductions for DDL

Distributed Systems
* Node Registry
* Feeding back from Postgres-XL into Core Postgres

Various other projects as well

Chris Travers: CVE-2016-1238 and the hazards of testing frameworks

2 August, 2016 - 09:06
Because I have lost confidence in the approach taken by those in charge of fixing this problem, I have decided to do a full disclosure series on CVE-2016-1238.  As mentioned in a previous post, the proposed fixes for the minor versions don't even remotely fix the problem  and at most can provide a false sense of security.  For this reason it is extremely important that sysadmins understand how these exploits work and how to secure their systems.

A lot of the information here is not specific to this CVE but concerns security regarding running tests generally.  For this reason while this CVE is used as an example, the same basic concepts apply to PostgreSQL db testing, and much more.

As I mentioned in a previous post, prove cannot currently be run safely in any directory that is world writeable, and the current approach of making this a module problem make this far worse, not better.  Moreover this is not something which can be patched in prove without breaking the basic guarantee that it tests the application as it would run on the current system's Perl interpreter and if you break that, all bets are off.

All the exploits I cover in this series are exploitable on fully patched systems.  However they can be prevented by good system administration.  In every case, we will look at how system administration best practices can prevent the problem.

One key problem with the current approach is that it fails to differentiate between unknowing inclusion and user errors brought on simply by not understanding the issues.  The latter has been totally disregarded by the people attempting stop-gap patches.

Test harnesses generally are insecure by design.  The whole point is to execute arbitrary code and see what happens and with this comes a series of inherent risks.  We accept those risks because they are important to the guarantees we usually want to make that our software will perform in the real world as expected.  Moreover even the security risks inherent in test harnesses are good because it is better to find problems in an environment that is somewhat sandboxed than it is in your production environment.

So testing frameworks should be considered to be code injection frameworks and they can be vectors by which code can be injected into a tested application with system security implications.

Understood this way, testing frameworks are not only insecure by design but this is desirable since you don't want a testing framework to hide a problem from you that could bite you in production.

This is a generic problem of course.  A database testing framework would (and should) allow sql injection that could happen in the tested code so that these can be tested against.  So whether you working with Perl, Python, PostgreSQL, or anything else, the testing framework itself has to be properly secured.  And in each platform this means specific things.

In PostgreSQL, this means you need to pay close attention to certain things, such as the fact that the database tests should probably not run as a superuser for example since a superuser can do things like create functions wit system access.

In Perl, one of the most important things to consider is the inclusion of modules from the current working directory and the possibility that someone could do bad things there.

What Prove Guarantees
Prove guarantees that your perl code will run, in its current configuration, in accordance with your test cases.  This necessarily requires arbitrary code execution and arbitrary dependency requirements resolved in the way Perl would resolve them on your system.

Prove guarantees that the guarantees you specify in your test cases are met by your current Perl configuration.  It therefore cannot safely do any extra sandboxing for you.

How Prove Works
The basic architecture of prove is that it wraps a test harness which runs a specified program (via the shebang line) parses its output assuming it to be in the test-anything protocol, and generates a report from the rest.  For example if you create a file test.t:


echo 'ok 1';
echo 'ok 2';
echo '1..2';

and run prove test.t

You will get a report like the following:

$ prove test.t 
test.t .. ok   
All tests successful.

What prove has done is invoke /bin/bash, run the file on it, parse the output, check that 2 tests were run, and that both printed ok (it is a little more complex than this but....), and let you know it worked.

Of course, usually we run perl, not bash.

An Attack Scenario
The most obvious attach scenarios would occur with automated test environment that are poorly secured.  In this case, if prove runs from a directory containing material more than one user might submit, user A may be able to inject code into user B's test runs.

Suppose user A has a hook for customization as follows:

eval { require '' };

Now this is intended to run a file, at most once, when the code is run, and it checks the current @INC paths.  If this doesn't exist it falls back to the current working directory, i.e. the directory the shell was working in when prove was run.  If this directory shares information between users, user b can write a file into that directory which will run when user A's scheduled tests are run.

The code would then run with the permissions of the test run and any misbehavior would tie back to user A's test run (it is harder to tie the behavior to user B).  In the event where user A's test run operates with more system permissions than user B's, the situation is quite a bit worse.  Or maybe user B doesn't even have tests run anymore for some reason.

Now, it has been proposed that prove prune its own @INC but that doesn't address this attack because prove has to run perl separately.  Additionally separation of concerns dictates that this is not prove's problem.

Behavior Considered Reasonably Safe
As we have seen, there are inherent risks to test frameworks and these have to be properly secured against the very real fact that one is often running untrusted code on them.  However, there are a few things that really should be considered safe.  These include:

  • Running tests during installation of trusted software as root.  If you don't trust the software, you should not be installing it.
  • Running tests from directories which store only information from a single user (subdirectories of a user's home directory for example).

Recommendations for Test Systems
Several basic rules for test systems apply:

  1. Understand that test systems run arbitrary code and avoid running test cases for automated build and test systems as privileged users.
  2. Properly secure all working directories to prevent users from unknowingly sharing data or test logic
  3. Running as root is only permitted as part of a process installing trusted software.

Shaun M. Thomas: PG Phriday: The Audacity of NoSQL

22 July, 2016 - 20:55

The pure, unadulterated, presumptuous impudence of NoSQL. Engines like MongoDB recklessly discard concepts like ACID in some futile quest to achieve “web scale”, and end up accomplishing neither. RDBMS systems have literally decades of history to draw upon, and have long since conquered the pitfalls NoSQL platforms are just now encountering. There may be something to a couple of them, but by and large, they’re nothing we really need.

At least, that’s something I might have said a couple of weeks ago.

Now, I’ve never really ascribed to the supposition that NoSQL is the strict antithesis of traditional SQL-driven systems. The intro paragraph is something of a hyperbolic exaggeration of the general disdain that seems to permeate the NoSQL vs SQL debate. Most DBAs I’ve met basically say “meh” and move on with their day, optimizing queries, tweaking storage metrics, and what not. Usually NoSQL is on our RADAR, but we have other stuff to worry about.

Last week, I was forced to reconsider due to an existing internal project that needed my input. That analysis really showed me why devs and data guys from all walks of life are seduced to the Dark Side of data. After another week slogging through documentation, experimenting with self-healing replication sets, self-balancing sharding, and taking copious notes on everything, something finally clicked. The Dark Side, it seems, has some merit beyond simplifying data structure.

So what can Postgres potentially learn from its eternal adversaries? Beyond stealing their encoding mechanisms via JSON and JSONB datatypes, that is. Though my analysis thus far has focused mostly on MongoDB, it alone incorporates several fundamental concepts that I’ve always felt Postgres lacked. For now, I’ll focus on the two that have—in my opinion, anyway—really prevented Postgres from reaching its full potential.

Break Me

Postgres has replication, but no meaningful application beyond having a hot standby or a few read slaves. Why no master election consensus system like RAFT? Last year, I came across an excellent article that discussed implementing it with HAProxy, Governor, and etcd instead of the more historic (and infinitely more complicated) Pacemaker stack. Someone even forked Governer into Patroni to incorporate alternative consensus approaches. Still, that’s a lot of external dependencies.

Postgres has all of the parts to handle this itself, it just doesn’t. It has the replication stream. It knows among all replica nodes which has the highest transaction id. It has a process for cloning an existing instance in pg_basebackup. If an old primary node needs to become a replica, it can either do it directly, or use pg_rewind if necessary. Yet these fragments must be combined manually. Some use scripts, others use Pacemaker or some variant of the Governer approach above, but it’s much easier for a new project to simply chose another platform.

The specific issue at hand, is that Postgres elects to consider itself an autonomous process. In this view, a database is a self-contained data management structure, where everything goes into a single extremely high integrity bucket. As a consequence, Postgres has no concept of a cluster at all, except perhaps a slight awareness of currently streaming replicas.

MongoDB takes itself much less seriously. In the MongoDB world, a mongod instance is a bucket that will accept any and all content, and do a passable job of protecting it. As such, it’s much less concerned with spillage, giving it greater leverage for passing that bucket around. The authors of MongoDB clearly wanted to focus on data availability rather than integrity. To get that, they settled on a consensus system that some might argue is inadequate when combined with their data writing strategies.

Despite that, it works. MongoDB is a concept beyond mere data storage. Where Postgres is becoming more of a storage middleware through additions like extensions and foreign data wrappers, MongoDB is “where your data goes.” That’s a very alluring attitude, and a subtle shift in focus few (if any) mainstream RDBMS have adopted. First and foremost, engines like MongoDB are a cluster, incorporating multiple nodes that interchangeably share roles but present a united face to the world. Postgres has no equivalent.

But it could! Imagine a universe where we have the integrity guarantee of an ACID database, with a Mongo-like front-end that manages the nodes such that our bulletproof data is always available. Speaking of front ends…

Into Pieces

Shards. I have 50TB of data, and I have no Earthly idea what to do with it. I’d love to blow it up into a trillion little bits, but Postgres gives me no easy way to do that. Here’s the second place MongoDB trumps Postgres in its current incarnation. It’s not exactly a cake-walk, considering I drew up this chart while compiling my notes:

But all the tools are there. Anyone with a dozen servers or VMs can build a scalable data store without writing their own distribution implementation. That’s hard to ignore.

For Postgres, it really boils down to the citus extension and Postgres-XL. Unfortunately citus has severe limitations with joins including non-distributed tables. Likewise, Postgres-XL has been trying to merge the scaling code into the 9.5 branch for a year now. Despite the increased visibility and presumed quality of 2ndQuadrant’s work, there are a couple of orders of magnitude fewer eyes on that code. Even assuming they manage to merge everything before 9.6 drops, will they be able to keep up with core afterwards? Citus has its limitations, but in the end, it is an extension anyone can install. Anything that isn’t an extension risks falling woefully behind or being abandoned.

Is there another way? If we treated Postgres backends like dumb containers the way MongoDB does, things suddenly change quite a bit. MongoDB has a process called mongos which is basically just a glorified proxy that manages shard metadata and forks out a balancing job that ushers data around to ensure shard content isn’t lopsided. That’s the “Manager / Balancer” in my chart. What if it also included a transaction manager?

Postgres has supported background workers since 9.3. In theory, it could adopt management of shard metadata, chunk migration, or global transaction roles where appropriate. This is essentially what Postgres-XL (and similar) is doing. Where MongoDB wrote a glue process to bring everything together, Postgres-XL (and its ilk) opted to directly patch the Postgres source, and all the pain that entails.

I’m in danger of being too dismissive of implementation details, of course. Yet there are massive potential benefits to taking a cluster approach by default. Such a distribution proxy could be database agnostic, compatible with any engine that speaks SQL. Consider too that Postgres foreign data wrappers imply it could be the communication layer.

Last Thoughts

Postgres is tantalizingly close to being a cluster system simply through the communication protocols it provides. It just needs that final piece. I’m fully convinced that someone with the right skillset (definitely not me) could whip up a proof of concept for this in a few days. This is, after all, how most application-level sharding approaches work anyway. In the end, that’s all MongoDB, and several other NoSQL solutions ultimately provide. Take one part integrated RAFT process to always present a writable primary, and mix in a data distribution and balancing proxy. Done.

If they so desired, the MongoDB developers could theoretically even use Postgres as a back-end storage mechanism. In fact, I highly encourage them to do so! Given a natural shard split and extensions like cstore_fdw, suddenly NoSQL becomes a scalable column-store.

Barring that, corporate use of Postgres will be necessarily limited. There’s only so much you can do in a single instance, and without integral scaling features, it becomes migration time. I don’t want to tell our devs to convert all of their ORMs and queries to JSON, give up easy joins, or forgo NULL handling, but I will. I have to. This 50TB is only growing every day, and without a straight-forward and reliable migration path, we need to retool.

I disagree with too many of the data compromises MongoDB makes to use it for important data, but the availability approach is fundamentally sound. Mongo’s implementation is refreshing and innovative, and one which many NoSQL engines appear to share. Is it any wonder there has been a silent transition to these systems, when everything has several billion rows, and applications fully expect to reach well beyond Terabytes of storage into Petabyte territory?

No single instance can absorb that kind of volume, and not everyone has time or inclination to work around that limitation. We need a common middle-ground with the NoSQL world. We have the better storage engine, and they have the better distribution and availability mechanisms. Can’t there be a place where both of those things work in tandem?

I, for one, welcome our cluster overlords.

Oskari Saarenmaa: Backing up tablespaces and streaming WAL with PGHoard

22 July, 2016 - 14:32
We've just released a new version of PGHoard, the PostgreSQL cloud backup tool we initially developed for Aiven and later open sourced.

Version 1.4.0 comes with the following new features:
  • Support for PostgreSQL 9.6 beta3
  • Support for backing up multiple tablespaces
  • Support for StatsD and DataDog metrics collection
  • Basebackup restoration now shows download progress
  • Experimental new WAL streaming mode walreceiver, which reads the write-ahead log data directly from the PostgreSQL server using the streaming replication protocol
  • New status API in the internal REST HTTP server
Please see our previous blog post about PGHoard for more information about the tool and a guide for deploying it.

Backing up multiple tablespaces This is the first version of PGHoard capable of backing up multiple tablespaces. Multiple tablespaces require using the new local-tar backup option for reading files directly from the disk instead of streaming them using pg_basebackup as pg_basebackup doesn't currently allow streaming multiple tablespaces without writing them to the local filesystem.

The current version of PGHoard can utilize the local-tar backup mode only on a PG master server, PostgreSQL versions prior to 9.6 don't allow users to run the necessary control commands on a standby server without using the pgespresso extension. pgespresso also required fixes which we contributed to support multiple tablespaces - once a fixed version has been released we'll add support for it to PGHoard.

The next version of PGHoard, due out by the time of PostgreSQL 9.6 final release, will support local-tar backups from standby servers, natively when running 9.6 and using the pgespresso extension when running older versions with the latest version of the extension.

A future version of PGHoard will support backing up and restoring PostgreSQL basebackups in parallel mode when using the local-tar mode.  This will greatly reduce the time required for setting up a new standby server or restoring a system from backups.

Streaming replication supportThis version adds experimental support for reading PostgreSQL's write-ahead log directly from the server using the streaming replication protocol which is also used by PostgreSQL's native replication and related tools such as pg_basebackup and pg_receivexlog. The functionality currently depends on an unmerged psycopg2 pull request which we hope to see land in a psycopg2 release soon.

While the walreceiver mode is still experimental it has a number of benefits over other methods of backing up the WAL and allows implementing new features in the future: temporary, uncompressed, files as written by pg_receivexlog are no longer needed saving disk space and I/O and incomplete WAL segments can be archived at specified intervals or, for example, whenever a new COMMIT appears in the WAL stream.

New contributorsThe following people contributed their first patches to PGHoard in this release:
  • Brad Durrow
  • Tarvi Pillessaar

PGHoard in We're happy to talk more about PGHoard and help you set up your backups with it.  You can also sign up for a free trial of our PostgreSQL service where PGHoard will take care of your backups.

Team Aiven

Joshua Drake: The fall of Open Source

21 July, 2016 - 22:15

Once upon a time FOSS was about Freedom. It was about exposing equality within source code. It allowed everyone equal rights and equal access to the technology they were using. An idea that if you were capable, you could fix code or pay someone to fix code. An ideology that there was something greater than yourself and that there was an inherent right built into what it is to be human with software.

Leaders to lemmings

I sat in a bar slowly nursing beers with other community members over a period of hours. We spoke of many things. We spoke of the never-done new PostgreSQL website. We spoke of my distaste for Amazon Web Services since reformed, with the exception of S3. We spoke of life. We argued, we had honest discourse and that is excellent. There was nobody complaining of political correctness. There was nobody claiming to be “offended”. There was nobody leaving because their feelings were hurt. There was a community member who passed out in his chair and dropped his phone. We walked him to his room to make sure he was safe. All was good.

This retrospective has been digging around in my grey matter since that night six months ago. Originally this was going to just be the stuff of legendary and exaggerated stories among community members that are only getting older and a few who are young but will get there someday. That is, until it began to itch, and as with any good community member, I am scratching that itch.

“My time is precious to me”

It seems like a harmless thing to say. Of course your time is precious to you. I would say that is probably true of most people. I know that my time is precious to me. I make it a point of working part time from May - September so I can take time for my family. (Don’t worry, I more than make up for it the rest of the year).

The problem with the statement is the context. The statement came from a well known contributor and a very smart guy. The reference was in relation to why someone would use software as a service and the general idea was: Software as a Service is awesome because it allows me to have more time for me.

The great compromise

A lot of companies have come up through the ranks to become dominant players in the Open Source industry: for user groups, Github for development, Heroku for software a service and Slack for communications. When considered independently there is nothing wrong with these services. They offer a great value, they increase productivity, more code gets developed, more software gets released and communities grow.

The problem is that not a single one of these services are open source. The use of these services creates an intrinsic advocate position for closed source software. In turn you will see the use of these services increase whilst the use of open source alternatives decrease.

Consider Slack, which is widely considered the hot new collaboration tool. Yet, it does not adhere to open standards, its network is closed as is its software. Yes, you can interoperate with it using open source tools as well as standard protocols but in no way is it actually open in the sense of our community or our licensing. The argument is, “I use slack because there are no other tools like it”.

XMPP (Jabber) which is Open Source and a standard IETF protocol (RFC 3920) can provide a similar environment as Slack. It supports Video, Voice, Plugins, External protocols and bridges, Image embedding, Video sharing, File sharing and yes, Chat. It also supports federation which allows any community to communicate with any other community using XMPP.

I appreciate the PostgreSQL community. The PostgreSQL community hosts its own code repositories, website, and mailing lists. We collaborate in the true vision of Open Source and actively reject moving our project to externally hosted facilities controlled by services which are not Open Source. The community does it even though it may be quicker or more convenient to use a service. The community puts forth the effort for the community. There is an ideology that is about fairness, freedom, equality, rights, and the greater good.  

And that, was the fall of Open Source

The moment that Open Source becomes primarily about “my time” is the moment that Open Source is no longer a movement. It is no longer an ideology. It is no longer about fairness, freedom, equality, rights, or the greater good.  

gabrielle roth: PDXPUG: 10th anniversary meeting tonight!

21 July, 2016 - 17:35

We’re having our 10th Anniversary meeting tonight!

As in previous years, I generated speaker & topic tag clouds:

When: 6-8:30pm Thursday July 21, 2016 Where: iovation What: 10th Anniversary Social This meeting is 100% social and will take the place of our regular July meeting.  iovation will provide a light dinner and beverages.  We will provide the conversation and reminiscing.  There will not be any long speeches, but there will be cupcakes. iovation is on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak).  It’s right on the Green & Yellow Max lines.  Underground bike parking is available in the parking garage;  outdoors all around the block in the usual spots.  No bikes in the office, sorry!  Elevators open at 5:45 and building security closes access to the floor at 6:30. See you there!

Gulcin Yildirim: Evolution of Fault Tolerance in PostgreSQL: Time Travel

21 July, 2016 - 11:00

PostgreSQL is an awesome project and it evolves at an amazing rate. We’ll focus on evolution of fault tolerance capabilities in PostgreSQL throughout its versions with a series of blog posts. This is the third post of the series and we’ll talk about timeline issues and their effects on fault tolerance and dependability of PostgreSQL.

If you would like to witness the evolution progress from the beginning, please check the first two blog posts of the series:

  1. Evolution of Fault Tolerance in PostgreSQL 
  2. Evolution of Fault Tolerance in PostgreSQL: Replication Phase 


The ability to restore the database to a previous point in time creates some complexities which we’ll cover some of the cases by explaining failover (Fig. 1), switchover (Fig. 2) and pg_rewind (Fig. 3) cases later in this topic.

For example, in the original history of the database, suppose you dropped a critical table at 5:15PM on Tuesday evening, but didn’t realise your mistake until Wednesday noon. Unfazed, you get out your backup, restore to the point-in-time 5:14PM Tuesday evening, and are up and running. In this history of the database universe, you never dropped the table. But suppose you later realize this wasn’t such a great idea, and would like to return to sometime Wednesday morning in the original history. You won’t be able to if, while your database was up-and-running, it overwrote some of the WAL segment files that led up to the time you now wish you could get back to.

Thus, to avoid this, you need to distinguish the series of WAL records generated after you’ve done a point-in-time recovery from those that were generated in the original database history.

To deal with this problem, PostgreSQL has a notion of timelines. Whenever an archive recovery completes, a new timeline is created to identify the series of WAL records generated after that recovery. The timeline ID number is part of WAL segment file names so a new timeline does not overwrite the WAL data generated by previous timelines. It is in fact possible to archive many different timelines.

Consider the situation where you aren’t quite sure what point-in-time to recover to, and so have to do several pointin-time recoveries by trial and error until you find the best place to branch off from the old history. Without timelines this process would soon generate an unmanageable mess. With timelines, you can recover to any prior state, including states in timeline branches that you abandoned earlier.

Every time a new timeline is created, PostgreSQL creates a ”timeline history” file that shows which timeline it branched off from and when. These history files are necessary to allow the system to pick the right WAL segment files when recovering from an archive that contains multiple timelines. Therefore, they are archived into the WAL archive area just like WAL segment files. The history files are just small text files, so it’s cheap and appropriate to keep them around indefinitely (unlike the segment files which are large). You can, if you like, add comments to a history file to record your own notes about how and why this particular timeline was created. Such comments will be especially valuable when you have a thicket of different timelines as a result of experimentation.

The default behaviour of recovery is to recover along the same timeline that was current when the base backup was taken. If you wish to recover into some child timeline(that is, you want to return to some state that was itself generated after a recovery attempt), you need to specify the target timeline ID in recovery.conf. You cannot recover into timelines that branched off earlier than the base backup.

For simplifying timelines concept in PostgreSQL, timeline related issues in case of failover, switchover and pg_rewind are summarised and explained with Fig.1, Fig.2 and Fig.3.

Failover scenario:

 Fig.1 Failover 

  • There are outstanding changes in the old master (TL1)
  • Timeline increase represents new history of changes (TL2)
  • Changes from the old timeline can’t be replayed on the servers that switched to new timeline
  • The old master can’t follow the new master
Switchover scenario:

 Fig.2 Switchover

  • There are no outstanding changes in the old master (TL1)
  • Timeline increase represents new history of changes (TL2)
  • The old master can become standby for the new master
pg_rewind scenario:

Fig.3 pg_rewind

  • Outstanding changes are removed using data from the new master (TL1)
  • The old master can follow the new master (TL2)

pg_rewind is a tool for synchronising a PostgreSQL cluster with another copy of the same cluster, after the clusters’ timelines have diverged. A typical scenario is to bring an old master server back online after failover, as a standby that follows the new master.

The result is equivalent to replacing the target data directory with the source one. All files are copied, including configuration files. The advantage of pg_rewind over taking a new base backup, or tools like rsync, is that pg_rewind does not require reading through all unchanged files in the cluster. That makes it a lot faster when the database is large and only a small portion of it differs between the clusters.

How it works?

The basic idea is to copy everything from the new cluster to the old cluster, except for the blocks that we know to be the same.

  1. Scan the WAL log of the old cluster, starting from the last checkpoint before the point where the new cluster’s timeline history forked off from the old cluster. For each WAL record, make a note of the data blocks that were touched. This yields a list of all the data blocks that were changed in the old cluster, after the new cluster forked off.
  2. Copy all those changed blocks from the new cluster to the old cluster.
  3. Copy all other files such as clog and configuration files from the new cluster to the old cluster, everything except the relation files.
  4. Apply the WAL from the new cluster, starting from the checkpoint created at failover. (Strictly speaking, pg_rewind doesn’t apply the WAL, it just creates a backup label file indicating that when PostgreSQL is started, it will start replay from that checkpoint and apply all the required WAL.)

Note: wal_log_hints must be set in postgresql.conf for pg_rewind to be able to work.This parameter can only be set at server start. The default value is off.


In this blog post, we discussed timelines in Postgres and how we handle failover and switchover cases. We also talked about how pg_rewind works and its benefits to Postgres fault tolerance and dependability. We’ll continue with synchronous commit in the next blog post.


PostgreSQL Documentation
PostgreSQL 9 Administration Cookbook – Second Edition
pg_rewind Nordic PGDay presentation by Heikki Linnakangas

Peter Eisentraut: Using GNU GLOBAL with PostgreSQL

20 July, 2016 - 15:00

When you are coding in a source tree as big as PostgreSQL’s, you will at some point want to look into some kind of source code indexing. It’s often convenient not to bother, since git grep is actually superfast. But when you want to find where a function is defined among all the call sites, some more intelligence is useful.

The traditional tools for this are ctags and etags, which create index files intended for use by vi and Emacs, respectively. The PostgreSQL source tree has some customized support for these in the tools src/tools/make_ctags and src/tools/make_etags. Because these tools operate on a directory level, those wrapper scripts create a single tag file (named tags or TAGS respectively) in the top-level directory and symlink it to all the other directories. This allows you to easily look for entries across the entire source tree. But it’s clearly a hack, and at least Emacs is often somewhat confused by this setup.

But there is something much better that works very similarly: GNU GLOBAL. A main difference is that GNU GLOBAL works on a project basis not on a directory basis, so you don’t need to do contortions to create and manage tags files all over your source tree. Also, GLOBAL can be used from the command line, so you don’t need to be an editor wizard to get started with it. Plus, it appears to be much faster.

The whole thing is very simple. Install the package, which is usually called global and available in most operating system distributions. To start, run

$ gtags

in the top-level directory. This creates the files GPATH, GRTAGS, and GTAGS.

Then you can use global to search for stuff, like

$ global elog src/include/utils/elog.h

Or you can look for places a function is called:

$ global -r write_stderr

You can run global in any directory.

Or how about you want to look at the code where something is defined:

$ less -t elog

Note no file name is required. (See the manual for the required setup to make this work with less.)

Or of course use editor integration. For Emacs, there is ggtags-mode.

Here is some fine-tuning for use with the PostgreSQL source tree. Generally, I don’t want to index generated files. For example, I don’t want to see hits in gram.c, only in gram.y. Plus, you don’t want to index header files under tmp_install. (Super annoying when you use this to jump to a file to edit and later find that your edits have been blown away by make check.) But when you run gtags in a partially built tree, it will index everything it finds. To fix that, I have restricted gtags to only index files that are registered in Git, by first running

git ls-files >gtags.files

in the top-level directory. Then gtags will only consider the listed files.

This will also improve the workings of the Emacs mode, which will at random times call global -u to update the tags. If it finds a gtags.files file, it will observe that and not index random files lying around.

I have a shell alias pgconfigure which calls configure with a bunch of options for easier typing. It’s basically something like

pgconfigure() { ./configure --prefix=$(cd .. && pwd)/pg-install --enable-debug --enable-cassert ... }

At the end I call

git ls-files >gtags.files gtags -i &

to initialize the source tree for GNU GLOBAL, so it’s always there.

Gabriele Bartolini: Speed up getting WAL files from Barman

19 July, 2016 - 17:00

Starting from Barman 1.6.1, PostgreSQL standby servers can rely on an “infinite” basin of WAL files and finally pre-fetch batches of WAL files in parallel from Barman, speeding up the restoration process as well as making the disaster recovery solution more resilient as a whole.

The master, the backup and the standby

Before we start, let’s define our playground. We have our PostgreSQL primary server, called angus. A server with Barman, called barman and a third server with a reliable PostgreSQL standby, called chris – for different reasons, I had to rule out the following names bon, brian, malcolm, phil, cliff and obviously axl. ;)

angus is a high workload server and is continuously backed up on barman, while chris is a hot standby server with streaming replication from angus enabled. This is a very simple, robust and cheap business continuity cluster that you can easily create with pure open source PostgreSQL, yet capable of reaching over 99.99% uptime in a year (according to our experience with several customers at 2ndQuadrant).

What we are going to do is to instruct chris (the standby) to fetch WAL files from barman whenever streaming replication with angus is not working, as a fallback method, making the entire system more resilient and robust. Most typical examples of these problems are:

  1. temporary network failure between chris and angus;
  2. prolonged downtime for chris which causes the standby to go out of sync with angus.

For further information, please refer to the Getting WAL files from Barman with ‘get-wal’ blog article that I wrote some time ago.

Technically, we will be configuring the standby server chris to remotely fetch WAL files from barman as part of the restore_command option in the recovery.conf file. Since the release of Barman 1.6.1 we can take advantage of parallel pre-fetching of WAL files, which exploits network bandwidth and reduces recovery time of the standby.


This scenario has been tested on Linux systems only, and requires:

  • Barman >= 1.6.1 on the barman server
  • Python with argparse module installed (available as a package for most Linux distributions) on chris
  • Public Ssh key of the postgres@chris user in the ~/.ssh/authorized_keys file of the barman@barman user (procedure known as exchange of Ssh public key)

As postgres user on chris download the script from our Github repository in your favourite directory (e.g. ~postgres/bin, or /var/lib/pgsql/bin directly) with:

cd ~postgres/bin wget chmod +700

Then verify it is working:

./ -h

You will get this output message:

usage: [-h] [-V] [-U USER] [-s SECONDS] [-p JOBS] [-z] [-j] BARMAN_HOST SERVER_NAME WAL_NAME WAL_DEST This script will be used as a 'restore_command' based on the get-wal feature of Barman. A ssh connection will be opened to the Barman host. positional arguments: BARMAN_HOST The host of the Barman server. SERVER_NAME The server name configured in Barman from which WALs are taken. WAL_NAME this parameter has to be the value of the '%f' keyword (according to 'restore_command'). WAL_DEST this parameter has to be the value of the '%p' keyword (according to 'restore_command'). optional arguments: -h, --help show this help message and exit -V, --version show program's version number and exit -U USER, --user USER The user used for the ssh connection to the Barman server. Defaults to 'barman'. -s SECONDS, --sleep SECONDS sleep for SECONDS after a failure of get-wal request. Defaults to 0 (nowait). -p JOBS, --parallel JOBS Specifies the number of files to peek and transfer in parallel. Defaults to 0 (disabled). -z, --gzip Transfer the WAL files compressed with gzip -j, --bzip2 Transfer the WAL files compressed with bzip2

If you get this output, the script has been installed correctly. Otherwise, you are most likely missing the argparse module in your system.

Configuration and setup

Locate the recovery.conf in chris and properly set the restore_command option:

restore_command = "/var/lib/pgsql/bin/ -p 8 -s 10 barman angus %f %p"

The above example will connect to barman as barman user via Ssh and execute the get-wal command on the angus PostgreSQL server backed up in Barman. The script will pre-fetch up to 8 WAL files at a time and, by default, store them in a temporary folder (currently fixed: /var/tmp/barman-wal-restore).

In case of error, it will sleep for 10 seconds. Using the help page you can learn more about the available options and tune them in order to best fit in your environment.


All you have to do now is restart the standby server on chris and check from the PostgreSQL log that WALs are being fetched from Barman and restored:

Jul 15 15:57:21 chris postgres[30058]: [23-1] LOG: restored log file "00000001000019EA0000008A" from archive

You can also peek in the /var/tmp/barman-wal-restore directory and verify that the script has been executed.

Even Barman logs contain traces of this activity.


This very simple Python script that we have written and is available under GNU GPL 3 makes the PostgreSQL cluster more resilient, thanks to the tight cooperation with Barman.

It not only provides a stable fallback method for WAL fetching, but it also protects PostgreSQL standby servers from the infamous 255 error returned by Ssh in the case of network problems – which is different than SIGTERM and therefore is treated as an exception by PostgreSQL, causing the recovery process to abort (see the “Archive Recovery Settings” section in the PostgreSQL documentation).

Stay tuned with us and with Barman’s development as we continue to improve disaster recovery solutions for PostgreSQL. We would like to thank our friends at, Navionics and Jobrapido for helping us with the development of this important feature, as well as many others 2ndQuadrant customers who we cannot mention due to non disclosure agreements but still continue to support our work.

Side note: hopefully I won’t have to change the way I name servers due to AC/DC continuously changing their formation. ;)

Simon Riggs: Report from DatabaseCamp, NYC

18 July, 2016 - 19:45

Just got back from extended trip to Database.Camp – Sunday, July 10, 2016 – at the UN in NYC – Totally Free!

Wide range of presentations from CTOs/CEOs from across the database software world, with about 200 attendees, slightly more people earlier in the day.

Very happy to report we had 3 presentations covering different aspects of PostgreSQL
* EnterpriseDB – Ed Boyajian – Building Postgres Ecosystems
* 2ndQuadrant – Simon Riggs – PostgreSQL: The Next 10 Years
* CartoDB – Javier de la Torre – Celebrating PostgreSQL and demonstrating execellent data visualizations

I don’t think any attendee would have missed the fact that 3 companies spoke about Postgres, whereas all other technologies were discussed only once. I noted that when asked “Do you use database X?”, more attendees hands went up for Postgres than any other. Also very happy to report that our talks interlocked nicely with one another, showing attendees that we can work together as a community.

Best talk of the day was Minerva Tantoco, CTO for the City of New York. Engaging, entertaining and greatly encouraging to see technology being adopted for the public good. Free Wifi in JFK would be nice… but looks like its coming across the whole of NYC, which will really change what is possible.

Well worth attending if they hold it again next year. Congrats to Eric for arranging, and making it for a good cause: Techie Youth.

damien clochard: Restarting PostgreSQL Magazine in a new format

18 July, 2016 - 18:17

TL;DR : With the kind support of PostgreSQL Europe, I am launching a new edition of the PostgreSQL Magazine in an innovative paper format. The primary goal is to publish it and send copies to PostgreSQL Conference Europe in Tallinn in November. If we’re sucessful, additional copies will be sent to other events.

I’m here to talk to you about the new direction for PG Magazine. So far, we’ve produced 2 issues of the magazine and we’ve received an entousiastic feedback from long-time community members as well as complete newcomers. I still believe that paper is a great medium to promote PostgreSQL and there’s a need for a regular printed media in the community.

However we’ve been struggling to find a stable economic model to ensure this media remains a community-driven effort. So far the balance we’ve found is that the content is produced, edited and controlled by community members, while the graphic editing, printing and shipping is done by professionnals and paid by non-profit associations (PG Europe, SPI, and others) or commercial sponsors (EDB for instance)

This model works with the current format of the magazine (36 pages) but it’s very slow. Producing a new issue requires a lot of time to gather the articles, edit the content and proof-read everything. It’s also very hard to translate. So far only the Chinese community has been able to fully translate the magazine.

Based on this considerations, here’s an attempt to make the whole process more dynamic while switching to a more appealing format.

Over the last month, I’ve worked on what I called a “folded magazine”. The basic idea is that the magazine is presented as an A5 newspaper. Then you unfold it and you get an A4 newspaper. Then you unfold it and you get an A3 newspaper. Then you unfold it and you get an A2 poster that you can take back at the office…

I used this concept for the 2016 PG Day France booklet. Here’s a video :

This format has drawbacks

a- The graphical editing needs to be done by professionnals. A PostgreSQL community member with basic graphic skill can no longer participate to this effort. Previously some Chinese guys where able to produce a translation of the magazine, this format will be harder for them

b- The printing needs to be done by professionals. You can no longer print it at home unless you’re the happy owner of an A2 printer.

c- The PDF version will be weird

But also many advantages :

1- It requires less content and we’ll be able to produce the magazine on a regular basis.

2- It’s easier to give away : giving away a 34 pages magazines to random stranger at a booth was sometime a waste of paper

3- The A2 poster is a great way to provide more visibility for PostgreSQL inside the PostgreSQL our users workplace.

4- It’s easier to translate

If you want to join the effort here’s an optimistic roadmap :

  • Step 1 : Gathering content (deadline 31/07/2016)
  • Step 2 : Editing Content (deadline 15/08/2016)
  • Step 3 : Reviewing (deadline 31/08/2016
  • Step 4 : Layout (deadline 24/09/2016)
  • Step 5 : Last Check (01/10/2016)
  • Step 6 : Printing (15/10/2016)

The 3th step (reviewing) and 5th step (last check) are the ones that require more manpower and time. So prepare yourself for some proof-reading in august and september :)

Jan Wieck: My PL/pgSQL code is stuck, but where?

16 July, 2016 - 18:16
In the short time that I am enhancing the PL profiler ( I have been asked multiple times if it can be abused as a debugging aid. Not directly. The conversation goes something like this:

Q: When my PL code is stuck somewhere, can I turn on profiling and see where?
A: No.
Q: Will this feature be added?
A: No.

Of course would that be a useful feature. I don't argue that. And it seems to be that this is precisely how Oracle users find out where their PL/SQL code gets stuck or moves like molasses. However, the reason why I am against adding this is because a profiler, or even parts of it, should not be enabled 24x7 on a production database server for the purpose of eventually using it to extract some debugging information some day. There are other ways to get that information and I will show you one of them.

If a program is stuck somewhere, one uses a debugger to get a backtrace. This works with C programs, like the database server, provided that symbol information is available. In that backtrace (and some variables) we also find the PL/pgSQL backtrace. Having symbol information available is also useful in case of a program crash, to find out why it crashed by loading the core dump into the debugger. 

Every line of PL code, that is executed, goes through the PL/pgSQL executor function exec_stmt(). At that place we find the current PL code's line in a variable. Likewise every function execution goes through either plpgsql_exec_function() or plpgsql_exec_trigger(). In those stack frames we find the OID of the function as well as its signature (name and call argument types).

Doing this eventually several times for a deeply nested PL/pgSQL program is tedious and no fun. So here is a little script called plbacktrace ( that does all of that. It is invoked with the PID of the PostgreSQL backend and will output information like

[postgres@db1 tmp]$ ./ 13239fn_oid=105672 lineno=5 func="life_get_cell_pl(life_board,integer,integer)"fn_oid=105673 lineno=12 func="life_num_neighbors_pl(life_board,integer,integer)"fn_oid=105675 lineno=11 func="life_tick_pl(life_board)"fn_oid=105674 lineno=22 func="life_pl(integer,integer,boolean)" I will explain another time why I have a database server playing Conway's Game of Life written in PL/pgSQL. 

Regards, Jan