Planet PostgreSQL

Syndicate content
Planet PostgreSQL
Updated: 1 year 21 hours ago

Stefan Petrea: Geolocation using multiple services

6 July, 2016 - 07:00
Table of Contents Intro

In a previous post I wrote about PostGIS and ways of querying geographical data.

This post will focus on building a system that queries free geolocation services 1 and aggregates their results.


In summary, we'll make requests to different web services (or APIs), then we're going to reverse geocoding the results and then aggregate the results.

Comparison between geonames and openstreetmap

To relate to the previous post, here are some differences between geonames and openstreetmap:

criterion OSM geonames size 49.1 GB compressed 309 MB compressed has administrative area data yes yes has lat/long city data yes yes has region/area polygonal areas yes no has neighbourhood/district data yes no has intracity-level metadata yes no has terrain metadata yes no

They are meant for different purposes. Geonames is meant for city/administrative area/country data. Openstreetmap has much more detailed data than geonames.

Asynchronous requests to geolocation services

We're using the gevent library to make asynchronous requests to the geolocation services.

import gevent import gevent.greenlet from gevent import monkey; gevent.monkey.patch_all() geoip_service_urls=[ ['geoplugin' , '{ip}' ], ['ip-api' , '{ip}' ], ['nekudo' , '{ip}' ], ['geoiplookup' , '{ip}' ], ] # fetch url in asynchronous mode (makes use of gevent) def fetch_url_async(url, tag, timeout=2.0): data = None try: opener = urllib2.build_opener(urllib2.HTTPSHandler()) opener.addheaders = [('User-agent', 'Mozilla/')] urllib2.install_opener(opener) data = urllib2.urlopen(url,timeout=timeout).read() except Exception, e: pass return [tag, data] # expects req_data to be in this format: [ ['tag', url], ['tag', url], .. ] def fetch_multiple_urls_async(req_data): # start the threads (greenlets) threads_ = [] for u in req_data: (tag, url) = u new_thread = gevent.spawn(fetch_url_async, url, tag) threads_.append(new_thread) # wait for threads to finish gevent.joinall(threads_) # retrieve threads return values results = [] for t in threads_: results.append(t.get(block=True, timeout=5.0)) return results def geolocate_ip(ip): urls = [] for grp in geoip_service_urls: tag, url = grp urls.append([tag, url.format(ip=ip)]) results = fetch_multiple_urls_async(urls) processed_answers = process_service_answers(results) aggregate = aggregate_answers(processed_answers) return aggregate def process_service_answers(location_data): # 1) extract lat/long data from responses # 2) reverse geocoding using geonames # 3) aggregate location data # (for example, one way of doing this would # be to choose the location that most services # agree on) pass City name ambiguity Cities with the same name within the same country

There are many cities with the same name within a country, in different states/administrative regions. There's also cities with the same name in different countries.

For example, according to Geonames, there are 24 cities named Clinton in the US (in 23 different states, with two cities named Clinton in the same state of Michigan).

WITH duplicate_data AS ( SELECT city_name, array_agg(ROW(country_code, region_code)) AS dupes FROM city_region_data WHERE country_code = 'US' GROUP BY city_name, country_code ORDER BY COUNT(ROW(country_code, region_code)) DESC ) SELECT city_name, ARRAY_LENGTH(dupes, 1) AS duplicity, ( CASE WHEN ARRAY_LENGTH(dupes,1) > 9 THEN CONCAT(SUBSTRING(ARRAY_TO_STRING(dupes,','), 1, 50), '...') ELSE ARRAY_TO_STRING(dupes,',') END ) AS sample FROM duplicate_data LIMIT 5; city_name duplicity sample Clinton 24 (US,NY),(US,AR),(US,NC),(US,MA),(US,MD),(US,OH),(U… Franklin 19 (US,ME),(US,MA),(US,NC),(US,TX),(US,NC),(US,LA),(U… Springfield 19 (US,MN),(US,KY),(US,SD),(US,MI),(US,VA),(US,IL),(U… Madison 18 (US,CT),(US,MN),(US,NJ),(US,ME),(US,SD),(US,FL),(U… Greenville 18 (US,NC),(US,SC),(US,MS),(US,KY),(US,RI),(US,ME),(U… Cities with the same name in the same country and region

Worldwide, even in the same region of a country, there can be multiple cities with the exact same name.

Take for example Georgetown, in Indiana. Geonames says there are 3 towns with that name in Indiana. Wikipedia says there are even more:

WITH duplicate_data AS ( SELECT city_name, array_agg(ROW(country_code, region_code)) AS dupes FROM city_region_data WHERE country_code = 'US' GROUP BY city_name, region_code, country_code ORDER BY COUNT(ROW(country_code, region_code)) DESC ) SELECT city_name, ARRAY_LENGTH(dupes, 1) AS duplicity, ( CASE WHEN ARRAY_LENGTH(dupes,1) > 9 THEN CONCAT(SUBSTRING(ARRAY_TO_STRING(dupes,','), 1, 50), '...') ELSE ARRAY_TO_STRING(dupes,',') END ) AS sample FROM duplicate_data LIMIT 4; city_name duplicity sample Plantation 3 (US,FL),(US,FL),(US,FL) Georgetown 3 (US,IN),(US,IN),(US,IN) Robinwood 3 (US,MD),(US,MD),(US,MD) Prospect Park 2 (US,NJ),(US,NJ) Reverse geocoding

Both (city_name, country_code) and (city_name, country_code, region_name) tuples have failed as candidates to uniquely identify location.

We would have the option of using zip codes or postal codes except we can't use those since most geolocation services don't offer those.

But most geolocation services do offer longitude and latitude, and we can make use of those to eliminate ambiguity.

Geometric data types in PostgreSQL

I looked further into the PostgreSQL docs and found that it also has geometric data types and functions for 2D geometry. Out of the box you can model points, boxes, paths, polygons, circles, you can store them and query them.

PostgreSQL has some additional modules in the contrib directory. They are available out of the box in most Linux and Windows distributions.

In this situation we're interested in the cube and earthdistance modules 2. The cube extension allows you to model n-dimensional vectors, and the earthdistance extension uses 3-cubes to store vectors and represent points on the surface of the Earth.

We'll be using the following:

  • the earth_distance function is available, and it allows you to compute the great-circle distance between two points
  • the earth_box function to check if a point is within a certain distance of a reference point
  • a gist expression index on the expression ll_to_earth(lat, long) to make fast spatial queries and find nearby points
Designing a view for city & region data

Geonames data was imported into 3 tables:

Then we create a view that pulls everything together 3. We now have population data, city/region/country data, and lat/long data, all in one place.

CREATE OR REPLACE VIEW city_region_data AS ( SELECT AS country_code, b.asciiname AS city_name, AS region_name, b.region_code, b.population, b.latitude AS city_lat, b.longitude AS city_long, AS country_name FROM geo_admin1 a JOIN ( SELECT *, (country || '.' || admin1) AS country_region, admin1 AS region_code FROM geo_geoname WHERE fclass = 'P' ) b ON a.code = b.country_region JOIN geo_countryinfo c ON = c.iso_alpha2 ); Designing a nearby-city query and function

In the most nested SELECT, we're only keeping the cities in a 23km radius around the reference point, then we're applying a country filter and city pattern filter (these two filters are optional), and we're only getting the closest 50 results to the reference point. (the @> check will make use of the gist index we're creating).

Next, we're reordering by population because geonames sometimes has districts/neighbourhoods cities around bigger cities too 4, and it does not mark them in a specific way, so we just want to select the larger city (for example let's say the geolocation service returned a lat/long that would resolve to a distrct of a larger metropolitan area. in my case, I'd like to resolve this to the larger city it's associated with instead of getting back the district/neighbourhood)

CREATE INDEX geo_geoname_latlong_idx ON geo_geoname USING gist(ll_to_earth(latitude,longitude)); CREATE OR REPLACE FUNCTION geo_find_nearest_city_and_region( latitude double precision, longitude double precision, filter_countries_arr varchar[], filter_city_pattern varchar, ) RETURNS TABLE( country_code varchar, city_name varchar, region_name varchar, region_code varchar, population bigint, _lat double precision, _long double precision, country_name varchar, distance numeric ) AS $$ BEGIN RETURN QUERY SELECT * FROM ( SELECT * FROM ( SELECT *, ROUND(earth_distance( ll_to_earth(c.city_lat, c.city_long), ll_to_earth(latitude, longitude) )::numeric, 3) AS distance_ FROM city_region_data c WHERE earth_box(ll_to_earth(latitude, longitude), 23000) @> ll_to_earth(c.city_lat, c.city_long) AND (filter_countries_arr IS NULL OR c.country_code=ANY(filter_countries_arr)) AND (filter_city_pattern IS NULL OR c.city_name LIKE filter_city_pattern) ORDER BY distance_ ASC LIMIT 50 ) d ORDER BY population DESC ) e LIMIT 1; END; $$ LANGUAGE plpgsql; Conclusion

We've started from the design of a system that would query multiple geoip services, would gather the data and would then aggregate it to get a more reliable result.

We first looked at some ways of uniquely identifying locations.

We've then picked a way that would eliminate ambiguity in identifying them. In the second half, we've looked at different ways of structuring, storing and querying geographical data in PostgreSQL.

Then we've built a view and a function to find cities near a reference point which allowed us to do reverse geocoding.

Footnotes: 1

By using multiple services (and assuming they use different data sources internally) after aggregation, we can have a more reliable answer than if we were using just one.

Another advantage here is that we're using free services, no setup is required, we don't have to take care of updates, since these services are maintained by their owners.

However, querying all these web services will be slower than querying a local geoip data structures. But, there are city/country/region geolocation database out there such as geoip2 from maxmind, ip2location or db-ip.


There's a nice post here using the earthdistance module to compute distances to nearby or far away pubs.


Geonames has geonameIds as well, which are geonames-specific ids we can use to accurately refer to locations.


geonames does not have polygonal data about cities/neighbourhoods or metadata about the type of urban area (like openstreetmap does) so you can't query all city polygons (not districts/neighbourhoods) that contain that point.

Barry Jones: Insanity with Elixir + Phoenix + PostgreSQL

6 July, 2016 - 03:00
I'm at the borderline of obsessed with Elixir and Phoenix lately. I've avoided writing about it so far because it feels a bit too good to be true. In an effort to test my own enthusiam, I decided to rebuild this site with them in the most rediculous way possible just to try to test some limits. Because I already have an unhealthy obsession with PostgreSQL, we're getting crazy with it too.DISCLAIMER: This is not a "how to build a blog" article. If you do what I'm about to do, people will look at you funny (and probably should).

Gulcin Yildirim: Evolution of Fault Tolerance in PostgreSQL

4 July, 2016 - 12:57

“It is paradoxical, yet true, to say, that the more we know, the more ignorant we become in the absolute sense, for it is only through enlightenment that we become conscious of our limitations. Precisely one of the most gratifying results of intellectual evolution is the continuous opening up of new and greater prospects.” Nikola Tesla

PostgreSQL is an awesome project and it evolves at an amazing rate. We’ll focus on evolution of fault tolerance capabilities in PostgreSQL throughout its versions with a series of blog posts.

 PostgreSQL in a nutshell

PostgreSQL is fault-tolerant by its nature. First, it’s an advanced open source database management system and will celebrate its 20th birthday this year. Hence it is a proven technology and has an active community, thanks to which it has a fast development progress.

PostgreSQL is SQL-compliant (SQL:2011) and fully ACID-compliant (atomicity, consistency, isolation, durability).

Note: A(tomicity) C(onsistency) I(solation) D(urability) in PostgreSQL

Atomicity ensures that results of a transaction are seen entirely or not at all within other transactions but a transaction need not appear atomic to itself. PostgreSQL is consistent and system-defined consistency constraints are enforced on the results of transactions. Transactions are not affected by the behaviour of concurrently-running transactions which shows isolation (we’ll have a discussion about transaction isolation levels later in the post). Once a transaction commits, its results will not be lost regardless of subsequent failures and this makes PostgreSQL durable.

PostgreSQL allows physical and logical replication and has built-in physical and logical replication solutions. We’ll talk about replication methods (on the next blog posts) in PostgreSQL regarding fault tolerance.

PostgreSQL allows synchronous and asynchronous transactions, PITR (Point-in-time Recovery) and MVCC (Multiversion concurrency control).  All of these concepts are related to fault tolerance at some level and I’ll try to explain their effects while explaining necessary terms and their applications in PostgreSQL.

PostgreSQL is robust!

All actions on the database are performed within transactions, protected by a transaction log that will perform automatic crash recovery in case of software failure.

Databases may be optionally created with data block checksums to help diagnose hardware faults. Multiple backup mechanisms exist, with full and detailed  PITR, in case of the need for detailed recovery. A variety of diagnostic tools are available.

Database replication is supported natively. Synchronous Replication can provide greater than “5 Nines” (99.999 percent) availability and data protection, if properly configured and managed.

Considering the facts above we can easily claim that PostgreSQL is robust!

PostgreSQL Fault Tolerance: WAL

Write ahead logging is the main fault tolerance system for PostgreSQL.

The WAL consists of a series of binary files written to the pg_xlog subdirectory of the PostgreSQL data directory. Each change made to the database is recorded first in WAL, hence the name “write-ahead” log, as a synonym of “transaction log”. When a transaction commits, the default—and safe—behaviour is to force the WAL records to disk.

Should PostgreSQL crash, the WAL will be replayed, which returns the database to the point of the last committed transaction, and thus ensures the durability of any database changes.

Transaction? Commit?

Database changes themselves aren’t written to disk at transaction commit. Those changes are written to disk sometime later by the background writer or checkpointer on a well-tuned server. (Check the WAL description above.)

Transactions are a fundamental concept of all database systems. The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation.

Note: Transactions in PostgreSQL

PostgreSQL actually treats every SQL statement as being executed within a transaction. If you do not issue a BEGIN command, then each individual statement has an implicit BEGIN and (if successful) COMMIT wrapped around it. A group of statements surrounded by BEGIN and COMMIT is sometimes called a transaction block.

The intermediate states between the steps are not visible to other concurrent transactions, and if some failure occurs that prevents the transaction from completing, then none of the steps affect the database at all. PostgreSQL does not support dirty-reads (transaction reads data written by a concurrent uncommitted transaction).

Note: Transaction Isolation

The SQL standard defines 4 levels of transaction isolation: Read uncommitted, read committed, repeatable read, serializable.

Table 1: Standard SQL Transaction Isolation Levels

The most strict is Serializable, which is defined by the standard in a paragraph which says that any concurrent execution of a set of Serializable transactions is guaranteed to produce the same effect as running them one at a time in some order.

For more info about topic check the Postgres documentation about transaction isolation.


Crash recovery replays the WAL, but from what point does it start to recover?

Recovery starts from points in the WAL known as checkpoints. The duration of crash recovery depends on the number of changes in the transaction log since the last checkpoint. A checkpoint is a known safe starting point for recovery, since it guarantees that all the previous changes to the database have already been written to disk.

A checkpoint can be either immediate or scheduled. Immediate checkpoints are triggered by some action of a superuser, such as the CHECKPOINT command or other; scheduled checkpoints are decided automatically by PostgreSQL.


In this blog post we listed important features of PostgreSQL that are related with fault tolerance in PostgreSQL. We mentioned write-ahead logging, transaction, commit, isolation levels, checkpoints and crash recovery. We’ll continue with PostgreSQL replication at the next blog post.


PostgreSQL documentation

PostgreSQL 9 Administration Cookbook – Second Edition


Dinesh Kumar: pgBucket beta version is ready

3 July, 2016 - 23:20
Hi Everyone,

I would like to inform to you all that, pgBucket[Simple concurrent job scheduler for postgresql] beta version is ready with enhanced architecture and new features.

It would be more great if you could share your inputs and suggestions on this, which will help me to make this tool as stable.

Thank you all in advance.


Leo Hsu and Regina Obe: regexp_split_to_table and string_to_array unnest performance

2 July, 2016 - 06:13

Whenever you need to split a text into multiple records breaking by some delimeter, there are two common options that PostgreSQL provides. The first is regpexp_split_to_table and then next popular is using the unnest function in combination with string_to_array.

Here is an example using regexp_split_to_table:

SELECT a FROM regexp_split_to_table('john,smith,jones', ',') AS a;

Which outputs:

a ------- john smith jones (3 rows)

You can achieve the same result by using the construct:

SELECT a FROM unnest(string_to_array('john,smith,jones', ',')) AS a;

With short text you won't notice much perfomance difference. But what happens if we pass in a humungous text?

Continue reading "regexp_split_to_table and string_to_array unnest performance"

Josh Williams: WAL-based Estimates For When a Record Was Changed

2 July, 2016 - 01:08
I originally titled this: Inferring Record Timestamps by Analyzing PITR Streams for Transaction Commits and Cross-Referencing Tuple xmin Values. But that seemed a little long, though it does sum up the technique.

In other words, it's a way to approximate an updated_at timestamp column for your tables when you didn't have one in the first place.

PostgreSQL stores the timestamp of a transaction's commit into the transaction log. If you have a hot standby server, you can see the value for the most-recently-applied transaction as the output of the pg_last_xact_replay_timestamp() function. That's useful for estimating replication lag. But I hadn't seen any other uses for it, at least until I came up with the hypothesis that all the available values could be extracted wholesale, and matched with the transaction ID's stored along with every record.

If you're on 9.5, there's track_commit_timestamps in postgresql.conf, and combined with the pg_xact_commit_timestamp(xid) function has a similar result. But it can't be turned on retroactively.

This can -- sort of. So long as you have those transaction logs, at least. If you're doing Point-In-Time Recovery you're likely to at least have some of them, especially more recent ones.

I tested this technique on a pgbench database on stock PostgreSQL 9.4, apart from the following postgresql.conf settings that (sort of) turn on WAL archival -- or at least make sure the WAL segments are kept around:

wal_level = archive archive_mode = on archive_command = '/bin/false'
We'll be using the pg_xlogdump binary to parse those WAL segments, available from 9.3 on. If you're on an earlier version, the older xlogdump code will work.

Once pgbench has generated some traffic, then it's time to see what's contained in the WAL segments we have available. Since I have them all I went all the way back to the beginning.

$ pg_xlogdump -p pg_xlog/ --start 0/01000000 --rmgr=Transaction rmgr: Transaction len (rec/tot): 12/ 44, tx: 3, lsn: 0/01006A58, prev 0/01005CA8, bkp: 0000, desc: commit: 2016-05-15 22:32:32.593404 EDT rmgr: Transaction len (rec/tot): 12/ 44, tx: 4, lsn: 0/01008BC8, prev 0/01008A60, bkp: 0000, desc: commit: 2016-05-15 22:32:32.664374 EDT rmgr: Transaction len (rec/tot): 12/ 44, tx: 5, lsn: 0/01012EA8, prev 0/01012E58, bkp: 0000, desc: commit: 2016-05-15 22:32:32.668813 EDT (snip) rmgr: Transaction len (rec/tot): 12/ 44, tx: 1746, lsn: 0/099502D0, prev 0/099501F0, bkp: 0000, desc: commit: 2016-05-15 22:55:12.711794 EDT rmgr: Transaction len (rec/tot): 12/ 44, tx: 1747, lsn: 0/09951530, prev 0/09951478, bkp: 0000, desc: commit: 2016-05-15 22:55:12.729122 EDT rmgr: Transaction len (rec/tot): 12/ 44, tx: 1748, lsn: 0/099518D0, prev 0/099517F0, bkp: 0000, desc: commit: 2016-05-15 22:55:12.740823 EDT pg_xlogdump: FATAL: error in WAL record at 0/99518D0: record with zero length at 0/9951900
The last line just indicates that we've hit the end of the transaction log records, and it's written to stderr, so it can be ignored. Otherwise, that output contains everything we need, we just need to shift around the components so we can read it back into Postgres. Something like this did the trick for me, and let me import it directly:

$ pg_xlogdump -p pg_xlog/ --start 0/01000000 --rmgr=Transaction | awk -v Q=\' '{sub(/;/, ""); print $8, Q$17, $18, $19Q}' > xids postgres=# CREATE TABLE xids (xid xid, commit timestamptz); CREATE TABLE postgres=# \copy xids from xids csv COPY 1746
At which point it's a simple join to pull in the commit timestamp records:

postgres=# select xmin, aid, commit from pgbench_accounts inner join xids on pgbench_accounts.xmin = xids.xid; xmin | aid | commit ------+--------+------------------------------- 981 | 252710 | 2016-05-15 22:54:34.03147-04 1719 | 273905 | 2016-05-15 22:54:35.622406-04 1183 | 286611 | 2016-05-15 22:54:34.438701-04 1227 | 322132 | 2016-05-15 22:54:34.529027-04 1094 | 331525 | 2016-05-15 22:54:34.26477-04 1615 | 383361 | 2016-05-15 22:54:35.423995-04 1293 | 565018 | 2016-05-15 22:54:34.688494-04 1166 | 615272 | 2016-05-15 22:54:34.40506-04 1503 | 627740 | 2016-05-15 22:54:35.199251-04 1205 | 663690 | 2016-05-15 22:54:34.481523-04 1585 | 755566 | 2016-05-15 22:54:35.368891-04 1131 | 766042 | 2016-05-15 22:54:34.33737-04 1412 | 777969 | 2016-05-15 22:54:34.953517-04 1292 | 818934 | 2016-05-15 22:54:34.686746-04 1309 | 940951 | 2016-05-15 22:54:34.72493-04 1561 | 949802 | 2016-05-15 22:54:35.320229-04 1522 | 968516 | 2016-05-15 22:54:35.246654-04

Shaun M. Thomas: PG Phriday: EXCEPTIONal Performance

1 July, 2016 - 19:41

Like any programming language, the PL/pgSQL Postgres procedural language has plenty of handy control structures. Among those thankfully modern accoutrements is the humble EXCEPTION block. While not the more prevalent try/catch methodology, we can use BEGIN anywhere to start an embedded section for the same effect.

Knowing this is the case, what are the costs of actually using them? Postgres is fairly streamlined, and as such, can take several shortcuts when there are no exception blocks within a chunk of PL/pgSQL. Thus, it’s not uncommon for members of mailing lists to mutter about ‘performance costs’ and suggest that the worst performance hit is felt by even breathing the word ‘exception’ in the presence of a Postgres instance. The implication is that further leverage will be incremental by comparison.

But is that the case? And if not, what are the actual metrics observed while using exceptions in different contexts? Let’s go beyond nebulous hand waving and take a look, starting with a basic table to hold data:

CREATE TABLE sensor_log ( id SERIAL NOT NULL PRIMARY KEY, location VARCHAR NOT NULL, reading BIGINT NOT NULL, reading_date TIMESTAMP NOT NULL );   CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);

The above table is one we’ve used often before. It helps us illustrate using a function to insert data as one potential entry vector commonly associated with projects like PL/Proxy. Really this will apply to any application that is designed to interact through a database-side API of predefined functions.

A very basic example of this is a simple insert function:

CREATE OR REPLACE FUNCTION f_capture_log( sensor_loc VARCHAR, sensor_val BIGINT ) RETURNS BOOLEAN AS $$ BEGIN INSERT INTO sensor_log (location, reading, reading_date) VALUES (sensor_loc, sensor_val, now());   RETURN TRUE; END; $$ LANGUAGE plpgsql;   \o /dev/NULL   SELECT f_capture_log('x' || % 1000, % 100) FROM generate_series(1, 100000) s(id);   TIME: 2567.990 ms

There’s nothing too strange here. All we have is a function that’s wrapping our insert and applying a timestamp when the function is called. We used 100,000 rows to push enough data that our timings remain mostly consistent. By and large, we were successful here; any variance in run times is generally within 5% on our otherwise idle test VM.

Knowing that, we can really start having fun now. We came here to examine exceptions, so let’s add one that does nothing but invert the success of our function call. Keep in mind that this exception will never fire, since we’re relying on a sequence to provide conflict-free primary key IDs.

CREATE OR REPLACE FUNCTION f_capture_log_except( sensor_loc VARCHAR, sensor_val BIGINT ) RETURNS BOOLEAN AS $$ BEGIN INSERT INTO sensor_log (location, reading, reading_date) VALUES (sensor_loc, sensor_val, now()); RETURN TRUE; EXCEPTION WHEN OTHERS THEN RETURN FALSE; END; $$ LANGUAGE plpgsql;   \o /dev/NULL   SELECT f_capture_log_except('x' || % 1000, % 100) FROM generate_series(1, 100000) s(id);   TIME: 3211.702 ms

It looks like we have a 25% performance hit simply by including the exception block. Does that justify the argument that there is a significant upfront cost associated with exceptions, and that once we cross that threshold, everything is smooth sailing? Indeed, 25% is rather substantial for an exception that literally does nothing and will never actually run. Maybe, or maybe not. Whatever we find, at least we have some numbers to work with!

So let’s screw up our function a little, and make it work for its dinner.

CREATE OR REPLACE FUNCTION f_capture_log_broken( sensor_loc VARCHAR, sensor_val BIGINT ) RETURNS BOOLEAN AS $$ BEGIN INSERT INTO sensor_log (id, location, reading, reading_date) VALUES (sensor_val, sensor_loc, sensor_val, now()); RETURN TRUE; EXCEPTION WHEN OTHERS THEN RETURN FALSE; END; $$ LANGUAGE plpgsql;   \o /dev/NULL   SELECT f_capture_log_broken('x' || % 1000, % 100) FROM generate_series(1, 100000) s(id);   TIME: 5235.468 ms

Ouch! Our function is now about twice as slow as the original incarnation that didn’t use an exception block. Still, this is a worst case scenario where literally every invocation of the function will cause an exception. If we reduce the amount of overlap and truncate the sensor_log table, run times are affected proportionally. In other words, if only 5% of the calls caused a key violation, we’d only see a further 5% performance cost.

So far, everyone has been correct. The initial investment cost from using exceptions is still the highest we’ve paid, barring ridiculous examples like this one. Does that change if we catch the actual error that’s occurring so there are two conditionals? Let’s see:

CREATE OR REPLACE FUNCTION f_capture_log_extra_broken( sensor_loc VARCHAR, sensor_val BIGINT ) RETURNS BOOLEAN AS $$ BEGIN INSERT INTO sensor_log (id, location, reading, reading_date) VALUES (sensor_val, sensor_loc, sensor_val, now()); RETURN TRUE; EXCEPTION WHEN unique_violation THEN RETURN FALSE; WHEN OTHERS THEN RETURN FALSE; END; $$ LANGUAGE plpgsql;   \o /dev/NULL   SELECT f_capture_log_extra_broken('x' || % 1000, % 100) FROM generate_series(1, 100000) s(id);   TIME: 5370.879 ms

It turns out that the answer is no. Using further exception conditions doesn’t appear to be statistically significant. There’s probably a small amount of overhead associated with the underlying code represented in the jump logic, but that’s an expected element of any language. But this is still boring and pointless; our exceptions still do nothing but tell us the insert failed.

What happens if we actually modify the exception to raise another exception that the caller can process? So long as the exception level we use isn’t fatal, it won’t short-circuit function loops, allowing us to feel the full cost of every uncaught exception.

CREATE OR REPLACE FUNCTION f_capture_log_super_broken( sensor_loc VARCHAR, sensor_val BIGINT ) RETURNS BOOLEAN AS $$ BEGIN INSERT INTO sensor_log (id, location, reading, reading_date) VALUES (sensor_val, sensor_loc, sensor_val, now()); RETURN TRUE; EXCEPTION WHEN unique_violation THEN RAISE WARNING 'This sucks!'; RETURN FALSE; WHEN OTHERS THEN RAISE WARNING 'Ditto!'; RETURN FALSE; END; $$ LANGUAGE plpgsql;   \o /dev/NULL SET client_min_messages TO error;   SELECT f_capture_log_super_broken('x' || % 1000, % 100) FROM generate_series(1, 100000) s(id);   TIME: 6974.891 ms

Almost three times slower? Now we’re talking! Again, this is a worst case scenario, but it’s also proportional. Since the new function is about 30% slower than simply catching the key violation, we can extrapolate that there’s a 30% cost associated with raising an exception. In that case, if our data had a 10% overlap, we’d have the initial 10% from the exception logic, and another 3% for raising our own exceptions afterwards.

This is in addition to the initial 25% cost associated with even using exceptions in the first place. Again, the upfront price from having an exception block is worse than activating various functionality within it. Even in our highly contrived example, we’d need to call our function on data that had 20% duplicates before we’d start to eclipse the initial exception cost.

How likely is that, really? In most cases where APIs are used like this, there will be an occasional race condition every few hundred thousand or million rows. In that case, the full run time of even our most broken function would be indistinguishable from the best case that had a completely inactive exception block. In the end, that’s what we’ll likely see in real-world scenarios.

At least with Postgres 9.5 on an idle VM, it would appear that exceptions add a 25% performance penalty. Whether this is enough to justify a different approach, or bar them outright, is completely dependent on the expected use case. Indeed, we’re already paying a hefty price by using a function instead of a direct INSERT statement anyway, so it could be a moot argument. A highly sensitive transactional database would probably avoid the overhead, while a more common day-to-day system wouldn’t even notice.

Just keep that number in mind: 25%. It’s the cost of doing business with exceptions, but hardly a deal breaker.

Joe Abbate: The Future of Pyrseas: Part 3

1 July, 2016 - 00:25

The second Andromeda feature, aside from schema version control, that Pyrseas wanted to rescue was the automatic generation of search and detailed editing screens.

It is somewhat amazing that 40 years after Moshé Zloof’s “Query by Example” and over 30 years after “A Forms Application Development System” by Prof. Larry Rowe and Kurt Shoens, most applications for searching, entering and maintaining data are still reinventing the wheel.

There are of course apps such as PgAdmin and Adminer that can be used for general querying and editing, but their target audience is developers and DBAs, not end users. Furthermore, these tools typically only allow querying a single table at a time (see QBE queries #3 and subsequent, in the aforementioned paper, for examples that usually can’t be handled).

At first, I thought I would develop a typical Python WSGI application. I planned to use Werkzeug because the extra machinery in larger frameworks such as Django is unnecessary for the envisioned purpose. An additional consideration was that a smaller library or framework would make it easier for anyone wanting to extend the automatically generated pieces into a full-fledged app.

I started designing a prototype application and also exploring other WSGI contenders. Then Jacob Kaplan-Moss and Tarek Ziadé jolted my thought process: a JavaScript-based framework was the way forward for a web-based app with Python (and Werkzeug) relegated to a REST API server.

So I took a look at AngularJS, Ember.js, Backbone.js and several other JavaScript frameworks and libraries. At this time, it seems a simpler library such as Backbone.js, or perhaps the Marionette framework, may be more appropriate. It is comforting to see that PgAdmin4 has chosen a similar path for its web-deployed client.

This capability is ultimately quite ambitious, so perhaps Pyrseas ought to stick just to schema version control. Nevertheless, whether integrated with Pyrseas or not, this functionality ought to re-use the database-catalog-to-Python-structures-to-YAML (and vice versa) features developed for Pyrseas.

This concludes this three-part series. Please also see part 1 and part 2.

Filed under: Postgres, Python, User interfaces

Umair Shahid: Using Querydsl with PostgreSQL

30 June, 2016 - 18:53

Querydsl is a domain model centric ORM that was created out of a need to add typesafety to HQL. As demonstrated in my blog on HQL, to write queries using HQL, at times long string concatenations need to be used. Such strings are not only hard to read, they are prone to runtime errors, avoiding which is the main driver for using ORMs.

Although it was initially primarily targeted at HQL, Querydsl is now also available for JPA, JDO, JDBC, Lucene, Hibernate Search, MongoDB, Collections and RDFBean.

Why Use Querydsl?

The biggest advantage Querydsl offers is typesafety. Rather than relying on strings, it provides you the ability of querying your database using functions. This takes care of syntactic correctness and ensures that you don’t get runtime errors.

With typesafety, comes the added benefit of auto code completion using your favorite IDE. Your ability to reference domain types and properties is greatly enhanced as is your ability to quickly adopt changes in the domain.

With proper object annotation, Querydsl automatically generates domain classes that you can use in your subsequent querying as demonstrated below.

Prominent Features

In order to demonstrate some of the features of Querydsl, let’s continue using the same ‘largecities’ example we have been using for the previous blogs in this series.


In order to get Querydsl working, you first need to download the libraries from Also, you need to have a persistence.xml file present in your project’s META-INF folder. A sample format for the XML file is given below:

<?xml version="1.0" encoding="UTF-8"?> <persistence version="2.0"    xmlns="" xmlns:xsi=""    xsi:schemaLocation="">    <persistence-unit name="QuerydslExample" transaction-type="RESOURCE_LOCAL">        <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>        <class>org.secondquadrant.javabook.querydsl.LargeCities</class>        <properties>            <property name="javax.persistence.jdbc.driver" value="org.postgresql.Driver" />            <property name="javax.persistence.jdbc.url" value="jdbc:postgresql://localhost:5432/postgres" />            <property name="javax.persistence.jdbc.user" value="postgres" />            <property name="javax.persistence.jdbc.password" value="" />        </properties>    </persistence-unit> </persistence>

In case you are using an IDE, you will need to ensure that it understands how to translate the annotations. For Eclipse, as an example, besides ensuring that Querydsl libraries are on the build path, you need to specify annotation processing available under project Properties -> Java Compiler -> Annotation Processing. You should enable the project-specific settings and specify an output source directory for generated code along with adding a new key/value pair: defaultOverwrite/true. Under ‘Factory Path’ you should give the path to annotation processing jar file: querydsl-jpa-x.x.x-apt-hibernate-one-jar.jar.

If you don’t use an IDE, these settings will need to be specified in ant and maven settings XML.

Generating the Query Type

With all settings in place, the query type will automatically get generated when you make and save the domain type. For our case, the domain type is:

package org.secondquadrant.javabook.querydsl; import javax.persistence.Entity; import javax.persistence.Id; @Entity public class LargeCities { @Id private int rank; private String name; public int getRank() { return rank; } public void setRank(int rank) { this.rank = rank; } public String getName() { return name; } public void setName(String name) { = name; } }

Note how the annotations have been used here.

When you save this domain type, the following code will automatically get generated as the query type by Querydsl:

package org.secondquadrant.javabook.querydsl; import static com.querydsl.core.types.PathMetadataFactory.*; import com.querydsl.core.types.dsl.*; import com.querydsl.core.types.PathMetadata; import javax.annotation.Generated; import com.querydsl.core.types.Path; /** * QLargeCities is a Querydsl query type for LargeCities */ @Generated("com.querydsl.codegen.EntitySerializer") public class QLargeCities extends EntityPathBase<LargeCities> {    private static final long serialVersionUID = -1795361894L;    public static final QLargeCities largeCities = new QLargeCities("largeCities");    public final StringPath name = createString("name");    public final NumberPath<Integer> rank = createNumber("rank", Integer.class);    public QLargeCities(String variable) {        super(LargeCities.class, forVariable(variable));    }    public QLargeCities(Path<? extends LargeCities> path) {        super(path.getType(), path.getMetadata());    }    public QLargeCities(PathMetadata metadata) {        super(LargeCities.class, metadata);    } }

You are now all set to go!

Querying the Database

In order to query the database, you need to create an entitymanager, using which you can create the query factory. This query factory then uses the query type generated above to fetch data from the database. A simple select-all, therefore is programmed as:

       try  {                 EntityManager em = getEmf().createEntityManager();         JPAQueryFactory factory = new JPAQueryFactory(em);         QLargeCities cities = QLargeCities.largeCities;         List<LargeCities> list = factory.selectFrom(cities).fetch();                 for (LargeCities a : list) {         System.out.println("Rank: " + a.getRank() + " Name: " + a.getName());         }        }        catch (Exception e) {            e.printStackTrace();        }

The output of this program (using the same data set as in the previous sections) is:

Rank: 1 Name: Tokyo Rank: 2 Name: Seoul Rank: 3 Name: Shanghai Rank: 4 Name: Guangzhou Rank: 5 Name: Karachi Rank: 6 Name: Delhi Rank: 7 Name: Mexico City Rank: 8 Name: Beijing Rank: 9 Name: Lagos Rank: 10 Name: Sao Paulo Specifying Extract Criteria

There are different ways of how you can specify an extract criteria. Simple WHERE clause usage is demonstrated below in a program that fetches the top 5 cities in the table.

       try  {                 EntityManager em = getEmf().createEntityManager();         JPAQueryFactory factory = new JPAQueryFactory(em);         QLargeCities cities = QLargeCities.largeCities;         List<LargeCities> list = factory.selectFrom(cities).where(;                 for (LargeCities a : list) {         System.out.println("Rank: " + a.getRank() + " Name: " + a.getName());         }        }        catch (Exception e) {            e.printStackTrace();        }

Notice the usage of function ‘lt’ which is used for ‘less than’. Output of this program is:

Rank: 1 Name: Tokyo Rank: 2 Name: Seoul Rank: 3 Name: Shanghai Rank: 4 Name: Guangzhou Rank: 5 Name: Karachi Data Ordering

Specifying order of the retrieved data is similarly handled using functions.

       try  {                 EntityManager em = getEmf().createEntityManager();         JPAQueryFactory factory = new JPAQueryFactory(em);         QLargeCities cities = QLargeCities.largeCities;         List<LargeCities> list = factory.selectFrom(cities).orderBy(cities.rank.desc()).fetch();                 for (LargeCities a : list) {         System.out.println("Rank: " + a.getRank() + " Name: " + a.getName());         }        }        catch (Exception e) {            e.printStackTrace();        }

Output of this program is:

Rank: 10 Name: Sao Paulo Rank: 9 Name: Lagos Rank: 8 Name: Beijing Rank: 7 Name: Mexico City Rank: 6 Name: Delhi Rank: 5 Name: Karachi Rank: 4 Name: Guangzhou Rank: 3 Name: Shanghai Rank: 2 Name: Seoul Rank: 1 Name: Tokyo Updating Data

Much like in JPA, for any data changes, you need to start and then commit a transaction. The example below updates the database replacing the 10th entry with an 11th one, i.e Mumbai.

       try  {                 EntityManager em = getEmf().createEntityManager();         JPAQueryFactory factory = new JPAQueryFactory(em);         QLargeCities cities = QLargeCities.largeCities;                 em.getTransaction().begin();         factory.update(cities).where(cities.rank.eq(10)).set(cities.rank, 11).set(, "Mumbai").execute();         em.getTransaction().commit();                 List<LargeCities> list = factory.selectFrom(cities).fetch();                 for (LargeCities a : list) {         System.out.println("Rank: " + a.getRank() + " Name: " + a.getName());         }        }        catch (Exception e) {            e.printStackTrace();        }

Output of this program is:

Rank: 1 Name: Tokyo Rank: 2 Name: Seoul Rank: 3 Name: Shanghai Rank: 4 Name: Guangzhou Rank: 5 Name: Karachi Rank: 6 Name: Delhi Rank: 7 Name: Mexico City Rank: 8 Name: Beijing Rank: 9 Name: Lagos Rank: 11 Name: Mumbai

As you can see, the 10th entry has now been updated to reflect Mumbai as the 11th largest city.

Deleting Data

Deleting data follows a pattern very similar to the update shown above.

       try  {                 EntityManager em = getEmf().createEntityManager();         JPAQueryFactory factory = new JPAQueryFactory(em);         QLargeCities cities = QLargeCities.largeCities;                 em.getTransaction().begin();         factory.delete(cities).where(cities.rank.eq(11)).execute();         em.getTransaction().commit();                 List<LargeCities> list = factory.selectFrom(cities).fetch();                 for (LargeCities a : list) {         System.out.println("Rank: " + a.getRank() + " Name: " + a.getName());         }        }        catch (Exception e) {            e.printStackTrace();        }

Output of this code is:

Rank: 1 Name: Tokyo Rank: 2 Name: Seoul Rank: 3 Name: Shanghai Rank: 4 Name: Guangzhou Rank: 5 Name: Karachi Rank: 6 Name: Delhi Rank: 7 Name: Mexico City Rank: 8 Name: Beijing Rank: 9 Name: Lagos

As you can see, the last entry that we had updated to ‘Mumbai’ has now been deleted.

Drawbacks of Using Querydsl

While Querydsl can provide you excellent typesafety and the ability for rapid development, it is not a tool that would fit all development scenarios. As an example, because of the way it is designed, Querydsl takes a domain centric approach. If your application is going to be primarily driven by your relational model, Querydsl is likely not the best tool suited for the job.

Building on this point, Querydsl is also not well suited in an environment where an existing application needs to be enhanced (the scenario is more common than you would imagine!). An existing application typically has an existing relational model. If you are forced to use Querydsl in such a scenario, you will need to write your object classes that map exactly to the existing data model, raising the probability of runtime errors besides being a fairly tedious task.

Hubert 'depesz' Lubaczewski: What’s happening in my database?

29 June, 2016 - 16:14
Lately at least two people on irc asked questions similar to “how do I know how many queries there are in database, per second?“. So, let's see what we can find out. First of all, we can't really tell how many queries per second there are. But we can find out some other metrics, that […]

Nikolay Shaplov: Fun with random

28 June, 2016 - 19:29
My colleague found curious behavior of random():

postgres=# select (select random() as value ) from generate_series(1,10) as k; value ------------------- 0.715956253930926 0.715956253930926 0.715956253930926 0.715956253930926 0.715956253930926 0.715956253930926 0.715956253930926 0.715956253930926 0.715956253930926 0.715956253930926 (10 rows)

Something like
int rnd=5; // This figure is really random. Got it by rolling a die

gabrielle roth: Slow Query Checklist

28 June, 2016 - 04:15
Every slow query situation has its own quirks.  Here’s my generic template for troubleshooting them. First, I start with the holy trinity of IT Troubleshooting Questions: What are you trying to do? What data do you want? What is the end result you want? How does your result differ from what you expected? “I thought […]

Shaun M. Thomas: PG Phriday: DIY in the CLI (Part 1)

24 June, 2016 - 20:35

On a higher level, Postgres has a bevy of libraries, interfaces, and clients for accessing a database instance. From language APIs to GUIs like pgAdmin, or SaaS entries like JackDB, every flavor of interaction is covered. And yet, that’s only a small part of the story. For those who dare to tread into the watery depths, there’s also the world of dark incantations that is the command-line.

While most are aware of psql, the Postgres command-line client for accessing databases, there are far more creatures lurking in the black oblivion which deserve more visibility. In the spirit of getting the most from our tools, let’s take a look at these and consider how they might be useful in scenarios that aren’t restricted to boring old DBA considerations. To that end, let’s cover some of the standard client applications that anyone might use.

To begin, there are several commands that cover some standard SQL operations. Things like creating or dropping databases, users, or procedural languages, are all possible without actually connecting to Postgres. Normally, we’d have to use the equivalent SQL commands:

CREATE DATABASE foo; DROP DATABASE foo;   CREATE USER kitty_cat WITH PASSWORD 'M30w9|_|rr'; DROP USER kitty_cat;   CREATE LANGUAGE plpythonu; DROP LANGUAGE plpythonu;

While these are all possible to use by forcing them through psql -c, that makes automation testing and other tasks somewhat awkward, or force sysadmins to learn SQL for no reason. So Postgres provides functional wrappers we can use instead. They’re named about what we might expect, too:

createdb foo dropdb foo   createuser kitty_cat --pwprompt dropuser kitty_cat   createlang plpythonu droplang plpythonu

As would be expected, all of these commands include the optional parameters from their SQL equivalents. This means a couple of handy things: flags can be listed with -? or --help, and longer descriptions are available via comprehensive man pages. The Postgres online documentation is great, and having all of this information at our fingertips is even better.

Of course, we’re not limited to databases, users, and languages. There are a few commands to streamline basic maintenance as well, and as might be expected, these also have SQL equivalents. Well, mostly. Consider these three scenarios:

  1. We want to VACUUM every database every night following a busy day of transactions. We’ve disabled autovacuum because we have a very active database, and have instead, opted for nightly maintenance.
  2. Over the years, our indexes on the user_session and login_state tables have gotten a bit fragmented, and we want to rebuild them.
  3. We have clustered multiple tables, sorting their disk pages along beneficial indexes. Due to MVCC storage, tables that experience updates need periodic re-clustering to maintain this ordering. To do this, we need to run CLUSTER regularly.

Assuming we have two databases, named ‘trading’ and ‘operations’ for example, we could do these things using SQL commands:

\c trading VACUUM ANALYZE; CLUSTER; REINDEX TABLE user_session; REINDEX TABLE login_state;   \c operations VACUUM ANALYZE; CLUSTER;

This is all perfectly serviceable. Yet for administrators or end users that don’t really care about SQL syntax, or for operations we want to automate, there is an easier way. We could get the same result using three command-line tools:

vacuumdb --all --analyze reindexdb --table=user_session --table=login_state trading clusterdb --all   # Or with short options:   vacuumdb -az reindexdb -t user_session -t login_state trading clusterdb -a

Unlike their SQL equivalents, the command-line tools can combine operations or target multiple objects. We leveraged that to reindex both tables with a single command, and vacuum or cluster all databases in our instance. This is about the time our tools become more than functional wrappers of the SQL commands. Parameters like --all illustrate client-level looping at the very least, and provide a great segue into heavier utilities.

Backing up databases and restoring them, for instance, leverages multiple high and low level Postgres interfaces. Due to this complexity, there is no SQL analog. The Postgres backup and restore utilities have also matured a lot over the years, gaining parallel dumping functionality as recently as 9.3.

Taking this into account, we could clone one of the previously mentioned databases using provided tools in two different ways:

# Just copy a target database by itself:   pg_dump --jobs=2 --format=directory --file=backup_dir trading createdb trading_clone pg_restore --jobs=2 --dbname=trading_clone backup_dir   # Copy the whole freaking instance at the binary level.   pg_basebackup -D clone_dir

Just so we’re clear, both approaches have pluses and minuses as well as applicable scenarios beyond the scope of this article. We’re merely illustrating very basic usage. Beyond that, the first method is a variant of the common dump/restore pattern used since time immemorial to perform Postgres upgrades until pg_upgrade hit the scene in 9.0. One of the primary reasons it fell out of favor was due to the growing trend of immense databases.

Even using parallel functionality, dumping the contents of every table in a database 200GB or larger will be extremely slow. Restoring that information is even worse, as we not only have to import all of the same data, but all indexes and constraints must be rebuilt from scratch. Before pg_restore was capable of parallel restores in 8.4, restores were even more frustrating. This, along with the new online replication capabilities, is the origin of pg_basebackup.

By default, pg_basebackup merely utilizes the Postgres data replication stream and reconstructs the host instance in a specified directory. This mans we can start Postgres from that directory as a replica, or use pg_upgrade to test newer versions while the old one keeps running. This encourages online failbacks in case of failed upgrades, multiple upgrade test runs, or running applications on new versions to test for edge cases and compatibility concerns.

Principally, it allows us to separate instance management from content management. We still need pg_dump and pg_restore, but they’re no longer primarily backup tools. This is espicially true for pg_dump. It has multiple export formats which can theoretically be used to transfer table data into other storage systems. Or maybe we just want to export and inspect raw schema creation commands.

What else is left on the client end of things? For those that want to build Postgres extensions, or get information about the Postgres binaries installed on a server, we have pg_config. Its output is basically a bunch of environment settings that were used to build Postgres itself. This is especially useful with distributed builds that have long lists of arbitrary configure flags. After all, here’s how Ubuntu compiles it:

pg_config --configure   '--with-tcl' '--with-perl' '--with-python' '--with-pam' '--with-openssl' '--with-libxml' '--with-libxslt' '--with-tclconfig=/usr/lib/x86_64-linux-gnu/tcl8.6' '--with-includes=/usr/include/tcl8.6' 'PYTHON=/usr/bin/python' '--mandir=/usr/share/postgresql/9.4/man' '--docdir=/usr/share/doc/postgresql-doc-9.4' '--sysconfdir=/etc/postgresql-common' '--datarootdir=/usr/share/' '--datadir=/usr/share/postgresql/9.4' '--bindir=/usr/lib/postgresql/9.4/bin' '--libdir=/usr/lib/x86_64-linux-gnu/' '--libexecdir=/usr/lib/postgresql/' '--includedir=/usr/include/postgresql/' '--enable-nls' '--enable-integer-datetimes' '--enable-thread-safety' '--enable-tap-tests' '--enable-debug' '--disable-rpath' '--with-uuid=e2fs' '--with-gnu-ld' '--with-pgport=5432' '--with-system-tzdata=/usr/share/zoneinfo' 'CFLAGS=-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -I/usr/include/mit-krb5 -fPIC -pie -DLINUX_OOM_SCORE_ADJ=0 -fno-omit-frame-pointer' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -L/usr/lib/mit-krb5 -L/usr/lib/x86_64-linux-gnu/mit-krb5' '--with-krb5' '--with-gssapi' '--with-ldap' '--with-selinux' 'CPPFLAGS=-D_FORTIFY_SOURCE=2

Some of that might be nice to know on occasion.

We also have pg_isready, a “new” addition with 9.3. It’s just a quick connection check with an exit status that follows standard exit codes. This is good for basic monitoring, but not much else. On the other hand, we no longer have to run a bogus command through psql and deal with the much more verbose client-oriented output. On an automation front, that’s a major step forward.

And finally, there’s benchmarks. Perhaps we’re testing new hardware, or a new Postgres version, or maybe we’re just curious. Having recently escaped existence as a “mere” contrib tool in 9.5, now anyone and their dog can beat up Postgres for fun and profit. It’s great to obtain data for articles like this, too.

Here’s a quick example:

createdb pgbench pgbench --initialize --scale=100 pgbench pgbench --client=4 --jobs=2 --select-only --time=10 pgbench   starting vacuum...end. transaction type: SELECT only scaling factor: 100 query mode: simple number of clients: 4 number of threads: 2 duration: 10 s number of transactions actually processed: 168447 latency average: 0.237 ms tps = 16844.169409 (including connections establishing) tps = 16853.718425 (excluding connections establishing)

Generally read/write tests should be larger than memory to really stress a system, but we’d need a much larger scale for that on most modern systems. At scale 100 we get about 1.5GB over 10M rows of data, so on a small 8GB test VM, we’d need a scale of 600 or higher to even start hitting the disks.

Beyond the basic test framework, we can also execute arbitrary SQL scripts. This makes it possible to stress-test transactional databases with multiple parallel clients, running anything from simple SELECT statements on our own tables, to intricate stored procedures that update fact tables. As such, it’s one of the most valuable tools in the arsenal of both users and DBAs, provided either even know it exists.

There are a couple more tools I’ve left out, but they’re related to logical and stream replication and are better covered in a more in-depth manner. Beyond that, take a look at your Postgres binary directory; you might be surprised what’s hiding there.

Because distributions like to hide all the cool stuff, we may have to go looking for some of these utilities. Debian and Ubuntu users can find binaries in /usr/lib/postgresql/9.x/bin, while RHEL variants like CentOS should look in /usr/pgsql-9.x/bin.

Explore often, but do so cautiously!

Joshua Drake: PgConf.US: 2016 Kicking the donkey of PostgreSQL Replication

24 June, 2016 - 00:24

My slides from my presentation and PgConf.US 2016:


Hans-Juergen Schoenig: PostgreSQL instance-level encryption

23 June, 2016 - 12:25

We have recently published a patch to provide full database encryption for PostgreSQL. Many business critical applications require full encryption to match legal or business requirements. As more and more customers asked for this, we decided to tackle this important issue. The idea behind our patch is to store data on disk in encrypted format […]

The post PostgreSQL instance-level encryption appeared first on Cybertec - The PostgreSQL Database Company.

Ernst-Georg Schmid: When 'good enough' is good enough - approximate answers with PostgreSQL 9.4999

23 June, 2016 - 09:44
Approximation in databases seems to be an alien concept at first. But if your application can deal with a known and controllable degree of error, it can help even in cases where conventional tuning is not an option for whatever reason.
Approximation is not evil One of the top requirements for database systems is reliability. Whether you run a bank or a large retail business, you don't want to lose a cent here or there or charge your customer twice for the Pink Fluffy Unicorn he just bought, just because the DBMS gave a wrong answer. Classic OLTP operations have to be always 100% correct.

However, for the case of analytics, things become different. In some cases, it can be desirable to trade a bit of accuracy for a lot of speed. This is called approximation and to many database people (and users), the concept of accepting results with less than 100% accuracy seems strange at first.

But if you know - and can control - the error introduced by approximation, it is not. It can even be very useful, if a 95% accurate answer now is worth more than a 100% accurate answer tomorrow.

Welcome to approximation in PostgreSQL 9.5.
Approximating queriesApproximate queries work on subsets, called samples, of the whole data set, called the population.
If the sampling is done statistically correct, a sample much smaller than the whole population gives answers close to the real answer within a known error range.

A possible application for the hypothetical retail business would be to find which product is currently trending.
Instead of knowing that exactly 500, 1000 and 2000 Pink Fluffy Unicorns were sold in the last three weeks, knowing that 498, 1001 and 1999 Pink Fluffy Unicorns were sold in the last three weeks with let's say 5% error tells the procurement people that Pink Fluffy Unicorns are a trending product just as fine as the exact numbers. Only, they might have to wait a few seconds for the answer instead of a few hours...

PostgreSQL 9.5 has built-in support for approximate queries. Because I'm lazy and already wrote about this I just point to the corresponding post.
 Still, all the population data has to be there for approximate queries to work. How about running queries without storing the underlying data at all?Approximating data structures If PostgreSQL has a weakness, it's the comparably poor performance of count() and distinct. Due to the lock-free multiversion concurrency design of PostgreSQL, count() has to touch each row in a table to check whether it is visible in the current transaction or not. Unlike locking DBMS like Oracle, it can only use an index to count in a few cases . Full table scan.

Distinct always has to sort the table. It can use an index, but only covering indexes, and the larger the index is compared to the table, the less likely PostgreSQL will use it. Sorting can be tuned by raising work_mem, but since this is a per session parameter, it is limited by available RAM.

So count(distinct) is like the worst of both worlds (In the following example distinct alone is slower, because it has to return ten million rows to the client, count(distinct) returns only one value). Like here (times are w/o Index / w Index):   create table hmr(id serial, value real);

insert into hmr (value) select random()*10000000 from generate_series(1,10000000);

select count (value) from hmr; --826 msec. / 817 msec.

select distinct value from hmr; --33917 msec. / 32420 mesc.

select count (distinct value) from hmr; -- 9665 msec. / 9439 msec.
Enter the HyperLogLog cardinality estimator. Some clever people at Google observed, that the cardinality of a multiset of evenly distributed random numbers can be predicted by finding the maximum number of leading zeroes in the binary representation of those numbers: For a maximum of k leading zeroes, the cardinality is 2^k.
HyperLogLog uses a hash function to transform arbitrary input values into such random numbers and thus allows to estimate the cardinality of an input multiset for cardinalities > 10^9 with a 2-3% error, using only 1280 bytes of storage
PostgreSQL has a HyperLogLog extension, hll.   create extension hll;

CREATE TABLE cardinality (
            id      integer,
            set     hll

INSERT INTO cardinality(id, set)
    SELECT 1, (select hll_add_agg(hll_hash_any(value))
    FROM hmr); -- 2267 msec.

SELECT hll_cardinality(set)::int FROM cardinality WHERE id = 1; -- 11 msec.   Since count distinct(value) = 8187749 and hll_cardinality = 8470057, the error is ~3%

Another, not so PostgreSQL specific example would be a database that has a stream table, e.g. holding only one hour worth of events at any given point in time. I showed how to do this with stock PostgreSQL and a bit of Java here and here.

If you also want to know, how many distinct events that stream has seen in total, it's impossible, unless you store all distinct values and update their counts every time a new event arrives. But then, you might end up in storing all events - which is not what you wanted in the first place if you chose to use a stream table.

With HyperLogLog it's easy. Update your HyperLogLog estimator on every new event and you get a good approximation how many distinct values the stream has seen in total.
Approximating indexes9.5 introduced BRIN indexes for very large tables. Unlike e.g. a btree, BRIN stores only ranges of values and points to the physical pages where a value that falls into that range could possibly be found.
A BRIN index thus only gives precise answers to the question where a certain value could not be found on disk.
9.6 will have Bloom-Filter indexes as an extension. Bloom filters can tell you that a value does not exist in a set with perfect accuracy. But the question if a value exists in the set can only be answered with a probability that increases with the collision resilience of the underlying hash.

So, as BRIN and Bloom indexes both are approximating indexes, every index hit has to be rechecked by the DBMS against the underlying data. But if you know their limitations and use them wisely, they too can speed up your queries significantly.

Michael Paquier: Postgres 9.6 feature highlight, pg_visibility

23 June, 2016 - 08:25

PostgreSQL 9.6 is shipping with a new contrib module manipulating and giving some input on the visibility map of a relation:

Add pg_visibility contrib module. This lets you examine the visibility map as well as page-level visibility information. I initially wrote it as a debugging aid, but was encouraged to polish it for commit. Patch by me, reviewed by Masahiko Sawada. Discussion:

The visibility map, associated to a relation in its own file, which is named with the suffix _vm, tracks information related to the visibility of tuples on relation pages for each backend. Up to 9.5, 1 bit was used per heap page, meaning that if this bit is set all the tuples stored on this page are visible to all the transactions. In 9.6, 2 bits are being used, the extra bit added is used to track if all tuples on a given page have been frozen or not, critically improving VACUUM performance by preventing full table scans.

pg_visibility contains a couple of functions allowing one to get a look at the status of each page’s bits. The first one, and aimed at general purposes, gives an insight about the all-frozen and all-visible status for each page of a relation, plus the status of PD_ALL_VISIBLE which is the same information as the all-visible flag except that it is stored in the heap page itself and not the VM file:

=# CREATE TABLE tab_visible AS SELECT generate_series(1,1000) AS id; SELECT 1000 =# SELECT * FROM pg_visibility('tab_visible'::regclass); blkno | all_visible | all_frozen | pd_all_visible -------+-------------+------------+---------------- 0 | f | f | f 1 | f | f | f 2 | f | f | f 3 | f | f | f 4 | f | f | f (5 rows)

This function can take an optional argument in the shape of a block number. pg_visibility_map is similar to the previous function, except that it does not scan the all-visible flag value on the page and it just fetches what is available on the visibility map.

Then come the sanity checkers: pg_check_visible and pg_check_frozen that return a list of TIDs where refer to tuples that are respectively not all-visible and all-frozen even if the page they are on is marked as such. Those functions returning an empty set means that the database is not corrupted. If there are entries. Oops.

=# SELECT pg_check_visible('tab_visible'::regclass); pg_check_visible ------------------ (0 rows) =# SELECT pg_check_frozen('tab_visible'::regclass); pg_check_frozen ----------------- (0 rows)

And finally is a function that may become useful for maintenance purposes: pg_truncate_visibility_map which removes the visibility map of a relation. The next VACUUM that runs on this relation will forcibly rebuilt the visibility map of the relation. Note that this action is WAL-logged.

=# SELECT pg_truncate_visibility_map('tab_visible'::regclass); pg_truncate_visibility_map ---------------------------- (1 row)

Ian Barwick: Letter from Japan

22 June, 2016 - 10:16

Greetings, honourable reader.

My name is Barwick of the 2ndQuadrant Company. With the onset of the Plum Rain, in my humble estimation it has become hot and humid recently. I trust all is well in your exalted undertakings?

No, I haven’t flipped a bit – the above is actually pretty much how a Japanese business letter – or email – starts off, and there’s a plethora of sites such as this one providing examples of various kinds of formal email. But fear not, this won’t be a lesson in written Japanese business etiquette, but the first in an occasional series of posts from Japan, a country which embraced PostgreSQL very early on and has contributed much to its development.

A potted PostgreSQL history, Japanese style

Back in 2006, on a visit to Japan before I started living here, I happened to wander into a bookstore and was looking for something on handling Japanese full-text search (a tricky subject in a language which has three or four different writing systems and multiple ways of writing the same word but no spaces), when something odd struck me – there were clearly more PostgreSQL books than MySQL ones. This was at a time when PostgreSQL was globally still in a bit of a niche and an equivalent bookshop in Europe would feature a couple of often dated volumes
cowering next to masses of dolphin-orientated literature.

Occasionally it’s postulated that PostgreSQL’s popularity in Japan is because it’s easier to pronounce than MySQL – this is a country where “Los Angeles” is abbreviated to “Rosu“, so it’s no surprise that the even more cumbersome rendering of PostgreSQL (“posutogure-esu-kyuu-eru“) is shortened to “posugure” (look it up on Google if you don’t believe me) – whereas the equivalent “mai” is short and ambiguous to use in normal conversation.

However the reality is more prosaic – during the mid/late 1990s, as the internet was taking off and open source databases were becoming viable, PostgreSQL got a head start in the accessibility stakes. A Japanese-language mailing list was set up as far back as Postgres95 and the Japan PostgreSQL User Group (JPUG) was founded in 1999, from which a vibrant community has emerged, providing seminars, conferences and perhaps crucially, very up-to-date Japanese language translations of the PostgreSQL documentation. And the rest, as they say, is history – PostgreSQL is now supported by most of the big players in the IT industry here, some of whom are also household names globally, and of the 22 PostgreSQL committers, three are from Japan (listed in order of first commit: Tatsuo Ishii, Itagaki Takahiro and Fujii Masao).

In future posts I’ll be writing about PostgreSQL-related subjects in Japan and hope to present some new perspectives on aspects which aren’t widely known in the outside world.

Oh, and that “Plum Rain” I mentioned? It’s not an actual precipitation of fruit, but the literal translation of “tsuyu“, the hot sticky rainy period between spring and summer, often described as the fifth of Japan’s four seasons.

Hans-Juergen Schoenig: BRIN indexes: Correlation, correlation, correlation

21 June, 2016 - 12:01

Since BRIN indexes have been introduced in PostgreSQL 9.5, many people have gladly adopted this new index type. A lot has been written about this new feature and a lot of positive feedback has been reported. While BRIN indexes are clearly a success and definitely a win, some people tend to exagerate and use them […]

The post BRIN indexes: Correlation, correlation, correlation appeared first on Cybertec - The PostgreSQL Database Company.

gabrielle roth: RDS: log vacuum messages

21 June, 2016 - 04:36
Last year I commented that “autovacuum log messages are still missing in [RDS] 9.4“. Amazon fixed that for versions 9.4.5 and up with the custom GUC rds.force_autovacuum_logging_level. I discovered this week that it’s also available on earlier versions of RDS – I have it enabled on a 9.3.10 instance. The default value in the config […]