Planet PostgreSQL
Bruce Momjian: Why People Contribute to Postgres and Open Source
There was a rather philosophical discussion on the email lists last week about why people contribute to Postgres and open source that is definitely worth reading. I believe the most profound comment was this:
The work on PostgreSQL is adventure, and very good experience, very good school for me. It's job only for people who like programming, who like hacking, it isn't job for people, who go to office on 8 hours. Next I use PostgreSQL for my job - and hacking on PostgreSQL put me a perfect knowledge, perfect contacts to developers, and I can work together with best programmers on planet. and I can create some good things. Probably if I work on commercial projects I can have a better money - but life is only one, and money is important, but not on top for me - life have to be adventure!
Robert Treat: BWPUG September Meeting 2010-09-08: PostgreSQL Security and SE-Postgres
When: Septmber 8th, ~6:30PM.
Where: 7070 Samuel Morse Dr, Columbia, MD, 21042.
Host: OmniTI
As always we have time for networking and likely hit one of the local
restaurants after the meeting, hope to see you there.
Joshua D. Drake: PgWest 2010, CFP about to close!
Dimitri Fontaine: Synchronous Replication
Although the new asynchronous replication facility that ships with 9.0 ain't released to the wide public yet, our hackers hero are already working on the synchronous version of it. A part of the facility is rather easy to design, we want something comparable to DRBD flexibility, but specific to our database world. So synchronous would either mean recv, fsync or apply, depending on what you need the standby to have already done when the master acknowledges the COMMIT. Let's call that the service level.
The part of the design that's not so easy is more interesting. Do we need to register standbys and have the service level setup per standby? Can we get some more flexibility and have the service level set on a per-transaction basis? The idea here would be that the application knows which transactions are meant to be extra-safe and which are not, the same way that you can set synchronous_commit to off when dealing with web sessions, for example.
Why choosing? I hear you ask. Well, it's all about having more data safety, and a typical setup would contain an asynchronous reporting server and a local failover synchronous server. Then add a remote one, too. So even if we pick the transaction based facility, we still want to be able to choose at setup time which server to failover to. Than means we don't want that much flexibility now, we want to know where the data is safe, we don't want to have to guess.
Some way to solve that is to be able to setup a slave as being the failover one, or say, the sync one. Now, the detail that ruins it all is that we need a timeout to handle worst cases when a given slave loses its connectivity (or power, say). Now, the slave ain't in sync any more and some people will require that the service is still available (timeout but COMMIT) and some will require that the service is down: don't accept a new transaction if you can't make its data safe to the slave too.
The answer would be to have the master arbitrate between what the transaction wants and what the slave is setup to provide, and what it's able to provide at the time of the transaction. Given a transaction with a service level of apply and a slave setup for being async, the COMMIT does not have to wait, because there's no known slave able to offer the needed level. Or the COMMIT can not happen, for the very same reason.
Then I think it all flows quite naturally from there, and while arbitrating the master could record which slave is currently offering what service level. And offering the information in a system view too, of course.
The big question that's not answered in this proposal is how to setup that being unable to reach the wanted service level is an error or a warning?
That too would need to be for the master to arbitrate based on a per standby and a per transaction setting, and in the general case it could be a quorum setup: each slave is given a weight and each transaction a quorum to reach. The master sums up the weights of the standby that ack the transaction at the needed service level and the COMMIT happens as soon as the quorum is reached, or is canceled as soon as the timeout is reached, whichever comes first.
Such a model allows for very flexible setups, where each standby has a weight and offers a given service level, and each transaction waits until a quorum is reached. Giving the right weights to your standbys (like, powers of two) allow you to set the quorum in a way that only one given standby is able to acknowledge the most important transactions. But that's flexible enough you can change it at any time, it's just a weight that allows a sum to be made, so my guess would be it ends up in the feedback loop between the standby and its master.
The most appealing part of this proposal is that it doesn't look complex to implement, and should allow for highly flexible setups. Of course, the devil is in the details, and we're talking about latencies in the distributed system here. That's also being discussed on the mailing list.
PostgreSQL Weekly News: PostgreSQL Weekly News September 5th 2010
Selena Deckelmann: Background reading: Locking (1970)
I’ve been reading some old papers about locking and MVCC in preparation for writing about MVCC in PostgreSQL, and for giving a talk at CouchCamp next week!
I just finished “Locking“, by Jim Gray. He discusses semaphores, and makes the argument for implementing a locking scheduler to handle errors and deadlocks (which he calls interlocks, or a “deadly embrace” – a term I’m sad we’ve stopped using).
An example from the start of the paper illustrates the power of MVCC:
The classic example is an accounting file. Processes reading the file may share it
concurrently. However, a process requesting write access to the file blocks until all processes currently reading have released the file.
A lovely thing about Postgres’ MVCC is that readers (SELECT) don’t require this type of lock, and most writers don’t block readers. For SELECT, the only statements that will block it are those that make changes to tables which move all rows physically around (VACUUM FULL, CLUSTER, REINDEX, TRUNCATE), or make changes to table structure (ALTER TABLE, DROP TABLE).
Have a look at the explicit locking docs for more detail on the lock modes automatically used by PostgreSQL.
Related posts:
Selena Deckelmann: Explaining MVCC in Postgres: system defined columns
I’m playing around with some diagrams for explaining MVCC that I’ll be posting here over the next few days. Not sure if I’ll end up giving up on slides and just use a whiteboard for the talk. I made an illustrated shared buffers deck to go along with Greg Smith’s excellent talk on shared buffers a while back. This is the beginning of a talk that I hope will emulate that.
Here are my first few slides, showing the system-defined columns. The next few slides will describe optimizations PostgreSQL has for managing the side effects of our pessimistic rollback strategy, and reducing IO during vacuuming and index updates.
Related posts:
Selena Deckelmann: Using logger with pg_standby
Piping logs to syslog is pretty useful for automating log rotation and forwarding lots of different logs to a central log server.
To that end, the command-line utility ‘logger’ is nice for piping output from utilities like pg_standby without having to add syslogging code to the utility itself. Another thing is that logger comes by default with modern packages of syslog.
Here’s an easy way to implement this:
restore_command = 'pg_standby -d -s 2 -t /pgdata/trigger /shared/wal_archive/ %f %p %r 2>&1 | logger -p local3.info -t pgstandby'
Related posts:
Andreas Scherbaum: PostgreSQL @ FrOSCamp 2010 in Zurich
Andreas 'ads' Scherbaum
There's a new FOSS conference taking place at the ETH in Zurich, called FrOSCamp. It's a two day conference but unlike many others it's on friday and saturday, september 17th/18th. FUDCon, the Fedora conference, is running parallel to FrOSCamp. Same time, same location.
The PostgreSQL guys from Switzerland and myself submitted a project booth and several talks and workshops. Four talks and workshops as well as the booth got accepted. You can attend:
- PostGIS/PostgreSQL for interactive maps (talk, speaker: Stefan Keller)
- Replication with PostgreSQL 9.0 (workshop, speaker: Andreas Scherbaum)
- PostgreSQL administration and optimization (talk, speaker: Stephan Wagner)
- What's new in PostgreSQL 9.0? (talk, speaker: Andreas Scherbaum)
Come and visit us in Zurich!
Selena Deckelmann: Variable substitution with psql
Updated: Thanks @johto for s/:bar/:foo/.
A coworker asked about variable substitution with psql using \set, and so I looked into it a bit further.
You definitely can do things like this:
16:55 sdeckelmann@[local]:5432|postgres=> \set test 'select * from :foo limit 10;'
16:56 sdeckelmann@[local]:5432|postgres=> \set foo 'test'
16:56 sdeckelmann@[local]:5432|postgres=> :test
myint
-------
1
2
3
4
5
6
7
8
9
10
(10 rows)
But, what about something like this:
=> \set test 'select * from :var limit 10;'
=> :test mytable
Unfortunately, this isn’t supported.
The best you could do is something pathological like:
=> \set s 'select * from '
=> \set pr ' limit 10;'
=> :s mytable :pr
=> :s test :pr
myint
-------
1
2
3
4
5
6
7
8
9
10
(10 rows)
Related posts:
Dimitri Fontaine: Happy Numbers
After discovering the excellent Gwene service, which allows you to subscribe to newsgroups to read RSS content (blogs, planets, commits, etc), I came to read this nice article about Happy Numbers. That's a little problem that fits well an interview style question, so I first solved it yesterday evening in Emacs Lisp as that's the language I use the most those days.
A happy number is defined by the following process. Starting with any positive integer, replace the number by the sum of the squares of its digits, and repeat the process until the number equals 1 (where it will stay), or it loops endlessly in a cycle which does not include 1. Those numbers for which this process ends in 1 are happy numbers, while those that do not end in 1 are unhappy numbers (or sad numbers).
Now, what about implementing the same in pure SQL, for more fun? Now that's interesting! After all, we didn't get WITH RECURSIVE for tree traversal only, did we?
Unfortunately, we need a little helper function first, if only to ease the reading of the recursive query. I didn't try to inline it, but here it goes:
create or replace function digits(x bigint) returns setof int language sql as $$ select substring($1::text from i for 1)::int from generate_series(1, length($1::text)) as t(i) $$;That was easy: it will output one row per digit of the input number — and rather than resorting to powers of ten and divisions and remainders, we do use plain old text representation and substring. Now, to the real problem. If you're read what is an happy number and already did read the fine manual about Recursive Query Evaluation, it should be quite easy to read the following:
with recursive happy(n, seen) as ( select 7::bigint, '{}'::bigint[] union all select sum(d*d), h.seen || sum(d*d) from (select n, digits(n) as d, seen from happy ) as h group by h.n, h.seen having not seen @> array[sum(d*d)] ) select * from happy; n | seen -----+------------------ 7 | {} 49 | {49} 97 | {49,97} 130 | {49,97,130} 10 | {49,97,130,10} 1 | {49,97,130,10,1} (6 rows) Time: 1.238 msThat shows how it works for some happy number, and it's easy to test for a non-happy one, like for example 17. The query won't cycle thanks to the seen array and the having filter, so the only difference between an happy and a sad number will be that in the former case the last line output by the recursive query will have n = 1. Let's expand this knowledge into a proper function (because we want to be able to have the number we test for happiness as an argument):
create or replace function happy(x bigint) returns boolean language sql as $$ with recursive happy(n, seen) as ( select $1, '{}'::bigint[] union all select sum(d*d), h.seen || sum(d*d) from (select n, digits(n) as d, seen from happy ) as h group by h.n, h.seen having not seen @> array[sum(d*d)] ) select n = 1 as happy from happy order by array_length(seen, 1) desc nulls last limit 1 $$;We need the desc nulls last trick in the order by because the array_length() of any dimension of an empty array is NULL, and we certainly don't want to return all and any number as unhappy on the grounds that the query result contains a line input, {}. Let's now play the same tricks as in the puzzle article:
=# select array_agg(x) as happy from generate_series(1, 50) as t(x) where happy(x); happy ---------------------------------- {1,7,10,13,19,23,28,31,32,44,49} (1 row) Time: 24.527 ms =# explain analyze select x from generate_series(1, 10000) as t(x) where happy(x); QUERY PLAN ---------------------------------------------------------------------------------------- Function Scan on generate_series t (cost=0.00..265.00 rows=333 width=4) (actual time=2.938..3651.019 rows=1442 loops=1) Filter: happy((x)::bigint) Total runtime: 3651.534 ms (3 rows) Time: 3652.178 ms(Yes, I tricked the EXPLAIN ANALYZE output so that it fits on the page width here). For what it's worth, finding the first 10000 happy numbers in Emacs Lisp on the same laptop takes 2830 ms, also running a recursive version of the code.
Pavel Stehule: performance issue of SQL functions
CREATE OR REPLACE FUNCTION is_empty(text)
RETURNS bool AS $$
SELECT $1 IS NULL OR $1 = ''
$$ LANGUAGE sql;
This function returns true when parameter is empty string or is null. The first parameter is used twice and then the body of this function isn't inlined.
postgres=# EXPLAIN VERBOSE SELECT count(is_empty(CASE WHEN random() 0.5 THEN NULL ELSE 'x' END)) FROM generate_series(1,100000);
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Aggregate (cost=12.50..12.77 rows=1 width=0)
Output: count(is_empty(CASE WHEN (random() 0.5::double precision) THEN NULL::text ELSE 'x'::text END))
-> Function Scan on pg_catalog.generate_series (cost=0.00..10.00 rows=1000 width=0)
Output: generate_series
Function Call: generate_series(1, 100000)
(5 rows)
The execution time is 458ms. When I modify the body for one time paremeter using:
CREATE OR REPLACE FUNCTION is_empty(text)
RETURNS bool AS $$
SELECT COALESCE($1,'') = ''
$$ LANGUAGE sql;
Then function is inlined:
postgres=# EXPLAIN VERBOSE SELECT count(is_empty(CASE WHEN random() 0.5 THEN NULL ELSE 'x' END)) FROM generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=12.50..12.52 rows=1 width=0)
Output: count((COALESCE(CASE WHEN (random() 0.5::double precision) THEN NULL::text ELSE 'x'::text END, ''::text) = ''::text))
-> Function Scan on pg_catalog.generate_series (cost=0.00..10.00 rows=1000 width=0)
Output: generate_series
Function Call: generate_series(1, 100000)
(5 rows)
and execution time is only 68ms (5 x faster than not inlined function).Attention: this note is valid for some more complex parameters - like volatile functions. The life isn't simple - for basic parameters (like columns or simple expression), the most simple SQL function (in my article function with OR operator and twice used parameter) can be better - wirking better with planner.
PostgreSQL Weekly News: PostgreSQL Weekly News August 29th 2010
Selena Deckelmann: Online aggregation paper from 1997 and PSUs database reading group
A couple weeks ago, Mark Wong and I took a field trip over to the Database Reading Group at Portland State University. It’s a group of students and professors that meet weekly throughout the school year to go over research papers. The papers are picked by the participants, and vary in topic from obscure to very practical.
This week’s paper reading was led by Professor Len Shapiro, and titled “Online Aggregation“. The paper is considered a foundational paper about SQL aggregates (like COUNT() or AVERAGE), and was published in 1997 by researchers from UC Berkeley and IBM. It’s also precursor to research into query parallelization and streaming databases. It was also awarded the SIGMOD “Test of Time” award in 2007, and is cited by over 170 other papers in the ACM archive.
The basic idea behind the paper centered around how to improve user experience in reporting results of aggregate queries – asking questions about how to solve three key problems when solving aggregates: blocking, fairness and control (from a user’s perspective). Roughly: Blocking is what happens when some part of the system waits and doesn’t return results to the user as a result of the waiting. Fairness concerns whether certain types of operations prevent certain groups of data from being processed (the example given had to do with GROUP BY and groups being processed one at a time). Control concerns whether or not a user can exert control over the speed of computation applied to a group (example given being a lever that “speeds up” processing of a set).
One insight from the paper is how online aggregates should be treated differently than traditional query processing – which might favor expensive plans involving sorts so that the output is ordered. When you’re dealing with online aggregates, you prefer unordered, or ideally random order, because your intermediate results will be more representative of the ultimate result. I guess that’s probably obvious once you think about it, but the paper provided some concrete examples.
Another interesting thought experiment involving the planner is how you pick plans that favor non-blocking, fairness and user control. Each of those properties is not narrowly defined, and changes based on individual user expectation. Professor Kristen Tufte mentioned that she’d be interested in how the ideas presented in this paper would be applied today, and Professor David Meier brought up that we might most be interested in applications involving managing Hadoop.
Prof Meier also brought up an interesting paper involving alternating nested loop joins during a discussion about optimizing JOIN algorithms for online aggregates. Another cool thing about the paper is that it involved modifications to Postgres! Granted, it was Postgres95, which doesn’t resemble the modern PostgreSQL 9.0 very much. But it was nice to revisit research that used Postgres that’s still relevant today.
Related posts:
Dimitri Fontaine: Playing with bit strings
The idea of the day ain't directly from me, I'm just helping with a very thin subpart of the problem. The problem, I can't say much about, let's just assume you want to reduce the storage of MD5 in your database, so you want to abuse bit strings. A solution to use them works fine, but the datatype is still missing some facilities, for example going from and to hexadecimal representation in text.
create or replace function hex_to_varbit(h text) returns varbit language sql as $$ select ('X' || $1)::varbit; $$; create or replace function varbit_to_hex(b varbit) returns text language sql as $$ select array_to_string(array_agg(to_hex((b << (32*o))::bit(32)::bigint)), '') from (select b, generate_series(0, n-1) as o from (select $1, octet_length($1)/4) as t(b, n)) as x $$;To understand the magic in the second function, let's walk through the tests one could do when wanting to grasp how things work in the bitstring world (using also some reading of the fine documentation, too).
=# select ('101011001011100110010110'::varbit << 0)::bit(8); bit ---------- 10101100 (1 row) =# select ('101011001011100110010110'::varbit << 8)::bit(8); bit ---------- 10111001 (1 row) =# select ('101011001011100110010110'::varbit << 16)::bit(8); bit ---------- 10010110 (1 row) =# select * from *TEMP VERSION OF THE FUNCTION FOR TESTING* o | b | x ---+----------------------------------+---------- 0 | 10101100101111010001100011011011 | acbd18db 1 | 01001100110000101111100001011100 | 4cc2f85c 2 | 11101101111011110110010101001111 | edef654f 3 | 11001100110001001010010011011000 | ccc4a4d8 (4 rows)What do we get from that, will you ask? Let's see a little example:
=# select hex_to_varbit(md5('foo')); hex_to_varbit ---------------------------------------------------------------------------------------------------------------------------------- 10101100101111010001100011011011010011001100001011111000010111001110110111101111011001010100111111001100110001001010010011011000 (1 row) =# select md5('foo'), varbit_to_hex(hex_to_varbit(md5('foo'))); md5 | varbit_to_hex ----------------------------------+---------------------------------- acbd18db4cc2f85cedef654fccc4a4d8 | acbd18db4cc2f85cedef654fccc4a4d8 (1 row)Storing varbits rather than the text form of the MD5 allows us to go from 6510 MB down to 4976 MB on a sample table containing 100 millions rows. We're targeting more that that, so that's a great win down here!
In case you wonder, querying the main index on varbit rather than the one on text for a single result row, the cost of doing the conversion with varbit_to_hex seems to be around 28 µs. We can afford it.
Hope this helps!
Leo Hsu and Regina Obe: Explain Plans PostgreSQL 9.0 - Part 2: JSON and JQuery Plan Viewer
In part 1 of this series on PostgreSQL 9.0 planner outputs, we demonstrated how to render explain plans in YAML, JSON, and XML using the new explain features in PostgreSQL 9.0. In this second part, we'll demonstrate how to build a user interface that allows you input a JSON formatted explain plan and have it render into a printable, navigateable display using JQuery, javascript and a little bit of HTML coding. In part 3 we'll do something similar using XML and XSLT programming.
For those who aren't familiar with JQuery, it is an MIT licensed javascript library that is fairly light weight and allows you to inspect and change html elements with fairly intuitive syntax, has some nice ajax methods and tools for converting xml/json to native objects that can be manipulated. You can check it out at JQUERY.
We are not experts in JQuery, but from what we have used of it, we really like it and the fact the base package is MIT licensed, fairly light weight and lots of plugins available for it are real pluses.
The most difficult thing I think most people find about reading explain plans is that they are upside down; it starts with a conclusion and backtracks how to arrive at it. Humans by nature think about planning steps from start to finish. In order to make an explain plan understandable to mere mortals, we generally display them upside down or having the child-nodes shown left most. We shall follow that approach.
Continue reading "Explain Plans PostgreSQL 9.0 - Part 2: JSON and JQuery Plan Viewer"
Joshua D. Drake: PgWest: 2010 Call for Papers (2nd call)
Here is the announcement for everyone to review, enjoy and click on the CFP link:
Following on the smashing success of PostgreSQL Conference East, PostgreSQL Conference West, The PostgreSQL Conference for Decision Makers, End Users and Developers, is being held at the Sir Francis Drake Hotel in San Francisco from November 2nd through 4th 2010. Please join us in making this the largest PostgreSQL Conference to date!
Thank you to our sponsors: Founding: Command Prompt Diamond: EnterpriseDB
Time line:July 14th: Talk submission opens Sept 5th: Talk submission closes Sept 10th: Speaker notification This year we will be continuing our trend of covering the entire PostgreSQL ecosystem. We would like to see talks and tutorials on the following topics:
* General PostgreSQL: * Administration * Performance * High Availability * Migration * GIS * Integration * Solutions and White Papers * The Stack: * Python/Django/Pylons/TurboGears/Custom * Perl5/Catalyst/Bricolage * Ruby/Rails * Java (PLJava would be great)/Groovy/Grails * Operating System optimization (Linux/FBSD/Solaris/Windows) * Solutions and White Papers
Hubert Lubaczewski: Waiting for 9.1 – concat, concat_ws, right, left, reverse
Andrew Dunstan: Adding enums
I posted a work in progress patch on the -hackers list yesterday. In it current form, it will allow you to do these things:
- ALTER TYPE myenum ADD 'newlabel';
- ALTER TYPE myenum ADD 'newlabel' BEFORE 'existinglabel';
- ALTER TYPE myenum ADD 'newlabel' AFTER 'existinglabel';
The last one is not strictly necessary, and there is some opinion both for and against having it.
If labels are added at the end of the list (the first form does this) then, unless there is Oid wraparound, the enum Oids are in sorted order, which makes sorting and comparison operations very fast, as now. Labels added earlier in the list will sort and compare correctly, but not quite as fast.
Note that I have not provided for dropping existing labels, nor for changing the sort order of existing labels. The reason is that this would involve rewriting tables and indexes, where the current proposal will not require any such thing.
Francisco Figueiredo Jr: Npgsql receives donation of an MSDN subscription!!
It all started when Josh Cooley told me about this post: http://devlicio.us/blogs/tuna_toksoz/archive/2010/07/27/codebetter-devlicio-us-msdn-ultimate-giveaways.aspx
I sent a mail talking about Npgsql and how a VS.Net would help us to add design time support and today I received a very nice mail saying that Npgsql was choosen to receive an MSDN subscription!
I'd like to thank the people who contributed to make this possible: Codebetter Crew: Ben Hall, Ward Bell, James Kovacs. Devlicious Crew: Hadi Hariri, Christopher Bennage, Tim Barcz, Rob Reynolds. Lostechies Crew: Eric Hexter, Jimmy Bogard, Keith Dahlby and Josh Cooley for heads up!
Thank you very much! Your support to OSS projects is awesome!
And stay tuned for better support of Npgsql inside VS.Net :)
