Dennis Ritchie, pioneer of programming, has died

October 13, 2011 Programming, Unix 1 comment

Dennis Ritchie, pioneer of programming, creator of the C programming language and one of the creators of UNIX, has died.

Tim Bray said it best: “It is impossible — absolutely impossible — to overstate the debt my profession owes to Dennis Ritchie. I’ve been living in a world he helped invent for over thirty years.”  If you’ve written a program on any computer in any language since the mid-1970s, you’ve been influenced by the man’s work.

In his honor, the choir will now sing one of my favorite programming songs, “Write in C”.

When I find my code in tons of trouble,
Friends and colleagues come to me,
Speaking words of wisdom:
“Write in C.”

And as the deadline fast approaches,
And bugs are all that I can see,
Somewhere someone whispers:
“Write in C.”

Write in C, write in C,
Write in C, yeah, write in C.
Don’t even think of COBOL
Write in C.

If you’ve just spent 30 hours
Debugging some assembly,
Soon you will be glad to
Write in C.

I used to write a lot of FORTRAN,
For science it worked flawlessly.
Try using it for graphics!
Write in C.

Write in C, write in C,
Write in C, oh, write in C.
BASIC’s not the answer,
Write in C.

Now, let’s move forward with our lives, and remember to always place the opening brace of a block on the same line as the control statement, as he would have wanted it.

Which of my PostgreSQL indexes are getting used most heavily?

October 10, 2011 Open source, Programming No comments

Ever since we got the fast new database server with SSDs, I’ve been monitoring which tables are getting heavy traffic and should go live on the SSDs. We have two tablespaces, “fast” which is faster but smaller, and “slow” which is bigger but slower. I’ve been using this query to determine which indexes should live in which tablespace. There are different forms of this query around the web, but I needed to see the tablespaces, too.

SELECT
    i.idx_scan,
    i.idx_tup_read,
    i.idx_tup_fetch,
    i.indexrelname AS index,
    it.spcname AS index_tablespace,
    i.relname AS table,
    tt.spcname AS table_tablespace,
    pg_size_pretty(pg_relation_size(i.indexrelname::text)) as index_size
FROM pg_stat_all_indexes i
    INNER JOIN pg_class ic ON (i.indexrelid = ic.oid)
    LEFT OUTER JOIN pg_tablespace it ON (ic.reltablespace = it.oid)
    INNER JOIN pg_class tc ON (i.relid = tc.oid)
    LEFT OUTER JOIN pg_tablespace tt ON (tc.reltablespace = tt.oid)
ORDER BY 1 desc, 2 desc, 3 desc

The output looks like this (in \x mode because of the width):

-[ RECORD 1 ]----+----------------------------------------------------
idx_scan         | 395974172
idx_tup_read     | 432974893
idx_tup_fetch    | 426070104
index            | testbook_pkey
index_tablespace | fast
table            | testbook
table_tablespace | fast
index_size       | 289 MB
-[ RECORD 2 ]----+----------------------------------------------------
idx_scan         | 133416135
idx_tup_read     | 133441801
idx_tup_fetch    | 133413399
index            | lists_listid_custid
index_tablespace | fast
table            | lists
table_tablespace | fast
index_size       | 7096 kB
-[ RECORD 3 ]----+----------------------------------------------------
idx_scan         | 50310975
idx_tup_read     | 1286116
idx_tup_fetch    | 742639
index            | listdetail_bkkey_listid_where_ctr2_is_zero
index_tablespace | fast
table            | listdetail
table_tablespace | fast
index_size       | 682 MB

I have one case where a heavily-trafficked table is still staying on the slow tablespace. It’s a log of user login history that is only ever appended to, and is searched only a few times a day. SSDs are great at random reads, but not much faster than physical spindles on sequential writes. Therefore, my login history would not benefit much from moving to the SSD tablespace, and I can allocate that precious space to another table or index instead.

Watch for the surprises

September 23, 2011 Programming, Work life No comments

Look for and act on the surprises around you every day.  That’s where we have the most opportunity to make changes for the better.

Yesterday, amidst all the discussion of CERN’s announcement that it seemed to have measured neutrinos moving faster than light, Mark Jason Dominus reminded me of the line often attributed to Asimov that the sound of scientific breakthrough is not “Eureka!” but “That’s funny…”

Back in the late 90s, I had a new boss in the IT department. When he called his first staff meeting, he told us to bring some sort of metrics from our areas of responsibility.  I’d recently created the company’s intranet, so I ran some reports of hits by top-level directory from the Apache logs.  My boss gave my stats a cursory glance, handed them back and asked “So what surprised you?”  The question itself surprised me.  “Now that I look at it,” I answered, “I didn’t expect that the /foo directory would be getting so many hits.”  Then came his crucial follow-up: “So what should we do about it?”

So what’s going on around you that you didn’t expect?  Are you even looking?  How?  Where?

Look at your server log files.  Try to absorb patterns.  Who’s requesting the same non-existent .GIF file every 11 minutes?
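
One way to start looking is to count which missing files get requested most often. The following is a minimal Python sketch, assuming Apache common or combined log format; the sample lines, the `count_missing` helper and the paths are hypothetical, and real input would come from your own access log.

```python
import re
from collections import Counter

# Matches the request and status fields of an Apache common/combined log line.
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

def count_missing(lines):
    """Return a Counter of request paths that were answered with a 404."""
    missing = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            missing[match.group("path")] += 1
    return missing

# Hypothetical sample lines; real input would come from open("access_log").
sample = [
    '10.0.0.5 - - [23/Sep/2011:03:30:01 -0500] "GET /img/spacer.GIF HTTP/1.1" 404 209',
    '10.0.0.5 - - [23/Sep/2011:03:41:01 -0500] "GET /img/spacer.GIF HTTP/1.1" 404 209',
    '10.0.0.9 - - [23/Sep/2011:03:42:17 -0500] "GET /index.html HTTP/1.1" 200 5120',
]

# Print the ten most frequently requested missing paths.
for path, hits in count_missing(sample).most_common(10):
    print(hits, path)
```

Anything that shows up at the top of that list and that you did not expect is a candidate for the "So what should we do about it?" question.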

Run a profiler on your code. Why is the sort function being called so many times?  Why would a simple string transformation function take so long to execute?

Measure your system performance with a tool like Munin or Cacti.  Look for use spikes.  What happens at 3:30am that thrashes the system?  Why do the cache hits drop to near zero twice a day?

Always keep an eye open for the unexpected behavior, the strange blip, the neutrino that took 60 nanoseconds longer to get there than expected.  Then follow up on what you find.

Notes and comments from Postgres Open 2011

September 22, 2011 Programming 1 comment

As with my notes and comments from OSCON 2011, here are my notes and comments from Postgres Open 2011. Some of it is narrative, and some of it is just barely-formatted notes. The target here is my own use of what was most interesting and useful for me at work, but I make them public here for anyone who’s interested.

Mastering PostgreSQL Administration

Bruce Momjian

Most of this stuff I knew already, so the notes are short.


  • local — Unix sockets
    • Significantly faster than going through host
  • host — TCP/IP, both SSL and non-SSL
  • hostssl — only SSL
    • Can delay connection startup by 25-40%
  • hostnossl — never SSL

Template databases

  • You can use template databases to make a standard DB for when you create new ones. For example, if you want to always have a certain function or table, put it in template1. This works with extensions and contrib modules like pgcrypto.

Data directory

  • xxx_fsm files are freespace map
  • pg_xlog is the WAL directory
  • pg_clog is the commit status log

Config file settings

  • shared_buffers should be 25% of total RAM for dedicated DB servers. Don’t go over 40-50% or machine will starve. Also, overhead of that many buffers is huge.
  • If you can get five minutes of your working set into shared_buffers, you’re golden.
  • Going over a couple hundred connections, it’s worth it to look at a pooler.
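
The shared_buffers rule of thumb above is simple arithmetic. A small Python sketch, assuming a hypothetical dedicated server with 32 GB of RAM and taking the lower 40% figure as the ceiling; the function name is made up for illustration:

```python
def shared_buffers_range_mb(total_ram_mb):
    """Return (suggested, ceiling) in MB using the 25% / 40% rules of thumb."""
    suggested = total_ram_mb * 25 // 100
    ceiling = total_ram_mb * 40 // 100
    return suggested, ceiling

ram_mb = 32 * 1024  # hypothetical dedicated DB server with 32 GB of RAM
suggested, ceiling = shared_buffers_range_mb(ram_mb)
print(f"shared_buffers = {suggested}MB  # stay below {ceiling}MB")
```

Treat the result as a starting point to measure against, not as a setting to apply blindly.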

Analyzing activity

  • Heavily-used tables
  • Unnecessary indexes
  • Additional indexes
  • Index usage
  • TOAST usage

Identifying slow queries and fixing them

Stephen Frost


  • MergeJoin for small data sets?
    • Check work_mem
  • Nested Loop with a large data set?
    • Could be bad row estimates.
  • DELETEs are slow?
    • Make sure you have indexes on foreign keys
  • Harder items
    • Check over your long-running queries
    • Use stored procedures/triggers
    • Partitioning larger items

Prepared queries

  • Plan once, run many
  • Not as much info to plan with, plans may be more stable
    • No constraint exclusion, though
  • How to explain/explain analyze

Query Review

  • Don’t do select count(*) on big tables
    • Look at pg_class.reltuples for an estimate
    • Write a trigger that keeps track of the count in a side table
  • ORDER BY and LIMIT can help Pg optimize queries
  • select * can be wasteful by invoking TOAST
  • Use JOIN syntax to make sure you don’t forget the join conditions
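
The "trigger that keeps track of the count in a side table" idea can be sketched as follows. This uses Python's built-in sqlite3 module purely so the example is self-contained and runnable; it is a stand-in, not Postgres, where the trigger body would be a PL/pgSQL function instead. The table names `posts` and `row_counts` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE row_counts (table_name TEXT PRIMARY KEY, n INTEGER NOT NULL);
    INSERT INTO row_counts VALUES ('posts', 0);

    -- Keep the side table in step with every insert and delete.
    CREATE TRIGGER posts_count_ins AFTER INSERT ON posts
    BEGIN
        UPDATE row_counts SET n = n + 1 WHERE table_name = 'posts';
    END;

    CREATE TRIGGER posts_count_del AFTER DELETE ON posts
    BEGIN
        UPDATE row_counts SET n = n - 1 WHERE table_name = 'posts';
    END;
""")

conn.executemany("INSERT INTO posts (body) VALUES (?)", [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM posts WHERE body = 'b'")

def cheap_count(connection, table_name):
    """Read the maintained count instead of scanning the big table."""
    row = connection.execute(
        "SELECT n FROM row_counts WHERE table_name = ?", (table_name,)
    ).fetchone()
    return row[0]

print(cheap_count(conn, "posts"))
```

The trade-off is that every write to the big table now also updates one row in the side table, which can become a point of contention under heavy concurrent writes.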

CTE Common Table Expressions

WITH
    my_view AS ( select * from my_expensive_view ),
    my_sums AS ( select sum(my_view.x) from my_view )
SELECT my_view.*, my_sums.sum FROM my_view, my_sums

PostgreSQL 9.1 Grand Tour

Josh Berkus


  • Synchronous replication
  • Replication tools
  • Per-Column collation
  • wCTEs
  • Serializable Snapshot Isolation
  • Unlogged tables
  • SE-Postgres
  • K-Nearest Neighbor
  • Extensions
  • Other Features

Land of Surreal Queries: Writable CTEs

-- Plain CTEs arrived in 8.4; a data-modifying one like this needs 9.1
WITH deleted_posts AS (
    DELETE FROM posts
    WHERE created < now() - '6 months'::INTERVAL
    RETURNING *
)
SELECT user_id, count(*)
FROM deleted_posts
GROUP BY user_id;

In 9.1, you can do UPDATE on that.

Unlogged tables

Sometimes you have data where if something happens, you don’t care. Unlogged tables are much faster, but you risk data loss.




Handling for FDW, which is Foreign Data Wrappers.


  • Valid-on-creation FKs
  • Extensible ENUMs
  • Triggers on Views
  • Reduced NUMERIC size
  • ALTER TYPE without rewrite
  • pg_dump directory format as a precursor for parallel pg_dump

Monitoring the heck out of your database

Josh Williams, End Point

What are we looking for?

  • Performance of the system
  • Application throughput
  • Is it dead or about to die?

“They don’t care if the system’s on fire so long as it’s making money.”

Monitoring Pg

  • Log monitoring for errors
  • Log monitoring for query performance
  • Control files / External commands
  • Statistics from the DB itself

Monitoring error conditions

  • ERROR: Division by zero
  • FATAL: password authentication
  • PANIC: could not write to file pg_xlog

Quick discussion of tail_n_mail

Log monitoring for query performance


Most of the rest of the talk was about check_postgres, which I already know all about. A few cool to-do items came out of it.

  • Look at tracking --dbstats in Cacti
  • Add --noidle to --action=backends to get a better sense of the counts.

Honey, I Shrunk the Database

Vanessa Hurst

Why shrink?

  • Accuracy
    • You don’t know how your app will behave in production unless you use real data.
  • Freshness
    • New data should be available regularly
    • Full database refreshes should be timely
  • Resource Limitation
    • Staging and developer machines cannot handle production load
  • Data protection
    • Limit spread of sensitive data

Case study: Paperless Post

  • Requirements
    • Freshness – Daily on command for non-developers
    • Shrinkage – slices & mutations
  • Resources
    • Source – extra disk space, RAM and CPUs
    • Destination – Limited, often entirely unoptimized
    • Development – constrained DBA resources

Shrunk strategies

  • Copies
    • Restored backups or live replicas
  • Slices
    • Select portions of live data
  • Mutations
    • Sanitized or anonymized data
  • Assumptions
    • Usually for testing


  • Vertical slice
    • Difficult to obtain a valid, useful subset of data
    • Example: Include some tables, exclude others
  • Horizontal slice
    • Difficult to write & maintain
    • Example: SQL or application code to determine subset of data
  • Pg tools — vertical slice
    • pg_dump
      • Include data only
      • Include table schema only
      • Select tables
      • Select schemas
      • Exclude schemas

Postgres Tuning

Greg Smith

Tuning is a lifecycle.

Deploy / Monitor / Tune / Design

You may have a great design up front, but then after a while you have more data than you did before, so you have to redesign.

Survival basics

  • Monitor before there’s a problem
  • Document healthy activity
  • Watch performance trends
    • “The site is bad. Is it just today, or has it been getting worse over time?”
  • Good change control: Minimize changes, document heavily
    • Keep your config files in version control like any other part of your app.
  • Log bad activity
  • Capture details during a crisis

Monitoring and trending

  • Alerting and trending
  • Alerts: Nagios + check_postgres


  • Watch database and operating system on the same timeline
  • Munin: Easy, complete, heavy
    • Generates more traffic, may not scale up to hundreds of nodes
  • Cacti: Lighter, but missing key views
    • Not Greg’s first choice
    • Harder to get started with the Postgres plugins
    • Missing key views, which he’ll cover later
  • Various open-source and proprietary solutions

Munin: Load average

  • Load average = how many processes are active and trying to do something.
  • Load average is sensitive to sample rate. Short-term spikes may disappear when seen at a long-term scale.
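
The sample-rate point can be shown with a few lines of Python. This is only a sketch with made-up numbers: an hour of hypothetical one-minute load readings containing a single five-minute spike, averaged the way a long-range graph would average them.

```python
def downsample(samples, bucket):
    """Average consecutive groups of `bucket` samples, as a long-range graph would."""
    return [
        sum(samples[i:i + bucket]) / len(samples[i:i + bucket])
        for i in range(0, len(samples), bucket)
    ]

# Hypothetical one-minute load readings: quiet, with a single five-minute spike.
per_minute = [0.5] * 60
per_minute[30:35] = [12.0] * 5

hourly = downsample(per_minute, 60)
print(max(per_minute))  # the spike is obvious at one-minute resolution
print(max(hourly))      # averaged over the hour, it nearly vanishes
```

A load of 12 for five minutes averages out to about 1.5 over the hour, which looks perfectly healthy on a weekly or monthly graph.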

Munin: CPU usage

  • Best view of CPU usage of the monitoring tools.
  • If your system is spending a lot of CPU on system activity, often because of connection costs, look at a pooler like pgbouncer.

Munin: Connection distribution

  • Greg wrote this in Cacti because it’s so useful.
  • Graph shows a Tomcat app that has built-in connection pool.
  • The graph shown isn’t actually a problem.
  • Better to have a bunch of idle connections because of a pooler, rather than getting hammered by a thousand unpooled connections.

Munin: Database shared_buffers usage

  • If shared_buffers goes up without the same spike in disk IO, it must be in the OS’s cache.
  • If shared_buffers is bigger than 8GB, it can be a negative, rather than letting the OS do the buffering. derby’s is at 5GB.
  • There is some overlap between Pg’s buffers and the OS’s, but Pg tries to minimize this. Seq scan and VACUUM won’t clear out shared_buffers, for example.
  • There’s nothing wrong with using the OS cache.
  • SSDs are great for random-read workloads. If the drive doesn’t know to sync the data, and is not honest with the OS about it, you can have corrupted data.
  • SSDs’ best use is for indexes.

Munin: Workload distribution

  • Shows what kind of ops are done on tuples.
  • Sequential scans may not necessarily be bad. Small fact tables that get cached are sequentially scanned, but that’s OK because they’re all in RAM.

Munin: Long queries/transactions

  • Watch for oldest transaction. Open transactions block cleanup activities like VACUUM.
  • Open transaction longer than X amount of time is Nagios-worthy.

Using pgbench

  • pgbench can do more than just run against the pgbench database. It can simulate any workload. It has its own little scripting language in it.

OS monitoring

  • top -c
  • htop
  • vmstat 1
  • iostat -mx 5
  • watch

Long queries

What are the 5 longest-running queries?

psql -x -c 'select now() - query_start as runtime, current_query from pg_stat_activity order by 1 desc limit 5'

It’s safe to kill query processes, but not to kill -9 them.

Argument tuning

  • Start monitoring your long-running queries.
  • Run an EXPLAIN ANALYZE on slow queries showing up in the logs.
  • Sort to disk is using 2700K, so we update work_mem to 4MB. However, that still doesn’t fix it. Memory use is bigger in RAM than on disk.
  • If you’re reading more than 20% of the rows, Pg will switch to a sequential scan, because random I/O is so slow.
  • Indexing a boolean rarely makes sense.

The dashboard report

  • Sometimes you want to cache your results and not even worry about the query speed.
  • Use window functions for ranking.

The OFFSET 0 hack

  • Adding an OFFSET 0 in a subquery forced a certain JOIN order on the subquery. Something about making the subquery know that it is limited in some way.

Keep seldom-used settings handy in your configuration files

September 21, 2011 Programming No comments

PostgreSQL 9.0 High Performance, by Greg Smith

I’m upgrading two work databases from PostgreSQL 9.0 to 9.1, and that means some test bulk loads. (Yes, I know about pg_upgrade, but we’re doing reloads for other reasons.) Greg Smith’s fantastic PostgreSQL 9.0 High Performance is a great help in everything related to Postgres performance, including a couple of pages on how to speed up bulk loads.

Greg’s advice is to tweak some parameters that you wouldn’t use in production, but can use for the duration of the bulk load. I wanted to make it easy to flip between standard configuration and bulk load config as I did my testing, so I aggregated the bits that were relevant to my config and stuck them at the end of my postgresql.conf. I left them commented out.

# Settings from PostgreSQL 9.0 High Performance, p. 401
# maintenance_work_mem = 1GB
# checkpoint_segments = 150
# synchronous_commit = off
# fsync = off

Since they’re at the end, when I uncomment them and restart the server, they override the settings earlier in the file. Then I can do my load, and then comment them out again, without having to go all over the file to find them.
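
The "last one wins" behaviour is what makes this trick work. Here is a minimal Python sketch of that rule, assuming simple `name = value` lines; it is not how PostgreSQL itself parses the file (it ignores quoting, for one thing), and the excerpt and the `effective_settings` helper are hypothetical.

```python
def effective_settings(conf_text):
    """Parse simple `name = value` lines; a later line replaces an earlier one."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings

# Hypothetical excerpt: production values first, bulk-load block at the end.
conf = """
maintenance_work_mem = 256MB
fsync = on
# Settings from PostgreSQL 9.0 High Performance, p. 401
maintenance_work_mem = 1GB
# fsync = off
"""

print(effective_settings(conf))
```

In this excerpt the uncommented `maintenance_work_mem = 1GB` at the end replaces the earlier 256MB, while `fsync` stays on because its bulk-load line is still commented out.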

Note that I made sure to note where I got the settings from, for future reference. I don’t expect to have to do a bulk load again for at least a year, and I’ll forget by then.

Notes and comments from OSCON 2011

September 20, 2011 Open source, Programming 1 comment

Finally, two months after OSCON 2011, here’s a dump of my notes from the more tech-heavy sessions I attended. Some of it is narrative, and some of it is just barely-formatted notes. The target here is my own use of what was most interesting and useful for me at work, but I make them public here for anyone who’s interested.


API Design Anti-patterns, by Alex Martelli of Google


Martelli’s talk was about providing public-facing web APIs, not code-level APIs. He said that public-facing websites must provide an API. “They’re going to scrape to get the data if you don’t,” so you might as well create an API that is less load on your site.

API design anti-patterns

  • Worst issue: no API
  • 2nd-worst API design issue: no design
  • Too many APIs spoil the broth
  • “fear of commitment”
  • Inconsistency in APIs
  • Extremes: No balance between concerns
    • what languages to support?
      • excessive language dependence or independence
    • what about standard protocols/formats?
  • Inadequate debugging, error messages, documentation

Everyone wants an API. Take a look at the most common questions on StackOverflow. They’re about spidering and scraping websites, or simulating keystroke and mouse gestures. Sometimes these questions are about system testing, but most of them point to missing APIs for a site. The APIs may actually be there, or they may be undocumented.

You should be offering an API, and it should be easy. You are in the shoes of your users. You need this API just like they do. Even a simple, weak API is better than none. Follow the path of least resistance: REST and JSON.

Document your API, or at least provide examples, which programmers may find easier to follow than text. Keep your docs and especially the code examples in them tested. Use doctest or a similar system for testing the documentation.
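
For anyone who hasn't seen doctest, here is a minimal sketch of the idea in Python: the example in the docstring is both documentation and a test. The function `hits_by_top_level` and its paths are hypothetical, made up for illustration.

```python
import doctest

def hits_by_top_level(paths):
    """Count request paths by their top-level directory.

    >>> hits_by_top_level(["/foo/a.html", "/foo/b.html", "/bar/index.html"])
    {'/foo': 2, '/bar': 1}
    """
    counts = {}
    for path in paths:
        top = "/" + path.strip("/").split("/")[0]
        counts[top] = counts.get(top, 0) + 1
    return counts

if __name__ == "__main__":
    # Re-runs every example embedded in the docstrings above.
    print(doctest.testmod())
```

If someone later changes the function so the documented example no longer holds, the doctest run fails, and the documentation cannot quietly drift out of date.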


The Conway Channel

The Conway Channel is Damian Conway’s annual discussion of new tools that he’s created in the past year.

Regexp::Grammars is all sorts of parsing stuff for Perl 5.10 regexes, and it went entirely over my head.

IO::Prompter is an updated version of IO::Prompt, which is pretty cool already. It only works with Perl 5.10+. IO::Prompt makes it easy to prompt the user for input, and the new IO::Prompter adds more options and data validation.

# Get a number
my $n = prompt -num 'Enter a number';

# Get a password with asterisks
my $passwd = prompt 'Enter your password', -echo=>'*';

# Menu with nested options
my $selection
    = prompt 'Choose wisely...', -menu => {
            wealth => [ 'moderate', 'vast', 'incalculable' ],
            health => [ 'hale', 'hearty', 'rude' ],
            wisdom => [ 'cosmic', 'folk' ],
        }, '>';

Data::Show is like Data::Dumper but also shows helpful debug tips like variable names and origin of the statement. It doesn’t try to serialize your output like Data::Dumper does, which is a good thing. Data::Show is now my default data debug tool.

my $person = {
    name => 'Quinn',
    preferred_games => {
        wii => 'Mario Party 8',
        board => 'Life: Spongebob Squarepants Edition',
    },
    aliases => [ 'Shmoo', 'Monkeybutt' ],
    greeter => sub { my $name = shift; say "Hello $name" },
};
show $person;

======(  $person  )====================[ '', line 20 ]======

    {
      aliases => ["Shmoo", "Monkeybutt"],
      greeter => sub { ... },
      name => "Quinn",
      preferred_games => {
        board => "Life: Spongebob Squarepants Edition",
        wii => "Mario Party 8",
      },
    }

Acme::Crap is a joke module that adds a crap function that also lets you use exclamation points to show severity of the error.

use Acme::Crap;

crap    'This broke';
crap!   'This other thing broke';
crap!!  'A third thing broke';
crap!!! 'A fourth thing broke';

This broke at line 10
This other thing broke! at line 11
A Third Thing Broke!! at line 12
A FOURTH THING BROKE!!! at line 13

As with most of Damian’s joke modules, you’re not likely to use this in a real program, but to learn from how it works internally. In Acme::Crap’s case, the lesson is in overloading the ! operator.


Cornac, the PHP static analysis tool

Cornac is a static analysis tool for PHP by Damien Seguy. A cornac is someone who drives an elephant.

Cornac is both static audit and an application inventory:

  • Static audit
    • Process large quantities of code
    • Process the same code over and over
    • Depends on auditor expert level
    • Automates searches
    • Make search systematic
    • Produces false positives
  • Application inventory
    • Taking a global look at the application
    • List of structure names
    • List of used functionalities

Migrating to PHP 5.3

  • Incomplete evolutions
  • Obsolete functions
  • Reference handling
  • References with the “new” operator
  • mktime() doesn’t take 7 parameters any more

Gives a list of extensions. Maybe Perl::Critic should include an inventory of modules used? Elliot points out that you can give perlcritic the --statistics argument for some similar stats.

Found three different classes with the same name, but three different source files.

Summary of classes and properties makes it easy to see inconsistencies.

Has an inclusion network, like my homemade xreq tool, but graphical:

  • include, include_once, require and require_once
  • Ignores variables
  • Circles represent files, arrows represent inclusions

Most interesting of all was finding out about two other PHP static analysis tools: PHPMD (PHP Mess Detector) and PHP_Depend.


Using Jenkins

Andrew Bayer, @abayer

Andrew was clearly aiming at people who had many Jenkins instances, which we certainly won’t be at work, but he had lots of good solid details to discuss.

#1 Use plugins productively

  • Search for plugins to fit your needs. Organized by category on the wiki and in the update center.
  • If you’re not using a plugin any more, disable it.
    • Save memory and reduce clutter on job configuration pages.
  • Favorite plugins:
    • JobConfigHistory: See the difference between job configurations now and then. With authentication enabled, you get to see who changed it and how.
    • Disk Usage
    • Build Timeout: Can set a timeout to abort a build if it takes longer than a set time.
    • Email-ext: Better control of when emails get sent and who they get sent to. Format the emails the way you want using information from your builds.
    • Parameterized Trigger: Kick off downstream builds with information from upstream.

#2 Standardize your slaves

If you’ve got more than a couple builds, you’ve probably got multiple slaves. Ad hoc slaves may be convenient, but you’re in for trouble if they have different environments.

Use Puppet or Chef to standardize what goes on the machines. Have Jenkins or your job install the tools. Or, you can use a VM to spawn your slaves.

Whatever method you choose, just make sure your slaves are consistent.

Don’t build on master. Always build on slaves.

  • No conflict on memory/CPU/IO between master and builds.
  • Easier to add slaves than to beef up master.
  • Mixing build on master with builds on slaves means inconsistencies between builds.

#3 Use incremental builds if possible

If your build takes 4-8 hours, you can’t do real CI on every change.

If you’re integrating with code review or other pre-tested commit processes, you want to verify changes as fast as possible.

Incremental builds are complementary to full builds, not replacements.

#4 Integrate with other tools

  • Pre-tested commits with Gerrit (git code review)
  • Sonar
    • Code metrics, code coverage, unit test results, etc, all in one place
    • Great graphs, charts, etc — fantastic manager candy!
  • Chat/IM notifications

#5 Break up bloat

  • Too many builds make it hard to navigate the Jenkins UI and hurt performance.
  • Builds that try to do too much take too long and make it impossible to restart a build process partway through.
  • Don’t be afraid to spread your jobs across multiple Jenkins masters.
  • Split jobs in a logical way — separate instances per group, per product, per physical location, etc.

#6 Stick with stable releases

  • Jenkins releases weekly. Rapid turnaround for features & fixes, but not 100% stability for every release.
  • Plugins release whenever the developers want to.
  • Update center makes it easy to upgrade core & plugins, but that’s not always best.
  • Use the Jenkins core LTS releases, every 3 months or so.

#7 Join the community


Learning jQuery

I was only in the jQuery talk for a little bit, and I was just trying to get a high-level feel for it. Still, some of the notes made things much clearer to my reading of jQuery code.

$ is a function, the “bling” function. It is the dispatcher for everything in jQuery.

// Sets the alternate rows to be odd
$('table tr:nth-child(odd)').addClass('odd');

jQuery should get loaded last on your page. Prototype uses the $ function, and will eat jQuery’s $. But jQuery won’t stomp on the Prototype $ function.

Put your JavaScript last on the page, because the <script> tag blocks the rendering of the web page.


MVCC in Postgres and how to minimize the downsides

Bruce Momjian, Presentations

This turned out to be 100% theory and no actual “minimize the downsides”. It was good to see illustrations of how MVCC works, but there was nothing I could use directly.

Why learn MVCC?

  • Predict concurrent query behavior
  • Manage MVCC performance effects
  • Understand storage space reuse

Core principle: Readers never block writers, and writers never block readers.

(Chart below is an attempt at reproducing his charts, which was a pointless exercise. Better to look at his presentation directly.)

Cre 40
Exp        INSERT

Cre 40
Exp 47     DELETE

Cre 64
Exp 78     old (delete)
Cre 78
Exp        new (insert)

Four different numbers on each row drive MVCC:

  • xmin: creation transaction number, set by INSERT and UPDATE
  • xmax: expire transaction number, set by UPDATE and DELETE, also used for explicit row locks
  • cmin/cmax: used to identify the command number that created or expired the tuple; also used to store combo command IDs when the tuple is created and expired in the same transaction, and for explicit row locks.
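
To make the role of xmin and xmax concrete, here is a deliberately simplified Python sketch of row-version visibility. It assumes every transaction with an ID at or below the snapshot has committed, and it ignores cmin/cmax, in-progress and aborted transactions, all of which real Postgres has to track. The `RowVersion` class and `visible` function are hypothetical names, not anything from Postgres itself.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that expired it, if any

def visible(row, snapshot_xid):
    """Simplified rule: created at or before the snapshot, not yet expired by it.

    Assumes every transaction with an ID <= snapshot_xid has committed and
    ignores in-progress and aborted transactions, which real MVCC must track.
    """
    created = row.xmin <= snapshot_xid
    expired = row.xmax is not None and row.xmax <= snapshot_xid
    return created and not expired

# The UPDATE from the chart: transaction 78 expires the old row and creates the new one.
old = RowVersion("old", xmin=64, xmax=78)
new = RowVersion("new", xmin=78)

for xid in (70, 80):
    print(xid, [r.value for r in (old, new) if visible(r, xid)])
```

A reader whose snapshot predates transaction 78 keeps seeing the old version while a later reader sees the new one, which is how readers and writers avoid blocking each other.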


(Re)Developing Perl 5 Modules in Perl 6

Damian Conway

Perl isn’t a programming language. It’s a life support system for CPAN.

Damian ported some of his Perl 5 modules to Perl 6 as a learning exercise.


Makes a block of code not get executed, so it gets syntax checked but not run.

# Usage example
use Acme::Don't;

don't { blah(); blah(); blah(); }

Perl 6 implementation

module Acme::Don't;
use v6;
sub don't (&) is export {}


  • No homonyms in Perl 6
  • No cargo-cult vestigials
  • Fewer implicit behaviours
  • A little more typing required
  • Still obviously Perlish


Modifies files in place.

  • Parameter lists really help
  • Smarter open() helps too
  • Roles let you mix in behaviours
  • A lot less typing required
  • Mainly because of better builtins


  • Perl 6’s macros kick source filters’ butt
  • Mutate grammar, not source
  • Still room for cleverness
  • No Perl 6 implementation yet has full macro support
  • No Perl 6 implementation yet has STD grammar

Perl 6 is solid enough now. Start thinking about porting modules.


PostgreSQL 9.1 overview

Selena Deckelmann


New replication tools

SE-Linux security label support. Extends SE stuff into the database to the column level.

Writable CTE: Common Table Expressions:
A temporary table or VIEW that exists just for a single query. There have been CTEs since 8.4, but not writable ones until now.

This query deletes old posts, and returns a summary of what was deleted by user_id.

WITH deleted_posts AS (
    DELETE FROM posts
    WHERE created < now() - '6 months'::INTERVAL
    RETURNING *
)
SELECT user_id, count(*) FROM deleted_posts GROUP BY 1;

Per-column collation orders

Extensions: Postgres-specific package management for contrib/, PgFoundry projects, tools. Like Oracle “packages” or CPAN modules. The PGXN is the Postgres Extension Network.

K-nearest Neighbor Indexes: Geographical nearness helper

Unlogged tables: Skip the write-ahead log, for tables where it’s OK if they disappear after a crash. Much faster, but potentially ephemeral.

Serializable snapshot isolation: No more “select for update”. No more blocking on table locks.

Foreign data wrappers

  • Remote datasource access
  • Initially implemented text, CSV data sources
  • Underway currently: Oracle & MySQL sources
  • Good for imports and things that would otherwise fail if you just used COPY
  • Nothing other than sequential scans is possible.
  • Expect tons of FDWs to be implemented once we get 9.1 to production


Pro PostgreSQL 9

Robert Treat, OmniTI, who are basically scalability consultants.

[…] is other stuff around Postgres. […] is for 9.1+ extensions.

Use package management rather than build from source

  • Consistent
  • Standardized
  • Simple


  • Production level work, use 9.0
  • Any project not due to launch for 3 months from today, use 9.1

pg_controldata gives you all sorts of awesome details

recovery.conf is in the PGDATA dir for standby machines

pg_clog, pg_log and pg_xlog are the main data logging files.
You can delete under pg_log and that’s OK.

Trust contrib modules more than your own code from scratch. Try to install contrib modules into their own schemas.


  • work_mem
    • How much memory for each individual query
    • Mostly for large analytical queries
    • OLTP is probably fine with the defaults
    • 2M is good for most people
  • checkpoint_segments
    • Number of WAL files emitted before a checkpoint.
    • Smaller = more flushing to disk
    • Minimum of 10, more like 30
  • maintenance_work_mem
    • 1G is probably fine
  • max_prepared_transactions
    • Is NOT prepared statements
    • Set to zero unless you are on two-phase commit
  • wal_buffers
    • Always set to 16M and be done with it.
  • checkpoint_completion_target
    • default is .5
    • Set to .9. Avoid hard checkpoint spikes at the expense of some overall IO being higher.

Hardware for Postgres

  • Multiple CPUs work wonders, up to 32 processors.
  • Put WAL on its own disk, RAID 1
  • Put DATA on its own disk, RAID 10
  • More spindles is good
  • More controllers even gooder.
  • Go with SSDs over more spindles.
  • No NFS, no RAID 5

Don’t replace multiple spindles with a single SSD. You still want redundancy.


Logical backups

  • slow to create and restore
  • “pure”, no system-level corruption
  • susceptible to database-level corruption
  • pg_dump is your friend, and pg_dumpall for global settings

Physical backups

  • replication/failover machine
  • tarball (pitr)
  • filesystem snapshots (pitr)


  • Basic idea is to copy all database files and relevant xlogs
  • Use multiple machines if able
  • Use rsync if able
  • Copy the slave if able


Perl Unicode Essentials, Tom Christiansen

Perl has the best Unicode support of any language.

Unicode::Tussle is a bundle of Unicode tools tchrist wrote.

5.12 is minimal for using unicode_strings feature. 5.14 is optimal.


    use strict;
    use warnings;
    use warnings qw( FATAL utf8 ); # Fatalize utf8

21 bits for a Unicode character.

Enable named characters via \N{CHARNAME}

    use charnames qw( :full );

If you have a DATA handle, you must explicitly set its encoding. If you want this to be UTF-8, then say:

    binmode( DATA, ':encoding(UTF-8)' );

Tom’s programs start this way.

    use v5.14;
    use utf8;
    use strict;
    use autodie;
    use warnings;
    use warnings  qw< FATAL utf8 >;
    use open      qw< :std :encoding(UTF-8) >;
    use charnames qw< :full >;
    use feature   qw< unicode_strings >;

Explicitly close your files. Implicit close never checks for errors.
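
Here is a minimal sketch of what checking close looks like. The helper name write_lines is made up for illustration and is not from the talk.

```perl
use strict;
use warnings;

# write_lines() is a hypothetical helper, named here only for illustration.
sub write_lines {
    my ( $path, @lines ) = @_;

    open( my $fh, '>:encoding(UTF-8)', $path )
        or die "Can't open $path for writing: $!";

    print {$fh} "$_\n" for @lines;

    # Buffered output is flushed at close, so a full disk or other I/O
    # error may only surface here. Check the return value.
    close($fh)
        or die "Can't close $path: $!";

    return scalar @lines;
}
```

If autodie is loaded, an explicit close should die on failure without the `or die`, but the close still has to be written out for the check to happen.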

Up until 5.12, there was “The Unicode Bug”. The fix that makes it work right is

    use feature "unicode_strings";

Key core pragmas for Unicode are: v5.14, utf8, feature, charnames, open, re "/flags", encoding::warnings.

Stay away from bytes, encoding and locale.

For the programmer, it’s easier to work in NFD ("o\x{303}\x{304}") than in NFC ("\x{22D}").

NFD is required to, for example, match /^o/ to know that something starts with "o".
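
A small sketch of the difference, using the core Unicode::Normalize module. The variable names are only illustrative.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFD NFC );

# U+022D, LATIN SMALL LETTER O WITH TILDE AND MACRON, as one precomposed code point.
my $composed = "\x{22D}";

# NFD splits it into a base "o" followed by the combining marks.
my $decomposed = NFD($composed);

printf "NFC length: %d\n", length $composed;
printf "NFD length: %d\n", length $decomposed;

# Only the decomposed form exposes the base letter to a simple pattern.
print "NFC starts with o: ", ( $composed   =~ /^o/ ? "yes" : "no" ), "\n";
print "NFD starts with o: ", ( $decomposed =~ /^o/ ? "yes" : "no" ), "\n";

# Recompose on the way out so other programs see the compact form.
my $roundtrip = NFC($decomposed);
```

A common pattern is to run NFD on input, do the matching, and run NFC on output.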

String comparisons on Unicode are pretty much always the wrong way to go. That includes eq, ne, le, gt, cmp, sort, etc. Use Unicode::Collate. Get a taste of it by playing with the ucsort utility.
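
To see why, here is a short sketch comparing the built-in sort with Unicode::Collate using its default settings. The word list is made up for the example.

```perl
use v5.14;
use utf8;
use strict;
use warnings;
use open qw< :std :encoding(UTF-8) >;
use Unicode::Collate;

# Made-up sample words mixing case and accents.
my @words = ( "zebra", "Äpfel", "apple", "Zürich", "éclair", "Banana" );

# Built-in sort compares code points, so capitals come before lowercase
# and accented letters land after "z".
my @by_codepoint = sort @words;

# Unicode::Collate applies the Unicode Collation Algorithm, which orders
# by base letter first and only then by accent and case.
my @by_collation = Unicode::Collate->new->sort(@words);

say "sort:             @by_codepoint";
say "Unicode::Collate: @by_collation";
```

The default collator is not tailored to any one language. Unicode::Collate::Locale exists for locale-specific orderings.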

401 passwords Twitter won’t let you use

July 25, 2011 Internet, Programming 2 comments

Twitter has a list of 401 passwords that they disallow, not because of content, but because of how commonly used they are. A common password is easier for a bad guy to guess. None of these are passwords you’d want to use anyway, because they’re so easily guessable by a simple dictionary attack. Bad guys have lists like this anyway, and Twitter is trying to make the most common and unsafe passwords unusable. I wonder how many people would use “111111” as a Twitter password if allowed.

The list is embedded in the JavaScript of the website. Search in the page source for “BANNED_PASSWORDS”. The list is ROT13-encoded, but with Perl that’s trivial to decode:

    $str =~ tr[a-mn-z][n-za-m];
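
Wrapped up as a small script, that might look like the sketch below. The rot13 sub name and the three encoded strings are invented for the example and are not copied from Twitter’s page source.

```perl
use strict;
use warnings;

# Decode (or encode) lowercase ROT13. Digits and any other characters
# pass through untouched.
sub rot13 {
    my ($str) = @_;
    $str =~ tr[a-mn-z][n-za-m];
    return $str;
}

# Hypothetical encoded entries made up for this example.
my @encoded = qw( cnffjbeq zbaxrl guk1138 );

print rot13($_), "\n" for @encoded;
```

Because ROT13 is its own inverse, the same sub encodes as well as decodes.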

The list contains a fair amount of profanity and sexual language, as you might expect, and geek words like “ncc1701”, “thx1138” and “rush2112”, but also plenty of sports teams like “steelers”, “broncos” and “arsenal”. Many common names like “jennifer” and “michael” show up as well. Note that shorter passwords like “asdf” aren’t included because Twitter requires a minimum of six characters for passwords anyway.

As I write this today, there are 401 passwords in the list, which is 31 more than were reported in 2009. It seems from that article that they weren’t ROT13ed at the time.

The full list (slightly expurgated) follows:


I’ll be presenting “Just Enough C for Open Source Projects” July 19th at Software Craftsmanship McHenry

July 1, 2011 Open source, Programming 2 comments

For programmers raised on high-level languages like Perl, Java and PHP, working on a C project can be daunting. Still, many open source projects work at a low level in C to take advantage of the power and speed of working close to the machine. Whether it’s Perl, Postgres or Linux, C is what makes it run.

This session will provide a high-level overview of C, aimed specifically at the programmer wanting to get involved in a C-based open source project. We’ll cover:

  • Nothing in C is DWIM (“Do what I mean”)
  • Numeric types, strings and structures
  • Memory management: the heap, the stack, and pointers
  • Using the preprocessor
  • Understanding compiler warnings
  • Memory checking with valgrind
  • How to navigate a large C-based open source project (ctags, etc)
  • Security, or, how the Bad Guys smash the stack

Sign up at

What schools should be teaching IT students

April 18, 2010 Career, Programming, Work life No comments

This past Friday, I spoke at POSSCON on what schools should be teaching IT students. Here are the slides from the presentation.

Do I need to learn Microsoft technologies?

May 8, 2009 Ask Andy, Career, Programming 17 comments

In a thread on Stack Overflow, a reader named Andrew finishing his undergrad degree asked:

I notice that the vast majority of companies I’m looking at are strictly Microsoft users, from windows to visual studio. Am I going to be at a disadvantage as most of my experience is unix/linux development based?

My response included:

Whether or not “most jobs” are using MS technologies, would you WANT to work with MS technologies? If you went and boned up on your .NET and Visual C++ and had to use Windows all day, would that be the kind of job you wanted? If not, then it doesn’t matter if that’s what “most jobs” call for, because those aren’t the jobs for you.

I was taken to task by a reader named Ben Collins (not Ben Collins-Sussman of Google) who said:

I think this is stupendously bad advice. Of course you should bone up on Microsoft technologies. The chances of you making it through a 40-year career in technology without having to work with MS stuff is slim to none.

Ben’s right, you’re likely to have to use Microsoft technologies, if that’s how you want your career to take you. What I think we’re seeing here is the difference in viewpoints between someone like Ben who seems to think primarily in terms of maximum salary and maximum employability, and someone who thinks about the importance of loving what it is that you do for a job.

There’s nothing wrong with wanting to be employable. Nobody who knows Visual Studio or Java is going to have too much of a hard time finding jobs that need those skills. Then again, I flipped burgers at McDonald’s for three years, and McDonald’s is always looking for people, so I’m pretty employable there, too.

To those of us who look at our jobs as more than just a way to make money, it makes little sense to ask about what “most companies” do. We’re more concerned with the joy of working in our chosen part of the tech industry. I’d learn Visual C++ and try to find some joy in working in Windows if it was the only way to support my family, but that’s not the case.

To the fresh college graduates out there, I ask you to not put yourself in the situation where you’re concerned with what is going to give you the maximum salary, or the maximum number of potential job openings. Instead, look at what you want to do, what sparks the excitement in your heart. Optimize for the maximum amount of love for your job, especially as you’re just starting out.

For those grizzled veterans out there who slog through the trenches, working on projects that don’t bring them joy, I ask you to reconsider your career choices. Imagine you’re fresh out of school. What would you love to be doing? Figure out what that is, and work toward it, if only in small steps.

You spend more waking hours on your job than with your spouse. Optimize your career to bring you as much happiness as possible. Life is too short to work in a job you don’t love.