Watch for the surprises

September 23, 2011 Programming, Work life No comments

Look for and act on the surprises around you every day.  That’s where we have the most opportunity to make changes for the better.

Yesterday, amidst all the discussion of CERN’s announcement that it seemed to have measured neutrinos moving faster than light, Mark Jason Dominus reminded me of the line often attributed to Asimov that the sound of scientific breakthrough is not “Eureka!” but “That’s funny…”

Back in the late 90s, I had a new boss in the IT department. When he called his first staff meeting, he told us to bring some sort of metrics from our areas of responsibility.  I’d recently created the company’s intranet, so I ran some reports of hits by top-level directory from the Apache logs.  My boss gave my stats a cursory glance, handed them back and asked “So what surprised you?”  The question itself surprised me.  “Now that I look at it,” I answered, “I didn’t expect that the /foo directory would be getting so many hits.”  Then came his crucial follow-up: “So what should we do about it?”

So what’s going on around you that you didn’t expect?  Are you even looking?  How?  Where?

Look at your server log files.  Try to absorb patterns.  Who’s requesting the same non-existent .GIF file every 11 minutes?

Run a profiler on your code. Why is the sort function being called so many times?  Why would a simple string transformation function take so long to execute?

Measure your system performance with a tool like Munin or Cacti.  Look for use spikes.  What happens at 3:30am that thrashes the system?  Why do the cache hits drop to near zero twice a day?

Always keep an eye open for the unexpected behavior, the strange blip, the neutrino that arrived 60 nanoseconds sooner than expected.  Then follow up on what you find.

Notes and comments from Postgres Open 2011

September 22, 2011 Programming 1 comment

As with my Notes and comments from OSCON 2011, here are my notes and comments from Postgres Open 2011. Some of it is narrative, and some of it is just barely-formatted notes. These are primarily for my own use, capturing what was most interesting and useful for me at work, but I make them public here for anyone who’s interested.

Mastering PostgreSQL Administration

Bruce Momjian

http://postgresopen.org/2011/schedule/presentations/89/

http://momjian.us/presentation

http://momjian.us/main/writings/pgsql/administration.pdf

Most of this stuff I knew already, so the notes are short.

Connections

  • local — Unix sockets
    • Significantly faster than going through host
  • host — TCP/IP, both SSL and non-SSL
  • hostssl — only SSL
    • Can delay connection startup by 25-40%
  • hostnossl — never SSL

Template databases

  • You can use template databases to make a standard DB for when you create new ones. For example, if you want to always have a certain function or table, put it in template1. This works with extensions and contrib like pg_crypto.
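
For instance, a minimal sketch of that workflow. The extension, table, and database names here are just examples, and CREATE EXTENSION is the 9.1 spelling; on 9.0 you’d run the contrib SQL script instead.

-- While connected to template1, install whatever every new database should have:
CREATE EXTENSION IF NOT EXISTS citext;
CREATE TABLE app_defaults (key text PRIMARY KEY, value text);

-- Databases created afterward copy template1, so they include both:
CREATE DATABASE newdb;                       -- template1 is the default template
CREATE DATABASE reports TEMPLATE template1;  -- or name the template explicitly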

Data directory

  • xxx_fsm files are freespace map
  • pg_xlog is the WAL log directory
  • pg_clog is the commit log, holding transaction status

Config file settings

  • shared_buffers should be 25% of total RAM for dedicated DB servers. Don’t go over 40-50% or the machine will starve. Also, the overhead of that many buffers is huge.
  • If you can get five minutes of your working set into shared_buffers, you’re golden.
  • Going over a couple hundred connections, it’s worth it to look at a pooler.

Analyzing activity

  • Heavily-used tables
  • Unnecessary indexes
  • Additional indexes
  • Index usage
  • TOAST usage
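
A couple of hedged starting points for the items above, using the standard statistics views:

-- Heavily-used tables: which ones get the most scans and writes?
SELECT relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
ORDER BY seq_scan + idx_scan DESC
LIMIT 20;

-- Candidate unnecessary indexes: ones the planner never (or rarely) uses
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan
LIMIT 20;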

Identifying slow queries and fixing them

Stephen Frost

http://postgresopen.org/2011/schedule/presentations/71/

Fixing

  • MergeJoin for small data sets?
    • Check work_mem
  • Nested Loop with a large data set?
    • Could be bad row estimates.
  • DELETEs are slow?
    • Make sure you have indexes on foreign keys
  • Harder items
    • Check over your long-running queries
    • Use stored procedures/triggers
    • Partitioning larger items
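
For the foreign-key point in particular, a sketch; the table and column names are hypothetical:

-- Deleting from a parent table is slow when the referencing column on the
-- child table has no index: each DELETE has to scan the child for matches.
CREATE INDEX orders_customer_id_idx ON orders (customer_id);

-- Then confirm the plan actually changed:
EXPLAIN ANALYZE DELETE FROM customers WHERE id = 42;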

Prepared queries

  • Plan once, run many
  • Not as much info to plan with, plans may be more stable
    • No constraint exclusion, though
  • How to explain/explain analyze
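
A small sketch of the plan-once, run-many flow, including how to explain a prepared query. The statement name and table are made up.

PREPARE recent_posts (int) AS
    SELECT * FROM posts WHERE user_id = $1 ORDER BY created DESC LIMIT 10;

EXECUTE recent_posts(42);                  -- runs the already-planned query

EXPLAIN ANALYZE EXECUTE recent_posts(42);  -- inspect the plan the prepared query uses

DEALLOCATE recent_posts;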

Query Review

  • Don’t do select count(*) on big tables
    • Look at pg_class.reltuples for an estimate
    • Write a trigger that keeps track of the count in a side table
  • ORDER BY and LIMIT can help Pg optimize queries
  • select * can be wasteful by invoking TOAST
  • Use JOIN syntax to make sure you don’t forget the join conditions
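
For the count(*) point above, a hedged example; the table name is illustrative:

-- Instead of SELECT count(*) on a huge table, read the planner's estimate:
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'posts';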

CTE Common Table Expressions

WITH
    my_view AS ( SELECT * FROM my_expensive_view ),
    my_sums AS ( SELECT sum(my_view.x) AS sum FROM my_view )
SELECT my_view.*, my_sums.sum
FROM my_view, my_sums;

PostgreSQL 9.1 Grand Tour

Josh Berkus

http://www.pgexperts.com/document.html?id=52

Overview

  • Synchronous replication
  • Replication tools
  • Per-Column collation
  • wCTEs
  • Serialized Snapshot Isolation
  • Unlogged tables
  • SE-Postgres
  • K-Nearest Neighbor
  • SQL/MED
  • Extensions
  • Other Features

Land of Surreal Queries: Writable CTEs

-- This is in 8.4
WITH deleted_posts AS (
    DELETE FROM posts
    WHERE created < now() - '6 months'::INTERVAL
    RETURNING *
)
SELECT user_id, count(*)
FROM deleted_posts
GROUP BY 1

In 9.1, you can do UPDATE on that.
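
A sketch of what that looks like with an UPDATE inside the CTE; the archived column is made up:

WITH archived_posts AS (
    UPDATE posts
    SET archived = true
    WHERE created < now() - '6 months'::INTERVAL
    RETURNING user_id
)
SELECT user_id, count(*)
FROM archived_posts
GROUP BY 1;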

Unlogged tables

Sometimes you have data where if something happens, you don’t care. Unlogged tables are much faster, but you risk data loss.
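
A minimal example; the table and columns are hypothetical:

-- Skips the write-ahead log; truncated if the server crashes
CREATE UNLOGGED TABLE session_cache (
    session_id text PRIMARY KEY,
    payload    text,
    updated_at timestamptz DEFAULT now()
);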

Extensions

CREATE EXTENSION IF NOT EXISTS citext WITH SCHEMA ext;

SQL-MED

Handling for FDW, which is Foreign Data Wrappers.

Others

  • Valid-on-creation FKs
  • Extensible ENUMs
  • Triggers on Views
  • Reduced NUMERIC size
  • ALTER TYPE without rewrite
  • pg_dump directory format as a precursor for parallel pg_dump

Monitoring the heck out of your database

Josh Williams, End Point

http://joshwilliams.name/talks/monitoring/

What are we looking for?

  • Performance of the system
  • Application throughput
  • Is it dead or about to die?

“They don’t care if the system’s on fire so long as it’s making money.”

Monitoring Pg

  • Log monitoring for errors
  • Log monitoring for query performance
  • Control files / External commands
  • Statistics from the DB itself

Monitoring error conditions

  • ERROR: Division by zero
  • FATAL: password authentication
  • PANIC: could not write to file pg_xlog

Quick discussion of tail_n_mail

Log monitoring for query performance

check_postgres

Most of the rest of the talk was about check_postgres, which I already know all about. A few cool to-do items came out of it.

  • Look at tracking --dbstats in Cacti
  • Add the --noidle to --action=backends to get a better sense of the counts.

Honey, I Shrunk the Database

Vanessa Hurst

http://postgresopen.org/2011/speaker/profile/36/

http://www.slideshare.net/DBNess/honey-i-shrunk-the-database-9273383

Why shrink?

  • Accuracy
    • You don’t know how your app will behave in production unless you use real data.
  • Freshness
    • New data should be available regularly
    • Full database refreshes should be timely
  • Resource Limitation
    • Staging and developer machines cannot handle production load
  • Data protection
    • Limit spread of sensitive data

Case study: Paperless Post

  • Requirements
    • Freshness – Daily on command for non-developers
    • Shrinkage – slices & mutations
  • Resources
    • Source — extra disk space, RAM and CPUs
    • Destination — limited, often entirely un-optimized
    • Development — constrained DBA resources

Shrunk strategies

  • Copies
    • Restored backups or live replicas
  • Slices
    • Select portions of live data
  • Mutations
    • Sanitized or anonymized data
  • Assumptions
    • Usually for testing

Slices

  • Vertical slice
    • Difficult to obtain a valid, useful subset of data
    • Example: Include some tables, exclude others
  • Horizontal slice
    • Difficult to write & maintain
    • Example: SQL or application code to determine subset of data
  • Pg tools — vertical slice
    • pg_dump
      • Include data only
      • Include table schema only
      • Select tables
      • Select schemas
      • Exclude schemas

Postgres Tuning

Greg Smith

Tuning is a lifecycle.

Deploy / Monitor / Tune / Design

You may have a great design up front, but then after a while you have more data than you did before, so you have to redesign.

Survival basics

  • Monitor before there’s a problem
  • Document healthy activity
  • Watch performance trends
    • “The site is bad. Is it just today, or has it been getting worse over time?”
  • Good change control: Minimize changes, document heavily
    • Keep your config files in version control like any other part of your app.
  • Log bad activity
  • Capture details during a crisis

Monitoring and trending

  • Alerting and trending
  • Alerts: Nagios + check_postgres

Trending

  • Watch database and operating system on the same timeline
  • Munin: Easy, complete, heavy
    • Generates more traffic, may not scale up to hundreds of nodes
  • Cacti: Lighter, but missing key views
    • Not Greg’s first choice
    • Harder to get started with the Postgres plugins
    • Missing key views, which he’ll cover later
  • Various open-source and proprietary solutions

Munin: Load average

  • Load average = how many processes are active and trying to do something.
  • Load average is sensitive to sample rate. Short-term spikes may disappear when seen at a long-term scale.

Munin: CPU usage

  • Best view of CPU usage of the monitoring tools.
  • If your system is running a lot of system activity, often for connection costs, look at a pooler like pg_bouncer.

Munin: Connection distribution

  • Greg wrote this in Cacti because it’s so useful.
  • Graph shows a Tomcat app that has built-in connection pool.
  • The graph shown isn’t actually a problem.
  • Better to have a bunch of idle connections because of a pooler, rather than getting hammered by a thousand unpooled connections.

Munin: Database shared_buffers usage

  • If shared_buffers goes up without the same spike in disk IO, it must be in the OS’s cache.
  • If shared_buffers is bigger than 8GB, it can be a negative, rather than letting the OS do the buffering. derby’s is at 5GB.
  • There is some overlap between Pg’s buffers and the OS’s, but Pg tries to minimize this. Seq scan and VACUUM won’t clear out shared_buffers, for example.
  • There’s nothing wrong with using the OS cache.
  • SSDs are great for random-read workloads. But if the drive doesn’t actually sync data when it claims to, i.e. it isn’t honest with the OS about it, you can end up with corrupted data.
  • SSDs’ best use is for indexes.

Munin: Workload distribution

  • Shows what kind of ops are done on tuples.
  • Sequential scans may not necessarily be bad. Small fact tables that get cached are sequentially scanned, but that’s OK because they’re all in RAM.

Munin: Long queries/transactions

  • Watch for oldest transaction. Open transactions block cleanup activities like VACUUM.
  • Open transaction longer than X amount of time is Nagios-worthy.
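
A query along those lines for 9.0/9.1; in 9.2 and later the columns become pid and query:

-- Oldest transactions first; long-open transactions block VACUUM cleanup
SELECT procpid,
       now() - xact_start  AS xact_runtime,
       now() - query_start AS query_runtime,
       current_query
FROM pg_stat_activity
ORDER BY xact_start
LIMIT 5;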

Using pgbench

  • pgbench can do more than just run against the pgbench database. It can simulate any workload. It has its own little scripting language in it.

OS monitoring

  • top -c
  • htop
  • vmstat 1
  • iostat -mx 5
  • watch

Long queries

What are the 5 longest-running queries?

psql -x -c 'select now() - query_start as runtime, current_query from pg_stat_activity order by 1 desc limit 5'

It’s safe to kill query processes, but not to kill -9 them.
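
From SQL, the polite equivalent is pg_cancel_backend() (or pg_terminate_backend() to end the whole backend). A sketch using the 9.0/9.1 column names:

SELECT pg_cancel_backend(procpid)
FROM pg_stat_activity
WHERE now() - query_start > interval '1 hour'
  AND current_query <> '<IDLE>';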

Argument tuning

  • Start monitoring your long-running queries.
  • Run an EXPLAIN ANALYZE on slow queries showing up in the logs.
  • The sort to disk is using 2700kB, so we bump work_mem up to 4MB. However, that still doesn’t fix it: data takes more space in RAM than it does on disk.
  • If you’re reading more than 20% of the rows, Pg will switch to a sequential scan, because random I/O is so slow.
  • Indexing a boolean rarely makes sense.
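
A sketch of the work_mem experiment described above; the query and numbers are illustrative:

EXPLAIN ANALYZE
SELECT * FROM posts ORDER BY created DESC;
-- Look for "Sort Method: external merge  Disk: 2700kB" in the output

SET work_mem = '8MB';   -- session-only; in-RAM sorts need more than the on-disk figure suggests
EXPLAIN ANALYZE
SELECT * FROM posts ORDER BY created DESC;
-- Now hope for "Sort Method: quicksort  Memory: ..."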

The dashboard report

  • Sometimes you want to cache your results and not even worry about the query speed.
  • Use window functions for ranking.
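
For the ranking point, a window-function sketch over a hypothetical posts table:

SELECT user_id,
       count(*)                              AS post_count,
       rank() OVER (ORDER BY count(*) DESC)  AS post_rank
FROM posts
GROUP BY user_id;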

The OFFSET 0 hack

  • Adding an OFFSET 0 in a subquery forced a certain JOIN order on the subquery. Something about making the subquery know that it is limited in some way.
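
A sketch of the shape of that hack, with made-up tables; the OFFSET 0 stops the planner from flattening the subquery into the outer query:

SELECT *
FROM (
    SELECT * FROM posts WHERE user_id = 42 OFFSET 0
) AS p
JOIN users u ON u.id = p.user_id;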

Keep seldom-used settings handy in your configuration files

September 21, 2011 Programming No comments

PostgreSQL 9.0 High Performance, by Greg Smith

I’m upgrading two work databases from PostgreSQL 9.0 to 9.1, and that means some test bulk loads. (Yes, I know about pg_upgrade, but we’re doing reloads for other reasons.) Greg Smith’s fantastic PostgreSQL 9.0 High Performance is a great help in everything related to Postgres performance, including a couple of pages on how to speed up bulk loads.

Greg’s advice is to tweak some parameters that you wouldn’t use in production, but can use for the duration of the bulk load. I wanted to make it easy to flip between standard configuration and bulk load config as I did my testing, so I aggregated the bits that were relevant to my config and stuck them at the end of my postgresql.conf. I left them commented out.

#------------------------------------------------------------------------------
# BULK LOAD ONLY
# Settings from PostgreSQL 9.0 High Performance, p. 401
#------------------------------------------------------------------------------
# maintenance_work_mem = 1GB
# checkpoint_segments = 150
# synchronous_commit = off
# fsync = off

Since they’re at the end, when I uncomment them and restart the server, they override the settings earlier in the file. Then I can do my load, and then comment them out again, without having to go all over the file to find them.

Note that I recorded where I got the settings from, for future reference. I don’t expect to have to do a bulk load again for at least a year, and I’ll forget by then.

Notes and comments from OSCON 2011

September 20, 2011 Open source, Programming 1 comment

Finally, two months after OSCON 2011, here’s a dump of my notes from the more tech-heavy sessions I attended. Some of it is narrative, and some of it is just barely-formatted notes. These are primarily for my own use, capturing what was most interesting and useful for me at work, but I make them public here for anyone who’s interested.

This post is long and ugly, so here’s a table of contents:

  • API Design Anti-patterns, by Alex Martelli of Google
  • The Conway Channel
  • Cornac, the PHP static analysis tool
  • Using Jenkins
  • Learning jQuery
  • MVCC in Postgres and how to minimize the downsides
  • (Re)Developing Perl 5 Modules in Perl 6
  • PostgreSQL 9.1 overview
  • Pro PostgreSQL 9
  • Perl Unicode Essentials, Tom Christiansen

API Design Anti-patterns, by Alex Martelli of Google

Abstract

Martelli’s talk was about providing public-facing web APIs, not code-level APIs. He said that public-facing websites must provide an API. “They’re going to scrape to get the data if you don’t,” so you might as well create an API that puts less load on your site.

API design anti-patterns

  • Worst issue: no API
  • 2nd-worst API design issue: no design
  • Too many APIs spoil the broth
  • “fear of commitment”
  • Inconsistency in APIs
  • Extremes: No balance between concerns
    • what languages to support?
      • excessive language dependence or independence
    • what about standard protocols/formats?
  • Inadequate debugging, error messages, documentation

Everyone wants an API. Take a look at the most common questions on StackOverflow. They’re about spidering and scraping websites, or simulating keystrokes and mouse gestures. Sometimes these questions are about system testing, but most of them point to missing APIs for a site. The APIs may actually be there, or they may be undocumented.

You should be offering an API, and it should be easy. You are in the shoes of your users. You need this API just like they do. Even a simple, weak API is better than none. Follow the path of least resistance: REST and JSON.

Document your API, or at least consider examples which may be easier than text to programmers. Keep your docs and especially the code examples in them tested. Use doctest or a similar system for testing the documentation.

Back to table of contents

The Conway Channel

The Conway Channel is Damian Conway’s annual discussion of new tools that he’s created in the past year.

Regexp::Grammars is all sorts of parsing stuff for Perl 5.10 regexes, and it went entirely over my head.

IO::Prompter is an updated version of IO::Prompt, which is pretty cool already. It only works with Perl 5.10+. IO::Prompt makes it easy to prompt the user for input, and the new IO::Prompter adds more options and data validation.

# Get a number
my $n = prompt -num 'Enter a number';

# Get a password with asterisks
my $passwd = prompt 'Enter your password', -echo=>'*';

# Menu with nested options
my $selection
    = prompt 'Choose wisely...', -menu => {
            wealth => [ 'moderate', 'vast', 'incalculable' ],
            health => [ 'hale', 'hearty', 'rude' ],
            wisdom => [ 'cosmic', 'folk' ],
        }, '>';

Data::Show is like Data::Dumper but also shows helpful debug tips like variable names and origin of the statement. It doesn’t try to serialize your output like Data::Dumper does, which is a good thing. Data::Show is now my default data debug tool.

my $person = {
    name => 'Quinn',
    preferred_games => {
        wii => 'Mario Party 8',
        board => 'Life: Spongebob Squarepants Edition',
    },
    aliases => [ 'Shmoo', 'Monkeybutt' ],
    greeter => sub { my $name = shift; say "Hello $name" },
};
show $person;

======(  $person  )====================[ 'data-show.pl', line 20 ]======

    {
      aliases => ["Shmoo", "Monkeybutt"],
      greeter => sub { ... },
      name => "Quinn",
      preferred_games => {
        board => "Life: Spongebob Squarepants Edition",
        wii => "Mario Party 8",
      },
    }

Acme::Crap is a joke module that adds a crap function that also lets you use exclamation points to show severity of the error.

use Acme::Crap;

crap    'This broke';
crap!   'This other thing broke';
crap!!  'A third thing broke';
crap!!! 'A four thing broke';

This broke at acme-crap.pl line 10
This other thing broke! at acme-crap.pl line 11
A Third Thing Broke!! at acme-crap.pl line 12
A FOUR THING BROKE!!! at acme-crap.pl line 13

As with most of Damian’s joke modules, you’re not likely to use this in a real program, but to learn from how it works internally. In Acme::Crap’s case, the lesson is in overloading the ! operator.

Back to table of contents

Cornac, the PHP static analysis tool

Cornac is a static analysis tool for PHP by Damien Seguy. A cornac is someone who drives an elephant.

Cornac is both static audit and an application inventory:

  • Static audit
    • Process large quantities of code
    • Process the same code over and over
    • Depends on auditor expert level
    • Automates searches
    • Make search systematic
    • Produces false positives
  • Application inventory
    • Taking a global look at the application
    • List of structure names
    • List of used functionalities

Migrating to PHP 5.3

  • Incomplete evolutions
  • Obsolete functions
  • Reference handling
  • References with the “new” operator
  • mktime() doesn’t take 7 parameters any more

Gives a list of extensions. Maybe Perl::Critic should include an inventory of modules used? Elliot points out that you can give perlcritic the --statistics argument for some similar stats.

Found three different classes with the same name, but three different source files.

Summary of classes and properties makes it easy to see inconsistencies.

Has an inclusion network, like my homemade xreq tool, but graphical:

  • include, include_once, require and require_once
  • Ignores variables
  • Circles represent files, arrows represent inclusions

Most interesting of all was finding out about two other PHP static analysis tools: PMD (PHP Mess Detector) and PHP_Depends.

Back to table of contents

Using Jenkins

Andrew Bayer, @abayer
Slides

Andrew was clearly aiming at people who had many Jenkins instances, which we certainly won’t have at work, but he had lots of good solid details to discuss.

#1 Use plugins productively

  • Search for plugins to fit your needs. Organized by category on the wiki and in the update center.
  • If you’re not using a plugin any more, disable it.
    • Save memory and reduce clutter on job configuration pages.
  • Favorite plugins:
    • JobConfigHistory: See the difference between job configurations now and then. With authentication enabled, you get to see who changed it and how.
    • Disk Usage
    • Build Timeout: Can set a timeout to abort a build if it takes longer than a set time.
    • Email-ext: Better control of when emails get sent and who they get sent to. Format the emails the way you want using information from your builds.
    • Parameterized Trigger: Kick off downstream builds with information from upstream.

#2 Standardize your slaves

If you’ve got more than a couple of builds, you’ve probably got multiple slaves. Ad hoc slaves may be convenient, but you’re in for trouble if they have different environments.

Use Puppet or Chef to standardize what goes on the machines. Have Jenkins or your job install the tools. Or, you can use a VM to spawn your slaves.

Whatever method you choose, just make sure your slaves are consistent.

Don’t build on master. Always build on slaves.

  • No conflict on memory/CPU/IO between master and builds.
  • Easier to add slaves than to beef up master.
  • Mixing builds on master with builds on slaves means inconsistencies between builds.

#3 Use incremental builds if possible

If your build takes 4-8 hours, you can’t do real CI on every change.

If you’re integrating with code review or other pre-tested commit processes, you want to verify changes as fast as possible.

Incremental builds are complementary to full builds, not replacements.

#4 Integrate with other tools

  • Pre-tested commits with Gerrit (git code review)
  • Sonar
    • Code metrics, code coverage, unit test results, etc, all in one place
    • Great graphs, charts, etc — fantastic manager candy!
  • Chat/IM notifications

#5 Break up bloat

  • Too many builds make it hard to navigate the Jenkins UI and hurt performance.
  • Builds that try to do too much take too long and make it impossible to restart a build process partway through.
  • Don’t be afraid to spread your jobs across multiple Jenkins masters.
  • Split jobs in a logical way — separate instances per group, per product, per physical location, etc.

#6 Stick with stable releases

  • Jenkins releases weekly. Rapid turnaround for features & fixes, but not 100% stability for every release.
  • Plugins release whenever the developers want to.
  • Update center makes it easy to upgrade core & plugins, but that’s not always best.
  • Use the Jenkins core LTS releases, every 3 months or so.

#7 Join the community

Back to table of contents

Learning jQuery

I was only in the jQuery talk for a little bit, and I was just trying to get a high-level feel for it. Still, some of the notes made things much clearer to my reading of jQuery code.

$ is a function, the “bling” function. It is the dispatcher for everything in jQuery.

// Sets the alternate rows to be odd
$('table tr:nth-child(odd)').addClass('odd');

jQuery should get loaded last on your page. Prototype uses the $ function, and will eat jQuery’s $. But jQuery won’t stomp on the Prototype $ function.

Put your Javascript last on the page, because the <script> tag blocks the rendering of the web page.

Back to table of contents

MVCC in Postgres and how to minimize the downsides

Bruce Momjian, Presentations

This turned out to be 100% theory and no actual “minimize the downsides”. It was good to see illustrations of how MVCC works, but there was nothing I could use directly.

Why learn MVCC?

  • Predict concurrent query behavior
  • Manage MVCC performance effects
  • Understand storage space reuse

Core principle: Readers never block writers, and writers never block readers.

(Chart below is an attempt at reproducing his charts, which was a pointless exercise. Better to look at his presentation directly.)

Cre 40
Exp        INSERT

Cre 40
Exp 47     DELETE

Cre 64
Exp 78     old (delete)
------
Cre 78
Exp        new (insert)

Four different numbers on each tuple drive MVCC:

  • xmin: creation transaction number, set by INSERT and UPDATE
  • xmax: expiration transaction number, set by UPDATE and DELETE; also used for explicit row locks
  • cmin/cmax: used to identify the command number that created or expired the tuple; also used to store combo command IDs when the tuple is created and expired in the same transaction, and for explicit row locks
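
These are hidden system columns, so you can look at them on any table, e.g.:

SELECT xmin, xmax, cmin, cmax, ctid, *
FROM posts        -- any table will do
LIMIT 5;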

Back to table of contents

(Re)Developing Perl 5 Modules in Perl 6

Damian Conway

Perl isn’t a programming language. It’s a life support system for CPAN.

Damian ported some of his Perl 5 modules to Perl 6 as a learning exercise.

Acme::Don’t

Makes a block of code not get executed, so it gets syntax checked but not run.

# Usage example
use Acme::Don't;

don't { blah(); blah(); blah(); };

Perl 6 implementation

module Acme::Don't;
use v6;
sub don't (&) is export {}

Lessons:

  • No homonyms in Perl 6
  • No cargo-cult vestigials
  • Fewer implicit behaviours
  • A little more typing required
  • Still obviously Perlish

IO::Insitu

Modifies files in place.

  • Parameter lists really help
  • Smarter open() helps too
  • Roles let you mix in behaviours
  • A lot less typing required
  • Mainly because of better builtins

Smart::Comments

  • Perl 6’s macros kick source filters’ butt
  • Mutate grammar, not source
  • Still room for cleverness
  • No Perl 6 implementation yet has full macro support
  • No Perl 6 implementation yet has STD grammar

Perl 6 is solid enough now. Start thinking about porting modules.

Back to table of contents

PostgreSQL 9.1 overview

Selena Deckelmann

Slides

New replication tools

SE-Linux security label support. Extends SE stuff into the database to the column level.

Writable CTE: Common Table Expressions:
A temporary table or VIEW that exists just for a single query. There have been CTEs since 8.4, but not writable ones until now.

This query deletes old posts, and returns a summary of what was deleted by user_id.

WITH deleted_posts AS (
    DELETE FROM posts
    WHERE created < now() - '6 months'::INTERVAL
    RETURNING *
)
SELECT user_id, count(*) FROM deleted_posts GROUP BY 1;

Per-column collation orders

Extensions: Postgres-specific package management for contrib/, PgFoundry projects, tools. Like Oracle “packages” or CPAN modules. The PGXN is the Postgres Extension Network.

K-nearest Neighbor Indexes: Geographical nearness helper

Unlogged tables: Skip the write-ahead log, for tables where it’s OK if they disappear after a crash. Much faster, but potentially ephemeral.

Serializable snapshot isolation: No more “select for update”. No more blocking on table locks.

Foreign data wrappers

  • Remote datasource access
  • Initially implemented text, CSV data sources
  • Underway currently: Oracle & MySQL sources
  • Good for imports and things that would otherwise fail if you just used COPY
  • Nothing other than sequential scans is possible.
  • Expect tons of FDWs to be implemented once we get 9.1 to production
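
A hedged file_fdw example of the text/CSV case; the path and columns are invented:

CREATE EXTENSION file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE staged_users (id int, email text)
    SERVER csv_files
    OPTIONS (filename '/tmp/users.csv', format 'csv', header 'true');

SELECT count(*) FROM staged_users;   -- sequential scan only, as noted above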

Back to table of contents

Pro PostgreSQL 9

Robert Treat, OmniTI, who are basically scalability consultants

pgfoundry.org hosts other stuff around Postgres.

pgxn.org is for 9.1+ extensions

Use package management rather than build from source

  • Consistent
  • Standardized
  • Simple

Versions

  • Production level work, use 9.0
  • Any project not due to launch for 3 months from today, use 9.1

pg_controldata gives you all sorts of awesome details

recovery.conf is in the PGDATA dir for standby machines

pg_clog, pg_log and pg_xlog are the main data logging files.
You can delete under pg_log and that’s OK.

Trust contrib modules more than your own code from scratch. Try
to install contrib modules into their own schemas.

Configuration

  • work_mem
    • How much memory for each individual query
    • Mostly for large analytical queries
    • OLTP is probably fine with the defaults
    • 2M is good for most people
  • checkpoint_segments
    • Number of WAL files emitted before a checkpoint.
    • Smaller = more flushing to disk
    • Minimum of 10, more like 30
  • maintenance_work_mem
    • 1G is probably fine
  • max_prepared_transactions
    • Is NOT prepared statements
    • Set to zero unless you are on two-phase commit
  • wal_buffers
    • Always set to 16M and be done with it.
  • checkpoint_completion_target
    • default is .5
    • Set to .9. Avoid hard checkpoint spikes at the expense of some overall IO being higher.

Hardware for Postgres

  • Multiple CPUs work wonders, up to 32 processors. See http://tweakers.net
  • Put WAL on its own disk, RAID 1
  • Put DATA on its own disk, RAID 10
  • More spindles is good
  • More controllers even gooder.
  • Go with SSDs over more spindles.
  • No NFS, no RAID 5

Don’t replace multiple spindles with a single SSD. You still want redundancy.

Backups

Logical backups

  • slow to create and restore
  • “pure”, no system-level corruption
  • susceptible to database-level corruption
  • pg_dump is your friend, and pg_dumpall for global settings

Physical backups

  • replication/failover machine
  • tarball (pitr)
  • filesystem snapshots (pitr)

Tarball

  • Basic idea is to copy all database files and relevant xlogs
  • Use multiple machines if able
  • Use rsync if able
  • Copy the slave if able

Back to table of contents

Perl Unicode Essentials, Tom Christiansen

http://98.245.80.27/tcpc/OSCON2011/index.html

Perl has the best Unicode support of any language.

Unicode::Tussle is a bundle of Unicode tools tchrist wrote.

5.12 is minimal for using unicode_strings feature. 5.14 is optimal.

Recommendations:

    use strict;
    use warnings;
    use warnings qw( FATAL utf8 ); # Fatalize utf8

21 bits for a Unicode character.

Enable named characters via \N{CHARNAME}

    use charnames qw( :full );

If you have a DATA handle, you must explicitly set its encoding. If you want this to be UTF-8, then say:

    binmode( DATA, ':encoding(UTF-8)' );

Tom’s programs start this way.

    use v5.14;
    use utf8;
    use strict;
    use autodie;
    use warnings;
    use warnings  qw< FATAL utf8 >;
    use open      qw< :std :encoding(UTF-8) >;
    use charnames qw< :full >;
    use feature   qw< unicode_strings >;

Explicitly close your files. Implicit close never checks for errors.

Up until 5.12, there was “The Unicode Bug”. The fix that makes it work right is

    use feature "unicode_strings";

Key core pragmas for Unicode are: v5.14, utf8, feature, charnames, open, re "/flags", encoding::warnings.

Stay away from bytes, encoding and locale.

For the programmer, it’s easier to do NFD (“o\x{304}\x{303}”) instead of NFC (“\x{22D}”).

NFD is required to, for example, match /^o/ to know that something starts with “o”.

String comparisons on Unicode are pretty much always the wrong way to go. That includes eq, ne, le, gt, cmp, sort, etc. Use Unicode::Collate. Get a taste of it by playing with the ucsort utility.

Objective: “Obtain job where I commute by zipline”

September 8, 2011 Job hunting 2 comments

I spent an hour last night reading freelance writer Julieanne Smolinski‘s Twitter feed.  She’s funny in a Jack Handey kind of way, and I retweeted this Tweet:

I know you’re not supposed to lie on a resume, so I suppose my “Objective” has to be “obtain job where I commute by zipline.”

Thing is, that’s as good an objective to put on your résumé as any other.  Objectives say nothing and waste the attention of your reader.

Look at these sample objectives I found from Googling “sample resume objectives”:

  • Marketing position that utilizes my writing skills and enables me to make a positive contribution to the organization.
  • Accomplished administrator seeking to leverage extensive background in personnel management, recruitment, employee relations and benefits administration in an entry-level human resources position.
  • To transfer the office management expertise gained during eight years in a corporate setting to a managerial-level position for an established non-profit that needs fundraising and event-planning talent
  • To find a role in Human Resources that will utilize my experience with legal forms, payroll and employee recruitment as well as enable me to grow within the company.

The pattern is clear: Describe the position for which you’re applying, often with obvious fluff.  Rest assured that saying that you want to “make a positive contribution to the organization” does not give you an advantage over those candidates who don’t state it.

Don’t waste the reader’s attention on a rehash of the job description and canned drivel.  Leave out the objective.  Instead, write a three-or-four-bullet summary of your skills that summarizes the rest of the résumé.  For example:

  • Seven years experience in system administration on Linux and Windows datacenters
  • Certified MCP (Microsoft Certified Professional), working on CCNA (Cisco Certified Network Associate)
  • Four years help desk experience for 300-seat company, and fluent in Spanish

A hiring manager with 100 résumés to sift through isn’t going to read the whole thing word-for-word unless you give her a reason to.  Without a summary at the top, the reader has to skim to find the good parts.  Make it easy for her to find the good parts.

Finally, note that Julieanne’s quip gets to the heart of what’s wrong with the objective: It’s all about what the candidate wants. It’s like saying “Hi, glad to meet you, I’m Bob Smith, here’s what I want from your company.” The résumé is a tool to help you get the interview, and that starts with telling the reader what you can do for her, not the other way around.

(For more on objectives, see The worst way to start a resume)

“Building and Managing a Project Community with Github”, St. Louis, MO, 2011-09-03

August 31, 2011 Open source 2 comments


On Saturday, September 3rd I’ll be presenting “Building and Managing a Project Community with Github” at ArchReactor, a hackerspace in St. Louis, MO.

ArchReactor
Jefferson Underground Building
2400 South Jefferson Avenue
St. Louis, MO 63104
http://archreactor.org/location

There will be a social hour from 4:00-5:00pm, and my presentation starts at 5pm sharp. I hope to see you there!

Your github account is not your portfolio, but it’s a start

August 24, 2011 Job hunting, Open source 6 comments

Gina Trapani started a Google+ thread about using Github as a portfolio of your work to show potential employers. This in turn was prompted by a blog post by PyDanny titled “Github is my resume.” It’s a great idea, but it’s only a start. Your portfolio should be more curated than that to be effective.

I shouldn’t complain too much. Far too few job seekers consider the power of showing existing work products to hiring managers. That’s probably because so few employers ask to see any. In my book Land the Tech Job You Love, I cite Ilya Talman, one of the top tech recruiters in Chicago, estimating that only 15% of hiring managers ask to see samples of work.

Consider the manager looking to hire a computer programmer. She has a hundred résumés from respondents, all claiming to know Ruby and Rails. She knows that anyone can put Ruby, Rails, or any other technologies into a résumé without knowing them. Even well-meaning candidates might think “I read a book on Ruby once, and Rails can’t be too tough, so I’ll put them on my résumé.” Looking at sample code is a great way to separate the good programmers from the fakers.

Since creating a repository of someone else’s good code is only slightly more involved than putting “Ruby on Rails” in a résumé document, a good hiring manager will ask in the interview about the code. When I interview candidates, I ask for printed code samples of their best work for us to discuss. Pointing at a given section on the paper, I’ll say “Tell me about your choice to write your own Perl function here instead of using a module from CPAN“, or “I see your variables seem to be named using a certain convention; why did you use that method?” In a few minutes, I can easily find out more about the candidate’s thought process and coding style than a mile-long résumé. This method also exposes potentially faked code.

So as much as I applaud candidates having a body of work to which they can point employers, simply saying “Here’s my Github repo” is not enough. The hiring manager doesn’t want to see everything you’ve written. Although everyone is different, she probably wants to see three things:

  • quality of work
  • breadth of work
  • applicability to her specific needs

Most important, she doesn’t want to go digging through all your code to find the answers to these questions.

Consider my github repository as an example. There are 28 repositories in it. Of these, nine are forks of other repos for me to modify, so clearly do not count as code I’ve written. Three repos are version control for websites I manage. Some are incubators of ideas for future projects that have yet to blossom. My scraps repository is a junk drawer where I put code I’ve written and might have use for later. How will an interested employer know what to look at? It’s arrogant and foolish to tell someone looking to hire you “here’s all my public code, you figure it out.” It’s the RTFM method of portfolio presentation, and it doesn’t put you in the best light possible.

For an effective portfolio, choose three to five projects that show your best work, and then provide a paragraph or two about each, describing the project in English and your involvement with it. There is literally no project or repository, on Github or elsewhere, about which I can say “This work is 100% mine.” Everything I’ve ever worked on has had work contributed from others, and the nature of those contributions needs to be disclosed upfront and honestly.

None of this is special to Github. There are plenty of online code repositories out there, such as Perl’s CPAN, which can act as a showcase for your work. Of course, you can also create your own online portfolio on your website as well. The keys are to highlight your best work and accurately describe your involvement.

A common complaint I hear when I discuss code portfolios goes like this: “Most of my work is private or under NDA, so I can’t have a portfolio.” Hogwash. You can go write your own code specifically to show your skills. If your area of expertise is with web apps, then go write a web app that does something fairly useful and publish that as your portfolio. Assign it an open source license so that others can take advantage of it, too. You’ll be helping your community while you help your job prospects.

Do you have an online code portfolio? Let me know in the comments, and include the URL for others to see.

Should I put ____ on my résumé?

August 15, 2011 Job hunting 3 comments

I read Reddit’s résumé subreddit regularly, and it’s one of the most common questions asked: “Should I put such-and-such item on my résumé, or leave it off?” The variations are endless:

  • Should I put a job on my résumé that I was at for only three months?
  • Should I put my college work on my résumé, even though I only was in for two years of a four-year degree?
  • Should I put my hobbies on my résumé?
  • Should I put my volunteer work on my résumé?
  • Should I put my high school education on my résumé?

The answer is the same for each of these examples: It depends on the job for which you’re applying.  Here’s how to analyze the situation and make the right choice for the job.

First, remember that the purpose of a résumé is to get you a job interview. Therefore, the question you have to ask yourself is “Will this piece of information help convince the reader to call me in for an interview?”  If it won’t, then leave it out.

Second, every position is different, so you must ask the question as it relates to the job for which you’re applying. You don’t have a single résumé that you blast out to the world. Consider every point on your résumé as it applies to the job for which you’re applying. For example, you probably don’t want to put on your résumé that you play guitar when applying for a job as a system administrator, unless you’re applying for that sysadmin job at a music publishing house.

All that said, here are a few items that you should almost definitely leave off a résumé:

  • “References available upon request,” which is assumed and is therefore noise.
  • A list of references, because these will be asked for at a later point in the hiring process
  • A photograph, which is inappropriate in the United States

Band naming made easy

August 9, 2011 Internet 1 comment

My friend Rob Warmowski has a new band named Sirs. They have a show coming up with another band opening for them. The second paragraph is key.

Join Sirs Saturday, August 20 at 4 PM with a live performance to celebrate the release of our 12″ EP “Boo Hoo”. Where better to do this than at Saki, the fine purveyor of records located at 3716 W. Fullerton in Chicago? Nowhere, that’s where.

Opening band: Small Trabajo. (Note: nobody in Small Trabajo yet knows that their name is Small Trabajo. We were told by store staff that the band, being very new, was having a hard time coming up with a band name. Hearing this, I went to the first Captcha I could find (http://captcha.net) and solved the problem immediately.)

The Internet has a solution for every problem!

No, you can’t ask about money in the job interview

August 2, 2011 Interviews 6 comments

So often I see it posted to reddit: “When do I ask about money?” You don’t. You don’t ask about money in the job interview. You wait until the company brings it up, often in the form of a job offer. There’s a time and a place for everything, and the time and place for compensation discussion is in the job offer, or when the company chooses to bring it up.

When you go into a job interview, your focus must be on the company’s needs, or what work the hiring manager wants you to do. You want to talk about what you can do for the company, not ask about what they can do for you. Asking about salary, benefits, vacation, or other forms of compensation tells the interviewer that you’re more concerned with what’s in it for you, rather than how you can help her. Whether that’s true or not doesn’t matter. You still run a risk of coming across that way.

(This is also part of why an objective is the worst way to start a résumé, because it says “Hi, I’m so-and-so, and here’s what I want from you.”)

The goal of a job interview is for you to get a job offer, or to move closer to getting one. If you don’t get the job offer, it doesn’t matter how much the job pays.

An interview isn’t a one-sided affair, of course. It’s also about you finding out about the company, about worklife, about the sorts of projects you’d work on, because these all fit into things of benefit to the company. Compensation, however, is a one-way benefit to you. What if the interviewer doesn’t discuss salary? Then you just wait for the second interview or the job offer, where the specifics of compensation will all be laid out.

People have countered my stance on this with “I just want to know what it’s paying so that I can save time for both of us by not going through an interview for a job that’s not going to pay enough.” That’s what we programmers refer to as a premature optimization. Just as it doesn’t matter how fast your program runs if it gives the wrong answer, it doesn’t matter how quickly you get through the hiring process if you don’t get the offer.

Have some patience. Focus on selling your skills and experience to the interviewer. Talk to the interviewer about her problems and how you’ll solve them. And don’t ask about compensation.