Finally, two months after OSCON 2011, here's a dump of my notes from the more tech-heavy sessions I attended. Some of it is narrative, and some of it is just barely-formatted notes. I took these to capture what was most interesting and useful for me at work, but I'm making them public here for anyone who's interested.

This post is long and ugly, so here's a table of contents:

  • API Design Anti-patterns, by Alex Martelli of Google
  • The Conway Channel
  • Cornac, the PHP static analysis tool
  • Using Jenkins
  • Learning jQuery
  • MVCC in Postgres and how to minimize the downsides
  • (Re)Developing Perl 5 Modules in Perl 6
  • PostgreSQL 9.1 overview
  • Pro PostgreSQL 9
  • Perl Unicode Essentials, Tom Christiansen

API Design Anti-patterns, by Alex Martelli of Google

Abstract

Martelli's talk was about providing public-facing web APIs, not code-level APIs. He said that public-facing websites must provide an API. "They're going to scrape to get the data if you don't," so you might as well create an API that puts less load on your site.

API design anti-patterns

  • Worst issue: no API
  • 2nd-worst API design issue: no design
  • Too many APIs spoil the broth
  • "fear of commitment"
  • Inconsistency in APIs
  • Extremes: No balance between concerns
    • what languages to support?
      • excessive language dependence or independence
    • what about standard protocols/formats?
  • Inadequate debugging, error messages, documentation

Everyone wants an API. Take a look at the most common questions on StackOverflow. They're about spidering and scraping websites, or simulating keystrokes and mouse gestures. Sometimes these questions are about system testing, but most of them point to missing APIs for a site. Sometimes the APIs actually exist but are undocumented.

You should be offering an API, and it should be easy to use. Put yourself in the shoes of your users: you need this API just like they do. Even a simple, weak API is better than none. Follow the path of least resistance: REST and JSON.
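
In Perl terms, that path of least resistance can be as small as the sketch below. The %item record and the /api/items/42 URL are made up for illustration; a real site would hang this off whatever framework it already uses.

#!/usr/bin/perl
# Hypothetical handler for something like GET /api/items/42:
# build a data structure and emit it as JSON.
use strict;
use warnings;
use JSON qw( encode_json );   # CPAN JSON module

my %item = (
    id    => 42,
    title => 'An example item',
    tags  => [ 'demo', 'sketch' ],
);

print "Content-Type: application/json\r\n\r\n";
print encode_json( \%item ), "\n";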

Document your API, or at least provide examples, which programmers may find easier to follow than prose. Keep your docs, and especially the code examples in them, tested. Use doctest or a similar system for testing the documentation.
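
doctest itself is a Python tool; in Perl, one analogous option is Test::Synopsis from CPAN, which compiles the code in your modules' POD SYNOPSIS sections as part of the test suite. A minimal sketch:

# t/synopsis.t -- fails if a module's SYNOPSIS code doesn't compile
use Test::Synopsis;
all_synopsis_ok();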

Back to table of contents

The Conway Channel

The Conway Channel is Damian Conway's annual discussion of new tools that he's created in the past year.

Regexp::Grammars is all sorts of parsing stuff for Perl 5.10 regexes, and it went entirely over my head.

IO::Prompter is an updated version of IO::Prompt, which is pretty cool already. It only works with Perl 5.10+. IO::Prompt makes it easy to prompt the user for input, and the new IO::Prompter adds more options and data validation.

# Get a number
my $n = prompt -num 'Enter a number';

# Get a password with asterisks
my $passwd = prompt 'Enter your password', -echo=>'*';

# Menu with nested options
my $selection
    = prompt 'Choose wisely...', -menu => {
            wealth => [ 'moderate', 'vast', 'incalculable' ],
            health => [ 'hale', 'hearty', 'rude' ],
            wisdom => [ 'cosmic', 'folk' ],
        }, '>';

Data::Show is like Data::Dumper, but it also shows helpful debugging details like the name of the variable and the file and line of the show call. It doesn't try to serialize your data like Data::Dumper does, which is a good thing. Data::Show is now my default data debugging tool.

my $person = {
    name => 'Quinn',
    preferred_games => {
        wii => 'Mario Party 8',
        board => 'Life: Spongebob Squarepants Edition',
    },
    aliases => [ 'Shmoo', 'Monkeybutt' ],
    greeter => sub { my $name = shift; say "Hello $name" },
};
show $person;

======(  $person  )====================[ 'data-show.pl', line 20 ]======

    {
      aliases => ["Shmoo", "Monkeybutt"],
      greeter => sub { ... },
      name => "Quinn",
      preferred_games => {
        board => "Life: Spongebob Squarepants Edition",
        wii => "Mario Party 8",
      },
    }

Acme::Crap is a joke module that adds a crap function and lets you tack on exclamation points to show the severity of the error.

use Acme::Crap;

crap    'This broke';
crap!   'This other thing broke';
crap!!  'A third thing broke';
crap!!! 'A fourth thing broke';

This broke at acme-crap.pl line 10
This other thing broke! at acme-crap.pl line 11
A Third Thing Broke!! at acme-crap.pl line 12
A FOURTH THING BROKE!!! at acme-crap.pl line 13

As with most of Damian's joke modules, you're not likely to use this in a real program, but to learn from how it works internally. In Acme::Crap's case, the lesson is in overloading the ! operator.
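
Here's a sketch of the general technique, not Acme::Crap's actual internals: a hypothetical Severity class where each application of ! bumps a counter instead of negating the value.

#!/usr/bin/perl
use strict;
use warnings;

package Severity;
use overload
    # Each ! increments the severity level and returns the same object,
    # so !!$obj applies it twice.
    '!'  => sub { my $self = shift; $self->{level}++; return $self },
    # Stringify as the message followed by one bang per severity level.
    '""' => sub { my $self = shift; return $self->{msg} . ( '!' x $self->{level} ) };

sub new {
    my ( $class, $msg ) = @_;
    return bless { msg => $msg, level => 0 }, $class;
}

package main;

my $err = Severity->new( 'This broke' );
print !!$err, "\n";    # prints "This broke!!"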

Back to table of contents

Cornac, the PHP static analysis tool

Cornac is a static analysis tool for PHP by Damien Seguy. A cornac is someone who drives an elephant.

Cornac is both a static auditor and an application inventory:

  • Static audit
    • Process large quantities of code
    • Process the same code over and over
    • Depends on auditor expert level
    • Automates searches
    • Makes searches systematic
    • Produces false positives
  • Application inventory
    • Taking a global look at the application
    • List of structure names
    • List of used functionalities

Migrating to PHP 5.3

  • Incomplete evolutions
  • Obsolete functions
  • Reference handling
  • References with the "new" operator
  • mktime() doesn't take 7 parameters any more

Cornac gives a list of the PHP extensions used. Maybe Perl::Critic should include an inventory of modules used? Elliot points out that you can give perlcritic the --statistics argument for some similar stats.

Found three different classes with the same name, in three different source files.

Summary of classes and properties makes it easy to see inconsistencies.

Has an inclusion network, like my homemade xreq tool, but graphical:

  • include, include_once, require and require_once
  • Ignores variables
  • Circles represent files, arrows represent inclusions

Most interesting of all was finding out about two other PHP static analysis tools: PHPMD (PHP Mess Detector) and PHP_Depend.

Back to table of contents

Using Jenkins

Andrew Bayer, @abayer

Slides

Andrew was clearly aiming at people who had many Jenkins instances, which we certainly won't be at work, but he had lots of good solid details to discuss.

#1 Use plugins productively

  • Search for plugins to fit your needs. Organized by category on the wiki and in the update center.
  • If you're not using a plugin any more, disable it.
    • Save memory and reduce clutter on job configuration pages.
  • Favorite plugins:
    • JobConfigHistory: See the differences between job configurations now and then. With authentication enabled, you get to see who changed it and how.
    • Disk Usage
    • Build Timeout: Can set a timeout to abort a build if it takes longer than a set time.
    • Email-ext: Better control of when emails get sent and who they get sent to. Format the emails the way you want using information from your builds.
    • Parameterized Trigger: Kick off downstream builds with information from upstream.

#2 Standardize your slaves

If you've got more than a couple of builds, you've probably got multiple slaves. Ad hoc slaves may be convenient, but you're in for trouble if they have different environments.

Use Puppet or Chef to standardize what goes on the machines. Have Jenkins or your job install the tools. Or, you can use a VM to spawn your slaves.

Whatever method you choose, just make sure your slaves are consistent.

Don't build on master. Always build on slaves.

  • No conflict on memory/CPU/IO between master and builds.
  • Easier to add slaves than to beef up master.
  • Mixing builds on master with builds on slaves means inconsistencies between builds.

#3 Use incremental builds if possible

If your build takes 4-8 hours, you can't do real CI on every change.

If you're integrating with code review or other pre-tested commit processes, you want to verify changes as fast as possible.

Incremental builds are complementary to full builds, not replacements.

#4 Integrate with other tools

  • Pre-tested commits with Gerrit (git code review)
  • Sonar
    • Code metrics, code coverage, unit test results, etc, all in one place
    • Great graphs, charts, etc -- fantastic manager candy!
  • Chat/IM notifications

#5 Break up bloat

  • Too many builds make it hard to navigate the Jenkins UI and hurt performance.
  • Builds that try to do too much take too long and make it impossible to restart a build process partway through.
  • Don't be afraid to spread your jobs across multiple Jenkins masters.
  • Split jobs in a logical way -- separate instances per group, per product, per physical location, etc.

#6 Stick with stable releases

  • Jenkins releases weekly. Rapid turnaround for features & fixes, but not 100% stability for every release.
  • Plugins release whenever the developers want to.
  • Update center makes it easy to upgrade core & plugins, but that's not always best.
  • Use the Jenkins core LTS releases, every 3 months or so.

#7 Join the community

Back to table of contents

Learning jQuery

I was only in the jQuery talk for a little bit, and I was just trying to get a high-level feel for it. Still, some of the notes made things much clearer to my reading of jQuery code.

$ is a function, the "bling" function. It is the dispatcher for everything in jQuery.

// Add the "odd" class to every other table row
$('table tr:nth-child(odd)').addClass('odd');

jQuery should get loaded last on your page. Prototype uses the $ function, and will eat jQuery's $. But jQuery won't stomp on the Prototype $ function.

Put your Javascript last on the page, because the <script> tag blocks the rendering of the web page.

Back to table of contents

MVCC in Postgres and how to minimize the downsides

Bruce Momjian, Presentations

This turned out to be 100% theory and no actual "minimize the downsides". It was good to see illustrations of how MVCC works, but there was nothing I could use directly.

Why learn MVCC?

  • Predict concurrent query behavior
  • Manage MVCC performance effects
  • Understand storage space reuse

Core principle: Readers never block writers, and writers never block readers.

(Chart below is an attempt at reproducing his charts, which was a pointless exercise. Better to look at his presentation directly.)

INSERT (transaction 40) creates the tuple:

    Cre 40
    Exp

DELETE (transaction 47) expires it:

    Cre 40
    Exp 47

UPDATE (transaction 78) expires the old tuple and creates a new one:

    Cre 64                 Cre 78
    Exp 78 (old, deleted)  Exp    (new, inserted)

Four different numbers on each tuple drive MVCC:

  • xmin: creation transaction number, set by INSERT and UPDATE
  • xmax: expiration transaction number, set by UPDATE and DELETE; also used for explicit row locks
  • cmin/cmax: used to identify the command number that created or expired the tuple; also used to store combo command IDs when the tuple is created and expired in the same transaction, and for explicit row locks

Back to table of contents

(Re)Developing Perl 5 Modules in Perl 6

Damian Conway

Perl isn't a programming language. It's a life support system for CPAN.

Damian ported some of his Perl 5 modules to Perl 6 as a learning exercise.

Acme::Don't

Makes a block of code not get executed, so it gets syntax checked but not run.

# Usage example
use Acme::Don't;

don't { blah(); blah(); blah(); };

Perl 6 implementation

module Acme::Don't;
use v6;
# Take a code block (the & parameter) and simply never call it.
sub don't (&) is export {}

Lessons:

  • No homonyms in Perl 6
  • No cargo-cult vestigials
  • Fewer implicit behaviours
  • A little more typing required
  • Still obviously Perlish

IO::Insitu

Modifies files in place.

  • Parameter lists really help
  • Smarter open() helps too
  • Roles let you mix in behaviours
  • A lot less typing required
  • Mainly because of better builtins

Smart::Comments

  • Perl 6's macros kick source filters' butt
  • Mutate grammar, not source
  • Still room for cleverness
  • No Perl 6 implementation yet has full macro support
  • No Perl 6 implementation yet has STD grammar

Perl 6 is solid enough now. Start thinking about porting modules.

Back to table of contents

PostgreSQL 9.1 overview

Selena Deckelmann
Slides

New replication tools

SELinux security label support: extends SELinux security into the database, down to the column level.

Writable CTEs (Common Table Expressions):
A temporary table or view that exists just for a single query. CTEs have been around since 8.4, but not writable ones until now.

This query deletes old posts and returns a summary of what was deleted, by user_id.

WITH deleted_posts AS (
    DELETE FROM posts
    WHERE created < now() - '6 months'::INTERVAL
    RETURNING *
)
SELECT user_id, count(*) FROM deleted_posts GROUP BY 1;

Per-column collation orders

Extensions: Postgres-specific package management for contrib/, PgFoundry projects, and tools. Like Oracle "packages" or CPAN modules. PGXN is the Postgres Extension Network.

K-nearest Neighbor Indexes: Geographical nearness helper

Unlogged tables: Skip the write-ahead log, for tables where it's OK if the contents disappear after a crash. Much faster, but potentially ephemeral.

Serializable snapshot isolation: No more "select for update". No more blocking on table locks.

Foreign data wrappers

  • Remote datasource access
  • Initially implemented text, CSV data sources
  • Underway currently: Oracle & MySQL sources
  • Good for imports and things that would otherwise fail if you just used COPY
  • Only sequential scans are possible.
  • Expect tons of FDWs to be implemented once we get 9.1 to production.

Back to table of contents

Pro PostgreSQL 9

Robert Treat, OmniTI, who are basically scalability consultants

pgfoundry.org hosts other stuff around Postgres.

pgxn.org is for 9.1+ extensions

Use package management rather than building from source:

  • Consistent
  • Standardized
  • Simple

Versions

  • For production-level work, use 9.0
  • For any project not due to launch within 3 months from today, use 9.1

pg_controldata gives you all sorts of awesome details

recovery.conf is in the PGDATA dir for standby machines

pg_clog, pg_log and pg_xlog are the main data logging files. You can delete under pg_log and that's OK.

Trust contrib modules more than your own code from scratch. Try to install contrib modules into their own schemas.

Configuration

  • work_mem

    • How much memory for each individual query
    • Mostly for large analytical queries
    • OLTP is probably fine with the defaults
    • 2MB is good for most people
  • checkpoint_segments

    • Number of WAL files emitted before a checkpoint.
    • Smaller = more flushing to disk
    • Minimum of 10, more like 30
  • maintenance_work_mem

    • 1G is probably fine
  • max_prepared_transactions

    • Is NOT prepared statements
    • Set to zero unless you're using two-phase commit
  • wal_buffers

    • Always set to 16M and be done with it.
  • checkpoint_completion_target

    • Default is 0.5.
    • Set it to 0.9 to avoid hard checkpoint spikes, at the expense of somewhat higher overall I/O.
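
Collected into a postgresql.conf fragment, those suggested starting points look like this. The values come straight from the advice above; tune them for your own workload.

    work_mem = 2MB                        # per-query; raise for big analytical queries
    checkpoint_segments = 30              # smaller means more flushing to disk
    maintenance_work_mem = 1GB
    max_prepared_transactions = 0         # NOT prepared statements
    wal_buffers = 16MB
    checkpoint_completion_target = 0.9    # avoid hard checkpoint spikes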

Hardware for Postgres

  • Multiple CPUs work wonders, up to 32 processors. See http://tweakers.net
  • Put WAL on its own disk, RAID 1
  • Put DATA on its own disk, RAID 10
  • More spindles is good
  • More controllers even gooder.
  • Go with SSDs over more spindles.
  • No NFS, no RAID 5

Don't replace multiple spindles with a single SSD. You still want redundancy.

Backups

Logical backups

  • slow to create and restore
  • "pure", no system-level corruption
  • susceptible to database-level corruption
  • pg_dump is your friend, and pg_dumpall for global settings

Physical backups

  • replication/failover machine
  • tarball (pitr)
  • filesystem snapshots (pitr)

Tarball

  • Basic idea is to copy all database files and relevant xlogs
  • Use multiple machines if able
  • Use rsync if able
  • Copy the slave if able

Back to table of contents

Perl Unicode Essentials, Tom Christiansen

http://98.245.80.27/tcpc/OSCON2011/index.html

Perl has the best Unicode support of any language.

Unicode::Tussle is a bundle of Unicode tools tchrist wrote.

5.12 is the minimum for using the unicode_strings feature. 5.14 is optimal.

Recommendations:

    use strict;
    use warnings;
    use warnings qw( FATAL utf8 ); # Fatalize utf8

21 bits for a Unicode character.

Enable named characters via \N{CHARNAME}:

    use charnames qw( :full );
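
For example, any character name from the Unicode charts works:

    use charnames qw( :full );
    my $alpha  = "\N{GREEK SMALL LETTER ALPHA}";   # α
    my $bullet = "\N{BULLET}";                     # •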

If you have a DATA handle, you must explicitly set its encoding. If you want this to be UTF-8, then say:

    binmode( DATA, ':encoding(UTF-8)' );

Tom's programs start this way:

    use v5.14;
    use utf8;
    use strict;
    use autodie;
    use warnings;
    use warnings  qw< FATAL utf8 >;
    use open      qw< :std :encoding(UTF-8) >;
    use charnames qw< :full >;
    use feature   qw< unicode_strings >;

Explicitly close your files. Implicit close never checks for errors.
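
A minimal sketch, assuming $filename and $data are defined elsewhere:

    open( my $fh, '>', $filename ) or die "Can't write to $filename: $!";
    print {$fh} $data;
    close( $fh ) or die "Couldn't close $filename: $!";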

Up until 5.12, there was "The Unicode Bug". The fix that makes it work right is:

    use feature "unicode_strings";

Key core pragmas for Unicode are: v5.14, utf8, feature, charnames, open, re"/flags", encoding::warnings.

Stay away from bytes, encoding and locale.

For the programmer, it's easier to work with NFD ("o\x{303}\x{304}") than NFC ("\x{22D}").

NFD is required if you want, for example, to match /^o/ to know that something starts with "o".
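
A sketch with Unicode::Normalize (a core module), using the composed character above:

    use Unicode::Normalize qw( NFD );

    my $str = "\x{22D}";                             # single composed code point
    print "starts with o\n" if NFD($str) =~ /^o/;    # matches once decomposed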

String comparisons on Unicode are pretty much always the wrong way to go. That includes eq, ne, le, gt, cmp, sort, etc. Use Unicode::Collate. Get a taste of it by playing with the ucsort utility.
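
A minimal sketch of Unicode::Collate in use; the word list is made up.

    use utf8;
    use Unicode::Collate;

    my @words    = ( 'café', 'caff', 'cafe' );
    my $collator = Unicode::Collate->new;

    my @sorted = $collator->sort( @words );          # instead of: sort @words
    my $order  = $collator->cmp( 'café', 'caff' );   # instead of: 'café' cmp 'caff'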