
Which of my PostgreSQL indexes are getting used most heavily?

October 10, 2011 Open source, Programming

Ever since we got the fast new database server with SSDs, I’ve been monitoring which tables are getting heavy traffic and should go live on the SSDs. We have two tablespaces, “fast” which is faster but smaller, and “slow” which is bigger but slower. I’ve been using this query to determine which indexes should live in which tablespace. There are different forms of this query around the web, but I needed to see the tablespaces, too.

SELECT
    i.idx_scan,
    i.idx_tup_read,
    i.idx_tup_fetch,
    i.indexrelname AS "index",
    it.spcname AS index_tablespace,
    i.relname AS "table",
    tt.spcname AS table_tablespace,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size
FROM pg_stat_all_indexes i
    INNER JOIN pg_class ic ON (i.indexrelid = ic.oid)
    LEFT OUTER JOIN pg_tablespace it ON (ic.reltablespace = it.oid)
    INNER JOIN pg_class tc ON (i.relid = tc.oid)
    LEFT OUTER JOIN pg_tablespace tt ON (tc.reltablespace = tt.oid)
ORDER BY 1 DESC, 2 DESC, 3 DESC;

The output looks like this (in \x mode because of the width):

-[ RECORD 1 ]----+----------------------------------------------------
idx_scan         | 395974172
idx_tup_read     | 432974893
idx_tup_fetch    | 426070104
index            | testbook_pkey
index_tablespace | fast
table            | testbook
table_tablespace | fast
index_size       | 289 MB
-[ RECORD 2 ]----+----------------------------------------------------
idx_scan         | 133416135
idx_tup_read     | 133441801
idx_tup_fetch    | 133413399
index            | lists_listid_custid
index_tablespace | fast
table            | lists
table_tablespace | fast
index_size       | 7096 kB
-[ RECORD 3 ]----+----------------------------------------------------
idx_scan         | 50310975
idx_tup_read     | 1286116
idx_tup_fetch    | 742639
index            | listdetail_bkkey_listid_where_ctr2_is_zero
index_tablespace | fast
table            | listdetail
table_tablespace | fast
index_size       | 682 MB

I have one case where a heavily-trafficked table is still staying on the slow tablespace. It’s a log of user login history that is only ever appended to, and is searched only a few times a day. SSDs are great at random reads, but not much faster than physical spindles on sequential writes. Therefore, my login history would not benefit much from moving to the SSD tablespace, and I can allocate that precious space to another table or index instead.
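
Once you’ve identified a hot index, moving it is a single statement. Here’s a minimal sketch using the testbook_pkey index from the output above (note that ALTER … SET TABLESPACE takes an exclusive lock and rewrites the relation, so schedule it for a quiet time):

    -- Move one index onto the SSD tablespace
    ALTER INDEX testbook_pkey SET TABLESPACE fast;

    -- Moving a table does not move its indexes; move those one by one
    ALTER TABLE testbook SET TABLESPACE fast;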

You’re not a genius? Says who?

October 10, 2011 Open source, People, Social

Who says you’re not a genius?  Who are any of us to say?  And why would anyone bother telling someone that?

In my last blog post, I talked about how it was unnecessary and counterproductive to justify your projects to your detractors. It only wastes time that could be spent doing something positive, and it’s not going to change anyone’s mind.  One of the commenters took issue with my premise, basing his disagreement on the glib comment “You ain’t Steve Jobs.”

Of course I’m not, but so what? How much of a genius do I have to be before I no longer have to justify myself to others? (Don’t answer that; it only encourages them.)

The unspoken corollary to comments like “You ain’t Steve Jobs” seems to be “Therefore, you must listen to how others want you to be.” Fortunately, even in the absence of a Jobs-level genius, we’re all able to stand on our own, to live and work as we see fit, without having to take mandatory guidance from others.

I wonder at the thought process that it takes to tell someone “You’re not as _____ as you think you are.”  Near as I can figure, comments like these have one or more of these subtexts:

  • “You need to be more like me.”
  • “I’m trying to save you wasting time or risking failure.”
  • “I have taken it upon myself to take you down a peg and put you in your place.”  (This one often appears with the phrase “I’m just saying…”)

Fortunately, none of these is valid, and none of them need concern you.  You can, and should, ignore them.  Ignore them for your own sake, and for the sake of the awesome things you have in you to share with the world.

I wonder how many Jobs-level brains are out there but never flower because the person was told too many times that he (or more likely, she) isn’t as good as he thinks he is. Neil deGrasse Tyson makes a brilliant point about how we teach children: parents spend the first years of a child’s life teaching him to walk and talk, and the rest of his life telling him to shut up and sit down, quashing his sense of wonder and thirst for knowledge. “You ain’t Steve Jobs” is the adult version of “shut up and sit down.”

Whatever your level of genius, one thing we can all share with Steve is his perseverance.  He kept working at what he believed in, despite public derision about his public failures.  Before the success of the Macintosh, Apple released the Lisa, and before the iPad, the Newton. How much poorer the world would be if Jobs had listened to his critics and packed it in!

I’m not saying that there isn’t value to be found in criticism, even unsolicited criticism, about your work.  I’m not suggesting that you shut out the world around you.  If you can take what you find useful and leave the rest, then do it.

I am suggesting that you shut out those who tell you you’re no good, or who want to put you in your place. When people tell you you’re not awesome, ignore them.  Who are they to say?  And why does it matter if they think you’re awesome or not?  Eventually, you’ll prove them wrong.

 

There’s only one useful way to handle your detractors

October 6, 2011 Open source, People, Social

Here’s a Reddit/Slashdot/whatever thread that never happened:

Internet crank on Reddit: “Hey, Steve Jobs, I guess that new iPad looks cool, but I think iPad is a stupid name, it makes me think of sanitary napkins.”

Steve: “Yeah, well, here’s why we called it that. (Long explanation justifying his choices)”

Crank #2: “Well, why didn’t you call it the iTablet? I think that would have been a good name. What does everyone else think?”

Crank #3: “What does it have to be iAnything? I’m tired of the i- prefix.”

Steve: “We thought about that, but … (More explanation about his choices)”

Crank #1: “And really, isn’t it just a bigger iPod Touch? I would never carry that around with me. And come on, you’re just trying to redo the Newton anyway LOL”

Steve: “My logic behind the iPad is (vision, business plan, blah blah blah)”

Can you even imagine Steve Jobs in this sort of time-wasting and emotionally draining tit-for-tat in a thread on Slashdot? On Reddit? In some blog’s comment section? Of course not. Justifying his plans would take away from the amazing things he needed to achieve.

Naysayers are part of every project. How many people do you think pissed on Jimmy Wales’ little project to aggregate knowledge? Nobody’s going to spend their time writing encyclopedia entries! And yet there it is.  On a personal level, if I had listened to everyone who thought I was wasting my time improving on find + grep, you’d never have ack.

We all have to persevere in the face of adversity to our ideas, but there’s more to it than that.  We need to ignore our detractors. Despite how silly and time-wasting it is to argue your motivations and reasons for undertaking a project, many of us feel compelled to argue with everyone who disagrees with us.  I suggest you not waste your time.

On the Internet, the attitude is “Why wasn’t I consulted?” Every anti-social child (measured by calendar or maturity) with a keyboard thinks it’s his responsibility to piss on everything he doesn’t like. They’ll always be there. You can no more make them go away than you can stop the rain by arguing with it.

What are you hoping to achieve by arguing with someone who doesn’t like your project? Do you expect that he’ll come around to your way of thinking? It won’t happen through words.

Not only does arguing with your critics waste your precious time, but it tells them, and every other crank reading, that you’re willing to engage in debate about what you’re doing. Don’t encourage them! Let them find a more receptive target.

I’m not saying that factual misstatements need to be ignored.  If something is provably incorrect, go ahead and counter it with facts.  However, most of the time these message-thread pissing wars come down to “I would not be doing what you are doing, and therefore you are wrong for doing so.”

The only thing that has a chance of silencing your critics is success at what you do. Arguing with the naysayers doesn’t get you any closer to that.

Notes and comments from OSCON 2011

September 20, 2011 Open source, Programming

Finally, two months after OSCON 2011, here’s a dump of my notes from the more tech-heavy sessions I attended. Some of it is narrative, and some of it is just barely-formatted notes. I wrote these notes for my own use, focusing on what was most interesting and useful for me at work, but I make them public here for anyone who’s interested.

This post is long and ugly, so here’s a table of contents:

  • API Design Anti-patterns
  • The Conway Channel
  • Cornac, the PHP static analysis tool
  • Using Jenkins
  • Learning jQuery
  • MVCC in Postgres and how to minimize the downsides
  • (Re)Developing Perl 5 Modules in Perl 6
  • PostgreSQL 9.1 overview
  • Pro PostgreSQL 9
  • Perl Unicode Essentials

API Design Anti-patterns, by Alex Martelli of Google


Martelli’s talk was about providing public-facing web APIs, not code-level APIs. He said that public-facing websites must provide an API. “They’re going to scrape to get the data if you don’t,” so you might as well create an API that puts less load on your site.

API design anti-patterns

  • Worst issue: no API
  • 2nd-worst API design issue: no design
  • Too many APIs spoil the broth
  • “fear of commitment”
  • Inconsistency in APIs
  • Extremes: No balance between concerns
    • what languages to support?
      • excessive language dependence or independence
    • what about standard protocols/formats?
  • Inadequate debugging, error messages, documentation

Everyone wants an API. Take a look at the most common questions on StackOverflow. They’re about spidering and scraping websites, or simulating keystrokes and mouse gestures. Sometimes these questions are about system testing, but most of them point to missing APIs for a site. The APIs may actually be there, or they may just be undocumented.

You should be offering an API, and it should be easy to use. Put yourself in the shoes of your users: you need this API just like they do. Even a simple, weak API is better than none. Follow the path of least resistance: REST and JSON.

Document your API, or at least provide examples, which programmers may find easier to digest than prose. Keep your docs, and especially the code examples in them, tested. Use doctest or a similar system for testing the documentation.


The Conway Channel

The Conway Channel is Damian Conway’s annual discussion of new tools that he’s created in the past year.

Regexp::Grammars is all sorts of parsing stuff for Perl 5.10 regexes, and it went entirely over my head.

IO::Prompter is an updated version of IO::Prompt, which is pretty cool already. It only works with Perl 5.10+. IO::Prompt makes it easy to prompt the user for input, and the new IO::Prompter adds more options and data validation.

# Get a number
my $n = prompt -num 'Enter a number';

# Get a password with asterisks
my $passwd = prompt 'Enter your password', -echo=>'*';

# Menu with nested options
my $selection
    = prompt 'Choose wisely...', -menu => {
            wealth => [ 'moderate', 'vast', 'incalculable' ],
            health => [ 'hale', 'hearty', 'rude' ],
            wisdom => [ 'cosmic', 'folk' ],
        }, '>';

Data::Show is like Data::Dumper but also shows helpful debug tips like variable names and origin of the statement. It doesn’t try to serialize your output like Data::Dumper does, which is a good thing. Data::Show is now my default data debug tool.

my $person = {
    name => 'Quinn',
    preferred_games => {
        wii => 'Mario Party 8',
        board => 'Life: Spongebob Squarepants Edition',
    },
    aliases => [ 'Shmoo', 'Monkeybutt' ],
    greeter => sub { my $name = shift; say "Hello $name" },
};
show $person;

======(  $person  )====================[ 'data-show.pl', line 20 ]======

    {
      aliases => ["Shmoo", "Monkeybutt"],
      greeter => sub { ... },
      name => "Quinn",
      preferred_games => {
        board => "Life: Spongebob Squarepants Edition",
        wii => "Mario Party 8",
      },
    }

Acme::Crap is a joke module that adds a crap function that also lets you use exclamation points to show severity of the error.

use Acme::Crap;

crap    'This broke';
crap!   'This other thing broke';
crap!!  'A third thing broke';
crap!!! 'A fourth thing broke';

This broke at acme-crap.pl line 10
This other thing broke! at acme-crap.pl line 11
A Third Thing Broke!! at acme-crap.pl line 12
A FOURTH THING BROKE!!! at acme-crap.pl line 13

As with most of Damian’s joke modules, you’re not likely to use this in a real program, but to learn from how it works internally. In Acme::Crap’s case, the lesson is in overloading the ! operator.
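
Here’s a minimal sketch of that technique on an ordinary object (an illustration of overloading !, not Acme::Crap’s actual internals; the Bangable class is made up):

    package Bangable;
    use strict;
    use warnings;
    use overload
        '!'  => sub { my $self = shift; $self->{bangs}++; return $self },
        '""' => sub { my $self = shift; return $self->{msg} . ( '!' x $self->{bangs} ) };

    sub new {
        my ( $class, $msg ) = @_;
        return bless { msg => $msg, bangs => 0 }, $class;
    }

    package main;

    my $err = Bangable->new('This broke');
    print !!$err, "\n";    # each overloaded ! adds emphasis: "This broke!!"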


Cornac, the PHP static analysis tool

Cornac is a static analysis tool for PHP by Damien Seguy. A cornac is someone who drives an elephant.

Cornac is both a static audit tool and an application inventory:

  • Static audit
    • Process large quantities of code
    • Process the same code over and over
    • Depends on auditor expert level
    • Automates searches
    • Makes searches systematic
    • Produces false positives
  • Application inventory
    • Taking a global look at the application
    • List of structures names
    • List of used functionalities

Migrating to PHP 5.3

  • Incomplete evolutions
  • Obsolete functions
  • Reference handling
  • References with the “new” operator
  • mktime() doesn’t take 7 parameters any more

Gives a list of extensions. Maybe Perl::Critic should include an inventory of modules used? Elliot points out that you can give perlcritic the --statistics argument for some similar stats.
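
For example, assuming your modules live under lib/:

    perlcritic --statistics lib/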

Cornac found three different classes with the same name, in three different source files.

Summary of classes and properties makes it easy to see inconsistencies.

Has an inclusion network, like my homemade xreq tool, but graphical:

  • include, include_once, require and require_once
  • Ignores variables
  • Circles represent files, arrows represent inclusions

Most interesting of all was finding out about two other PHP static analysis tools: PMD (PHP Mess Detector) and PHP_Depends.


Using Jenkins

Andrew Bayer, @abayer

Andrew was clearly aiming at people who have many Jenkins instances, which we certainly won’t have at work, but he had lots of good solid details to discuss.

#1 Use plugins productively

  • Search for plugins to fit your needs. Organized by category on the wiki and in the update center.
  • If you’re not using a plugin any more, disable it.
    • Save memory and reduce clutter on job configuration pages.
  • Favorite plugins:
    • JobConfigHistory: See the difference between job configurations now and then. With authentication enabled, you get to see who changed it and how.
    • Disk Usage
    • Build Timeout: Can set a timeout to abort a build if it takes longer than a set time.
    • Email-ext: Better control of when emails get sent and who they get sent to. Format the emails the way you want using information from your builds.
    • Parameterized Trigger: Kick off downstream builds with information from upstream.

#2 Standardize your slaves

If you’ve got more than a couple builds, you’ve probably got multiple slaves. Ad hoc slaves may be convenient, but you’re in for trouble if they have different environments.

Use Puppet or Chef to standardize what goes on the machines. Have Jenkins or your job install the tools. Or, you can use a VM to spawn your slaves.

Whatever method you choose, just make sure your slaves are consistent.

Don’t build on master. Always build on slaves.

  • No conflict on memory/CPU/IO between master and builds.
  • Easier to add slaves than to beef up master.
  • Mixing builds on master with builds on slaves means inconsistencies between builds.

#3 Use incremental builds if possible

If your build takes 4-8 hours, you can’t do real CI on every change.

If you’re integrating with code review or other pre-tested commit processes, you want to verify changes as fast as possible.

Incremental builds are complementary to full builds, not replacements.

#4 Integrate with other tools

  • Pre-tested commits with Gerrit (git code review)
  • Sonar
    • Code metrics, code coverage, unit test results, etc, all in one place
    • Great graphs, charts, etc — fantastic manager candy!
  • Chat/IM notifications

#5 Break up bloat

  • Too many builds make it hard to navigate the Jenkins UI and hurt performance.
  • Builds that try to do too much take too long and make it impossible to restart a build process partway through.
  • Don’t be afraid to spread your jobs across multiple Jenkins masters.
  • Split jobs in a logical way — separate instances per group, per product, per physical location, etc.

#6 Stick with stable releases

  • Jenkins releases weekly. Rapid turnaround for features & fixes, but not 100% stability for every release.
  • Plugins release whenever the developers want to.
  • Update center makes it easy to upgrade core & plugins, but that’s not always best.
  • Use the Jenkins core LTS releases, every 3 months or so.

#7 Join the community


Learning jQuery

I was only in the jQuery talk for a little bit, and I was just trying to get a high-level feel for it. Still, some of the notes made my later reading of jQuery code much clearer.

$ is a function, the “bling” function. It is the dispatcher for everything in jQuery.

// Sets the alternate rows to be odd
$('table tr:nth-child(odd)').addClass('odd');

jQuery should get loaded last on your page. Prototype uses the $ function and will eat jQuery’s $, but jQuery won’t stomp on Prototype’s $ function.

Put your JavaScript last on the page, because the <script> tag blocks the rendering of the web page.


MVCC in Postgres and how to minimize the downsides

Bruce Momjian

This turned out to be 100% theory and no actual “minimize the downsides”. It was good to see illustrations of how MVCC works, but there was nothing I could use directly.

Why learn MVCC?

  • Predict concurrent query behavior
  • Manage MVCC performance effects
  • Understand storage space reuse

Core principle: Readers never block writers, and writers never block readers.

(Chart below is an attempt at reproducing his charts, which was a pointless exercise. Better to look at his presentation directly.)

Cre 40
Exp        INSERT

Cre 40
Exp 47     DELETE

Cre 64
Exp 78     old (delete)
------
Cre 78
Exp        new (insert)

Four different numbers on each tuple drive MVCC (a query sketch follows the list):

  • xmin: creation transaction ID, set by INSERT and UPDATE
  • xmax: expiration transaction ID, set by UPDATE and DELETE; also used for explicit row locks
  • cmin/cmax: identify the command number that created or expired the tuple; also used to store combo command IDs when the tuple is created and expired in the same transaction, and for explicit row locks
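
These show up as hidden system columns on every row, so you can watch MVCC happen yourself. A minimal sketch, assuming a scratch table t:

    CREATE TABLE t (x integer);
    INSERT INTO t VALUES (1);

    -- xmin is the inserting transaction's ID; xmax is 0 while the row is live
    SELECT xmin, xmax, cmin, cmax, x FROM t;

    -- After an update, the new row version carries the updater's ID in xmin;
    -- the expired version (no longer visible here) carries it in xmax
    UPDATE t SET x = 2;
    SELECT xmin, xmax, cmin, cmax, x FROM t;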


(Re)Developing Perl 5 Modules in Perl 6

Damian Conway

Perl isn’t a programming language. It’s a life support system for CPAN.

Damian ported some of his Perl 5 modules to Perl 6 as a learning exercise.

Acme::Don’t

Makes a block of code not get executed, so it gets syntax checked but not run.

# Usage example
use Acme::Don't;

don't { blah(); blah(); blah(); };

Perl 6 implementation

module Acme::Don't;
use v6;
sub don't (&) is export {}

Lessons:

  • No homonyms in Perl 6
  • No cargo-cult vestigials
  • Fewer implicit behaviours
  • A little more typing required
  • Still obviously Perlish

IO::Insitu

Modifies files in place.

  • Parameter lists really help
  • Smarter open() helps too
  • Roles let you mix in behaviours
  • A lot less typing required
  • Mainly because of better builtins

Smart::Comments

  • Perl 6’s macros kick source filters’ butt
  • Mutate grammar, not source
  • Still room for cleverness
  • No Perl 6 implementation yet has full macro support
  • No Perl 6 implementation yet has STD grammar

Perl 6 is solid enough now. Start thinking about porting modules.


PostgreSQL 9.1 overview

Selena Deckelmann


New replication tools

SELinux security label support. Extends SELinux enforcement into the database, down to the column level.

Writable CTE: Common Table Expressions:
A temporary table or VIEW that exists just for a single query. There have been CTEs since 8.4, but not writable ones until now.

This query deletes old posts, and returns a summary of what was deleted by user_id.

WITH deleted_posts AS (
    DELETE FROM posts
    WHERE created < now() - '6 months'::INTERVAL
    RETURNING *
)
SELECT user_id, count(*) FROM deleted_posts GROUP BY 1;

Per-column collation orders

Extensions: Postgres-specific package management for contrib/, PgFoundry projects, and other tools. Like Oracle “packages” or CPAN modules. PGXN is the Postgres Extension Network.
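
A quick sketch of the mechanism, using the hstore module from contrib:

    -- Install an extension into the current database (9.1+)
    CREATE EXTENSION hstore;

    -- See what's installed
    SELECT extname, extversion FROM pg_extension;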

K-nearest Neighbor Indexes: Geographical nearness helper
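
The classic use is “the N nearest things to here.” A sketch against a hypothetical places table with a point column named location (the <-> distance operator can now be satisfied by a GiST index):

    CREATE INDEX places_location ON places USING gist (location);

    -- The ten places nearest a given point, straight off the index
    SELECT name
    FROM places
    ORDER BY location <-> point '(43.07,-89.38)'
    LIMIT 10;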

Unlogged tables: Skip the write-ahead log entirely, for tables where it’s OK if the contents disappear after a crash. Much faster, but potentially ephemeral.
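
The syntax is a single keyword (the table definition is hypothetical):

    -- Survives a clean restart, but is truncated after a crash
    CREATE UNLOGGED TABLE session_cache (
        session_id  bigint,
        payload     text
    );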

Serializable snapshot isolation: No more “select for update”. No more blocking on table locks.

Foreign data wrappers

  • Remote data source access
  • Initially implemented: text and CSV data sources (see the file_fdw sketch below)
  • Underway currently: Oracle & MySQL sources
  • Good for imports and things that would otherwise fail if you just used COPY
  • Nothing other than sequential scans is possible.
  • Expect tons of FDWs to be implemented once we get 9.1 to production
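
A sketch using file_fdw, the CSV wrapper that ships with 9.1’s contrib (server name, table name, and file path are all made up):

    CREATE EXTENSION file_fdw;

    CREATE SERVER import_files FOREIGN DATA WRAPPER file_fdw;

    CREATE FOREIGN TABLE import_log (
        logged_at  timestamp,
        message    text
    ) SERVER import_files
      OPTIONS ( filename '/tmp/import.csv', format 'csv' );

    -- Sequential scans only, but fine for bulk imports
    SELECT count(*) FROM import_log;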


Pro PostgreSQL 9

Robert Treat, OmniTI, who are basically scalability consultants

pgfoundry.org hosts other projects around Postgres.

pgxn.org is for 9.1+ extensions

Use package management rather than building from source

  • Consistent
  • Standardized
  • Simple

Versions

  • For production-level work, use 9.0
  • For any project not due to launch within 3 months, use 9.1

pg_controldata gives you all sorts of awesome details

recovery.conf is in the PGDATA dir for standby machines

pg_clog, pg_log and pg_xlog are the main logging directories.
You can delete files under pg_log and that’s OK, but leave pg_clog and pg_xlog alone; the cluster needs them.

Trust contrib modules more than your own code from scratch. Try
to install contrib modules into their own schemas.

Configuration

  • work_mem
    • How much memory for each individual query
    • Mostly for large analytical queries
    • OLTP is probably fine with the defaults
    • 2M is good for most people
  • checkpoint_segments
    • Number of WAL files emitted before a checkpoint.
    • Smaller = more flushing to disk
    • Minimum of 10, more like 30
  • maintenance_work_mem
    • 1G is probably fine
  • max_prepared_transactions
    • Is NOT prepared statements
    • Set to zero unless you are on two-phase commit
  • wal_buffers
    • Always set to 16M and be done with it.
  • checkpoint_completion_target
    • default is .5
    • Set to .9. Avoid hard checkpoint spikes at the expense of some overall IO being higher.
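
Pulled together, those suggestions look like this in postgresql.conf (the talk’s starting points, not universal truths):

    work_mem = 2MB                       # per-sort/per-hash; raise for analytics
    checkpoint_segments = 30             # minimum of 10, more like 30
    maintenance_work_mem = 1GB
    max_prepared_transactions = 0        # zero unless you use two-phase commit
    wal_buffers = 16MB
    checkpoint_completion_target = 0.9   # smooth out checkpoint IO spikes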

Hardware for Postgres

  • Multiple CPUs work wonders, up to 32 processors. See http://tweakers.net
  • Put WAL on its own disk, RAID 1
  • Put DATA on its own disk, RAID 10
  • More spindles is good
  • More controllers even gooder.
  • Go with SSDs over more spindles.
  • No NFS, no RAID 5

Don’t replace multiple spindles with a single SSD. You still want redundancy.

Backups

Logical backups

  • slow to create and restore
  • “pure”, no system-level corruption
  • susceptible to database-level corruption
  • pg_dump is your friend, and pg_dumpall for global settings

Physical backups

  • replication/failover machine
  • tarball (PITR)
  • filesystem snapshots (PITR)

Tarball

  • Basic idea is to copy all database files and relevant xlogs
  • Use multiple machines if able
  • Use rsync if able
  • Copy the slave if able
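
A rough sketch of the sequence (paths hypothetical; run the SQL from psql as a superuser):

    -- 1. Put the cluster into backup mode
    SELECT pg_start_backup('nightly');

    -- 2. From a shell, copy the data directory, e.g.:
    --      rsync -a /var/lib/pgsql/data/ backuphost:/backups/pgdata/

    -- 3. Take the cluster back out of backup mode
    SELECT pg_stop_backup();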


Perl Unicode Essentials, Tom Christiansen

http://98.245.80.27/tcpc/OSCON2011/index.html

Perl has the best Unicode support of any language.

Unicode::Tussle is a bundle of Unicode tools tchrist wrote.

5.12 is the minimum for using the unicode_strings feature. 5.14 is optimal.

Recommendations:

    use strict;
    use warnings;
    use warnings qw( FATAL utf8 ); # Fatalize utf8

21 bits for a Unicode character.

Enable named characters via \N{CHARNAME}:

    use charnames qw( :full );

If you have a DATA handle, you must explicitly set its encoding. If you want this to be UTF-8, then say:

    binmode( DATA, ':encoding(UTF-8)' );

Tom’s programs start this way:

    use v5.14;
    use utf8;
    use strict;
    use autodie;
    use warnings;
    use warnings  qw< FATAL utf8 >;
    use open      qw< :std :encoding(UTF-8) >;
    use charnames qw< :full >;
    use feature   qw< unicode_strings >;
Explicitly close your files. Implicit close never checks for errors.

Up until 5.12, there was “The Unicode Bug”. The fix that makes it work right is:

    use feature "unicode_strings";

Key core pragmas for Unicode are: v5.14, utf8, feature, charnames, open, re '/flags', and encoding::warnings. Stay away from bytes, encoding, and locale.

For the programmer, it’s easier to do NFD ("o\x{304}\x{303}") instead of NFC ("\x{22D}"). NFD is required to, for example, match /^o/ to know that something starts with “o”.

String comparisons on Unicode are pretty much always the wrong way to go. That includes eq, ne, le, gt, cmp, sort, etc. Use Unicode::Collate. Get a taste of it by playing with the ucsort utility.
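
A minimal taste of Unicode::Collate:

    use strict;
    use warnings;
    use utf8;
    use Unicode::Collate;

    my @words = qw( zebra Äpfel apple café );

    # A plain sort compares code points, so Äpfel lands after zebra
    my @codepoint_order = sort @words;

    # Unicode::Collate applies the Unicode Collation Algorithm instead
    my @collated = Unicode::Collate->new->sort(@words);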

“Building and Managing a Project Community with Github”, St. Louis, MO, 2011-09-03

August 31, 2011 Open source


On Saturday, September 3rd I’ll be presenting “Building and Managing a Project Community with Github” at ArchReactor, a hackerspace in St. Louis, MO.

ArchReactor
Jefferson Underground Building
2400 South Jefferson Avenue
St. Louis, MO 63104
http://archreactor.org/location

There will be a social hour from 4:00-5:00pm, and my presentation starts at 5pm sharp. I hope to see you there!

Your github account is not your portfolio, but it’s a start

August 24, 2011 Job hunting, Open source

Gina Trapani started a Google+ thread about using Github as a portfolio of your work to show potential employers. This in turn was prompted by a blog post by PyDanny titled “Github is my resume.” It’s a great idea, but it’s only a start. Your portfolio should be more curated than that to be effective.

I shouldn’t complain too much. Far too few job seekers consider the power of showing existing work products to hiring managers. That’s probably because so few employers ask to see any. In my book Land the Tech Job You Love, I cite Ilya Talman, one of the top tech recruiters in Chicago, estimating that only 15% of hiring managers ask to see samples of work.

Consider the manager looking to hire a computer programmer. She has a hundred résumés from respondents, all claiming to know Ruby and Rails. She knows that anyone can put Ruby, Rails, or any other technologies into a résumé without knowing them. Even well-meaning candidates might think “I read a book on Ruby once, and Rails can’t be too tough, so I’ll put them on my résumé.” Looking at sample code is a great way to separate the good programmers from the fakers.

Since creating a repository of someone else’s good code is only slightly more involved than putting “Ruby on Rails” in a résumé document, a good hiring manager will ask in the interview about the code. When I interview candidates, I ask for printed code samples of their best work for us to discuss. Pointing at a given section on the paper, I’ll say “Tell me about your choice to write your own Perl function here instead of using a module from CPAN”, or “I see your variables seem to be named using a certain convention; why did you use that method?” In a few minutes, I can easily find out more about the candidate’s thought process and coding style than a mile-long résumé could ever tell me. This method also exposes potentially faked code.

So as much as I applaud candidates having a body of work to which they can point employers, simply saying “Here’s my Github repo” is not enough. The hiring manager doesn’t want to see everything you’ve written. Although everyone is different, she probably wants to see three things:

  • quality of work
  • breadth of work
  • applicability to her specific needs

Most important, she doesn’t want to go digging through all your code to find the answers to these questions.

Consider my github repository as an example. There are 28 repositories in it. Of these, nine are forks of other repos for me to modify, and so clearly don’t count as code I’ve written. Three repos are version control for websites I manage. Some are incubators of ideas for future projects that have yet to blossom. My scraps repository is a junk drawer where I put code I’ve written and might have use for later. How will an interested employer know what to look at? It’s arrogant and foolish to tell someone looking to hire you “here’s all my public code, you figure it out.” It’s the RTFM method of portfolio presentation, and it doesn’t put you in the best light possible.

For an effective portfolio, choose three to five projects that show your best work, and then provide a paragraph or two about each, describing the project in English and your involvement with it. There is literally no project or repository, on Github or elsewhere, about which I can say “This work is 100% mine.” Everything I’ve ever worked on has had work contributed from others, and the nature of those contributions needs to be disclosed upfront and honestly.

None of this is special to Github. There are plenty of online code repositories out there, such as Perl’s CPAN, which can act as a showcase for your work. Of course, you can also create your own online portfolio on your website as well. The keys are to highlight your best work and accurately describe your involvement.

A common complaint I hear when I discuss code portfolios goes like this: “Most of my work is private or under NDA, so I can’t have a portfolio.” Hogwash. You can go write your own code specifically to show your skills. If your area of expertise is with web apps, then go write a web app that does something fairly useful and publish that as your portfolio. Assign it an open source license so that others can take advantage of it, too. You’ll be helping your community while you help your job prospects.

Do you have an online code portfolio? Let me know in the comments, and include the URL for others to see.

Six tips for preparing to attend a technical conference

July 21, 2011 Open source, Social

I’ve been going to technical conferences since YAPC::NA 2002, and next week I’ll be at OSCON 2011 talking about community and Github. Preparation is important to getting the most out of the conference with the least amount of hassle. Here are some tips I’ve learned along the way.

Bring power tools

Power cord, display dongle, cube tap and business cards

Not electric drills and saws, but tools for getting power. Conference organizers may not have planned adequately for the influx of laptops, and electric outlets can be a rare commodity. If you’re flying to a conference, it can be especially difficult to find a plug at the airports. O’Hare in Chicago is especially bad.

If you can fit a power strip into your laptop bag, good. If you want to go cheap, go buy a cube tap at your hardware store for two dollars.

Make sure you bring your cell phone charger and a USB cable to hook up your phone to your laptop, too.

Label your stuff

If you forget your laptop power cord in a room, whoever finds it isn’t going to know whose it is. At the Apple-heavy conferences I usually attend, everyone’s cords all look the same anyway. Label it with your name and cell phone number. Same goes for anything else that you might use and lose, such as display adapter dongles. It’s frustratingly expensive to realize you lost a $25 piece of plastic.

Plan what you want to see

If you leave conference talk planning until the day of the talk, you’re more likely to miss seeing the really good stuff. Amidst all the talk in the hallways, the hanging out in the exhibit halls and hackathons, and the long lunches with new friends, it’s easy to forget about that one talk you really wanted to see until you look back at the schedule and realize it ended half an hour ago.

The OSCON scheduler makes it easy to mark the talks you want to see, but for the most important ones, I suggest adding them to your calendar on your phone and setting an alarm.

Bring business cards

You’re going to meet people, so give them something to remember you by. I’m talking about making your own business card, not your company business card. Your card need not be fancy, but if you can get a graphic designer friend to put together something nice in exchange for lunch and/or a few beers, so much the better. At the very least, you’ll want to include your name, website, email address and cell phone number. I also put my Twitter ID and Github ID on mine.

My box of 500 business cards was only about $20 delivered to my door. It’s fantastic bang for your buck for keeping in contact with the people you meet.

Get a laptop bag with a shoulder strap

While you’re at the conference, you’re going to take your laptop with you at all times. AT ALL TIMES. Every conference, someone gets a laptop stolen. You’re not going to let it be you.

Do not trust the guy next to you to “watch this while I run to the bathroom.” When you go to the bathroom, or grab a drink, or whatever it is that you do that isn’t seated at a conference table with your laptop in front of you, you’re going to have your laptop zipped up in your bag, with the strap over your shoulder. This goes double for airports.

Bathrooms are an ideal place for a thief to take your laptop. I assure you that standing at a urinal trying to take care of business with a laptop tucked under your arm is not fun. If you’re in a stall, be aware of how easy it is for a thief to grab a bag from under the stall, or to reach over the door and take the bag from the hook.

A laptop bag with a shoulder strap is the only way to go.

Clean your house

Wash the dishes. Empty the garbage. Take stuff out of the fridge if it’s going to go bad in your absence. A lot of nastiness can happen in five days.

Other tips

I asked on Twitter for suggestions for conference prep. Some replies:

  • Give a practice session of any talk that I haven’t given before. — @mjdominus
  • Make a checklist of all cables I need. Then research where to buy them in Portland just in case. — @rjbs
  • Get a lot of sleep the week before. — @adamturoff

What suggestions do you have? Please leave them in the comments below.

Toward ending RTFM marketing in open source

July 20, 2011 Open source

Too many times I’ve seen a conference announced once, and then never heard about it again. It’s what I call the RTFM method of marketing: Either you happen to know about the event, or you lose out. This year for YAPC::NA, the annual North American grassroots Perl conference, lead organizer JT Smith isn’t going to let that happen.

No sooner had the 2011 conference wrapped up than JT started daily postings about 2012’s event on the YAPC::NA blog. He plans to keep that pace going for the next year, until June 13th, 2012, when 2012’s event starts. The goal is to keep people thinking about YAPC::NA over the next eleven months, and to keep everyone’s expectations high. “Everyone at YAPC 2011 laughed at me when I said I was going to do a blog post a day,” JT told me on Sunday, “but I’ve got the next 300 postings planned out.”

It’s not just frequency that’s different this time. JT’s writing about the details of the conference, and why you’d want to attend. His posts give tips about the best way to travel to Madison, and attract potential attendees with views of the conference location on the lake. A “spouse program” for the non-hacker members of the family is also high on his publicity list.

As JT and I ate lunch at the bar where he hopes to have a YAPC beer night, we discussed the mechanics of this ongoing communication campaign. JT has the next thirty postings written and posted to Tumblr with future publication dates, letting him create postings in batches, rather than every day. “I chose Tumblr for the blog because it has the best posting scheduling system,” he told me.

You can follow the YAPC::NA Twitter stream at @yapcna, or the blog itself at blog.yapcna.org.


I give “RTFM marketing” that name because it’s an extension of the geek notion of RTFM, the rude response meaning “Read the F-ing Manual”. It’s used as a reply to a question that the geek thinks should not have been asked, because the information exists somewhere the querent could have looked it up himself. It’s as if the rude geek is saying “The information exists in at least one place that I know of, and therefore you should know it, too.”

The idea that one should just have known about a given piece of information applies to this sort of undermarketing as well. Project leaders seem to think that once information has been published, everyone will know about it. The RTFM marketers expect that everyone knows what they know, reads the blogs they read, and travels in the same online circles as they do. This is a recipe for failure.

This mindset can be crippling when it comes to publicizing projects and events. Organizers do their projects a disservice when they market their endeavors with the expectation that everyone will automatically know about something simply because they’ve written one blog post about it.

RTFM marketers also don’t spread their messages wide enough. They advertise to the echo chamber of the circles in which they normally run. They’ll post to the standard blogs, post to the mailing lists they read, or discuss it in the IRC channels they frequent. This limits the potential audience for the project to the one with which the project leader is already familiar.

Tips for doing open source project marketing right:

  • Write & post frequently.
  • Write & post in many disparate locations.
  • Explain the benefits. Explicitly tell the reader why they would want to attend your event or use your software.
  • Change your messages. Don’t post the same thing twice.
  • Never assume that someone will have read your previous message. It’s OK to repeat something stated in a previous message.
  • You don’t know your potential audience as well as you think you do. Think big.

I’d love to hear stories and ideas about how you got the word out about your project.

“Building and Maintaining a Project Community with Github”, my talk at OSCON 2011

July 16, 2011 Open source

Here’s the OSCON page link to add to your schedules.

github.com has taken open source by storm, but it’s more than just a code repository with the latest hot source control system. It’s a new way of working with open source projects.

The web-based social aspects of github can change the human and technical dynamics of working on open source projects. Some of the issues I’ll discuss include:

  • Easier access to code means lower barrier to entry means more people submitting patches. This is a boon, and brings challenges.
  • People seem to expect patches to be accepted because of the ease with which change sets are created. These expectations may clash with project goals.
  • Watching the github fork network lets you see what other people are doing with their forks. As a project admin, it lets me see what people are doing with the code.
  • New workflows are required. A branch and merge strategy for development is crucial.
  • Projects need a guidemap to tell people what to do, because people may think it’s just a simple matter of creating a fork, making a change, and saying “Here’s my work, now integrate it.”
  • Project branches can easily become large, hard-to-handle change sets. Less care and thought is put into changes sent back to the project because merging is so easy. Contributors still must work together to coordinate work.
  • Discussion of patches has moved from the mailing list to the change request itself. This can diminish visibility and discussion.

I’ll discuss these and other aspects of community and project management and give examples from my own experiences migrating existing projects to github.

I’ll be presenting “Just Enough C for Open Source Projects” July 19th at Software Craftsmanship McHenry

July 1, 2011 Open source, Programming

For programmers raised on high-level languages like Perl, Java and PHP, working on a C project can be daunting. Still, many open source projects work at a low-level in C to take advantage of the power and speed of working close to the machine. Whether it’s Perl, Postgres or Linux, C is what makes it run.

This session will provide a high-level overview of C, aimed specifically at the programmer wanting to get involved in a C-based open source project. We’ll cover:

  • Nothing in C is DWIM (“Do what I mean”)
  • Numeric types, strings and structures
  • Memory management: the heap, the stack, and pointers
  • Using the preprocessor
  • Understanding compiler warnings
  • Memory checking with valgrind
  • How to navigate a large C-based open source project (ctags, etc)
  • Security, or, how the Bad Guys smash the stack

Sign up at mchenry.softwarecraftsmanship.org.