Notes and comments from Postgres Open 2011
As I did with my Notes and comments from OSCON 2011, here are my notes and comments from Postgres Open 2011. Some of it is narrative, and some of it is just barely-formatted notes. The target here is my own use of what was most interesting and useful for me at work, but I make the notes public here for anyone who's interested.
- Mastering PostgreSQL Administration
- Identifying slow queries and fixing them
- PostgreSQL 9.1 Grand Tour
- Monitoring the heck out of your database
- Honey, I Shrunk the Database
- Postgres Tuning
Mastering PostgreSQL Administration
Bruce Momjian
http://postgresopen.org/2011/schedule/presentations/89/
http://momjian.us/presentation
http://momjian.us/main/writings/pgsql/administration.pdf
Most of this stuff I knew already, so the notes are short.
Connections
- local -- Unix sockets
- Significantly faster than going through host
- host -- TCP/IP, both SSL and non-SSL
- hostssl -- only SSL
- Can delay connection startup by 25-40%
- hostnossl -- never SSL
Template databases
- You can use template databases to make a standard DB for when you create new ones. For example, if you want to always have a certain function or table, put it in `template1`. This works with extensions and contrib modules like `pgcrypto`.
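As a sketch of that idea (the database names and extension choice are my example, not from the talk):

```sql
-- Install a commonly-wanted extension into template1; every database
-- created afterward inherits it automatically.
\c template1
CREATE EXTENSION pgcrypto;

-- Reconnect elsewhere first: a template can't be copied while in use.
\c postgres
CREATE DATABASE myapp;                       -- clones template1 by default
CREATE DATABASE myapp2 TEMPLATE template1;   -- or name the template explicitly
```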
Data directory
- `xxx_fsm` files are the free space map
- `pg_xlog` is the WAL directory
- `pg_clog` is the commit status log
Config file settings
shared_buffers
should be 25% of total RAM for dedicated DB servers. Don't go over 40-50% or machine will starve. Also, overhead of that many buffers is huge.- If you can get five minutes of your working set into
shared_buffers
, you're golden. - Going over a couple hundred connections, it's worth it to look at a pooler.
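A minimal sketch of those numbers, assuming a dedicated 16GB server (the RAM figure is my example, not Bruce's):

```sql
-- In postgresql.conf (changing it requires a restart):
--   shared_buffers = 4GB    # ~25% of 16GB; past 40-50% the machine starves
-- Verify the live setting from psql:
SHOW shared_buffers;
```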
Analyzing activity
- Heavily-used tables
- Unnecessary indexes
- Additional indexes
- Index usage
- TOAST usage
Identifying slow queries and fixing them
Stephen Frost
http://postgresopen.org/2011/schedule/presentations/71/
Fixing
- MergeJoin for small data sets? Check `work_mem`.
- Nested Loop with a large data set? Could be bad row estimates.
- DELETEs are slow? Make sure you have indexes on foreign keys.
- Harder items
  - Check over your long-running queries
  - Use stored procedures/triggers
  - Partitioning larger tables
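The foreign-key point deserves a sketch: Pg indexes the referenced side (via its primary key) but not the referencing column, so referential checks during a DELETE can table-scan. Table names here are hypothetical.

```sql
CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child  (
    id        int PRIMARY KEY,
    parent_id int REFERENCES parent (id)
);
-- Without this, every DELETE FROM parent scans child:
CREATE INDEX child_parent_id_idx ON child (parent_id);
```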
Prepared queries
- Plan once, run many
- Not as much info to plan with, plans may be more stable
- No constraint exclusion, though
- How to EXPLAIN/EXPLAIN ANALYZE prepared queries
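A minimal example of plan-once/run-many (the table and parameter are made up):

```sql
PREPARE get_user (int) AS
    SELECT * FROM users WHERE id = $1;

EXECUTE get_user(42);

-- EXPLAIN / EXPLAIN ANALYZE work against the prepared plan, too:
EXPLAIN EXECUTE get_user(42);

DEALLOCATE get_user;
```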
Query Review
- Don't do `select count(*)` on big tables
- Look at `pg_class.reltuples` for an estimate
- Write a trigger that keeps track of the count in a side table
- `ORDER BY` and `LIMIT` can help Pg optimize queries
- `select *` can be wasteful by invoking TOAST
- Use JOIN syntax to make sure you don't forget the join conditions
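The side-table counter might look something like this (all names are mine; seed the counter once before trusting it):

```sql
CREATE TABLE row_counts (tablename text PRIMARY KEY, n bigint NOT NULL);

CREATE FUNCTION track_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET n = n + 1 WHERE tablename = TG_TABLE_NAME;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE row_counts SET n = n - 1 WHERE tablename = TG_TABLE_NAME;
    END IF;
    RETURN NULL;  -- AFTER triggers ignore the return value
END $$ LANGUAGE plpgsql;

-- Seed once, then attach:
INSERT INTO row_counts VALUES ('big_table', (SELECT count(*) FROM big_table));
CREATE TRIGGER big_table_count AFTER INSERT OR DELETE ON big_table
    FOR EACH ROW EXECUTE PROCEDURE track_count();
```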
CTEs (Common Table Expressions)
WITH my_view AS (
    SELECT * FROM my_expensive_view
), my_sums AS (
    SELECT sum(x) AS sum FROM my_view
)
SELECT my_view.*, my_sums.sum FROM my_view, my_sums;
PostgreSQL 9.1 Grand Tour
Josh Berkus
http://www.pgexperts.com/document.html?id=52
Overview
- Synchronous replication
- Replication tools
- Per-Column collation
- wCTEs
- Serialized Snapshot Isolation
- Unlogged tables
- SE-Postgres
- K-Nearest Neighbor
- SQL/MED
- Extensions
- Other Features
Land of Surreal Queries: Writable CTEs
-- Plain CTEs are in 8.4; DML inside WITH is new in 9.1
WITH deleted_posts AS (
DELETE FROM posts
WHERE created < now() - '6 months'::INTERVAL
RETURNING *
)
SELECT user_id, count(*)
FROM deleted_posts
GROUP BY 1
In 9.1, you can do UPDATE on that.
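Presumably something like this (the archived column is my invention):

```sql
WITH archived AS (
    UPDATE posts SET archived = true
    WHERE created < now() - '6 months'::INTERVAL
    RETURNING user_id
)
SELECT user_id, count(*)
FROM archived
GROUP BY 1;
```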
Unlogged tables
Sometimes you have data where, if it's lost in a crash, you don't care. Unlogged tables are much faster, but you risk data loss.
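For example (table is hypothetical): writes skip the WAL entirely, but the table is truncated after a crash and is not replicated.

```sql
CREATE UNLOGGED TABLE session_cache (
    session_id text PRIMARY KEY,
    payload    text
);
```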
Extensions
CREATE EXTENSION IF NOT EXISTS citext WITH SCHEMA ext;
SQL-MED
Support for Foreign Data Wrappers (FDWs).
Others
- Valid-on-creation FKs
- Extensible ENUMs
- Triggers on Views
- Reduced NUMERIC size
- ALTER TYPE without rewrite
- `pg_dump` directory format, a precursor to parallel `pg_dump`
Monitoring the heck out of your database
Josh Williams, End Point
http://joshwilliams.name/talks/monitoring/
What are we looking for?
- Performance of the system
- Application throughput
- Is it dead or about to die?
"They don't care if the system's on fire so long as it's making money."
Monitoring Pg
- Log monitoring for errors
- Log monitoring for query performance
- Control files / External commands
- Statistics from the DB itself
Monitoring error conditions
- ERROR: Division by zero
- FATAL: password authentication
- PANIC: could not write to file
pg_xlog
...
Quick discussion of tail_n_mail
Log monitoring for query performance
- Pgfouine
- pgsi http://bucardo.org/wiki/Pgsi
check_postgres
Most of the rest of the talk was about check_postgres, which I already know all about. A few cool to-do items came out of it.
- Look at tracking `--dbstats` in Cacti
- Add `--noidle` to `--action=backends` to get a better sense of the counts.
Honey, I Shrunk the Database
Vanessa Hurst
http://postgresopen.org/2011/speaker/profile/36/
http://www.slideshare.net/DBNess/honey-i-shrunk-the-database-9273383
Why shrink?
- Accuracy
- You don't know how your app will behave in production unless you use real data.
- Freshness
- New data should be available regularly
- Full database refreshes should be timely
- Resource Limitation
- Staging and developer machines cannot handle production load
- Data protection
- Limit spread of sensitive data
Case study: Paperless Post
Requirements
- Freshness - Daily on command for non-developers
- Shrinkage - slices & mutations
Resources
- Source -- extra disk space, RAM and CPUS
- Destination -- limited, often entirely un-optimized
- Development -- constrained DBA resources
Shrunk strategies
- Copies
- Restored backups or live replicas
- Slices
- Select portions of live data
- Mutations
- Sanitized or anonymized data
- Assumptions
- Usually for testing
Slices
Vertical slice
- Difficult to obtain a valid, useful subset of data
- Example: Include some tables, exclude others
Horizontal slice
- Difficult to write & maintain
- Example: SQL or application code to determine subset of data
Pg tools -- vertical slice
- pg_dump
- include data only
- Include table schema only
- Select tables
- Select schemas
- Exclude schemas
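Those options map to pg_dump flags roughly like this (database and table names invented):

```shell
pg_dump --data-only mydb              # data only
pg_dump --schema-only mydb            # table definitions only
pg_dump -t users -t orders mydb       # just selected tables
pg_dump -n app mydb                   # one schema
pg_dump -N audit mydb                 # everything but one schema
```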
Postgres Tuning
Greg Smith
Tuning is a lifecycle.
Deploy / Monitor / Tune / Design
You may have a great design up front, but then after a while you have more data than you did before, so you have to redesign.
Survival basics
- Monitor before there's a problem
- Document healthy activity
- Watch performance trends
- "The site is bad. Is it just today, or has it been getting worse over time?"
- Good change control: Minimize changes, document heavily
- Keep your config files in version control like any other part of your app.
- Log bad activity
- Capture details during a crisis
Monitoring and trending
- Alerting and trending
- Alerts: Nagios + check_postgres
Trending
- Watch database and operating system on the same timeline
- Munin: Easy, complete, heavy
- Generates more traffic, may not scale up to hundreds of nodes
- Cacti: Lighter, but missing key views
- Not Greg's first choice
- Harder to get started with the Postgres plugins
- Missing key views, which he'll cover later
- Various open-source and proprietary solutions
Munin: Load average
- Load average = how many processes are active and trying to do something.
- Load average is sensitive to sample rate. Short-term spikes may disappear when seen at a long-term scale.
Munin: CPU usage
- Best view of CPU usage of the monitoring tools.
- If your system is running a lot of system activity, often from connection costs, look at a pooler like pgbouncer.
Munin: Connection distribution
- Greg wrote this in Cacti because it's so useful.
- Graph shows a Tomcat app that has built-in connection pool.
- The graph shown isn't actually a problem.
- Better to have a bunch of idle connections because of a pooler, rather than getting hammered by a thousand unpooled connections.
Munin: Database shared_buffers usage
- If `shared_buffers` usage goes up without the same spike in disk IO, the data must be in the OS's cache.
- If `shared_buffers` is bigger than 8GB, it can be a negative, rather than letting the OS do the buffering. derby's is at 5GB.
- There is some overlap between Pg's buffers and the OS's, but Pg tries to minimize this. Seq scans and VACUUM won't clear out `shared_buffers`, for example.
- There's nothing wrong with using the OS cache.
- SSDs are great for random-read workloads. If the drive doesn't know to sync the data, and is not honest with the OS about it, you can have corrupted data.
- SSDs' best use is for indexes.
Munin: Workload distribution
- Shows what kind of ops are done on tuples.
- Sequential scans may not necessarily be bad. Small fact tables that get cached are sequentially scanned, but that's OK because they're all in RAM.
Munin: Long queries/transactions
- Watch for oldest transaction. Open transactions block cleanup activities like VACUUM.
- Open transaction longer than X amount of time is Nagios-worthy.
Using pgbench
- pgbench can do more than just run against the pgbench database. It can simulate any workload. It has its own little scripting language in it.
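A custom workload sketch in the 9.1-era scripting syntax (the table is made up):

```
-- bench.sql
\setrandom uid 1 100000
SELECT * FROM users WHERE id = :uid;
```

Run it with something like `pgbench -n -f bench.sql -c 8 -T 60 mydb`.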
OS monitoring
- top -c
- htop
- vmstat 1
- iostat -mx 5
- watch
Long queries
What are 5 long running queries?
psql -x -c 'select now() - query_start as runtime, current_query from pg_stat_activity order by 1 desc limit 5'
It's safe to `kill` query processes, but never `kill -9` them.
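From SQL, the equivalent safe options (the pid is a placeholder you'd get from pg_stat_activity):

```sql
SELECT pg_cancel_backend(12345);     -- cancel the current query only
SELECT pg_terminate_backend(12345);  -- drop the whole connection
```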
Argument tuning
- Start monitoring your long-running queries.
- Run an EXPLAIN ANALYZE on slow queries showing up in the logs.
- A sort going to disk at 2700kB suggests updating `work_mem` to 4MB. However, that alone may not fix it: data takes more memory in RAM than it does on disk.
- If you're reading more than 20% of the rows, Pg will switch to a sequential scan, because random I/O is so slow.
- Indexing a boolean rarely makes sense.
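The tuning loop sketched above, with a hypothetical query (the 2700kB figure is the talk's example):

```sql
EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY created;
-- ...Sort Method: external merge  Disk: 2700kB

SET work_mem = '4MB';  -- per sort, per backend; be careful raising it globally
EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY created;
-- If the sort now fits, the plan reports an in-memory quicksort instead
```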
The dashboard report
- Sometimes you want to cache your results and not even worry about the query speed.
- Use window functions for ranking.
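For instance, a dashboard ranking might use a window function over an aggregate like this (schema invented):

```sql
SELECT user_id,
       sum(amount) AS total,
       rank() OVER (ORDER BY sum(amount) DESC) AS position
FROM sales
GROUP BY user_id;
```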
The OFFSET 0 hack
- Adding an `OFFSET 0` in a subquery forces a certain `JOIN` order on the subquery. Something about making the subquery know that it is limited in some way.
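A sketch of the hack (tables are hypothetical): the `OFFSET 0` keeps the planner from flattening the subquery into the outer query, pinning the plan it would otherwise rearrange.

```sql
SELECT *
FROM (SELECT * FROM orders WHERE status = 'open' OFFSET 0) o
JOIN customers c ON c.id = o.customer_id;
```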