What are we working on?
Our team is working on a contract to migrate a number of public and internal NASA web sites and applications to the public cloud, Amazon Web Services in this case. Some of these we’re migrating almost verbatim, some we’re migrating from one application platform to another, and a few we’re reimplementing to leverage AWS-specific cloud services.
One of the first applications we migrated was used as a blog and wiki platform; it was written in a proprietary system to which we had no access. Our direction from NASA was to focus our deployments on a few dominant open source platforms. For this one, we chose WordPress.
Setting up the WordPress platform wasn’t hard, but migrating content and users (and integrating with NASA’s internal single sign-on) proved much more challenging.
To run security scans and test new deployments, we want to be able to quickly clone production to development servers. AWS snapshots make this really easy: snapshot the boot volume, create a new AMI from it, and launch the AMI. Unfortunately, the new box comes up with a different hostname, and that change breaks WordPress, in our case in a subtle way that the docs we found didn’t address.
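The snapshot, AMI, launch sequence can be scripted. Below is a minimal sketch using a boto3 EC2 client; clone_instance is a hypothetical helper, and it assumes an x86_64 HVM box whose root device is /dev/xvda, with t2.medium as a placeholder instance type. The client is passed in so the logic can be exercised without AWS credentials.

```python
from typing import Any


def clone_instance(ec2: Any, volume_id: str, ami_name: str,
                   instance_type: str = "t2.medium",
                   root_device: str = "/dev/xvda") -> str:
    """Snapshot a boot volume, register an AMI from it, and launch one instance.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the new instance ID.
    """
    # 1. Snapshot the production boot volume and wait until it completes.
    snap_id = ec2.create_snapshot(
        VolumeId=volume_id, Description=f"clone for {ami_name}")["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap_id])

    # 2. Register an AMI whose root device is that snapshot.
    #    Assumption: x86_64 HVM image with root device /dev/xvda.
    image_id = ec2.register_image(
        Name=ami_name,
        Architecture="x86_64",
        VirtualizationType="hvm",
        RootDeviceName=root_device,
        BlockDeviceMappings=[{"DeviceName": root_device,
                              "Ebs": {"SnapshotId": snap_id}}],
    )["ImageId"]
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # 3. Launch one instance from the new AMI; it will get a new hostname.
    reservation = ec2.run_instances(ImageId=image_id, InstanceType=instance_type,
                                    MinCount=1, MaxCount=1)
    return reservation["Instances"][0]["InstanceId"]
```

Usage would look like clone_instance(boto3.client("ec2"), "vol-0123456789abcdef0", "wiki-dev-clone"), where the volume ID is a made-up example. The resulting box is a byte-for-byte copy of production with a new hostname, which is exactly what triggers the problem below.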
This post explains why this happens, offers a couple of solutions, and shares a lesson learned for avoiding the problem in the future.
WordPress Database Mangling
When moving WordPress from one host to another, the change in hostname breaks WordPress.
WordPress embeds the hostname of the original host in many places in its database. Some entries store the bare hostname, some store full URLs, and others embed it in PHP-serialized strings, which record each string’s length and therefore rule out a simple global search-and-replace on an export file.
We’re not addressing changing the WordPress folder location here since other docs discuss that in more detail.
Many Locations of the Hostname
The wp_blogs table maps each multisite blog_id to its domain and path; the ‘domain’ column is what you need to change.
The wp_options table (and, in multisite networks, each wp_*_options table) contains name/value pairs that use the hostname. Critically, these include siteurl, home, and fileupload_url, all of which are simple URLs. But there are also options such as recently_edited, dashboard_widget_options, and adapt_theme_settings that contain PHP-serialized structures; you can’t edit these conventionally because changing a string’s length breaks the serialization. There are also _transient_feed_* options, which I expected we could ignore.
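To see why a plain text replace corrupts these values: PHP’s serialize() writes every string as s:<byte length>:"<bytes>";, so s:13:"wiki.nasa.gov" becomes s:13:"dev.example.com" after a naive replace, and unserialize() then rejects it. The sketch below is a minimal byte-walking alternative of our own (replace_in_serialized is hypothetical, not how interconnectit’s script or wp-cli work internally) that replaces inside each string and rewrites its length prefix. It assumes the input is already a serialized value, handles the N, b, i, d, s, a and O tokens, and does not handle C (custom-serialized objects), S (escaped strings), or strings that are themselves serialized a second time.

```python
def replace_in_serialized(data: str, old: str, new: str) -> str:
    """Replace `old` with `new` inside a PHP-serialized value, fixing s:N: lengths.

    PHP string lengths are byte counts, so we work on the UTF-8 bytes.
    """
    raw = data.encode("utf-8")
    old_b, new_b = old.encode("utf-8"), new.encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(raw):
        tag = raw[i:i + 1]
        if tag == b"s":
            # s:<byte length>:"<bytes>";
            colon = raw.index(b":", i + 2)
            length = int(raw[i + 2:colon])
            start = colon + 2                      # skip ':"'
            value = raw[start:start + length].replace(old_b, new_b)
            out += b's:%d:"%s";' % (len(value), value)
            i = start + length + 2                 # skip '";'
        elif tag in (b"a", b"O"):
            # a:<count>:{  or  O:<len>:"<class>":<count>:{  copied unchanged
            brace = raw.index(b"{", i)
            out += raw[i:brace + 1]
            i = brace + 1
        elif tag == b"}":
            out += tag
            i += 1
        else:
            # Scalar tokens such as N;  b:1;  i:42;  d:1.5;  copied unchanged
            end = raw.index(b";", i)
            out += raw[i:end + 1]
            i = end + 1
    return out.decode("utf-8")
```

For example, 'a:2:{s:7:"siteurl";s:21:"http://wiki.nasa.gov/";i:0;b:1;}' becomes 'a:2:{s:7:"siteurl";s:23:"http://dev.example.com/";i:0;b:1;}', whereas a naive str.replace would leave the stale s:21 prefix.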
The wp_posts and wp_*_posts tables contain the hostname in the guid column, which WordPress says should never change, except when moving attachment folders. If we’re moving a site to a new host, it’s probably OK to change it, but doing so may cause RSS subscribers to see entries from the old site as new ones. Other columns also pick up the hostname and may need changing, and some of those hold serialized representations.
There are many, many more locations in the tables. We have 153 sites in our network, so we need some automation.
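For the plain-URL locations, at least, the per-site statements can be generated rather than typed 153 times. A minimal sketch, assuming a MySQL DB-API driver with %s placeholders (such as PyMySQL) and the standard multisite layout in which the main site (blog_id 1) uses wp_options and every other site uses wp_<id>_options; options_table and hostname_updates are hypothetical helpers. It covers only wp_blogs.domain and the siteurl, home and fileupload_url options; serialized values and post content still need a serialization-aware tool.

```python
from typing import Iterable, List, Tuple

URL_OPTIONS = ("siteurl", "home", "fileupload_url")


def options_table(blog_id: int, prefix: str = "wp_") -> str:
    """Main site (blog_id 1) uses wp_options; the others use wp_<id>_options."""
    return f"{prefix}options" if blog_id == 1 else f"{prefix}{int(blog_id)}_options"


def hostname_updates(blog_ids: Iterable[int], old: str, new: str,
                     prefix: str = "wp_") -> List[Tuple[str, Tuple[str, str]]]:
    """Build (sql, params) pairs; table names come from integer IDs only."""
    stmts = [(f"UPDATE {prefix}blogs SET domain = REPLACE(domain, %s, %s)",
              (old, new))]
    names = ", ".join(f"'{n}'" for n in URL_OPTIONS)
    for blog_id in blog_ids:
        stmts.append((
            f"UPDATE {options_table(blog_id, prefix)} "
            f"SET option_value = REPLACE(option_value, %s, %s) "
            f"WHERE option_name IN ({names})",
            (old, new)))
    return stmts
```

The blog IDs would come from SELECT blog_id FROM wp_blogs, and each pair would be run with cursor.execute(sql, params) followed by a commit.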
Automated Search and Destroy
The documentation we found recommends interconnectit’s Search and Replace script, which goes through all the tables in the site and replaces the old hostname with the new one, correctly handling serialized structures. Yet our sites showed the dreaded WordPress White Page of Death, even though the /wp-admin/ pages worked fine. We tried both version 2 and version 3.0-Beta of the script, with the same result.
We also tried the wp-cli tool, which does the same thing (and more), and got the same White Page of Death.
And we tried some custom Python + SQLAlchemy code, again with the same WPOD.
The fact that all these mechanisms produce the exact same failure is — in some geeky way — vaguely comforting.
White Page Of Death
Other White Page Of Death reports on WordPress describe the entire site going blank, caused by mangled updates or upgrades, or by extraneous blank lines after a closing PHP “?>” tag. But our /wp-admin/ pages were fine, for all sites. Since the themed site pages failed while the wp-admin pages, with their baked-in styling, rendered correctly, something appeared to be wrong with the themes.
In our wp_*_options tables, we see that ‘template’ and ‘stylesheet’ were originally set to the same name as our site, wiki.nasa.gov, a pretty obvious choice. When the search-and-replace operations ran amok on the entire DB, they changed those values to the new hostname too, so WordPress could no longer find the theme, which still lives in a directory with the old name.
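One way to confirm this diagnosis across a whole network is to compare each site’s ‘template’ and ‘stylesheet’ values with the directories actually present under wp-content/themes. The sketch below assumes you have already fetched those two options per site (for example with SELECT option_name, option_value FROM wp_2_options WHERE option_name IN ('template', 'stylesheet')) into a dict keyed by blog_id; missing_themes is a hypothetical helper, and the default path is the themes folder from our install.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def missing_themes(site_options: Dict[int, Dict[str, str]],
                   themes_dir: str = "/var/www/wp-content/themes"
                   ) -> List[Tuple[int, str, str]]:
    """Return (blog_id, option_name, value) for every 'template' or
    'stylesheet' option that names a directory absent from themes_dir."""
    root = Path(themes_dir)
    problems = []
    for blog_id, options in sorted(site_options.items()):
        for name in ("template", "stylesheet"):
            value = options.get(name)
            # A value with no matching theme directory is what produced
            # our White Page Of Death.
            if value and not (root / value).is_dir():
                problems.append((blog_id, name, value))
    return problems
```

An empty list means every site’s theme resolves to a real directory.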
Progress: Break Theme, Repair
To test our understanding, we edited our top multisite’s wp_options table, setting the option_value of the ‘template’ and ‘stylesheet’ rows to “wiki_nasa_gov”. I got a White Page Of Death. Good, this is what I was expecting: that’s not the name of my theme directory!
I then copied /var/www/wp-content/themes/wiki.nasa.gov to wiki_nasa_gov and got my site back. Excellent: now we can use the search-and-replace sledgehammers (searchreplacedb or wp-cli) and compensate by renaming the theme directory (and presumably updating our code repo).
Start-to-Finish: Change DB, Fix Theme
First Rule of the WordPress-Club: Backup the DB
You should be able to restore this if the DB edits hose you.
mysqldump -u root -p wordpress > wordpress.mysqldump
Mine takes 25 seconds and results in a 106MB dump file.
Change Site Config Hostname
Edit /var/www/wp-config.php to set DOMAIN_CURRENT_SITE to the new DNS hostname of our development box, changing:
define('DOMAIN_CURRENT_SITE', 'wiki.nasa.gov');
to
define('DOMAIN_CURRENT_SITE', 'cshenton-wiki-webapp-1-stage-pub.example.com');
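If you clone boxes often, this edit can be scripted too. A minimal sketch, assuming the define sits on one line with single or double quotes as above; set_domain_current_site and update_wp_config are hypothetical helper names. It refuses to change anything unless it finds exactly one DOMAIN_CURRENT_SITE definition.

```python
import re
from pathlib import Path

# Matches define('DOMAIN_CURRENT_SITE', '<host>') with either quote style.
_DOMAIN_RE = re.compile(
    r"""(define\(\s*['"]DOMAIN_CURRENT_SITE['"]\s*,\s*['"])([^'"]*)(['"]\s*\))""")


def set_domain_current_site(config_text: str, new_host: str) -> str:
    """Return config_text with the DOMAIN_CURRENT_SITE value replaced."""
    # A function replacement avoids backslash and group-reference surprises.
    new_text, count = _DOMAIN_RE.subn(
        lambda m: m.group(1) + new_host + m.group(3), config_text)
    if count != 1:
        raise ValueError(f"expected one DOMAIN_CURRENT_SITE define, found {count}")
    return new_text


def update_wp_config(path: str, new_host: str) -> None:
    """Rewrite wp-config.php in place; keep a copy of the original first."""
    p = Path(path)
    p.write_text(set_domain_current_site(p.read_text(), new_host))
```

For our box that would be update_wp_config("/var/www/wp-config.php", "cshenton-wiki-webapp-1-stage-pub.example.com").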
Global Search and Replace
Download and install wp-cli in your WordPress dir, rename it to ‘wp’, and make it executable. Use it to globally search and replace our hostname in all tables and every column, preserving PHP serialization; it takes about three minutes for our 153-site network:
$ ./wp search-replace wiki.nasa.gov cshenton-wiki-webapp-1-stage-pub.example.com --network
Success: Made 33478 replacements.
Our …/wp-admin/ pages continue to work, but our site pages now give us the White Page Of Death. This is expected, since the replacement also changed the ‘template’ and ‘stylesheet’ options to our new hostname.
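wp-cli’s search-replace command also accepts --dry-run, which reports what would change without writing, and --skip-columns, which could leave the guid column discussed earlier untouched; check ./wp help search-replace for the options your version supports. A hypothetical Python wrapper that defaults to a dry run, assuming wp-cli is installed as ./wp in the current directory:

```python
import subprocess
from typing import List, Sequence


def search_replace_cmd(old: str, new: str, *, dry_run: bool = True,
                       skip_columns: Sequence[str] = (),
                       wp: str = "./wp") -> List[str]:
    """Build the wp-cli argv; a dry run is the default, so nothing is written by accident."""
    cmd = [wp, "search-replace", old, new, "--network"]
    if skip_columns:
        # e.g. skip_columns=["guid"] leaves post GUIDs untouched
        cmd.append("--skip-columns=" + ",".join(skip_columns))
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_search_replace(old: str, new: str, **options) -> str:
    """Run wp-cli from the WordPress directory and return its stdout."""
    result = subprocess.run(search_replace_cmd(old, new, **options),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Running it once with the default dry run, reading the report, and then again with dry_run=False keeps the destructive step deliberate.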
Repair the Theme: the hard way
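The hard way is to set the ‘template’ and ‘stylesheet’ options in every options table back to the original theme directory name, wiki.nasa.gov. It’s not really that hard, and it’s something you can easily run from the command line. Of course, you’ll have to do it again if you search-n-replace this database again.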
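A minimal sketch of the hard way, assuming a MySQL DB-API driver with %s placeholders (such as PyMySQL) and the standard multisite table naming; theme_reset_statements is a hypothetical helper. It only reverts rows whose value equals the new hostname, so sites using other themes are left alone.

```python
from typing import Iterable, List, Tuple

RESET_SQL = ("UPDATE {table} SET option_value = %s "
             "WHERE option_name IN ('template', 'stylesheet') AND option_value = %s")


def theme_reset_statements(blog_ids: Iterable[int], theme_dir: str, new_host: str,
                           prefix: str = "wp_") -> List[Tuple[str, Tuple[str, str]]]:
    """Build (sql, params) pairs that point template/stylesheet back at theme_dir."""
    stmts = []
    for blog_id in blog_ids:
        # Main site (blog_id 1) uses wp_options; the others use wp_<id>_options.
        table = f"{prefix}options" if blog_id == 1 else f"{prefix}{int(blog_id)}_options"
        stmts.append((RESET_SQL.format(table=table), (theme_dir, new_host)))
    return stmts
```

With blog IDs from SELECT blog_id FROM wp_blogs, each pair would be run via cursor.execute(sql, params), followed by a commit.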
Repair the Theme: the easy way
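We can effect the change simply by renaming (or, safer, copying) the theme directory to the name the over-aggressive search-n-replace used: in our case, copy /var/www/wp-content/themes/wiki.nasa.gov to /var/www/wp-content/themes/cshenton-wiki-webapp-1-stage-pub.example.com.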
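The copy can also be scripted. A minimal sketch; copy_theme_for_new_host is a hypothetical helper, and the default path is our install’s themes folder. Copying rather than renaming leaves the original directory in place, and shutil.copytree refuses to overwrite an existing destination.

```python
import shutil
from pathlib import Path


def copy_theme_for_new_host(old_name: str, new_name: str,
                            themes_dir: str = "/var/www/wp-content/themes") -> Path:
    """Copy the theme directory so it matches the rewritten template/stylesheet values."""
    root = Path(themes_dir)
    src, dst = root / old_name, root / new_name
    if not src.is_dir():
        raise FileNotFoundError(src)
    # copytree raises FileExistsError if dst already exists, so an earlier
    # copy is never silently overwritten; symlinks are copied as links.
    shutil.copytree(src, dst, symlinks=True)
    return dst
```

For our box: copy_theme_for_new_host("wiki.nasa.gov", "cshenton-wiki-webapp-1-stage-pub.example.com").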
Lesson Learned about Theme Names
While we anticipate it’s pretty common to name your theme after the FQDN of the site you’re hosting, you can avoid all this work by giving your theme a different name. In our case, we might have chosen “www_nasa_gov” for our theme’s directory name. Because a search for a dotted hostname doesn’t match an underscored name, an aggressive search-n-replace on the DB can no longer break your theme.
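If you inherit a theme that is already named after a hostname, a quick pre-flight check before any global replace shows which theme directories the replace would orphan. A minimal sketch; themes_at_risk is a hypothetical helper, and it relies on the ‘template’ and ‘stylesheet’ values being the directory names themselves.

```python
from pathlib import Path
from typing import List


def themes_at_risk(search: str, themes_dir: str = "/var/www/wp-content/themes") -> List[str]:
    """Theme directory names containing `search`; a global replace of `search`
    would rewrite the matching option values but not the directories."""
    return sorted(p.name for p in Path(themes_dir).iterdir()
                  if p.is_dir() and search in p.name)
```

Running themes_at_risk("wiki.nasa.gov") on our original box would have flagged wiki.nasa.gov, while an underscored name such as wiki_nasa_gov would not be listed.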