Migrating Wordpress Blogs to Octopress
This week’s challenge: liberating my site from Wordpress’ clumsy grasp.
About the challenge
This has been said many times before: it’s fantastic how Wordpress made it possible for so many people to publish their content on the internet for free. For all the criticism, let’s not forget how many people gave their free time to developing Wordpress. But the world has moved on and I can’t stand the platform anymore. So I am moving to Octopress.
I simply followed the instructions on the Octopress setup page, using RVM to upgrade Ruby, and it all went well.
Next I edited the _config.yml file - again, no major surprises there. I used “%F, %a” as the date format (2004-12-25, Sat) and /:categories/:title/ as the permalink structure.
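To see what that date format produces, here is a minimal sketch. It assumes the date format value ends up in Ruby’s strftime and that the pattern is `%F, %a`; the constant and method names are made up for the illustration.

```ruby
require "date"

# Assumed strftime pattern: %F is the ISO date (YYYY-MM-DD), %a the short weekday name.
DATE_FORMAT = "%F, %a"

# Format a date the way a post date would be displayed under this assumption.
def format_post_date(date)
  date.strftime(DATE_FORMAT)
end

puts format_post_date(Date.new(2004, 12, 25)) # prints "2004-12-25, Sat"
```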
Creating test blogs
I created the first test file, which opened in Sublime Text 2. I added a single category and some sample text.
I generated the site, and the static file launched in a browser. It looked like all the bits were there, but of course the images etc. weren’t, because the app needs a web server. I could just deploy everything to Apache, but there is a more convenient way.
Previewing Octopress with Pow
Octopress is a Rack app, and can be viewed with Pow, the simplest of Rack servers. I installed it as per the instructions, then started rake to automatically deploy to it whenever I saved.
Changing Octopress theme
Wanted to try a different theme, so went for the Slash Octopress theme. The instructions are pretty simple, but had to remember to stop watching the octopress folder before running the commands.
Liberating data from Wordpress blogs
Now the laborious parts. First of all, I got the Exitwp plugin from GitHub.
Then I logged onto the Wordpress admin console (for the last time!) and in wp-admin/export.php I clicked on “Download Export File”. I saved the XML files into the wordpress-xml directory I had just cloned from GitHub.
I ran the XML through xmllint as suggested, although without a DTD I am not sure what I was looking for.
Installed dependencies - but first had to install Pip.
Edited the config.yaml file - changed download_images to true, and added a few filters.
Finally, I ran the converter command.
It mostly does a good job, and it lists the files it couldn’t parse at the end. Only 6 of my 400 posts failed, which is quite something.
Even for the files it couldn’t convert, it still created them with the right front matter, so all I needed to take care of was the HTML to Markdown conversion. I used Pandoc, a remarkable conversion tool, for that. I pasted the HTML from the Wordpress window into a file called text.html, ran it through Pandoc, then pasted the contents of text.md into the correct Jekyll post file.
The only issue is that the <!--more--> excerpt marker was missing from most posts. I bit the bullet and added it manually to all the files - it only took me half an hour, nothing too dramatic.
Combining multiple Octopress blogs
With the basics out of the way, it’s time to fine tune things. I actually had two instances of Wordpress running two separate subsites - a blog and a portfolio site with a common homepage. I was hoping to be able to combine them when WP 3.0 came out, but managing the subdomains was too painful so I never did. I want to keep that setup for now, as I am planning to do a lot of reorganizing of the portfolio site, but not the blog. Octopress is not set up to manage multiple blogs, but I eventually found a way.
My starting point was two separate Octopress instances, Octo1/ and Octo2/, sitting side by side.
First of all I tried deploying both to a third folder, octopress_deploy. That had the undesirable side effect of duplicating assets - there would be two copies of the images, CSS, and so on. But also, rake watch didn’t work anymore. I found watch very useful so that wasn’t good.
Then I tried the technique suggested on this Octopress github page. This makes the rake watch task work again, but doesn’t solve the repeated assets issue. I guess I would have to edit the themes for that, something I can do later.
So the main site is the blog. It is a vanilla Octopress site, except that the links have the structure
The portfolio site is published to the SOURCE of the main site - so that when the main site is generated, it copies along the portfolio site files too.
The rakefile for the portfolio site was amended accordingly, so that the generate task publishes to the correct directory (again, notice the ‘source’ in the path).
So that almost creates the structure I want.
Now I need a way to generate the missing blog summary page. I thought I could use the archive for that, since I don’t use it for anything else. It turns out I can just move the index.html inside source/blog/archives to source/blog - that’s my blog index page, there and then. Of course it uses a different template, but that’s ok for now.
Finally, some tweaks to the theme files. Changed octopress_blog/source/includes/custom/navigation.html, removing the Archives links and changing the URL for the blog links. Did the same on octopress_work/source/includes/custom/navigation.html. Updated the favicons and head.html. Changed some image paths in the SCSS for the buttons in the top navigation bar - removed the Rails image-url() helper and replaced it with a hard-coded URL. It’s good enough for now.
Added an intro message to the homepage by editing the default.html page.
That’s pretty much it for now.
One issue with merging multiple blogs is that each comes with its own sitemap.xml file, and they’ll need to be merged. After a discussion on GitHub I came up with some amends to the rakefile, which basically merge all the sitemap.xml files found inside the public/ directory and run at the end of the generate task.
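The general idea of the merge can be sketched as below. This is not the code from the rakefile, just a minimal illustration under a few assumptions: the sitemaps are well formed, every entry is a plain `<url>…</url>` element, and the helper names (`merge_sitemaps`, `merge_sitemaps_in`) are hypothetical.

```ruby
# Merge several sitemap.xml documents into one <urlset>.
# Simplification: <url> entries are pulled out with a regular expression,
# which assumes well-formed sitemaps without nested or commented-out <url> tags.
SITEMAP_HEADER = %(<?xml version="1.0" encoding="UTF-8"?>\n) +
                 %(<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">)

def merge_sitemaps(xml_documents)
  # Collect every <url> block from every document, dropping exact duplicates.
  entries = xml_documents.flat_map { |xml| xml.scan(%r{<url>.*?</url>}m) }.uniq
  [SITEMAP_HEADER, *entries, "</urlset>"].join("\n") + "\n"
end

# Hypothetical helper: find every sitemap.xml under public_dir (including the
# top-level one) and overwrite the top-level file with the merged result.
def merge_sitemaps_in(public_dir)
  paths = Dir.glob(File.join(public_dir, "**", "sitemap.xml")).sort
  merged = merge_sitemaps(paths.map { |path| File.read(path) })
  File.write(File.join(public_dir, "sitemap.xml"), merged)
end
```

A rake task could call something like `merge_sitemaps_in("public")` as its last step after generating the site. A proper XML parser would be more robust than the regular expression used here.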
I made a pull request for this Octopress change, should anyone be interested.
Fixing bad SEO in Octopress permalinks with categories
Another problem with Jekyll / Octopress is that it doesn’t do a good job of creating permalinks with categories in them. It puts the category’s human-readable name in the permalink, rather than a URL-friendly version as it does for the post title. So, if you have a category “Café Dreams” and a post “My favourite Café”, and your structure is /:categories/:title, then your permalink will include
/Café Dreams/my-favourite-cafe instead of
/cafe-dreams/my-favourite-cafe. There are two separate aspects to it, with two different solutions - fixing the legacy Wordpress pages, and ensuring all future pages do not suffer from this issue.
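To make the difference concrete, here is a minimal sketch of the kind of slug conversion the category would need. The `slugify` and `permalink_for` helpers are hypothetical, not part of Jekyll or Octopress, and the approach only handles accented Latin letters that decompose into a base letter plus an accent.

```ruby
# Turn a human-readable name into a URL-friendly slug.
def slugify(name)
  name.unicode_normalize(:nfd)   # split "é" into "e" plus a combining accent
      .gsub(/\p{Mn}/, "")        # drop the combining marks
      .downcase
      .gsub(/[^a-z0-9]+/, "-")   # collapse everything else into single hyphens
      .gsub(/\A-+|-+\z/, "")     # trim leading and trailing hyphens
end

# Build a permalink for the /:categories/:title/ structure.
def permalink_for(category, title)
  "/#{slugify(category)}/#{slugify(title)}/"
end

puts permalink_for("Café Dreams", "My favourite Café") # prints "/cafe-dreams/my-favourite-cafe/"
```

Letters without a decomposition (such as “ß” or “ø”) would simply be replaced by a hyphen here, so a real implementation might want a transliteration table.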
Generating Octopress permalinks from legacy Wordpress slugs
This is a semi-manual batch job, but there isn’t really an easy way to do it. The good thing is that all the posts have a Wordpress slug: field, so I can use that to create the title from. I created a temporary rake task for this. It worked ok, bar a couple of files which I fixed manually.
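As an illustration of the idea only (not the original rake task), the sketch below reads the slug: and date: fields from a post’s front matter and derives a Jekyll-style file name from them. The naming rule `YYYY-MM-DD-<slug>.markdown` and the helper names are assumptions.

```ruby
require "yaml"
require "date"

# Split a Jekyll post into its YAML front matter (as a Hash) and its body.
def parse_post(text)
  match = text.match(/\A---\s*\n(.*?)\n---\s*\n(.*)\z/m)
  raise ArgumentError, "no front matter found" unless match
  [YAML.safe_load(match[1], permitted_classes: [Date, Time]), match[2]]
end

# Assumed naming rule: YYYY-MM-DD-<slug>.markdown, so that :title in the
# permalink comes from the Wordpress slug instead of the original title.
def filename_from_slug(text)
  front_matter, _body = parse_post(text)
  date = Date.parse(front_matter.fetch("date").to_s)
  "#{date.strftime('%F')}-#{front_matter.fetch('slug')}.markdown"
end
```

A batch job could then loop over the posts directory and rename each file to the value returned by `filename_from_slug(File.read(path))`, leaving the few posts that raise an error to be fixed by hand.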
Ensuring new Octopress pages have an SEO friendly link with category
To ensure new Octopress posts do not suffer from the same bad permalink problem, I amended the rake new_post task to take an optional second parameter, category. So you can call it with a title and a category, and it will generate the matching front matter. Notice that rake will complain if there are any blank spaces between the two square brackets. While I was at it I added an ‘editor’ variable (in my case, “subl”) to open the newly created file with.
Note that I had to add the encoding as the first line, to avoid the regular expression choking on umlauts etc.
The code for the rake is below.
There are plenty more small adjustments to do, but this is it for now. Now it’s time for part 2: deploying to an Nginx server.