by ezs | Nov 1, 2010 | evilzenscientist, Uncategorized
A trio of projects before the year-end – all intertwined.
- Migration of mail from Google Apps to Hosted Exchange.
- Migration of DNS from current service provider to ‘someone new’
- Migration of blog/photos to ‘somewhere in the cloud’
Moving the mail isn’t that hard – it’s just a matter of making sure that mail doesn’t get dropped while the new MX and CNAME records are propagating. The old mail will live on in Google Apps – the new stuff in hosted Exchange. The trickier part is making sure that ‘my customers’ get the right service – and can keep getting mail in Outlook or on the web. Users, eh.
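One way to watch the cutover is to ask a handful of resolvers for the MX records and see which ones have caught up. Here is a minimal Python sketch of just the comparison step. The hostnames and resolver labels are made up for illustration, and the actual DNS lookups (with dig or a resolver library) are assumed to happen elsewhere.

```python
from typing import Dict, Iterable, Set


def normalise(host: str) -> str:
    """Lower-case a hostname and drop the trailing dot DNS tools often print."""
    return host.strip().lower().rstrip(".")


def propagation_status(
    answers: Dict[str, Iterable[str]], new_mx: Iterable[str]
) -> Dict[str, str]:
    """Classify each resolver's MX answer as 'new', 'old' or 'mixed'.

    answers maps a resolver label to the MX hostnames it returned.
    new_mx is the set of MX hostnames expected after the migration.
    A resolver that returned nothing is reported as 'old'.
    """
    wanted: Set[str] = {normalise(h) for h in new_mx}
    status: Dict[str, str] = {}
    for resolver, hosts in answers.items():
        seen = {normalise(h) for h in hosts}
        if seen and seen <= wanted:
            status[resolver] = "new"      # only new MX hosts visible
        elif seen & wanted:
            status[resolver] = "mixed"    # old and new both visible
        else:
            status[resolver] = "old"      # no new MX hosts visible yet
    return status


if __name__ == "__main__":
    # Hypothetical answers collected from three resolvers.
    answers = {
        "resolver-a": ["MX1.new.example."],
        "resolver-b": ["aspmx.old.example"],
        "resolver-c": ["mx1.new.example", "aspmx.old.example"],
    }
    print(propagation_status(answers, ["mx1.new.example"]))
```

Until every resolver reports 'new', both mail systems need to keep accepting mail, so the old mailboxes should stay live for at least the TTL of the old records.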
Moving the DNS is part of the mid-term strategy to change ISP. Covad have been great to me since I moved to the US; sadly they are starting to show signs of decay. I need to support DNS record types beyond A, CNAME and MX – and Covad have no plans to offer them.
The final push is to move the blog servers out of the ‘home data centre’ and to a reliable, faster provider.
The ultimate aim is to divorce myself from Covad and the Static IP business DSL that has worked so well – and move to something that is much faster – but maybe without the SLA on the line itself.
by ezs | Oct 18, 2010 | evilzenscientist, Uncategorized
I’ve been testing out xCache for a while – primarily as a PHP accelerator.
Early results were really promising – reducing page load times dramatically; and also reducing CPU load as common pages (i.e. the latest blog post and photos) were fed directly from the cache.
There seems to be some kind of memory leak/cache clean-up issue with xCache 1.3 – I allocate some amount of RAM for cache (16MB, 64MB, 256MB – it really doesn’t matter) and at some point Apache/PHP starts eating up RAM, then starts to swap – and finally the server grinds to a halt.
xCache is off for now – I’ll keep investigating.
by ezs | Oct 7, 2010 | evilzenscientist, Uncategorized
Any ideas?
Twice this week all connectivity has been lost – upstream of the CPE (the on-premises router).
The first was from 2100 to 0800:

The next from 2130 to 0430:

It looks like some kind of maintenance window from Qwest, who actually provision the line.
by ezs | Jun 3, 2010 | evilzenscientist, Uncategorized
The firewall/IDS/proxy box has been up for a year.

I’m happy with that.
by ezs | May 31, 2010 | evilzenscientist, Uncategorized
I got an email on Saturday morning:
“I’m getting a message when I try and “post draft and edit online”. See pictures attached of the messages.”

Uh oh. Nothing had changed in the config of the web server for months – and adding extra disk space to the server wouldn’t cause this.
I looked at the Apache error logs – nothing. I couldn’t see anything that would be causing this. Typically it’s a permissions or xml-rpc problem that’s kicking up a complaint in Windows Live Writer.
Other blogs on the same server were working perfectly; I could upload via xml-rpc as well. Very strange.
Eventually I tracked down an alert in /var/log/warn that was flagging ‘cannot read inode bitmap’ – whenever I tried to upload an image via xml-rpc. Even stranger. This really didn’t make any sense – but it looked like early signs of a corrupt root filesystem and being unable to write to temp.
I dismounted everything and tried to fsck the disk – and then the world of pain unraveled. The entire root filesystem seemed to have junk – it’s ext3 so should be pretty robust. I’ve no idea what caused it – but the end result was that most of /etc was toasted and there were some 10,000 entries in lost+found.
The upside is that the MySQL and web data are all on separate disks – so it was really easy to reconstruct the server. I had backups of my PHP, MySQL and Apache confs – as well as all the data. The only slog was updating the Apache/PHP/MySQL stack to the correct (current) versions for my uses.
What I learned:
- backups are great – but separating the data from the OS is a real winner
- backup the config files for the core apps
- document the correct versions of core apps. Currently Apache 2.2.10, PHP 5.3.2 and MySQL 5.1.3 – these all work together without problems
Total downtime – about eight hours. Real time spent fixing this – about three hours.
I also moved several of the blogs to WordPress 3.0 RC1 – it’s been really stable so far on the main blog. I also had to do a latin1 to utf8 conversion on one of the older blogs. Always painful – but a one-time hit. I need to add that to the change control/validation for the next round of big updates.
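For anyone facing the same conversion, the core of it is just a decode and a re-encode. This Python sketch illustrates the two cases that usually turn up. It is not the procedure used on these blogs (a real WordPress database is normally converted through a dump and reload or ALTER TABLE), and the function names are my own. MySQL’s latin1 is also closer to Windows-1252 than to strict ISO 8859-1, so real data may need "cp1252" in place of "latin-1".

```python
def latin1_bytes_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes that were stored as latin1 into UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


def repair_double_encoding(text: str) -> str:
    """Undo UTF-8 text that was misread as latin1 (e.g. 'cafÃ©' -> 'café').

    Text that does not round-trip cleanly is returned unchanged, so
    already-correct strings are left alone.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    print(latin1_bytes_to_utf8(b"caf\xe9"))           # latin1 bytes in, UTF-8 bytes out
    print(repair_double_encoding("caf\u00c3\u00a9"))  # mojibake in, repaired text out
```

The second function covers the nastier case where UTF-8 text was written through a latin1 connection, which is usually what produces the ‘Ã©’ style garbage in old posts.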