Sorry for not posting anything over the last little while, but "crazy busy" doesn't even begin to describe it. What have we been up to? Over the course of seven nights, eight hours each night, we successfully moved the following:
Just under 2000 servers (Ranging from brand new to ten years old)
50+ Foundry switches
3 Compellent SANs
2 EMC SANs
Misc Core Networking Gear
Partridge in a pear tree
We knew that moving two datacenters full of equipment was no small feat. To put it into perspective, before the move we spoke with Dell, just to see what they would charge to come in and do it for us. They quoted us 4 million dollars and told us to expect a 12% failure rate of equipment.
With an amazing crew of a dozen or so, we pulled it off with a failure rate of barely 1% (20 servers). The failures ranged from RAM issues to a motherboard literally blowing up. Some issues were as simple as replacing a single component. Only a couple of the servers had to actually be rebuilt.
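To put those two failure rates side by side, here is a quick back-of-the-envelope sketch in Python. It assumes a round 2000 servers (we moved just under that), and the function and variable names are made up purely for illustration.

```python
# Assumption: "just under 2000 servers" is rounded up to 2000 for illustration.
TOTAL_SERVERS = 2000
QUOTED_FAILURE_RATE = 0.12  # failure rate we were told to expect in the quote
ACTUAL_FAILURES = 20        # servers that actually failed during the move


def expected_failures(total: int, rate: float) -> int:
    """Number of servers expected to fail at a given failure rate."""
    return round(total * rate)


def failure_rate(failures: int, total: int) -> float:
    """Observed failure rate as a fraction of all servers moved."""
    return failures / total


quoted = expected_failures(TOTAL_SERVERS, QUOTED_FAILURE_RATE)
actual_rate = failure_rate(ACTUAL_FAILURES, TOTAL_SERVERS)

print(f"Quoted: about {quoted} failed servers")
print(f"Actual: {ACTUAL_FAILURES} failed servers ({actual_rate:.1%})")
```

Under that rounding, a 12% rate works out to roughly 240 dead servers, against the 20 we actually had to deal with.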
I couldn't be prouder of the fact that we didn't have a single customer cancellation directly related to the migration. I feel the reason for this is simple:
Communicate with your customers in an open and honest fashion. If you know their server is going to be down for a minimum of an hour, don't tell them it could be back up within 10 minutes.
There is no such thing as "over communicating" with your customers. If a notification is applicable to them, send it. They will let you know if you are being too chatty.
If you screw up, don't try to cover it up. Own up to it and take care of the customer. As long as you don't keep screwing up, this honesty will endear you to them.
Spend twice the amount of time you think you should on planning, but don't make the plan so rigid that it is set in stone.
Have a Plan B and even a Plan C, but more importantly, have a plan that allows adaptability. Over the course of seven days we dealt with everything from snow to broken-down trucks to trips to the hospital. You can't plan for everything.
Lastly, force people to get rest. This is something I was guilty of not doing. I would move servers all night and drive home. Then I would crawl into bed, setting my alarm for 3 hours of sleep. After a shower and the legal limit of coffee, I would work until it was time to move servers again that night. Rinse and repeat for six more nights. This was stupid on my part.
Some of our migration process contains proprietary information, but a lot of it does not. If you have a move coming up of 1, 10, or 1000 servers, email me and I can give you some tips that will make you say "duh, why didn't I think of that?"
Tech
server administration, datacenter