When things go wrong……….working at the weekend

It’s been a very long weekend. I’ve been doing a migration from NT4 and Exchange 5.5 up to Windows 2003 64-bit, Active Directory and Exchange 2007, and although I succeeded, there have been several large hiccups along the way. More details in the extended entry, as it will get technical!


First off, a disclaimer: this is not one of the clients we do regular maintenance for!
The NT4 server was running Exchange 5.5 Service Pack 2, and you need Service Pack 3 to be able to migrate to Exchange 2003, so we downloaded SP4 as it was the most recent version and proceeded with the upgrade. All seemed to go well with the SP4 installation until the end, when I received a message that it had failed with “Access is Denied – Microsoft Windows NT ID No. 0xc0020005”. The log files did not provide any useful information, but eventually I found KB243080. I stopped the services and reran the install, and the same thing happened. Unfortunately, the next step in the article is to remove Outlook Web Access using your Exchange media. Bear in mind that this is NT4 and Exchange 5.5: the media took a long time to find, as it hadn’t been used for a very, very long time. Moral of this part of the story: if you are doing this upgrade, make sure you have the Exchange media available. After removing OWA as per the KB article, I was able to install SP4 with no problems, and the migration to Exchange 2003 went smoothly. It took 4 hours to copy 16GB of data files (see a warning here?), so I left it to run overnight. I left the customer’s office for an hour’s drive home at 9.45pm!
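As a rough sanity check on those timings, 16GB in 4 hours works out to a little over 1MB/s. The Python sketch below turns that into a throughput figure and estimates how long a larger store would take at the same rate. It assumes 1GB = 1024MB and a constant transfer rate, which a real copy will not match exactly. The 30GB figure is purely hypothetical.

```python
"""Back-of-the-envelope copy-time estimate (assumes a constant transfer rate)."""

MB_PER_GB = 1024  # assumption: binary gigabytes


def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average transfer rate in MB/s for size_gb copied in the given hours."""
    return size_gb * MB_PER_GB / (hours * 3600)


def estimate_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to copy size_gb at a constant rate_mb_per_s."""
    return size_gb * MB_PER_GB / rate_mb_per_s / 3600


rate = throughput_mb_per_s(16, 4)  # 16GB in 4 hours, as in this migration
print(f"Observed rate: {rate:.2f} MB/s")
# Hypothetical: a 30GB store copied at the same rate
print(f"30GB would take about {estimate_hours(30, rate):.1f} hours")
```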
Saturday morning came, and I found out that there are several remote users running Outlook 2003 (thankfully) who do not have a VPN connection to the office BUT still need to use Outlook. It turns out that back in the Exchange 5.5 days, opening ports 135, 1225 and 1226 made it possible to get Outlook working remotely without a VPN. This wasn’t something I had heard of before, and of course it doesn’t work with Exchange 2003. To solve this problem I set up RPC over HTTP for their Outlook clients, using the very helpful page on Petri’s site. The tricky bit was getting the self-signed SSL certificate right. You need to make sure the certificate uses the name that Outlook Web Access listens on (the name remote clients connect to), NOT the internal name of the server.
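The certificate rule above comes down to a simple comparison: the name on the certificate has to be the one Outlook clients actually connect to. Below is a minimal Python sketch of that check. It assumes you already have the list of names from the certificate, for example read from its Subject and Subject Alternative Name fields. Both hostnames are hypothetical placeholders, and wildcard certificates are deliberately not handled.

```python
def cert_matches_host(cert_names: list[str], host: str) -> bool:
    """Return True if host exactly matches one of the certificate's names.

    DNS names are case-insensitive and may carry a trailing dot, so both
    sides are normalised before comparing. Wildcard names are not handled.
    """
    wanted = host.lower().rstrip(".")
    return any(name.lower().rstrip(".") == wanted for name in cert_names)


external_name = "mail.example.com"   # hypothetical public name OWA / RPC over HTTP uses
internal_name = "exch01.corp.local"  # hypothetical internal server name

print(cert_matches_host([external_name], external_name))  # True
print(cert_matches_host([internal_name], external_name))  # False: cert issued to the internal name
```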
After getting the clients working, I realised that they were hitting the 18GB store limit of Exchange 2003 (remember the 16GB hint earlier?), so I made a quick registry change following KB912375 to raise the 18GB information store limit, and the users were all good to go on Exchange 2003, which was my goal for this weekend.
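Here is a quick way to see how close a store is to a size limit like the 18GB one above. It is only arithmetic, and the sizes are roughly the ones from this migration. The KB article is the authority on how Exchange actually measures the store against the limit.

```python
def headroom(store_gb: float, limit_gb: float) -> tuple[float, float]:
    """Return (GB remaining before the limit, percent of the limit used)."""
    return limit_gb - store_gb, store_gb / limit_gb * 100


remaining, used = headroom(16, 18)  # roughly the sizes in this migration
print(f"{remaining:.1f}GB left, {used:.0f}% of the limit used")  # 2.0GB left, 89% of the limit used
```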
I then removed Exchange 5.5 from the organization and started work on the Exchange 2007 installation.
This was the most frustrating part of the experience, especially as it happened just as I was about to go home. I set the newly installed Windows 2003 64-bit Domain Controller to install all of the Windows updates and, just as I was leaving for the night, hit the OK button to reboot. (It’s a Domain Controller because it’s the only Windows 2003 server that will remain a server as-is: the Exchange 2003 server will eventually be wiped after the mail has been migrated to Exchange 2007, and then reinstalled as a file server only.)
After the server rebooted I got the worst dialog box ever: “Object not Found”, followed by “LSASS.EXE terminated unexpectedly with status code -1073741571”. You may be familiar with this error message: 99% of the time it means you have the Sasser virus on the machine. A sinking feeling came over me. Unfortunately, this dialog box was coming up *before* logon, so it was impossible to log in: it took out the Ctrl-Alt-Del prompt, and the server kept rebooting. Jumping into safe mode, I downloaded the Microsoft Malicious Software Removal Tool, Symantec’s Sasser removal tool and Stinger onto a thumb drive, write-protected it and ran them all on the server. Sasser wasn’t detected by *any* of the tools. A manual inspection of the registry and windows\system32 didn’t turn up anything either. At this point I decided to take the server back to the office with me, sleep on the problem and look at it in the morning. I got home at 10.45pm.
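The negative status code in that dialog is a 32-bit NTSTATUS value printed as a signed integer. Converting it to hex makes it much easier to search for: -1073741571 is 0xC00000FD, which is STATUS_STACK_OVERFLOW. The small Python sketch below converts in both directions. It only reinterprets the 32 bits and does not look up symbolic names. The second call converts the Exchange SP4 setup error from earlier the other way.

```python
def signed_to_hex(code: int) -> str:
    """Reinterpret a signed 32-bit status code as an unsigned hex string."""
    return f"0x{code & 0xFFFFFFFF:08X}"


def hex_to_signed(hex_code: str) -> int:
    """Reinterpret a 32-bit hex status code (e.g. '0xc0020005') as a signed int."""
    value = int(hex_code, 16)
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


print(signed_to_hex(-1073741571))   # 0xC00000FD, from the LSASS dialog
print(hex_to_signed("0xc0020005"))  # -1073610747
```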
On Sunday morning I discovered that there are two KB articles on this problem. KB838501 covers this error after a Windows 2000 to Windows 2003 upgrade, and the solution there is a registry edit. The other, KB940925, covers the error occurring after Service Pack 2 or update 918442 has been installed. This seems the likely culprit, as it all happened after installing Windows Updates. Unfortunately, this KB article requires a hotfix, and PSS are not open at the weekend. (Having said that, I called them anyway and pressed 1 to say I had a business-critical problem. I would have argued the fee if they had tried to charge me, as the problem is caused by their own software, but thankfully I didn’t have to go into that.) A Google search for the hotfix file only finds the x86 version, and I’m running the 64-bit version, so I’ve requested the hotfix via the web interface. While I’m waiting for it to be sent to me (in the past they have arrived really quickly), I have written this blog post for my future reference.
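Since x86 and 64-bit hotfix files are easy to mix up, one quick check before running a downloaded file is to read the Machine field from its PE header. Below is a minimal Python sketch, assuming the file is a standard Windows PE executable. The file name in the usage comment is hypothetical. It only reports the architecture of the executable itself, which for a self-extracting package is the wrapper, so treat it as a sanity check rather than proof of what the package installs.

```python
import struct

# Machine values from the PE/COFF file header
MACHINE_TYPES = {
    0x014C: "x86",
    0x0200: "Itanium (IA-64)",
    0x8664: "x64 (AMD64)",
}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE file's header."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE executable")
    # e_lfanew, at offset 0x3C, holds the file offset of the "PE\0\0" signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 2-byte Machine field immediately follows the signature
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_TYPES.get(machine, f"unknown (0x{machine:04X})")


# Usage (hypothetical file name):
# with open("downloaded-hotfix.exe", "rb") as f:
#     print(pe_machine(f.read()))
```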
Incidentally, I got the server back up and running in a usable condition (in that I can log into it) by booting into the Last Known Good Configuration. Because Last Known Good is only updated after a successful logon, it still held the configuration from before the reboot, so in some respects it was quite handy that the problem occurred before I was able to log in. However, I’m not going to do any more work on the server until the hotfix is available.

Things learnt from this:-
1. Although it wasn’t the Sasser virus, work on brand-new server installations behind a separate NAT firewall to prevent the new server being infected by another PC on the LAN (especially when you find out after the fact that Windows Updates and antivirus definitions are not up to date AND the client has had complaints about zombie spammers on their network).
2. Ensure the media for every application you are going to touch is available, from the old version installed through to the latest version you are installing, and that you have the service packs downloaded.
3. Microsoft will probably help you on the phone if you state that you want a hotfix, regardless of business hours.
4. You HAVE to have internet access: do not upgrade servers that provide DNS or act as the default gateway unless you have alternative internet access or alternative DNS settings. (This did not apply in this case.)
5. Keep up-to-date versions of antivirus tools on a read-only USB drive that is updated regularly (and then made read-only again).
6. If it’s late, take a break and look at it tomorrow.
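Lesson 4 is easy to check before touching a server that provides DNS or routing. From the machine you will be working on, confirm that names still resolve and that an outside host is reachable. Below is a minimal Python sketch using only the standard library. The host names and ports in the usage comment are hypothetical placeholders. A successful TCP connect only proves the port is reachable, not that the service behind it is healthy.

```python
import socket


def can_resolve(name: str) -> bool:
    """True if the name resolves via the system's configured DNS."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage (hypothetical hosts): run before and after the change
# for host, port in [("www.example.com", 80), ("mail.example.com", 443)]:
#     print(host, can_resolve(host), can_connect(host, port))
```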