sysadmin


I had a very tense few hours with a customer’s server yesterday. The fact that it’s a Small Business Server and thus the “Everything Server” didn’t make things much better. I did two things, and both turned out to be bad. I also didn’t reboot between the two things, which turned out to be even worse.

One. I installed the new service pack, which is a Good Thing (generally), except when the computer hangs at “setting up, stage 3 of 3, 0% ready” and spins the little circle thingy for half an hour. At that stage the “please do not turn off your computer” becomes stressful to ignore. So i leaned on the power button, chose to restart in Safe mode and everything seemed okay. For a while.

Two. I changed the network adapter to traffic at 1 Gb/s full duplex. This turned out to be catastrophic. And i fully blame HP for this. After a reboot into normal mode, i had no network. At all. And i was not able to open the HP network interface control panel thingy, since the “management database” was locked. Not even netsh would help me this time.

After much stressful head scratching and beard tearing, i hypothesized that HP NIC management is grumpy because it was in fact plugged into a switch that only goes to 100 Mb/s. Yeah, i can appreciate that it can’t traffic with the wrong line speeds but that i can’t turn that setting off is criminal. If that indeed was the case. So i plugged the server’s NIC into a backline giga-Ether switch (yeah, you shouldn’t do that either) and rebooted. And hey presto, the “management database” was no longer locked.

Back to 100/full, plug the server where it belonged, and normality is restored. Just in time to go and fetch the kids. Sysadmin feat in true Hollywood style.

I just wonder what those HP engineers were thinking about.


Update: The VBScript code i had was both long and buggy. The new code is short and sweet, and at least works no less than the previous code.

BGinfo is a nifty piece of software which can print out a whole lot of technical information on the desktop background of a Windows box. As an administrator for a bunch of client machines, BGinfo has proven Most Useful.

There are two issues, however. Sometimes the information i use on my backgrounds can be a bit over the top. And then there’s one little bit of info not included in the admittedly colossal BGinfo arsenal: whether the computer needs rebooting after having been updated. So here’s my fix.

Step Zero is to download BGinfo from the link above and save it anywhere that can be addressed over the Windows network during a logon procedure. I chose the domain controller’s Netlogon share, or \\%LOGONSERVER%\NETLOGON in the examples below. In reality, i used the real name of the logon server instead of %LOGONSERVER% but i suppose the variable name will work just as well. You might need to add %-signs for added magic.

I then created a minimal BGinfo template with just the hostname, IP address and a custom field i call Is Reboot Required. The template uses the user’s own default wallpaper and the BGinfo data is aligned to the top right of the window. Your mileage may vary. Save the template with the BGinfo executable. My path is \\%LOGONSERVER%\NETLOGON\bginfo-minimal.bgi

The custom field Is Reboot Required points to the output of a certain is-reboot-required Visual Basic script, saved alongside the two files above as is-reboot-required.vbs:


' BGinfo supplies Echo for custom-field scripts; plain cscript does not.
If CreateObject("Microsoft.Update.SystemInfo").RebootRequired Then
    Echo "Reboot required"
End If

Old code. Don’t use:

function readFromRegistry (strRegistryKey, strDefault )

Dim WSHShell, value
On Error Resume Next
Set WSHShell = CreateObject("WScript.Shell")
value = WSHShell.RegRead( strRegistryKey )

if err.number <> 0 then
readFromRegistry= strDefault
else
readFromRegistry=value
end if

set WSHShell = nothing

end function

str = readFromRegistry( "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations", "no" )
if( isNull( str )) then
msg = ""
else
msg = "Reboot required"
end if

Echo msg

The old script checked whether something (Windows Update, usually) requires some files to be renamed during the next reboot cycle. This information is stored in the PendingFileRenameOperations registry value. The new script asks the Microsoft Update client directly: if it deems a reboot is required, we emit the administrator-friendly message “Reboot required”, otherwise we just shut up (having a “Reboot not required” message on the wallpaper isn’t what i call good usability).

Disclaimers: This script works when plugged in but not when run on the command line, oddly enough. And, i’m no VBS guru. The script was created by creative copy-pasting from other resources on the ‘Net.

To paste things together, i created the following one-liner batch file bginfo-minimal.cmd:


\\%LOGONSERVER%\NETLOGON\bginfo.exe \\%LOGONSERVER%\NETLOGON\bginfo-minimal.bgi /timer:0 /nolicprompt

Finally, i added \\%LOGONSERVER%\NETLOGON\bginfo-minimal.cmd to the startup scripts. Since this happened a week ago, i can’t remember if i did it through Group Policy or through the Administrator’s logon script or (ungh) through the Startup group in the Start menu but in any case it works. If i did it the Right Way (through Group Policy), that means i had to create a new Organizational Unit “Wizards”, add a custom group Admins, add Domain Administrators to it, create a new Group Policy for the Wizards, and apply the bginfo-minimal.cmd from the right path to that group, for that is the way of Windows Server 2003. But then again, i might just have been lazy.


I learned something today. It is possible to have a Windows computer join a domain over VPN. My colleague suggested this to be true once but i never actually tried it myself.  And here’s how.

Be at the office, or at home. Take the computer that’s going to the customer and install all the security updates. Make a VPN connection to the customer. Check that the DNS settings for the subnet behind the VPN connection point to the nameserver of the customer. If you’re running a well configured VPN, that should happen automagically (also if you’re running Windows VPN).

Right-click My Computer, choose Properties, do the usual drill from Computer Name to join the customer’s domain. Reboot.

And here comes the trick.

Log in as the old local user. Re-ignite the VPN connection. Start –> Switch user. Log in as administrator (or whoever) from the customer domain. This will, oddly enough, de-activate the VPN connection, so you’ll need to rebuild it.

Do the other tricks you wanted to as a member of the customer’s domain.

Easy as pie, once you know the recipe.


I bought the Cisco Routers for the Desperate book in PDF format (waiting for clearance from the publisher that i’m actually allowed to create a hard copy from it), as recommended by the Tao of Security weblog i subscribe to. The book starts with the following dedication: “To all those poor bastards who are awake at oh-dark-thirty trying to get their router working.”

It continues with a pretty accurate scenario description of When the Router Breaks:

People panic. Pretty soon, everyone’s running around as if they have a drunken badger loose in their undies. While that can be amusing to watch, it doesn’t get the Internet fixed.

I think i’m going to enjoy reading this one.



Here’s the story of how i rescued a Windows XP installation from a broken 160 GB SATA hard disk to an intact 60 GB SATA disk, illustrated in a few easy steps that will make my six and a half hours of creative hackery seem like a work (walk) in the park. I also sing high praise to the penguin.

But first a disclaimer, since my boss will probably be reading this.  All this could probably have been done using suitable tools running on Windows. We just don’t have any. Also, you could probably have done this using partimg, saving you a bucketload of work, but since you’re doing this from a broken disk, partimg will puke and fall over.

Here’s the brief background. A few days ago, i heard from a customer that one of their laptop hard disks had broken. Today, while waiting for the replacement HD, i got an update. The guy with the broken laptop is going on a business trip to see some customers and needs a laptop with him. So if either that one could be repaired, or if i could get a spare laptop of theirs in running order, that would be, well, critical. Deadline in 24 hours, preferably less.

This would have been easier if we actually had had a replacement hard disk for his machine, or had not the replacement laptop been “slow to boot” (ie either full of viruses/worms/crapware or just decomposed). Now it was a no-win in either direction.

Step 0: Sanity check

To successfully perform this trick, you need a spare hard disk, cannibalized from your demonstration station, an external HD, and a wonderful little distribution called System Rescue CD. Oh, and a lot of coffee. Optional extras, which would have been nice, would have been a SATA adapter so that you can have two laptop HDs plugged in at the same time, a second copy of System Rescue CD, and the same number of power bricks that you have laptops to work with. I did this with two laptops, one Rescue CD (stupid) and one power brick (equally stupid). If you have only one laptop to work with, be prepared to plug and unplug hard disks plenty of times, and try to compensate my scribbling with your manifestation of reality. I could probably rewrite this article with a more optimal setup, but then it would seem even less heroic.

Oh, also a functioning computer that you can have for reference and to play music from is essential :)

Now before i let you get your hands in the mud, realize that the narrative that follows is just that. A narrative that follows. I can’t take any responsibility if you follow the story below to the comma and a small black hole appears in the middle of your living room that sucks everything into it and reality just ends and the whole thing just ruins your day. If you’re unsure of what i’ve written and the correctness of it, assume i’ve made a mistake and stop right there.

Now let’s get our hands in the mud.

Using, for instance, the laptop’s HD checking tool built into the BIOS, make sure that the hard disk actually is broken. Remember: “Patients lie.”

If your source disk fails, now would be a good time to label your disks (dymo, magic marker, whatever) and your computers, since on the outside they look very much alike when you can’t boot onto them to see which box really is which.

If your source disk actually hasn’t failed yet but only shows signs (or sounds) of age, i’ve added how to do this in way fewer steps at the end.

Step 1: Make “just-in-case” backups

This step is completely optional, but since you’re soon going to do irreversibly damaging things to your source hard disk, it’s probably a Really Good Idea to follow. Also, you’re going to repeat this step soon, so why not practice now when it’s not irrevocably dangerous?

Boot the “broken” laptop with System Rescue CD. Plug in the external HD, which needs to have more free space than the HD you are going to rescue, and needs to be formatted in a way that supports gigantic files (ntfs, ext3). Mount the external hard disk as /mnt/brick (or whatever you like). Figure out, using fdisk -l /dev/sdX, which hard disk it is that you’re trying to rescue. Mine was /dev/sda and the brick was /dev/sdb.
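The "more free space than the HD you are going to rescue" requirement can be checked before anything gets copied. A minimal sketch, assuming the brick is already mounted; has_room is a helper name made up for this example, not something that ships with System Rescue CD:

```shell
# Hypothetical helper: does the filesystem holding the given directory
# have at least the given number of KiB free?
# df -P prints one line per filesystem; column 4 is the available space.
has_room() {
    dir=$1
    need_kib=$2
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge "$need_kib" ]
}
```

For a 160 GB disk that would be something like has_room /mnt/brick 156250000 && echo "go ahead" (160,000,000,000 bytes divided by 1024).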

Make a backup copy of the master boot record (MBR) using the following two commands (substituting paths where necessary):

dd if=/dev/sda of=/mnt/brick/backup-sda.mbr count=1 bs=512
sfdisk -d /dev/sda > /mnt/brick/backup-sda.sf

(tip taken from here). Without the MBR, the computer Just Won’t Boot even if everything else is restored. This i realized only after everything else was restored but hey, i’m nice and i’m writing it here where things are still simple.
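Since everything hinges on that 512-byte file, it might be worth a quick look before moving on. A rough sketch, assuming a classic MBR (exactly 512 bytes, ending in the boot signature 55 aa); check_mbr_backup is a name invented for this example:

```shell
# Hypothetical helper: sanity-check an MBR backup file made with dd.
check_mbr_backup() {
    file=$1
    # The backup should be exactly one 512-byte sector.
    size=$(wc -c < "$file" | tr -d ' ')
    [ "$size" -eq 512 ] || { echo "wrong size: $size bytes"; return 1; }
    # Read the two bytes at offset 510 as hex, without offsets or spaces.
    sig=$(od -An -tx1 -j510 -N2 "$file" | tr -d ' \n')
    [ "$sig" = "55aa" ] || { echo "no boot signature (got $sig)"; return 1; }
    echo "looks like an MBR"
}
```

With paths like mine, that would be check_mbr_backup /mnt/brick/backup-sda.mbr. It only proves the file has the right shape, not that the boot code in it is healthy.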

The reason why you’re using dd and sfdisk to back up the MBR is that while the Windows XP restore disk has the very convenient tool fixmbr and was provided with your nice HP laptop, it does not include SATA drivers so it won’t see that you have a hard disk on your computer to fix the damn MBR on. Or in essence, it is a useless piece of compressed polycarbonate and it should be a criminal offence to ship it as such as a restore disk. Also, the Vista installation disk you have backstage will not bother running a restore console on an XP installation. Well, mine didn’t. (End rant)

Then back up the b0rken hard disk using ddrescue. If your paths are like mine, the syntax is ddrescue /dev/sda1 /mnt/brick/sda1-backup /mnt/brick/sda1-backup.log and what it does is copy the first partition of the disk sda into a file named sda1-backup on the external hard drive, keeping a log file in case things go haywire. This will probably take an hour or two. Send St. Anthony some warm thoughts, just in case.

Nota bene: If you have the two laptops up and running at the same time (because you have two System Rescue CDs), remember to sync and umount the brick before pulling the plug and connecting it to the other lappie. If you’re on a gigabit network, screw USB hard disks and copy over the net instead. If you have just one of the lappies up at a time (because you have just one power brick :) ) you’ll need to go through the mkdir /mnt/brick && mount /dev/sdb1 /mnt/brick hoop after each startup. Oh, and make sure /dev/sdb1 actually is your external HD brick :)

Step 2: Prepare the target disk

As mentioned, we had a spare disk that was smaller than the disk that had broken. Fortunately, the amount of stuff on the broken source disk was lesser in size than the capacity of the target disk. This is where the dangerous fun parts begin.

Boot a laptop with the target disk using System Rescue CD, or plug it into the system you got running in the previous steps using a SATA adapter/enclosure/doohickey/thingamajig. Give a sigh to the installation you have on it, back up the valuable stuff from it onto the external hard disk. If you haven’t yet done so, start XWindows using the command wizard. Plow through the options until you have a graphical user interface. Start GParted by clicking the icon with the disk symbol. Make really really really sure you are selecting the right disk unit (this is why it might be good to boot up the computer with only that disk connected, and to unmount and unplug the external HD before you commence with the following) and delete all partitions there are on the target disk. Create a new NTFS partition on the disk, filling all of it. Then, using the resize/move partition button, make a note (pen and paper, baby!) how many MBs the partition is. Then, just for good measure, using fdisk -l /dev/sda (assuming the disk you just repartitioned is sda) write down the size info you get there too.

And you think that was scary?

Step 3: Resize the source partition

Go back to the laptop with the broken hard disk. Get GParted running on it like in the step above. Grab that /dev/sda1 partition and Resize it to the exact number of MB of your target disk’s partition, the one you made notes of in the previous step. Breathe normally (if you can). Oh, and remember to run the computer on a power brick, not batteries, while you do this. It feels much better. I promise.

At this stage, half of your system probably thinks that the /dev/sda1 partition is still of the previous larger size. If you feel unsure, run fdisk -l /dev/sda to check. Or reboot. Or something.
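The two notes from the previous step are in different units, at least on my setup: GParted’s resize dialog counts in MiB while fdisk -l prints bytes. A small sketch for comparing them; both function names are invented for this example:

```shell
# GParted's resize dialog counts in MiB; fdisk -l reports bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Succeeds if a partition of part_mib MiB fits into target_bytes bytes.
fits_on_target() {
    part_mib=$1
    target_bytes=$2
    [ "$(mib_to_bytes "$part_mib")" -le "$target_bytes" ]
}
```

For example, fits_on_target 57000 60011642880 succeeds and fits_on_target 58000 60011642880 fails. The byte count here is a made-up figure for a 60 GB disk, so use the one you wrote down.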

Step 4: Back up the resized partition

Again, using ddrescue, back up the partition you just resized to the external HD. You’ll probably need to run through the mkdir /mnt/brick and mount /dev/sdb1 /mnt/brick hoop again if you’re running with just one System Rescue CD (and one power brick). In case you have both lappies running, i suppose now is a little too late to remind you that you need to sync and umount the /mnt/brick before swapping it between laptops. If you didn’t, your data is probably fried at this stage, so start from the top. Don’t say i didn’t tell you before, because i just added that bit (see, i can write in a nonlinear fashion even if you’re probably reading this from top to bottom). Then back up the MBR as outlined in step 1.

Thinking of it, you might as well first back up the MBR and then back up the data, since backing up the data is going to take a lot longer than backing up the boot record. Still, since you just made the data partition smaller, it’s not going to take as long as in the previous data backup phase. If you’re running short on disk space on the external brick, it’s probably faster to run down to the chip shop and get a new disk than trying to gzip the original image, even if the chip shop is closed. OK, down to business.

Suggested syntax:
dd if=/dev/sda of=/mnt/brick/resized-sda.mbr count=1 bs=512
sfdisk -d /dev/sda > /mnt/brick/resized-sda.sf
ddrescue /dev/sda1 /mnt/brick/sda1-resized /mnt/brick/sda1-resized.log

Again, be sure of yer paths yadda yadda (hey, we’re all grown ups so we can take care of ourselves so i’ll stop warning you at this stage).

Step 5: All pieces fall together nicely

Right then, time to put all your pieces together. The partimg manual (linked to in step 1) suggests now would be a good time to restore your resized partition table to the empty disk. I didn’t, because i only realized later copying the MBR is a mandatory step if you want the target box to boot. So it will probably work if you do it in the wrong order too. But i’ll document the procedure here in the supposedly correct(er) order.

Boot the computer with the blank NTFS-formatted hard disk (which we suppose is /dev/sda — oh that’s right, i said i wouldn’t be warning about paths anymore) and the external USB brick plugged in.

dd if=/mnt/brick/resized-sda.mbr of=/dev/sda
sfdisk /dev/sda < /mnt/brick/resized-sda.sf

…and a fdisk -l /dev/sda, a sync and/or a reboot if you feel wobbly. Could be the coffee at this stage though.

Finally, restore the resized partition image onto the new disk:

ddrescue -f /mnt/brick/sda1-resized /dev/sda1 /mnt/brick/resized-sda.restore-log
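If you want proof before swapping disks, the start of the target partition can be compared byte for byte against the image. A minimal sketch; image_matches is a name made up for this example, and it reads the whole image again, so it takes about as long as the restore did:

```shell
# Hypothetical check: do the first bytes of the target (device or file)
# match the image byte for byte? The target may be larger than the image.
image_matches() {
    image=$1
    target=$2
    size=$(wc -c < "$image" | tr -d ' ')
    # Compare only as many bytes of the target as the image holds.
    head -c "$size" "$target" | cmp -s - "$image"
}
```

With the image from step 4, that would be image_matches /mnt/brick/sda1-resized /dev/sda1 && echo "restore looks good".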

Step 6: The resurrection

Place the restored hard disk in the laptop which used to house the broken disk. Boot that laptop. Be very, very satisfied. Buy yourself a chocolate, because you’re worth it.

Post mortem

I could probably re-write this article using a more optimized setup. But then again, i started with a way more complicated question which was “how can i resize the backup image i’d taken and fit it on the target disk?”. Turned out it was easier to just resize the broken partition and dump that on the new disk. Also, backing up my 160 gig backup image (i’d rather be careful than sorry) from and to the same external USB hard disk took sooooooo long that i was going to see sunrise before a complete copy.

An easier solution that wouldn’t have worked

Here’s how to do this whole trick if your hard disks aren’t broken just yet. Or if you’re migrating to a larger/smaller HD and don’t want to install everything anew. I’m going to assume this time that you’re doing it on a computer where you can have both disks plugged in at the same time. I’m also going to assume you’re only going to move/rescue a disk with one partition. If there are more partitions there, you’ll have to improvise a bit. They’ll all be copied though, but i’ll leave the particulars to you, the enlightened reader.

Finally, i’m assuming that you’ve read the whole article down until here because i’m not going to repeat how you’re going to do it here. If you haven’t, start from the top and i’ll be waiting right here until you’re through, okay?

Case 1: Identical source and target disks

Plug in both hard disks. Boot with System Rescue CD. Verify that your source disk is /dev/sda and your target disk is /dev/sdb, eg by mounting one of them and checking what’s inside (and not the other way around or your data will be forever fried; you might consider making a backup at this stage :) ).

ddrescue -f /dev/sda /dev/sdb transfer.log

Wait. Reboot. Rejoice. Piece of cake.

Case 2: Target disk is larger than source disk

Plug in both disks. Boot with System Rescue CD. Verify /dev/sda is your source disk and /dev/sdb is your target disk as above.

ddrescue -f /dev/sda /dev/sdb transfer.log

Wait.

Start XWindows. Start GParted. Select target disk from the less-than-obvious drop down at the near top right corner of the GParted window. Resize target disk to maximum. Apply.

Reboot. Rejoice. Cake with crusting.

Case 3: Target disk is smaller than source disk

This is what i should have done (see, now i spoiled my own thunder) and is more or less a more efficient re-write of this whole article up until now.

Plug in both disks. Boot with System Rescue CD. Plug in external HD brick. Mount as above to /mnt/brick. Make a backup of the source disk’s MBR if you’re nervous/careful/pedantic. Back up the source disk, just in case (optional for the brave/foolish).

ddrescue /dev/sda /mnt/brick/sda-backup backup.log

Start XWindows. Start GParted. Select source disk. Resize the partition so that it’ll fit on the target disk. Move your pr0n/mp3s/dvdrips to external brick first if required. Exit GParted. Take a deep breath.

ddrescue -f /dev/sda /dev/sdb transfer.log

Wait. Restart GParted. Resize your newly transferred /dev/sdb1 to fill all of the disk. Apply. Sync. Reboot. Rejoice.

And that’s about the size of it! Oh, and these tricks would probably have worked equally well for backing up other Windowsen, Linuces and OSXen. I just didn’t try.


I’m not sure if i should post this or not. Not because it’s got any information that is secret, but just because it isn’t very elegant. But i’m posting.

Scenario: The Customer has a server in their DMZ. It’s a Windows server and it’s running Terminal services (RDP). A custom application needs to be installed onto this server. For that, the firewall must be configured so that a list of addresses, including the party installing the application, can access RDP and the port the custom application will answer on. I’m on the Inside net doing the firewall configuration.

So how can i test that RDP actually works from the outside, when i am on the inside? That would probably be easy if i had a Windows box i could RDP into and then RDP out of it to the customer’s server. But i don’t.

Enter (cough) Linux. And (cough cough) Cygwin.

  1. Install Cygwin on your Windows laptop. To install X-Windows, choose to install “xinit” from the X section. The rest of the files will follow.
  2. Run Cygwin. Exit Cygwin (it’s voudou, don’t question it).
  3. As administrator, run Cygwin and start X (or XWin or startx). Click away errors (more voudou).
  4. Start PuTTY and enable X forwarding.
  5. ssh into Linux box on the Outside you have access to.
  6. Start tsclient on the Linux box. Its graphical stuff will tunnel over ssh and end up on your X-Windows, which is running on Cygwin/X which is, in fact, running on your Windows box. I think we have two or three layers of tunnelling here, but i’m not sure.
  7. Connect to the server in the basement, going through an improbable chain of loosely coupled and technically incompatible loops.
  8. Marvel.

So there. Didn’t say it was elegant. I’m not particularly proud of the solution, but at least i showed it worked. The elegant way would probably have been to use my cell phone to hook my laptop up to the Internet and get to the DMZ server from there… but where’s the fun in that? ;)
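If all you need to know is whether the firewall lets the port through, a plain TCP check from the Linux box on the Outside would do, without any X forwarding. A rough sketch, assuming bash and the coreutils timeout command are installed there; port_open is a made-up helper name, and it only shows that the port answers, not that RDP actually works:

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within five seconds. Uses bash's built-in /dev/tcp redirection.
port_open() {
    host=$1
    port=$2
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Something like port_open dmz-server.example.com 3389 && echo "RDP port reachable", where the hostname is a placeholder and 3389 is the default RDP port. The custom application’s port can be checked the same way.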


If you’re a Windows sysadmin of the gun-for-hire kind (like me) you may have run into this situation: You need to find a user, in context. Not just find the user and change her maiden name or reset her password, but find which Organizational Unit (AD lingo for “folder”) she belongs to. Or like in this specific case, find a similar user – that actually is a resource mailbox for a help desk queue – and create another one like that one. Since i’ve never created help desk queue “users” before, i wouldn’t have a clue where they should be put.

After some creative googling courtesy of our friendly in-house sysadmin Bob (thanks, Bob!) here’s the solution:

  • Fire up the Active Directory Users and Computers admin thingy (i’m using the one on an Exchange server, but i guess the plain vanilla version will work just as well).
  • From the View menu, engage the Advanced Features mode. That’s the trick.
  • Now go and Find the user using your normal set of tricks. When found, double click the user to view its Properties sheet.
  • Lo and behold! There is a new tab there, and it is called Object! Click it!
  • Read out the Canonical name of object and weep of relief and happiness.
  • Now close the Find dialog, because otherwise you can’t…
  • navigate to the OU the canonical name says the user resides in.

Hallelujah!

Since this feature must be the Most Useful and Needed one, i cannot understand why it’s hidden behind an Advanced View, but such are the paths of an AD ninja… full of turns we must just accept if we cannot understand.


I have torn enough hairs from my skull to make a rug trying to install Ubuntu Server 8.10 on an HP ProLiant DL360 server. The short answer is it will not work and the quick solution is to install Ubuntu Server 8.04.1 LTS instead.

The longer answer is that it has to do with the disks. The DL360 (and supposedly its sibling servers) use a RAID that Ubuntu 8.10 does not understand. It doesn’t matter if i tell it to enable or disable SATA RAID, or to use or not use LVM. The system installs nicely but after that, it just won’t boot. Same goes for both the x64 and x86 versions of Ubuntu Server 8.10. Since the RAID is enabled in hardware, i am supposing that my disks are mirrored and that i’m protected on that plane. The 8.10 setting probably just allowed me to actually see that we have a RAID going on. Transparency is always nice.

I’ve read incoherent (at least to me) explanations that you should go and poke at Grub to get things right, but i couldn’t get a comprehensive enough explanation that i would know exactly what i was doing. So i decided not to be bothered. And then i read in another article that thou shalt screw the latest version and just go with the previous one, and things are nice and fine. You should even be able to update to the latest version over the command line, so you’ll get virtual machine support and all the other goodies that 8.10 provides.

There are two implications. One: install 8.04 and you’re up and running before your coffee gets cold (even in a well ventilated server room), or two: if you know exactly how to actually get 8.10 up and running with the RAID discovered, please tell me in the comments. Thank you.


I got a service call from our biggest customer on Sunday. The girl at the check in desk told me that she couldn’t get to the reservation system, so she couldn’t check customers in or out. She also could not open her email. And there was something about cooling equipment in the engine room that had failed.

That last bit worried me that a reboot of the workstation might not do the trick this time.

Last Sunday was also Father’s Day. Not a good day for an emergency. I am happy that the call didn’t come before my kids had the chance to “wake me up” and deliver their congratulations and prezzies. They had really been waiting for it. In fact, i even had the time for a proper breakfast. But the rest of the day seemed headed for the dumps. My wife was also on call that weekend, and the kids and i were supposed to show up at my parents-in-law’s at midday. Doom was impending.

I called the customer’s site security manager and got the news. There had been a power failure in a transformer a few blocks away. The on-site UPS was sucked dry, and the generators had failed to start. It was a cascade failure, and it was not good. But hey, they are a big customer. Maybe the servers would come back once power had been restored.

Power was back at about eleven-thirty. I made a bunch of phone calls to the customer’s different sites to ask whether their reservation systems were down or up, while the kids were growing louder. They were all dressed up and ready to leave and did an excellent job getting on each other’s nerves.

The reports from the sites were contradictory to say the least. The reservation system was up, no down, no it was up but now it’s not. Email was still down. And the lunch at the in-laws was about to start. So i gave them a call too and said that we’re going to be a few minutes late but that i’d probably have to set up a remote office at their place and make some phone calls and use my computer to take a remote connection to the customer. If all was really bad, i might have to skip lunch and visit the customer’s site, but the kids would be there anyway. And it surely wouldn’t take very long.

I felt the first grain of bad karma fall on me.

From my remote office, i was able to talk with the firewall, but the mail server didn’t respond to pings. And with the site manager on the phone suggesting that i should maybe stop mucking about with remote help and get my servicing arse over there instead, i concurred. Since i don’t have access to the servers’ iLO management system (which works even if the server is off and through which i would be able to remotely switch on the server), i thought i might as well look good in the customer’s eyes and drive down town to push the damn power button and be back in time for dessert. Or coffee, if it was more than one server.

On the way down town, i had another chat with the customer’s IT manager and he decided he too would come to the disaster area. At the time, i thought it might be overkill. It’s probably just a flick of the switch on a server and we’re back up and running.

Boy, was i wrong.

Things were a bit more silent in the engine room than usual. The air conditioning was okay, which was the first good bit of work related news for the day. We proceeded to fire up the servers. The domain controller was off. The file server was off. The mail server had hung, or it was off, or just b0rked. The intranet was down. The virtual server server (for lack of a better term) was off, and with it, the virtual servers. The disk array was on but one of the virtual servers could not connect to it. The reservation system was off for this site but up for another. The billing system, it turned out, was off. The orders printer in the kitchen was blown. The applications to operate and monitor telephone calls, wake-ups, keys and (oh!) the mini bars were off. Also, our management PC was off. And to top things off, the console thingy that one would operate half the servers with had suddenly decided that it wanted a password which nobody had. And all this was by no means apparent at a glance. Problems oozed in as others were solved. On site, three fathers: the site security chief, the IT manager, and me. How could things be better?

We started with the most critical systems. At this time, i had mobilized half of the Infra crew, most notably Niko who got the virtual servers and the disk array into order and Tero who was on a beach in Spain and remote-instructed us from there. Had it not been for their expertise, the customer’s systems would probably still be down. Soon, we had the check in system up and the three systems that need to run in tandem (trindem?) to take care of billings were slowly back in operation. Email required an extra booting, but it also came back.

Seldom have i wished more for proper documentation of the system than now. An inventory of equipment and servers and how to get everything running even for a guy like me who doesn’t spend most of his billable hours at this customer… would have saved the day.

At this time, lunch, dessert and coffee were but a pressing but sad memory. By each hour, i had to tell my wife that this won’t take much longer and we just need this one system back up, after which it turned out that that one system really is a whole bunch of subsystems that first need to be physically located to get into operation. I felt the bad karma pile in massive quantities.

At this time i should probably tell you about the third server room on site. The first two are like proper server rooms. There’s loud air conditioning. There are a bit more monitors, cables, power supplies, cardboard boxes and junk lying around than there should be. There are racks with loud expensive technical equipment having lots of lights that blink. There’s a crapload of cables going in front of the boxes that blink most, so you can’t really access the equipment without a jungle machete or a lot of patience (the second option is preferred). Many of the servers are tightly crammed because at the time, nobody thought you really would need to get to the other side of the servers. Say, to plug in one of those bulky CRT monitors lying around because the console demands a password which, as i probably mentioned, nobody knew. And you couldn’t use remote desktop, because the stickers on the computers failed to mention the hostname or IP address of the box. And you would need to get to the computer to see if the apps on it are running. And just to really top it off, a few of the machines refused to start without a keyboard plugged in, and since the console was off-line because nobody knew the password, it wasn’t considered a working keyboard, at least not by the computer.

Compared to the two main server rooms, the third server room is a mess. The non-techie people working around there use the room for ad-hoc storage of audiovisual equipment (speakers, cables, microphones, amps, cables, more cables…) and junk. I had to step around a cardboard box of miscellanea just to get into the room. A ghetto blaster was obstructing half of the entrance. A snake pit of cables was lying on the so called operator table, partly on top of and partly under the keyboard, mouse and KVM switch.

Above the operator table are a few shelves with servers. Well, actually they aren’t servers of the kind you would call servers. They are more like old workstations on server duty, in part because it’s cheaper that way and in part because nobody seems to know whether an application on one “server” will play nice with the application on another. Thus, there is one box per application. Per critical application, i might add. The workstations are five years old or more, and they live in a cramped space on the second to top shelf in a room filled with snakes, audiovisual trash and a ghetto blaster. I really should have taken a picture.

Since nobody thought of it at installation time, the “servers” were not set to start automatically once they got power. In fact, this held true for nearly all computers, be they proper servers or workstations working as servers. And even if they had started, many of the critical applications still needed somebody to actually log in to the computer and start the application in question. Here, the computers were not part of any site-wide Windows domain, so we had to guess the passwords, just to keep things interesting.

It was a quarter past four when i headed back towards the remains of the fathers’ day reception. The other guests had looked after our kids, who had been a bit confused by the absence of their father at that fathers’ day reception. I gave my kids a big hug, apologized to the company present, and hoped that i’d never have to see a computer again.

Boy was i wrong.


I usually use nmap to check which ports are open on a machine, be it the local machine or a remote one. Today i was reminded of two tools that do the same.

Netcat is a wonderful do-anything tool that can send data, listen (act as a server) or scan ports on the network. To test, i wrote this on my Linux lappie: nc -vv -l -p 4242 -e /home/llauren/tmp/morjens where the file /home/llauren/tmp/morjens is a small shell script that outputs a friendly Hello World string in Swedish. On the Windows box, i type nc ip.add.re.ss 4242 and if everything works out fine, i am greeted with a cheerful “Morjens bara!”.
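For boxes where netcat itself is not welcome, the same listen-and-greet test can be roughly approximated with Python's standard socket module. This is only a sketch under assumptions: the function names are made up for illustration, and unlike nc -e it sends a fixed string instead of running a script.

```python
import socket

# The greeting from the post; the trailing newline is an assumption.
GREETING = "Morjens bara!\n".encode("utf-8")


def make_listener(host: str, port: int) -> socket.socket:
    """Open a listening TCP socket, roughly what `nc -l -p PORT` sets up."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener


def serve_greeting_once(listener: socket.socket, greeting: bytes = GREETING) -> None:
    """Accept one client, send the greeting, then close that connection."""
    conn, _peer = listener.accept()
    with conn:
        conn.sendall(greeting)


def read_all(sock: socket.socket) -> bytes:
    """Read from a connected socket until the other side closes it."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


def fetch_greeting(host: str, port: int, timeout: float = 2.0) -> bytes:
    """Client side, roughly `nc HOST PORT`: connect and read everything sent."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return read_all(sock)
```

On the listening machine one would call serve_greeting_once(make_listener("0.0.0.0", 4242)), and on the other machine fetch_greeting("ip.add.re.ss", 4242) with the real address filled in. Whether a security policy that bans nc is any happier about this is another matter.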

Of course, it wasn’t that easy. nc is on the list of dangerous haxxoring tools according to Symantec, so i had to fire up a virtual machine and run netcat there, but then it worked wonders. I also took the opportunity to bemoan the unhemulic security tools to our friendly systems admin Bob who told me that i can set up Symantec AV to ignore some directories. And now netcat works “natively” on my Vista box too.

Netcat, btw, is available for Windows here.

Then it was over to looking for open ports at a client of ours. But lo and behold, nc is denied by their security policy as well. And while i do have the keys to the back door (or rather to their antivirus management console), i was reminded by a flashback of the built-into-Windows tool netstat.

To list all open ports, both listening and client ports, enter netstat -ab where a stands for “all” (connections and listening ports) and b displays the executable involved in creating the connection or listening port.

Now i just have to decide whether to go and edit those security settings (temporarily of course) to see whether i can run nc -vz ip.add.re.ss portnum
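If editing the security settings feels like too much, a single-port check in the spirit of nc -vz can be sketched with Python's standard library alone. The function name and the two-second timeout are assumptions for illustration; it only tells whether a TCP connection succeeds, nothing more.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Refused connections, timeouts and unreachable hosts all raise OSError
    (or a subclass of it), and every one of them counts as "not open" here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open("ip.add.re.ss", portnum) with a real address and port then answers the same yes-or-no question. Note that a firewall silently dropping packets shows up as a timeout, so it looks the same as a closed port.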
