Thursday, April 17, 2008

blogger fail

Yup, Blogger's SFTP publishing is terrible. I thought it was UnixShell for a long time, as their internet connection plain sucks, but when I started publishing to my new colo I had the same timeout problems.

So this has all been exported to wordpress, and life continues there. Back to 100% self-hosted blogging.

Monday, April 14, 2008

vmware timekeeping part 3

earlier posts here and here.

A review:
1) We removed ntp from the linux guests and left it running on the vmware hosts.
2) We installed open-vm-tools on the guests and enabled timesync live using vmware-guestd

Notes revealed we were gaining about 40s a day.

3) Set clock=pit (newer kernels use clocksource=pit) as a kernel option in the grub config and restarted a guest
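The resulting menu.lst stanza ends up looking something like this (the kernel version and root device here are examples; use whatever your config already has):

```
title  Debian GNU/Linux
kernel /boot/vmlinuz-2.6.18-6-amd64 root=/dev/sda1 ro clock=pit
initrd /boot/initrd.img-2.6.18-6-amd64
```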



That looks like about 40s over three weeks.

4) Today I noticed a lot of "/dev/vmmon[3685]: host clock rate change request 500 -> 998" messages on the (linux) vmware hosts, so I set up the recommendations here, which are 'host.cpukHz = cpuspeedinkhz', 'host.noTSC = TRUE', and 'ptsc.noTSC = TRUE', to work around possible SpeedStep issues.
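In /etc/vmware/config that works out to something like this (the kHz figure is an example for a hypothetical 2.4 GHz host; it's the CPU speed in MHz times 1000):

```
host.cpukHz = 2394000
host.noTSC = TRUE
ptsc.noTSC = TRUE
```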

I accidentally used khz = mhz * 100 instead of khz = mhz * 1000, which made the time get way off when I stopped and then started the VM I was testing on. This was interesting though, because I had been afraid I'd have to stop vmware-server, not just an individual vmware-vmx process, to get it to re-read /etc/vmware/config.

Looping ntpdate shows about 8/10ths of a second gained over 20 minutes. Still more gain than I'd like to see. I'll watch the graph and then try again in a week or two.
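For scale, extrapolating that drift to a daily rate is simple proportion arithmetic:

```shell
# 0.8 s gained over 20 minutes (1200 s), extrapolated to a day (86400 s)
awk 'BEGIN { printf "%.1f s/day\n", 0.8 / 1200 * 86400 }'
```

That's still in the tens of seconds per day, which is why it's worth watching the graph before calling this fixed.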

Friday, April 11, 2008

Avocent KVMoIP LDAP Configuration

The manual is way too confusing about this:

It works like this:
LDAP Overview:
LDAPS works fine with Server 2003 R2 AD, and is preferred (leave it on port 636). If you're using FQDNs, make sure you have DNS servers set in the network section.

On the Search page:
'Search DN/Password' is the Bind DN/Password.
'Search Base' is similarly the 'Base DN'.
'UID Mask' should be 'attribute=%1'; replace attribute with the name of the attribute storing the username, which with AD is generally 'sAMAccountName=%1'.
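Put together, the Search page for an AD domain ends up looking something like this (the bind account and base DN are made-up examples):

```
Search DN:       cn=kvmbind,cn=Users,dc=example,dc=com
Search Password: (the kvmbind account's password)
Search Base:     dc=example,dc=com
UID Mask:        sAMAccountName=%1
```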

Query page:
If 'Group Container Mask' = 'ou=%1' and Group Container = 'KVM' then we're looking for ou=KVM in the above configured BaseDN. This is where we'll set everything up. I recommend staying at the top of the tree for simplicity.

Target mask should be 'cn=%1' because we're looking for objects, and 'Access Control Attribute' should be 'info' because that corresponds to 'notes' in the ADUC UI.
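So for the setup described in this post, the Query page works out to:

```
Group Container Mask:     ou=%1
Group Container:          KVM
Target Mask:              cn=%1
Access Control Attribute: info
```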

In this OU container:

1) Create a computer object with the same name as the KVM name under 'Appliance -> Overview'. I renamed this to KVM01. I had to do this on a DC as MMC was crashing on my terminal server when creating a computer object, probably unrelated.

2) Now create a group, call it whatever. In the notes section put 'KVM Appliance Admin'. This is how we define what you can do. Add the KVM computer object to this group, and any users (or groups, ie domain admins) you want.

3) These people will have full access to the kvm and all objects. It sounds like adding access into individual objects requires being in a group with info of 'KVM User' and the computer objects for the actual server names in the group as well. Bah.

Thursday, April 10, 2008

Vista trust relationship login failures

A local Vista computer started having intermittent login failures, complaining about a trust problem with the account database, when a domain user tried to log in.

Vista disables the local administrator account by default (even though ours had a password), so I used Nordahl's ntpasswd linux boot cd to enable it (if I hadn't known the password I could have changed it as well). Of course the CD requires access to the syskey, as the SAM is encrypted, but it always finds it automatically since nobody puts the syskey on a floppy.

Then I logged in, removed the computer from the domain, changed its name, rejoined it, and things were fine.

Domain profiles were kept intact by the way.

Thursday, April 03, 2008

FHS Compliance for NFS mounts

Where should one mount shared NFS data?

FHS 2.3 has no advice. All the NFS talk is about how you might NFS mount /usr and the likes.

Options: /opt, /mnt, /srv. /mnt is the old school way, but FHS uses the key word 'temporary', which makes sense these days, even though we've started using /media for most things temporary. /opt? I stay away from /opt since I touched oracle.

/srv : Data for services provided by this system
Sounds good to me. nfs mounts will go in /srv, since it's all data for services provided by this system.
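As a sketch, an /etc/fstab entry for such a mount (the server and export names are made up):

```
# shared application data from the NFS server
fileserver:/export/data   /srv/data   nfs   rw,hard,intr   0 0
```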

Thursday, March 27, 2008

further vmware timekeeping

I've talked about this in other posts. I've been automating vmware guest creation and configuration. Time has been one of the bigger hassles. The best reading about it is here.

I automated the vmware tools install using the open-vm-tools deb (backported to etch), then used puppet to run vmware-cmd to enable timesync on all of the guests. See here.
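A sketch of what that puppet exec can look like; the vmware-guestd invocation is the in-guest command from VMware's timekeeping documentation, and the binary path is an assumption:

```
exec { "enable-vmware-timesync":
  # flips the tools synctime flag from off (0) to on (1) live, no guest restart needed
  command => "/usr/sbin/vmware-guestd --cmd 'vmx.set_option synctime 0 1'",
}
```

As written this runs on every puppet run; a real manifest would want an unless guard.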

This keeps time from falling behind, but we set up some munin graphs and saw time was gaining about 40s a day. So I just wrote another puppet exec to add 'clock=pit' to the end of the kernel lines. Newer kernels use time algorithms that try to correct for lost cycles, which are common in virtualized environments. I'll note how this works out after a week or so.
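The edit itself is simple enough to sketch with sed (shown against sample input here; on a real guest it would be `sed -i` against /boot/grub/menu.lst):

```shell
# append clock=pit to grub kernel lines that don't already carry it
sed '/^[[:space:]]*kernel/{/clock=pit/!s/$/ clock=pit/}' <<'EOF'
title Debian GNU/Linux
kernel /boot/vmlinuz-2.6.18-6-686 root=/dev/sda1 ro
EOF
```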

Wednesday, March 26, 2008

Anti-spam gateway design notes

Once again I'm rebuilding an anti-spam gateway. This time I'm puppetizing it as I go, so I wanted to take some time today to think about the design.

MTA (flame war #1)
About four years ago I built a personal mail server and used qmail. Before that I don't remember what I used, probably sendmail. Qmail's nice because it's small and well designed, but the author had some RFC fixation, so support for things like TLS had to be patched in. This qmail install was on gentoo though, and emerge auto-patched over 20 features in as it built it. I believe the idea was that since these features wouldn't make it into the official source, they shouldn't be in a binary build either. Pain in the ass, really.

I do have memories of using sendmail. Actually, horrible dreams of youthful innocence being torn to shreds by m4. We'll stay away from the beast.

A couple years ago I built an anti-spam gateway using postfix and it was easy enough.

Queueing
In the past I've used amavisd with postfix to run the clamav and spamassassin checks. This works by taking incoming smtp messages to postfix and routing them to amavisd on another locally bound port, which scans them and then redelivers them to another locally bound port. One neat thing about this design is you could have amavis running on separate boxes, with one doing spam, one doing antivirus, and just route between them all, with the final one delivering to the internal mail servers.

qmail had qmail-scanner-queue, which tied all of this together in a way that looks similar to MailScanner: it picks up messages in one folder and, when it's done, leaves them somewhere else.

postfix otherwise uses content_filter to tie into antispam. The trouble with this is that it has already accepted the message by the time it's gotten this far.
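The usual wiring for that design, sketched as config fragments (the ports and transport name are the conventional amavisd-new defaults, not something specific to this build):

```
# main.cf: hand accepted mail to amavisd listening on 10024
content_filter = smtp-amavis:[127.0.0.1]:10024

# master.cf: the transport out, plus a listener on 10025 where amavisd
# reinjects scanned mail; that listener must not filter again
smtp-amavis     unix  -  -  n  -  2  smtp
127.0.0.1:10025 inet  n  -  n  -  -  smtpd
    -o content_filter=
```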

When you decide something is spam, you can do a couple of things. If you're still in the SMTP phase, you can reject it before you accept it; I prefer this. Otherwise you've accepted it, and you can delete it, return it, tag it (modify the subject), or greylist it somewhere. Deleting is bad because it may not have been spam. Returning is bad because you have to generate an email back to the sender address saying "We think this is spam", and if it was spam, whoever gets it is certainly not the person that sent it; it's still better than deleting, though, because you get fewer support calls about disappearing email. Tagging and greylisting are annoying because you still have to look at the mail.

In the past I've used RBLs in postfix to reject mail, which catches a lot of spam, then tagging in spamassassin so it filters into users' JunkMail folders; at least then they only look at it if they're looking for something. This is probably still acceptable. Sometimes I'll delete mail based on spamassassin score if it's really high, because if someone sends you a legitimate email that scores that high, you probably don't want to talk to them anyway.
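The RBL rejection piece is a couple of lines in main.cf (zen.spamhaus.org here is one common list choice, not necessarily the one I used):

```
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    reject_rbl_client zen.spamhaus.org
```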

Tuesday, March 25, 2008

Ubuntu 7.10 GRUB Error 21

A recent install of Ubuntu Gutsy 7.10 on a slave IDE disk (cable select), with an existing master IDE disk with XP Pro on it, rebooted and got a GRUB Error 21. I was about to boot off the network again to go into rescue mode and look at the grub configs, but when I saw the boot menu I wondered about the boot order, since I had just added the disk. When I got into the BIOS I saw that Primary Slave was OFF. Ubuntu had seen the disk even though the BIOS had it disabled, and since GRUB talks to the BIOS, it couldn't find the disk. Enabling the disk by setting it to Auto in the Dell BIOS fixed GRUB.

Monday, March 24, 2008

dimdim on centos (fail)


I managed to track down a copy of centos 4.5 i386 and made a VM to try to get dimdim running. I had all sorts of fun earlier trying to get it running on the much preferred debian. I was talking to a friend of mine about this attempt and he noted that when someone requests him to install some OSS software, one of his major filters is "does it install on debian?". If it doesn't have a deb, it fails the bar. This is a pretty good bar. There are exceptions for things like java before they relicensed it. Perhaps, "does it install on ubuntu?" is a better question.

That the "installer" for dimdim installs a pile of rpms from dimdim's website that have nothing to do with the product (glibc? wtf?) is a great example of why we don't use rpm based linux distributions.

1) People who don't understand the differences between rpm/deb distros tend to not respect why packaging is essential, and do stupid shit like put system library rpms in their installer.

2) RPMs suck, and therefore RPM based distros suck. I'm not going to get into a flame war over this, but simply try to take your major RPM distro and upgrade it from one major version to the next. Then try to convince me how the steps you took are not cruel and unusual punishment. ("apt-get update && apt-get upgrade && apt-get dist-upgrade" Wow.)

Anyways, I ran the installer per the PDF documentation, which reads like it was made by the marketing department. It managed to make it through after doing a bunch of kooky stuff that reminded me it's just a shell script, not a packaging system. (Note that if you run it twice, it'll fail because lighttpd is already installed. Maybe this bug that was supposedly fixed last year?)

Once you run the startup script, if you connect to the host you'll get something like this:

404 Not Found
The path '/' was not found.

Traceback (most recent call last):
File "/usr/lib/python2.3/site-packages/cherrypy/_cprequest.py", line 551, in respond
cherrypy.response.body = self.handler()
File "/usr/lib/python2.3/site-packages/cherrypy/_cperror.py", line 198, in __call__
raise self
NotFound: (404, "The path '/' was not found.")
You need to go to http://host/dimdim/; the trailing slash is essential.

This time around the site was less responsive. Sometimes when you start a meeting and install the plugins for the first time, connecting to the meeting fails. Attempts to start a new meeting fail with "Exceeded server limit of meetings". I thought this was a bug, and worked around it by restarting the server. But this time I restarted the server, joined a meeting, then tried to create another one and got this message. Let's make this clear, since dimdim doesn't.

The Open Source Edition of Dimdim is intentionally crippled.

You can only have one active meeting at a time. While their editions page mentions that 'dimdim pro', a SaaS product, only allows one meeting at a time, the OSS column merely says 'Free' in that box. This is really perturbing. It wouldn't be so bad if they were up front about it. There are threads here and here on the official sourceforge forums with no official responses. Someone there talks of having reverse engineered the limitation, but it's "email me" type talk, not an open discussion.

Grepping for 'maxConcurrentConferences' in the dimdim install shows it set to 50 in the dimdim.properties file. The forum post refers to a comment of:
## NOTE : In this Open Source Edition only 1 Meeting at a time is allowed. If you need a Dimdim Meeting Server with higher capabilities then please
## contact sales@dimdim.com.
However, my dimdim.properties lacks any such note; perhaps it appears in the source code rather than the slightly older centos installer. The value is set to 50 by default in my config files, and I recall seeing some mention somewhere that the real limit was in a jar file.

I later found a thread by a user complaining that only five or six users could get into a meeting. This response appears to be by a dimdim employee and states:
Open Source SF edition of dimdim is a personal edition of the meeting server and is meant to cater to single meeting. We have currently placed the restrction to upto 5 participants. For larger meetings, the resources required increase significantly and require dedicated servers.
Please use the hosted dimdim edition - for hosting larger meetings. We also provide an enterprise server build for on-premise installations.
Someone replies with the same sort of arguments that seem obvious to any OSS fan, and links to a webarchive copy of dimdim's website where they say:
Dimdim makes extensive usage of open source components and products and hopes that someday Dimdim itself will be useful to others in the way others have been useful to it. Big thanks to the communities and individuals of all the open source projects used in Dimdim.
I assume at some point the company had OSS fans, and management has pushed it away from OSS.

Sigh. Dimdim is a very pretty waste of time.

Programming an old EM01 Websensor


I have an old EM01b websensor made by eesensors. They're an awesome product: basically a small webserver that senses humidity, temperature and illumination. We've been using an old one as a nagios monitor for the server room temperature. Both the old and new models are called an em01b; the one pictured is the older model. I recently picked up the newer model, as it comes with one of three options: contact closure, thermistor (an additional temperature probe), or voltage monitoring (great for UPS batteries). I got one with contact closure and tied it into the Common Alarm circuit on our HVAC unit, because one of them recently shut down due to a high water level (the drain was clogged) and we didn't know until nagios threw a temperature warning. Now nagios can poll for the contact closure and will know of an HVAC alarm immediately.

The EM01b isn't cheap, but I'm sure it's cheaper than a separate monitoring unit for a UPS or HVAC unit, and since many people use nagios, it ties in pretty well. I also wrote a ruby munin module for it recently, which I'll post later when I get permission from work to keep the copyright on it and GPL it. This is awesome for temperature trending so you can see how all those servers you've added over the last six months have affected environmentals in the data center.

Once I had the new EM01b set up, which you program via the network interface using HTTP requests, I went about reconfiguring the old one. The old ones are a little tougher, as there is no information about them on the web. I had to email eesensors, and I was sent this link to the old cdrom. Maybe nobody else will have this problem, but since I hadn't bought the old em01b, I had no idea how to configure it. It comes with another module, the es00r, which is an esbus-to-serial interface. You connect this to the 6-pin esbus interface on the em01b using a 6-pin phone cable. Power up the em01b with the es00r connected and run the Com2ex*.exe file in the EM01_Configuration folder of the zip file. You need to connect the es00r to the computer with a regular M-F RS232 cable. Select your COM port and hit connect. If it doesn't say "communications established" at the bottom of the program, it's likely you don't have a true RS232 cable; I had to try a couple to find one that would work.

Once you get an established link, restart the em01b. Re-establish the link, then click read/verify to ensure the communications are good. Enter the configuration you want in, and click transfer to send it to the em01b via the es00r. When it's complete, restart the em01b, reconnect, and hit read/verify to make sure it got there ok.

update: I was getting the same values from the old em01b on every query, and when I emailed eesensors about leaving the es00r connected they said:
The Es00r cannot be plugged in - it may interfere with the Websensor data which could explain the reason you are seeing the same values. In addition, the 6 random digits should be appended to the back of the "em" command (ie. "em123456") on earlier models.
I disconnected the es00r and power cycled the em01, and I'm getting different readings over time now. I'm still querying index.htm?em though, as the v4.2 manual says this is okay and it seems to work for me:
Compatibility with the earlier models of Websensor has been maintained. Any version of the Websensor will always return temperature, relative humidity and illumination data by sending: http://192.168.254.102/index.html?em

Tuesday, March 18, 2008

Widemile takes over world by way of multivariate testing

At this point, it's official, Widemile is taking over the world. What?! You want proof?

I'd like to pretend sometimes I don't know a whole lot about business, but in actuality there's a bunch of experience kicking around in my past and I tend to pick up more than the average bear. The difference is that I've never considered myself a business person, or that it was my primary responsibility by happenstance (other than while consulting). But I've done lots of supportish things, lots of consulting, have had to manage people and the likes. More than I'm willing to admit to even myself. Anyways, the point is I tend to only do business related things when I don't feel like someone else competent is doing them. So I notice things, but keep them to myself.

I work at Widemile as a Systems Administrator. I don't even know what that title means anymore. I think I'm the first full time, non development sysadmin there. I do a number of things, like helping users find the any key, remind them cdrom drives aren't cup holders... actually, I spend most my time building the operations platform. So I do development, like puppet, ruby, shell scripts and the likes, but I'm not a developer. Or so I say. Endlessly. Fortunately those people with developer in their title know what they're doing.

When I first started working at Widemile, I wasn't particularly interested in the business plan. Linux systems engineering? Sounds good. What do you do? Web 2.0 Product? Check... I've heard it. People sometimes don't realize how socialized a sysadmin gets, everybody wants to be your friend when something doesn't work. (There is no friend checkbox in RT. People don't make note of this.) So I hear a lot of chatter about our product and the results it brings in. I figured, "automated testing of a web page? Sounds good, makes sense, but it's novel right? I mean, how much can it really make a difference?".

The answer is tons. The term they use is Conversion Marketing. I'm sure this means something to SEO/SEM types, but what we really do is "Make more people buy your stuff." Which, after all, is kind of the point of business. At this point, I have no reluctance to put forth that using Widemile's product will make more people buy your stuff. It works kind of like this:

You sell stuff to farmers. You pay some carebear 1000 gold pieces to hang around the farmers yakking about how great your stuff is. On average, you make 10,000 gold pieces. Now what if you had some 'multivariate testing' pixie dust to sprinkle on that carebear such that there was less yakking, and more of what people wanted? Wait, you ask, how we know what people want? Magic! (Math...) You give us 250g, and we find you a better carebear with Math Dust for 750g and now you're making 20,000 gold.

Jokes aside (it's hard, really). All the talk I hear is of our customers actually having huge success. I'm not in sales, I can't be quoting things, but from my techie point of view with secret business experience, it's magic "something for nothing" sort of success. When I've managed to convey to people what Widemile does, a couple educated few have said, "Oh, like Google Optimizer." No, actually. Congrats on knowing someone in the optimization business, even if it is Google. It's basically like this, google has a thinger. They get cool thingers, like take Dodgeball. I love dodgeball. Second to Google Search, it's my most used google product, even more than google maps. How much Dodgeball changed... in years? Little, it's no secret. Some things Google makes are cool, don't get me wrong, but there are lots of reasons Google has products, and they're not always to be innovative.

Widemile is a Landing Page Optimization (LPO) pioneer. They have the secret sauce (ooh, see what I did there? I linked to an article promoting operations, slam!). Seriously though, people are being sold on LPO that's called LPO but it doesn't compare to what we do. There is secret sauce out here, real stuff. If you care to know the ingredients, I encourage you to go read every character on Billy's blog. I don't have a lot of free time, and LPO isn't a package management system that generates me more free time, so I'll leave it to you business types to figure this stuff out. But it's neat all the same.

The reality is, from a personal point of view: somehow fewer and fewer companies seem to get what I want on the web. I recall hearing talk over the years about what kind of time opportunity you had to capture someone's interest in traditional marketing. It was pretty short, I forget what it was exactly. I'll tell you this though. If I don't have an established relationship with a company (and if I did, it's not really marketing when I go to their site, since I'm going there anyways), how long will I fudge around trying to find where to click next to get what I want? A very short amount of time. What do I want from you, web? Simplicity with endless bounds. I want the tubes to be lego. By itself, it's just a little piece of plastic, but with a handful, you've got a Space Elevator. Alright, maybe not the best example, but that's the point.

Today I was trying to find support information for a Netgear ReadyNAS. The web has been defeated in the world of driver searching (search for a Dell driver if you haven't experienced this), training me to start at a vendor's website and drill down, rather than just search. Netgear's web site is terrible. What do you get if you just search for ReadyNAS support? Netgear, and look, a community oriented site! Communities have it figured out because they're usually filled with information created by people who were once trying to figure things out. Black boxes are alright if a) we buy them to do something and b) they do it.

Try going to newegg and finding RJ45 crimp connectors without searching. Then try with searching. It's tough. Most websites make it tough to get what you want. This is why tags are getting popular on web 2.0 sites like flickr and delicious. People choose tags that are meaningful to them because we want to be able to find what we want. How do you know what other people are looking for? If you don't know, it only makes sense to test. This is where split testing sounds so silly to me: of all the possibilities, you're trying two, which you probably thought up yourself. Isn't this supposed to be a test to see what other people want? There really is Magic in Widemile's platform, and I'm serious when I say there's spiffy math behind its secret sauce design, but software that finds out which variation is most successful? It's easy to understand how awesome that is. If you're spending any significant amount of money on online advertising and not doing LPO, you're throwing away money.

Monday, March 17, 2008

debugging netgear readynas (was infarant)

I've talked in the past about how cool it is to have a root shell on your NAS. I'd like to take a moment to second that.

Some software that copies web logs off one of our readynas 1100s wasn't working today. I got to looking and saw it used a domain account. I realized pretty quickly that it had stopped working when we upgraded the NAS devices to the new domain, but we don't use this setup often enough to have noticed it had stopped running.

I logged into the readynas and used wbinfo to verify that winbind was working right. While poking around the log files I saw an error about proftpd and PAM. I'm lucky to have two of these readynas boxes, so I verified that the pam configs hadn't changed compared to the production system. I then checked the proftpd binary, and it had changed size. Raidiator appears to be debian based; you can see woody packages in a 'dpkg -l'. Interestingly, 'dpkg -s proftpd' shows version '1.3.0-9.netgear6' on both machines, although the binary had definitely changed. I copied the proftpd binary from the production nas to the backup nas, restarted proftpd, and authentication started working again.

5% chance it was a fluke, but I think it's a real bug that slipped past QA, and if not for it being open source based I'd be sitting in a support queue rather than having the problem fixed and blogging about it already. Forum post here too.

They've been adding lots of cool features to the ReadyNAS line, like a built-in bittorrent client and some neat photo support. It already supports CIFS and things like rsync, making it pretty accessible and functional out of the box, besides what looks like decent support for third party development. That there's a real usable website separate from the netgear main site points to there being some decent smart people behind the project, and possibly at Netgear, for letting their acquisition do some things the right way.

Despite the RND4000 (4 Disk desktop model without disks) being about $800, I want one just to hack on raidiator. Too bad it's not a fully open source distro.

Sunday, March 16, 2008

security questions, offline banking?

It's an odd thing to say, but I've been considering -not- paying bills, banking, etc. online anymore. Why? Security questions. My bank just made me add some, and I've been struggling with Sallie Mae for some time, having had to reset my account twice since they added security questions (and never before). I suppose it's not as bad as how Key Bank liked to ask for my debit card and PIN for security verification. If there was anything not to enter into a web site, I think a debit card PIN would be near the top of the list.

Worst of all, the security questions require exact answers. Gone are the days of "what is your mother's maiden name"; instead we have "What is the street your favorite residence is on?" How the hell do I remember if it's "26th" or "26th Ave" or just "26" or some other combination? The name of my first teacher? Which one?

The solution? I use a password as the answer to all security questions now. Where's the version of Dell IdeaStorm that applies to the web on the whole? How long is it going to take until the increase in support calls to reset accounts makes web sites realize this is the worst idea I've seen to date? Meh.

Saturday, March 15, 2008

An Exchange 2007 server on which an address list service is active cannot be found

While modifying the mailbox quotas on a user mailbox on Exchange 2007, I got the error "An Exchange 2007 server on which an address list service is active cannot be found". Lots of chatter here, but I looked and saw that the 'Microsoft Exchange System Attendant' service wasn't running, although it was set to Automatic. Start -> Run -> services.msc, started it, replayed the actions, and the changes worked afterwards. Not sure why it wasn't running.

Thursday, March 13, 2008

dimdim on debian etch

update2: I couldn't get it working right on centos either, although I spent less time on it. I did verify that the OSS edition of dimdim is crippled. Do not use dimdim.

update: this install managed to get the conference server going, possibly the streaming server, but not the media server. There's good information in it though.

'Opensource'. Heh. I think a decent community makes things much more open source than a license does, but semantics...

-worst build system ever- What's the point of packaging tar, sed, and python with your distribution? a) you're using rpms and don't know better, or b) you only want to ride the OSS wave, but don't actually want to be part of the OSS community?

We'll install a ton of shit via apt rather than touch those dirty dirty rpms that come with the offline installer.

download the fancy "centos" offline installer.
unzip *zip
chmod 755 *run
mkdir dimdim
./*run --tar -xvf -Cdimdim
# install lots of crap. who knows?
apt-get install sun-java5-jre openoffice.org libaio1
cd /usr/local ; tar -xvzf ~/dimdim/dimdimrepository/dimdim.tar.gz
Make sure nothing is running on port 80 (netstat -lnp), stop it if it is.

Go to /usr/local/dimdim
Read Linux_Readme_1.5.0.txt

vi server.xml, replace DIMDIM_PORT_NUMBER with 80, edit the servernames at the top
vi wrapper.conf, set wrapper.java.command= to /etc/alternatives/java
# the above is a link into the above installed jvm by way of the alternatives system
./dimdim start ; tail -f wrapper.log

Seems to.. do something?

edit?: ConferenceServer/apache-tomcat-5.5.17/webapps/dimdim/WEB-INF/classes/resources/streaming.properties

The Conference Server appears to live in ConferenceServer/ and is the main web interface that you want running on port 80. The dimdim.properties and server.xml in /usr/local/dimdim are the most important files. './dimdim start' will start it; then you can watch wrapper.log.

The Streaming Server is in StreamingServerCluster/server1. There's information about duplicating it in Linux_Readme_1.5.0.txt. StreamingServerCluster/server1/conf/red5.properties contains its port configuration; this is what runs on 1935/30001. I don't really know what the http.port is supposed to point to.

The Media Server... Who knows? I think this is what dimdim.dmsServerAddress in dimdim.properties is supposed to point to. Before I set this, I could connect to dimdim but portions didn't work. After I set this, the site would lock up just after the browser checks and future attempts to log in reported that the server was full of meetings or something like that.

automating vmware guest deployment with capistrano

This will get some more work, but I didn't find much out there, so this is a good starting point for someone.

It appears straightforward enough, but feel free to ask any questions. You'll need the rest of your operations platform pre-built, such as existing vmware hosts, pxe booting a debian install, etc.

I don't think blogger is killing anything important. Some day I'll set up an actual repository instead of using blogger for this crap. On the new server, next vacation. :)


# Capistrano recipe to build a vmware guest
# Bryan McLellan -- bryanm@widemile.com

require 'erb'

logger.info("Vmware guest creation script logs in as root")
set(:user, "root")

vmxtemplate = %q{
#!/usr/bin/vmware
config.version = "8"
virtualHW.version = "4"
scsi0.present = "TRUE"
scsi0.virtualDev = "<%=disktype %>"
scsi0:0.present = "TRUE"
scsi0:0.redo = ""
priority.grabbed = "normal"
priority.ungrabbed = "normal"
guestOS = "other26xlinux-64"
ide1:0.startConnected = "FALSE"
floppy0.startConnected = "FALSE"

displayName = "<%=fqdn %>"
scsi0:0.fileName = "<%=fqdn %>.vmdk"
memsize = "<%=memory %>"

Ethernet0.present = "TRUE"
Ethernet0.virtualDev = "e1000"
ethernet0.addressType = "generated"
ethernet0.generatedAddressOffset = "0"
Ethernet0.connectionType = "custom"
Ethernet0.vnet = "<%=eth0 %>"

Ethernet1.present = "TRUE"
Ethernet1.virtualDev = "e1000"
ethernet1.addressType = "generated"
ethernet1.generatedAddressOffset = "10"
Ethernet1.connectionType = "custom"
Ethernet1.vnet = "<%=eth1 %>"

tools.syncTime = "TRUE"
}

pxetemplate = %q{
DEFAULT etch_i386_install_auto
TIMEOUT 100

LABEL etch_i386_install_auto
kernel debian/etch/i386/linux
append vga=normal initrd=debian/etch/i386/initrd.gz preseed/url=http://debian.example.org/preseed/autoserver-etch.cfg debian-installer/locale=en_US console-keymaps-at/keymap=us hostname=<%=hostname %> domain=<%=domain %> interface=eth0 --
}

def lastdhcpip(ourmac)
  curLeaseIp = nil
  curLeaseMac = nil
  lastip = nil

  f = File.open("/var/lib/dhcp/dhcpd.leases")
  f.each do |line|
    case line
    when /lease (.*) \{/
      curLeaseIp = $1
    when /hardware ethernet (.*);/
      curLeaseMac = $1
      if ourmac == curLeaseMac
        lastip = curLeaseIp
      end
    end
  end

  f.close
  return lastip
end

set(:disktype, "lsilogic")
set(:disksize, "3Gb")
set(:memory, "768")

#set(:hostname, fqdn.match(/^[0-9A-Za-z-]*/))
#puts("hostname: #{hostname}")

task :build, :roles => :host do
  set(:host) do
    Capistrano::CLI.ui.ask "vmware hostname: "
  end unless exists?(:host)

  role :host, host

  set(:hostname) do
    Capistrano::CLI.ui.ask "guest hostname (vm16-dev-ots04): "
  end unless exists?(:hostname)

  set(:network) do
    Capistrano::CLI.ui.ask "guest network (prod/corp/test): "
  end unless exists?(:network)

  case network
  when /prod/
    set(:fqdn, "#{hostname}.prod.example.org")
    set(:domain, "prod.example.org")
    set(:eth0, "/dev/vmnet4")
    set(:eth1, "/dev/vmnet11")
  when /corp/
    set(:fqdn, "#{hostname}.corp.example.org")
    set(:domain, "corp.example.org")
    set(:eth0, "/dev/vmnet0")
    set(:eth1, "/dev/vmnet0")
  when /test/
    set(:fqdn, "#{hostname}.test.example.org")
    set(:domain, "test.example.org")
    set(:eth0, "/dev/vmnet2")
    set(:eth1, "/dev/vmnet14")
  end
  puts("fqdn: #{fqdn}")

  result = ERB.new(vmxtemplate).result(binding)

  run("mkdir /srv/vmware/#{fqdn}")
  logger.info("Building vmx configuration file")
  put(result, "/srv/vmware/#{fqdn}/#{fqdn}.vmx", :mode => 0755)

  logger.info("Creating virtual disk")
  run("/usr/bin/vmware-vdiskmanager -c -a #{disktype} -s #{disksize} -t 2 /srv/vmware/#{fqdn}/#{fqdn}.vmdk")

  # start and stop vm to generate uuid and MACs
  logger.info("starting VM")
  #run("/usr/bin/vmware-cmd -s unregister /srv/vmware/#{fqdn}/#{fqdn}.vmx")
  run("/usr/bin/vmware-cmd -s register /srv/vmware/#{fqdn}/#{fqdn}.vmx")
  run("/usr/bin/vmware-cmd /srv/vmware/#{fqdn}/#{fqdn}.vmx start")
  sleep 1
  run("/usr/bin/vmware-cmd /srv/vmware/#{fqdn}/#{fqdn}.vmx stop hard")

  macaddr0 = nil
  run("cat /srv/vmware/#{fqdn}/#{fqdn}.vmx") do |ch, stream, data|
    case data
    when /ethernet0.generatedAddress = "(.+)"/
      macaddr0 = $1
    end
  end
  macaddr0dash = macaddr0.gsub(/:/, "-")

  pxeConfig = File.new("/srv/tftp/pxelinux.cfg/01-#{macaddr0dash}", "w", 0644)
  result = ERB.new(pxetemplate).result(binding)
  pxeConfig.puts(result)
  pxeConfig.close

  # Box gets a different ip sometimes on install than on first boot. annoying
  run("/usr/bin/vmware-cmd /srv/vmware/#{fqdn}/#{fqdn}.vmx start")
  logger.info("Sleeping 30 seconds for network startup")
  sleep 30
  ipaddr0 = lastdhcpip(macaddr0)
  logger.info("host #{fqdn} is now building and will be available at #{ipaddr0}")
  File.delete("/srv/tftp/pxelinux.cfg/01-#{macaddr0dash}")
end

parsing dhcpd.leases with ruby

Needed to get the IP address for a certain MAC from the dhcpd leases file; wrote this, and it seems to work, albeit short. IANAP, YMMV. All of my programming comes from looking at examples, so any faults of mine are actually someone else's. Blame fR and niblr!

#!/usr/bin/ruby -w
# getdhcpip.rb Bryan McLellan -- bryanm@widemile.com
# parse through dhcpd.leases in search of a mac to get its current ip
# assumes the file is not malformed. remember that this is a log file and the most recent (bottom) entry is the most accurate

def lastdhcpip(ourmac)
  curLeaseIp = nil
  curLeaseMac = nil
  lastip = nil

  f = File.open("/var/lib/dhcp/dhcpd.leases")
  f.each do |line|
    case line
    when /lease (.*) \{/
      curLeaseIp = $1
    when /hardware ethernet (.*);/
      curLeaseMac = $1
      if ourmac == curLeaseMac
        lastip = curLeaseIp
      end
    end
  end

  f.close
  return lastip
end

if ARGV[0]
  puts lastdhcpip(ARGV[0])
else
  puts "Requires MAC address as argument: getdhcpip.rb 00:00:00:00:00:00"
end
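A string-based variant of the function makes it easy to sanity-check the bottom-wins behavior without a real leases file; the sample lease text below is made up:

```ruby
# Same parsing logic as lastdhcpip, but over a string instead of
# /var/lib/dhcp/dhcpd.leases, so it can be tested anywhere.
def lastdhcpip_from(text, ourmac)
  cur_ip = nil
  lastip = nil
  text.each_line do |line|
    case line
    when /lease (.*) \{/
      cur_ip = $1
    when /hardware ethernet (.*);/
      lastip = cur_ip if ourmac == $1
    end
  end
  lastip
end

# Two leases for the same MAC; the later (most recent) one should win.
sample = <<LEASES
lease 10.0.0.5 {
  hardware ethernet 00:0c:29:aa:bb:cc;
}
lease 10.0.0.9 {
  hardware ethernet 00:0c:29:aa:bb:cc;
}
LEASES

puts lastdhcpip_from(sample, "00:0c:29:aa:bb:cc")  # => 10.0.0.9
```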

Wednesday, March 12, 2008

Stopping vmware guests with vmware-cmd

Lots of talk out there about "VMControl error -8: Invalid operation for virtual machine's current state: Make sure the VMware Server Tools are running" when trying to use "vmware-cmd stop" to stop a VM. Stop by default tries to do a soft stop, where it asks the guest to shut down.

I'm scripting a start followed by a stop so vmware will generate new MAC addresses for a vmx, and "vmware-cmd stop hard" works for this. 'hard', 'soft' and 'trysoft' are listed here as options.

There's information here about how MACs are generated, by the way. Removing the MAC address lines from the vmx file will cause them (and the uuid, if it's removed too) to be regenerated on startup and added back to the vmx file.
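The remove-the-lines approach can be scripted; here's a minimal Ruby sketch that strips the generated MAC and uuid lines from a vmx file so they're recreated on the next power-on. The generatedAddress key name is from real vmx files; treat the uuid.* key names as an assumption.

```ruby
# Strip generated identity lines from a .vmx so VMware regenerates them.
# Leaves generatedAddressOffset and everything else alone.
def strip_generated_ids(vmx_path)
  kept = File.readlines(vmx_path).reject do |line|
    line =~ /^(ethernet\d+\.generatedAddress|uuid\.(bios|location))\s*=/
  end
  File.write(vmx_path, kept.join)
end
```

Run it against a powered-off guest's vmx, then start the VM to get fresh MACs.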

Tuesday, March 11, 2008

Support Contracts

I hate support contracts. Google is always faster than working your way up to technical people. Generally I've liked Cisco support, because I can open a TAC case online, and they're super responsive.

I've disliked Dell support in the past because when you end up with desktops and laptops on different levels of support, you have to call different numbers depending on the support level. I want to have a single number, punch in the service tag, and have it auto-direct me.

I like Dell's web-support, but often you put in a Service Tag on enterprise equipment, get someone, and then they tell you it's too enterprise and they can't help you so you have to call.

10:37am - Place web support chat on MD3000i Array, non-critical failure.
10:39am - Told they can't help me.
10:40am - Call phone support, operator transfers me based on service tag.
10:45am - Support technician transfers me again, says the autodialer or something is inefficient.
10:55am - Work with technician on the phone.
11:10am - Email support log to technician.
11:30am - Rounding off, I get off the phone; technician is going to send me a new controller.
12:30pm - New controller arrives via "UPS SonicAir" by taxi. Holy Crap.

Alright... That'll do pig.

running winbindd without smbd and nmbd

Using Winbind rather than pam_ldap can be more reliable at times. These days, you don't need smbd/nmbd for winbindd to work. Unfortunately, it sounds like you once did, and most of the documentation out there says as much.

On debian etch:

Stopping /etc/init.d/samba and winbind, then starting winbind alone, worked fine. winbindd appears to default to dual daemon mode now, so you don't need to enable as much in /etc/default/winbind these days; '-Y' sets it back to single daemon mode.

Also you'll notice the init script doesn't require samba.

#!/bin/sh

### BEGIN INIT INFO
# Provides: winbind
# Required-Start: $network $local_fs $remote_fs
# Required-Stop: $network $local_fs $remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: start Winbind daemon
### END INIT INFO

Thursday, March 06, 2008

startup web 2.0 operations

While at Haydrian, Adam Logghe sent me this article about startup operations, which was sparked by O'Reilly's rant on startup secret sauce. Not having built a completely automated windows operations deployment system before, I can only postulate to a degree, but I disagree with the comment about Microsoft having a leg up on open source because their server team works with their operations team.

In many open source environments, the operations team also happens to be the server operating system team; that is, many operations people in open source are contributors. When starting at Widemile we had a plan to kick start operations. Some of the people here had worked with Adam from HJK in the past. These people are a great example: not only is HJK heavily involved with puppet, including successful deployments, they also develop open source tools like iclassify to tie into puppet and capistrano.

Last night I finished setting up the largest hump for me in our new ops platform. The design is this: servers on vmware guests, with the hosts running on blades with vlan trunking. Working with HJK's help (I highly recommend these guys, just don't everyone hire them at once, I like having access to them myself) we've got a full puppet deployment, and last night I finished transitioning all of the servers to vlan trunking. Need another web server? Check munin for a vmware host with available load, create a new guest (haven't automated this yet) and do an automated network install. Then push puppet and iclassify out (one command), tag the new node in iclassify (a couple clicks) with its role, and puppet pushes out all the required software and configs for that server.

What else do you get out of this? One of the servers wasn't working today; I couldn't get to it on the network. I jumped on the console via the vmware server gui and saw one of the interfaces was bridged to the wrong vlan. Fortunately I can change which /dev/vmnet interface on the host the guest is tied to from the vmware management utility in real time, without even rebooting the machine, and everything was fixed.

All the benefits of blades aside, the software solutions used here are wonderful. I've implemented a few hacks like using the vmware-server 'backdoor' to identify what host a guest is on, and have that become an iclassify attribute automatically, usable in iclassify, puppet and capistrano tasks. Now granted, all of this requires a very broad level of experience, but once you get it setup, it's not much work to maintain. When you're talking about having piles of servers dropping from the sky, this is what you want already setup, rather than a handful of admins manually doing configurations.

jboss ha weirdness

I don't know enough about jboss to make an intelligent keyword-filled post about this, but I wanted to note that while troubleshooting jboss ha-jndi JMS crap, make sure that telnetting to port 1099 produces an fqdn. On a couple servers, /etc/hosts had a different hostname portion of the fqdn than the hostname alias, and this silently broke JMS. Telnetting to 1099 revealed this, or at least indicated it was dns related, as working boxes were giving an fqdn while non-working boxes weren't. I think jgroups isn't friendly with dns overall.
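A rough sketch of automating the check described above: once you've pulled the name a box announces out of the port 1099 banner (that part is left out here), a quick test for fqdn-ness. The hostnames below are made up.

```ruby
# Naive fqdn test: at least one dot separating word-character labels.
def fqdn?(name)
  name.to_s.match?(/\A[\w-]+(\.[\w-]+)+\z/)
end

puts fqdn?("app01.prod.example.org")  # => true
puts fqdn?("app01")                   # => false
```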

Thursday, February 28, 2008

configuring vmware guest time synchronization

I really want a google certification. That is, a certification that says I am an expert googler. To the uninitiated, google may seem like a simple thing, but finding what you really need usually isn't.

I saw this forum post while trying to figure out how to configure vmware guest time synchronization with scripts running off of vmware server. Something wasn't right though:
# vmware-guestd --cmd 'vmx.set_option time.synchronize.tools.startup 0 1'
Unknown option name

So I went and grabbed the new open-vm-tools source. In 'lib/include/vm_app.h'
#define TOOLSOPTION_SYNCTIME "synctime"
#define TOOLSOPTION_COPYPASTE "copypaste"
#define TOOLSOPTION_AUTOHIDE "autohide"
#define TOOLSOPTION_BROADCASTIP "broadcastIP"
#define TOOLSOPTION_ENABLEDND "enableDnD"
#define TOOLSOPTION_SYNCTIME_PERIOD "synctime.period"
#define TOOLSOPTION_SYNCTIME_ENABLE "time.synchronize.tools.enable"
#define TOOLSOPTION_SYNCTIME_STARTUP "time.synchronize.tools.startup"
Trying combinations of the last two did nothing, but I did have vmx.set_option as a search term, and eventually found this post that just uses:
vmware-guestd --cmd 'vmx.set_option synctime 0 1'
Nothing appeared on the screen when I ran this on a guest, but I did notice that the vmx file for the guest on the host automatically changed from:
tools.syncTime = "FALSE"
to
tools.syncTime = "TRUE"
I thought I was going to have to write a sed script and have puppet change all the vmx files and do a reboot of all the guests. Much happier now.

Wednesday, February 27, 2008

stupid vim notes

I really need to just put my vimrc in a git repo on the tubes somewhere. I'm always forgetting these things. I'm not hardcore about custom configs, but this stuff really helps. If you're not familiar with these things, this helps.

my ~/.vimrc file usually looks like this:
syntax on           " enable coloring for source and scripts
set tabstop=2       " make tabs two spaces instead of five or whatever
set expandtab       " use spaces instead of tabs
set background=dark " make that dark blue text light blue because I use black backgrounds
When you open a DOS text file on a unix box, sometimes it's full of ^M characters. This is because of the CR/LF vs. LF line ending difference. Sometimes it's just visually annoying and distracting; sometimes a daemon crashes and burns because of them. I used to use ':set fileformat' with unix/dos to convert files. These days I just open the file in vi/vim and do
:% s/^M$//
You need to enter the ^M by typing CTRL+V then CTRL+M.
:% means all lines
s is a substitution regex
^M$ is what you want to match, the $ meaning 'at the end of the line'
the emptiness inside the // means you want to replace ^M with nothing

then save the file. ( :wq )
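The same cleanup as the vim substitution above, sketched as a Ruby one-off (the path is whatever file you'd otherwise open in vim):

```ruby
# Strip carriage returns at line ends, the DOS-to-Unix conversion.
# Same effect as :% s/^M$// in vim.
def strip_cr(path)
  File.write(path, File.read(path).gsub(/\r$/, ""))
end
```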

Tuesday, February 26, 2008

getting hostnames between vmware hosts and guests

The vmware-tools are open source now. There's an open-vm-tools package for lenny/sid, but not etch. There are people out there who have backported it.

This package appears to make a 'guestinfo.ip' variable, which is a method for passing data between the host and guest without networking. There do not appear to be any variables for the hostname of the guest or the host by default, which is REALLY, REALLY dumb. You can make one though.
guest$ /usr/sbin/vmware-guestd --cmd 'info-get guestinfo.ip'
This is really awesomely funny:
guest$ /usr/sbin/vmware-guestd --cmd 'info-set guestinfo.hostname'
Two and exactly two arguments expected
guest$ /usr/sbin/vmware-guestd --cmd info-set guestinfo.hostname
Too many mandatory argument(s) on the command line. The maximum is 1.
[04:00pm|btm> HA HA HA HA HA
[04:02pm|jet_li> btm: welcome to my world
[04:02pm|jet_li> btm: here's a hint. typing harder won't help
[04:03pm|jet_li> btm: neither will profanity, or throwing things
This does work:
guest$ /usr/sbin/vmware-guestd --cmd 'info-set guestinfo.hostname foo'
Then on the server you can run 'vmware-cmd -l' to list your config files. Then run:
host$ vmware-cmd '/path/to/config.vmx' getguestinfo hostname
And you get:
getguestinfo(hostname) = foo
Now go do something useful with it (I'm going to use it with iClassify and puppet.)
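A minimal sketch of scripting the above, say for a capistrano task: these helpers only build the two command lines (guest-side set, host-side get) from the transcript; actually shelling out or run()-ing them is up to you. The vmx path below is hypothetical.

```ruby
# Build the guest-side command to publish a guestinfo key.
def guest_set_cmd(key, value)
  "/usr/sbin/vmware-guestd --cmd 'info-set guestinfo.#{key} #{value}'"
end

# Build the host-side command to read it back from a registered vmx.
def host_get_cmd(vmx, key)
  "vmware-cmd '#{vmx}' getguestinfo #{key}"
end

puts guest_set_cmd("hostname", "foo")
puts host_get_cmd("/srv/vmware/foo/foo.vmx", "hostname")
```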

Friday, February 22, 2008

getting a debian-installer ssh shell the hard way

There has to be an easier way...

Boot up into the installer, grab a vty2. (ALT-F2)
anna-install network-console (installs ssh)
network-console-menu (set password)
nano /etc/passwd (set shell to /bin/ash for installer user)

Now you can ssh in as installer. If you don't use network-console and just install openssh-server-udeb, you don't get host keys and config files; network-console-menu generates the host keys for you. If you don't change the shell you'll get dumped into the network-console menu when you ssh in, which is okay if that's what you want.

all this just to: 'tar -cvf - . | ssh installer@w.x.y.z tar -xf - -C /target' meh.

creating debian release files for a local repository

In the past I've tried to hack the Release file with sed; this works better. Namely, my local repo's Packages files were not in the Release file, and apt was getting upset about that now that I'm using signatures (SecureApt).

Somewhere make an apt-release.conf (copied and modified from here):
APT::FTPArchive::Release::Codename "etch";
APT::FTPArchive::Release::Origin "localhost.example.com";
APT::FTPArchive::Release::Components "main";
APT::FTPArchive::Release::Label "Local Debian Repository";
APT::FTPArchive::Release::Architectures "i386 amd64";
APT::FTPArchive::Release::Suite "stable";
Then use apt-ftparchive to create the release file:
apt-ftparchive release -c /path/to/apt-release.conf \
/path/to/etch \
> /path/to/etch/Release
Then sign it: (you do have a local key and all that jazz, right?)
gpg -b /path/to/etch/Release
mv /path/to/etch/Release.sig /path/to/etch/Release.gpg
Should work fine for ubuntu too.
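Since the Capistrano recipe elsewhere on this blog already leans on ERB, the same trick can stamp out apt-release.conf; a sketch, using the example values from the text:

```ruby
require 'erb'

# Example settings from the apt-release.conf above.
release = {
  "Codename"      => "etch",
  "Origin"        => "localhost.example.com",
  "Components"    => "main",
  "Label"         => "Local Debian Repository",
  "Architectures" => "i386 amd64",
  "Suite"         => "stable",
}

# One line per setting, in apt-ftparchive's config syntax.
template = %q{<% release.each do |key, value| -%>
APT::FTPArchive::Release::<%= key %> "<%= value %>";
<% end -%>
}

puts ERB.new(template, trim_mode: "-").result(binding)
```

Write the result to a file and point 'apt-ftparchive release -c' at it as shown above.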

installing from a signed debian repository day 2

problem #1: the "d-i mirror/*" options don't support pushing a different key. /usr/share/keyrings/archive.gpg is hardcoded into net-retriever. This can be worked around by modifying the initrd like I did here. This is as of etch / net-retriever 1.15. However, rebuilding the initrd with your keyring only works up until base-installer. I opened bug #467049.

problem #2: base-installer does an mkinitrd near the end, chrooted inside /target. This is before apt-setup runs and pulls down "d-i apt-setup/local0/key", so the apt-install that runs to get dependencies for mkinitrd fails.
[09:40am|otavio> btm: you can do that putting a file on /target even before base-installer. (but after partitioning)
[09:40am|otavio> btm: /target/etc/apt/apt.conf.d
[09:40am|otavio> btm: it's ugly but works
[09:49am|otavio> btm: yes, there's ... this requires you to provide a signed repository and a key
[09:50am|otavio> btm: but in a way that it integrates
[09:50am|otavio> btm: i've done, long time ago, a patch to base-installer to allow it to, using preseed, install a package with base
[09:51am|otavio> btm: so it could be used for thta case where you _do have_ a package with the key
This is too much work right now. My repo is local, so I'm going to go back to running allow_unauthenticated and trust my network. This explains why all the preseed examples on the internet while warning that allow_unauthenticated is insecure, don't have an example of the correct solution.

Note that after the reboot you need to do an 'apt-get update' to get the Release files and signatures for the local repository before apt-get will stop complaining about the unauthenticated-ness of the packages. Bug #467063.

Thursday, February 21, 2008

signing your local debian repository

(project incomplete at this time. I can't see straight)

Usually when I configure a local PXE install of an apt-mirror I use 'd-i debian-installer/allow_unauthenticated string true' so I can add my own packages to a mirror. I think in the future setting up two separate mirrors on different virtual hosts is the solution, because I always leave myself with a messy series of symlinks between the web tree, the apt-mirror tree and my own repositories. Only Adam has ever had to look at my mess, so I've survived without too much mockery.

On the most recent adventure I tried hacking the Release file. However, recently I've had some consultant-provided scripts that aren't fond of the "allow unauthenticated packages?" prompts. This could be worked around with some flags (like --force-yes) but I like to try to clean things up when confronted with them, at least a little bit. There is the preseed option "#d-i apt-setup/local0/key string http://local.server/key" but that just applies to the apt-setup package that configures /etc/apt/sources.list on /target. All of the installation comes off of "d-i mirror/*" and I don't see such an option for passing a key. I assume they're afraid of a MitM attack, as it looks like this is part of a debian-archive-keyring package that gets pushed into the initrd when it's made.

If you're not familiar with udebs, they're worth taking a look at. udebs are small debs used in the installer. Both are ar archives that contain three files. You can extract the data without using any dpkg utils with 'ar p some.udeb data.tar.gz | tar xvz'. More info is in an earlier research project with debs here.

I happen to know that some udebs are unpacked when the initrd is made and others are downloaded by the installer and then installed. Looking in the current initrd for etch i386 I found 'archive.gpg' in usr/share/keyrings. This is a little interesting as it looks like the latest udeb installs 'debian-archive-keyring.gpg' and symlinks it to 'archive.gpg' in the postinst (debian script, found in control.tar.gz in the ar (udeb)). There's no such file, so I guess this particular udeb wasn't used to create this initrd. That's fine though, I figured it out.

You'll need a gpg key:
gpg --gen-key
cd [wherever your Release file is]
gpg -b Release
mv Release.sig Release.gpg

By the way! There's lots of information on the internet about mounting initrds using cramfs. That's old, and it's frustrating when I forget that. Debian and ubuntu initrd images aren't cramfs filesystems anymore; use:
mkdir initrd ; cd initrd ; gzip -cd ../initrd.gz | cpio -idmv
'gzip -cd' decompresses to stdout, and 'cpio -idmv' does a "copy-in" from the cpio archive, making directories, preserving timestamps and being verbose, respectively.

add your key to the installer's keyring:
cd usr/share/keyrings
gpg --import < archive.gpg
gpg --export > archive.gpg

In the root of your decompressed initrd cpio tree:
find . | cpio -ovH newc | gzip -9c > ../initrd.new.gz
The -9 on gzip is super-duper compression and you'll get a kernel panic if you try to boot off an initrd image made without '-H newc'.

Putting this in your netboot gets you as far as the stage where debian-installer creates the new initrd for the new box, where it fails: you're now chrooted into /target, but apt-setup doesn't appear to have run yet, so the key listed in "d-i mirror" hasn't been installed (verify with 'chroot /target' then 'apt-key list' in the shell of your installer when it fails). We could rebuild the debian-archive-keyring udeb with our key added to the keyring, but then we have to regenerate Packages files and Release files to create all the right md5sums.

Apt-setup runs after base-installer in debian-installer, see here. It looks like base-installer runs debootstrap and passes arguments:
int
main(int argc, char *argv[])
{
    char **args;
    int i;

    di_system_init("run-debootstrap");
    debconf = debconfclient_new();
    args = (char **)malloc(sizeof(char *) * (argc + 1));
    args[0] = "/usr/sbin/debootstrap";
    for (i = 1; i < argc; i++)
        args[i] = argv[i];
    args[argc] = NULL;
    return exec_debootstrap(args);
}
And debootstrap has a --keyring option. I can't see a way to configure this though. There's a postinst file that has this hardcoded into a variable; I think this is where the option should be. For now I'm re-enabling allow_unauthenticated, as at the very least apt-setup should install my key, and thus allow the packages I want to install to be "authenticated" after the reboot.

Adding RT Command by mail extensions on debian

Have: Debian box running request-tracker3.6, installed via apt. Notes for LDAP and that squirrely Display.html bug.

1) Download RT-Extension-CommandByMail

2) unpack, compile, install:
tar -xvzf RT-Extension-CommandByMail-0.05.tar.gz
cd RT-Extension-CommandByMail-0.05
perl Makefile.PL
make
sudo make install

When asked for the location of RT.pm, it is '/usr/share/request-tracker3.6/lib/'.

3) add: '@MailPlugins = qw(Auth::MailFrom Filter::TakeAction);' to the end of '/etc/request-tracker3.6/RT_SiteConfig.pm' (before '1;')

4) restart the webserver: '/etc/init.d/apache2 restart'

5) review the list of commands.

6) send an email and try it out (subject: '[$rtname #ticketnumber]', rtname is set in RT_SiteConfig.pm) and put a command on the first line of the email

You'll have whatever permissions your email account has. So that's a spoofable security concern, but whatever.

Tuesday, February 19, 2008

The referenced account is currently locked out and may not be logged on to.



I got this error while trying to use an admin share (c$) via CIFS on an office XP desktop that's in the company domain, from my XP laptop that isn't. I built my office desktop and correctly suspected that the original admin account had the same name as my user account on the laptop. The password on this account didn't meet domain password requirements and was locked out. Even after setting a password that did meet the requirements and unlocking the account, it kept getting re-locked every time I tried to connect to the desktop.

At older, crazy security-driven companies, I would have blamed someone setting the number of failed passwords required to lock out an account too low. This practice is horrible because you always have someone say "10 times is obviously a hacker!" who doesn't take into account all the microsoft software that secretly caches your passwords and tries to auto-log you into things with your password rather than kerberos credentials.

I ended up just renaming the account on the desktop, and then the laptop got a password prompt that I could enter my domain credentials into.

weird comcast HTTP 301 redirected issues

Someone asked me for help with a strange problem recently. HTTP requests to a comcast-hosted website were sometimes returning HTTP 301 redirect responses pointing back at themselves. I did a normal HTTP/1.1 GET and saw the 301, but when I went to the URL with firefox it worked fine.
Trying 216.87.188.20...
Connected to home.comcast.net.
Escape character is '^]'.
GET /~user/image.jpg HTTP/1.1
Host: home.comcast.net

HTTP/1.1 301 Moved Permanently
Date: Tue, 19 Feb 2008 19:25:07 GMT
Server: Apache
Set-Cookie: pwp_mig_status=0; Version=1; Max-Age=900; Path=/
Location: http://home.comcast.net/~user/image.jpg
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

100

Moved Permanently
The document has moved here.


Since I'm an admin and not a web developer, I started up wireshark, grabbed its HTTP request, then made the same request by hand and got the expected image file instead of the 301 error. I narrowed it down to having to use:
GET /~user/image.jpg HTTP/1.1
Host: home.comcast.net
Cookie: pwp_mig_status=0
I don't know what the workaround would be. Probably not using comcast, because they're rat bastards anyway. I'll note I had trouble testing by hand, probably some annoying security gear was dropping my requests, but I got the right combination eventually.
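For reference, a sketch of making the working request with Ruby's net/http instead of raw telnet: send the pwp_mig_status cookie up front rather than following the 301. Only the request construction is shown; it isn't run against the real site here.

```ruby
require 'net/http'

# Build the GET with the cookie included, matching the hand-made request
# above (path and host are the ones from the transcript).
req = Net::HTTP::Get.new("/~user/image.jpg")
req["Host"]   = "home.comcast.net"
req["Cookie"] = "pwp_mig_status=0"

# Uncomment to actually send it:
# res = Net::HTTP.start("home.comcast.net", 80) { |http| http.request(req) }
```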

Monday, February 18, 2008

shmoocon (4) labs (2?)

Besides my photos, there'll probably be photos from pals like Alex, Andy, Ken and Luiz. Good times were had, I've been out drinking to celebrate being home, but a few quick notes.

If you use broadcast/multicast storm protection on cisco switches, make sure the other network gear supports it or make sure you don't use it on trunk links. ick.

Don't use any of the proprietary cisco layer 2 protocols (CDP, VTP, DTP, etc). Enno and Daniel, in their fuzzing work, found that these protocols generally have crappy implementations. Long live the Germans, btw.

Bring a router/firewall with wireless preconfigured, otherwise you delay the non-network groups from making progress while you configure the network.

Check last years configuration for monitor ports.

Aruba gear used as a switch is not ideal, like it only supports one span/monitor port.

Monowall sucks for not having a shell.

Pfsense doesn't support vlan tagging on some soekris gear (5501) and thus sucks.

When you have security vendors doing vulnerability scanning, make sure your firewall has a huge ( > 50,000) state table.

Due to hardware/cabling issues, having multiple dns/dhcp servers is ideal.

A /24 is far too small an address space for a security conference, especially if someone configures it to only serve 100 addresses via DHCP.

Don't use vlans over 1000, especially on cisco gear; it's confusing and not necessary. If you do, don't use 1000-1004 or so, and pay keen attention to spanning tree ('show int trunk' and the like).

Sometimes gear has to have a vlan configured before it will trunk it (see above command)


Thursday, February 07, 2008

using active directory ldap authentication with testlink

Someone requested a testlink install here at work and of course I wanted LDAP authentication (single sign-on is good). On debian you'll need 'php5 php5-mysql php5-ldap mysql-server' installed, and you will need to restart apache (not reload!) after these are installed. Mostly I'm assuming you got testlink set up and into the database already and you're just looking for documentation on adding ldap support.

Find the config.inc.php file in the root of your testlink tree and make sure the following settings are set:
$g_login_method = 'LDAP';
$g_ldap_server = 'ad.example.org';
$g_ldap_port = '3268';
$g_ldap_root_dn = 'DC=ad,DC=example,DC=org';
$g_ldap_organization = ''; # e.g. '(organizationname=*Traffic)'
$g_ldap_uid_field = 'sAMAccountName'; # Use 'sAMAccountName' for Active Directory
$g_ldap_bind_dn = 'CN=BindUser,CN=Users,DC=ad,DC=example,DC=org'; // Leave empty if your LDAP server allows anonymous binding
$g_ldap_bind_passwd = 'bindpassword'; // Leave empty if your LDAP server allows anonymous binding
Note a few things. Set the ldap server not to a single servername but to the dns name for the domain, or UPN or whatever you call it. You may notice this points to your domain controllers, allowing ghetto-redundancy. If all of your DCs are not GCs, use "gc._msdcs.example.org", as you'll see that I'm using port 3268 (the global catalog) rather than 389 (ldap). This is because php5-ldap or libldap2 or even testlink gets confused when it sees those stupid LDAP referrals you get when your basedn is your domain instead of an OU or CN=Users, and will fail. Using the GC instead just works. Since this is Active Directory, unless you've hacked it to allow anonymous binding you will need a binddn and bindpw, which can be a regular user, or you can go find the documentation on creating this more securely if it matters to you.
LDAPMessage searchResDone(2) Unknown result(9) (Referral:
ldap://ForestDnsZones.corp.widemile.com/DC=ForestDnsZones,DC=corp,DC=widemile,DC=com
ldap://DomainDnsZones.corp.widemile.com/DC=DomainDnsZones,DC=corp,DC=widemile,DC=com
You'll then need to create a user via the new user link on the web interface. Make sure the username matches your sAMAccountName value, that is, your regular username.

Then go into mysql (mysql -u root -p testlink) and make yourself an admin:
update users set role_id=8 where id=2;
Assuming you're the first user created (admin is id=1; see the users table and the roles table for more information). Now go back and log into the web interface.

Tuesday, February 05, 2008

fixing public folder permissions in exchange 2007 sp1

Even with Exchange 2007 SP1, which adds the Public Folder Management Console to the Exchange Management Console (EMC) under toolbox, you're still being forced to learn the Exchange Management Shell (EMS) for many things.
get-PublicFolderClientPermission -identity "\folder" | fl
Remember | fl is for "format-list" which makes the output readable. What's neat is you'd expect the pipe to take information that you'd see if you weren't piping the output, and put it in a different format. The damn option is even called FORMAT-list. Alas, sometimes fl gives you more information than you would have gotten otherwise, so I always use it.
add-publicfolderclientpermission -identity "\folder" -User userorgroup -accessrights owner
There's a good list of accessrights here.

Also, apparently MS is giving their tech writers drugs now. Read this to de-stress after dealing with these shenanigans. Just remember:
Public folders do not talk. Any conversations between public folders and a real person occurred solely in the mind of the writer. And according to her, that's the only voice she's been hearing lately.

Monday, February 04, 2008

promiscuous mode for intel 3945ABG wireless

A Dell D620 laptop with an Intel 3945ABG card on Windows XP doesn't work in promiscuous mode out of the box with the Dell drivers, for applications that use winpcap like wireshark or ethereal. Using the Intel drivers from here, despite kind words saying to use the OEM drivers, works fine with wireshark. Just unarchive and run the executable and it updates the existing drivers without a reboot, although you will lose your wireless connection for a moment.

Friday, February 01, 2008

enabling root ssh on your nas

I'm liking NAS boxes more and more. I've been annoyed at some NAS gear at work, Infrant ReadyNAS gear, on which I've been unable to set a permission of "force R/W for everyone", let alone something more complicated. The web interface has, under 'advanced options', the ability to reset the permissions, but it hasn't always worked the way I expect it to.

Netgear bought Infrant though, and installing the most recent RAIDiator firmware netgear-itizes everything. Coolest feature though? After you install the latest firmware if you install these two files as firmware: ToggleSSH and EnableRootSSH, you can ssh into the thing as root and poke around. Looks like lots of people have schemes for running databases and crap on it, which seems a little gnarly. I'm happy to be able to go in and get a look at the permissions, samba and winbind configs though.
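After that, it's just a Linux box. Something like this is what I do once I'm in (the root password should match the web admin password, and the paths are my guesses at where RAIDiator keeps things; adjust as needed):

```
ssh root@nas-hostname          # root password = web admin password
smbstatus                      # who's connected via samba right now
cat /etc/samba/smb.conf        # share definitions
ls -lR /path/to/share | less   # eyeball the actual filesystem permissions
```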

Thursday, January 31, 2008

dell 2748 and cisco 6509 link aggregation - 802.3ad or etherchannel, not LACP

Once again cleaning up a pile of switches hanging off each other. I'm starting with taking a Dell Powerconnect 2748 switch and trunking it back to a Cisco Catalyst 6509. I ran two network links with the intention of aggregating them. Interestingly, this overview page says the 3448 and 2424 support "Link Aggregation with support for up to eight aggregated links per switch and up to eight ports per aggregated link (IEEE 802.3ad); LACP support" but the corresponding box for the 2748 is empty.

Under the tech specs for the 3448:
Link Aggregation with support for up to 8 aggregated links per switch and up to 8 member ports per aggregated link (IEEE 802.3ad)
LACP support (IEEE 802.3ad)
And the tech specs for the 2748 (which I have):
Industry-standard link aggregation adhering to IEEE 802.3ad standards
Supports 6 link aggregation groups and up to 4 ports per group
When configuring the two ports for a channel group:
configure terminal
interface range g7/1 - 2
channel-protocol lacp
channel-group 1 mode active
The ports would come up but I'd see intermittent packet loss on pings.
sw01#show etherchannel 1 detail
Group state = L2
Ports: 2 Maxports = 16
Port-channels: 1 Max Port-channels = 16
Protocol: LACP
Ports in the group:
-------------------
Port: Gi7/1
------------

Port state = Up Sngl-port-Bndl Mstr Not-in-Bndl
Channel group = 1 Mode = Passive Gcchange = -
Port-channel = null GC = - Pseudo port-channel = Po1
Port index = 0 Load = 0x00 Protocol = LACP

Flags: S - Device is sending Slow LACPDUs F - Device is sending fast LACPDUs.
A - Device is in active mode. P - Device is in passive mode.

Local information:
LACP port Admin Oper Port Port
Port Flags State Priority Key Key Number State
Gi7/1 SP indep 32768 0x1 0x1 0x701 0x7C

Age of the port in the current state: 00d:00h:05m:09s

Port: Gi7/2
------------

Port state = Up Sngl-port-Bndl Mstr Not-in-Bndl
Channel group = 1 Mode = Passive Gcchange = -
Port-channel = null GC = - Pseudo port-channel = Po1
Port index = 0 Load = 0x00 Protocol = LACP

Flags: S - Device is sending Slow LACPDUs F - Device is sending fast LACPDUs.
A - Device is in active mode. P - Device is in passive mode.

Local information:
LACP port Admin Oper Port Port
Port Flags State Priority Key Key Number State
Gi7/2 SP indep 32768 0x1 0x1 0x702 0x7C

Age of the port in the current state: 00d:00h:05m:09s

Port-channels in the group:
----------------------

Port-channel: Po1 (Primary Aggregator)

------------

Age of the Port-channel = 00d:00h:52m:26s
Logical slot/port = 14/1 Number of ports = 0
Port state = Port-channel Ag-Not-Inuse
Protocol = LACP
The interesting parts are the port states: both ports show "Sngl-port-Bndl ... Not-in-Bndl" with Mode = Passive, and the Port-channel shows "Ag-Not-Inuse" with zero ports. The ports were coming up, but LACP wasn't. I configured "LAG" on the 2748 by selecting the two corresponding ports on the "LAG Membership" page.

Doubting LACP support, I cleared the channel group configuration (no channel-group 1) and then configured only etherchannel support (channel-group 1 mode on). Now things look good!
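Spelled out, the reconfiguration on the Cisco side looks something like this (a sketch reconstructed from the commands above; I believe a 'no channel-protocol' is also needed to clear the LACP setting before mode on is accepted, so verify on your own gear):

```
configure terminal
interface range g7/1 - 2
 no channel-group 1
 no channel-protocol
 channel-group 1 mode on
end
```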
sw01#show etherchannel 1 detail
Group state = L2
Ports: 2 Maxports = 8
Port-channels: 1 Max Port-channels = 1
Protocol: -
Ports in the group:
-------------------
Port: Gi7/1
------------

Port state = Up Mstr In-Bndl
Channel group = 1 Mode = On/FEC Gcchange = -
Port-channel = Po1 GC = - Pseudo port-channel = Po1
Port index = 0 Load = 0x55 Protocol = -

Age of the port in the current state: 00d:00h:10m:40s

Port: Gi7/2
------------

Port state = Up Mstr In-Bndl
Channel group = 1 Mode = On/FEC Gcchange = -
Port-channel = Po1 GC = - Pseudo port-channel = Po1
Port index = 1 Load = 0xAA Protocol = -

Age of the port in the current state: 00d:00h:10m:40s

Port-channels in the group:
----------------------

Port-channel: Po1
------------

Age of the Port-channel = 00d:01h:04m:04s
Logical slot/port = 14/1 Number of ports = 2
GC = 0x00000000 HotStandBy port = null
Port state = Port-channel Ag-Inuse
Protocol = -

Ports in the Port-channel:

Index Load Port EC state No of bits
------+------+------+------------------+-----------
0 55 Gi7/1 On/FEC 4
1 AA Gi7/2 On/FEC 4

Time since last port bundled: 00d:00h:10m:40s Gi7/2
So 802.3ad == Etherchannel (Cisco) == LAG (Dell). The intermittent packet loss from earlier is gone too.
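A side note on those Load values: 0x55 and 0xAA are bitmasks over the 8 load-balancing hash buckets, and they split them evenly between the two ports, which is why each shows "No of bits 4". A quick check:

```python
# 'Load' from show etherchannel detail is a bitmask over the
# 8 load-balancing hash buckets; 0x55 and 0xAA split them 4/4.
gi7_1, gi7_2 = 0x55, 0xAA

print(bin(gi7_1))             # 0b1010101  -> buckets 0, 2, 4, 6
print(bin(gi7_2))             # 0b10101010 -> buckets 1, 3, 5, 7
print(bin(gi7_1).count("1"))  # 4, matching 'No of bits'
print(gi7_1 & gi7_2 == 0)     # True: no bucket assigned to both ports
print(gi7_1 | gi7_2 == 0xFF)  # True: together they cover all 8 buckets
```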

I have to figure LACP is the open way to auto-configure ports for 802.3ad or Etherchannel, the equivalent of using PAgP on Cisco gear. This is useful if you want your switches to negotiate etherchannel when possible, letting you add multiple cables and increase bandwidth without heavy reconfiguration. This Cisco page on LACP says:
LACP allows a switch to negotiate an automatic bundle by sending LACP packets to the peer.
As opposed to doing it by hand, which is plain old port aggregation. I wonder if an older Cisco switch has separate options for pre-802.3ad etherchannel and 802.3ad-compatible etherchannel. It's interesting to note that in this switch the 'switchport trunk encapsulation isl' command doesn't work on some cards, as they only support 802.1q vlan trunking.
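For reference, the channel-group mode keywords map onto the three approaches like this (my summary of standard IOS behavior, not something from the Dell docs; pick exactly one per interface):

```
interface range g7/1 - 2
 channel-group 1 mode on         ! static bundle, no negotiation (what worked here)
 channel-group 1 mode active     ! LACP: actively negotiate with the peer
 channel-group 1 mode passive    ! LACP: respond only if the peer initiates
 channel-group 1 mode desirable  ! PAgP: Cisco-proprietary, actively negotiate
 channel-group 1 mode auto       ! PAgP: respond only
```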

Really have to laugh at this errata though:
System Firmware Version 1.0.0.33

Known Restrictions and Limitations:
The login screen accepts any password with the default
username, admin.
I guess that's a problem, yeah. It's cool that this release was a year ago and the problem still hasn't been fixed. This is why we buy Cisco switches and not Dell switches, people.

Wednesday, January 30, 2008

working at widemile and blogging in web 2.0 worlds (post bubble)

A few of you know I work at Widemile now as a Systems Administrator. For non computer people, that means I play with computers. For those who care, I spend some time doing helpdesk trying to keep employees happy, and then secretly make cookies in the server room.... or, well, try to build awesome scalability using lots of different tools. I like startups because there's no corporate mandate that we use IBM such and such, or that we have to use Oracle, or any business-oriented requirement. Although at my last startup the thing with Oracle did happen, which was kind of silly, but that's another story.

So I get to leverage useful and flexible stuff, which usually amounts to open source software, to make everything work like magic. On that note, props to Adam and team at HJK Solutions for iclassify and being generally classy folks. If you're scaling anything up at a startup, you need these people in your life, I'll vouch for it.

The tech stuff is only interesting to tech people who are used to facing situations where people want miracles. My father was a commercial pilot and used to always say "We've been doing so much, with so little, for so long, that now we can do almost anything with nothing at all." It's pretty true, as most people just don't get their desktop, let alone what goes on in the server room.

One of the things I find cool about Widemile is that we have a professional blogger, Billy Shih, working for us. Billy blogs on all things multivariate testing related. I see more and more companies joining and building communities, sometimes in cool ways like Dell Ideastorm. While company blogs are at times well written, they often come off more corporate than organic, so it's great to see real information and opinions come out of a place you work for, rather than highly positioned marketing pieces that always make me, and I assume most of my peers, immediately glaze over. As an employee I also get a weekly email from him about what the tubes are up to. When I listen to colleagues talk about weekly HR emails about new policies against using off-white paper for clients whose names begin with the letter P, I feel fortunate to be in a place with real culture and humanity.