Never thought you'd have to diddle with MTUs ever again, did you? 04/04/2008 04:22 PM
Domino
At work
You know, back in the early days of TCP/IP with Windows... say, 1994, we got used to having to fart around with fairly low-level parameters like MTUs, packet fragmentation, TTL, all that.  It's been years since we've had to fool with it, in this era when everything from cars to refrigerators speaks TCP/IP, but amusingly, it popped up again at work today.  We all dusted off our brains to remember how to fix it.

Last year, my department was mostly moved out of the main campus of my day job's site and went about three miles to a leased space in some nondescript office park. Our neighbors are a landscape-supply place, a convenience store, a replacement-window place, and a company that stocks liquor on cruise ships.

How to get that office onto the campus backbone? Well, in the old days they'd have just leased some lines and been done with it. In this case, though, we're on the end of a fairly wimpy line-of-sight microwave link. Some days, it works well. Other days, when the weather is shitty and somebody's FTPing five gig worth of QuickPlaces around the network, it can be pretty awful.

This morning, some of it freaked out in a weird way. We found ourselves unable to get to our own production Domino servers over in the main campus if we did it through the load-balancer that fronts those boxes to even out web traffic. Going to the machines directly via IP address worked fine. But no, it wasn't a DNS issue... things were timing out, or worse, not timing out. One of my workstations is still trying to talk to the server, and FasterFox reports it's been trying for... 83 minutes now. Other internal sites work fine. Access to the internet is fine. It's just... that one site.

Where to find the culprit? Somebody obviously had to change something, since this has been working fine for years now. Our initial suspicion was correct, but in an odd way.

Turns out, the router for the microwave link tends to fragment packets. My guess is, the link is so crappy they like to have small packets to reduce problems when a flock of birds or a snowstorm interferes with transmission. However, sometime last night, someone set a flag on the load-balancer out front of the big servers so that it does not permit packet fragmentation.

Make a request to the server. It hears you and sends you your stuff. The load balancer makes sure it's in nice, big fat packets. Those packets get to the router for the microwave link. It says, "I want to fragment this packet, please acknowledge." The load-balancer flips it the big bird. Essentially, the router for the microwave link keeps on trying until the next power failure or something, and keeps the original requesting workstation informed that it's still trying to get the load balancer to listen to reason, a plea the load balancer ignores.

No timeout. No loading. No... nothing.
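
For anyone who wants the mechanics spelled out, here is a rough Python sketch of the decision a router makes when a packet is too big for the next link. It assumes plain IPv4 behavior: a packet with the "don't fragment" (DF) bit set is dropped and an ICMP "fragmentation needed" message goes back to the sender, while a packet without it gets split up. The `Packet` class, the `forward` function, and the 576-byte link MTU are made up for illustration; the actual MTU on the microwave link isn't known here.

```python
from dataclasses import dataclass

IP_HEADER = 20  # bytes in an IPv4 header without options


@dataclass
class Packet:
    total_len: int       # IP header plus payload, in bytes
    dont_fragment: bool  # the DF bit in the IPv4 header


def forward(packet, link_mtu):
    """Return (action, detail) for a router pushing `packet` onto a link.

    Hypothetical helper: models the decision only, not real packet handling.
    """
    if packet.total_len <= link_mtu:
        # Fits as-is, so it goes through untouched.
        return ("forward", [packet.total_len])
    if packet.dont_fragment:
        # Too big and DF is set: drop it and report the link MTU back
        # to the sender (ICMP type 3, code 4, "fragmentation needed").
        return ("drop+icmp", link_mtu)
    # Too big, DF clear: split the payload. Every fragment except the
    # last must carry a payload that is a multiple of 8 bytes.
    payload = packet.total_len - IP_HEADER
    chunk = (link_mtu - IP_HEADER) // 8 * 8
    sizes = []
    while payload > 0:
        take = min(chunk, payload)
        sizes.append(take + IP_HEADER)
        payload -= take
    return ("fragment", sizes)


# A full-size Ethernet packet hitting a hypothetical 576-byte link:
print(forward(Packet(1500, dont_fragment=False), 576))
print(forward(Packet(1500, dont_fragment=True), 576))
```

If the sender never hears (or never acts on) that ICMP message, it keeps sending the same oversized packets and the connection just sits there, which would match the hang described above.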

Of course, the solution would be to turn the newly-enabled flag on the load balancer off. The systems people don't wanna do that because "it might affect other things." Never mind that they just made this change yesterday. They'd rather that the router for the microwave link use a larger MTU.

And the guy who manages the router for the microwave link? He's off today. Apparently he's not important enough to have a Blackberry or a pager, and nobody else is important enough to be allowed to diddle with the settings on the router.

We eventually worked around it by forcing the workstations themselves to a smaller MTU. The systems people then freaked out that we were monkeying with workstation settings they're supposed to control.
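
A minimal sketch of how one might pick that smaller MTU, assuming there is some way to test whether a packet of a given size gets through with "don't fragment" set. The `probe` callback, `find_path_mtu`, `tcp_mss`, and the 1400-byte limit in the example are all hypothetical; nothing here reflects the actual values used at work.

```python
def find_path_mtu(probe, low=68, high=1500):
    """Binary-search the largest packet size in [low, high] that passes.

    `probe(size)` is a hypothetical callback that returns True when a
    packet of `size` bytes with DF set makes it across the path.
    Assumes that if a size passes, every smaller size passes too.
    68 is the smallest MTU IPv4 allows; 1500 is the usual Ethernet MTU.
    """
    if not probe(low):
        return None  # nothing gets through at all
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid fits, try bigger
        else:
            high = mid - 1  # mid is too big, try smaller
    return low


def tcp_mss(mtu, ip_header=20, tcp_header=20):
    """TCP payload per packet for a given MTU, assuming no header options."""
    return mtu - ip_header - tcp_header


# Simulated path that silently eats anything over 1400 bytes.
mtu = find_path_mtu(lambda size: size <= 1400)
print(mtu, tcp_mss(mtu))
```

In practice the probe is usually a ping with the DF bit set (`ping -f -l <size>` on Windows), keeping in mind that the size given to ping is the ICMP payload, so the packet on the wire is 28 bytes larger.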

No wonder I can't sleep.

Blabber

1. chuck dean 04/05/2008 12:01:43 AM
Homepage: http://www.lotussmb.com


I'm just glad to know that I wasn't the only one to have a crappy Friday.

I don't have the problem of having to get ahold of the person in charge of the routers or the microwave link.... I'm in charge of the whole network..... I've just got to deal with being the whipping boy for badly behaved code that someone in Germany wrote.

(Note - This is not a Lotus Notes issue!!!!) Turns out we are probably the first utility company in the world to roll out SAP's new web-based customer interaction center - a piece of their CRM application suite. This wonderful piece of crap (code) ... is a web application that seems to require a quad core processor and 2GB of RAM to run adequately. I've got a shitpile of IBM pSeries hardware twiddling its thumbs and a blissfully clean network while the CPU utilization on our call center systems is pegging out at 100% for a frickin' browser app....

Not one damn developer for the piece of crap around to feel the heat... instead the VPs are pissing down our necks about the problem .... like we can poop new processors and RAM out to fix the issue.

SORRY FOR THE RANT!!!! But I feel much better

- chuck -





2. Turtle 04/05/2008 02:02:40 AM
Homepage: http://www.weightlessdog.com/shell.nsf


I always thought it was a strange irony that that company's name can be pronounced "sap."




3. ursus 04/05/2008 05:29:06 PM


I just changed Internet providers - everything had been working fine for years with the old provider, but with the new one I had intermittent internet just as you described - sometimes the network request would just "hang" and I would need to reissue the request, and then everything would work just perfectly :o( Anyway, after about 3 weeks I narrowed it down to the MTU size, and the technician agreed that the problem could be solved by setting the MTU to 1460 - super, only.... You cannot set the MTU in a lot of Apple's products (Apple TV, iPhone and, unfortunately, Airport cards in iMacs), so now I have various cables running through the flat :o( I was also hoping that I would never need to touch an MTU again.



