Re: kernel problems (again)


Subject: Re: kernel problems (again)
From: Matthew Schumacher (schu@schu.net)
Date: Tue Aug 13 2002 - 10:17:08 AKDT


Hello all,

Well, it did it again: the machine suddenly refuses to kill processes. I
tried killall, kill -9, killall5, shutdown, and init 6 without success.
Nothing I did would cause it to shut down, forcing me to hit the power
switch. The uptime was about 30 days, which is odd because the last time
this happened it had also been up for about 30 days.

This time around I restarted the machine with a single processor kernel
hoping that will solve the problem. It runs mail so it really doesn't
need the second cpu anyway.

Does anyone have suggestions on other things to look at? I thought Linux
was up to the challenge, but it seems that there are still a bunch of
problems with SMP and quotas.
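
A minimal sketch of one thing that could be checked before the next
hang, assuming (this is only a guess, nothing above confirms it) that
the unkillable processes are sitting in uninterruptible sleep (state D)
or are zombies (state Z), neither of which responds to kill -9. The
function names `summarize` and `stuck` are made up for illustration. It
parses the output of `ps -eo pid,stat,comm` and counts processes per
state and command name, so it could be run periodically and the output
logged:

```python
import subprocess
from collections import Counter


def summarize(ps_output):
    """Count processes per (state letter, command) from `ps -eo pid,stat,comm` text."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        _pid, stat, comm = fields
        # The first character of STAT is the main state (R, S, D, Z, T, ...).
        counts[(stat[0], comm.strip())] += 1
    return counts


def stuck(counts):
    """Keep only entries in D (uninterruptible sleep) or Z (zombie) state."""
    return {key: n for key, n in counts.items() if key[0] in ("D", "Z")}


if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    for (state, comm), n in sorted(stuck(summarize(out)).items()):
        print(state, comm, n)
```

If the count of D-state sendmail/qpopper/imap processes climbs over
time, that would point toward something blocking in the kernel (disk,
quota, or filesystem code) rather than a userland problem.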

thanks,

schu

Matthew Schumacher wrote:
> Hello all,
>
> Last week a mail server I manage suddenly failed, so I tried to get a
> shell on the server and found that I could log in but bash would hang. I
> thought this was interesting, so I passed a 'ps -ef' through ssh to the
> machine to see if that would work. To my surprise it did. I got a list
> back of 248 processes. After further investigation it seems that my box
> hit a process limit and simply would not start a bash shell (but it did
> start a ps, which is odd). Anyway, I started passing kill commands through
> ssh to try to shut it down, but kill wouldn't work, kill -9 wouldn't
> work, killall wouldn't work, shutdown wouldn't work; nothing I did would
> kill a process. After fighting with it for about an hour I drove to the
> co-lo room and hit the switch.
>
> Does anyone know what might cause this? Why would the kernel simply
> refuse to kill anything? Btw, the box normally has 110-130 processes
> running, so something had to happen to cause it to hit 248. The extra
> processes looked to be stale sendmail/qpopper/imap processes.
>
> The machine is running Red Hat 7.3 with the Red Hat 2.4.18-4 kernel. I
> tried using a generic kernel I compiled, but my SCSI performance dropped
> by half. After some conversation with Alan Cox, he says that Red Hat
> patches some hi-mem/SCSI code in their kernels, which fixes some
> performance problems with the generic kernel. I also had a lot of
> trouble using quotas under heavy load with the generic kernel, so alas I
> am running a Red Hat kernel.
>
> Anyway, I really can't deal with software failure on this machine.
> Hopefully someone will have a suggestion on how to troubleshoot this.
> If not, maybe I'll start testing FreeBSD....
>
> Later,
>
> schu
>
>
> ---------
> To unsubscribe, send email to <aklug-request@aklug.org>
> with 'unsubscribe' in the message body.
>




This archive was generated by hypermail 2a23 : Tue Aug 13 2002 - 10:22:41 AKDT