Archive for February, 2008

Quota Overage -> Disable MySQL Insert/Update Policy Change

A policy change will take effect for all accounts on Wednesday, February 20th.  Those that remain over disk quota for more than one week will have INSERT and UPDATE privileges revoked.  This will have no effect on those already over quota or those who are in danger of going over quota.  Accounts that have gone over quota for a week, miraculously managed not to notice the sudden stoppage of e-mail and the database errors, but eventually freed up space will need to visit MySQL Manager in the control panel to re-enable INSERT and UPDATE privileges on the accounts.
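As a rough illustration of the rule above (not the actual enforcement code), here is a minimal Python sketch.  The function name, the one-week constant, and the idea of tracking an "over quota since" timestamp are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Optional

# Grace period taken from the policy text: more than one week over quota.
GRACE_PERIOD = timedelta(weeks=1)


def should_revoke_writes(over_quota_since: Optional[datetime], now: datetime) -> bool:
    """Return True when INSERT/UPDATE privileges should be revoked.

    over_quota_since is a hypothetical field: the moment the account first
    went over its disk quota, or None if it is currently under quota.
    """
    if over_quota_since is None:
        return False
    # Strictly "more than one week", per the policy wording.
    return now - over_quota_since > GRACE_PERIOD
```

An account that frees up space before the week is out is never affected; one that crosses the threshold has to re-enable the privileges by hand in MySQL Manager.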

This policy has been introduced to counteract the terrible threading issues in MySQL, which have been present as far back as 4.0.  Fast forward a few years and it’s still an obscure bug that hits randomly.  When an INSERT or UPDATE statement occurs, MySQL attempts to acquire an exclusive lock on the table to safeguard against corruption; however, this lock, which should apply solely to writes on that table, trickles out past the table level, past the database level, and locks all tables in all databases.  Regardless of whether it is a read or write operation, it’s locked.  Although I have written a monitoring service to check for these locks every 3 minutes and restart as necessary, I feel it is inadequate at addressing the problem head-on.  The new policy change should make lock-ups extremely infrequent (1 per 180+ days across all servers).  Right now we’re still seeing them happen at a rate of 1 per 30 days across all servers.

Again, just to recap: if you run over quota for more than a week, visit MySQL Manager after fixing disk space usage to re-enable the INSERT and UPDATE privileges.


Emergency kernel upgrade

An emergency kernel upgrade is scheduled for February 11th at 12 AM EST (-0500 GMT).  This kernel upgrade will bring the servers to 2.6.24.1 and is mandatory to avoid a serious exploit within the kernel.  While I do apologize for the quick kernel upgrades over the past few days, this cannot be avoided; furthermore, the temporary fix tends to result in system instability, as was noted on 2 of the servers earlier today.  More information will be available later tonight.

12:22 AM EST:  one more exploit is out in the wild, and the fix for it hasn’t been pushed upstream yet.  Rebuilding kernels with a custom patch from git; another reboot is coming up.

12:28 AM EST:  the last patch has been applied and it looks like we’re good to go.  This is a nasty set of easy exploits in the vmsplice section of the kernel, which is used so widely that it can’t simply be removed.

There are three exploits, all of which affect the 2.6.17 - 2.6.24.1 branches of the Linux kernel.  Unfortunately this means very high and easy penetration for attackers.  If you work in IT, then you have my sympathies tomorrow.  It’s going to be a fun day for those who don’t keep up with security on the weekends.  Although it doesn’t quite rival do_brk() in terms of notoriety, I have a hunch it’ll be an infamous bug for some time to come.

One final note: the RET “fix” can lead to system instability and is not recommended.


Postfix 2.5 goes live tonight, IMAP/POP3 quota enhancements, sky2 still buggy

IMAP Quota Support
To address the massive confusion users experience when they suddenly find themselves over quota with little notification aside from within the control panel, I’ve patched Dovecot’s quota reporting extension to work correctly with the servers.  Even better, it’s reported in SquirrelMail and Horde now too.  In summary, the quota reporting works like this: check if there’s a user quota limit and it’s less than the account quota limit; if so, report that; otherwise, report the total account quota used and the account quota limit.
[Screenshots: sqmail-quota.png, horde-quota.png]
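The selection rule described above can be sketched in a few lines of Python.  This is only an illustration and not the actual Dovecot patch; the function, the tuple type, and the byte-count parameters are hypothetical, and "report that" is assumed to mean the user's own usage and limit.

```python
from typing import NamedTuple, Optional


class QuotaReport(NamedTuple):
    used: int    # bytes used, as shown to the mail client
    limit: int   # bytes allowed
    scope: str   # "user" or "account", for illustration only


def quota_to_report(user_used: int, user_limit: Optional[int],
                    account_used: int, account_limit: int) -> QuotaReport:
    """Pick which quota to show: the tighter per-user limit if one exists,
    otherwise the account-wide totals."""
    if user_limit is not None and user_limit < account_limit:
        return QuotaReport(user_used, user_limit, "user")
    return QuotaReport(account_used, account_limit, "account")
```

A mailbox with no per-user limit, or with one at or above the account limit, falls through to the account-wide numbers.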

Postfix 2.5
Postfix 2.5 will be rolled out on the servers tonight with another very important enhancement designed to protect against accounts monopolizing the mail server.  Delays may be configured between delivery attempts in order to prohibit one particular e-mail account from consuming all delivery slots, as we saw earlier this week.  Currently Postfix will add a 10 second delay between delivery attempts to the same address.  For example, let’s say you’ve been inundated with a spam flood of 100 messages destined to your e-mail account, all of which were accepted on the mail server’s end in a 2 second window.  Postfix 2.4 and earlier would attempt to deliver all of the messages at once, quickly tying up delivery slots in the process as each message is scanned by SpamAssassin and hopelessly delivered.  With the delay in place, those attempts are spaced 10 seconds apart instead, which leaves the remaining delivery slots free for other accounts.  Another new feature in Postfix 2.5 is the revised scheduling algorithm that takes into account feedback to determine if a hop is up or down.
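To put a number on the spam flood example, here is a back-of-the-envelope Python sketch.  It assumes attempts to the one address are strictly serialized with the configured delay between them and ignores the time each delivery itself takes; the function name is made up for the illustration.

```python
def min_delivery_window(messages: int, delay_seconds: float) -> float:
    """Minimum time, in seconds, to attempt every message when consecutive
    attempts to the same address are spaced delay_seconds apart."""
    if messages <= 0:
        return 0.0
    # The first attempt starts immediately; each later one waits for the delay.
    return (messages - 1) * delay_seconds


# 100 queued messages with the 10 second delay mentioned above.
flood_window = min_delivery_window(100, 10)  # 990 seconds
```

Under these assumptions the flood drains over roughly 16 and a half minutes rather than hitting every delivery slot at once.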

vsFTPd+
I’ve heard 2 reports from users experiencing authentication difficulties with the new vsFTPd+ build.  I have been unable to confirm the bug from my end.  I am waiting on a packet trace, captured with Ethereal, from another customer later on tonight.  If I can get that squared away, then vsFTPd+ will be rolled out on the servers tonight; otherwise, we’ll hold off until that bug is understood.

sky2 NIC Drivers
Despite my extremely high hopes that the 2.6.24 Linux kernel would address the deadlocking problems present in Marvell Yukon XL chips, it’s still an on-and-off problem.  Issues have been further exacerbated by an unexpected kernel panic on Assmule Wednesday night.  No other servers have exhibited a kernel panic, but we’re still seeing the NIC lose connectivity for 10-15 seconds randomly.  Fortunately it is extremely rare; Augend has had it happen once this week, and 10 seconds out of 604800 seconds is such a marginal fraction that you could chalk it up to random network glitches elsewhere.  Anyway, I am still following the threads as they pop up on the LKML and I’ll let you know if anything turns up.
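For a sense of scale, the 10 seconds out of a week figure works out as below.  This is just the arithmetic in Python; the helper name is hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800


def downtime_fraction(outage_seconds: float,
                      window_seconds: float = SECONDS_PER_WEEK) -> float:
    """Fraction of the window during which connectivity was lost."""
    return outage_seconds / window_seconds


# One 10 second NIC dropout in a week.
weekly_fraction = downtime_fraction(10)
weekly_percent = f"{weekly_fraction:.6%}"  # "0.001653%"
```

That is under two thousandths of a percent of the week.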

rc.local Support
Because we are destined for another kernel upgrade in the next few months, it’s a good time to think about adding rc.local support to every Basic and higher account.  rc.local allows you to include commands that are executed by the server upon boot.  There’s a global rc.local we use to set readahead rates on the hard drives and toss up any auxiliary services, but as I know from process output, many of you run your own services like mongrel, pen, svnserve, and the like, which are critical to your sites.  You can’t always be around when there’s a kernel panic like the one on Assmule, and oftentimes you won’t witness it.  Servers can recover in as little as 3 minutes.  That is less time than it takes to go to the bathroom and come back if you’re gifted in the large bladder department.  rc.local will be rolled out next week.

That’s all for this Friday’s installment.  Log rotation support in the control panel will be split between a basic and advanced editor… in case you were curious as to the hold-up.


Regression bug fix in File Manager

Minor release tonight to fix a regression error in compressed file handling within the File Manager.  Other changes made since the last release have been packaged into tonight’s update:

  • Fixed: regression error in compressed file handling in File Manager
  • Fixed: warning message on Last Login for first login
  • Changed: add waiting image for autocomplete boxes
  • Changed: directories should include trailing forward slash for autocomplete boxes
  • Changed: moved Web_Module::log* to Log_Module class


Service Pilots

New versions of the SMTP and FTP servers will be piloted on Borel and Assmule, respectively, through Saturday morning.  Borel will be running Postfix 2.5, which introduces a nifty new tunable parameter to add wait times between mail delivery attempts.  This should be helpful in alleviating potential bottlenecks when a user’s account goes over quota and ties up all delivery slots as each message is fruitlessly delivered.  Assmule is running vsFTPd+, which introduces a handful of fixes including proper authorization prompts in Internet Explorer/Firefox if authorization fails.
Both servers are running the new versions at this time with no apparent interruption or side-effects.


Rename support to Manage Mailboxes, sortable tables

Although the next esprit update was slated to include a revised Log Rotation provision, that has been pushed back to the following update sometime this week to make way for rename support, because that feature has been requested far too often.

  • Added: info state variable named path in FrontPage configuration for multiple FrontPage paths (not yet implemented)
  • Added: tablesorter helper to facilitate sorting with large tables; presently available in “Manage Mailboxes” and “DNS Manager”
  • Added: rename support to “Manage Mailboxes”
  • Fixed: change <IfDefine !PHP5> to <IfDefine !PHP4> in custom logs
  • Fixed: translate HTML special characters when displaying error messages in browser
  • Fixed: FCKeditor handles PHP code now
  • Fixed: background transition on postback operations faded to white instead of rgb(230,230,220) in IE; changed the CSS declaration from hex to rgb
  • Fixed: automatically add DNS record for subdomain if it does not exist at creation
  • Changed: set FCKeditor in Source mode initially
  • Changed: rename Mail_Module::alias_exists to Mail_Module::address_exists


Emergency MySQL Upgrade to 5.0.51a

MySQL was upgraded earlier tonight at approximately 1:20 AM EST (-0500 GMT) to address a severe security hole in the yaSSL layer.  It’s doubtful anyone noticed the upgrade, especially since the servers are going down in 24 minutes for the new kernel.  This is just a heads-up that we went from 5.0.51 to 5.0.51a.
