Archive for September, 2008

New Billing Portal

After more internal changes than I would like to admit, the new billing portal is up and running.  Check it out at http://apisnetworks.com/billing

Subscription hashes have been reduced from 40 to 32 characters, which should play more nicely with many e-mail clients.  Suspended accounts  are now automatically unsuspended after payment too.  Give it a try, if you have a canceled subscription and let me know if you run into a bug.  Next on the agenda is a collection of server changes and bug fixes for esprit 0.99 before I finally return to writing the framework for esprit 1.0.

- Matt

Comments

 

MySQL Per-User Connection Limit Changes

All users who have set a concurrent connection value to the MySQL server for a user account in excess of 15 concurrent connections have been reset to 15.  This change is to curb a problem seen too often as of late with poor database design and connection pile-ups.  Instead of attacking the problem head-on– the database structure, users instead option to increase the number of connections to the server.  This does not solve the problem, but rather is targeted at deferring a symptom of the underlying cause — poor database design.

Poor database design, such as braindead structuring and lack of indexes on columns used in where clauses (important!) can cause a 0.10 ms query to take in excess of 1000+ seconds to complete.  By increasing the connection limit, instead of adding an index to a particular column, the underlying cause is not addressed, but rather an indication of an underlying a problem is erroneously addressed.  A higher connection limit thus in turn causes more connections to pile-up eventually tripping off our monitoring systems and alerting us with annoying, yet effective, pages.

On high volume servers with multiple users who have maxed out their concurrent connection limit with a full disk quota it’s not uncommon to see the server become temporarily inaccessible until the watchdog restarts the MySQL process.  Here’s the nitty-gritty explanation why: everyone has 10 connections.  If two sites both fill up their disk quotas, causing MySQL queries to wait endlessly, never to complete, then 20/100 connection slots for the server are filled.  Now let’s introduce another user, whose tables lack indexes, which over time, have grown immensely.  Here’s a sample query, pulled from a server yesterday (names changed to protect the innocent):

+--------+---------+-----------+----------------+---------+-------+-------------+---------
| Id     | User    | Host      | db             | Command | Time  | State       | Info
+--------+---------+-----------+----------------+---------+-------+--------------+-
| 292663 | x       | localhost | db          | Query   | 552  | Sending data | query #1
| 292668 | x       | localhost | db          | Query   | 552  | Sending data | query #1
| 292686 | x       | localhost | db          | Query   | 552  | Sending data | query #1
| 292765 | x       | localhost | db          | Query   | 431  | Locked       | query #1
| 292812 | x       | localhost | db          | Query   | 431  | Locked       | query #1
| 292813 | x       | localhost | db          | Query   | 319  | Locked       | query #1
| 292814 | x       | localhost | db          | Query   | 319  | Locked       | query #1
| 292815 | x       | localhost | db          | Query   | 430  | Locked       | query #1
| 292816 | x       | localhost | db          | Query   | 431  | Locked       | query #1
| 292817 | x       | localhost | db          | Query   | 430  | Locked       | query #1
| 292818 | x       | localhost | db          | Query   | 430  | Locked       | query #1
| 292819 | x       | localhost | db          | Query   | 430  | Locked       | query #1
| 292820 | x       | localhost | db          | Query   | 549  | Sending data | query #2

As you can tell, the user has 12 clients working on one query and one working on another query, on the same table, which is locked, waiting for query #1 to complete.   Over time these connections build-up.  What if the client adjusts the connection limit threshold to 70 on top of the 20 connection slots filled by users who have gone over disk quota?  It would place the server in a bind with only 10 connection slots available for clients.  Your Rails applications, your mysql commandline, your control panel access — all of those which use persistent connections have to fight with one another to try to secure one of the 10 remaining slots.  Oh and to make matters worse?  You also have to let the 2,000+ PHP scripts requested every minute an opportunity to connect to the MySQL server to issue a query or two… there’s no room left for the normal use of MySQL on the server.  This is why I designed the servers with strict concurrent connection thresholds– to avoid resource contention and provide a gentle notice that you may have a problem.  If you ever run into an error on the server, then there’s an issue that needs to be addressed.  File a trouble ticket within the control panel.  It never hurts to ask, remember I wrote everything from the ground up; I can tell you or at least point you towards a solution.

Few clients are genuinely in need of 15 - 25 concurrent connections and those are the sites that push 95% of their 100 - 200 GB bandwidth quota during only 5% of their bandwidth cycle.  These are very high volume, seasonal sites.  Users attempting to set a concurrent connection value above 10 within the control panel will require a trouble ticket explaining the need for a higher limit.  Again, 10 shouldn’t pose a problem for most users who have a properly indexed database — even the many who do land on Digg continue to hum without issue.

PS: The story has a happy ending.  I consolidated the need for 99+ clients, doing absolutely nothing, waiting for query #1 to complete by adding two indexes on a table, one of which went on the id column.  The customer’s concurrent connection limit was decreased back down to 10 without any additional issues.

Problematic query: SELECT count(*) FROM `user` WHERE `id` = ‘1′

- Matt

Supplements:

  • hackmysql.com: A good overview on index placement and database design

Comments (2)

 

Zend Optimizer Available Again

Zend Optimizer is now available on servers running PHP5.  It is setup similarly to the PHP4 server, except the port has been changed from 9000 to 9001.  Applications accessed via Zend Optimizer are neither cached nor optimized, so please use the feature sparingly.  You will need to file a ticket within the control panel to have your account provisioned for Zend Optimizer.

Check it out on Aleph!

Comments

 

esprit Update — Majordomo bugfixes

A new esprit update has been pushed to the server at this time focusing on Majordomo bugfixes.

  • Fixed: Invalid list name when Majordomo list contains hyphen
  • Fixed: Majordomo approval alias missing resend address causing closed+confirm status to fail
  • Fixed: Add setgid bit to lists/ directory for Majordomo to ensure modified lists through Majordomo are editable within the control panel by the owner

Comments

 

Service monitor resiliency changes (MySQL connection counting, restart timing)

Two new changes have been pushed to the servers today aimed at increasing service resiliency in the event of a problem.

First, MySQL connection numbers are now counted and compared against normal connection rates to provide an extra layer of protection against the dreaded table-level lock that trickles out and locks all databases across all clients. Prior to today a simple, cacheless select query was performed, which worked well at determining if there were a problem within MySQL prior to the 5.0.67 upgrade. In addition to counting active connections, the query has been rewritten to require a temporary table to be created (included JOIN/SORT clauses) producing the average scenario for most MySQL queries.

Secondly, timestamps are now factored into service restarts.  Two restarts within a 60 minute interval will dispatch a page to all of the technicians, myself included.  In all cases, more than one restart within a hour is indicative of an underlying problem on the server, which requires administrative action to correct.

- Matt

Comments

 

esprit Update, bugfixes, Manage Mailboxes Changes

A new esprit update has been pushed to the servers at this time.  This should be the last update before esprit 1.0 is released sometime late next month.  A couple of changes were backported into Manage Mailboxes, albeit not as polished as expected.  These changes are to ease bulk operations on multiple e-mail accounts.  If there is a demand (or any outstanding bugs), a cancel button will be added to the upload progress meter.  Additions beyond the bare essentials will be postponed in favor of esprit 1.0.

  • Added: modal on login
  • Added: bulk mailbox operations to Manage Mailboxes
  • Fixed: disallow primitive module access if session is unauthenticated
  • Fixed: properly escape PostgreSQL database names if prefix begins with digit
  • Fixed: typo on server port array index in Ensim I/O handler
  • Fixed: regression bug in Package Manager (bug #99)
  • Fixed: phpPgAdmin username typo (bug #97)
  • Fixed: action succeeded/action failed conflicting status returns on success + warning
  • Changed: highlight selected table rows in File Manager, Manage Mailboxes
  • Changed: updated favicon to match current apis favicon
  • Removed: miscellaneous code clean-up from earlier esprit versions

Comments