Friday, August 1, 2008

mod rewrites for search engine friendly URLs

On a Linux web hosting account this works great:

Open a text editor (Notepad)

Paste this into the file:

RewriteEngine On
RewriteRule ^app/([A-Za-z0-9_-]+)$ index.php?app=$1 [L]


Save the file as .htaccess

Upload it.

That's all there is to it.

Reference:
http://www.sourcerally.net/Scripts/42-Web---Root-Directory


Saturday, May 31, 2008

Load balancer configuration

...

For any traffic that you intend to load-balance please be sure you have DNS pointed at the load balancer's virtual IP (VIP).

In order to properly configure your load-balancer we will need to know which servers (also specify which IPs on which servers) you wish to have load-balanced and which types of traffic you will be load-balancing. The most common scenario is to load-balance WWW (TCP port 80) and HTTPS (TCP port 443) traffic between web servers. By default we also enable session persistence.

The two most popular load-balancing algorithms that can be used when configuring the VIP on the load balancer are the "round robin" and "least connections" algorithms. The round robin algorithm sends each new request to the next server in line--a very basic setup. The least connections algorithm keeps track of how many sessions are currently live on each server, and sends each new session to the server that currently has the least amount of activity. Between the two, least connections is the more popular selection.

When selecting an operational mode, you also need to consider whether your websites require persistence. This feature, if enabled, allows you to make sure that a session doesn't shift to another server halfway through the transaction. This is often necessary on stateful sites that have SSL, shopping carts, etc. There are two basic methods that can be applied when configuring persistence on the VIP -- one method functionally links the WWW and HTTPS traffic to the same server, per source. This method is good for sites that will use shopping carts or other dynamic content where the customer will shift between the WWW and HTTPS sites, maintaining server-side session information during the transition. The downside is that if both ports don't show as active and listening for connections, the server is flagged as down and taken out of rotation. This does carry with it an additional risk, just because if your servers had an issue down the road that caused them to not be listening on both ports, users will see no website at all.

The alternate configuration would separate the WWW and HTTPS traffic into two different virtual groups. This would allow each server to show as "up" and active even though they're not listening on port 443 currently, but takes away the load-balancer's ability to link both the port 80 and port 443 traffic from a client to the same server persistently. If the standard and SSL sites for your configuration exist as wholly separate entities, this may be a valid alternative.

If you have any questions regarding the load-balancer's configuration please let us know. We will complete its configuration once you have supplied us with the following information:

01 - Which services/ports do you want to be load-balanced (e.g., WWW/80, HTTPS/443)?

No -- only using https for phpMyAdmin

02 - For each service/port being load-balanced, which servers (specifically, which servers' IPs) do you want the load-balancer to direct the traffic to?

example:

VIP 123.123.123.123 will redirect WWW traffic to:
192.168.1.111 (primary IP of server 12345)
192.168.1.112 (primary IP of server 12346)


Use only the web servers IPs that you want to load balance...
You can't do this with database servers.

03 - (answer this question only if you use both WWW and HTTPS sites) Do you want session persistence to be maintained between WWW and HTTPS sites?

No.

04 - Please note: we will configure the load-balancer to use the "least connections" load-balancing algorithm and also enable session persistence unless you specify otherwise.

OK.

Labels: ,


Managed Hosting: 24/7 support for absolute emergencies...

Are these canned email replies to trouble tickets really necessary?

Most people would just think it's an emergency because their site is down -- it's just simply UNAVAILABLE!

But I guess if the help just isn't there, you're SOL...

Seems to me though if you're in a managed hosting situation, someone on their side should know your site is down before you call.

This automated response is to make you aware that the DBA team is one
of the few teams that does not operate 24/7. While we do
offer 24/7 support for absolute emergencies and "production down"
situations, any issue that is not deemed an emergency will be addressed
during normal business hours (M-F 8am-5pm CST). If you feel this issue
is indeed an emergency, please contact your support team as soon
as possible!

For guidance, emergencies are typically defined as:

MySQL
* Refusing Connections
* Not starting
* RHCS Cluster Failure beyond successful failover Oracle
* Node down
* Database down
* Critical errors in Alert log
* Critical errors in Listener log
* Database refusing connections
* System errors tied to an Oracle process
* Restore needed SQL Server
* Service Unavailable
* Database in suspect mode
* After restart for an unexpected Server Down (Blue screen of Death) or Instance Down
* Server refusing connections

If you are unsure then please call the support line and discuss this
with your support team.

Labels:


Using the binary vs. char type in MySQL with PHP

In regards to using BINARY over CHAR, while this might be splitting hairs in terms of performance, the BINARY option is going to be more efficient in multiple fronts. The reason is that if you store an MD5 as a string, it requires 8 bits per character. But you are actually only representing 4 bits per character. If you store the string in binary, you can half the size of the field. It is also going to make the index smaller and more efficient since there is no need to compare strings - you are merely comparing numbers.

Unfortunately, to do this requires, some making some modifications to both MySQL and your application. First, you would need to update your existing data in MySQL. To do that, you could do something like:

mysql> CREATE TABLE newtable like oldtable;
mysql> INSERT INTO newtable SELECT UNHEX(md5field), field2, field3, ..., fieldN;

Then, you need to update your application. If you are using PHP, you can use the MD5 function:

$binaryMD5 = md5('my string', true);

The 'true' in the md5 function call tells PHP to return the binary version of the MD5 instead of string-based hex representation. You can then insert the data into MySQL as you did before, although since it is binary data, you may want to look at using prepared statements and the MySQLi extension.

So, as you can see, while it is not trivial, it's not terribly difficult, but is something you would absolutely need to test first before moving into production. It should give you better performance and lower your disk usage though so it something worth looking at. Some food for thought if nothing else.

MySQL buffer_size recommendations

To figure out the right settings for your database server, you really need a good MySQL DBA to take a look at your application's resource utilization and make decisions based on hard data.

Preliminary assessment for our current set up - after yet another crash:

My guess would be that the mysqld service crashed while writing to the table. Being as mysql is using 6G in global buffers alone and then trying to use 300 threads I bet it just ran out of RAM and crashed:

sort_buffer_size = 128M
read_buffer_size = 128M
read_rnd_buffer_size = 128M


Configured Max Per-thread Buffers : 131 G
Configured Max Global Buffers : 6 G
Configured Max Memory Limit : 138 G
Physical Memory : 7.78 G

I would recommend setting the per thread values to something reasonable to avoid outages:

sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
join_buffer_size=2M

Also:

Current key_buffer_size = 6 G
Using key_buffer_size > 4GB will cause instability in versions prior to 5.0.52
See Bug#5731, Bug#29419, Bug#29446

This particular table has frequent inserts and as such is locking frequently. If possible, consider converting this table to InnoDB. This is not possible if you are using fulltext or advisable if you use the COUNT() function against it a lot.

This was on a dedicated database server with 8GB RAM and RAID5.

This server had already been upgraded to RHE 64-bit...

Labels: , , ,


Friday, May 30, 2008

Converting a MyISAM table to InnoDB - pros and cons

Notes from a MySQL DBA:

With MyISAM tables, updates/insert cause table level locking. Converting to InnoDB will be useful as this uses MVCC + row level locking.

Things that can be done:
Convert to InnoDB
Create a numeric PK (auto_Increment) and UNIQUE KEY for hashid. InnoDB require primary keys (PK) for all your tables. If you don't define one, one will be invisibly defined for you. InnoDB best likes small fast PK. A big alphanumeric key in InnoDB might present performance issues.

InnoDB requires more RAM and hard drive space than MyISAM.

There are real benefits that you will gain by going all InnoDB:

Row level locking
Online snapshots for backups and/or slave provisioning
Tables cannot get corrupted (no REPAIR TABLE)

There are some caveats about InnoDB:

No full text indexing

No GEO Indexes

No merge tables

Table space never shrinks. If your tablespace grows to a point where it fills your current space, Even deleting the data will not shrink this file. An export/Import would be your only option. Of course this can be done on a slave and this server can later be promoted to a master.

More RAM and more HD space. InnoDB is RAM hungry. Its space footprint is roughly around 3x that of MyISAM. That is why with the database your size (62M) every byte in a column counts. A varchar takes the length of the text + 1 byte (in a 32 character long string that would be 33bytes if using single byte charset (latin1)) vs. a 4byte INT size. Multiply this by 64M rows, 2G in a varchar vs. 256M for an INT, on a single field column. So an InnoDB footprint would be 2-3x that size.

Optimizing your field type is especially important when the data grows like yours. Here is a good document about this:

http://dev.mysql.com/doc/refman/5.0/en/data-size.html

MySQL recommends having 2 RAIDS to deal with high IO database servers. 1 RAID1 for OS+binarylogs and 1 RAID10 for data. If you convert your database to InnoDB on the migration server. Once you’re ready to move to the new server, you would just flip the SQL traffic to the new server.


I recommend that we convert to InnoDB in the slave and not the live server as this operation does lock your table. I would not recommend converting to InnoDB without cleaning some unwanted indexes as these take a long time to build with a dataset this large. I would recommend that we make these conversions on a slave, which would be better provisioned for these operations.

--

Labels: , ,


Optimizing a large MySQL table that gets a lot of inserts

When you develop a web application that is database driven that is going to log a lot of data, you really have to consider the database structure carefully.

Here's an example of how a log table was optimized. This table had over 60 million rows and getting another million rows a day!

--

The problem with this table (in terms of speed) is that

It’s quite large (62M rows)
It’s heavily indexed.
Large PK (Primary Key) and fields (VARCHARS vs CHAR/INT)

The first one will make searching the data expensive and the second one will make inserting data also expensive. So basically its all bad in terms of read writes.

Having a large primary key might be an issue with Innodb. Also since all the data is an MD5, using CHAR(32) would provide faster lookups.

Things that can be done:

Partition the data to smaller tables like ‘log_01_08’, ‘log_02_08’, etc. And have your application hit the appropriate table when pulling reports.

Create summary tables with Triggers where a trigger will update a table upon an insert. This way if you need to pull stats on the table, you don’t have to query the huge table but the summary table. So on any insert in log table, update summary table with new counter if hits (for example). Of course this depends on the type reports that you’re pulling off this large table.

Trim the indexes to the ones you only need. Indexing all the fields in a table will make INSERTS (and restoring) a really expensive operation, as it has to update the index as well. You can also use multi-field indexes. If you have a query like:

SELECT hashid FROM logs WHERE hashid = ‘xxxx’ AND querydate > ‘date’

Then consider creating an index like (hashid,queryid) and not 2 separate indexes. Trim your indexes as much as you can. If these cannot be converted to numeric hashes, then consider only indexing the first 12-16 characters and not all 32 chars. This assumes that the first 12 chars are unique enough. Of course this does not apply to the primary key.

I see that your hash your fields, maybe consider using a numeric function to hash your text like CRC32. The advantage of this is that in your columns you would have numeric values and not alphanumeric like you have right now (MD5). This not only translates to a smaller filesystem footprint but a much faster index search.