Saturday, May 31, 2008

Load balancer configuration

...

For any traffic that you intend to load-balance please be sure you have DNS pointed at the load balancer's virtual IP (VIP).

In order to properly configure your load-balancer we will need to know which servers (also specify which IPs on which servers) you wish to have load-balanced and which types of traffic you will be load-balancing. The most common scenario is to load-balance WWW (TCP port 80) and HTTPS (TCP port 443) traffic between web servers. By default we also enable session persistence.

The two most popular load-balancing algorithms that can be used when configuring the VIP on the load balancer are the "round robin" and "least connections" algorithms. The round robin algorithm sends each new request to the next server in line--a very basic setup. The least connections algorithm keeps track of how many sessions are currently live on each server, and sends each new session to the server that currently has the least amount of activity. Between the two, least connections is the more popular selection.

When selecting an operational mode, you also need to consider whether your websites require persistence. This feature, if enabled, allows you to make sure that a session doesn't shift to another server halfway through the transaction. This is often necessary on stateful sites that have SSL, shopping carts, etc. There are two basic methods that can be applied when configuring persistence on the VIP -- one method functionally links the WWW and HTTPS traffic to the same server, per source. This method is good for sites that will use shopping carts or other dynamic content where the customer will shift between the WWW and HTTPS sites, maintaining server-side session information during the transition. The downside is that if both ports don't show as active and listening for connections, the server is flagged as down and taken out of rotation. This does carry with it an additional risk, just because if your servers had an issue down the road that caused them to not be listening on both ports, users will see no website at all.

The alternate configuration would separate the WWW and HTTPS traffic into two different virtual groups. This would allow each server to show as "up" and active even though they're not listening on port 443 currently, but takes away the load-balancer's ability to link both the port 80 and port 443 traffic from a client to the same server persistently. If the standard and SSL sites for your configuration exist as wholly separate entities, this may be a valid alternative.

If you have any questions regarding the load-balancer's configuration please let us know. We will complete its configuration once you have supplied us with the following information:

01 - Which services/ports do you want to be load-balanced (e.g., WWW/80, HTTPS/443)?

No -- only using https for phpMyAdmin

02 - For each service/port being load-balanced, which servers (specifically, which servers' IPs) do you want the load-balancer to direct the traffic to?

example:

VIP 123.123.123.123 will redirect WWW traffic to:
192.168.1.111 (primary IP of server 12345)
192.168.1.112 (primary IP of server 12346)


Use only the web servers IPs that you want to load balance...
You can't do this with database servers.

03 - (answer this question only if you use both WWW and HTTPS sites) Do you want session persistence to be maintained between WWW and HTTPS sites?

No.

04 - Please note: we will configure the load-balancer to use the "least connections" load-balancing algorithm and also enable session persistence unless you specify otherwise.

OK.

Labels: ,


Managed Hosting: 24/7 support for absolute emergencies...

Are these canned email replies to trouble tickets really necessary?

Most people would just think it's an emergency because their site is down -- it's just simply UNAVAILABLE!

But I guess if the help just isn't there, you're SOL...

Seems to me though if you're in a managed hosting situation, someone on their side should know your site is down before you call.

This automated response is to make you aware that the DBA team is one
of the few teams that does not operate 24/7. While we do
offer 24/7 support for absolute emergencies and "production down"
situations, any issue that is not deemed an emergency will be addressed
during normal business hours (M-F 8am-5pm CST). If you feel this issue
is indeed an emergency, please contact your support team as soon
as possible!

For guidance, emergencies are typically defined as:

MySQL
* Refusing Connections
* Not starting
* RHCS Cluster Failure beyond successful failover Oracle
* Node down
* Database down
* Critical errors in Alert log
* Critical errors in Listener log
* Database refusing connections
* System errors tied to an Oracle process
* Restore needed SQL Server
* Service Unavailable
* Database in suspect mode
* After restart for an unexpected Server Down (Blue screen of Death) or Instance Down
* Server refusing connections

If you are unsure then please call the support line and discuss this
with your support team.

Labels:


Using the binary vs. char type in MySQL with PHP

In regards to using BINARY over CHAR, while this might be splitting hairs in terms of performance, the BINARY option is going to be more efficient in multiple fronts. The reason is that if you store an MD5 as a string, it requires 8 bits per character. But you are actually only representing 4 bits per character. If you store the string in binary, you can half the size of the field. It is also going to make the index smaller and more efficient since there is no need to compare strings - you are merely comparing numbers.

Unfortunately, to do this requires, some making some modifications to both MySQL and your application. First, you would need to update your existing data in MySQL. To do that, you could do something like:

mysql> CREATE TABLE newtable like oldtable;
mysql> INSERT INTO newtable SELECT UNHEX(md5field), field2, field3, ..., fieldN;

Then, you need to update your application. If you are using PHP, you can use the MD5 function:

$binaryMD5 = md5('my string', true);

The 'true' in the md5 function call tells PHP to return the binary version of the MD5 instead of string-based hex representation. You can then insert the data into MySQL as you did before, although since it is binary data, you may want to look at using prepared statements and the MySQLi extension.

So, as you can see, while it is not trivial, it's not terribly difficult, but is something you would absolutely need to test first before moving into production. It should give you better performance and lower your disk usage though so it something worth looking at. Some food for thought if nothing else.

MySQL buffer_size recommendations

To figure out the right settings for your database server, you really need a good MySQL DBA to take a look at your application's resource utilization and make decisions based on hard data.

Preliminary assessment for our current set up - after yet another crash:

My guess would be that the mysqld service crashed while writing to the table. Being as mysql is using 6G in global buffers alone and then trying to use 300 threads I bet it just ran out of RAM and crashed:

sort_buffer_size = 128M
read_buffer_size = 128M
read_rnd_buffer_size = 128M


Configured Max Per-thread Buffers : 131 G
Configured Max Global Buffers : 6 G
Configured Max Memory Limit : 138 G
Physical Memory : 7.78 G

I would recommend setting the per thread values to something reasonable to avoid outages:

sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
join_buffer_size=2M

Also:

Current key_buffer_size = 6 G
Using key_buffer_size > 4GB will cause instability in versions prior to 5.0.52
See Bug#5731, Bug#29419, Bug#29446

This particular table has frequent inserts and as such is locking frequently. If possible, consider converting this table to InnoDB. This is not possible if you are using fulltext or advisable if you use the COUNT() function against it a lot.

This was on a dedicated database server with 8GB RAM and RAID5.

This server had already been upgraded to RHE 64-bit...

Labels: , , ,


Friday, May 30, 2008

Converting a MyISAM table to InnoDB - pros and cons

Notes from a MySQL DBA:

With MyISAM tables, updates/insert cause table level locking. Converting to InnoDB will be useful as this uses MVCC + row level locking.

Things that can be done:
Convert to InnoDB
Create a numeric PK (auto_Increment) and UNIQUE KEY for hashid. InnoDB require primary keys (PK) for all your tables. If you don't define one, one will be invisibly defined for you. InnoDB best likes small fast PK. A big alphanumeric key in InnoDB might present performance issues.

InnoDB requires more RAM and hard drive space than MyISAM.

There are real benefits that you will gain by going all InnoDB:

Row level locking
Online snapshots for backups and/or slave provisioning
Tables cannot get corrupted (no REPAIR TABLE)

There are some caveats about InnoDB:

No full text indexing

No GEO Indexes

No merge tables

Table space never shrinks. If your tablespace grows to a point where it fills your current space, Even deleting the data will not shrink this file. An export/Import would be your only option. Of course this can be done on a slave and this server can later be promoted to a master.

More RAM and more HD space. InnoDB is RAM hungry. Its space footprint is roughly around 3x that of MyISAM. That is why with the database your size (62M) every byte in a column counts. A varchar takes the length of the text + 1 byte (in a 32 character long string that would be 33bytes if using single byte charset (latin1)) vs. a 4byte INT size. Multiply this by 64M rows, 2G in a varchar vs. 256M for an INT, on a single field column. So an InnoDB footprint would be 2-3x that size.

Optimizing your field type is especially important when the data grows like yours. Here is a good document about this:

http://dev.mysql.com/doc/refman/5.0/en/data-size.html

MySQL recommends having 2 RAIDS to deal with high IO database servers. 1 RAID1 for OS+binarylogs and 1 RAID10 for data. If you convert your database to InnoDB on the migration server. Once you’re ready to move to the new server, you would just flip the SQL traffic to the new server.


I recommend that we convert to InnoDB in the slave and not the live server as this operation does lock your table. I would not recommend converting to InnoDB without cleaning some unwanted indexes as these take a long time to build with a dataset this large. I would recommend that we make these conversions on a slave, which would be better provisioned for these operations.

--

Labels: , ,


Optimizing a large MySQL table that gets a lot of inserts

When you develop a web application that is database driven that is going to log a lot of data, you really have to consider the database structure carefully.

Here's an example of how a log table was optimized. This table had over 60 million rows and getting another million rows a day!

--

The problem with this table (in terms of speed) is that

It’s quite large (62M rows)
It’s heavily indexed.
Large PK (Primary Key) and fields (VARCHARS vs CHAR/INT)

The first one will make searching the data expensive and the second one will make inserting data also expensive. So basically its all bad in terms of read writes.

Having a large primary key might be an issue with Innodb. Also since all the data is an MD5, using CHAR(32) would provide faster lookups.

Things that can be done:

Partition the data to smaller tables like ‘log_01_08’, ‘log_02_08’, etc. And have your application hit the appropriate table when pulling reports.

Create summary tables with Triggers where a trigger will update a table upon an insert. This way if you need to pull stats on the table, you don’t have to query the huge table but the summary table. So on any insert in log table, update summary table with new counter if hits (for example). Of course this depends on the type reports that you’re pulling off this large table.

Trim the indexes to the ones you only need. Indexing all the fields in a table will make INSERTS (and restoring) a really expensive operation, as it has to update the index as well. You can also use multi-field indexes. If you have a query like:

SELECT hashid FROM logs WHERE hashid = ‘xxxx’ AND querydate > ‘date’

Then consider creating an index like (hashid,queryid) and not 2 separate indexes. Trim your indexes as much as you can. If these cannot be converted to numeric hashes, then consider only indexing the first 12-16 characters and not all 32 chars. This assumes that the first 12 chars are unique enough. Of course this does not apply to the primary key.

I see that your hash your fields, maybe consider using a numeric function to hash your text like CRC32. The advantage of this is that in your columns you would have numeric values and not alphanumeric like you have right now (MD5). This not only translates to a smaller filesystem footprint but a much faster index search.

Limitations of MySQL on 32-bit OS -- RedHat Linux Enterprise

Most developers will never run into this situation. But if you're running a Linux server RHE 32-bit OS and have a large MySQL database, watch out!

It took a few days of crashing this database with a table that has over 14GB of data.

The index on that table had grown over 2GB and was causing MySQL server to page the data (writing temp data to the hard drive instead of RAM). Disk I/O is thousands of times slower than RAM, and being over the MySQL limit on the 32-bit OS was just a recipe for disaster.

Oh, and also, at first the techs determined that our MySQL temp directory was full... so that was another major crash.

The solution included:
- upgrade the OS to 64-bit RHE
- change the storage engine to InnoDB
- get bigger drives
- set up the drives in a RAID 10 configuration
- optimize the various indexes that are used in JOIN statements...
- make sure the temp directory partition will have plenty of room

Of course that's just the beginning -- in the right direction.

-j

Labels: , , ,


Friday, May 9, 2008

ssh client for Windows with command line - copSSH

I just looked at pretty much everything available out there.

I like this one the best, and it's free. This is just a simple add-on that will give your cmd.exe additional functionality like ssh. I've used it on Windows 2000 and Windows XP.

Download copSSH - OpenSSH for Windows from


http://sourceforge.net/project/showfiles.php?group_id=69227&package_id=127780&release_id=615115




copSSH is an ssh server and client implementation for windows systems. It is a yet another packaging of portable openssh, cygwin, some popular utilites, plus implementation of some best practices regarding security. You can use COPSSH for remote administration of your systems or gathering remote information in a secure way.



Here's a list of the commands available:

Directory of C:\Program Files\copSSH\bin

[.] [..] bash.exe cat.exe
chgrp.exe chmod.exe chown.exe copsshadm.exe
cp.exe cygcheck.exe cygcrypt-0.dll cygcrypto-0.9.8.dll
cygiconv-2.dll cygintl-1.dll cygintl-2.dll cygintl-3.dll
cygintl-8.dll cygminires.dll cygncurses-8.dll cygpath.exe
cygpcre-0.dll cygpcre.dll cygpopt-0.dll cygreadline6.dll
cygrunsrv.exe cygwin1.dll cygwrap-0.dll cygz.dll
d2u.exe date.exe diff.exe false.exe
grep.exe groups gzip.exe less.exe
ln.exe ln.exe.manifest ls.exe mkdir.exe
mkgroup.exe mkpasswd.exe mount.exe mv.exe
ps.exe pwd.exe rm.exe scp.exe
sed.exe sftp-server.exe sftp.exe sftponly
sort.exe ssh-add.exe ssh-agent.exe ssh-keygen.exe
ssh-keygen.exe.manifest ssh-keyscan.exe ssh-keysign.exe ssh.exe
sshd.exe tail.exe tar.exe touch.exe
u2d.exe umount.exe UserActivationWizard.exe UserDeactivationWizard.exe
wc.exe


You don't need to download Cygwin separately since copSSH has its own copy in the installer.

After you install copSSH, just add the path to the bin directory to your system's Path.

eg. C:\Program Files\copSSH\bin

You can open the CMD shell from the Start menu or from the normal cmd.exe.


C:\WINDOWS\system32>ssh
usage: ssh [-1246AaCfgKkMNnqsTtVvXxY] [-b bind_address] [-c cipher_spec]
[-D [bind_address:]port] [-e escape_char] [-F configfile]
[-i identity_file] [-L [bind_address:]port:host:hostport]
[-l login_name] [-m mac_spec] [-O ctl_cmd] [-o option] [-p port]
[-R [bind_address:]port:host:hostport] [-S ctl_path]
[-w local_tun[:remote_tun]] [user@]hostname [command]


You may also want to check the QuickEdit Mode in your cmd.exe window's Options tab. This will let you copy text by selecting it and hitting the Enter key, or right-click into the window to paste any text in your Clipboard.

That's it. Nothing else to download. It just works.



credits: http://www.itefix.no/

-J

Using mysqldump and gzip to move a database to another server

When your database is large enough that using phpMyAdmin is not feasible, you can use mysqldump to move a database to a new server.


This looks like a reasonable tutorial, which came up with a google search:


http://www.mydigitallife.info/2007/07/21/how-to-backup-and-restore-export-and-import-mysql-databases-tutorial/


Essentially all you do is:

# mysqldump dbname > whatever.sql
-- or --
# mysqldump --all-databases > mysqldump.sql

If you import the "all databases" file, it should be into a clean mysql installation. Do it like this:
# mysql < mysqldump.sql

To import a single DB dump it's just about as simple as:
# mysql
# create database dbname;
# mysql dbname < dbname.sql

after you send it to the new server.

If the database is over .5 GB, I would use gzip to compress the output file before moving it to the new server:

Make the SQL dump file:
mysqldump -u username -ppassword database_name > dump.sql | gzip > outputfile.sql.gz


Then on your new server:
gunzip < outputfile.sql.gz | mysql -u username -ppassword database_name < dump.sql

Wednesday, May 7, 2008

Adwords, Googlebot/2.0, and AdsBot-Google

I noticed this in my logs. Looks like when you use your Adwords account to analyze your ads and upload a campaign, Google sends a bunch of queries to analyze your landing pages. And it's not just the AdsBot, you also get the actual Googlebot visiting as well... Interesting.



05/07/2008 12:32 AM
Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)
66.249.84.12


05/07/2008 12:25 AM
Mozilla/5.0 (compatible; Googlebot/2.0; +http://www.google.com/bot.html)
66.249.84.12


05/07/2008 12:06 AM
Mozilla/5.0 (compatible; Googlebot/2.0; +http://www.google.com/bot.html)
66.249.85.131


05/07/2008 12:00 AM
Mozilla/5.0 (compatible; Googlebot/2.0; +http://www.google.com/bot.html)
74.125.16.4


05/06/2008 11:50 PM
Mozilla/5.0 (compatible; Googlebot/2.0; +http://www.google.com/bot.html)
66.249.85.131


05/06/2008 11:35 PM
Mozilla/5.0 (compatible; Googlebot/2.0; +http://www.google.com/bot.html)
66.249.84.12


05/06/2008 11:33 PM
AdsBot-Google (+http://www.google.com/adsbot.html)
209.85.238.28


05/06/2008 11:25 PM
Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)
72.14.195.38


05/06/2008 10:34 PM
Mozilla/5.0 (compatible; Googlebot/2.0; +http://www.google.com/bot.html)
72.14.195.38

Tuesday, May 6, 2008

Google Adwords API: Using getNewKeywordEstimate() vs. getCampaignEstimate()

It seems the Google Adwords API doesn't give you consistent results for cost and traffic data. The numbers can change radically during the same 24hr period.
How are we supposed to get reliable stats from this?


Here's the response from Adwords staff:


Using getNewKeywordEstimate() function is not the best choice if you want the most accurate estimates. When you use estimateKeywordList() method, which is what being called by getNewKeywordEstimate(), the keywords are estimated as if they were partof a single new ad group in a single new campaign with global targeting.

See,http://www.google.com/apis/adwords/developer/TrafficEstimatorService.html#es.

This obviously has affect on the resulting estimates. For more accurateestimates, it is recommended to use estimateCampaignList() function, whereyou can actually set the targeting.
See,http://www.google.com/apis/adwords/developer/TrafficEstimatorService.html#es.


Does this mean that the only way to get consistent numbers is to first add a campaign, add an adgroup with keywords and then run that function?

Is this what the web interface does?

Isn't there an easier way to just get reliable numbers before you actually create the campaign? Seems like a lot of overhead just to research keywords and get cost/traffic estimates.


Look at these crazy numbers:


































































































































































keyworddatetimelowerAveragePositionupperAveragePositionlowerCostPerClickupperCostPerClicklowerClicksPerDayupperClicksPerDay
dog trainer5/1/2008 0:42135.1858266.4822835.3953826.83415
dog trainer5/1/2008 1:42135.1858266.4822835.3953826.83415
dog trainer5/1/2008 2:42135.1858266.4822835.3953826.83415
dog trainer5/1/2008 3:42135.1858266.4822835.3953826.83415
dog trainer5/1/2008 4:42135.1858266.4822835.3953826.83415
dog trainer5/1/2008 5:42135.1858266.4822835.3953826.83415
dog trainer5/1/2008 6:42135.1858266.4822835.3953826.83415
dog trainer5/1/2008 7:43133.1105323.88816584.62508105.8072
dog trainer5/1/2008 8:43133.1105323.88816584.62508105.8072
dog trainer5/1/2008 9:43133.1105323.88816584.62508105.8072
dog trainer5/1/2008 10:43133.1105323.88816584.62508105.8072
dog trainer5/1/2008 11:43133.1105323.88816584.62508105.8072
dog trainer5/1/2008 12:43133.1105323.88816584.62508105.8072
dog trainer5/1/2008 13:43133.1105323.88816584.62508105.8072
dog trainer5/1/2008 14:43133.1105323.88816584.62508105.8072