Amazon, Linux, Nginx, blitz.io, New Relic, Oh My...


It’s a post about using fancy, contemporary, best-of-breed tools, just to show you how many things - once cumbersome, now really easy to set up - can be done the smart way. We’ll have a virtual machine running in Amazon’s cloud, the nginx web server on an Ubuntu Linux installation, WordPress on steroids thanks to W3 Total Cache, New Relic installed for monitoring, and all of it load tested with Blitz. I can’t stop myself from mentioning that I’m writing this blog post in Sublime Text 2, using Markdown so I can compile it later for Octopress, not worrying about the files because I’m using git to back them up in a remote location.

I’ve been inspired recently by Ewan’s blog entry and wanted to try it myself. I’m still quite new to the *nix world of operating systems, so a few of the steps required me to google a lot of things - things I’d like to share with you, dear reader. This might be especially interesting for rookies, as I’ll try to be pretty detailed in describing what I’ve done to get this setup up and running. Like Ewan in his original post I also wanted to host WordPress, for selfish reasons - this is something we’ve been working on for a while at work, the only difference being that there we’re hosted on Windows and IIS. I wanted to see, by comparison, how easily I could get it all up and running in the cloud.

I’ve recently had a few chances, on different occasions, to use Amazon’s EC2 services, and I must admit I’m impressed - and it’s not that easy to impress me, I’ll tell you that. It’s reasonably priced, it’s fast, it’s secure by default, and it’s super easy to set up (once you learn to find your way around lots of new terminology - AWS, Amazon Web Services, really has a lot of different services on offer). Pure awesomeness. Thanks to a little bit of background in Amazon’s offerings I’ll be making side notes for you once in a while - it’s likely you’ll find something interesting in them.

The first step is to request an account for Amazon Web Services. After you’re done, head over to the EC2 tab (EC2 is the ‘Elastic Compute Cloud’ and allows creation of virtual machines with various configurations of OS/CPU/memory) and click the ‘Launch Instance’ button located just under the “My Instances” section. This brings up the ‘Request Instances Wizard’, which allows you to pick the server. As in Ewan’s post - pick Ubuntu 11.10:

[Screenshot: Picking a server in the AWS console]

In the next step, pick the Micro instance (this is sufficient for our setup), like so:

[Screenshot: Choosing Instance Type]

Leave the rest of the settings at their default values. On the ‘Instance Details’ step you may add some metadata for your instance; for example, I’ve used the following keys/values:

[Screenshot: Custom tags for the EC2 server instance]

If you’re requesting a new machine from Amazon, you’ll also need a Key Pair in order to be able to log in to the machine later on (if you’ve already gone through this procedure before, you can re-use the key you obtained then). The exception is if you’re restoring a machine you’ve prepared beforehand, with an account already set up whose password you know - then no Key Pair is needed. The wizard is pretty informative about the choices you need to make along the way.

You need to pay attention on the ‘Configure Firewall’ screen. All of these settings may be modified later on; just think about how you’d like your server to be visible/accessible over the Internet, and don’t expose too much. For example, for the ssh connection I only enabled access from the IP of my machine at home. Of course the IP could be spoofed, but I don’t think it would be worthwhile for anyone to go through that much effort to get to my personal machine (e.g. hosting my blog), so as a security measure it’s good enough. You might also want to enable HTTP access for at least your own IP (again, depending on what you’d like to use the machine for). In this particular case we should enable access from everywhere - it will allow us to use external tooling for the load testing planned for later on.
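If you’d rather script these rules than click through the wizard (the wizard does all of this for you, so treat this purely as an optional sketch), the AWS command line tools can create the same openings; the security group name and the home IP below are placeholders:

optional: the same security group rules via the AWS CLI (a sketch)
# allow ssh only from my home IP (placeholder), HTTP from everywhere
aws ec2 authorize-security-group-ingress --group-name wordpress-sg \
    --protocol tcp --port 22 --cidr 203.0.113.10/32
aws ec2 authorize-security-group-ingress --group-name wordpress-sg \
    --protocol tcp --port 80 --cidr 0.0.0.0/0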

Finally you’ll get to the summary screen; scan through the configuration of the machine you’re requesting to make sure this is what you wanted, and happily press the ‘Launch’ button to start your instance. Within minutes you will have your machine up and running.

With your shiny new machine running in the cloud you should now connect to it via ssh. This is actually the first place where Ewan’s entry didn’t work for me as he describes. I had both private keys - id_dsa and id_rsa - in my .ssh folder, but when I was trying to connect to the server I’d get a message (debugging enabled, just use ssh -v):

connecting to ec2-46-137-2-17.eu-west-1.compute.amazonaws.com
debug1: Offering RSA public key: /Users/geekbeing/.ssh/id_rsa
debug1: Authentications that can continue: publickey
debug1: Offering DSA public key: /Users/geekbeing/.ssh/id_dsa
debug1: Authentications that can continue: publickey
debug1: No more authentication methods to try.
Permission denied (publickey).

It took me a while, but then I suddenly realized - how on earth could Amazon know about my .ssh keys? They couldn’t - I’d need to use the key I agreed upon with Amazon. This is where the Key Pair is needed. So I copied the .pem file downloaded during the EC2 Ubuntu server setup to my .ssh folder and retried the ssh connection, this time providing the key, like so (debugging still enabled until the issue is resolved):

second attempt connecting to ec2-46-137-2-17.eu-west-1.compute.amazonaws.com
ssh -v -i macbook-home.pem root@ec2-46-137-2-17.eu-west-1.compute.amazonaws.com

This time the debug message made sense instantly:

connecting to ec2-46-137-2-17.eu-west-1.compute.amazonaws.com
Permissions 0644 for 'macbook-home.pem' are too open.

During my investigation of the ssh connection error earlier on, I’d read somewhere that the correct permissions should be 0600, so I ran:

setup permissions for .pem file
chmod 0600 macbook-home.pem

This time, upon another login attempt, I got an error message telling me not to use the root user for access (rightfully so):

don’t use ‘root’ access, stupid
Please login as the user "ubuntu" rather than the user "root".

But this is very good, we’re actually getting somewhere - what’s more, not only do the error messages point me exactly to what’s wrong, as a bonus they make total sense too! Finally, with

all right, let’s try one more time
ssh -v -i macbook-home.pem ubuntu@ec2-46-137-2-17.eu-west-1.compute.amazonaws.com

we’re in! The next steps are taken 1:1 from Ewan’s blog. For the purposes of my installation, since we’ve already set up a Security Group in the AWS management console, I’m skipping the firewall setup from Ewan’s blog and heading straight to installing MySQL, which can easily be done with:

installing MySQL
sudo apt-get update
sudo apt-get install mysql-server

During installation you’ll be prompted to set up a password for the root user in MySQL; enter the password twice, and don’t leave it blank. After installation is done, connect to MySQL:

connect to newly installed MySQL instance
mysql -u root -p

You’ll be asked for the credentials you set up for the ‘root’ user during installation - enter these and then run the following commands at the mysql prompt. This assumes you’ll be hosting WordPress and MySQL on the same server - hence ‘localhost’ appearing in the SQL commands:

preparing database for WordPress
CREATE DATABASE wordpress;
GRANT ALL PRIVILEGES ON wordpress.* TO
'wp_user'@'localhost' IDENTIFIED BY 'PASSWORD_YOU_WANT_FOR_WP_USER';
FLUSH PRIVILEGES;
EXIT
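A quick sanity check you can do at this point (optional, and just a sketch): log back in as the freshly created user and make sure the wordpress database is visible to it.

optional: verify the new user can see the database
mysql -u wp_user -p -e "SHOW DATABASES;"
# the output should list 'wordpress' alongside information_schema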

The next step is to install and configure PHP - not just PHP but also PHP-FPM, APC and the MySQL module for PHP. Run the following command:

installing PHP
sudo apt-get install php5-fpm php-pear php5-common php5-mysql php-apc
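If you want to confirm the packages landed where expected (purely optional), php5-fpm can report its version, and the Ubuntu package should already have started the service:

optional: check php5-fpm is installed and running
php5-fpm -v
sudo service php5-fpm status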

Configure PHP for APC by adding the following lines at the bottom of the php.ini file. You can use e.g. the vi editor: run ‘vi /etc/php5/fpm/php.ini’, press ‘G’ to jump to the bottom of the file, ‘i’ to enter Insert mode, then enter the lines below. When you’re done, press ESC to exit Insert mode and finally ‘:wq’ to write the file to disk and quit vi:

configure php.ini
[apc]
apc.write_lock = 1
apc.slam_defense = 0
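If vi isn’t your thing, the same change can be appended from the shell (a sketch, using the same php.ini path as above):

append the APC settings without opening vi
sudo tee -a /etc/php5/fpm/php.ini > /dev/null <<'EOF'

[apc]
apc.write_lock = 1
apc.slam_defense = 0
EOF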

Now you need to configure PHP for nginx. These are the changes you need to make, starting with ‘vi /etc/php5/fpm/pool.d/www.conf’ and going into Insert mode as before - so first replace

configure www.conf, step one
listen = 127.0.0.1:9000

with

configure www.conf, step one
listen = /dev/shm/php-fpm-www.sock

then insert the following lines just below the line you’ve just modified:

configure www.conf, step two
listen.owner = nginx
listen.group = nginx
listen.mode = 0660

then change

configure www.conf, step three
user = www-data
group = www-data

to

configure www.conf, step three
user = nginx
group = nginx
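If you prefer making those three www.conf changes from the shell instead of vi, something along these lines should work (just a sketch - back up the file first, and note that the nginx user/group won’t exist until nginx itself is installed in the next step, so it’s worth restarting php5-fpm again afterwards):

the same www.conf edits, scripted
sudo cp /etc/php5/fpm/pool.d/www.conf /etc/php5/fpm/pool.d/www.conf.bak
# swap the TCP listener for the unix socket
sudo sed -i 's|^listen = 127.0.0.1:9000|listen = /dev/shm/php-fpm-www.sock|' /etc/php5/fpm/pool.d/www.conf
# run the pool as nginx instead of www-data
sudo sed -i 's/^user = www-data/user = nginx/; s/^group = www-data/group = nginx/' /etc/php5/fpm/pool.d/www.conf
# socket ownership and permissions; appending at the end is fine, as the file
# contains only the single [www] pool
sudo tee -a /etc/php5/fpm/pool.d/www.conf > /dev/null <<'EOF'
listen.owner = nginx
listen.group = nginx
listen.mode = 0660
EOF
sudo service php5-fpm restart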

We’re done with that part. Now we need to install nginx itself. From here on I haven’t actually modified anything in Ewan’s description, so just follow the instructions on his blog (or from the nginx website, which Ewan reused on his blog).
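For orientation only - the heart of that nginx setup is a server block that serves the WordPress directory and hands PHP requests to the php-fpm socket we configured above. Something like the sketch below; the root path and the config filename are assumptions on my part, so take the exact configuration from Ewan’s post:

rough shape of the nginx server block (a sketch, not the exact config from Ewan’s post)
sudo tee /etc/nginx/conf.d/wordpress.conf > /dev/null <<'EOF'
server {
    listen 80;
    server_name _;
    root /var/www/wordpress;   # wherever the WordPress files end up
    index index.php;

    location / {
        # serve static files directly, fall back to WordPress' front controller
        try_files $uri $uri/ /index.php?$args;
    }

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        # the unix socket we pointed php-fpm at earlier
        fastcgi_pass unix:/dev/shm/php-fpm-www.sock;
    }
}
EOF
sudo service nginx restart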

If it all went quickly and smoothly (as it did in my case), you’ll have a working WordPress installation at this point. If you want to see mine running in the cloud, feel free to access it at http://ec2-46-137-2-17.eu-west-1.compute.amazonaws.com/.

That was really fun. Now we can play a little bit with Blitz to see for ourselves how much load we can put our server under while still keeping reasonable response times. To collect more information we should set up our WordPress installation with e.g. Google Analytics, and for a different kind of statistics and more detailed reports (e.g. about SQL queries being run, CPU or memory usage, etc.) I will also go for New Relic. With all of that set up, we can get back to Blitz, register for an account and test the response time of our instance: enter the URL in the text box, click ‘Run’ and you will receive results like so:

[Screenshot: Blitz.io test run results]

As you can see, the results also include a suggestion on how you can run load tests on your site. Handily enough, this greenish text is actually a link you can click to start the first load test! Clicking it issues a request to Blitz to load test the application under the given URL; however, this first attempt ends up with the message:

Piotr, we have a little hiccup
Prove to us you own this app and we are happy to let you rush it.
Just make sure that one of the following URLs is reachable
and returns 42 as the content.
This helps us understand the meaning of it all.

Status    Authorization URL
404       /mu-eb1fb975-6fa67f08-xxxxxxxx-xxxxxxxx

What can I say. Funny, geeky, simply 1337. Ok then, let’s set our WordPress installation up for authorization with Blitz. I wasn’t able to google out how to set up routing so that the authorization URL returns 42, so I went with the option of using a .txt file. You just create one in the folder where you have all of your WordPress files - in my case it was mu-eb1fb975-6fa67f08-xxxxxxxx-xxxxxxxx.txt - fill it with 42, and the next time you try running a rush Blitz will do its magic.
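On the server that boils down to a one-liner; the WordPress path below is an assumption (use wherever your installation actually lives), and the file name is of course the one from your own authorization message:

creating the Blitz authorization file
echo 42 | sudo tee /var/www/wordpress/mu-eb1fb975-6fa67f08-xxxxxxxx-xxxxxxxx.txt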

When I ran it for the first time, I effectively killed the site. It seemed to be doing pretty well, but only until I got to 200 concurrent requests. Then it died, got back up for 10 more seconds and finally gave up:

[Screenshot: Dying WordPress]

It’d be awesome to know more about what happened server side. Can you imagine the hell there? So, as mentioned before, I decided to install New Relic’s software for monitoring. Signing up is really simple, and then you just need to install their component on the server in order to start collecting detailed information about the health of your server and application. Once again they make it really simple - you just pick PHP from the list of available languages/platforms, pick Debian-based Linux (we are using Ubuntu in this case) and you get nice and easy-to-follow step-by-step instructions for the installation.
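For the record, the flow went roughly like the sketch below - but the repository line and installer name are from memory, so treat New Relic’s own step-by-step instructions (which also cover adding their signing key and entering your license key) as the source of truth:

installing the New Relic PHP agent (a sketch from memory)
echo 'deb http://apt.newrelic.com/debian/ newrelic non-free' | sudo tee /etc/apt/sources.list.d/newrelic.list
# import New Relic's apt signing key here, as shown in their instructions
sudo apt-get update
sudo apt-get install newrelic-php5
sudo newrelic-install install    # asks for your license key
sudo service php5-fpm restart    # so the agent gets loaded into PHP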

On the next few attempts I wanted to be a little less brutal. It seems the following command (the pattern ramps from 1 to 100 concurrent users over 60 seconds)

performing rush using blitz.io with 100 concurrent requests
-p 1-100:60 http://ec2-46-137-2-17.eu-west-1.compute.amazonaws.com/

almost allowed me to be kind - my poor WordPress died at the very end of the test:

[Screenshot: Results for a rush with 100 concurrent requests, thanks to Blitz.io]

The plateau at the beginning is just WordPress not yet fully recovered from the previous rush ;). In the rushes that followed the site was dying after circa 50 seconds - I just didn’t get to take the screenshots. The detailed report says:

‘This rush generated 277 successful hits in 1.0 min and we transferred 272.31 KB of data in and out of your app. The average hit rate of 4.39/second translates to about 379,705 hits/day. You got bigger problems though: 87.53% of the users during this rush experienced timeouts or errors! (…) The first timeout happened at 5.01 seconds into the test when the number of concurrent users was at 9. Looks like you’ve been rushing with a timeout of 1 second.’

New Relic’s monitoring had already been running in the background - but more about the results later in this post.

Seems like it’s time for some tweaks. First of all, we will enable W3 Total Cache. Head over to the admin dashboard of your WordPress installation and go to Plugins->Add New->Search. Find the plugin and install it. When installation is complete, click ‘Activate Plugin’. In the plugin’s settings enable Database Cache and Object Cache; additionally, wherever possible, pick ‘Opcode: Alternative PHP Cache (APC)’ from the drop-down. Save all settings, then click ‘Deploy’ at the very top of the settings page. You should see the message ‘Preview settings successfully deployed’.

Let’s get back to Blitz.io and re-run the test with the same settings as previously, so you have some data to compare against. For me, the results are simply astonishing:

[Screenshot: Results for a rush with 100 concurrent requests, W3 Total Cache enabled]

First of all, my WordPress instance didn’t die! This is way better than before. But not only that - check out how much more we’ve actually accomplished:

‘This rush generated 2,272 successful hits in 1.0 min and we transferred 21.19 MB of data in and out of your app. The average hit rate of 36/second translates to about 3,131,096 hits/day. You got bigger problems though: 0.18% of the users during this rush experienced timeouts or errors!’.

Almost ten times more successful hits, and nearly eighty times more data transferred. And only one user (ok, it’s still too much, but you get the point) experienced a timeout! This is very encouraging; however, re-running the rush with 250 concurrent users leads to a situation where the server stops responding. By the end of that test, ‘You got bigger problems though: 5.21% of the users during this rush experienced timeouts or errors!’. This is way too many. Desperate times call for desperate measures (or something) - so besides having W3 Total Cache installed we will also install Varnish 3 - ‘Varnish Cache is a web application accelerator also known as a caching HTTP reverse proxy’. For a reference on how to do that, please take a look at Ewan’s blog.
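In rough strokes (and only as a sketch - Ewan’s post is the authoritative reference, and the Varnish version in Ubuntu’s own repositories may differ), the idea is to install Varnish, point its backend at nginx on an alternative port such as 8080, and let Varnish itself take over port 80:

installing Varnish and pointing it at nginx (a sketch)
sudo apt-get install varnish
# tell Varnish where to find nginx; 8080 assumes you also change the 'listen'
# directive in your nginx server block from 80 to 8080
sudo tee /etc/varnish/default.vcl > /dev/null <<'EOF'
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}
EOF
# finally, make Varnish listen on port 80 by adjusting DAEMON_OPTS
# ('-a :80') in /etc/default/varnish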

With Varnish installed, restart nginx (service nginx restart) and Varnish (service varnish restart), and let’s re-run the rush with the following parameters:

performing rush using blitz.io with 250 concurrent requests
-p 1-250:60 http://ec2-46-137-2-17.eu-west-1.compute.amazonaws.com/

My mind is blown. A really cheap machine, powerful software (all of it available for free, at least in its basic version), and something that not long ago seemed to require lots of money and effort - just take a look at the results:

[Screenshot: Blitz.io rush results, with W3 Total Cache and Varnish 3 installed]

‘This rush generated 5,681 successful hits in 1.0 min and we transferred 50.13 MB of data in and out of your app. The average hit rate of 90/second translates to about 7,851,141 hits/day.’ This means we had 5,681 hits during which users did not experience a single timeout or error! It is truly an outstanding result. What’s interesting: even with the index page growing in size, I was able to maintain equally good results on re-runs of the test, even though the amount of transferred data rose to ‘106.65 MB of data in and out of your app’.

And here, just to give you an idea about the difference, is a screenshot of the statistics gathered by New Relic:

[Screenshot: Response time and throughput - before and after enabling W3 Total Cache and Varnish 3]

This is where Ewan stopped (minus New Relic, minus the detailed instructions for connecting to the EC2 instance, minus the solutions to the problems I hit while following his instructions) - but for me it’s more or less half of the way. Because of the requirements of the setup I’m going to use for our project at work, over the next few weeks I’ll be doing some research and blogging about my findings right after. Here’s the list of topics:

  1. Elastic Load Balancing
  2. Amazon Relational Database Service (Amazon RDS) (beta)
  3. Memcached
  4. NCache

It might take a little while to implement, as most of the hacking is done during the night, which isn’t as easy-peasy-lemon-squeezy as it might sound, with two little kids at home, a day job, and moving to the house we’ve just built. Ok, enough whining, getting back to work.
