Posts

Quickly Launch A Cassandra Cluster On Amazon EC2

If you have read my previous post, “Map-Reduce With Ruby Using Hadoop“, then you will know that firing up a Hadoop cluster is really simple when you use Whirr. Without even ssh’ing on the machines in the cloud you can start-up your cluster and interact with it. In this post I’ll show you that it is just as easy to fire up a Cassandra cluster on Amazon EC2.

Install Whirr

I will fly through the setup of Whirr quite quickly. All the commands you need are here, but if you want a more thorough explanation then see my other post, “Map-Reduce With Ruby Using Hadoop“.

I am assuming that you have Homebrew installed.

sudo brew update
sudo brew install maven
mkdir ~/src/cloudera
cd ~/src/cloudera
wget https://archive.cloudera.com/cdh/3/whirr-0.1.0+23.tar.gz
tar -xvzf whirr-0.1.0+23.tar.gz
cd whirr-0.1.0+23
mvn clean install
mvn package -Ppackage

Be patient with the above. There is a lot to install, so it will take some time. Maven installs a lot of dependencies if it is your first time using it.

The good news is that from here on you are setup to easily fire-up your Amazon EC2 cluster for Cassandra, or Hadoop if you choose.

Whirr Configuration File

We will need to make a configuration file for Whirr to tell it that we want to launch a Cassandra cluster with 3 nodes. If you are brave, patient and have the cash, then you could just as easily fire-up a 100 node cluster (leave a comment if you do – there may be prizes!).

You will need to create a cassandra.properties file with the following contents…

whirr.service-name=cassandra
whirr.cluster-name=mycassandracluster
whirr.instance-templates=3 cassandra
whirr.provider=ec2
whirr.identity=<YOUR_AMAZON_EC2_ACCESS_KEY_ID_GOES_HERE>
whirr.credential=<YOUR_AMAZON_EC2_SECRET_ACCESS_KEY_GOES_HERE>
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa

Replace the obvious fields with your Amazon EC2 Access Key ID and Amazon EC2 Secret Access Key.

Launch Your Cluster

Now you are ready to fire-up your Cassandra cluster. Simply use the following command and then be prepared to wait 5-10 minutes while Amazon builds your machines. This time is variable. Sometimes Amazon is quick, sometimes not so quick.

bin/whirr launch-cluster --config cassandra.properties


Launching mycassandracluster cluster
Configuring template
Starting 3 node(s)
Nodes started: [[id=us-east-1/i-13f25e7f, providerId=i-13f25e7f, tag=mycassandracluster, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-74f0061d, os=[name=null, family=amzn-linux, version=2010.11.1-beta, arch=paravirtual, is64Bit=true, description=amazon/amzn-ami-2010.11.1-beta.x86_64-ebs], userMetadata={}, state=RUNNING, privateAddresses=[10.204.99.163], publicAddresses=[50.16.155.106], hardware=[id=t1.micro, providerId=t1.micro, name=t1.micro, processors=[[cores=1.0, speed=1.0]], ram=630, volumes=[[id=vol-1657d47e, type=SAN, size=null, device=/dev/sda1, durable=true, isBootDevice=true]], supportsImage=hasRootDeviceType(ebs)]], [id=us-east-1/i-17f25e7b, providerId=i-17f25e7b, tag=mycassandracluster, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-74f0061d, os=[name=null, family=amzn-linux, version=2010.11.1-beta, arch=paravirtual, is64Bit=true, description=amazon/amzn-ami-2010.11.1-beta.x86_64-ebs], userMetadata={}, state=RUNNING, privateAddresses=[10.117.43.129], publicAddresses=[50.16.85.79], hardware=[id=t1.micro, providerId=t1.micro, name=t1.micro, processors=[[cores=1.0, speed=1.0]], ram=630, volumes=[[id=vol-1457d47c, type=SAN, size=null, device=/dev/sda1, durable=true, isBootDevice=true]], supportsImage=hasRootDeviceType(ebs)]], [id=us-east-1/i-11f25e7d, providerId=i-11f25e7d, tag=mycassandracluster, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-74f0061d, os=[name=null, family=amzn-linux, version=2010.11.1-beta, arch=paravirtual, is64Bit=true, description=amazon/amzn-ami-2010.11.1-beta.x86_64-ebs], userMetadata={}, state=RUNNING, privateAddresses=[10.117.46.170], publicAddresses=[184.73.100.203], hardware=[id=t1.micro, providerId=t1.micro, name=t1.micro, processors=[[cores=1.0, speed=1.0]], ram=630, volumes=[[id=vol-e857d480, type=SAN, size=null, device=/dev/sda1, durable=true, isBootDevice=true]], supportsImage=hasRootDeviceType(ebs)]]]
Authorizing firewall
Running configuration script
Completed launch of mycassandracluster
Started cluster of 3 instances
Cluster{instances=[Instance{roles=[cassandra], publicAddress=/50.16.85.79, privateAddress=/10.117.43.129}, Instance{roles=[cassandra], publicAddress=/50.16.155.106, privateAddress=/10.204.99.163}, Instance{roles=[cassandra], publicAddress=/184.73.100.203, privateAddress=/10.117.46.170}], configuration={}}

You now have your very own Cassandra cluster running in the cloud. Not so hard, hey!

Connect From Ruby

I will be following this post with step-by-step guide on how you can interact with your new cluster from your Ruby On Rails application. I recommend subscribing to the RSS feed to get updates to the blog.

Shutdown The Cluster

Here is how you can shutdown your cluster.

bin/whirr destroy-cluster --config cassandra.properties

Destroying mycassandracluster cluster
Cluster mycassandracluster destroyed

Conclusion

Whirr makes it very easy to start and stop a Cassandra cluster in the cloud without leaving the comfort of your laptop. What you do with that cluster is up to you, but I will be give you some ideas of what you could do in future posts.

Comments

  1. Kevin Rood

    Thanks for the excellent post. I tried setting up the cluster and while it did create three ec2 instances, Cassandra was not started on any of them. The version installed was 6.x as well. Do you have any further details on how to make sure Cassandra starts on each node? Also, I would like to completely control which version of Cassandra gets installed on the nodes. Any thoughts about how to do that? Ideally I would like to control what base version of Linux is used as well.

    Thanks!

  2. rashm

    HI,

    Where do we specify hostname or ipaddress of cluster machine when installing cluster using whirr?

    Can we install hadoop stable version or hadoop-2.0.0-alpha version using whirr

    Should we use whirr for cluster installation at production?

    regards,
    rashmi

  3. Chandra Shekher Gupta

    Few Question.

    * If i setup Cassandra in EC2 cloud, I will be charged for using the cloud?
    * I tried to register in Amazon cloud , they says it’s free but ask for credit card details, and i am sure they would be charging for bandwidth usages.

    Thanks Chandra