If you have read my previous post, “Map-Reduce With Ruby Using Hadoop”, then you will know that firing up a Hadoop cluster is really simple when you use Whirr. Without even SSHing into the machines in the cloud, you can start up your cluster and interact with it. In this post I’ll show you that it is just as easy to fire up a Cassandra cluster on Amazon EC2.
Install Whirr
I will fly through the Whirr setup quite quickly. All the commands you need are here, but if you want a more thorough explanation then see my other post, “Map-Reduce With Ruby Using Hadoop”.
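I am assuming that you have Homebrew installed. Homebrew is meant to be run as your normal user, so there is no need for sudo. wget is installed alongside Maven because the download step below uses it.
brew update
brew install maven wget
mkdir -p ~/src/cloudera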
cd ~/src/cloudera
wget https://archive.cloudera.com/cdh/3/whirr-0.1.0+23.tar.gz
tar -xvzf whirr-0.1.0+23.tar.gz
cd whirr-0.1.0+23
mvn clean install
mvn package -Ppackage
Be patient with the above. There is a lot to build, and Maven downloads a lot of dependencies the first time you use it, so it will take some time.
The good news is that from here on you are set up to easily fire up your Amazon EC2 cluster for Cassandra, or Hadoop if you choose.
Whirr Configuration File
We will need to make a configuration file for Whirr to tell it that we want to launch a Cassandra cluster with 3 nodes. If you are brave and patient and have the cash, then you could just as easily fire up a 100-node cluster (leave a comment if you do; there may be prizes!).
You will need to create a cassandra.properties file with the following contents…
whirr.service-name=cassandra
whirr.cluster-name=mycassandracluster
whirr.instance-templates=3 cassandra
whirr.provider=ec2
whirr.identity=<YOUR_AMAZON_EC2_ACCESS_KEY_ID_GOES_HERE>
whirr.credential=<YOUR_AMAZON_EC2_SECRET_ACCESS_KEY_GOES_HERE>
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
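Replace the two placeholders, angle brackets included, with your Amazon EC2 Access Key ID and Amazon EC2 Secret Access Key. The last line points Whirr at your SSH private key, so make sure ~/.ssh/id_rsa exists.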
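If you would rather not paste your keys in by hand, the file can also be generated. The following is a minimal sketch, assuming your keys are already exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. Those variable names and the write_whirr_config function are my own choices for illustration, not something Whirr requires.

```shell
#!/bin/sh
# Sketch: write cassandra.properties from environment variables.
# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are assumed names chosen
# for this example; Whirr itself only reads the properties file.
# The body runs in a subshell so the umask change does not leak out.
write_whirr_config() (
  out="${1:-cassandra.properties}"

  # Stop with an error message if either key is missing or empty.
  : "${AWS_ACCESS_KEY_ID:?set AWS_ACCESS_KEY_ID first}"
  : "${AWS_SECRET_ACCESS_KEY:?set AWS_SECRET_ACCESS_KEY first}"

  # The file holds a secret, so make it readable by the owner only.
  umask 077

  # \$ keeps ${sys:user.home} literal so that Whirr expands it, not the shell.
  cat > "$out" <<EOF
whirr.service-name=cassandra
whirr.cluster-name=mycassandracluster
whirr.instance-templates=3 cassandra
whirr.provider=ec2
whirr.identity=${AWS_ACCESS_KEY_ID}
whirr.credential=${AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=\${sys:user.home}/.ssh/id_rsa
EOF
)
```

With the function loaded into your shell, running write_whirr_config with no arguments writes cassandra.properties into the current directory. Pass a path as the first argument to write it somewhere else.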
Launch Your Cluster
Now you are ready to fire up your Cassandra cluster. Simply use the following command and then be prepared to wait 5-10 minutes while Amazon builds your machines. The time varies: sometimes Amazon is quick, sometimes not so quick.
bin/whirr launch-cluster --config cassandra.properties
Launching mycassandracluster cluster
Configuring template
Starting 3 node(s)
Nodes started: [[id=us-east-1/i-13f25e7f, providerId=i-13f25e7f, tag=mycassandracluster, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-74f0061d, os=[name=null, family=amzn-linux, version=2010.11.1-beta, arch=paravirtual, is64Bit=true, description=amazon/amzn-ami-2010.11.1-beta.x86_64-ebs], userMetadata={}, state=RUNNING, privateAddresses=[10.204.99.163], publicAddresses=[50.16.155.106], hardware=[id=t1.micro, providerId=t1.micro, name=t1.micro, processors=[[cores=1.0, speed=1.0]], ram=630, volumes=[[id=vol-1657d47e, type=SAN, size=null, device=/dev/sda1, durable=true, isBootDevice=true]], supportsImage=hasRootDeviceType(ebs)]], [id=us-east-1/i-17f25e7b, providerId=i-17f25e7b, tag=mycassandracluster, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-74f0061d, os=[name=null, family=amzn-linux, version=2010.11.1-beta, arch=paravirtual, is64Bit=true, description=amazon/amzn-ami-2010.11.1-beta.x86_64-ebs], userMetadata={}, state=RUNNING, privateAddresses=[10.117.43.129], publicAddresses=[50.16.85.79], hardware=[id=t1.micro, providerId=t1.micro, name=t1.micro, processors=[[cores=1.0, speed=1.0]], ram=630, volumes=[[id=vol-1457d47c, type=SAN, size=null, device=/dev/sda1, durable=true, isBootDevice=true]], supportsImage=hasRootDeviceType(ebs)]], [id=us-east-1/i-11f25e7d, providerId=i-11f25e7d, tag=mycassandracluster, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-74f0061d, os=[name=null, family=amzn-linux, version=2010.11.1-beta, arch=paravirtual, is64Bit=true, description=amazon/amzn-ami-2010.11.1-beta.x86_64-ebs], userMetadata={}, state=RUNNING, privateAddresses=[10.117.46.170], publicAddresses=[184.73.100.203], hardware=[id=t1.micro, providerId=t1.micro, name=t1.micro, processors=[[cores=1.0, speed=1.0]], ram=630, volumes=[[id=vol-e857d480, 
type=SAN, size=null, device=/dev/sda1, durable=true, isBootDevice=true]], supportsImage=hasRootDeviceType(ebs)]]]
Authorizing firewall
Running configuration script
Completed launch of mycassandracluster
Started cluster of 3 instances
Cluster{instances=[Instance{roles=[cassandra], publicAddress=/50.16.85.79, privateAddress=/10.117.43.129}, Instance{roles=[cassandra], publicAddress=/50.16.155.106, privateAddress=/10.204.99.163}, Instance{roles=[cassandra], publicAddress=/184.73.100.203, privateAddress=/10.117.46.170}], configuration={}}
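The public addresses in that last line are what you will need when you want to connect to the cluster. Here is a small sketch for pulling them out of the launch output. It relies on the publicAddress=/a.b.c.d format shown above, which may differ in other versions of Whirr, and the function name extract_public_addresses and the launch.log file name are just ones I made up for the example.

```shell
#!/bin/sh
# Sketch: print one public IP address per line from Whirr's launch output.
# Matches "publicAddress=/1.2.3.4" in the final Cluster{...} line only;
# the "publicAddresses=[...]" entries in the node dump are not matched.
extract_public_addresses() {
  grep -o 'publicAddress=/[0-9.]*' | sed 's|publicAddress=/||'
}

# Example usage (launch.log is a hypothetical file name):
#   bin/whirr launch-cluster --config cassandra.properties | tee launch.log
#   extract_public_addresses < launch.log
```

Saving the output with tee means you still see the progress messages while keeping a copy to parse afterwards.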
You now have your very own Cassandra cluster running in the cloud. Not so hard, hey!
Connect From Ruby
I will be following this post with a step-by-step guide on how you can interact with your new cluster from your Ruby on Rails application. I recommend subscribing to the RSS feed to get updates to the blog.
Shutdown The Cluster
Here is how you can shut down your cluster.
bin/whirr destroy-cluster --config cassandra.properties
Destroying mycassandracluster cluster
Cluster mycassandracluster destroyed
Conclusion
Whirr makes it very easy to start and stop a Cassandra cluster in the cloud without leaving the comfort of your laptop. What you do with that cluster is up to you, but I will give you some ideas of what you could do in future posts.
Thanks for the excellent post. I tried setting up the cluster and while it did create three EC2 instances, Cassandra was not started on any of them. The version installed was 6.x as well. Do you have any further details on how to make sure Cassandra starts on each node? Also, I would like to completely control which version of Cassandra gets installed on the nodes. Any thoughts about how to do that? Ideally I would like to control what base version of Linux is used as well.
Thanks!
Hi,
Where do we specify the hostname or IP address of the cluster machines when installing a cluster using Whirr?
Can we install the stable version of Hadoop, or hadoop-2.0.0-alpha, using Whirr?
Should we use Whirr for cluster installation in production?
regards,
rashmi
A few questions.
* If I set up Cassandra in the EC2 cloud, will I be charged for using the cloud?
* I tried to register with Amazon’s cloud. They say it’s free but ask for credit card details, and I am sure they would charge for bandwidth usage.
Thanks Chandra
Hi Chandra,
You will be charged for any resources you use on Amazon EC2. They do have a “free tier”, but you are unlikely to be able to set up a cluster on it. See https://aws.amazon.com/free/faqs/
Thanks,
Phil