Grabbing The Source Code
In the source repository for Whirr we can find the latest and greatest source code.
We will use the code from trunk, which will give you the very latest version of the software.
Even though version 0.2.0 is currently the most recent stable release you can download from the official Whirr site, 0.3.0 is already available under the tagged releases, and hence in trunk. So let’s download the source from there. Version 0.3.0 will soon be made the official release, but by following this step-by-step guide you will be ahead of the pack.
cd ~/src
svn co https://svn.apache.org/repos/asf/incubator/whirr/trunk whirr-trunk
cd whirr-trunk
Make Sure You Have Those Dependencies
cat BUILD.txt
Looking at the dependencies in BUILD.txt, you can see there are a few things we need.
Apache Whirr Build Instructions
REQUIREMENTS
- Java 1.6
- Apache Maven 2.2.1 or greater
- Ruby 1.8.7 or greater (to run build-tools/update-versions)
BUILDING
To run unit tests and install artifacts locally:
mvn clean install
To build a source package:
mvn package -Ppackage
If you have followed one of my previous posts, then you will likely have all these dependencies installed already, but I will review them just in case.
Being on Mac OS X, I generally use Homebrew to install my dependencies. The following will install Maven on Mac OS X if you have Homebrew installed.
brew install maven
If you are on a Debian-based Linux (e.g. Ubuntu), use apt-get to install Maven.
sudo apt-get install maven2
There are also instructions available for installing Maven on Windows and other platforms.
If you are using the latest Mac OS X then you should have Ruby 1.8.7 and Java 1.6 installed. If you have to upgrade your Ruby, then I recommend checking out Ruby Version Manager (RVM).
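If you do end up needing a newer Ruby, a minimal RVM session looks something like the following. This is just a sketch: the installer command is the one currently published on the RVM site, so double-check it there before piping anything into your shell.

# Install RVM (verify this command against the instructions on rvm.io first)
curl -sSL https://get.rvm.io | bash -s stable
# Install a Ruby that satisfies the 1.8.7-or-greater requirement and make it the default
rvm install 1.8.7
rvm use 1.8.7 --default
ruby -v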
Here’s how to check your versions and what I’m currently running…
# Ruby
ruby -v
ruby 1.9.2p94 (2010-12-08 revision 30140) [x86_64-darwin10.5.0]
# Java
java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
# Maven
mvn -v
Apache Maven 2.2.1 (r801777; 2009-08-06 12:16:01-0700)
Java version: 1.6.0_22
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: en_US, platform encoding: MacRoman
OS name: "mac os x" version: "10.6.6" arch: "x86_64" Family: "mac"
Building Whirr
Let’s build Whirr 0.3.0!
Run the following command from inside the source directory.
cd ~/src/whirr-trunk
mvn clean install
The first time I installed this, it took 17 minutes to build, most of which was spent downloading dependencies. It failed while downloading one of the dependencies, and so the build failed.
[WARNING] Unable to get resource 'org.apache.hadoop:hadoop-core:jar:0.20.2'
from repository central (https://repo1.maven.org/maven2):
GET request of:
org/apache/hadoop/hadoop-core/0.20.2/hadoop-core-0.20.2.jar
from central failed
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.
Missing:
----------
1) org.apache.hadoop:hadoop-core:jar:0.20.2
The path to this dependency was fine, so it must have just been one of those unfortunate glitches in the magic workings of the Internet. Please let me know if you have a similar experience; the chances of this happening to you are very slim.
I ran “mvn clean install” once more.
mvn clean install
This time it only took 2 minutes and built successfully.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] ------------------------------------------------------------------------
[INFO] Whirr ................................................. SUCCESS [6.779s]
[INFO] Apache Whirr Build Tools .............................. SUCCESS [2.017s]
[INFO] Apache Whirr Core ..................................... SUCCESS [11.104s]
[INFO] Apache Whirr Cassandra ................................ SUCCESS [3.332s]
[INFO] Apache Whirr Hadoop ................................... SUCCESS [9.983s]
[INFO] Apache Whirr ZooKeeper ................................ SUCCESS [7.584s]
[INFO] Apache Whirr HBase .................................... SUCCESS [57.760s]
[INFO] Apache Whirr CLI ...................................... SUCCESS [38.708s]
[INFO] Apache Whirr Hadoop ................................... SUCCESS [1.884s]
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2 minutes 20 seconds
[INFO] Finished at: Mon Jan 24 12:49:58 PST 2011
[INFO] Final Memory: 90M/123M
[INFO] ------------------------------------------------------------------------
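In my case a plain retry was enough. If it isn't for you, a common Maven workaround (not something I needed here) is to delete the cached copy of the failing artifact and force Maven to fetch dependencies again:

# Purge the possibly-corrupt cached artifact, then rebuild with forced updates
rm -rf ~/.m2/repository/org/apache/hadoop/hadoop-core/0.20.2
mvn clean install -U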
Now that our code is compiled, we can run the final build command.
mvn package -Ppackage
(reduced output)
[INFO] Scanning for projects...
[INFO] Reactor build order:
[INFO] Whirr
[INFO] Apache Whirr Build Tools
[INFO] Apache Whirr Core
[INFO] Apache Whirr Cassandra
[INFO] Apache Whirr Hadoop
[INFO] Apache Whirr ZooKeeper
[INFO] Apache Whirr HBase
[INFO] Apache Whirr CLI
[INFO] Apache Whirr Hadoop
[INFO] ------------------------------------------------------------------------
[INFO] Building Whirr
[INFO] task-segment: [package]
[INFO] ------------------------------------------------------------------------
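The output above doesn't show where the packaged artifacts end up, so a quick way to locate whatever the package profile produced is a simple find from the source root:

# Locate the archives generated by the package profile
find . -name '*.tar.gz'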
Launch An HBase Cluster
Currently in trunk there is a recipe under the “recipes” directory called “hbase-ec2”. We can copy that, although in this example we are not going to modify it.
cp recipes/hbase-ec2.properties .
There are many comments in there, so here is a summary.
whirr.cluster-name=hbase
whirr.instance-templates=1 zk+nn+jt+hbase-master,5 dn+tt+hbase-regionserver
whirr.provider=ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.hardware-id=c1.xlarge
whirr.image-id=us-east-1/ami-da0cf8b3
whirr.location-id=us-east-1
In the above, the line you will want to play with is “whirr.instance-templates”, as this defines the shape and size of your cluster. Increasing the value “5” will give you a bigger cluster.
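For example, to run ten workers instead of five, that one line would become the following (everything else in the recipe stays the same):

whirr.instance-templates=1 zk+nn+jt+hbase-master,10 dn+tt+hbase-regionserver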
As written, the recipe gives us a total of 6 machines: 1 machine runs all the master services and 5 more machines run all the worker services. Here is a breakdown of the services running on those machines, as defined by whirr.instance-templates.
1 zk+nn+jt+hbase-master     | 1 = 1 instance of the following
  zk = ZooKeeper            |
  nn = Hadoop NameNode      |
  jt = Hadoop JobTracker    |
  hbase-master = HBase Master
5 dn+tt+hbase-regionserver  | 5 = 5 instances of the following
  dn = Hadoop DataNode      |
  tt = Hadoop TaskTracker   |
  hbase-regionserver = HBase RegionServer
If you are interested in how all these Hadoop and HBase components work together, see Lars George’s excellent posts “HBase Architecture 101 – Storage” and “HBase Architecture 101 – Write-ahead-Log”, or check out the HBase wiki.
In previous Whirr examples I have defined the Amazon EC2 credentials in this properties file, but the above will pick them up from the environment, which is a better way to go. Export your credentials into your environment (here I use dummy credentials as an example).
export AWS_ACCESS_KEY_ID=123456789ABCDEFGHIJKLM
export AWS_SECRET_ACCESS_KEY=ABCDabcd1234/xyzXZY54321acbd
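If you don't want to re-export these in every new shell, you can append the same two lines to your shell profile (with your real keys, of course, not these dummy ones):

# Persist the credentials for future shells (use your real keys)
echo 'export AWS_ACCESS_KEY_ID=123456789ABCDEFGHIJKLM' >> ~/.bash_profile
echo 'export AWS_SECRET_ACCESS_KEY=ABCDabcd1234/xyzXZY54321acbd' >> ~/.bash_profile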
We can now launch our cluster.
bin/whirr launch-cluster --config hbase-ec2.properties
Bootstrapping cluster
Configuring template
Starting 1 node(s) with roles [jt, nn, zk, hbase-master]
Configuring template
Starting 5 node(s) with roles [tt, hbase-regionserver, dn]
Nodes started: [[id=us-east-1/i-e134808d, providerId=i-e134808d, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.112.205.48], publicAddresses=[184.72.159.249], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]]]
Nodes started: [[id=us-east-1/i-f734809b, providerId=i-f734809b, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.194.127.79], publicAddresses=[50.16.154.13], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]], [id=us-east-1/i-fb348097, providerId=i-fb348097, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.98.33.250], publicAddresses=[50.16.71.166], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]], [id=us-east-1/i-f5348099, providerId=i-f5348099, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.195.6.143], publicAddresses=[204.236.242.78], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]], [id=us-east-1/i-f9348095, providerId=i-f9348095, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.119.22.224], 
publicAddresses=[174.129.72.44], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]], [id=us-east-1/i-f134809d, providerId=i-f134809d, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.98.146.48], publicAddresses=[174.129.142.130], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]]]
Authorizing firewall
Authorizing firewall
Authorizing firewall
Running configuration script
Configuration script run completed
Running configuration script
Configuration script run completed
Completed configuration of hbase
Web UI available at https://ec2-184-72-159-249.compute-1.amazonaws.com
Wrote Hadoop site file /Users/phil/.whirr/hbase/hadoop-site.xml
Wrote Hadoop proxy script /Users/phil/.whirr/hbase/hadoop-proxy.sh
Completed configuration of hbase
Hosts: ec2-184-72-159-249.compute-1.amazonaws.com:2181
Completed configuration of hbase
Web UI available at https://ec2-184-72-159-249.compute-1.amazonaws.com
Wrote HBase site file /Users/phil/.whirr/hbase/hbase-site.xml
Wrote HBase proxy script /Users/phil/.whirr/hbase/hbase-proxy.sh
Wrote instances file /Users/phil/.whirr/hbase/instances
Started cluster of 6 instances
Cluster{instances=[Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/204.236.242.78, privateAddress=/10.195.6.143, id=us-east-1/i-f5348099}, Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/174.129.72.44, privateAddress=/10.119.22.224, id=us-east-1/i-f9348095}, Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/50.16.71.166, privateAddress=/10.98.33.250, id=us-east-1/i-fb348097}, Instance{roles=[jt, nn, zk, hbase-master], publicAddress=ec2-184-72-159-249.compute-1.amazonaws.com/184.72.159.249, privateAddress=/10.112.205.48, id=us-east-1/i-e134808d}, Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/174.129.142.130, privateAddress=/10.98.146.48, id=us-east-1/i-f134809d}, Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/50.16.154.13, privateAddress=/10.194.127.79, id=us-east-1/i-f734809b}], configuration={hbase.zookeeper.quorum=ec2-184-72-159-249.compute-1.amazonaws.com:2181}}
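A quick way to convince yourself the cluster is really alive is to start the proxy script Whirr wrote and point a local Hadoop client at the generated configuration. This is only a sketch of the usual Whirr workflow, and assumes you have a compatible Hadoop 0.20.x client installed locally:

# Start the SOCKS proxy Whirr generated (leave it running in the background)
sh ~/.whirr/hbase/hadoop-proxy.sh &
# Point the local Hadoop client at the cluster and list HDFS
export HADOOP_CONF_DIR=~/.whirr/hbase
hadoop fs -ls /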
Destroy!
At some point you will want to tear down that cluster. Here is how you can do that.
bin/whirr destroy-cluster --config hbase-ec2.properties
Destroying hbase cluster
Cluster hbase destroyed
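It's worth double-checking in the AWS console that nothing from the hbase cluster is still running, since you pay for anything left behind. If you have the old EC2 API tools installed, a rough check from the command line looks like this:

# Loose sanity check that no hbase instances survived the teardown
ec2-describe-instances | grep -i hbase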
Conclusion
Congratulations! If you have followed through this example, then you now have your own HBase cluster running in the cloud. Now… what to do with that HBase cluster?
Hi Phil,
Great post! We are still working on a few kinks, but once Whirr 0.3.0 is out and released with CDH it will be even easier to get going, as no build is required (obviously).
A few notes: you are missing the “tt” (i.e. TaskTracker) in the explanation table for the cluster template. Also, just to reiterate, all of the services within a template share the same instance. So in your example you will start 1+5 EC2 servers with the various services running on them.
Finally, what is also cool is that Whirr creates a local “$HOME/.whirr/<cluster-name>/” directory, so in your example $HOME/.whirr/hbase/, which contains various files to help you work with the cluster. One of those is the “hadoop-proxy.sh” (start it like “source $HOME/.whirr/hbase/hadoop-proxy.sh &” to launch it in the background), which sets up the SOCKS proxy for you so that you can talk to the servers and access the web UI for the master and region servers.
There is also an “instances” file, which nicely lists the servers in the cluster, their local and remote IPs, and their roles. It could be parsed by a script or program that needs to communicate with the servers.
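For example, assuming the whitespace-separated columns Lars describes (the column position here is my assumption, so inspect your own file first), pulling out just the public addresses could be as simple as:

# Show the file, then print one public address per line (column 3 is an assumption)
cat ~/.whirr/hbase/instances
awk '{print $3}' ~/.whirr/hbase/instances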
Thanks again for writing this post!
Regards,
Lars
Thanks for the comments Lars. Well spotted on that missing “tt”! I’ve also clarified the number of EC2 machines in use, as this was not clear.
I’m looking forward to 0.3.0 of Whirr being released and the general progress of it. It is a fantastic tool for getting up and running in the cloud. Thanks to yourself and all the committers!
Very helpful. The only thing I tripped on was understanding how the local keypair file relates to Whirr. In my case, I followed your tutorial from a clean machine, itself running on EC2. Since that machine had no ~/.ssh/id_rsa file, Whirr failed.
It’s still not completely clear to me how Whirr bootstraps the machines, but I did learn that eventually Whirr uploads your local public key to the bootstrapped instances. So a simple invocation of ssh-keygen -t rsa got everything running.
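For anyone who hits the same problem, generating a default passphrase-less keypair before launching is the simplest fix (this is the commenter's ssh-keygen invocation spelled out non-interactively):

# Generate the default RSA keypair Whirr looks for
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa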
Thanks again for the time and effort spent on this tutorial.
Thanks a lot for this post; I found that your instructions work great, but if you prefer to use git, you can do:
git clone git://git.apache.org/whirr.git
instead of:
svn co https://svn.apache.org/repos/asf/incubator/whirr/trunk whirr-trunk
Great instructions!
I believe the /incubator part is no longer needed in the svn URL.
When mine starts, it says:
Completed configuration of hbase
Web UI available at https://107.20.125.231
Wrote HBase site file /Users/tim/.whirr/hbase/hbase-site.xml
Wrote HBase proxy script /Users/tim/.whirr/hbase/hbase-proxy.sh
Completed configuration of hbase role hadoop-datanode
Completed configuration of hbase role hadoop-tasktracker
Starting to run scripts on cluster for phase start on instances: us-east-1/i-116dab74
Running start phase script on: us-east-1/i-116dab74
start phase script run completed on: us-east-1/i-116dab74
Successfully executed start script: [output=, error=, exitCode=0]
Finished running start phase scripts on all cluster instances
Started cluster of 4 instances
but it seems only the zookeeper actually started up and there is no JT or HBase master:
tim@ip-10-79-37-92:~$ ps -ef | grep java
root 3214 1 0 13:29 ? 00:00:00 java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /usr/local/zookeeper-3.3.3/bin/../build/classes:/usr/local/zookeeper-3.3.3/bin/../build/lib/*.jar:/usr/local/zookeeper-3.3.3/bin/../zookeeper-3.3.3.jar:/usr/local/zookeeper-3.3.3/bin/../lib/log4j-1.2.15.jar:/usr/local/zookeeper-3.3.3/bin/../lib/jline-0.9.94.jar:/usr/local/zookeeper-3.3.3/bin/../src/java/lib/*.jar:/etc/zookeeper/conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg
It seems all the slave node services started, though.
This is an extremely useful post, thank you so much for writing it!