Posts - Page 1 - Big Fast Blog

January 2011 - DevOps

Run The Latest Whirr And Deploy HBase In Minutes

In a few of my recent posts I have covered the ease of deploying clusters of Hadoop and Cassandra using Whirr. With Whirr you can simply write a

January 2011 - DevOps

Quickly Launch A Cassandra Cluster On Amazon EC2

If you have read my previous post, Map-Reduce With Ruby Using Hadoop, then you will know that firing up a Hadoop cluster is really simple when you use

January 2011 - DevOps

De-volatile Your Memcached. Upgrade to Membase

Membase's TCP interface is identical to Memcached's, so migrating your existing code base will not be an issue at all.

January 2011 - DevOps

SQLShell. A Cross-Database SQL Tool With NoSQL Potential

In this blog post I will introduce SQLShell and demonstrate, step by step, how to install it and start using it with MySQL. I will also reflect on the possibilities of using it with NoSQL technologies such as HBase, MongoDB, Hive, CouchDB, Redis and Google BigQuery. SQLShell is a cross-platform, cross-database command-line tool for SQL, much like psql for PostgreSQL or the mysql command-line tool for MySQL.

January 2011 - Software Engineering

Hosting Images With Google Storage Manager

Today I received my invite to Google Storage for Developers. Yes, like most of us, my life is wrapped in layers of googliness. In this post I'm going to briefly review what Google Storage is, then use the Google Storage Manager to upload the image files you see in this post.

January 2011 - Ruby

Zero-Copy. Transfer Data Faster In Ruby

In this post I will explain the concept behind zero-copy, a feature of Linux that allows faster transfer of data between pipes, file descriptors and sockets. I will demonstrate, with a code example, how you can use this functionality in your Ruby projects. It is available in C, Java, Ruby, Perl and countless other languages, but in this post I will focus on Ruby.
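
As a rough illustration of the idea, and not necessarily the approach the full post takes, here is a minimal sketch using Ruby's built-in `IO.copy_stream`. On platforms that support it, the interpreter may hand the transfer to the kernel (for example via `sendfile(2)`) instead of shuttling bytes through a user-space buffer; whether that happens depends on the Ruby build and the kinds of IO objects involved. The `fast_copy` helper and the temporary file names are hypothetical, introduced only for this example.

```ruby
require "tempfile"

# Hypothetical helper: copy src_path to dst_path using IO.copy_stream.
# Ruby may use a kernel-level copy here when the platform allows it;
# otherwise it falls back to an ordinary read/write loop.
def fast_copy(src_path, dst_path)
  File.open(src_path, "rb") do |src|
    File.open(dst_path, "wb") do |dst|
      IO.copy_stream(src, dst) # returns the number of bytes copied
    end
  end
end

# Small demonstration with throwaway files.
Tempfile.create("zero_copy_src") do |src|
  src.write("x" * 1024)
  src.flush
  Tempfile.create("zero_copy_dst") do |dst|
    copied = fast_copy(src.path, dst.path)
    puts "copied #{copied} bytes"
  end
end
```

The same call accepts sockets and pipes as the destination, which is where avoiding the extra copy tends to matter most.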

January 2011 - Data processing

The Apache Projects – The Justice League Of Scalability

In this post I will define what I believe to be the most important projects within the Apache Projects for building scalable web sites and generally managing large volumes of data.

January 2011 - Geospatial

Landsliding Into PostGIS With KML Files

In this post I will show, in repeatable steps, how to install PostGIS, load in geospatial data found in a KML file and run queries against that data. The focus of this geospatial data will be landslides and our resulting database will allow us to query, using longitude and latitude co-ordinates, the landslide status of a specific geographical point.

January 2011 - Software Engineering

Embed Base64-Encoded Images Inline In HTML

Here is how you can embed an image inline in HTML. This is similar to how you embed an image in an HTML email message.
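
A minimal sketch of the technique, assuming the standard `data:` URI form (`data:<mime type>;base64,<payload>`): the image bytes are Base64-encoded and placed directly in the `src` attribute. The `inline_image_tag` helper and the six-byte sample payload are hypothetical and only illustrate the shape of the output; a real image would be read from disk with `File.binread`.

```ruby
# Hypothetical helper: build an <img> tag whose src is a Base64 data URI.
# Note: alt is inserted as-is, so escape it if it comes from user input.
def inline_image_tag(bytes, mime_type: "image/png", alt: "")
  encoded = [bytes].pack("m0") # strict Base64 without line breaks
  %(<img src="data:#{mime_type};base64,#{encoded}" alt="#{alt}">)
end

# Sample payload: just the 6-byte GIF header, not a complete image.
sample = "GIF89a".b
puts inline_image_tag(sample, mime_type: "image/gif", alt: "demo")
# prints: <img src="data:image/gif;base64,R0lGODlh" alt="demo">
```

Base64 inflates the payload by roughly a third, so this tends to suit small images better than large ones.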

December 2010 - Data processing

Map-Reduce With Ruby Using Hadoop

Here I demonstrate, with repeatable steps, how to fire up a Hadoop cluster on Amazon EC2, load data onto HDFS (the Hadoop Distributed File System), write map-reduce scripts in Ruby and use them to run a map-reduce job on your Hadoop cluster. You will not need to ssh into the cluster, as all tasks are run from your local machine. Below I am using my MacBook Pro as my local machine, but the steps I have provided should be reproducible on other platforms running bash and Java.

December 2010 - Software Engineering

Homebrew For Mac | How To Install And Use Homebrew

This is a demonstration of installing Homebrew, the new Mac OS X package installer, with step-by-step instructions on installing Homebrew and using the brew command.

December 2010 - Data processing

How To Get Experience Working With Large Datasets

There are data sources out there, but which one you choose depends on which technology you wish to get experience working with. What matters is experience with the technologies you are using, rather than what the data is. Certain datasets pair better with certain technologies. Simulating the data can be another approach. You just need a clever way of generating and randomizing your fake data. Thirdly, you can use a hybrid approach: take real data and replay it on a loop, randomizing it as it goes through. Simulating the Twitter fire-hose should not be too hard, should it?

December 2010 - Startups

Find The Road To Your Happiness By Helping Others

This is a common theme I've heard in many of the books I've read. Although, in the books I've read, this pearl of wisdom is phrased a little differently. The way to build a successful business is to help as many people as you can. Apparently, the cash will follow, if you concentrate on the helping part. The number of people you help is also important. The more people you help, the better. For instance, Facebook helps 250 million people per day, whereas Google only helps around 90 million people per day. Helping all those people has become very profitable for these two companies and many more. It's all about changing the focus from how do I make money to how do I help more people.

December 2010 - Software Engineering

Highchart Vs Flot.js – Comparing JavaScript Graphing Engines

In previous projects at MailChannels I have used Flot.js for graphing. There were many reasons I chose this originally. The graphs are interactive and can be

December 2010 - Software Engineering

Install Gitolite To Manage Your Git Repositories

This post has been deprecated. A newer post is available: Gitolite Installation Step-By-Step. Recently, netSIGN asked me to set up gitolite to give

December 2010 - Blogging

To Show Ads Or Not To Show Ads

I was just reading An appeal from Wikipedia founder Jimmy Wales, where he is asking for donations for supporting the continuation of Wikipedia, ad-free. I do
