Other People’s Data – An Interview With “Drawn To Scale”

Early this year I came across a new start-up called “Drawn To Scale”. Their website had just enough information to pique my interest, but not enough to answer my questions of who, what and how. Who are the founders? What is their background, technology and business model? How were they going to manage other people’s big data? Can one tool meet the demands of the broad range of data challenges that different businesses are seeing?

In this blog-post Bradford Stephens, Drawn To Scale’s founder, answers a series of technical, business and personal questions to give an overview of what Drawn To Scale is and where it is going.

The Interview

What is “Drawn To Scale”?

Drawn to Scale is the company that created Spire, a distributed database with real-time queries and fulltext search that’s also simple to manage. Our unique distributed indexing technology allows faster and more efficient scalability than any other database for this purpose.

Since we’re built for the web application era, we have native JSON serialization on top of our columnar store, so integration into things like jQuery is effortless. Users don’t need a large complex ORM to use Spire.
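As a rough illustration of what JSON on top of a column-oriented store can look like, the sketch below flattens a nested document into column-style cells and rebuilds it again. This is not Spire's actual serialization: the function names `flatten` and `unflatten`, the dotted column paths, and the sample document are all assumptions made for the example.

```python
import json


def flatten(doc, prefix=""):
    """Flatten a nested dict into {"a.b": value} column-style cells.

    Assumption for this sketch: keys never contain "." themselves.
    """
    cells = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            cells.update(flatten(value, path))
        else:
            cells[path] = value
    return cells


def unflatten(cells):
    """Rebuild the nested document from its column-style cells."""
    doc = {}
    for path, value in cells.items():
        *parents, leaf = path.split(".")
        node = doc
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return doc


# Hypothetical document, as a web application might send it.
post = {"title": "Hello", "author": {"name": "Bob", "city": "Seattle"}}
cells = flatten(post)                               # one cell per column path
as_json = json.dumps(unflatten(cells), sort_keys=True)  # back to JSON for the client
```

The point of the sketch is only that the application sees plain JSON on both sides, with no object-relational mapping layer in between.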

Databases such as HBase or Cassandra only offer rudimentary primary-key access to data, whereas projects like CouchDB and MongoDB have rich functionality but are limited in scalability. We combine the best of both worlds.

Currently, Spire is available for direct use on-premises or on top of cloud infrastructure, but we’re investigating other ways of making our platform available as the technology evolves.

I see that your tagline is “Big data for all”. How would you describe what Drawn To Scale does for its users?

Spire allows users to build applications (mostly web applications) on top of a database platform without worrying about scaling to the number of users or the amount of data. To handle larger amounts of data or more requests, one can simply add nodes without worrying about all the nightmares of distributed computing management. Companies won’t need to hunt for a team of distributed computing experts (of whom there aren’t very many) in order to work with Big Data.

What stage is the business at today?

We have been around about a year, and have a paying beta customer, with more on the way. We have several employees from well-respected backgrounds in distributed systems and infrastructure. Drawn to Scale is currently self-funded.

When will Drawn To Scale be production ready?

End of Q2, this year.

Who do you see as your target customers?

There are a few interesting verticals:

Social Networks / Media / Gaming / BI (Business Intelligence) / Analytics
Anything in the social space usually has a combination of Big Data and Big Users. This is perfect for Spire.

Mobile Phones / Sensor Networks
Again — lots of users, lots of data. From mobile applications to monitoring the network infrastructure itself. Even power meter monitoring on the smartgrid is a “big data” problem.

Advertising Networks
Better targeting and faster analytics have always been needed in the advertising world, and we accelerate both.

Are you doing or planning to do a beta sign-up or any trials with select customers?

Yes, we are in a private beta with select customers right now. We have space for a few more, so we’d love to talk to anyone interested!

Who do you see as your biggest competitors in this space?

By far, the biggest competition comes from companies trying to build their own purpose-driven database in-house. This usually brings many unexpected problems. Distributed systems are neither straightforward nor easy.

What’s interesting is that we usually see 3 things happen to companies when they hit “big data” problems:

1. They attempt to build a sharded SQL or document database, with all the usual problems (querying every node, failover, imperfect data balancing, etc.)

2. They try to build their own distributed DB from scratch, or force functionality on top of KV-stores like Cassandra or BigTable clones like HBase.

3. They have to change their business model because their database infrastructure can’t handle the amount of data or the number of users they’re getting. This is the scariest thing we’ve seen, and it is not as rare as you think.

What do you see as your key differentiator or unique-selling-point from other solutions?

Besides some internal Google tools (and Megastore), we’re the only DB with a distributed index and fulltext search. This means you can keep query latency “real-time” while adding more data and users. We are also an end-to-end database platform. You can do everything from batch processing to real-time storage without having to glue dozens of open source projects together.

How are your products priced?

We’re pretty flexible. We do standard, straightforward per-cluster licenses, or OEM shared-revenue licenses.

How are you ensuring that customers trust in your solution enough to commit to building their own products on top of it?

One of the best things about Spire is that we use Hadoop and HBase as a storage layer, both of which are well proven in the enterprise. So customers run little risk of losing data to an unproven platform.

Tell me about the founders and any other people behind the scenes at Drawn-To-Scale?

I, Bradford Stephens, am the primary founder of Drawn to Scale. I have a background in Computer Science and Politics. I worked on SQL Server at Microsoft and did consulting, before landing at a Social Media BI startup, Visible Technologies. As the lead platform engineer at Visible Technologies, I was responsible for creating a “Google-Scale” infrastructure for real-time querying and analytics on top of Social Media such as blogs and tweets. Besides being the author of the popular scalability blog roadtofailure.com, I am the co-chair of the OSCON Data conference, and GigaOm Structure Big Data [conference].

We have a fascinating advisory team as well: Ryan Rawson (Amazon, Google, Stumbleupon), Bradford Cross (Flightcaster, Woven), Rich Miller (well-known cloud / tech CEO), and a few others on the way.

How fast are your “low-latency queries”?

It depends on the size of the data, the number of requests, the hardware, etc. We aim for milliseconds to seconds.

How did you achieve this performance and what technical challenges did you encounter?

Performance
The Spire Distributed Indexing Engine is one key to scalability and performance. A lot of research has gone into this. Spire only needs to query the few machines that actually have relevant data. If you try to shard MySQL or document stores, what you end up with is a partitioned set of data on many machines. These systems all have non-distributed indexes, so doing queries is almost always a worst-case scenario. If you wanted to find “All documents with AuthorName = Bob”, and you have 20 nodes, you’ll need to contact every one of them for each query, which completely kills throughput and latency. Our index allows a “universal” view, so finding “All documents with AuthorName = Bob” is a simple request to a single node.
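The difference between the two approaches can be sketched in a few lines of Python. This is a toy model, not Spire's indexing engine: the class names `ShardedStore` and `GloballyIndexedStore`, the modulo placement of documents, and the in-memory dictionaries are all assumptions made for illustration. It only counts how many data nodes each approach has to contact for the "AuthorName = Bob" query.

```python
from collections import defaultdict


class ShardedStore:
    """Documents partitioned across nodes; each node only knows its own data."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, doc_id, doc):
        # Hypothetical placement rule: node chosen by doc_id modulo node count.
        self.nodes[doc_id % len(self.nodes)][doc_id] = doc

    def find(self, field, value):
        """Scatter-gather: every node must be asked, whether it has matches or not."""
        hits, contacted = [], 0
        for node in self.nodes:
            contacted += 1
            hits.extend(d for d, doc in node.items() if doc.get(field) == value)
        return sorted(hits), contacted


class GloballyIndexedStore(ShardedStore):
    """Same partitioning, plus one index that spans all nodes."""

    def __init__(self, num_nodes):
        super().__init__(num_nodes)
        self.index = defaultdict(set)  # (field, value) -> matching doc ids

    def put(self, doc_id, doc):
        super().put(doc_id, doc)
        for field, value in doc.items():
            self.index[(field, value)].add(doc_id)

    def find(self, field, value):
        """One index lookup, then only the nodes that hold matching documents."""
        doc_ids = self.index.get((field, value), set())
        nodes_needed = {d % len(self.nodes) for d in doc_ids}
        return sorted(doc_ids), len(nodes_needed)


# 20 nodes, 100 documents; only documents 3 and 23 are written by Bob.
sharded, indexed = ShardedStore(20), GloballyIndexedStore(20)
for doc_id in range(100):
    author = "Bob" if doc_id in (3, 23) else "Alice"
    for store in (sharded, indexed):
        store.put(doc_id, {"AuthorName": author})

print(sharded.find("AuthorName", "Bob"))  # ([3, 23], 20): all 20 nodes contacted
print(indexed.find("AuthorName", "Bob"))  # ([3, 23], 1): one data node contacted
```

In the sharded case the cost of a query grows with the size of the cluster; with a cluster-wide index it grows with the number of nodes that actually hold matching data.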

Distributed index
Major challenges (that will always need tuning) are the JVM and Garbage Collection, as well as building very strong fault handling mechanisms. Hadoop especially has a history of vague failure modes. We feel it is far more important to be bulletproof in reality than elegant in theory, which is why we’re fans of the very well-proven BigTable model from Google.

You mention “full-text search” in your white-paper. Can you go into more detail about what kinds of searches you can do?

We can do faceted, multi-lingual searches, because we implement Lucene analyzers (but we don’t use Lucene as the index itself). This makes replacing existing Solr / Lucene clusters straightforward.

I see you are using HBase, Lucene and Chef. Are there any other critical components to your system?

These are the major components, along with open-source serialization libraries, performance testers (Grinder), and minor OSS (open-source software) stuff.

Configuration of the cluster is managed by a configuration server. Is that right?

Yes, configuration is managed by a replicated Chef configuration server (backed by CouchDB). This makes the job of Operations much easier, because we feel managing dozens of nodes manually is a massive barrier to adoption for distributed databases.

Does each customer have its own cluster of machines (or virtual machines) and its own configuration server?

Yes, we’re not a Platform-As-A-Service, so right now we focus on running in datacenters or on IaaS (Infrastructure-As-A-Service) vendors (EC2, Rackspace Cloud, etc.)

Is this a hosted-only solution? Can customers install your product within their own network?

We run in both the datacenter and the cloud; we are not self-hosted.

Are there any amusing stories of how you came up with the name Drawn-To-Scale?

Bradford’s wife, Myra, is the Lead Chemist at Nintendo (and does product safety testing). She’s the creative one of the family, and basically came up with it after a brainstorming session. She’s named all of my projects, musical compositions, etc. (except for the database itself, Spire). The previous name for the project was “BigSearch” or “BigQuery”, which were rather insipid.

Other humorous stories are classified for a few years ;)

Conclusion

Since I first contacted Drawn To Scale, they have been featured on the O’Reilly Radar and their website now provides more information on what Drawn To Scale is.

Whilst they are still at the early stages and are finding their feet with their first customers, Bradford’s background and understanding of the technology show good promise for a successful startup and a good solution in this big data space.

Resources