<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Carsonified &#187; Databases</title>
	<atom:link href="http://carsonified.com/blog/category/dev/databases/feed/" rel="self" type="application/rss+xml" />
	<link>http://carsonified.com</link>
	<description></description>
	<lastBuildDate>Thu, 18 Mar 2010 10:00:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Speed up your Web App by 1000% with 1 Line of SQL</title>
		<link>http://carsonified.com/blog/dev/databases/speed-up-your-web-app-by-1000-with-1-line-of-sql/</link>
		<comments>http://carsonified.com/blog/dev/databases/speed-up-your-web-app-by-1000-with-1-line-of-sql/#comments</comments>
		<pubDate>Wed, 02 Sep 2009 06:54:08 +0000</pubDate>
		<dc:creator>Jake Stride</dc:creator>
				<category><![CDATA[Databases]]></category>

		<guid isPermaLink="false">http://carsonified.com/?p=3172</guid>
		<description><![CDATA[By <strong>Jake Stride</strong><br />
Frameworks have changed the way we develop web applications and many amazing products and services have been delivered as a result.
As developers we take these frameworks for granted; they abstract away a lot of the day-to-day legwork of our applications and help us develop features more quickly.
This is great in the early [...]]]></description>
			<content:encoded><![CDATA[
<p>Frameworks have changed the way we develop web applications and many amazing products and services have been delivered as a result.</p>
<p>As developers we take these frameworks for granted; they abstract away a lot of the day-to-day legwork of our applications and help us develop features more quickly.</p>
<p>This is great in the early days of an application&#8217;s lifetime when it&#8217;s helpful to be able to rapidly innovate and release new features, but as your user base grows and your application needs to scale, the speed and convenience of these frameworks can hold some hidden pitfalls that we should be aware of.</p>
<p>Recently we came across one such pitfall in the way <a href="http://www.tactilecrm.com">Tactile CRM</a> is built &#8211; we use a custom framework but are slowly migrating our code to the Zend Framework.<br />
<span id="more-3172"></span><br />
We found that one of the search queries used extensively by our application was taking more and more time to execute as our user base grew. The cause turned out to be a rogue comparison in the way our queries were built within the application:</p>
<p><code>$sh->addConstraint(new Constraint($field, 'ILIKE', urldecode($this->_data['q'])));</code></p>
<p>would produce something similar to:</p>
<p><code>SELECT * FROM table WHERE fieldname ILIKE 'abc%';</code></p>
<p>(The ILIKE comparison performs a case-insensitive match and the % is a wildcard, so <code>'abc%'</code> matches both abcd and ABCD.)</p>
<p>In general this query isn&#8217;t an issue, although different database servers handle it in different ways. In our case we had a perfect storm. The development and test environments of our platform are pinned to specific database versions for testing and deployment reasons, and the ILIKE operation is very expensive on the version running our live database. This has been fixed in a later version of the database server, and the issue wasn&#8217;t visible when testing on our development version.</p>
<p>After hunting down the issue, changing the line to:</p>
<p><code>$sh->addConstraint(new Constraint('lower(' . $field . ')', 'LIKE', strtolower(urldecode($this->_data['q']))));</code></p>
<p>produced SQL such as:</p>
<p><code>SELECT * FROM table WHERE lower(fieldname) LIKE 'abc%';</code></p>
<p>and sped up the queries by over 1000% in worst-case scenarios (we lower-cased the search string in our code too). Whilst we were at it, we updated the framework to use the LIKE comparison only when doing wildcard searches. All of this took just a few hours&#8217; work.</p>
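<p>The effect of the rewritten comparison can be sketched outside our framework. The following is a minimal illustration in Python, using the standard-library <code>sqlite3</code> module purely as a stand-in for a real database server (SQLite has no ILIKE). The <code>contacts</code> table and the <code>prefix_search</code> helper are hypothetical names, not part of Tactile CRM:</p>
<pre>
```python
import sqlite3

def prefix_search(conn, term):
    """Case-insensitive prefix search using lower(column) LIKE 'term%'."""
    # Lower-case the search term in application code and let the
    # database lower-case the column, so a plain LIKE is enough.
    pattern = term.lower() + "%"
    rows = conn.execute(
        "SELECT name FROM contacts WHERE lower(name) LIKE ? ORDER BY name",
        (pattern,),
    )
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
# SQLite's LIKE ignores ASCII case by default; this pragma asks it to
# behave like the case-sensitive LIKE found in other databases.
conn.execute("PRAGMA case_sensitive_like = ON")
conn.execute("CREATE TABLE contacts (name TEXT)")
conn.executemany(
    "INSERT INTO contacts (name) VALUES (?)",
    [("abcd",), ("ABCD",), ("xyz",)],
)

print(prefix_search(conn, "Abc"))
```
</pre>
<p>This sketch does not escape any <code>%</code> or <code>_</code> characters in the search term, which a real search box would need to consider.</p>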
<p>Certain schools of thought will say that this should have been fixed before the application was released. However, I&#8217;m a firm believer in releasing often and learning from one&#8217;s own mistakes (and those of the people around you). The offending query wasn&#8217;t causing major issues in itself, only in certain edge cases, and we caught it before it became a real problem. Fixing it has given us a reasonable speed boost in the application for not a lot of work.</p>
<p>So the moral of the story is: once in a while, check the queries your framework is building and how your database plans their execution. You might be able to implement some low-cost speed boosts for your application.</p>
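<p>How you ask for a query plan differs between database servers. As a rough illustration only, this Python sketch uses SQLite&#8217;s <code>EXPLAIN QUERY PLAN</code> statement via the standard-library <code>sqlite3</code> module. The table, the index and the <code>query_plan</code> helper are invented for the example, and the exact wording of the plan varies between SQLite versions:</p>
<pre>
```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
sql = "SELECT id FROM contacts WHERE name = ?"

# Compare the plan before and after adding an index on the searched column.
before = query_plan(conn, sql, ("abcd",))
conn.execute("CREATE INDEX contacts_name ON contacts (name)")
after = query_plan(conn, sql, ("abcd",))

print("without index:", before)
print("with index:", after)
```
</pre>
<p>Comparing the two plans shows whether the database scans the whole table or uses the index, which is the kind of check that would have revealed our slow comparison earlier.</p>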
]]></content:encoded>
			<wfw:commentRss>http://carsonified.com/blog/dev/databases/speed-up-your-web-app-by-1000-with-1-line-of-sql/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Should you go Beyond Relational Databases?</title>
		<link>http://carsonified.com/blog/dev/should-you-go-beyond-relational-databases/</link>
		<comments>http://carsonified.com/blog/dev/should-you-go-beyond-relational-databases/#comments</comments>
		<pubDate>Wed, 24 Jun 2009 21:17:04 +0000</pubDate>
		<dc:creator>Martin Kleppmann</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[Features]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[AllegroGraph]]></category>
		<category><![CDATA[BigTable]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[DirectedEdge]]></category>
		<category><![CDATA[FreeBase]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Hypertable]]></category>
		<category><![CDATA[Jackrabbit]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[Neo4j]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Project Voldemort]]></category>
		<category><![CDATA[Sesame]]></category>
		<category><![CDATA[Skynet]]></category>
		<category><![CDATA[ThruDB]]></category>

		<guid isPermaLink="false">http://thinkvitamin.com/?p=1595</guid>
		<description><![CDATA[By <strong>Martin Kleppmann</strong><br />Relational databases, such as MySQL, PostgreSQL and various commercial products, have served us well for many years. Lately, however, there has been a lot of discussion on whether the relational model is reaching the end of its life-span, and what may come after it.
Should you care? Which database technology should you be using?
Of course the [...]]]></description>
			<content:encoded><![CDATA[<p>Relational databases, such as MySQL, PostgreSQL and various commercial products, have served us well for many years. Lately, however, there has been a lot of discussion on whether the relational model is reaching the end of its life-span, and what may come after it.</p>
<p>Should you care? Which database technology should you be using?</p>
<p>Of course the answer is <em>&#8220;it depends&#8221;</em>, but that&#8217;s not very helpful. Let me ask you a few questions to help you figure out which technology is appropriate to <em>your</em> particular application. Then I can give a few pointers so that you can find out more.</p>
<p>First of all, calm down. Chances are that your current database is perfectly fine for now. But you might want to keep an eye open in case you notice some symptoms which show that you are pushing the relational model to its limits. Some symptoms relate to the <em>structure</em> of your data:</p>
<ul>
<li>Do you have tables with lots of columns, only a few of which are actually used by any particular row?</li>
<li>Do you have &#8220;attribute&#8221; tables where each row is a triple of <code>(foreign key to row in another table, attribute name, attribute value)</code> and you need ugly joins in your queries to deal with those tables?</li>
<li>Have you given up on using columns for structured data, instead just serialising it (to JSON, YAML, XML or whatever) and dumping the string into your database?</li>
<li>Does your schema have a large number of many-to-many join tables or tree-like structures (a foreign key that refers to a different row in the same table)?</li>
<li>Do you find yourself frequently needing to make schema changes so that you can properly represent incoming data?</li>
</ul>
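<p>The &#8220;attribute&#8221; table symptom from the list above is easier to recognise with an example. This is a small sketch in Python with the standard-library <code>sqlite3</code> module; the <code>products</code> and <code>attributes</code> tables and their contents are invented for illustration:</p>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attributes (product_id INTEGER, name TEXT, value TEXT);
    INSERT INTO products VALUES (1, 'kettle'), (2, 'novel');
    INSERT INTO attributes VALUES
        (1, 'colour', 'red'), (1, 'voltage', '230'),
        (2, 'author', 'Jane Austen');
""")

# Every attribute you want back as a column costs one more join.
rows = conn.execute("""
    SELECT p.name, colour.value, voltage.value
    FROM products p
    JOIN attributes colour
      ON colour.product_id = p.id AND colour.name = 'colour'
    JOIN attributes voltage
      ON voltage.product_id = p.id AND voltage.name = 'voltage'
""").fetchall()

print(rows)
```
</pre>
<p>Only the kettle comes back, because the novel has neither attribute. Asking for a third attribute would mean a third join.</p>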
<p>Other symptoms relate to the <em>scalability</em> of your system:</p>
<ul>
<li>Are you reaching the limit of the write capacity of a single database server? (If read capacity is your problem, you should set up master-slave replication. Also make sure that you have first given your database the fattest hardware you can afford, you have optimised your queries, and your schema cannot easily be split into shards.)</li>
<li>Is your amount of data greater than a single server can sensibly hold?</li>
<li>Are your page loads being slowed down unacceptably by background batch processes overwhelming the database?</li>
</ul>
<p>In my opinion, too much emphasis is often placed on scalability, despite it being a very remote problem on most projects. It&#8217;s understandable: large-scale computing systems are sexy, and everybody likes to think they are building a service which is going to be massively popular. More often than not, though, developers would be better off focussing on their customers&#8217; needs, and solving the scaling problem only if it actually arises.</p>
<p>That said, there is one more reason to consider non-relational databases: they are <em>fashionable</em>. It sounds like a silly idea to base a technical decision on fashion, but remember the human aspects of managing software projects. Great developers generally want to work with cool people in a cool environment using cool technology. That means if you want to hire great developers, providing all this coolness gives you a better chance of getting the best people to work with you. If you want to get on <a href="http://news.ycombinator.com/">Hacker News</a>, cool technology is also the way to go. Fashion shouldn&#8217;t be your primary reason, but all else being equal, you can probably err on the side of coolness. Don&#8217;t forget the cool people and the cool environment though. And now I&#8217;ll stop saying cool &#8212; it&#8217;s not very cool.</p>
<h3>Document databases and BigTable</h3>
<p>The <a href="http://labs.google.com/papers/bigtable.html">BigTable paper</a> describes how Google developed their own massively scalable database for internal use, as the basis for several of their services. The data model is quite different from relational databases: columns don&#8217;t need to be pre-defined, and rows can be added with any set of columns. Empty columns are not stored at all.</p>
<p>BigTable inspired many developers to write their own implementations of this data model; amongst the most popular are <a href="http://hadoop.apache.org/hbase/">HBase</a>, <a href="http://hypertable.org/">Hypertable</a> and <a href="http://incubator.apache.org/cassandra/">Cassandra</a>. The lack of a pre-defined schema can make these databases attractive in applications where the attributes of objects are not known in advance, or change frequently.</p>
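<p>As a very rough mental model of this schema-less layout, here is a toy sketch in Python. It only mimics the idea of rows holding whichever columns they were given; the <code>put</code> and <code>get</code> helpers are hypothetical and it ignores everything that makes the real systems interesting, such as distribution across servers:</p>
<pre>
```python
# Each row key maps to only the columns that row actually has.
table = {}

def put(row_key, column, value):
    """Store one cell; rows and columns appear on first write."""
    table.setdefault(row_key, {})[column] = value

def get(row_key, column, default=None):
    """Read one cell; a column the row never stored is simply absent."""
    return table.get(row_key, {}).get(column, default)

put("user:1", "name", "Alice")
put("user:1", "email", "alice@example.com")
put("user:2", "name", "Bob")
put("user:2", "twitter", "@bob")

print(sorted(table["user:1"]))
print(get("user:2", "email"))
```
</pre>
<p>The second user has no <code>email</code> entry at all, rather than an empty one, and gained a <code>twitter</code> column without any schema change.</p>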
<p><em>Document databases</em> have a related data model (although the way they handle concurrency and distributed servers can be quite different): a BigTable row with its arbitrary number of columns/attributes corresponds to a <em>document</em> in a document database, which is typically a tree of objects containing attribute values and lists, often with a mapping to JSON or XML. Open source document databases include <a href="http://project-voldemort.com/">Project Voldemort</a>, <a href="http://couchdb.apache.org/">CouchDB</a>, <a href="http://www.mongodb.org/">MongoDB</a>, <a href="http://code.google.com/p/thrudb/">ThruDB</a> and <a href="http://jackrabbit.apache.org/">Jackrabbit</a>.</p>
<p>How is this different from just dumping JSON strings into MySQL? Document databases can actually work with the <em>structure</em> of the documents, for example extracting, indexing, aggregating and filtering based on attribute values within the documents. Alternatively you could of course <a href="http://bret.appspot.com/entry/how-friendfeed-uses-mysql">build the attribute indexing yourself</a>, but I wouldn&#8217;t recommend that unless it makes working with your legacy code easier.</p>
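<p>To make &#8220;working with the structure&#8221; a little more concrete, here is a minimal sketch in Python of indexing documents by one attribute. It is not how any particular document database does it; the sample documents and the <code>build_index</code> function are made up for illustration:</p>
<pre>
```python
from collections import defaultdict

documents = {
    "doc1": {"type": "post", "author": "martin", "tags": ["databases", "nosql"]},
    "doc2": {"type": "post", "author": "jake", "tags": ["sql"]},
    "doc3": {"type": "comment", "author": "martin"},
}

def build_index(docs, attribute):
    """Map each value of one attribute to the ids of documents that have it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if attribute in doc:
            values = doc[attribute]
            # Index every element of a list-valued attribute separately.
            if not isinstance(values, list):
                values = [values]
            for value in values:
                index[value].add(doc_id)
    return index

by_author = build_index(documents, "author")
by_tag = build_index(documents, "tags")

print(sorted(by_author["martin"]))
print(sorted(by_tag["sql"]))
```
</pre>
<p>A JSON string dumped into a text column gives the database nothing like this to work with, since it cannot see inside the string.</p>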
<p>The big limitation of BigTables and document databases is that most implementations cannot perform joins or transactions spanning several rows or documents. This restriction is deliberate, because it allows the database to do automatic partitioning, which can be important for scaling &#8212; see the section on distributed key-value stores below. If the structure of your data is lots of independent documents, this is not a problem &#8212; but if your data fits nicely into a relational model and you need joins, please don&#8217;t try to force it into a document model.</p>
<h3>Graph databases</h3>
<p>Graph databases live at the opposite end of the spectrum. While document databases are good for storing data which is structured in the form of lots of independent documents, graph databases focus on the <em>relationships</em> between items &#8212; a better fit for highly interconnected data models.</p>
<p>Plain SQL joins cannot query <em>transitive</em> relationships, i.e. variable-length chains of joins which continue until some condition is reached, unless your database supports the recursive queries (<code>WITH RECURSIVE</code>) added in SQL:1999. Graph databases, on the other hand, are optimised precisely for this kind of data. Look out for these symptoms indicating that your data would better fit into a graph model:</p>
<ul>
<li>you find yourself writing long chains of joins (join table A to B, B to C, C to D) in your queries;</li>
<li>you are writing loops of queries in your application in order to follow a chain of relationships (particularly when you don&#8217;t know in advance how long that chain is going to be);</li>
<li>you have lots of many-to-many joins or tree-like data structures;</li>
<li>your data is already in a graph form (e.g. information about who is friends with whom in a social network).</li>
</ul>
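<p>A transitive query of the kind described above can be sketched in a few lines of Python as a breadth-first traversal. The friendship data and the <code>reachable</code> function are invented for the example; a graph database would run this sort of traversal for you, against data far too large to hold in a dictionary:</p>
<pre>
```python
from collections import deque

friends = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
    "erin": [],
}

def reachable(graph, start):
    """Return everyone linked to start by a friendship chain of any length."""
    seen = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    seen.discard(start)
    return seen

print(sorted(reachable(friends, "alice")))
```
</pre>
<p>The length of the chain is not known in advance: the loop simply continues until no new people turn up, which is exactly what a fixed number of joins cannot do.</p>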
<p>There is less choice in graph databases than there is in document databases: <a href="http://neo4j.org/">Neo4j</a>, <a href="http://www.franz.com/agraph/allegrograph/">AllegroGraph</a> and <a href="http://www.openrdf.org/">Sesame</a> (which typically uses MySQL or PostgreSQL as storage back-end) are ones to look at. <a href="http://blog.freebase.com/2008/04/09/a-brief-tour-of-graphd/">FreeBase</a> and <a href="http://blog.directededge.com/2009/02/27/on-building-a-stupidly-fast-graph-database/">DirectedEdge</a> have developed graph databases for their internal use.</p>
<p>Graph databases are often associated with the semantic web and RDF datastores, which is one of the applications they are used for. I actually believe that many other applications&#8217; data would also be well represented in graphs. However, as before, don&#8217;t try to force data into a graph if it fits better into tables or documents.</p>
<h3>MapReduce</h3>
<p>Going on a slight tangent: if background batch processing is your problem and you are not aware of the <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce model</a>, you should be. Popularised by <a href="http://labs.google.com/papers/mapreduce.html">another Google paper</a>, MapReduce is a way of writing batch processing jobs without having to worry about infrastructure. Different databases lend themselves more or less well to MapReduce &#8212; something to keep in mind when choosing a database to fit your needs.</p>
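<p>If you have not met the model before, the classic word-count example gives the flavour. This is a single-process sketch in Python with hypothetical function names; a real framework would run the map and reduce steps on many machines and handle the grouping in between:</p>
<pre>
```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)

documents = ["the cat sat", "the dog sat down"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(counts)
```
</pre>
<p>Because each map call sees only one document and each reduce call sees only one key, the work can be split up without the job author having to think about infrastructure.</p>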
<p><a href="http://hadoop.apache.org/">Hadoop</a> is the big one amongst the open MapReduce implementations, and <a href="http://skynet.rubyforge.org/">Skynet</a> and <a href="http://discoproject.org/">Disco</a> are also worth looking at. <a href="http://couchdb.apache.org/">CouchDB</a> also includes some MapReduce ideas on a smaller scale.</p>
<h3>Distributed key-value stores</h3>
<p>A key-value store is a very simple concept, much like a hash table: you can retrieve an item based on its key, you can insert a key/value pair, and you can delete a key/value pair. The value can just be an opaque list of bytes, or might be a structured document (most of the document databases and BigTable implementations above can also be considered to be key-value stores).</p>
<p>Document databases, graph databases and MapReduce introduce new data models and new ways of thinking which can be useful even in a small-scale application; you don&#8217;t need to be Google or Facebook to benefit from them. Distributed key-value stores, on the other hand, are really just about scalability. They can scale to truly vast amounts of data &#8212; much more than a single server could hold.</p>
<p>Distributed databases can <em>transparently partition and replicate</em> your data across many machines in a cluster. You don&#8217;t need to figure out a sharding scheme to decide on which server you can find a particular piece of data; the database can locate it for you. If one server dies, no problem &#8212; others can immediately take over. If you need more resources, just add servers to the cluster, and the database will automatically give them a share of the load and the data.</p>
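<p>The idea of letting the database decide where a key lives can be sketched as follows. This toy <code>Cluster</code> class in Python is an assumption-laden illustration, not the algorithm of any particular product: it places each key by hashing it, and keeps a copy on a second node. Real systems use more careful placement schemes so that adding or losing a server does not move most of the keys:</p>
<pre>
```python
import hashlib

class Cluster:
    """Toy cluster that places each key on `replicas` of its nodes."""

    def __init__(self, node_names, replicas=2):
        self.nodes = {name: {} for name in node_names}
        self.replicas = replicas

    def _owners(self, key):
        # Hash the key to pick a starting node, then take its successors.
        names = sorted(self.nodes)
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(names)
        count = min(self.replicas, len(names))
        return [names[(start + i) % len(names)] for i in range(count)]

    def put(self, key, value):
        for name in self._owners(key):
            self.nodes[name][key] = value

    def get(self, key):
        # Any replica that holds the key can answer.
        for name in self._owners(key):
            if key in self.nodes[name]:
                return self.nodes[name][key]
        return None

cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("user:1", "Alice")
holders = [name for name, data in cluster.nodes.items() if "user:1" in data]
print(holders, cluster.get("user:1"))
```
</pre>
<p>The caller never names a server: it only supplies the key, and the cluster works out which two of the three nodes hold the value.</p>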
<p>When choosing a key-value store you need to decide whether it should be optimised for low latency (for lightning-fast data access during your request-response cycle) or for high throughput (which is what you need for batch processing jobs).</p>
<p>Other than the BigTables and document databases above, <a href="http://code.google.com/p/scalaris/">Scalaris</a>, <a href="http://github.com/cliffmoon/dynomite/tree/master">Dynomite</a> and <a href="http://github.com/tuulos/ringo/tree/master">Ringo</a> provide certain data consistency guarantees while taking care of partitioning and distributing the dataset. <a href="http://memcachedb.org/">MemcacheDB</a> and <a href="http://tokyocabinet.sourceforge.net/">Tokyo Cabinet</a> (with <a href="http://tokyocabinet.sourceforge.net/tyrantdoc/">Tokyo Tyrant</a> for network service and <a href="http://opensource.plurk.com/LightCloud/">LightCloud</a> to make it distributed) focus on latency.</p>
<p>The caveat about limited transactions and joins applies even more strongly for distributed databases. Different implementations take different approaches, but in general, if you need to read several items, manipulate them in some way and then write them back, there is no guarantee that you will end up in a consistent state immediately (although many implementations try to become <em>eventually</em> consistent by resolving write conflicts or using distributed transaction protocols; see the algorithm of <a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html">Amazon&#8217;s Dynamo</a> for an example). You should therefore only use these databases if your data items are independent, and if availability and performance are more important than <a href="http://en.wikipedia.org/wiki/ACID">ACID properties</a>. For more information, read about <a href="http://www.julianbrowne.com/article/viewer/brewers-cap-theorem">Brewer&#8217;s CAP Theorem</a>, which states that amongst <strong>C</strong>onsistency, <strong>A</strong>vailability and <strong>P</strong>artition tolerance, you can only choose two, and no database will ever be able to get around that fact.</p>
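<p>The read-manipulate-write problem can be shown with a deliberately naive sketch in Python. The <code>Replica</code> class and the &#8220;last write wins&#8221; synchronisation are assumptions chosen for brevity; real implementations use more sophisticated conflict resolution, but the underlying hazard is the same:</p>
<pre>
```python
class Replica:
    """One copy of the data; writes reach other replicas only on sync."""

    def __init__(self):
        self.data = {}

def read_modify_write(replica, key, delta):
    # Read, change and write back with no transaction around the steps.
    current = replica.data.get(key, 0)
    replica.data[key] = current + delta

def sync_last_write_wins(source, target):
    # Crude conflict resolution: the source's copy overwrites the target's.
    target.data.update(source.data)

a, b = Replica(), Replica()
read_modify_write(a, "page_views", 1)   # client 1 talks to replica a
read_modify_write(b, "page_views", 1)   # client 2 talks to replica b
sync_last_write_wins(a, b)
sync_last_write_wins(b, a)

print(a.data, b.data)
```
</pre>
<p>Both replicas end up agreeing on a count of 1 even though two increments were made. The system has become consistent, but one update was silently lost along the way.</p>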
<p>Richard Jones, co-founder of Last.fm, has written up an excellent <a href="http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/">overview of distributed key-value stores</a>. Also <a href="http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php">Tony Bain gives an introduction</a> to the conceptual differences between relational databases and key-value stores, and recently there was <a href="http://blog.oskarsson.nu/2009/06/nosql-debrief.html">a NOSQL event in San Francisco</a> at which a number of different non-relational databases were presented.</p>
<p>Distributed systems are hard&#8230; really hard. I suggest that you use them only if you really need the scaling aspects they offer (or just for fun outside of a production environment).</p>
<h3>Closing remarks</h3>
<p>In this article I have concentrated on open source projects. If you are willing to bind yourself to a particular vendor/hosting provider, <a href="http://code.google.com/appengine/docs/python/datastore/">Google&#8217;s Datastore</a>, <a href="http://aws.amazon.com/simpledb/">Amazon SimpleDB</a>, <a href="http://msdn.microsoft.com/en-us/library/dd179355.aspx">Windows Azure Storage Services</a> or <a href="http://wiki.developerforce.com/index.php/Database_Services">Force.com</a> might be worth considering. They are good technologies, but keep in mind the business risk of potential lock-in.</p>
<p>I can&#8217;t pass judgement on particular projects&#8217; suitability for particular purposes. There is some very clever software out there, but also some very new and unstable software. If you are considering any of these projects, you should do your own research:</p>
<ul>
<li>look around their websites for a list of sites using the database in production (and for which aspect of their service they use it);</li>
<li>check if they have a lively open source community, in case the original developer loses interest and stops maintaining the software;</li>
<li>try to find some benchmarks (though beware that many benchmarks published on the web are methodologically flawed and/or outdated, so if you are serious about it you should run your own tests, using data which matches your application&#8217;s characteristics).</li>
</ul>
<p>As with any fashionable topic, there are many people with strong opinions, both positive and negative; don&#8217;t let yourself be put off by them. I hope I&#8217;ve given you an overview of the kind of things you can do with different types of databases so that you can choose the right one for your application.</p>
<p>Photo Credit: <a href="http://www.flickr.com/photos/vermininc">flickr.com/photos/vermininc</a></p>
]]></content:encoded>
			<wfw:commentRss>http://carsonified.com/blog/dev/should-you-go-beyond-relational-databases/feed/</wfw:commentRss>
		<slash:comments>32</slash:comments>
		</item>
	</channel>
</rss>
