Future of Web Apps Miami - Conference 22-24 February 2010

Archive: MySQL

24 June 2009

Relational databases, such as MySQL, PostgreSQL and various commercial products, have served us well for many years. Lately, however, there has been a lot of discussion on whether the relational model is reaching the end of its life-span, and what may come after it.

Should you care? Which database technology should you be using?

Of course the answer is “it depends”, but that’s not very helpful. Let me ask you a few questions to help you figure out which technology is appropriate to your particular application. Then I can give a few pointers so that you can find out more.

First of all, calm down. Chances are that your current database is perfectly fine for now. But you might want to keep an eye open in case you notice some symptoms which show that you are pushing the relational model to its limits. Some symptoms relate to the structure of your data:

  • Do you have tables with lots of columns, only a few of which are actually used by any particular row?
  • Do you have “attribute” tables where each row is a triple of (foreign key to row in another table, attribute name, attribute value) and you need ugly joins in your queries to deal with those tables?
  • Have you given up on using columns for structured data, instead just serialising it (to JSON, YAML, XML or whatever) and dumping the string into your database?
  • Does your schema have a large number of many-to-many join tables or tree-like structures (a foreign key that refers to a different row in the same table)?
  • Do you find yourself frequently needing to make schema changes so that you can properly represent incoming data?
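To make the "attribute table" symptom concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names and data are hypothetical. Every attribute you want back as a column costs another join against the same table:

```python
import sqlite3

# In-memory database with a hypothetical entity-attribute-value layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_attributes (
        product_id INTEGER REFERENCES products(id),
        attr_name  TEXT,
        attr_value TEXT
    );
    INSERT INTO products VALUES (1, 'T-shirt'), (2, 'Mug');
    INSERT INTO product_attributes VALUES
        (1, 'colour', 'red'), (1, 'size', 'M'),
        (2, 'colour', 'white');
""")

# One LEFT JOIN per attribute we want to see as a column.
rows = conn.execute("""
    SELECT p.name, c.attr_value AS colour, s.attr_value AS size
    FROM products p
    LEFT JOIN product_attributes c
           ON c.product_id = p.id AND c.attr_name = 'colour'
    LEFT JOIN product_attributes s
           ON s.product_id = p.id AND s.attr_name = 'size'
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)  # ('T-shirt', 'red', 'M') then ('Mug', 'white', None)
```

With a dozen attributes, the query grows to a dozen self-joins, which is usually the point at which a schemaless model starts to look attractive.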

Other symptoms relate to the scalability of your system:

  • Are you reaching the limit of the write capacity of a single database server? (If read capacity is your problem, you should set up master-slave replication. Also make sure that you have first given your database the fattest hardware you can afford, you have optimised your queries, and your schema cannot easily be split into shards.)
  • Is your amount of data greater than a single server can sensibly hold?
  • Are your page loads being slowed down unacceptably by background batch processes overwhelming the database?

In my opinion, too much emphasis is often placed on scalability, even though it is a very remote problem on most projects. It’s understandable: large-scale computing systems are sexy, and everybody likes to think they are building a service which is going to be massively popular. But more often than not, developers would be better off focussing on their customers’ needs, and solving the scaling problem only if it actually arises.

That said, there is one more reason to consider non-relational databases: they are fashionable. It sounds like a silly idea to base a technical decision on fashion, but remember the human aspects of managing software projects. Great developers generally want to work with cool people in a cool environment using cool technology. That means if you want to hire great developers, providing all this coolness gives you a better chance of getting the best people to work with you. If you want to get on Hacker News, cool technology is also the way to go. Fashion shouldn’t be your primary reason, but all else being equal, you can probably err on the side of coolness. Don’t forget the cool people and the cool environment though. And now I’ll stop saying cool — it’s not very cool.

Document databases and BigTable

The BigTable paper describes how Google developed their own massively scalable database for internal use, as the basis for several of their services. The data model is quite different from relational databases: columns don’t need to be pre-defined, and rows can be added with any set of columns. Empty columns are not stored at all.

BigTable inspired many developers to write their own implementations of this data model; amongst the most popular are HBase, Hypertable and Cassandra. The lack of a pre-defined schema can make these databases attractive in applications where the attributes of objects are not known in advance, or change frequently.

Document databases have a related data model (although the way they handle concurrency and distributed servers can be quite different): a BigTable row with its arbitrary number of columns/attributes corresponds to a document in a document database, which is typically a tree of objects containing attribute values and lists, often with a mapping to JSON or XML. Open source document databases include Project Voldemort, CouchDB, MongoDB, ThruDB and Jackrabbit.

How is this different from just dumping JSON strings into MySQL? Document databases can actually work with the structure of the documents, for example extracting, indexing, aggregating and filtering based on attribute values within the documents. Alternatively you could of course build the attribute indexing yourself, but I wouldn’t recommend that unless it makes working with your legacy code easier.
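As a rough illustration, the sketch below parses hypothetical JSON documents and maintains a secondary index on a nested attribute. That is approximately the work a document database does for you, and what you would otherwise have to build yourself around a MySQL text column. It is an in-memory toy, not how any particular database stores its indexes:

```python
import json
from collections import defaultdict

# Hypothetical documents, as they might arrive serialised to JSON.
raw_documents = [
    '{"id": 1, "type": "post", "tags": ["mysql", "nosql"], "author": {"name": "Ann"}}',
    '{"id": 2, "type": "post", "tags": ["css"], "author": {"name": "Ben"}}',
    '{"id": 3, "type": "comment", "author": {"name": "Ann"}}',
]

documents = {}
# Secondary index: author name -> set of document ids.
by_author = defaultdict(set)

for raw in raw_documents:
    doc = json.loads(raw)
    documents[doc["id"]] = doc
    # Index a nested attribute; documents without it are simply skipped.
    author = doc.get("author", {}).get("name")
    if author is not None:
        by_author[author].add(doc["id"])


def find_by_author(name, doc_type=None):
    """Look up documents via the index, then filter on another attribute."""
    ids = sorted(by_author.get(name, ()))
    return [documents[i] for i in ids
            if doc_type is None or documents[i].get("type") == doc_type]


print([d["id"] for d in find_by_author("Ann")])          # [1, 3]
print([d["id"] for d in find_by_author("Ann", "post")])  # [1]
```

Note that the third document has no "tags" field at all; nothing in the code needs a schema change to accept it.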

The big limitation of BigTables and document databases is that most implementations cannot perform joins or transactions spanning several rows or documents. This restriction is deliberate, because it allows the database to do automatic partitioning, which can be important for scaling — see the section on distributed key-value stores below. If the structure of your data is lots of independent documents, this is not a problem — but if your data fits nicely into a relational model and you need joins, please don’t try to force it into a document model.

Graph databases

Graph databases live at the opposite end of the spectrum. While document databases are good for storing data which is structured in the form of lots of independent documents, graph databases focus on the relationships between items — a better fit for highly interconnected data models.

Plain SQL joins cannot query transitive relationships, i.e. variable-length chains of joins which continue until some condition is reached. Graph databases, on the other hand, are optimised precisely for this kind of data. Look out for these symptoms indicating that your data would better fit into a graph model:

  • you find yourself writing long chains of joins (join table A to B, B to C, C to D) in your queries;
  • you are writing loops of queries in your application in order to follow a chain of relationships (particularly when you don’t know in advance how long that chain is going to be);
  • you have lots of many-to-many joins or tree-like data structures;
  • your data is already in a graph form (e.g. information about who is friends with whom in a social network).
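The "loops of queries" symptom is essentially a breadth-first search performed one query at a time. Here is a minimal in-memory sketch over a hypothetical friendship graph, finding everyone within a given number of hops. In a relational setup, each pass of the while loop would be another round trip to the database:

```python
from collections import deque

# Hypothetical social graph: person -> set of direct friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob", "erin"},
    "erin": {"dave"},
}


def within_hops(graph, start, max_hops):
    """Return {person: distance} for everyone reachable in at most max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in distances:  # visit each person only once
                distances[friend] = distances[person] + 1
                queue.append(friend)
    del distances[start]  # the starting person is not their own friend
    return distances


print(sorted(within_hops(friends, "alice", 2).items()))
# [('bob', 1), ('carol', 1), ('dave', 2)]
```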

There is less choice in graph databases than there is in document databases: Neo4j, AllegroGraph and Sesame (which typically uses MySQL or PostgreSQL as storage back-end) are ones to look at. FreeBase and DirectedEdge have developed graph databases for their internal use.

Graph databases are often associated with the semantic web and RDF datastores, which is one of the applications they are used for. I actually believe that many other applications’ data would also be well represented in graphs. However, as before, don’t try to force data into a graph if it fits better into tables or documents.

MapReduce

Going on a slight tangent: if background batch processing is your problem and you are not aware of the MapReduce model, you should be. Popularised by another Google paper, MapReduce is a way of writing batch processing jobs without having to worry about infrastructure. Different databases lend themselves more or less well to MapReduce — something to keep in mind when choosing a database to fit your needs.

Hadoop is the big one amongst the open MapReduce implementations, and Skynet and Disco are also worth looking at. CouchDB also includes some MapReduce ideas on a smaller scale.
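To show the shape of the model, here is a single-process sketch of the classic word-count job, split into map, shuffle and reduce phases. The function names are illustrative only; frameworks such as Hadoop run the same phases across many machines and take care of distribution and failures:

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    pairs = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["the quick fox", "the lazy dog", "The end"])
print(counts["the"])  # 3
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be spread over as many machines as you have, which is what makes the model suit batch jobs.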

Distributed key-value stores

A key-value store is a very simple concept, much like a hash table: you can retrieve an item based on its key, you can insert a key/value pair, and you can delete a key/value pair. The value can just be an opaque list of bytes, or might be a structured document (most of the document databases and BigTable implementations above can also be considered to be key-value stores).

Document databases, graph databases and MapReduce introduce new data models and new ways of thinking which can be useful even in a small-scale application; you don’t need to be Google or Facebook to benefit from them. Distributed key-value stores, on the other hand, are really just about scalability. They can scale to truly vast amounts of data — much more than a single server could hold.

Distributed databases can transparently partition and replicate your data across many machines in a cluster. You don’t need to figure out a sharding scheme to decide on which server you can find a particular piece of data; the database can locate it for you. If one server dies, no problem — others can immediately take over. If you need more resources, just add servers to the cluster, and the database will automatically give them a share of the load and the data.
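One common way for a distributed store to locate data without a hand-written sharding scheme is consistent hashing, used for example in Amazon’s Dynamo design. The sketch below is a simplified, hypothetical version (MD5 positions on a ring, with virtual nodes to even out the load); real systems add replication and failure handling on top. Adding a server should move only a fraction of the keys, and only onto the new server:

```python
import bisect
import hashlib


def _position(value):
    """Map a string to a point on the ring (a large integer)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per server smooth out the load
        self._ring = []       # sorted list of (position, server)
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{server}#{i}"), server))

    def server_for(self, key):
        """Walk clockwise from the key's position to the next server point."""
        if not self._ring:
            raise LookupError("no servers in the ring")
        index = bisect.bisect(self._ring, (_position(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["db1", "db2", "db3"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.server_for(k) for k in keys}

ring.add_server("db4")
moved = sum(1 for k in keys if ring.server_for(k) != before[k])
print(f"{moved} of {len(keys)} keys moved")  # expect roughly a quarter
```

With a naive `hash(key) % number_of_servers` scheme, by contrast, adding a fourth server would reassign most keys.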

When choosing a key-value store you need to decide whether it should be optimised for low latency (for lightning-fast data access during your request-response cycle) or for high throughput (which is what you need for batch processing jobs).

Other than the BigTables and document databases above, Scalaris, Dynomite and Ringo provide certain data consistency guarantees while taking care of partitioning and distributing the dataset. MemcacheDB and Tokyo Cabinet (with Tokyo Tyrant for network service and LightCloud to make it distributed) focus on latency.

The caveat about limited transactions and joins applies even more strongly for distributed databases. Different implementations take different approaches, but in general, if you need to read several items, manipulate them in some way and then write them back, there is no guarantee that you will end up in a consistent state immediately (although many implementations try to become eventually consistent by resolving write conflicts or using distributed transaction protocols; see the algorithm of Amazon’s Dynamo for an example). You should therefore only use these databases if your data items are independent, and if availability and performance are more important than ACID properties. For more information, read about Brewer’s CAP Theorem, which states that amongst Consistency, Availability and Partition tolerance, you can only choose two, and no database will ever be able to get around that fact.
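The Dynamo paper tracks versions with vector clocks, so that conflicting writes can be detected rather than silently overwritten. Here is a minimal sketch of the comparison, assuming a clock is a dict mapping a replica name to a counter (the replica names are hypothetical). If neither clock has seen all of the other’s events, the writes were concurrent and something has to reconcile them:

```python
def descends(a, b):
    """True if clock a has seen every event that clock b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"  # concurrent writes: neither saw the other


def increment(clock, node):
    """Return a copy of clock with node's counter bumped after a local write."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


base = increment({}, "replica_a")       # {'replica_a': 1}
write_1 = increment(base, "replica_a")  # written via replica A
write_2 = increment(base, "replica_b")  # written via replica B meanwhile
print(compare(write_1, base))     # a is newer
print(compare(write_1, write_2))  # conflict
```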

Richard Jones, co-founder of Last.fm, has written up an excellent overview of distributed key-value stores. Also Tony Bain gives an introduction to the conceptual differences between relational databases and key-value stores, and recently there was a NOSQL event in San Francisco at which a number of different non-relational databases were presented.

Distributed systems are hard… really hard. I suggest that you use them only if you really need the scaling aspects they offer (or just for fun outside of a production environment).

Closing remarks

In this article I have concentrated on open source projects. If you are willing to bind yourself to a particular vendor/hosting provider, Google’s Datastore, Amazon SimpleDB, Windows Azure Storage Services or Force.com might be worth considering. They are good technologies, but keep in mind the business risk of potential lock-in.

I can’t make judgements about particular projects’ suitability for particular purposes. There is some very clever software out there, but also some very new and unstable software. If you are considering any of them, you should do your own research:

  • look around their websites for a list of sites using the database in production (and for which aspect of their service they use it);
  • check if they have a lively open source community, in case the original developer loses interest and stops maintaining the software;
  • try to find some benchmarks (though beware that many benchmarks published on the web are methodologically flawed and/or outdated, so if you are serious about it you should run your own tests, using data which matches your application’s characteristics).

As with any fashionable topic, there are many people with strong opinions, both positive and negative; don’t let yourself be put off by them. I hope I’ve given you an overview of the kind of things you can do with different types of databases so that you can choose the right one for your application.

Photo Credit: flickr.com/photos/vermininc

10 June 2009

We’ve compiled six useful tips for all you web designers and developers out there. They cover various topics including: accessibility, SQL, web developer plugins for Firefox, HTML emails, design and jQuery.

Feel free to disagree or add your own in the comments below. If you’d like to submit a tip to be considered for future articles, just head over to Tipster and add your own.

Alt Tags and Screen Readers

By: James Fenton

I can’t claim these tips as my own: my web accessibility hero, Bim Egan of the RNIB, recently gave me a few simple tips regarding alt tags and screen readers.

  1. Keep your alt tags as concise as possible.
  2. Don’t just describe exactly what is going on in an image, describe what message it is trying to convey.
  3. It is OK to leave the alt attribute empty (alt="") for images that carry no meaning, as describing them can be more of a hindrance to blind users than a help.
  4. When an image is essentially just style, use it as a background-image.

Spending a few hours watching (and listening to) a blind user on the web is a mind-blowing experience and will totally change how you approach accessibility.

Shorter SQL statements by abbreviating table prefixes

By: Joris Heyndrickx

Instead of writing:

SELECT books.title, books.short, books.releasedate, authors.firstname, authors.lastname
FROM books, authors
WHERE books.author_id = authors.id AND authors.id = 21

You can write:

SELECT b.title, b.short, b.releasedate, a.firstname, a.lastname
FROM books b, authors a
WHERE b.author_id = a.id AND a.id = 21
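Aliases only change the spelling of the query, not its result. If you want to check that, here is a quick sketch that runs both versions against a throwaway in-memory SQLite database from Python. The rows are made up; only the table and column names come from the tip:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY, title TEXT, short TEXT,
        releasedate TEXT, author_id INTEGER
    );
    INSERT INTO authors VALUES (21, 'Jane', 'Doe'), (22, 'John', 'Roe');
    INSERT INTO books VALUES
        (1, 'Book One', 'First summary', '2001-01-01', 21),
        (2, 'Book Two', 'Second summary', '2002-02-02', 22);
""")

long_form = """
    SELECT books.title, books.short, books.releasedate, authors.firstname, authors.lastname
    FROM books, authors
    WHERE books.author_id = authors.id AND authors.id = 21
"""
short_form = """
    SELECT b.title, b.short, b.releasedate, a.firstname, a.lastname
    FROM books b, authors a
    WHERE b.author_id = a.id AND a.id = 21
"""

# Both spellings of the query return exactly the same rows.
assert conn.execute(long_form).fetchall() == conn.execute(short_form).fetchall()
print(conn.execute(short_form).fetchall())
# [('Book One', 'First summary', '2001-01-01', 'Jane', 'Doe')]
```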

My Top 6 Firefox Plugins for Web Development

By: Simon Hamp

FireBug with FirePHP FTW!

  1. FireBug (getfirebug.com)
  2. FirePHP (firephp.org) req. FireBug
  3. ColorZilla (colorzilla.com/firefox/)
  4. HTML Validator (users.skynet.be/mgueury/mozilla)
  5. YSlow (developer.yahoo.com/yslow/) req. FireBug
  6. Web Developer (chrispederick.com/work/web-developer)

Image headers in table based HTML Emails

By: Pete Roome

If you use an image as the header of a table-based HTML email, you will probably find an annoying gap beneath it, between your two <tr>s. Simply add the following styling to your image to close the gap:

{vertical-align:bottom;}

Lighting effects on boxes

By: David Smith

To make a box stand out in your design simply:

  1. Choose a colour for your box
  2. Create a subtle gradient starting from a slightly darker version of your colour (bottom) to a lighter version of your colour (top)
  3. Make sure your gradient is SUBTLE!
  4. Draw a horizontal line across the top of your box so that it spans the whole width.
  5. Pick a colour from the top of your gradient and lighten it still more. Apply this colour to the line you just created.

You should have an effect which looks like light is hitting the top of your box making it stand out. The blue Tipster banner uses this to good effect.

Manual filmstrips in jQuery

By: Michael Peacock

Here’s how to create a really simple manual photo film-strip in jQuery. It can be used to swap a large image on a page with that of a thumbnail elsewhere on the page, such as different photos of a product in an e-commerce store.

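$(document).ready(function () {
    // Attach the click handlers once the DOM is ready
    imageSwapper(".thumbnails a");
});

function imageSwapper(link) {
    $(link).click(function () {
        // Show the clicked thumbnail's full-size image (its href) in #largeimage
        $('#largeimage').attr('src', this.href);
        // Stop the browser from following the link
        return false;
    });
}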

Just link each thumbnail to its larger version, put the thumbnails in a div or paragraph with the class thumbnails, give the large image the ID largeimage, and you are good to go!

Any tips you can add?

If you have any tips you’d like to add, please do so in the comments below. Thanks!

[Photo Credit: flickr.com/photos/dunechaser]

1 May 2009

There have been a ton of helpful and interesting tips posted to Tipster recently, so I thought we’d pull a few out for you all (and give the submitters a bit of well-deserved publicity).

Topics that we’re covering are: MySQL, CSS, JavaScript, PHP, jQuery, life hacks and server admin.

MySQL Geo Search

Here’s a MySQL statement to find the objects nearest to you in a database. ‘lat’ and ‘lng’ are fields in the ‘items’ table holding each object’s location, and $latitude and $longitude are the user’s location. The distance field will be the calculated distance in km between the user and the object:

SELECT *,(((acos(sin((".$latitude."*pi()/180)) * sin((`lat`*pi()/180))+cos((".$latitude."*pi()/180)) * cos((`lat`*pi()/180)) * cos(((".$longitude."- `lng`)*pi()/180))))*180/pi())*60*1.1515*1.609344) as distance FROM `items` ORDER BY distance;

By: Sam Machin
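The query applies the spherical law of cosines, then multiplies the arc (in degrees) by 60 × 1.1515 × 1.609344, roughly 111.19 km per degree, which corresponds to an Earth radius of about 6,371 km. The same calculation as a Python sketch can be handy for checking results outside the database. Clamping the cosine to [-1, 1] is an extra safeguard not in the original query: rounding can push it just above 1 for identical points, and acos would then fail. The sample places are hypothetical:

```python
import math

# Kilometres per degree of arc, using the same constants as the SQL above.
KM_PER_DEGREE = 60 * 1.1515 * 1.609344


def distance_km(lat1, lng1, lat2, lng2):
    """Great-circle distance via the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta = math.radians(lng1 - lng2)
    cos_arc = (math.sin(phi1) * math.sin(phi2)
               + math.cos(phi1) * math.cos(phi2) * math.cos(delta))
    cos_arc = max(-1.0, min(1.0, cos_arc))  # guard against rounding errors
    return math.degrees(math.acos(cos_arc)) * KM_PER_DEGREE


def nearest(items, latitude, longitude):
    """Sort (name, lat, lng) tuples by distance, like ORDER BY distance."""
    return sorted(items, key=lambda item: distance_km(latitude, longitude, item[1], item[2]))


places = [("a", 51.5, -0.12), ("b", 48.85, 2.35), ("c", 51.45, -0.97)]
print([name for name, _, _ in nearest(places, 51.5074, -0.1278)])  # ['a', 'c', 'b']
```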

Transparent background images – PNG fix for IE6 – a few reminders

When using Microsoft’s AlphaImageLoader filter to fix PNG transparency in IE6, remember to do the following in your CSS:

  1. Set width and height for the element/s containing the transparent background image.
  2. First set background to none — then apply filter for background image.
  3. Apply ‘position: relative;’ to any contained links to ensure functionality

(Also, bear in mind that the background images can no longer be tiled or positioned via ‘background-position’).

By: Prisca

Address Longitude and Latitude in Google Maps

Google Maps does not show the longitude and latitude of an address you search for. To get this information, enter this piece of JavaScript into your browser’s address bar and hit Enter:

javascript:void(prompt('',gApplication.getMap().getCenter()));

A little window will then pop-up displaying the Longitude and Latitude for you. Bingo!

[Editor's note: If you add this to your bookmarks toolbar, it's even easier to use. Just click it whenever you're on Google Maps. Hat tip to Phil Balchin.]

By: Pete Roome

Custom Details in Code Hints – Zend Studio for Eclipse

Zend Studio is a brilliant platform for PHP development. In my view, well worth the cost. One of the best things about it now being built on Eclipse is the project-wide code hinting.

When you see the built-in code hints and highlight an option, you get loads of lovely details regarding the functions and classes that you are about to use… as you would expect from most modern IDEs. However, Zend Studio for Eclipse takes this further for your own classes and functions!

When you have written your classes and functions, start a multi-line comment (/**) on the line just above the opening line, hit Return, and Studio will prompt you for the data types and descriptions of your method parameters.

This is invaluable in understanding those classes you wrote at 3:47am, 3.5 years ago and haven’t used since!

By: Simon Hamp

Use a Print Stylesheet

Print stylesheets allow you to change the style of your page when printing, without having to provide a separate “print version” of the markup.

You specify a print stylesheet by adding media="print" as an attribute of the link tag:

<link rel="stylesheet" type="text/css" media="print" href="print.css"/>

In the print CSS you should remove unnecessary elements such as sidebars, menus, background colours, presentational images, etc., so as not to waste ink and paper on parts which don’t need to be printed. You can also change the font and size to make sure the text is clear on the printed page.

You can even add some CSS to show the URLs of links on the page, which wouldn’t normally be seen when printing.

a[href]:after { content: " (" attr(href) ") "; }

By: Rich Adams

Make your links easier to read

Instead of using { text-decoration: underline; } on your links (as browsers do by default) use:

a { text-decoration: none; border-bottom: 1px solid; }

This makes your links easier to read, as there will be more space between the text and the line underneath, so the line will no longer cross through your ‘y’s and ‘g’s.

You can also then style your underline with all the usual border stylings, much more exciting than a plain underline :)

By: Laura Kalbag

Manual filmstrips in jQuery

A really simple manual photo film-strip in jQuery. It can be used to swap a large image on a page with that of a thumbnail elsewhere on the page, such as different photos of a product in an e-commerce store.

$(document).ready(function () {
    // Attach the click handlers once the DOM is ready
    imageSwapper(".thumbnails a");
});

function imageSwapper(link) {
    $(link).click(function () {
        // Show the clicked thumbnail's full-size image (its href) in #largeimage
        $('#largeimage').attr('src', this.href);
        // Stop the browser from following the link
        return false;
    });
}

Just link each thumbnail to its larger version, put the thumbnails in a div or paragraph with the class thumbnails, give the large image the ID largeimage, and you are good to go!

By: Michael Peacock

Store user passwords as salted hashes

Just using a hash of the user password is not strong enough. If two users have the same password, they’ll have the same hash. Dictionary attacks are also still possible as the attacker can have a list of dictionary word hashes. Using a salted hash involves generating a random string of characters, which you then concatenate with the password before hashing. Two users with the same password will then have different hashes, and the stored hash will never just be a hash of a common word if the user chose a weak password. You then store the salt, and the hashed value in your database.

hash_to_store = sha1(salt + real_pass)

There are many different methods you could use too, such as concatenating the salt with a hash of the password and then hashing that, etc.

hash_to_store = sha1(sha1(real_pass) + ...

By: Rich Adams
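Here is a minimal Python sketch of the salt-then-hash idea. It deliberately swaps the single sha1 call for hashlib.pbkdf2_hmac, which repeats the hash many times so that dictionary attacks are slower as well; the SHA-256 choice and the iteration count are assumptions, not part of the tip. The salt comes from the secrets module and is stored next to the hash:

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor; tune it for your own hardware


def hash_password(password, salt=None):
    """Return (salt, hash) for storage, generating a fresh random salt if needed."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password, salt, stored_hash):
    """Recompute with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_hash)


salt_a, hash_a = hash_password("letmein")
salt_b, hash_b = hash_password("letmein")
print(hash_a != hash_b)                           # True: same password, different salts
print(check_password("letmein", salt_a, hash_a))  # True
print(check_password("wrong", salt_a, hash_a))    # False
```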

CD/DVD stuck in Macbook

Having gotten a DVD stuck inside my MacBook the other day, I found the only way to force-eject it was to hold down the trackpad button while rebooting. The DVD popped right out.

By: Pete Roome

Photo credit: flickr.com/photos/mance
