Social networking and voting site Digg is rewriting its underlying software infrastructure in an effort to improve performance and scalability. Part of that effort involves moving away from the MySQL database that has helped to power Digg since its creation.
In MySQL’s place, Digg is going with Cassandra, an open source, non-relational (NoSQL) database that was originally created by Facebook. As part of the migration from MySQL to Cassandra, Digg developers built a tool to help move data from one database to the other. The tool could soon be released as open source, helping other developers make the same move.
“We built a tool that we call ‘transcribe’ that takes advantage of Hadoop to bulk import from MySQL to Cassandra,” John Quinn, vice president of engineering at Digg, told InternetNews.com. “We’ll be releasing that to the open source community very soon.”
A NoSQL database differs from traditional SQL-based relational database management systems (RDBMS) such as MySQL. Instead of storing data in tables of rows and columns and linking those tables through defined relationships, NoSQL databases use other kinds of data stores, such as key-value, document and column-family stores.
Quinn said that Cassandra wasn’t the only NoSQL database that Digg considered. Other popular NoSQL databases include CouchDB and MongoDB.
“We evaluated all the big players in the open source NoSQL space,” Quinn said. “We particularly liked Cassandra’s proven storage model (BigTable) and the multiple datacenter support.”
Digg has been testing Cassandra in various aspects of its operations since September 2009. Most of Digg’s functionality has been reimplemented to use Cassandra rather than MySQL as its primary data store, though Quinn was unable to comment on the performance improvement Digg expects from the migration.
But he did provide some additional insight into the rationale behind the move in a blog post.
“It has a fully decentralized model. Every node is identical and there is no single point of failure,” Quinn wrote. “It’s also extremely fault tolerant — data is replicated to multiple nodes and across data centers. Cassandra is also very elastic — read and write throughput increase linearly as new machines are added.”
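To make the idea in that quote concrete, here is a minimal Python sketch of how a decentralized store might place copies of a record on several identical nodes. It is an illustration under stated assumptions, not Cassandra's or Digg's actual code: the `Ring` class, the `token` helper, the node names and the `story:123` key are all hypothetical, and MD5 is used only as a convenient toy hash.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (toy hash, an assumption)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring in which every node plays the same role; there is no master."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sort nodes by token so the ring can be walked in order.
        self._ring = sorted((token(n), n) for n in nodes)

    def replicas(self, key: str):
        """Return the distinct nodes that would hold copies of `key`."""
        tokens = [t for t, _ in self._ring]
        # First node whose token follows the key's token, wrapping around.
        start = bisect.bisect_right(tokens, token(key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


# Hypothetical four-node cluster keeping three copies of each record.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("story:123")
```

Because any node can compute the same placement from the same node list, no single coordinator has to be consulted, and losing one of the three owners still leaves two copies of the record available.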
As part of Digg’s work with Cassandra, the company’s developers have made performance and functional improvements to the code, Quinn said. Cassandra is an open source project run under the direction of the Apache Software Foundation.
“Digg has a full-time committer on the Apache Cassandra project,” he said. “We have contributed major performance enhancements and features. All our work has been contributed upstream to the Apache project.”
As Digg moves its core infrastructure to Cassandra, it isn’t entirely abandoning MySQL, as there are still some places where it fits.
“Our primary data store is Cassandra,” Quinn said. “We’ll continue to use MySQL for specific use cases and rapid prototyping. MySQL provides a level of flexibility that Cassandra does not. It’s very useful for small-scale projects.”
Sean Michael Kerner is a senior editor at InternetNews.com, the news service of Internet.com, the network for technology professionals.