Whenever analysts or journalists assemble lists of the top trends for this year, “big data” is almost certain to be on the list. While the catchphrase is fairly new, in one sense, big data isn’t really a new concept. Computers have always worked with large and growing sets of data, and we’ve had databases and data warehouses for years.
What is new is how much bigger that data is, how quickly it is growing and how complicated it is. Enterprises understand that the data in their systems represents a gold mine of insights that could help them improve their processes and their performance. But they need tools that will allow them to collect and analyze that data.
Not surprisingly, the big data market is growing very quickly in response to demand from enterprises. According to IDC, the market for big data products and services was worth $3.2 billion in 2010, and the firm predicts it will reach $16.9 billion by 2015. That’s a 39.4 percent compound annual growth rate, about seven times the growth rate IDC expects for the IT market as a whole.
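For readers who want to check the arithmetic, here is a minimal sketch of the compound annual growth rate calculation. It assumes a five-year span (2010 to 2015) and uses the rounded dollar figures quoted above; the function name is ours, not IDC’s. The result lands within a tenth of a point or so of the quoted 39.4 percent, and the small gap is presumably due to rounding in the published figures.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate that turns start_value
    into end_value over the given number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article, in billions of dollars.
# The five-year span (2010 to 2015) is our assumption.
rate = compound_annual_growth_rate(3.2, 16.9, 5)
print(f"Implied annual growth rate: {rate:.1%}")
```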
Interestingly, many of the best and best-known big data tools are open source projects. The best known of these is Hadoop, which is spawning an entire industry of related services and products. This month, we’re profiling Hadoop along with 49 other big data projects. Here you’ll find many Apache projects related to Hadoop, as well as open source NoSQL databases, business intelligence tools, development tools and much more.
If we’ve overlooked any important open source big data tools, please feel free to note them in the comments section below.
Perhaps the most interesting aspect of this list of open source Big Data analytics tools is what it suggests about the future. It starts with Hadoop, of course, and yet Hadoop is only the beginning. Open source, with its distributed model of development, has proven to be an excellent ecosystem for developing today’s Hadoop-inspired distributed computing software. So take a look at the entries, all of which are to some degree influenced by Hadoop, and realize: these products represent the infancy of what promises to be a very long, and very advanced, development cycle of open source Big Data products.
Databases and data warehouses are among the cornerstones of open source software in the enterprise. So it’s no surprise that the sixteen open source databases on these pages run the gamut in approach, not to mention the list of prestigious companies that deploy these products. Indeed, as this list clearly shows, there’s no lack of expertise among open source developers when it comes to designing and building advanced database products.
A good business intelligence tool makes all the difference to a manager or executive looking to run an efficient business. A top BI tool offers extensive reporting, big data analytics and integration with Hadoop and other platforms, all typically viewable on an intuitive, user-customizable dashboard. Consequently, the open source business intelligence tools on these pages are used by key personnel across all business sectors to make critical decisions.
This array of open source data mining tools is as diverse as the open source community itself. Some are sponsored by companies with the resources for marketing and constant upgrades – and the benefit of constant feedback from customers – while others are classic open source projects, perhaps with an eye toward becoming the next Hadoop or Spark over time. Whatever the case, these pages contain an impressive level of development expertise in the service of Big Data.
This is a roundup of some of the brightest lights in the Big Data world, a list you’ll certainly be familiar with if you work in Big Data. These open source file systems and open source programming languages are the very foundation of Big Data, the software workhorses that enable IT professionals to turn a vast data set into a source of actionable information and insight. Perhaps most interesting: as advanced as these tools are, the open source community will certainly have much more to offer Big Data in the years ahead. These tools are just the beginning.
When IT professionals need to transfer and aggregate huge data sets for Big Data purposes, they require some heavy-duty tools. They need software that can quickly sift through and index structured and unstructured data, tools that speak the diverse data languages of today’s highly complex Big Data platforms. The fact that some of the leaders in this area are open source file transfer and open source aggregation tools showcases the ever-growing influence of open source in enterprise environments.
Terracotta’s “BigMemory” technology allows enterprise applications to store and manage big data in server memory, dramatically speeding performance. The company offers both open source and commercial versions of its Terracotta platform, BigMemory, Ehcache and Quartz software. Operating System: OS Independent.
Apache Avro is a data serialization system based on JSON-defined schemas. APIs are available for Java, C, C++ and C#. Operating System: OS Independent.
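To give a sense of what a JSON-defined schema looks like, here is a minimal sketch that builds one with Python’s standard json module. It does not use any Avro library, and the record name, namespace and fields are hypothetical examples of ours rather than anything taken from the Avro project.

```python
import json

# A hypothetical Avro record schema, written as a plain Python dict.
# The names "User", "example.avro" and both fields are illustrative only.
user_schema = {
    "type": "record",
    "name": "User",
    "namespace": "example.avro",
    "fields": [
        {"name": "name", "type": "string"},
        # A union with "null" listed first makes the field optional.
        {"name": "favorite_number", "type": ["null", "int"], "default": None},
    ],
}

# Serialize the schema to the JSON text an Avro implementation would read.
schema_json = json.dumps(user_schema, indent=2)
print(schema_json)
```

An Avro implementation would parse JSON like this and use it to encode and decode records; that step is left out here because it depends on the language binding you choose.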
This Apache project is designed to coordinate the scheduling of Hadoop jobs. It can trigger jobs at a scheduled time or based on data availability. Operating System: Linux, OS X.
Formerly a Hadoop sub-project, ZooKeeper is “a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.” APIs are available for Java and C, with Python, Perl, and REST interfaces planned. Operating System: Linux, Windows (development only), OS X (development only).