On March 15, the Sloan Digital Sky Survey (SDSS) released its latest data set, Data Release 2, to researchers and the general public. The data set contains over six terabytes of images and the properties of more than 88 million celestial objects. It is available on the web at www.sdss.org/DR2 or in a more public-friendly format at the SkyServer site. Visitors can pan and zoom around the universe using a sort of celestial version of Mapquest and click on an object to find out the properties of a star, galaxy or quasar.
While this can be fun, from a scientific viewpoint the most important feature is the ability to query the data set for objects that meet the requirements of a research project. Meeting this demand required a fundamental change in the way astronomy information is normally stored and managed.
“The volume of data we were projecting was so large that the traditional methods that scientists were using wouldn’t cut it any more,” says Johns Hopkins University associate research scientist Ani Thakar.
Astronomers United
The SDSS is a project of the Astrophysical Research Consortium, a group of more than 200 astronomers at 13 institutions around the world. Its multi-year goal is to map one-quarter of the sky and determine the brightness and position of several hundred million objects in it. It gathers information using a 2.5-meter telescope at the Apache Point Observatory in New Mexico. The telescope contains one of the largest imaging cameras in the world. While a typical large telescope contains a single CCD chip, the SDSS camera contains an array of thirty 4-megapixel chips.
Every two weeks, SDSS FedExes the raw imaging data to the U.S. Department of Energy’s Fermi National Accelerator Laboratory in Batavia, Illinois, for processing. There it is analyzed, calibrated, converted into ASCII CSV (comma-separated values) format and shipped back to SDSS to be added to its database. Using a database is a change from the usual way astronomical data is managed.
“The whole idea of putting them into databases is first of all to ensure the integrity of the data, be able to back out changes and things like that,” Thakar explains. “The other big thing is to provide fast access to the data.”
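As a rough illustration of that pipeline step, the sketch below loads a small CSV batch into a relational table inside a single transaction, so a bad batch can be backed out cleanly. It is not SDSS code: it uses Python's built-in sqlite3 module as a stand-in for the survey's database, and the table name `photo_obj`, the column names and the sample values are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a calibrated CSV batch. The column
# names and values are invented for illustration, not the SDSS schema.
SAMPLE_CSV = """obj_id,ra_deg,dec_deg,r_mag,obj_type
1,185.10,0.52,17.3,GALAXY
2,185.25,0.61,19.8,STAR
3,186.02,-0.14,18.9,QSO
"""


def load_catalog(conn, csv_text):
    """Insert every row of csv_text into photo_obj; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS photo_obj ("
        "obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, "
        "r_mag REAL, obj_type TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (int(r["obj_id"]), float(r["ra_deg"]), float(r["dec_deg"]),
         float(r["r_mag"]), r["obj_type"])
        for r in reader
    ]
    # "with conn" commits on success and rolls back if any insert fails,
    # so a partially loaded batch never stays in the table.
    with conn:
        conn.executemany("INSERT INTO photo_obj VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_catalog(conn, SAMPLE_CSV)
count = conn.execute("SELECT COUNT(*) FROM photo_obj").fetchone()[0]
print(loaded, count)
```

Loading the same batch a second time would violate the primary key, and the transaction would roll back, leaving the table as it was. That is the kind of integrity guarantee Thakar describes, shown here at toy scale.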
Moving past FITS
Normally the data from a telescope is recorded in FITS (Flexible Image Transport System) files, a binary transport format that is used extensively for astronomy data. While this is adequate for small batches of information, FITS is too cumbersome for rapid access to the hundreds of millions of records that will eventually reside in the SDSS data store.
“In order for you to search for objects that were of interest for your research would take hours, maybe days,” Thakar continues.
SDSS started out using an object-oriented database (OODB), but that didn’t meet the performance requirements, so the project decided to switch to a relational database.
Jim Gray, a “distinguished engineer” in Microsoft’s Scalable Servers Research Group and manager of the company’s Bay Area Research Center in San Francisco, California, helped SDSS set up on Microsoft’s SQL Server 2000. The database resides on a series of off-the-shelf RAID 0/5 arrays with a total cost of under $10,000. The SQL database came online with the Early Data Release in June 2001. Initially the SQL Server was intended only for public access, while scientists would continue to use the OODB.
But that didn’t last for long.
“In the first six months, the SQL database stole the show,” says Thakar. “It was so much faster and easier to use that many of the scientists started using it too.”
As a result, everything was moved over to SQL Server.
The SkyServer site offers visitors several options for getting data, depending on their level of expertise. There are form-based queries that anyone can use. Hard-core users can run SQL queries, or submit a batch job and come back later to view the results. Users can download their results in text, CSV or XML formats. Visitors can also use a graphical interface to locate an area of the sky, zoom in and click on a particular object to find out its properties.
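To give a feel for the SQL route, here is a minimal sketch of a query that selects objects by type and brightness and returns the result as CSV text. It does not use the SkyServer schema or interface: it runs against an in-memory sqlite3 table, and the table `photo_obj`, its columns, the sample rows and the helper `bright_galaxies_csv` are hypothetical names chosen for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical toy catalog; the real survey tables are far larger and
# use a different schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photo_obj (obj_id INTEGER PRIMARY KEY, ra_deg REAL, "
    "dec_deg REAL, r_mag REAL, obj_type TEXT)"
)
conn.executemany(
    "INSERT INTO photo_obj VALUES (?, ?, ?, ?, ?)",
    [
        (1, 185.10, 0.52, 17.3, "GALAXY"),
        (2, 185.25, 0.61, 19.8, "STAR"),
        (3, 186.02, -0.14, 18.9, "QSO"),
        (4, 185.40, 0.30, 16.9, "GALAXY"),
    ],
)


def bright_galaxies_csv(conn, max_mag):
    """Return galaxies brighter than max_mag as CSV text, brightest first."""
    # Smaller magnitude means a brighter object, hence "r_mag < ?".
    cur = conn.execute(
        "SELECT obj_id, ra_deg, dec_deg, r_mag FROM photo_obj "
        "WHERE obj_type = ? AND r_mag < ? ORDER BY r_mag",
        ("GALAXY", max_mag),
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)
    return out.getvalue()


result = bright_galaxies_csv(conn, 17.5)
print(result)
```

The selection criteria live in the WHERE clause, so changing the research question means editing one line of SQL rather than rewriting a program that scans files.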
So far, over 200 papers have been published based on data from the SDSS. And there are many more to come as its use speeds up the research process.
“Being able to pose questions in a few hours and get answers in a few minutes changes the way one views the data: you can experiment interactively,” Microsoft’s Jim Gray and Johns Hopkins University astronomy professor Alex Szalay wrote in their paper The World-Wide Telescope, an Archetype for Online Science. “When queries take three days and hundreds of lines of code, one asks many fewer questions and so gets fewer answers.”