Wednesday, September 8, 2010

Data and MetaData

bits, by sciascia on flickr
In some ways, the digital revolution is all about data -- the photos, videos, and web pages we view and share. Data is simply a series of zeros and ones, stored together in a file.  Each zero or one is a bit, and eight bits together make a byte.  The computer assigns meaning to each bit or byte, depending on the type of the file.  For example, in an image, a byte might represent one of 256 different colors for a pixel, which is one dot in the image.  In other images, a pixel might be represented by 24 bits (three bytes), allowing for over 16 million different colors for each pixel.  In a text file, each byte can represent a character; the ASCII system maps the byte values 0 through 127 to the letters, digits, and punctuation used in English, plus a few control codes.
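To make the arithmetic concrete, here is a small Python sketch. The helper name values_for_bits and the sample word "Data" are made up for illustration. It counts how many values fit in 8 and 24 bits, then reads one byte three ways: as a number, as a bit pattern, and as an ASCII character.

```python
# Count how many distinct values fit in a given number of bits.
def values_for_bits(n_bits: int) -> int:
    return 2 ** n_bits

one_byte = values_for_bits(8)       # one byte per pixel
three_bytes = values_for_bits(24)   # three bytes per pixel

# The same byte can be read as a number, a bit pattern, or an ASCII character.
byte_value = 65
bit_pattern = format(byte_value, "08b")
character = chr(byte_value)

# A text file is just a sequence of such bytes.
byte_list = list("Data".encode("ascii"))

print(one_byte, three_bytes)    # 256 16777216
print(bit_pattern, character)   # 01000001 A
print(byte_list)                # [68, 97, 116, 97]
```

The bits never change; only the interpretation does. The value 65 is the letter A in a text file, but it could just as well be a shade of color in an image.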

Campagna autunnale vicino Linguaglossa, by alfiogreen
Metadata is data about the data.  It describes what the data means, and makes it easier for us and for computers to categorize the vast amounts of data we share on the Internet.  It's what makes data truly useful.  The photo sharing site Flickr, for example, allows users to tag photos with keywords describing the image.  If I am looking for photos of Linguaglossa, the town in Sicily where my ancestors are from, a simple search finds many beautiful pictures.  In fact, your digital camera usually stores metadata about each photo right in the image file, describing the camera settings used to take the photo, the date, and other useful information.  Many web pages include metadata describing the content so that search engines can index it more accurately.
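As a rough illustration of web page metadata, the Python sketch below pulls the description and keywords out of a page's meta tags using only the standard library. The page itself is invented for this example, and real search engines do far more than this; the point is only that the metadata sits in the page in a form a program can read.

```python
from html.parser import HTMLParser

# Hypothetical page, made up for this example.
PAGE = """<html><head>
<title>Autumn in Linguaglossa</title>
<meta name="description" content="Photos of the countryside near Linguaglossa, Sicily">
<meta name="keywords" content="Linguaglossa, Sicily, autumn">
</head><body><p>...</p></body></html>"""

class MetaCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "name" in attributes and "content" in attributes:
            self.metadata[attributes["name"]] = attributes["content"]

collector = MetaCollector()
collector.feed(PAGE)

# Split the comma-separated keyword list into individual tags.
keywords = [k.strip() for k in collector.metadata["keywords"].split(",")]

print(collector.metadata["description"])
print(keywords)   # ['Linguaglossa', 'Sicily', 'autumn']
```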

Digital libraries use metadata standards to mark up resources for cataloging purposes.  For example, metadata for an author might look like this:
<name type="personal">
    <namePart>Bradbury, Ray</namePart>
    <role>
      <roleTerm type="text">creator</roleTerm>
    </role>
</name>
This uses a format called XML, which encodes metadata in a human-readable form.  In this case, we can see the author's name, Ray Bradbury, and his role as the creator of the work being cataloged.  XML plays a critical role in data exchange on the Internet; it allows data to be extracted in a format that describes its structure, so that computer programs can automatically interpret it in meaningful ways.  For example, the RSS and Atom formats are used by blogs to publish a list of posts, so that they are easily read by software such as Google Reader.  Project Gutenberg publishes an RSS feed of the books added to its collection.  You can see metadata in action on a digital library by searching the metadata for Project Gutenberg using the Anacleto search engine.
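Here is a minimal Python sketch of how a program might read the author record above, using the standard library's xml.etree.ElementTree. The record in the code is a complete, well-formed version of the fragment; the role wrapper and the closing tags are my assumption about how the full record would look, not something taken from a real catalog.

```python
import xml.etree.ElementTree as ET

# Assumed complete form of the author record; a parser needs every
# opening tag to have a matching closing tag.
RECORD = """<name type="personal">
    <namePart>Bradbury, Ray</namePart>
    <role>
      <roleTerm type="text">creator</roleTerm>
    </role>
</name>"""

name = ET.fromstring(RECORD)

name_type = name.get("type")            # attribute on the <name> element
name_part = name.findtext("namePart")   # text inside <namePart>
role = name.findtext("role/roleTerm")   # text inside the nested <roleTerm>

# The catalog stores "Family, Given"; split it for display.
family, given = [part.strip() for part in name_part.split(",")]

print(f"{given} {family} ({name_type}) is the {role}")
# Ray Bradbury (personal) is the creator
```

Because the tags name each piece of data, the program never has to guess which part is the author and which is the role.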


LeeAnne said...

A friend of mine is a computer science major and he was showing me how in his class they are using that XML format to "talk" with the computer. It's really amazing how we can communicate with technology; by typing in commands, we can interact and associate with it.

It will be very exciting to see how we are communicating with computers ten years down the line.

Danny said...

I made a quick post on the differences between data and information since it links nicely to this topic:

Jake C said...

I liked what you said in class about the change in format of media and its effect on society. You used the MP3 as an example, and that really hit me hard. People no longer have to wait around for their "big break"; they can do it themselves. There is an artist called "Owl City" who started making music late at night to combat insomnia and put his stuff on MySpace. He got so big online that a record company picked him up. This would never have been possible without the MP3 format. See my blog for more. Here is Owl City's website:
