Digital Civilization: digital libraries

Saturday, September 11, 2010

The Legacy of Manutius

I've blogged a lot about computer technology recently, discussing algorithmic thinking, programming languages, and metadata. Now I want to tie these concepts together with the history we've been studying.

Aldus Manutius, the great Renaissance publisher, is well known for preserving Greek, Latin, and Italian texts, and for bringing these books to the general public in a small, portable format known as the octavo. The modern analogue to his efforts is Project Gutenberg, which is digitizing as many books as it can and providing them free to the public. The latest count is more than 33,000 free electronic books. In many ways, this project fulfills Manutius' dream beyond his wildest expectations, thanks to the sheer volume of books being made available and the vast number of readers. Of course, Manutius could not have foreseen the digital age, when copies have become nearly free. Nor could he have foreseen an era when volunteers would donate their time and resources to build such a large digital library.

Wednesday, September 8, 2010

Data and MetaData


bits, by sciascia on flickr

In some ways, the digital revolution is all about data: the photos, videos, and web pages we view and share. Data is simply a series of zeros and ones stored together in a file. Each zero or one is a bit, and eight bits together form a byte. The computer assigns meaning to each bit or byte depending on the type of the file. For example, in an image, a byte might represent one of 256 different colors for a pixel, a single dot in the image. In other images, a pixel might be represented by 24 bits (three bytes), allowing for over 16 million different colors per pixel. In a text file, each byte can represent a character; the ASCII system maps the byte values 0 through 127 to English letters, digits, punctuation marks, and a handful of control codes.
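To see the arithmetic behind these numbers, here's a small Python sketch. It's only an illustration: the helper name `distinct_values` and the sample word are my own choices, not part of any standard. It counts how many values a given number of bits can hold and shows the ASCII byte values (and bits) behind a short piece of English text.

```python
# Illustration only: `distinct_values` is a hypothetical helper, not a library function.

def distinct_values(bits: int) -> int:
    """Number of distinct values that can be encoded with the given number of bits."""
    return 2 ** bits

# One byte (8 bits) per pixel: a palette of 256 colors.
palette_colors = distinct_values(8)

# Three bytes (24 bits) per pixel: one byte each for red, green, and blue.
true_colors = distinct_values(24)

# In ASCII, each character of plain English text is stored as one byte value.
text = "Bradbury"
byte_values = list(text.encode("ascii"))

# The same bytes written out as eight bits each.
bit_strings = [format(b, "08b") for b in byte_values]

print(palette_colors)  # 256
print(true_colors)     # 16777216
print(byte_values)     # [66, 114, 97, 100, 98, 117, 114, 121]
print(bit_strings[0])  # 01000010  (the letter "B")
```

Each extra bit doubles the number of possible values, which is why going from 8 to 24 bits per pixel jumps from 256 colors to more than 16 million.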

Campagna autunnale vicino Linguaglossa (Autumn countryside near Linguaglossa), by alfiogreen

Metadata is data about the data. It describes what the data means, and it makes it easier for us and for computers to categorize the vast amounts of data we share on the Internet. It's what makes data truly useful. The photo sharing site Flickr, for example, lets users tag photos with keywords describing the image. If I were looking for photos of Linguaglossa, the town in Sicily where my ancestors are from, a simple search would turn up many beautiful pictures. Your digital camera probably stores metadata about each photo right in the image file, recording the camera settings used to take the photo, the date, and other useful information. Many web pages also include metadata describing their content so that search engines can index them more accurately.
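Web page metadata is often stored in `<meta>` tags inside the page's HTML. The sketch below, written in Python with the standard library's `html.parser` module, shows roughly how a program could pull those tags out. The sample page and the class name `MetaTagCollector` are made up for illustration; this isn't how any particular search engine actually works.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Hypothetical helper: collects name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names and passes attributes as (name, value) pairs.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.metadata[name] = content


# A made-up page for illustration; its description and keywords are hypothetical.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Linguaglossa photos</title>
    <meta name="description" content="Photos of the town of Linguaglossa, Sicily">
    <meta name="keywords" content="Linguaglossa, Sicily, travel">
  </head>
  <body><p>...</p></body>
</html>
"""

collector = MetaTagCollector()
collector.feed(SAMPLE_PAGE)
collector.close()

# Split the comma-separated keywords into a list, much like photo tags.
keywords = [k.strip() for k in collector.metadata["keywords"].split(",")]

print(collector.metadata["description"])  # Photos of the town of Linguaglossa, Sicily
print(keywords)                           # ['Linguaglossa', 'Sicily', 'travel']
```

The page's visible text never has to be analyzed here; the `<meta>` tags say directly what the page is about.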

Digital libraries use metadata standards to mark up resources for cataloging. For example, metadata describing an author might look like this:

<name type="personal">
    <namePart>Bradbury, Ray</namePart>
    <role>
      <roleTerm type="text">creator</roleTerm>
    </role>
</name>

This uses a format called XML, which encodes metadata in a human-readable form. In this case, we can see the author's name, Ray Bradbury, and his role as the creator of the work being cataloged. XML plays a critical role in data exchange on the Internet: it lets data be shared in a format that describes its own structure, so that computer programs can automatically interpret it in meaningful ways. For example, blogs use the RSS and Atom formats, both based on XML, to publish a list of posts that can easily be read by software such as Google Reader. The RSS feed for the books added to Project Gutenberg can be found at http://www.gutenberg.org/feeds/today.rss. You can see metadata in action in a digital library by searching Project Gutenberg's metadata with the Anacleto search engine.
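To show what "automatically" can mean here, this minimal Python sketch reads the author record above with the standard library's `xml.etree.ElementTree` module and pulls out the name and role. It's a toy built on assumptions: the function name `parse_name` is hypothetical, and real catalog records (this snippet appears to follow the MODS schema) normally wrap the `<name>` element in a larger, namespaced document, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# The author record from the example above, as a standalone fragment.
# Assumption: no XML namespace, to keep the sketch simple.
NAME_RECORD = """
<name type="personal">
    <namePart>Bradbury, Ray</namePart>
    <role>
      <roleTerm type="text">creator</roleTerm>
    </role>
</name>
"""


def parse_name(xml_text: str) -> dict:
    """Hypothetical helper: pull the name type, name, and role out of a <name> element."""
    root = ET.fromstring(xml_text.strip())
    return {
        "type": root.get("type"),                  # attribute on <name>
        "name": root.findtext("namePart"),         # text of the child <namePart>
        "role": root.findtext("role/roleTerm"),    # text of <roleTerm> inside <role>
    }


record = parse_name(NAME_RECORD)

# Turn the catalog form "Surname, Given" into the everyday form "Given Surname".
surname, given = record["name"].split(", ", 1)
display_name = f"{given} {surname}"

print(record)        # {'type': 'personal', 'name': 'Bradbury, Ray', 'role': 'creator'}
print(display_name)  # Ray Bradbury
```

Because the tags say what each piece of text is, the program doesn't have to guess which part is the name and which is the role.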
