Definition
Scientists everywhere routinely
generate large volumes of data from both computational and laboratory experiments.
Such data, which are irreproducible or expensive to regenerate, must be safely
archived for future reference and research. The form in which data are archived and
the point at which users archive them are matters of individual preference. Scientists
usually store data on multiple platforms. Further, scientists expect their data to
stay in the archive despite personnel changes, and they expect those responsible
for the archive to handle changes in storage technology without those changes
affecting either the scientists or their work. Essentially,
we require a data-intensive computing environment that works seamlessly across
scientific disciplines. Ideally that environment should provide all of the file
system features. Research indicates that supporting this type of massive data
management requires some form of metadata to catalog and organize the data.

Problems Identified
The National Science Digital Library has implemented metadata previously and has
found it necessary to restrict metadata to a specific format. The Scientific Archive
Management System (SAM), a metadata-based archive for scientific data, has provided
flexible archival storage for very large databases. SAM uses metadata to organize
and manage the data without imposing predefined metadata formats on scientists.
SAM's ability to handle different data and metadata types is a key difference
between it and many other archives.
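
The idea of organizing files through free-form metadata can be illustrated with a minimal sketch. This is not SAM's actual implementation: the class names `ArchiveEntry` and `MetadataCatalog`, the archive paths, and the metadata keys are all hypothetical and chosen only for illustration. The sketch assumes a catalog that accepts any key-value pairs for a file and answers searches from the catalog alone, without opening the archived file.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ArchiveEntry:
    """One archived file plus whatever metadata its owner chose to supply."""
    archive_path: str
    metadata: dict[str, Any] = field(default_factory=dict)


class MetadataCatalog:
    """Hypothetical in-memory catalog; an illustration, not SAM's real design."""

    def __init__(self) -> None:
        self._entries: list[ArchiveEntry] = []

    def register(self, archive_path: str, **metadata: Any) -> ArchiveEntry:
        # No schema is enforced: any keys and values are accepted.
        entry = ArchiveEntry(archive_path, dict(metadata))
        self._entries.append(entry)
        return entry

    def find(self, **criteria: Any) -> list[str]:
        # Matching uses only the catalog; the data files are never read.
        return [
            e.archive_path
            for e in self._entries
            if all(k in e.metadata and e.metadata[k] == v
                   for k, v in criteria.items())
        ]


catalog = MetadataCatalog()
# Two research groups describe their files with entirely different keys.
catalog.register("/archive/nmr/run_0042.dat", instrument="NMR", sample="soil-7")
catalog.register("/archive/sim/md_traj.xyz", code="MD", atoms=12000, owner="jdoe")
catalog.register("/archive/nmr/run_0043.dat", instrument="NMR", sample="soil-9")

nmr_files = catalog.find(instrument="NMR")
print(nmr_files)
```

Because each entry carries its own dictionary, laboratory and simulation files can sit in the same catalog even though they share no metadata fields.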

Restrictions imposed by SAM
SAM can readily accommodate
any type of data file regardless of format, content or domain. The system makes
no assumptions about data format, the platform on which the user generated the
file, the file's content, or even the metadata's content. SAM requires only that
the user have data files to store and will allow the storage of some metadata
about each data file. Working at the metadata level also avoids unnecessary
data retrieval from the archive, which can be time-consuming depending on the
file's size, network connectivity, or archive storage medium. SAM software hides
system complexity while making it easy to add functionality and augment storage
capacity as demand increases.

About SAM
SAM was created in 1995 by EMSL, the Environmental Molecular Sciences Laboratory.
In 2002, EMSL migrated the original two-server hierarchical storage management
system to an incrementally extensible collection of Linux-based disk farms.
The metadata-centric architecture and the original decision to present the archive
to users as a single large file system made the hardware migration a relatively
painless process.
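
One way to picture why a single-file-system view eases hardware migration is a mapping from stable logical paths to physical storage locations. The sketch below is an assumption for illustration only; the class `LogicalArchive`, the backend names, and the paths are hypothetical and do not describe how EMSL actually built SAM.

```python
class LogicalArchive:
    """Hypothetical map from user-visible paths to storage backends."""

    def __init__(self) -> None:
        self._location: dict[str, str] = {}

    def store(self, logical_path: str, backend: str) -> None:
        self._location[logical_path] = backend

    def paths(self) -> list[str]:
        # What users see: one flat namespace, with no backend names in it.
        return sorted(self._location)

    def locate(self, logical_path: str) -> str:
        # What operators see: where the bytes currently live.
        return f"{self._location[logical_path]}:{logical_path}"

    def migrate(self, old_backend: str, new_backend: str) -> int:
        # Repoint every file on old_backend; logical paths are untouched.
        moved = 0
        for path, backend in list(self._location.items()):
            if backend == old_backend:
                self._location[path] = new_backend
                moved += 1
        return moved


archive = LogicalArchive()
archive.store("/archive/project_a/spectrum.dat", backend="hsm-server-1")
archive.store("/archive/project_b/traj.xyz", backend="hsm-server-2")

paths_before = archive.paths()
moved = archive.migrate("hsm-server-1", "disk-farm-01")
paths_after = archive.paths()

print(paths_before == paths_after)
print(archive.locate("/archive/project_a/spectrum.dat"))
```

In this sketch the migration changes only the internal mapping, so any path a user recorded before the hardware change still resolves afterwards.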