Dear Guest, Please Register to Download Free Seminar Reports and PPT of each topic. The Download link will be visible only after the registration.Please search the topic before posting.

Please click the Facebook Like Button if you are satisfied

 Subscribe To Get Latest Seminar Reports and PPT

Receive all seminar updates via Facebook. Just Click the Like Button Below


Custom Search

Google File System

Huge Collection of Computer Science Seminar Topics, Reports and PPT.

Google File System

Postby Prasanth » Wed Jul 24, 2013 7:17 am

The great success of Google Inc. is attributed not only to its efficient search algorithm, but also to the underlying commodity hardware and, thus the file system. As the number of applications run by Google increased massively, Google’s goal became to build a vast storage network out of inexpensive commodity hardware. Google created its own file system, named as Google File System. Google File System was innovatively created by Google engineers and ready for production in record time in a span of one year in 2003, which speeded Google’s market thereafter. Google File system is the largest file system in operation. Formally, Google File System (GFS) is a scalable distributed file system for large distributed data intensive applications. In the design phase of GFS, points which were given stress includes component failures are the norm rather than the exception, files are huge in the order of MB & TB and files are mutated by appending data. The entire file system is organized hierarchically in directories and identified by pathnames. The architecture comprises of a single master, multiple chunk servers and multiple clients. Files are divided into chunks, which is the key design parameter. Google File System also uses leases and mutation order in their design to achieve consistency and atomicity. As of fault tolerance, GFS is highly available, replicas of chunk servers and master exists.


In designing a file system for Google’s needs, they have been guided by assumptions that offer both challenges and opportunities.

• The system is built from many inexpensive commodity components that often fail. It must constantly monitor itself and detect, tolerate, and recover promptly from component failures on a routine basis.

• The system stores a modest number of large files. Usually a few million files, each typically 100 MB or larger in size. Multi-GB files are the common case and should be managed efficiently. Small files must be supported, but need not optimize for them.

• The workloads primarily consist of two kinds of reads: large streaming reads and small random reads. In large streaming reads, individual operations typically read hundreds of KBs, more commonly 1 MB or more. Successive operations from the same client often read through a contiguous region of a file. A small random read typically reads a few KBs at some arbitrary offset. Performance-conscious applications often batch and sort their small reads to advance steadily through the file rather than go back and forth.

• The workloads also have many large, sequential writes that append data to files. Typical operation sizes are similar to those for reads. Once written, files are seldom modified again. Small writes at arbitrary positions in a file are supported but do not have to be efficient.

• The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file. The files are often used as producer consumer queues or for many-way merging. Hundreds of producers, running one per machine, will concurrently append to a file. Atomicity with minimal synchronization overhead is essential. The file may be read later, or a consumer may be reading through the file simultaneously.

• High sustained bandwidth is more important than low latency. Most of the target applications place a premium on processing data in bulk at a high rate, while few have stringent response time requirements for an individual read or write.
You do not have the required permissions to download the files attached to this post. You must LOGIN or REGISTER to download these files.
User avatar
Site Admin
Posts: 475
Joined: Sat May 28, 2011 6:29 pm

Return to Computer Science Seminar Topics