SUMMARY

❖ Cloud computing is closely associated with high-performance computing. Hence, the file system and file processing characteristics of a high-performance computing environment are also applicable in cloud computing.

❖ Efficient processing of large data-sets is critical for the success of high-performance computing systems. Large and complex data-sets are generated in the cloud continuously.

❖ High-performance processing of large data-sets requires parallel execution of partitioned data across distributed computing nodes. This must be supported by suitable data-processing programming models and file systems.

❖ Several programming models have been developed for high-performance processing of large data-sets. Among them, Google’s MapReduce is a widely accepted model for processing massive amounts of unstructured data in parallel across a distributed processing environment.

❖ Several other models and frameworks influenced by MapReduce have emerged; Hadoop, Pig and Hive are a few examples.

❖ Among the various file systems that support high-performance processing of data, the Google File System (GFS) is considered the pioneer. The open-source Hadoop Distributed File System (HDFS) is inspired by GFS.

❖ Storage in the cloud is delivered in two categories: for general users and for developers. Storage for general users is delivered as SaaS, and storage for developers is delivered as IaaS.

❖ For general users, the cloud provides ready-to-use storage which is usually managed by the providers. Hence, users can use the storage directly without having to set up or prepare it themselves. Such storage is known as ‘unmanaged’ storage.

❖ Managed storage is raw storage that is built to be managed by the users themselves. Developers use this kind of storage.
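
As an illustration of the MapReduce idea summarized above, the following is a minimal single-machine sketch in Python. It is not Google's or Hadoop's implementation: the function names (`partition`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and threads merely stand in for distributed computing nodes. It shows the data being partitioned, each partition being mapped independently, and the intermediate results being grouped by key and reduced.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_parts):
    """Split the input into num_parts chunks, one per (simulated) node."""
    return [records[i::num_parts] for i in range(num_parts)]


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across every chunk."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, num_parts=3):
    """Count words with a partition -> map -> shuffle -> reduce pipeline."""
    chunks = partition(lines, num_parts)
    # Assumption: threads stand in for distributed nodes; each chunk is
    # mapped independently of the others.
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    sample = [
        "cloud storage and cloud computing",
        "large data sets need parallel processing",
        "parallel processing of data in the cloud",
    ]
    print(word_count(sample))
```

In a real deployment the partitions would live in a distributed file system such as GFS or HDFS, and the map and reduce tasks would be scheduled on the nodes that hold the data; the sketch only mirrors the logical flow.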