CLOUD PROGRAMMING: A CASE STUDY WITH ANEKA

Aneka is an outcome of research efforts in the field of distributed and grid computing at the University of Melbourne, Australia. It offers a PaaS facility in cloud computing. ‘Aneka’ is a Sanskrit term meaning ‘many in one’. The uniqueness of the solution lies in its support for multiple programming models, namely task programming, thread programming and MapReduce programming. This section discusses these programming models with reference to their implementations in the Aneka cloud.

Each of the three programming models supported by Aneka has three main elements: ‘Executors’, ‘Schedulers’ and ‘WorkUnits’. Apart from these, there is a ‘Manager’. A ‘WorkUnit’ is a logical entity that defines the size or unit of an executable module that can be handled by Aneka. Figure 19.9 illustrates the component structure of the Aneka execution model.

The ‘Scheduler’ arranges the execution of the work units comprising an application, distributes them to multiple executing nodes (‘Executors’), receives the results and sends them back to the user. The ‘Manager’ is a client-side component that communicates with the Aneka system on behalf of the client system.
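The division of responsibility among these components can be pictured with a purely conceptual sketch; the interface and member names below are illustrative stand-ins, not the actual Aneka API.

// Conceptual sketch only: illustrative stand-ins, not real Aneka classes.
using System.Collections.Generic;

public interface IWorkUnit            // smallest executable module Aneka handles
{
    object Run();                     // produce a result for this unit of work
}

public interface IExecutor            // a remote node that runs work units
{
    object Execute(IWorkUnit unit);
}

public class Scheduler                // distributes units and collects results
{
    public IEnumerable<object> Schedule(IEnumerable<IWorkUnit> units,
                                        IReadOnlyList<IExecutor> executors)
    {
        var results = new List<object>();
        int i = 0;
        foreach (var unit in units)
        {
            // simple round-robin dispatch; a real scheduler also handles
            // failures, data staging and node membership
            var executor = executors[i++ % executors.Count];
            results.Add(executor.Execute(unit));
        }
        return results;               // returned to the client via the Manager
    }
}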

 

Aneka has been commercialized by Manjrasoft, an Australian company formed to commercialize the grid and cloud computing software technologies developed at the research lab of the University of Melbourne. Manjrasoft first released the beta version of Aneka in 2009. Aneka, which means ‘many-in-one’, is so named because it supports multiple programming models such as task programming, thread programming and MapReduce programming.

Aneka is built over the Microsoft .NET framework and provides a runtime environment and a set of APIs for developing .NET applications on top of it. Aneka works as a workload management and distribution platform that accelerates applications in the Microsoft .NET framework environment.

Aneka has the special ability of being deployed on third-party IaaS, as it supports provisioning of resources on public clouds such as Amazon EC2, Microsoft Azure and GoGrid, as well as on private clouds. This helps in building hybrid cloud applications with very little programming effort.

Aneka basically comprises two key components:

■ tools for rapid application development and a software development kit (SDK) containing application program interfaces (APIs), and

■ a run-time engine that manages the deployment and execution of applications.

The following sections briefly discuss the three programming models supported by Aneka.

 

19.14.1 Thread Programming

High-performance computing systems focus on delivering greater computational output. High throughput in computation is achieved by allowing concurrency through multi-processing and multi-threading. A process represents a program in execution. In multi-processing, multiple processes are executed in parallel on a single machine; such a system supports multi-tasking. On the other hand, a thread represents a single flow of control within a process. A system supports multi-threading when it can execute different threads in parallel within a process.

19.14.1.1 Multi-threading in Aneka

For high-end requirements, the performance obtained by executing a multi-threaded application on a single multi-core system (a system having two or more processing units, known as cores, packaged as a single component) becomes insufficient. In such cases, distributed execution of the application is the only solution. For this purpose, an application can be decomposed into several units.

In Aneka, multi-threaded programming is implemented over the cloud using the Thread Programming model. In this model, threads are treated as distributed threads, known as Aneka threads. Aneka threads follow the principle of local threads but can be executed over a distributed system architecture. Aneka schedules the execution of the threads efficiently, while the creation and control of the threads remain the responsibility of the application developer.

The APIs for Aneka thread programming imitate the .NET thread class library. Hence, it becomes effortless to port a .NET-based multi-threaded application to Aneka, as the transition between a .NET thread and an Aneka thread is almost transparent. .NET applications need not be fully rewritten to be ported to the Aneka platform; rather, replacing the System.Threading.Thread class with the Aneka thread class (AnekaThread) largely does the trick.
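For reference, the local pattern that the text says carries over almost unchanged is the standard System.Threading idiom shown below; in the Aneka Thread Programming model only the thread class itself changes.

using System;
using System.Threading;

class LocalThreadDemo
{
    static void DoWork()
    {
        Console.WriteLine("Running on thread {0}", Thread.CurrentThread.ManagedThreadId);
    }

    static void Main()
    {
        // Create and start a local thread; porting to Aneka essentially means
        // this local Thread becomes an Aneka thread.
        Thread t = new Thread(new ThreadStart(DoWork));
        t.Start();
        t.Join();   // wait for completion, exactly as with an Aneka thread
    }
}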

In the Aneka Thread Programming model, the work units are represented as Aneka threads. Programmatically, the concept is implemented using the template class AnekaApplication<AnekaThread, ThreadManager>. The ‘AnekaApplication’ class type is used for all distributed applications that follow the Thread Programming model in Aneka. A Configuration class additionally defines the application’s interaction with the cloud middleware.

FIG 19.10: Aneka PaaS model (business applications run on the Aneka PaaS, which supports multiple programming models: Task, Thread and MapReduce, deployed over public and private cloud IaaS)

In the Aneka Thread Programming model, an application is thus treated as a collection of threads, called Aneka threads, which can be executed remotely over a distributed environment.
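Putting these pieces together, a thread-based Aneka application would look roughly like the sketch below. This is a hedged illustration only: the namespaces, the Configuration.GetConfiguration helper, the AnekaThread constructor and the StopExecution call are assumptions inferred from the class names the text mentions, not verified Aneka SDK signatures.

// Sketch only: type names come from the text; namespaces, constructors and
// calls marked 'assumed' are illustrative guesses, not verified API.
using System;
using System.Threading;      // ThreadStart delegate
using Aneka;                 // Configuration (assumed namespace)
using Aneka.Entity;          // AnekaApplication (assumed namespace)
using Aneka.Threading;       // AnekaThread, ThreadManager (assumed namespace)

class DistributedThreadApp
{
    static void DoWork()
    {
        Console.WriteLine("Executed remotely as an Aneka thread");
    }

    static void Main()
    {
        // The Configuration object describes how the application interacts
        // with the Aneka cloud middleware.
        Configuration conf = Configuration.GetConfiguration("conf.xml");    // assumed helper

        var app = new AnekaApplication<AnekaThread, ThreadManager>(conf);

        // An Aneka thread is created much like a local thread, with the
        // owning application passed in addition to the delegate.
        AnekaThread thread = new AnekaThread(new ThreadStart(DoWork), app); // assumed ctor
        thread.Start();    // scheduled by Aneka on a remote Executor
        thread.Join();     // blocks until the remote execution completes

        app.StopExecution();   // assumed clean-up call
    }
}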

19.14.2 Task Programming

Thread programming provides parallelism in execution; it is rooted in the single-system model, although, as seen above, it can also be run over a distributed system architecture. Many other programming models are available that tie the power of multiple computing systems together. The task programming model was designed to be executed over clusters, and architectural distribution is inherent in it. This programming model provides an attractive solution for executing high-performance distributed applications.

In the task programming model, an application is considered a collection of tasks that are independent of each other and can be executed in any order. The notion of a task is defined by every operating system in its design, and all present-day OSs support multi-tasking, where multiple tasks can be executed concurrently.

A task is a combination of one or more programs constituting a computing unit of the application. This computing unit must represent a component of the application that can be executed independently, in isolation. An application, in turn, is a collection of multiple tasks. A task generally takes input files and produces one or more output files as its outcome.

Depending on the characteristics and requirements of applications, task computing can be segregated into two primary categories: High-Performance Computing and High-Throughput Computing. Each category has specific infrastructural requirements.

High-Performance Computing (HPC) is the use of the task programming model for executing applications that need high computing power over a relatively short period of time. HPC combines tasks in a tightly-coupled manner and hence requires very low latency in network communication to minimize data-exchange time; a low-latency network is thus a requirement for the HPC model. Traditionally, clusters have been designed to support HPC applications.

High-Throughput Computing (HTC) is the use of the task programming model for executing applications that need high computing power over a longer period of time. HTC applications generally consist of a large number of independent tasks that run for a long time (several weeks or months). Such tasks need not communicate during execution and can easily be scheduled over a distributed system architecture. Traditionally, computing grids, which are composed of heterogeneous resources, support HTC applications very well.

There is another category of task computing, called Many Task Computing (MTC), that combines HPC and HTC. Tasks under the MTC model are loosely coupled but communication-intensive. The cloud infrastructure model is the most suitable for supporting MTC.

19.14.2.1 Task Programming in Aneka

The Aneka task programming model makes it easy to develop distributed applications over the Aneka platform. Aneka tasks are implemented through APIs packed into the ‘ITask’ interface (Aneka.Tasks.ITask). Tasks created at local nodes can be passed over to the Aneka cloud, where support for the execution of Aneka tasks is implemented. Figure 19.11 describes this scenario.

The ‘AnekaApplication’ class, specialized for handling tasks, bundles together the tasks created through the ‘ITask’ interface and all of their dependencies (such as library and data files). ‘AnekaTask’ wraps a task and represents it to the cloud, while the other client-side component, ‘TaskManager’, submits the tasks to the Aneka cloud, monitors their execution and accepts the returned results.

In the Aneka cloud, four services coordinate the entire task-execution activity, namely ‘MembershipCatalogue’, ‘TaskScheduler’, ‘ExecutionService’ and ‘StorageService’. Among these, ‘TaskScheduler’ schedules the execution of tasks across the resources.

Tasks are created by developers using the Aneka-supplied interface and classes, and are then handed over to the Aneka cloud for execution.
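To make this flow concrete, a task could be defined and submitted roughly as in the sketch below. The single Execute() method on ITask follows the interface named in the text (Aneka.Tasks.ITask); the generic parameters of AnekaApplication, the AnekaTask constructor and the submission call are assumptions, not verified SDK signatures.

// Sketch only: ITask (Aneka.Tasks.ITask) is named in the text; the namespaces,
// generic parameters and calls marked 'assumed' are illustrative guesses.
using System;
using Aneka;                // Configuration (assumed namespace)
using Aneka.Entity;         // AnekaApplication (assumed namespace)
using Aneka.Tasks;          // ITask, AnekaTask, TaskManager (assumed namespace)

[Serializable]
public class HelloTask : ITask
{
    // The task body: an independent computing unit, executed in isolation
    // on whichever Executor node the TaskScheduler selects.
    public void Execute()
    {
        Console.WriteLine("Hello from an Aneka task");
    }
}

class TaskSubmission
{
    static void Main()
    {
        Configuration conf = Configuration.GetConfiguration("conf.xml");  // assumed helper
        var app = new AnekaApplication<AnekaTask, TaskManager>(conf);     // assumed generic parameters

        // AnekaTask wraps the user-defined ITask before it is handed to the cloud.
        AnekaTask task = new AnekaTask(new HelloTask());                  // assumed ctor
        app.ExecuteWorkUnit(task);                                        // assumed submission call

        app.StopExecution();                                              // assumed clean-up call
    }
}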

19.14.3 MapReduce Programming

Many applications today produce huge volumes of data that need to be stored and processed efficiently. This challenging task is known as data-intensive computing. In the cloud computing environment, data-intensive computing arises in many domains, for business analysis or scientific simulation purposes.

MapReduce is a programming model introduced by Google to process large volumes of data. It works by expressing the computational task using two functions: map and reduce. The underlying storage infrastructure is distributed in this model, and data is generally represented as key-value pairs.

The job of the ‘Map’ function is to filter and sort data into queues. For example, students’ names can be sorted by their surnames, with one queue maintained for each distinct surname. The ‘Reduce’ function performs a summary operation; for example, a summary may show the number of students under each distinct surname. Thus, a MapReduce operation takes key-value pairs as input and produces lists as output.
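The surname example can be made concrete with plain .NET code (no Aneka types involved): the map step emits a (surname, 1) key-value pair for every student, and the reduce step sums the values queued under each distinct surname. The sample names are, of course, made up for illustration.

using System;
using System.Collections.Generic;
using System.Linq;

class SurnameCount
{
    static void Main()
    {
        string[] students = { "Asha Rao", "Vikram Rao", "Meera Nair", "Rohit Nair", "Anil Rao" };

        // Map: emit a (surname, 1) key-value pair for each student.
        var pairs = students.Select(name =>
            new KeyValuePair<string, int>(name.Split(' ').Last(), 1));

        // Shuffle/sort: group the pairs into one queue per distinct surname.
        var queues = pairs.GroupBy(p => p.Key);

        // Reduce: summarize each queue, here by counting the students per surname.
        foreach (var queue in queues)
            Console.WriteLine("{0}: {1}", queue.Key, queue.Sum(p => p.Value));
        // Output: Rao: 3, then Nair: 2
    }
}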

19.14.3.1 MapReduce Programming in Aneka

A MapReduce operation executes in two phases. First, multiple ‘Map’ operations run independently in parallel. In the second phase, ‘Reduce’ operations work on the output produced in the first phase.

 

MapReduce in Aneka has been implemented following its implementation in Hadoop (a Java-based open-source programming framework). Figure 19.12 represents the model in Aneka.

The Mapper and Reducer correspond to the class implementations of the map and reduce operations respectively; both classes are extended from the Aneka MapReduce API.
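Continuing the surname example, the user-supplied classes would have roughly the following shape. This is a skeleton only: the Mapper and Reducer base classes are mentioned in the text, but the namespace, generic parameters, method signatures and Emit calls below are assumptions that may differ from the actual Aneka MapReduce API.

// Skeleton only: base-class names come from the text; generic parameters,
// method names and input types are assumptions, not verified API.
using Aneka.MapReduce;   // assumed namespace for the Aneka MapReduce API

// Map phase: runs in parallel, emitting (surname, 1) for each input record
// (records are assumed to be "Firstname Surname" strings).
public class SurnameMapper : Mapper<long, string>                       // assumed generic base
{
    protected override void Map(IMapInput<long, string> input)          // assumed signature
    {
        string surname = input.Value.Split(' ')[1];
        Emit(surname, 1);                                               // assumed emit call
    }
}

// Reduce phase: runs on the grouped map output, summing the counts per surname.
public class SurnameReducer : Reducer<string, int>                      // assumed generic base
{
    protected override void Reduce(IReduceInputEnumerator<int> input)   // assumed signature
    {
        int total = 0;
        while (input.MoveNext())
            total += input.Current;
        Emit(total);                                                    // assumed emit call
    }
}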

The run-time implementation comprises three modules:

■ a supporting distributed file system,

■ a MapReduce scheduling module and

■ a MapReduce execution module.

Local data files from the client’s MapReduce application are submitted along with the MapReduce job. Thereafter, the processing remains transparent to the client, and the output is returned on completion.

Aneka PaaS consumers have the option of developing applications following the Thread, Task or MapReduce model, as Aneka supports all of them.

 