It has been more than 30 years since John Holland’s ideas about robust adaptive systems that led to the design and implementation of genetic algorithms (GAs). Since then, GAs have been demonstrated to be an effective problem-solving tool for tackling complex optimization and machine learning problems [1]. The GA exhibits global search capabilities by simultaneously evaluating performances at multiple points in the solution space. Before this simulated evolution process begins, an initial population of multiple coded individuals representing random candidate solutions is formed, and every such individual is assigned a fitness function indicating the quality of the candidate solution. At each generation of search, multiple candidates are evaluated and the search is directed according to the principle of “survival of the fittest.” Useful search information and coordinates are then exchanged and altered for the next generation of candidate solutions. This evolution cycle will be repeated until the final generation is reached or the solution has been found.
The computational effort involved in such an evolutionary process is massive due to the inherent nature of parallel search, particularly for complex problems where a sufficiently large population and generation size that incurs a high computational workload is necessary in order to simulate a realistic evolutionary model and to find the global optimal solutions. The fundamental approach of reducing computational workload is to develop more efficient algorithms based on a sound theoretical understanding of GAs. Another promising approach involves exploiting the inherent parallelism of GAs by creating an infrastructure necessary to support distributed information processing using existing Internet and hardware resources (dividing a task into subtasks and solving the subtasks simultaneously using multiple processors over the Internet). Instead of evolving the entire population in GA using a single processor, the concept of multiple intercommunicating subpopulations can be introduced in analogy with the natural evolution of spatially distributed populations. Such intercommunication allows individuals to migrate among the subpopulations based upon some patterns to induce diversity of elite individuals periodically, in a way that simulates the species evolved in natural environment [4, 5, 7].
Peer-to-peer (P2P) computing applications attempt to harness the computing resources at the so-called edge of the Internet: the computers connected to the Internet at office or home, which are believed able to offer a great amount of computational power for a single user and to increase the overall rate of resource utilization at large [6]. In this article, we describe a P2P environment specifically built for the application of GAs that attempts to make use of the computational resources distributed over the Internet for executing the often complex and time-consuming GA applications. The P2P GA environment provides a friendly user interface and a uniform GA executor capable of utilizing the resources of all participating computers on the Internet, while hiding the complexity of network programming from the end users.
Design Consideration
Based on Flynn’s classification of computer architecture [2], a multiple-instruction-multiple-data (MIMD) architecture consists of processing elements that can execute different instructions on different data in parallel. A MIMD multicomputer is where communication between processing elements is via message passing, as opposed to shared memory in a multiprocessor machine. The approach of multiple computers connected in a P2P model for mass computation over the Internet conforms to this definition. Each computer connected to the Internet has a processing element or CPU, which can execute instructions on data stored locally in the main memory of the computer. The architectural organization of P2P computation consists of multiple computers, called peers, interconnected by message-passing network. Each peer is an autonomous computer consists of a processor, local memory, attached disks, and I/O peripherals. All the local memories are private and only accessible by local processors. The message-passing network provides a point-to-point packet-switching connection between any two peers on the Internet, and each point-to-point link is associated with a communication cost or latency indicating the overhead for sending a message between the two peers.
The aim of the P2P GA environment is to reduce the execution time of a complex program and to improve the solution quality by exploiting the vast amount of resources exposed in a P2P network. It is useful to understand the communication latency pattern in a P2P network and thus allocate an appropriate workload for each peer (for example, one high-latency link could significantly delay the computation of the whole system). A series of experiments has been conducted to measure the communication latency from one host in our lab to Internet nodes in different geographical regions such as LAN, campus, MAN, the U.S., Asia, and Europe. As expected, the latency of LAN is much smaller than other types of destinations, for example, the communication latency to the U.S. and Europe can be 1,000 times of that in an LAN, while the number of hops across continents is 20 on average.
A general leap is observed when links move from one type of network to another: the communication increases when the links go through a higher level of network hierarchy. This reveals an important issue in designing the P2P GA: peer computers over the Internet should be grouped according to the interlink communication latency, which are computers physically located in the same region. The benefit of this property is that computer and network usage patterns also conform to this physical region pattern since the working hours are often similar for the same organization, city, or country. This implies a large number of computers are available in a similar networking latency environment, which provides a straightforward approach to scheduling tasks in a P2P GA environment by simply assigning them to computers physically located in the same geographical region.
System Architecture
A deme-based P2P GA model is adopted where each P2P node is mapped to a deme and is evolved independently, except for the occasional exchange of elite individuals. At each generation, the population in GA is divided into a few demes. Each deme is assigned to a peer node and each peer will perform its own fitness evaluation. Since the major computation overhead in GA is often attributed to the evaluation of fitness function and these evaluation processes are independent of one another, it is possible to map each deme of the GA computation with one peer in the P2P network and to communicate with each other using the communication facilities provided by the P2P system, and thus reduce the overall GA computation time significantly.
The P2P GA architecture is divided into three separate layers representing different levels of functionalities in the system. The schematic view of the system architecture is shown in Figure 1. At the lowest is the Internet layer on top of which the P2P computation structure exists. This layer acts as a conceptual abstraction that includes all the basic networking services like TCP/IP and HTTP that are utilized by the upper layers. The immediate upper layer accesses the services provided by the Internet layer via respective programming interfaces, such as Berkeley Socket API for TCP/IP. At the middle is the P2P core service layer that deals with peer discovery, peer communication, peer management, and security enforcement, which are the basic “plumping” services essential in the P2P software. The highest layer is the P2P GA application layer that encompasses the functionalities required in a P2P GA application, including a P2P GA class library for customization and building of GA problem-solver application, a GA executor that performs the actual computation for solving the problem, and a GA planner that deals with the allocation of resources on the P2P network.
In Figure 1, the user interface component provides users with a friendly interaction experience and a central point of control where they can perform tasks interacting with all levels in the hierarchy. Typical tasks include the submission of GA jobs, monitoring of evolution progress, and the configuration and control of the system. Layered and monolithic structures are mixed in the design to address distinguished requirements in different aspects of the problem. The layered structure for system development gives a layered abstraction of the problem such that the layers can be upgraded gradually and independently of one another. On the other hand, the monolithic interface provides the end users an easy interaction with the system at a center point.
The P2P GA infrastructure is built on top of a class library that makes it easy to use or to expand with advanced features for solving sophisticated problems posted by the users. The classes are partitioned into two packages: P2P GA framework and P2P services. The P2P GA framework covers issues such as the choice of migration strategies, the use of communication mechanisms, and the installation of user-defined evaluators. The problems addressed in this package lie in the application level. The P2P services assume the duty of connecting all participating nodes on the Internet and providing services for the application layer on top of it. It corresponds to the middle layer in the system architecture.
There are several steps involved in utilizing the P2P GA system. Initially, the user compiles an evaluator module, which is essentially a computer language representing the model of the problem to be solved and the fitness function to be optimized, into the form of executable files. Since computers connected to the Internet are heterogeneous both in software and hardware platforms, the platform-independent language Java can be used to encode the evaluator and the GAs can be run inside a Java Virtual Machine.
After modeling the problem, the user-defined fitness function and associated directories are transported to each host computer in the P2P network. The task submission peer node then interacts with the user to get the required input and the user directs the peer to send the task module to the storage peer. The peer also reports the task submission request to the task-scheduling peer running as a service along with the directory service. It explores the directory to get the knowledge of available resources and matches the knowledge with the directions set by the task submitter to formulate an execution plan. The execution is then launched by sending messages to all the selected peer nodes on the network. To run a task, a peer connects to the storage peer to download the task module to local node and begins the execution. The user is able to control the P2P GA execution with a control panel as shown in Figure 2. Upon finishing the execution, the results are put into the system and the user is notified, which completes a typical problem-solving process in the P2P GA environment. An option is provided where users could abort from the system anytime during the execution, if necessary.
Simulation
A simulation study based on a 4-bit deceptive optimization problem has been performed to illustrate the P2P GA execution environment. The deceptive function evaluates over a binary string that is divided into 4-bit-length cells [3, 9]. The fitness function is the sum of all scores evaluated upon the 4-bit cell and the score of the 4-bit cell is a function of the number of bits set to one. The function is deceptive because there is a local trap when the set bits are three corresponding to the minimum score. There are 80 units of 4-bit cell and thus the maximum fitness value is 320 and the minimum is zero. A random migration topology is implemented where each deme selects a destination randomly and sends a fixed percentage of elite individuals for migration. The performance is evaluated based upon the average results over three independent simulation runs for one computer to 15 computers on the same problem.
Since the quality and cost depend nonlinearly on the deme size and neither the cost nor the quality is held constant, the performance is measured according to the percentage of correct partitions for the best individual among all demes at the end of the GA evolution. As shown in Figure 3, the search quality is improved significantly and the percentage of correct partitions is increased from 50% to 83% within the scale of 115 demes, which demonstrates the benefit of running GA applications using multiple computers over the Internet.
The speedup achievable for a P2P GA environment has also been investigated. The fixed-time speedup is measured based on the amount of computation time for work finished in a P2P GA system against the work finished in a standalone GA within the same period assuming all machines have the same execution ability. It is observed that the speedup is related to the fitness evaluation time, the number of subpopulations, the percentage of migrating individuals, the frequency of migration, and the time latency of sending the migrants. The simulation shows that a good speedup of 90 for a 100 sub-population P2P GA system can be achieved if the execution time for fitness evaluation of an individual takes >= 0.1 msec in a campus network or >= 1 msec in typical Internet connections [8]. In contrast, the speedup could be as low as two if the communication latency among peers over the Internet is very large as compared to the fitness evaluation time, which is a common drawback for many distributed systems.
Conclusion
We have described a P2P-based GA execution environment that makes use of the computational resources distributed over the Internet for executing typically complex and time-consuming GA applications. Object-oriented techniques were used as the basis of addressing the flexible GA application programming interface and user customization, while hiding any complexities of network programming from the users. As the number of demes is increased from one to 15, the performance is improved and the percentage of correct partitions is increased from 50% to 83% for a 4-bit deceptive function. The simulation shows that a good speedup for the P2P GA system can be achieved if the execution time for fitness evaluation is large and the communication latency among peers on the Internet is small. Future research includes more in-depth performance analysis of the P2P GA paradigm on real-world problems.
Join the Discussion (0)
Become a Member or Sign In to Post a Comment