An Introduction to Parallel Programming: Essential Material for Students and Professionals

University: University of San Francisco

Field of study: Computer Science

Uploaded by: Anonymous

Category: Book

Year: 2011

Pages: 391

Storage fee: 50,000 VND

Detailed Table of Contents

Preface

About the Author

1. CHAPTER 1: Why Parallel Computing?

1.1. Why We Need Ever-Increasing Performance

1.2. Why We’re Building Parallel Systems

1.3. Why We Need to Write Parallel Programs

1.4. How Do We Write Parallel Programs?

1.5. What We’ll Be Doing

1.6. Concurrent, Parallel, Distributed

1.7. The Rest of the Book

1.8. A Word of Warning

2. CHAPTER 2: Parallel Hardware and Parallel Software

2.1. The von Neumann architecture

2.2. Processes, multitasking, and threads

2.3. Modifications to the von Neumann Model

2.3.1. The basics of caching

2.4. Caches and programs: an example

2.5. Instruction-level parallelism

2.6. Shared-memory versus distributed-memory

2.7. Coordinating the processes/threads

2.8. Programming hybrid systems

2.9. Input and Output

2.10. Speedup and efficiency

2.11. Parallel Program Design

2.12. Writing and Running Parallel Programs
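
As a pointer to what section 2.10 covers, speedup and efficiency are used in the standard sense; a minimal statement of the definitions (my formulation, not quoted from the text):

    S = \frac{T_{\mathrm{serial}}}{T_{\mathrm{parallel}}}, \qquad
    E = \frac{S}{p} = \frac{T_{\mathrm{serial}}}{p \, T_{\mathrm{parallel}}}

Here p is the number of cores or processes; linear speedup (S = p) corresponds to efficiency E = 1.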

3. CHAPTER 3: Distributed-Memory Programming with MPI

3.1. Compilation and execution

3.2. MPI_Init and MPI_Finalize

3.3. Communicators, MPI_Comm_size and MPI_Comm_rank

3.4. The status_p argument

3.5. Semantics of MPI_Send and MPI_Recv

3.6. Some potential pitfalls

3.7. The Trapezoidal Rule in MPI

3.7.1. The trapezoidal rule

3.7.2. Parallelizing the trapezoidal rule

3.8. Tree-structured communication; collective vs. point-to-point communications

3.9. MPI Derived Datatypes

3.10. Performance Evaluation of MPI Programs

3.10.1. Speedup and efficiency

3.11. A Parallel Sorting Algorithm

3.11.1. Some simple serial sorting algorithms

3.11.2. Parallel odd-even transposition sort

3.11.3. Safety in MPI programs

3.11.4. Final details of parallel odd-even sort
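
To give a feel for the API this chapter works with, here is a minimal sketch (my own, not the book's code) of the routines named in sections 3.2 to 3.5: MPI_Init/MPI_Finalize, MPI_Comm_size, MPI_Comm_rank, and a point-to-point MPI_Send/MPI_Recv pair. The file name and the mpicc/mpiexec commands are assumptions about a typical MPI installation.

    /* Illustrative sketch only -- not the book's code.
       Build/run (assuming an MPI installation):
           mpicc -o hello hello.c
           mpiexec -n 4 ./hello                                        */
    #include <stdio.h>
    #include <string.h>
    #include <mpi.h>

    int main(int argc, char* argv[]) {
        char msg[100];
        int  comm_sz, my_rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        if (my_rank != 0) {
            /* Every nonzero rank sends a greeting to rank 0. */
            sprintf(msg, "Greetings from process %d of %d", my_rank, comm_sz);
            MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        } else {
            /* Rank 0 receives and prints the greetings in rank order. */
            printf("Greetings from process 0 of %d\n", comm_sz);
            for (int src = 1; src < comm_sz; src++) {
                MPI_Recv(msg, 100, MPI_CHAR, src, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("%s\n", msg);
            }
        }

        MPI_Finalize();
        return 0;
    }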

4. CHAPTER 4: Shared-Memory Programming with Pthreads

4.1. Processes, Threads, and Pthreads

4.2. Starting the threads

4.3. Running the threads

4.4. Stopping the threads

4.5. Other approaches to thread startup

4.6. Matrix-Vector Multiplication

4.7. Producer-Consumer Synchronization and Semaphores

4.8. Barriers and Condition Variables

4.8.1. Busy-waiting and a mutex

4.9. Read-Write Locks

4.9.1. Linked list functions

4.9.2. A multi-threaded linked list

4.9.3. Pthreads read-write locks

4.9.4. Performance of the various implementations

4.9.5. Implementing read-write locks

4.10. Caches, Cache Coherence, and False Sharing

4.10.1. Incorrect programs can produce correct output
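
For orientation, a minimal sketch (my own, not the book's code) of the thread start/stop pattern covered in sections 4.2 to 4.4, with a Pthreads mutex guarding a shared counter. The file name and compiler invocation are assumptions about a typical POSIX system.

    /* Illustrative sketch only -- not the book's code.
       Build/run (assuming gcc on a POSIX system):
           gcc -o pth_hello pth_hello.c -lpthread
           ./pth_hello 4                                               */
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    int thread_count;              /* number of threads to start      */
    long counter = 0;              /* shared counter                  */
    pthread_mutex_t mutex;         /* protects updates to counter     */

    void* Hello(void* rank) {
        long my_rank = (long) rank;

        printf("Hello from thread %ld of %d\n", my_rank, thread_count);

        pthread_mutex_lock(&mutex);   /* one thread at a time in here */
        counter++;
        pthread_mutex_unlock(&mutex);

        return NULL;
    }

    int main(int argc, char* argv[]) {
        thread_count = (argc > 1) ? atoi(argv[1]) : 4;
        pthread_t* handles = malloc(thread_count * sizeof(pthread_t));

        pthread_mutex_init(&mutex, NULL);

        /* Start the threads ... */
        for (long t = 0; t < thread_count; t++)
            pthread_create(&handles[t], NULL, Hello, (void*) t);

        /* ... and wait for them to finish. */
        for (long t = 0; t < thread_count; t++)
            pthread_join(handles[t], NULL);

        printf("counter = %ld (expected %d)\n", counter, thread_count);

        pthread_mutex_destroy(&mutex);
        free(handles);
        return 0;
    }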

5. CHAPTER 5: Shared-Memory Programming with OpenMP

5.1. Compiling and running OpenMP programs

5.2. The Trapezoidal Rule

5.2.1. A first OpenMP version

5.3. Scope of Variables

5.4. The Reduction Clause

5.5. The parallel for Directive

5.5.1. Finding loop-carried dependences

5.6. More on scope

5.7. More About Loops in OpenMP: Sorting

5.7.1. Odd-even transposition sort

5.8. The schedule clause

5.8.1. The static schedule type

5.8.2. The dynamic and guided schedule types

5.8.3. The runtime schedule type

5.9. Producers and Consumers

5.10. The atomic directive

5.11. Critical sections and locks

5.12. Using locks in the message-passing program

5.13. critical directives, atomic directives, or locks?

5.14. Caches, Cache Coherence, and False Sharing

5.14.1. Incorrect programs can produce correct output
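
As a taste of the directives covered in sections 5.2 to 5.5, here is a minimal sketch (my own, not the book's code) of the trapezoidal rule parallelized with a parallel for loop and a reduction clause. The choice of f, the file name, and the gcc -fopenmp invocation are assumptions.

    /* Illustrative sketch only -- not the book's code.
       Build/run (assuming a compiler with OpenMP support):
           gcc -fopenmp -o omp_trap omp_trap.c
           OMP_NUM_THREADS=4 ./omp_trap                                */
    #include <stdio.h>

    double f(double x) { return x * x; }

    int main(void) {
        const double a = 0.0, b = 1.0;  /* interval of integration     */
        const int    n = 1000000;       /* number of trapezoids        */
        const double h = (b - a) / n;   /* width of each trapezoid     */

        double approx = (f(a) + f(b)) / 2.0;

        /* Each thread accumulates a private partial sum; the reduction
           clause combines them into approx when the loop ends.         */
    #   pragma omp parallel for reduction(+: approx)
        for (int i = 1; i < n; i++)
            approx += f(a + i * h);

        approx *= h;
        printf("Estimate of integral on [%f, %f] = %.14f\n", a, b, approx);
        return 0;
    }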

6. CHAPTER 6: Parallel Program Development

6.1. Two n-Body Solvers

6.2. Two serial programs

6.3. Parallelizing the n-body solvers

6.4. Parallelizing the basic solver using OpenMP

6.5. Parallelizing the reduced solver using OpenMP

6.6. Evaluating the OpenMP codes

6.7. Parallelizing the solvers using pthreads

6.8. Parallelizing the basic solver using MPI

6.9. Parallelizing the reduced solver using MPI

6.10. Performance of the MPI solvers

6.11. Recursive depth-first search

6.12. Nonrecursive depth-first search

6.13. Data structures for the serial implementations

6.14. Performance of the serial implementations

6.15. Parallelizing tree search

6.16. A static parallelization of tree search using pthreads

6.17. A dynamic parallelization of tree search using pthreads

6.18. Evaluating the pthreads tree-search programs

6.19. Parallelizing the tree-search programs using OpenMP

6.20. Performance of the OpenMP implementations

6.21. Implementation of tree search using MPI and static partitioning

6.22. Implementation of tree search using MPI and dynamic partitioning

6.23. A Word of Caution

6.23.1. Pthreads and OpenMP
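
To suggest the shape of the "basic" n-body force computation that sections 6.1 to 6.4 parallelize, here is a small sketch (my own, not the book's code) using an OpenMP parallel for over the outer particle loop; the toy initial conditions and the gcc invocation are assumptions.

    /* Illustrative sketch only -- not the book's code.
       Build (assuming gcc):  gcc -fopenmp -o nbody nbody.c -lm          */
    #include <stdio.h>
    #include <math.h>

    #define N 4                     /* number of particles (toy value)   */
    const double G = 6.673e-11;     /* gravitational constant            */

    typedef struct {
        double mass;
        double pos[2];              /* x, y position                     */
        double force[2];            /* accumulated force                 */
    } particle_t;

    int main(void) {
        particle_t p[N];

        /* Toy initial conditions: equal masses spaced along a line.     */
        for (int q = 0; q < N; q++) {
            p[q].mass   = 1.0e24;
            p[q].pos[0] = q * 1.0e5;
            p[q].pos[1] = 0.0;
        }

        /* Basic algorithm: for each particle q, sum the force exerted on
           it by every other particle k.  Outer-loop iterations are
           independent, so they can be divided among threads.            */
    #   pragma omp parallel for
        for (int q = 0; q < N; q++) {
            p[q].force[0] = p[q].force[1] = 0.0;
            for (int k = 0; k < N; k++) {
                if (k == q) continue;
                double dx   = p[q].pos[0] - p[k].pos[0];
                double dy   = p[q].pos[1] - p[k].pos[1];
                double dist = sqrt(dx*dx + dy*dy);
                double fact = -G * p[q].mass * p[k].mass / (dist*dist*dist);
                p[q].force[0] += fact * dx;
                p[q].force[1] += fact * dy;
            }
        }

        for (int q = 0; q < N; q++)
            printf("particle %d: force = (%e, %e)\n",
                   q, p[q].force[0], p[q].force[1]);
        return 0;
    }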

7. CHAPTER 7: Where to Go from Here