AN ABSTRACT OF THE DISSERATION OF Zhe Fu for the degree of Doctor of Philosophy in Computer Science presented on July 6, 2006. Title: Automatic Program Generation for Scientic Computing Abstract approved: Martin Erwig The code reuse problem is a common software engineering problem in scientic computing. As a prevailing programming language in many scientic elds, For- tran does not provide support to address this problem. One particular reason is that Fortran lacks the support for generic programming.
By applying program- generation techniques, we developed two approaches to address the code reuse problem. The rst approach is to design a program generator for the equation- based specication of subroutines that can be generic in the dimensions of arrays, parameter lists, and called subroutines. We apply that approach to a real-world problem in scientic computing, which requires the generic description of inverse ocean modeling tools. In addition to a compiler that can transform generic spec- ications into ecient Fortran code for models, we have also developed a type system that can identify possible errors already in the specications.
The second approach is to extend Fortran with the support for generic programming. The result is the language Parametric Fortran, which supports dening Fortran pro- gram templates by allowing the parameterization of arbitrary Fortran constructs. A Fortran program template can be translated into a regular Fortran program guided by values for the parameters. Parametric Fortran is particularly useful in scientic computing.
The applications include dening generic functions, re- moving duplicated code, and automatic dierentiation. The described Fortran extension has also been successfully employed implementing the generic inverse ocean modeling system. c Copyright By Zhe Fu July 6, 2006 All Rights Reserved Automatic Program Generation for Scientic Computing by Zhe Fu A DISSERTATION submitted to Oregon State University in partial fulllment of the requirements for the degree of Doctor of Philosophy Presented on July 6, 2006 Commencement June 2007 UMI Number: 3236879 UMI Microform 3236879 Copyright 2006 by ProQuest Information and Learning Company. All rights reserved.
This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company 300 North Zeeb Road P. Box 1346 Ann Arbor, MI 48106-1346 Doctor of Philosophy dissertation of Zhe Fu presented on July 6, 2006. APPROVED: Major Professor, representing Computer Science Director of the School of Electrical Eengineering and Computer Science Dean of the Graduate School I understand that my dissertation will become part of the permanent collection of Oregon State University libraries.
My signature below authorizes release of my disseratation to any reader upon request. Zhe Fu, Author ACKNOWLEDGEMENTS Foremost I especially thank my advisor Martin Erwig for suggesting the won- derful thesis topic. His wide knowledge and logical way of thinking have been of great value for me. His understanding, encouraging and personal guidance have provided a good basis for the present thesis.
I also thank professor An- drew Bennett. His summer school in 2002 helped me to understand the area of inverse ocean modeling, which is the major application of the approaches pre- sented in this thesis. My thanks also go to Boon Chua and Ben Paum, who helped me with the implementation of the IOM system and Parametric Fortran. I am really grateful to my family, especially to my parents for their continuous support and encouragement.
At last, I want to express my special thanks to my wife, Yuanyuan Wang. I would not have been able to completed my Ph.D thesis without her support. TABLE OF CONTENTS Page 1 Introduction 1 1.1 The Code Reuse Problem in Scientic Computing .2 Solving the Code Reuse Problem by Automatic Program Generation 8 1.2 Generic Programming in Haskell .3 Array Programming Languages .2 Metaprogramming Tools for Fortran. 30 3 The Inverse Ocean Modeling System 32 3.1 Markovian Time Convolution .2 Bell-Shaped Space Convolution .3 A Graphical User Interface for the IOM.
38 4 Forge: A Domain-Specic Language for Dening Inverse Ocean Modeling Tools 42 4.1 Overview of Forge .1 Using Forge to Generate IOM Tools .2 The Forge Compiler .2 Generating Fortran from Tool Descriptions .1 Expressing Markovian Time Convolution in Forge .2 Interfaces and Dependent Indices .3 Summary: The Complete Tool Specication for the Marko- vian Convolution .4 Expressing the Bell-Shaped Space Convolution in Forge .5 Model Specications .3 The Forge Type System. 60 TABLE OF CONTENTS (Continued) Page 4.1 The Typing Rules of Forge. 74 5 Parametric Fortran: An Extension of Fortran for Generic Pro- gramming 77 5.1 Overview of Parametric Fortran .1 Array Addition for Arbitrary Dimensions .2 Realization of the Program Generation .3 Generic Array Slicing .4 Removing Duplicated Code .2 Implementing the IOM System in Parametric Fortran .1 Expressing Markovian Time Convolution in Parametric For- tran .2 Implementing the Bell-Shaped Space Convolution in Para- metric Fortran .3 Benets of Using Parametric Fortran in the IOM .3 Application of Parametric Fortran to Automatic Dierentiation .1 Implementing Automatic Dierentiation in Parametric For- tran .2 An Example: The Inviscid Burger's Model .3 Dierentiating the Inviscid Burger's Model Using Paramet- ric Fortran .4 Implementation of the Parametric Fortran Compiler .1 Representing Fortran Syntax in Haskell .2 A Type Class for Program Generation .3 Dening Parameter Types as Instances of Param .5 Implementation of Accessors .6 Implementation of List Parameters. 123 6 Conclusion and Future Work 128 TABLE OF CONTENTS (Continued) Page Bibliography 132 Appendices 138 Appendix A: The Syntax of Forge.
138 Appendix B: The Fortran Code of Markovian Time Convolution Gener- ated by Forge for PEZ. 140 Appendix C: The Haskell Implementation of the Parameter Type Diff. 141 Appendix D: The Installation of Parametric Fortran. 147 LIST OF FIGURES Figure Page 1.1 The Fortran code for the 2D simple convolution in time.2 The Fortran code for the 3D simple convolution in time.1 Summation of array elements.2 Using timeConv on 3-D arrays.3 The Fortran statement generated by CTADEL.4 The FORESYS Graphical User Interface .1 The discrete equations of Markovian time convolution.2 The Fortran program of Markovian Time Convolution.3 Graphical User Interface.2 Using model parameters in tools and library function calls .3 Fortran translation of discrete equations.4 The Forge specication for Markovian time convolution.5 The specication for the bell-shaped convolution in space.6 Rules for environment judgments.8 Typing rules for expressions.9 Partial order on types.10 Type rules for Forge statements.1 Array addition for 2D arrays.2 Parametric Fortran syntax tree .3 Syntax tree of {dim: c = a + b}.
84 LIST OF FIGURES (Continued) Figure Page 5.4 Transformation of nodes .5 Syntax tree of generated for loop .6 Parametric Fortran Template of an Array Slicing Subroutine .7 Removing Duplicated Code with Parametric Fortran .8 The Parametric Fortran Template for the Markovian Time Con- volution .9 The Parametric Fortran Template for the Bell-Shaped Space Con- volution .10 The Fortran Program for the Bell-Shaped Space Convolution .11 The Chain Rule for Computing the Tangent Linear Model .12 The Parametric Fortran Template for the Inviscid Burger's Model 107 5.13 The adjoint program for the inviscid Burger's model. 109 LIST OF TABLES Table Page 5.2 Code savings achieved by Parametric Fortran. 99 LIST OF APPENDIX FIGURES Figure Page 1 Syntax of specications. 138 2 Syntax of interfaces.
139 3 Syntax of dependent-index declarations. 139 5 Syntax of parameter denitions. 139 Chapter 1 Introduction Code reuse refers to the technique that allows a partial or complete computer program written at one time to be used in another program written at a later time. The goal of code reuse is to save time and energy of programmers by re- ducing redundant work.
Code reuse is critical especially in those areas in which the involved software fragments are large and thus costly to reproduce. A very common example of code reuse is using software libraries. Common operations, such as accessing data in a specic format, processing data in a specic way (sorting, searching, etc), or I/O operations, are needed by many dierent appli- cations. Developers of new programs can use the code in the software library to accomplish these tasks, instead of actually writing new code directly to perform the operations.
Using libraries is a very eective form of code reuse. However, the use of libraries is often limited by the fact that library functions work only on specic data structures. The most direct way to achieve code reuse is copying some or all of the code from an existing program into a new one. This approach is not preferable because code from the original program usually needs to be modied before being used in the new program, for example, variable names may need to be changed.
Such modications can easily introduce errors in the new program. Code reuse is highly desirable in scientic computing. However, as a widely 2 used programming language in scientic computing, Fortran still lacks the sup- port to achieve code reuse. This thesis makes contributions to the Fortran code reuse problem in scientic computing by applying program generation techniques.
In this introduction we rst give a description of the code reuse problem in scientic computing. We also describe the program-generation approach to addressing this problem and then outline the rest of the thesis.1 The Code Reuse Problem in Scientic Computing In the area of scientic computing, the inability to properly reuse software has a particularly severe impact on the eective use of computers to support the scientic activities. Although there has been considerable work in scientic com- puting on eciency issues, such as compilation and paralellization, the software engineering problems caused by large legacy software systems have been largely ignored so far. Fortran is still the prevailing programming language in many areas of science.
For example, ocean scientists have implemented ocean models in Fortran to sim- ulate and predict the state of oceans, such as the Regional Ocean Model System (ROMS) [32] and the Advanced Circulation (ADCIRC) model [2]. Major reasons for using Fortran are the existence of ecient Fortran compilers for supercom- puters that are needed to run large-scale simulations and the reliance on legacy software packages that are required for particular subsystems. At the same time, the reliance on Fortran causes severe software engineering problems, in particular, 3 the inability to reuse software. Although many projects incorporate previously written Fortran code, a closer look reveals that this reuse is coupled with high costs for adapting old Fortran programs.
Moreover, the code reuse is performed in an ad hoc way that makes future reuse even more dicult. Scientists often write data assimilation programs in Fortran to analyze sci- entic models with real observation data. Two examples are the Inverse Ocean Modeling (IOM) system [4, 17] and the Weather Research and Forecasting (WRF) model [20]. The algorithm of a particular data assimilation program is the same for all the scientic models it analyzes.
However, in dierent scientic models, the representation of data may be dierent. For example, in the ROMS [32] and the ADCIRC model [2], data are stored in 4-dimensional arrays that contain 3 space dimensions and the time dimension. In the 2D Shallow Water model [27], data are represented as 3 dimensional which is 2-dimensional space plus the time dimension. Furthermore, even though the number of dimensions of the arrays storing the data is the same in the ROMS model and the ADCIRC model, the time dimension in these two models is located in dierent dimensions of the ar- rays.
In the ROMS model, the time dimension is the rst dimension of the arrays, whereas in the ADCIRC model, the time dimension is at the last dimension. Since these data assimilation programs need to work for dierent scientic models, they have to be implemented in the way that can work with all the dif- ferent data representations in the scientic models. For example, convolution tools that compute the average of weighted values are commonly used in data assimilation programs.