The High Performance Fortran Forum (HPFF), a group based in Houston, has developed a proposal in the form of an extension to Fortran 90. The aim of this project, High Performance Fortran or HPF, is to offer a portable language that permits efficient use of different parallel systems. The project issued a final proposal in May 1993 and aims at a de facto standard rather than a formal standard from ANSI or ISO. In order to facilitate the introduction and general acceptance of HPF, the group has also defined a subset based on Fortran 77 and just a few parts of Fortran 90. A number of manufacturers were involved in the group, and the proposal will hopefully gain fast acceptance, with many implementations. Those parts of the proposal that were controversial, and therefore require further study, are available in a separate document, the "Journal of Development".
A very good book on HPF is the one by Koelbel et al.
The High Performance Fortran Forum has continued to develop the HPF language and is aiming to publish a new specification later this year. New features address asynchronous I/O, tasking, more general distributions, interfacing to ANSI C, reduction operations and more. It is likely that the core language will not be significantly expanded and that the specification will define extensions which are standardised but optional for a given implementation. There will also be a definition of a kernel of facilities which should be most efficient across a wide range of implementations.
The proposal is a Fortran-based language with facilities to control distribution of arrays onto distributed memory parallel computers and includes extensions for
    CHPF$ directive      ! Fortran 77 and fixed-form Fortran 90
    !HPF$ directive      ! Fortran 90 (both fixed and free form)

Since the founding of the HPFF, much has been accomplished, and commercial implementations of HPF version 1.0 are now appearing: Applied Parallel Research, Digital, Intel, Kuck and Associates, Meiko, Motorola, NEC, ACE, Hitachi, the Portland Group, Inc. and SofTech have already announced commercial products based on the HPF 1.0 standard (some of these are joint ventures).
Further information on DEC Fortran 90, which includes HPF, is available.
The Portland Group has a High Performance Fortran compiler available.
Applied Parallel Research has a High Performance Fortran compiler available.
The current (source: September 1995 HPFF Minutes) vendor implementation list is:
Cray Research, EPC, Convex, HP, Silicon Graphics, and Sun.
          REAL A(1000), B(1000), C(1000), X(500), Y(0:501)
          INTEGER INX(1000)
    !HPF$ PROCESSORS PROCS(10)
    !HPF$ DISTRIBUTE A(BLOCK), B(BLOCK) ONTO PROCS
    !HPF$ DISTRIBUTE INX(BLOCK) ONTO PROCS
    !HPF$ DISTRIBUTE C(CYCLIC) ONTO PROCS
    !HPF$ ALIGN X(I) WITH Y(I-1)

          A(I) = B(I)                           ! (1)
          X(I) = Y(I-1)                         ! (2)
          A(I) = C(I)                           ! (3)
          A(I) = A(I-1) + A(I) + A(I+1)         ! (4)
          C(I) = C(I-1) + C(I) + C(I+1)         ! (5)
          X(I) = Y(I)                           ! (6)
          A(I) = A(INX(I)) + B(INX(I))          ! (7)

Here we work with 10 processors, and we have distributed the floating-point vectors A and B, and also the integer vector INX, block-wise over these processors. The first one hundred elements of each of these vectors are then on the first processor, the next one hundred on the second processor, and so on. The floating-point vector C, however, is distributed cyclically: the first, eleventh, twenty-first, etc. elements are on the first processor, the second, twelfth, twenty-second, etc. elements are on the second, and so on.
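The ownership rules just described can be sketched in a few lines of Python. This is only an illustrative model, not part of HPF: the function names `block_owner`, `cyclic_owner` and `local_count` are hypothetical, processors are numbered from 1, and the BLOCK size is assumed to be the array length divided by the number of processors, rounded up.

```python
def block_owner(i, n, p):
    """Processor (1-based) holding element i of an n-element array under BLOCK."""
    block = -(-n // p)              # ceiling division: assumed default block size
    return (i - 1) // block + 1


def cyclic_owner(i, p):
    """Processor (1-based) holding element i under CYCLIC."""
    return (i - 1) % p + 1


def local_count(n, p):
    """Number of indices i for which a BLOCK element and a CYCLIC element
    with the same index live on the same processor."""
    return sum(1 for i in range(1, n + 1)
               if block_owner(i, n, p) == cyclic_owner(i, p))


if __name__ == "__main__":
    # Elements 100 and 101 of A fall on different processors.
    print(block_owner(100, 1000, 10), block_owner(101, 1000, 10))
    # The first few elements of C held by processor 1.
    print([i for i in range(1, 40) if cyclic_owner(i, 10) == 1])
    # How many of the 1000 assignments A(I) = C(I) need no communication.
    print(local_count(1000, 10))
```

In this model only a small fraction of the element pairs in statement (3) end up on the same processor, which suggests why mixing BLOCK and CYCLIC operands in one statement tends to be expensive.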
          SUBROUTINE CAROLUS (KARL, CHARLES)
          REAL, DIMENSION (1718) :: KARL, CHARLES
    !HPF$ ALIGN WITH * :: KARL
    !HPF$ ALIGN WITH KARL :: CHARLES
          ...
          END

This is called with
          REAL, DIMENSION (1718) :: GUSTAV, ADOLF
          ...
          CALL CAROLUS (GUSTAV, ADOLF)

This means that the first formal argument KARL is distributed in the same way as the actual argument GUSTAV. The second formal argument CHARLES is distributed in the same way as the first formal argument, in this case in the same way as GUSTAV, and not as the second actual argument ADOLF.
    !HPF$ TEMPLATE, DISTRIBUTE (BLOCK, BLOCK) :: EARTH (N+1, N+1)
          REAL, DIMENSION (N, N) :: NW, NE, SW, SE
    !HPF$ ALIGN NW(I, J) WITH EARTH(I, J)
    !HPF$ ALIGN NE(I, J) WITH EARTH(I, J+1)
    !HPF$ ALIGN SW(I, J) WITH EARTH(I+1, J)
    !HPF$ ALIGN SE(I, J) WITH EARTH(I+1, J+1)

Since a TEMPLATE does not represent any real storage, it cannot be part of any COMMON block. The alignment subscripts can be just a little more complicated than the old array indices in Fortran 66: they are of the general form m*i+n, where i is a variable which is permitted to appear only once in the expression, and m and n are constants (PARAMETER or numerical constant).
    !HPF$ ALIGN A(:) WITH D(:,*)

which means that a copy of the vector A is associated with each column of the matrix D. An example of this follows, where it is also shown that the HPF syntax permits several ways of expressing identical requirements.
    !HPF$ TEMPLATE D1(N), D2(N, N)
          REAL, DIMENSION (N, N) :: X, A, B, C, AR1, AR2, &
                                    P, Q, R, S
    !HPF$ ALIGN X(:,*) WITH D1(:)
    !HPF$ ALIGN (:,*) WITH D1 :: A, B, C, AR1, AR2
    !HPF$ ALIGN WITH D2, DYNAMIC :: P, Q, R, S

The following is a more complete example, where an intrinsic function finds the number of processors currently available, which is then used in the distribution of the arrays. In addition, an ordinary (external) function is used.
          PARAMETER ( N = NUMBER_OF_PROCESSORS() )
    !HPF$ PROCESSORS MPP(N)
    !HPF$ TEMPLATE T(1000), S(5000)
    !HPF$ DISTRIBUTE T(BLOCK) ONTO MPP     ! Block size may
    !HPF$ DISTRIBUTE S(CYCLIC) ONTO MPP    ! be specified
          REAL, DIMENSION (1000) :: X, Y, Z
          REAL, DIMENSION (5000) :: V, W
    !HPF$ ALIGN WITH T :: X, Y, Z
    !HPF$ ALIGN WITH S :: V, W
          REAL, DIMENSION (10,1000) :: A
    !HPF$ ALIGN WITH T :: A(*,:)           ! Vector of columns
          ...
          TEMP = F(V)
          ...
          END

          FUNCTION F(X)
          REAL, DIMENSION (:) :: X
    !HPF$ ALIGN WITH * :: X     ! Distribute the formal argument
                                ! X as the real argument V
          REAL, DIMENSION (SIZE(X)) :: S
    !HPF$ ALIGN WITH X :: S     ! Distribute the local vector S
                                ! as the formal argument X, or
                                ! as the actual argument V
          ...
          RETURN
          END

You can specify different distributions along different dimensions. The following specifications
          REAL A(100,100), B(100,100), C(200)
    !HPF$ DISTRIBUTE A(BLOCK,*), B(*,CYCLIC), C(CYCLIC(5))

mean that the first processor of a four-processor computer stores the following array sections
    A(1:25, 1:100)
    B(1:100, 1:97:4)
    C(1:5), C(21:25), C(41:45), C(61:65), C(81:85),
    C(101:105), C(121:125), C(141:145), C(161:165), C(181:185)

It is also possible to distribute several dimensions in completely independent ways,
          REAL D(8,100,100)
    !HPF$ DISTRIBUTE D(*,BLOCK, CYCLIC)

means that the first processor of a four-processor computer, configured as a 2*2 matrix, stores the following array sections
    D(1:8, 1:50, 1:99:2)

In addition to the static directives discussed above, there are also the two dynamic directives REDISTRIBUTE and REALIGN, which permit an array to change its distribution within the subroutine. When a subroutine is used, there are three different possibilities for the formal arguments.
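The array sections listed in the two distribution examples above can be recomputed with a small Python model. This is a sketch under stated assumptions, not an HPF implementation: the helper names `block_section` and `cyclic_sections` are hypothetical, processors are numbered from 1, and the sections listed for C are taken to arise from dealing out blocks of five elements in round-robin fashion.

```python
def block_section(n, p, proc):
    """Inclusive 1-based (lo, hi) range held by processor proc under BLOCK."""
    size = -(-n // p)                       # ceiling division
    lo = (proc - 1) * size + 1
    hi = min(proc * size, n)
    return lo, hi


def cyclic_sections(n, p, proc, k=1):
    """Inclusive 1-based ranges held by processor proc when blocks of
    k elements are dealt out to p processors in round-robin order."""
    sections = []
    start = (proc - 1) * k + 1
    while start <= n:
        sections.append((start, min(start + k - 1, n)))
        start += p * k
    return sections


if __name__ == "__main__":
    # Four processors in a line: rows of A, columns of B, pieces of C.
    print(block_section(100, 4, 1))
    print([lo for lo, _ in cyclic_sections(100, 4, 1)])
    print(cyclic_sections(200, 4, 1, k=5))
    # Four processors as a 2*2 grid: dimensions 2 and 3 of D, processor (1,1).
    print(block_section(100, 2, 1))
    print([lo for lo, _ in cyclic_sections(100, 2, 1)])
```

For the 2*2 configuration each distributed dimension sees only two processors, which is why the second dimension of D is split in halves and the third is dealt out with stride 2.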
    FORALL (I = 1:N, J = 1:N) H(I, J) = 1.0/REAL(I+J-1)
    FORALL (I = 1:N, J = 1:N, Y(I, J) .NE. 0.0) &
           X(I,J) = 1.0/Y(I,J)
    FORALL (I = 1:N) A(I,I+1:N) = 3.141592654

The first of these defines a Hilbert matrix of order N; the second forms the reciprocals of the elements of a matrix, avoiding division by zero. In the third example, all elements above the main diagonal of the matrix are assigned the value of the mathematical constant pi.
In all three statements above, FORALL can be considered as a double loop which can be executed in arbitrary order. The general form of the FORALL statement is
    FORALL ( v1 = l1:u1:s1, ... , vn = ln:un:sn, mask ) &
           a(e1, ... , em) = right_hand_side

and is evaluated according to certain well-specified rules; in principle, all indices are evaluated first.
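The practical consequence of these rules can be illustrated with a small Python model of a one-index FORALL statement. This is a simplified sketch, not the full HPF definition: the helper names `forall_assign`, `do_loop_assign` and `stencil` are hypothetical, lists are 0-based, and the model only captures the idea that all right-hand sides are evaluated before any element is assigned.

```python
def forall_assign(a, indices, rhs, mask=None):
    """Model of FORALL (i = ..., mask) a(i) = rhs: evaluate first, assign later."""
    active = [i for i in indices if mask is None or mask(i)]
    values = [rhs(a, i) for i in active]    # every right-hand side uses old values
    for i, v in zip(active, values):        # only now are the elements updated
        a[i] = v
    return a


def do_loop_assign(a, indices, rhs):
    """Ordinary sequential DO loop: later iterations see earlier updates."""
    for i in indices:
        a[i] = rhs(a, i)
    return a


def stencil(a, i):
    """Three-point sum, as in A(I) = A(I-1) + A(I) + A(I+1)."""
    return a[i - 1] + a[i] + a[i + 1]


if __name__ == "__main__":
    print(forall_assign([1, 2, 3, 4, 5], range(1, 4), stencil))
    print(do_loop_assign([1, 2, 3, 4, 5], range(1, 4), stencil))
    # Masked reciprocal, in the spirit of the second FORALL example.
    y = [2.0, 0.0, 4.0]
    x = [0.0, 0.0, 0.0]
    print(forall_assign(x, range(3), lambda a, i: 1.0 / y[i],
                        mask=lambda i: y[i] != 0.0))
```

The two stencil results differ because the sequential loop reuses values it has just overwritten, whereas the FORALL model works only from the old contents of the array.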
    REAL, DIMENSION(N, N) :: A, B
    ...
    FORALL (I = 2:N-1, J = 2:N-1)
       A(I,J) = 0.25*(A(I,J-1)+A(I,J+1)+A(I-1,J)+A(I+1,J))
       B(I,J) = A(I,J)
    END FORALL

When these statements have been executed, the arrays A and B have identical values in the interior points, while B has kept its previous values on the boundaries.
In addition a directive INDEPENDENT has been introduced for both DO
loops and FORALL constructs. This directive is placed immediately
before the DO statement or FORALL construct, and is valid until the
corresponding END DO (or the old form of terminating a
DO loop) or END
FORALL. The directive assures the system that this part of the program
can be executed in an arbitrary order, including parallel, without any
computational differences in the result (no semantic change).
In the example below it is thus assured that the integer vector P does not have any repeated values (with repeated values the last one would win in a normal sequential execution). A potential conflict in parallel execution is thus avoided. It is also implicitly assured that all values of P are within the permitted bounds of 1 and 200.
          REAL, DIMENSION(200) :: A
          REAL, DIMENSION(100) :: B
          INTEGER, DIMENSION(100) :: P
          ...
    !HPF$ INDEPENDENT
          DO I = 1, 100
             A(P(I)) = B(I)
          END DO

It is also possible to indicate that certain parts of a nested loop shall be considered as independent. In the example below the innermost loop is not independent since each element of A is assigned several times, for all values of I4.
          REAL, DIMENSION(N, N, N) :: A, B, C
          ...
    !HPF$ INDEPENDENT, NEW (I2)
          DO I1 = 1, N1
    !HPF$    INDEPENDENT, NEW (I3)
             DO I2 = 1, N2
    !HPF$       INDEPENDENT, NEW (I4)
                DO I3 = 1, N3
                   DO I4 = 1, N4    ! The innermost loop is NOT
                                    ! independent !
                      A(I1, I2, I3) = A(I1, I2, I3)  &
                           + B(I1, I2, I4) * C(I2, I3, I4)
                   END DO
                END DO
             END DO
          END DO

The NEW clauses are required, since the inner loop indices are assigned and used in different iterations of the outer loops.
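The promise that INDEPENDENT makes for the earlier loop A(P(I)) = B(I) can be spelled out as a check on P. The following Python sketch is illustrative only: the helper names `targets_are_independent` and `scatter` are hypothetical, and the iteration order is passed in explicitly to mimic an arbitrary execution order.

```python
def targets_are_independent(p, n):
    """True if p has no repeated values and every value lies within 1..n."""
    return len(set(p)) == len(p) and all(1 <= v <= n for v in p)


def scatter(a, p, b, order):
    """Perform a(p(i)) = b(i) for the iterations i in the given order.
    p holds 1-based (Fortran-style) target indices; a copy of a is returned."""
    result = list(a)
    for i in order:
        result[p[i] - 1] = b[i]
    return result


if __name__ == "__main__":
    p, b = [3, 1, 2], [10, 20, 30]
    print(targets_are_independent(p, 3))
    # Forward and reverse iteration orders give the same array.
    print(scatter([0, 0, 0], p, b, [0, 1, 2]), scatter([0, 0, 0], p, b, [2, 1, 0]))
    # With a repeated target the outcome depends on the order.
    print(scatter([0], [1, 1], [10, 20], [0, 1]), scatter([0], [1, 1], [10, 20], [1, 0]))
```

A compiler cannot normally perform this check at compile time, because P is only known at run time; the directive is the programmer's way of asserting it.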
         0  -5   8  -3
    A =  3   4  -1   2
         1   5   6  -4

gives the following values
    MAXLOC(A)          = (/ 1, 3 /)
    MAXLOC(A, DIM = 1) = (/ 2, 3, 1, 2 /)
    MAXLOC(A, DIM = 2) = (/ 3, 2, 3 /)

The following completely new functions have been added. The inquiry functions are to be intrinsic, but the others may instead be available in a library as external functions.
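Returning to the MAXLOC values listed above, they can be cross-checked with a short Python model. This is only a sketch of the behaviour shown in the example: the helper names `maxloc` and `maxloc_dim` are hypothetical, the matrix is stored as a list of rows, and results use 1-based indices as in Fortran.

```python
A = [[0, -5,  8, -3],
     [3,  4, -1,  2],
     [1,  5,  6, -4]]


def maxloc(a):
    """1-based (row, column) of the largest element, scanning column by column
    so that ties resolve in Fortran array element order."""
    rows, cols = len(a), len(a[0])
    best_i, best_j = 0, 0
    for j in range(cols):
        for i in range(rows):
            if a[i][j] > a[best_i][best_j]:
                best_i, best_j = i, j
    return [best_i + 1, best_j + 1]


def maxloc_dim(a, dim):
    """Position of the maximum along dimension dim (1 = down columns, 2 = along rows)."""
    rows, cols = len(a), len(a[0])
    if dim == 1:
        return [max(range(rows), key=lambda i: a[i][j]) + 1 for j in range(cols)]
    if dim == 2:
        return [max(range(cols), key=lambda j: a[i][j]) + 1 for i in range(rows)]
    raise ValueError("dim must be 1 or 2 for a matrix")


if __name__ == "__main__":
    print(maxloc(A), maxloc_dim(A, 1), maxloc_dim(A, 2))
```

With DIM = 1 the result has one entry per column, and with DIM = 2 one entry per row, which matches the lengths 4 and 3 in the example.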
    NUMBER_OF_PROCESSORS()           8192
    NUMBER_OF_PROCESSORS(DIM=1)      128
    NUMBER_OF_PROCESSORS(DIM=2)      64
    PROCESSORS_SHAPE()               (/ 128, 64 /)

while on an ordinary workstation we get
    NUMBER_OF_PROCESSORS()           1
    PROCESSORS_SHAPE()               (/ 1 /)
These three use the model in section 13.5.7 of the Fortran 90 standard. Note that the results are machine dependent, since they require the number of bits in an integer, which in Fortran 90 is available with the intrinsic function BIT_SIZE.
The array reduction functions IALL, IANY, IPARITY and PARITY are available; they correspond, respectively, to the Fortran 90 intrinsic functions IAND, IOR and IEOR and to the operator .NEQV.
A large number of functions are available to gather and scatter data.
They have names of the form XXX_SCATTER, where XXX
can be SUM, COUNT,
PRODUCT, MAXVAL, MINVAL, IALL, IANY, IPARITY, ALL, ANY and PARITY.
For parallel operations there are the functions XXX_PREFIX and
XXX_SUFFIX, where XXX has the same possibilities as for XXX_SCATTER.
Example: SUM_PREFIX sums the elements of the array successively: the first remains unchanged, the second becomes the sum of the first two, and so on. With SUM_SUFFIX the summation is done in the other direction (backwards). In addition there are two sorting functions, GRADE_UP and GRADE_DOWN. Operations for parallel input and output are being considered, and can be found in the Journal of Development.
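The prefix and suffix sums described above can be modelled in Python as follows. This is a sketch for one-dimensional arrays only: the lowercase function names are hypothetical stand-ins for the HPF library functions, and `grade_up` assumes that the sorting function returns the 1-based permutation of indices that puts the array into ascending order.

```python
from itertools import accumulate


def sum_prefix(a):
    """Running sums from the front: element k is a[0] + ... + a[k]."""
    return list(accumulate(a))


def sum_suffix(a):
    """Running sums from the back: element k is a[k] + ... + a[-1]."""
    return list(accumulate(reversed(a)))[::-1]


def grade_up(a):
    """Assumed behaviour: 1-based indices that sort a into ascending order."""
    return [i + 1 for i in sorted(range(len(a)), key=lambda i: a[i])]


if __name__ == "__main__":
    print(sum_prefix([1, 2, 3, 4]))
    print(sum_suffix([1, 2, 3, 4]))
    print(grade_up([30, 10, 20]))
```

On a parallel machine such scans are typically computed in a logarithmic number of combining steps rather than one element at a time; the sequential model here only fixes what the result should be.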
Fortran has storage association and sequence association. The Fortran 90 standard states in (14.6.3) that storage association is the association of two or more data objects that occurs when two or more storage sequences share or are aligned with one or more storage units, and in (12.4.1.4) that sequence association is the order that Fortran requires when an array is associated with a formal argument. The rank and shape of the actual argument need not agree with the rank and shape of the dummy argument, but the number of elements in the dummy argument must not exceed the number of elements in the element sequence of the actual argument. If the dummy argument is assumed size, the number of elements in the dummy argument is exactly the number of elements in the element sequence.
Note that HPF has no problem with array arguments distributed over the processors, as long as both the actual and the dummy arguments have the same rank and shape. It is when the properties of Fortran with respect to COMMON and EQUIVALENCE are used too heavily that we get into problems. If we use a subroutine that contains the following specifications
          SUBROUTINE HOME(X)
          DIMENSION X(20,10)

it can be called with CALL HOME(Y(2,1)) provided that the array X is specified SEQUENTIAL in the subroutine HOME and the array Y is also specified SEQUENTIAL in the calling program unit. The simple call CALL HOME(Y) is permitted if X and Y are both sequential, or if X and Y are dimensioned in exactly the same way.
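What a call such as CALL HOME(Y(2,1)) implies for the dummy array can be traced with a small Python model of column-major element sequences. This is a sketch under assumptions: the text does not give the shape of Y, so a hypothetical declaration Y(21,10) is used, and the helper names are invented for illustration.

```python
def linear_index(i, j, rows):
    """0-based position of element (i, j) in column-major storage order."""
    return (i - 1) + (j - 1) * rows


def subscripts(pos, rows):
    """Inverse of linear_index: 1-based (i, j) for a 0-based position."""
    return pos % rows + 1, pos // rows + 1


def dummy_to_actual(i, j, dummy_rows, actual_rows, start):
    """Element of the actual array that dummy element (i, j) is associated with,
    when the element sequence begins at the actual element 'start'."""
    pos = linear_index(start[0], start[1], actual_rows) + linear_index(i, j, dummy_rows)
    return subscripts(pos, actual_rows)


if __name__ == "__main__":
    # X(20,10) in HOME, hypothetical Y(21,10) in the caller, sequence starts at Y(2,1).
    for x_elem in [(1, 1), (20, 1), (1, 2), (20, 10)]:
        print(x_elem, "->", dummy_to_actual(*x_elem, 20, 21, (2, 1)))
```

Because the columns of X cut across the columns of Y in this mapping, the association only makes sense if both arrays are laid out as plain linear sequences, which is what the SEQUENTIAL requirement guarantees.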