From http://uk-mac.github.io/CloverLeaf/
CloverLeaf is a mini-app that solves the compressible Euler equations on a Cartesian grid, using an explicit, second-order accurate method. Each cell stores three values: energy, density, and pressure. A velocity vector is stored at each cell corner. This arrangement of data, with some quantities at cell centers and others at cell corners, is known as a staggered grid.
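As a rough illustration of the staggered layout, the sketch below allocates cell-centred and node-centred fields for a small mesh and averages the four corner velocities of one cell. The mesh size, the field names, and the linear velocity profile are hypothetical and are not taken from CloverLeaf; halo cells are omitted.

```python
# Hypothetical mesh size, chosen only for illustration.
nx, ny = 4, 3

# Cell-centred quantities: one value per cell.
density = [[1.0] * ny for _ in range(nx)]
energy = [[1.0] * ny for _ in range(nx)]
pressure = [[1.0] * ny for _ in range(nx)]

# Node-centred velocity: one value per cell corner, so one extra
# point in each direction. The x-velocity grows linearly with j.
xvel = [[float(j) for _ in range(ny + 1)] for j in range(nx + 1)]


def cell_average(node_field, j, k):
    """Average the four corner values surrounding cell (j, k)."""
    return 0.25 * (node_field[j][k] + node_field[j + 1][k]
                   + node_field[j][k + 1] + node_field[j + 1][k + 1])


centre_velocity = cell_average(xvel, 1, 1)
print(len(density), len(density[0]))  # cells in x and y
print(len(xvel), len(xvel[0]))        # nodes in x and y
print(centre_velocity)
```

The node arrays carry one more point per direction than the cell arrays, which is why the kernel listings declare velocity-like fields with an upper bound one larger than the cell-centred fields.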
The computation in CloverLeaf has been broken down into “kernels”: low-level building blocks with minimal complexity. Each kernel loops over the entire grid and updates one or more mesh variables, based on a kernel-dependent computational stencil. Control logic within each kernel is kept to a minimum, allowing maximum optimisation by the compiler. Memory is sacrificed in order to increase performance: any updates to variables that would introduce dependencies between loop iterations are written into copies of the mesh.
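The trade of memory for independent iterations can be illustrated with a one-dimensional stencil. In the sketch below a hypothetical three-point average is applied twice: once in place, where each iteration reads a value the previous iteration has already overwritten, and once into a separate copy, where every iteration depends only on the original data. The stencil and the data are illustrative only and are not taken from CloverLeaf.

```python
def smooth_in_place(field):
    """Three-point average written back into the same list.

    Iteration i reads field[i - 1], which iteration i - 1 has already
    overwritten, so the iterations cannot be reordered safely.
    """
    for i in range(1, len(field) - 1):
        field[i] = (field[i - 1] + field[i] + field[i + 1]) / 3.0
    return field


def smooth_into_copy(field):
    """Same stencil, but results go to a second list.

    Every iteration reads only the untouched input, so the loop could
    be vectorised or split across threads.
    """
    result = list(field)
    for i in range(1, len(field) - 1):
        result[i] = (field[i - 1] + field[i] + field[i + 1]) / 3.0
    return result


data = [0.0, 3.0, 0.0, 3.0, 0.0]
in_place = smooth_in_place(list(data))
copied = smooth_into_copy(data)
print(in_place)
print(copied)
```

The two results differ, which is the point: the in-place version computes something that depends on loop order, while the copy costs one extra array and removes the dependency.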
Problem Size Discussion
From README.md
There are four standard input files that are recommended for testing:
clover_bm_short.in
– This is not a very sensitive test and the kinetic energy at the end of this run should be 0.1193E+01.
clover_bm.in
– This runs for 2955 timesteps and is more sensitive than the first test. During this simulation the whole computational mesh is traversed by a shock, so it is a good test of the parallel implementation because all internal boundaries will be crossed during the course of the simulation. The final kinetic energy should be 0.2590E+01.
clover_bm16_short.in
– This is the “socket” test and has a much larger mesh size and therefore, memory footprint. The final kinetic energy should be 0.3075E+00.
clover_bm16.in
– This is a fairly long, large mesh run and the kinetic energy at the final time should be 0.4854E+01.
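A check against these reference values might look like the sketch below. The reference energies are the ones quoted above; the function name, the relative tolerance, and the sample measured values are assumptions made for illustration and are not part of CloverLeaf.

```python
# Reference final kinetic energies quoted in the README.
EXPECTED_KE = {
    "clover_bm_short.in": 0.1193e+01,
    "clover_bm.in": 0.2590e+01,
    "clover_bm16_short.in": 0.3075e+00,
    "clover_bm16.in": 0.4854e+01,
}


def ke_matches(deck, measured_ke, rel_tol=1.0e-3):
    """Return True if measured_ke is within rel_tol of the reference.

    rel_tol is a hypothetical tolerance; the README quotes four
    significant figures and does not state an acceptance threshold.
    """
    expected = EXPECTED_KE[deck]
    return abs(measured_ke - expected) <= rel_tol * abs(expected)


# Made-up measurements, used only to exercise the check.
ok = ke_matches("clover_bm16_short.in", 0.30751)
bad = ke_matches("clover_bm16_short.in", 0.3200)
print(ok, bad)
```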
Analysis
CloverLeaf becomes bound by DRAM bandwidth, partly as a result of high cache miss rates on the Skylake machine on which this analysis was performed.
Parameters
Compiler = ifort (IFORT) 18.0.1 20171018
Build_Flags = -g -O3 -march=native -no-prec-div -qopenmp
Run_Parameters = Using input deck "clover_bm16_short.in"
Scaling
Hit Locations
FLOPS
Double Precision | Scalar | 128B Packed | 256B Packed | 512B Packed | Total FLOPS | GFLOPS/sec
PMU | 3.220e+08 | 2.510e+10 | 1.390e+11 | 0.000e+00 | 6.065e+11 | 3.866e+01
SDE | 3.702e+08 | 3.145e+10 | 1.297e+11 | 0.000e+00 | 5.822e+11 | 3.711e+01
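The Total FLOPS column appears to be the instruction counts weighted by the number of double-precision lanes per instruction: 1 for scalar, and 2, 4, and 8 for the packed columns taken as 128-, 256-, and 512-bit vector instructions. The sketch below checks that reading against the figures above and derives the run time implied by the GFLOPS/sec column. The lane weighting is an inference from the numbers, not something the report states.

```python
# Double-precision lanes per instruction: scalar, 128-, 256-, 512-bit.
LANES = (1, 2, 4, 8)

# Instruction counts from the FLOPS table (scalar, 128, 256, 512).
counts = {
    "PMU": (3.220e+08, 2.510e+10, 1.390e+11, 0.000e+00),
    "SDE": (3.702e+08, 3.145e+10, 1.297e+11, 0.000e+00),
}
reported_total = {"PMU": 6.065e+11, "SDE": 5.822e+11}
reported_gflops = {"PMU": 38.66, "SDE": 37.11}


def total_flops(instruction_counts):
    """Weight each instruction count by its lane count and sum."""
    return sum(n * lanes for n, lanes in zip(instruction_counts, LANES))


def implied_seconds(flops, gflops_per_sec):
    """Run time implied by a FLOP total and a GFLOPS/sec rate."""
    return flops / (gflops_per_sec * 1.0e9)


for source in ("PMU", "SDE"):
    total = total_flops(counts[source])
    seconds = implied_seconds(reported_total[source], reported_gflops[source])
    print(source, f"{total:.4e}", f"{seconds:.2f} s")
```

Both rows reproduce the reported totals to within rounding, and both imply the same run time of roughly 15.7 seconds, which suggests the two measurements describe the same run.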
Intel Software Development Emulator
Intel SDE | CloverLeaf
Arithmetic Intensity | 0.176
Bytes per Load Inst | 23.49
Bytes per Store Inst | 30.03
FLOPS per Inst | 3.60
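Combined with the SDE FLOP total and rate reported earlier (5.822e+11 FLOPs at 37.11 GFLOPS/sec), the arithmetic intensity gives an estimate of how many bytes the run accesses. The sketch below assumes the intensity is FLOPs per byte touched by load and store instructions as counted by SDE, which is traffic seen by the core rather than DRAM traffic. The roofline helper and its peak and bandwidth arguments are hypothetical placeholders, not measurements from this machine.

```python
# Values reported for the SDE run.
sde_total_flops = 5.822e+11
sde_gflops = 37.11
arithmetic_intensity = 0.176  # FLOPs per byte (assumed: load/store bytes)

# Bytes accessed over the whole run, and the matching access rate.
bytes_accessed = sde_total_flops / arithmetic_intensity
access_rate_gb_s = sde_gflops / arithmetic_intensity


def roofline_gflops(intensity, peak_gflops, bandwidth_gb_s):
    """Attainable GFLOPS/sec under the basic roofline model.

    peak_gflops and bandwidth_gb_s are supplied by the caller; no
    machine-specific values are assumed here.
    """
    return min(peak_gflops, intensity * bandwidth_gb_s)


print(f"{bytes_accessed:.3e} bytes")
print(f"{access_rate_gb_s:.1f} GB/s")
```

At an intensity this low, the roofline bound is the bandwidth term for any plausible peak, which is consistent with the report's conclusion that the code is bandwidth bound.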
Roofline – Intel(R) Xeon(R) Platinum 8180M CPU
112 Threads – 56 Cores – 3200.0 MHz
UOPS Executed
Experiment Aggregate Metrics
Threads (Time) | IPC per Core | Loads per Cycle | L1 Hits per Cycle | L1 Miss Ratio | L2 Miss Ratio | L3 Miss Ratio | L2 B/W Utilized | L3 B/W Utilized | DRAM B/W Utilized
1 (100.0%) | 0.80 | 0.13 | 0.10 | 7.00% | 39.67% | 97.27% | 13.56% | 40.43% | 44.58%
16 (100.0%) | 0.58 | 0.10 | 0.07 | 6.34% | 40.04% | 95.34% | 4.44% | 9.12% | 65.49%
112 (100.0%) | 0.55 | 0.05 | 0.04 | 4.37% | 43.27% | 86.06% | 5.63% | 43.36% | 118.08%
advec_mom_kernel
Threads (Time) | IPC per Core | Loads per Cycle | L1 Hits per Cycle | L1 Miss Ratio | L2 Miss Ratio | L3 Miss Ratio | L2 B/W Utilized | L3 B/W Utilized | DRAM B/W Utilized
1 (36.6%) | 0.74 | 0.10 | 0.06 | 11.48% | 39.86% | 98.53% | 13.48% | 40.82% | 44.96%
16 (36.2%) | 0.54 | 0.07 | 0.04 | 10.91% | 40.74% | 96.85% | 4.47% | 9.30% | 67.32%
112 (35.3%) | 0.48 | 0.03 | 0.02 | 6.57% | 42.70% | 88.57% | 5.80% | 44.35% | 120.84%
SUBROUTINE advec_mom_kernel(x_min,x_max,y_min,y_max, &
                            vel1, &
                            mass_flux_x, &
                            vol_flux_x, &
                            mass_flux_y, &
                            vol_flux_y, &
                            volume, &
                            density1, &
                            node_flux, &
                            node_mass_post, &
                            node_mass_pre, &
                            mom_flux, &
                            pre_vol, &
                            post_vol, &
                            celldx, &
                            celldy, &
                            which_vel, &
                            sweep_number, &
                            direction )

  IMPLICIT NONE

  INTEGER :: x_min,x_max,y_min,y_max
  INTEGER :: which_vel,sweep_number,direction

  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: vel1
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+2) :: mass_flux_x
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+2) :: vol_flux_x
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+3) :: mass_flux_y
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+3) :: vol_flux_y
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: volume
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: density1
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: node_flux
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: node_mass_post
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: node_mass_pre
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: mom_flux
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: pre_vol
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: post_vol

  REAL(KIND=8), DIMENSION(x_min-2:x_max+2) :: celldx
  REAL(KIND=8), DIMENSION(y_min-2:y_max+2) :: celldy

  INTEGER :: j,k,mom_sweep
  INTEGER :: upwind,donor,downwind,dif
  REAL(KIND=8) :: sigma,wind,width
  REAL(KIND=8) :: vdiffuw,vdiffdw,auw,adw,limiter
  REAL(KIND=8) :: advec_vel_s

  mom_sweep=direction+2*(sweep_number-1)

!$OMP PARALLEL

  IF(mom_sweep.EQ.1)THEN ! x 1
!$OMP DO
    DO k=y_min-2,y_max+2
      DO j=x_min-2,x_max+2
        post_vol(j,k)= volume(j,k)+vol_flux_y(j ,k+1)-vol_flux_y(j,k)
        pre_vol(j,k)=post_vol(j,k)+vol_flux_x(j+1,k )-vol_flux_x(j,k)
      ENDDO
    ENDDO
!$OMP END DO
  ELSEIF(mom_sweep.EQ.2)THEN ! y 1
!$OMP DO
    DO k=y_min-2,y_max+2
      DO j=x_min-2,x_max+2
        post_vol(j,k)= volume(j,k)+vol_flux_x(j+1,k )-vol_flux_x(j,k)
        pre_vol(j,k)=post_vol(j,k)+vol_flux_y(j ,k+1)-vol_flux_y(j,k)
      ENDDO
    ENDDO
!$OMP END DO
  ELSEIF(mom_sweep.EQ.3)THEN ! x 2
!$OMP DO
    DO k=y_min-2,y_max+2
      DO j=x_min-2,x_max+2
        post_vol(j,k)=volume(j,k)
        pre_vol(j,k)=post_vol(j,k)+vol_flux_y(j ,k+1)-vol_flux_y(j,k)
      ENDDO
    ENDDO
!$OMP END DO
  ELSEIF(mom_sweep.EQ.4)THEN ! y 2
!$OMP DO
    DO k=y_min-2,y_max+2
      DO j=x_min-2,x_max+2
        post_vol(j,k)=volume(j,k)
        pre_vol(j,k)=post_vol(j,k)+vol_flux_x(j+1,k )-vol_flux_x(j,k)
      ENDDO
    ENDDO
!$OMP END DO
  ENDIF

  IF(direction.EQ.1)THEN
    IF(which_vel.EQ.1)THEN
!$OMP DO
      DO k=y_min,y_max+1
        DO j=x_min-2,x_max+2
          ! Find staggered mesh mass fluxes, nodal masses and volumes.
          node_flux(j,k)=0.25_8*(mass_flux_x(j,k-1 )+mass_flux_x(j ,k) &
                                +mass_flux_x(j+1,k-1)+mass_flux_x(j+1,k))
        ENDDO
      ENDDO
!$OMP END DO
!$OMP DO
      DO k=y_min,y_max+1
        DO j=x_min-1,x_max+2
          ! Staggered cell mass post advection
          node_mass_post(j,k)=0.25_8*(density1(j ,k-1)*post_vol(j ,k-1) &
                                     +density1(j ,k )*post_vol(j ,k ) &
                                     +density1(j-1,k-1)*post_vol(j-1,k-1) &
                                     +density1(j-1,k )*post_vol(j-1,k ))
          node_mass_pre(j,k)=node_mass_post(j,k)-node_flux(j-1,k)+node_flux(j,k)
        ENDDO
      ENDDO
    ENDIF

!$OMP DO PRIVATE(upwind,downwind,donor,dif,sigma,width,limiter, &
!$OMP            vdiffuw,vdiffdw,auw,adw,wind,advec_vel_s)
    DO k=y_min,y_max+1
      DO j=x_min-1,x_max+1
        IF(node_flux(j,k).LT.0.0)THEN
          upwind=j+2
          donor=j+1
          downwind=j
          dif=donor
        ELSE
          upwind=j-1
          donor=j
          downwind=j+1
          dif=upwind
        ENDIF
        sigma=ABS(node_flux(j,k))/(node_mass_pre(donor,k))
        width=celldx(j)
        vdiffuw=vel1(donor,k)-vel1(upwind,k)
        vdiffdw=vel1(downwind,k)-vel1(donor,k)
        limiter=0.0
        IF(vdiffuw*vdiffdw.GT.0.0)THEN
          auw=ABS(vdiffuw)
          adw=ABS(vdiffdw)
          wind=1.0_8
          IF(vdiffdw.LE.0.0) wind=-1.0_8
          limiter=wind*MIN(width*((2.0_8-sigma)*adw/width+(1.0_8+sigma)*auw/celldx(dif)) &
                           /6.0_8,auw,adw)
        ENDIF
        advec_vel_s=vel1(donor,k)+(1.0-sigma)*limiter
        mom_flux(j,k)=advec_vel_s*node_flux(j,k)
      ENDDO
    ENDDO
!$OMP END DO
!$OMP DO
    DO k=y_min,y_max+1
      DO j=x_min,x_max+1
        vel1 (j,k)=(vel1 (j,k)*node_mass_pre(j,k)+mom_flux(j-1,k) &
                    -mom_flux(j,k))/node_mass_post(j,k)
      ENDDO
    ENDDO
!$OMP END DO
  ELSEIF(direction.EQ.2)THEN
    IF(which_vel.EQ.1)THEN
!$OMP DO
      DO k=y_min-2,y_max+2
        DO j=x_min,x_max+1
          ! Find staggered mesh mass fluxes and nodal masses and volumes.
          node_flux(j,k)=0.25_8*(mass_flux_y(j-1,k )+mass_flux_y(j ,k ) &
                                +mass_flux_y(j-1,k+1)+mass_flux_y(j ,k+1))
        ENDDO
      ENDDO
!$OMP END DO
!$OMP DO
      DO k=y_min-1,y_max+2
        DO j=x_min,x_max+1
          node_mass_post(j,k)=0.25_8*(density1(j ,k-1)*post_vol(j ,k-1) &
                                     +density1(j ,k )*post_vol(j ,k ) &
                                     +density1(j-1,k-1)*post_vol(j-1,k-1) &
                                     +density1(j-1,k )*post_vol(j-1,k ))
          node_mass_pre(j,k)=node_mass_post(j,k)-node_flux(j,k-1)+node_flux(j,k)
        ENDDO
      ENDDO
    ENDIF
!$OMP DO PRIVATE(upwind,donor,downwind,dif,sigma,width, &
!$OMP            limiter,vdiffuw,vdiffdw,auw,adw,wind,advec_vel_s)
    DO k=y_min-1,y_max+1
      DO j=x_min,x_max+1
        IF(node_flux(j,k).LT.0.0)THEN
          upwind=k+2
          donor=k+1
          downwind=k
          dif=donor
        ELSE
          upwind=k-1
          donor=k
          downwind=k+1
          dif=upwind
        ENDIF

        sigma=ABS(node_flux(j,k))/(node_mass_pre(j,donor))
        width=celldy(k)
        vdiffuw=vel1(j,donor)-vel1(j,upwind)
        vdiffdw=vel1(j,downwind)-vel1(j,donor)
        limiter=0.0
        IF(vdiffuw*vdiffdw.GT.0.0)THEN
          auw=ABS(vdiffuw)
          adw=ABS(vdiffdw)
          wind=1.0_8
          IF(vdiffdw.LE.0.0) wind=-1.0_8
          limiter=wind*MIN(width*((2.0_8-sigma)*adw/width+(1.0_8+sigma)*auw/celldy(dif)) &
                           /6.0_8,auw,adw)
        ENDIF
        advec_vel_s=vel1(j,donor)+(1.0_8-sigma)*limiter
        mom_flux(j,k)=advec_vel_s*node_flux(j,k)
      ENDDO
    ENDDO
!$OMP END DO
!$OMP DO
    DO k=y_min,y_max+1
      DO j=x_min,x_max+1
        vel1 (j,k)=(vel1(j,k)*node_mass_pre(j,k)+mom_flux(j,k-1) &
                    -mom_flux(j,k))/node_mass_post(j,k)
      ENDDO
    ENDDO
!$OMP END DO
  ENDIF

!$OMP END PARALLEL

END SUBROUTINE advec_mom_kernel
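The core of the momentum sweep is the limited upwind velocity. The sketch below transcribes that scalar logic from the x-direction loop into standalone functions so it can be tried on single values. The function and argument names are introduced here for illustration (`width_dif` stands for `celldx(dif)`), and the inputs are made up.

```python
def limited_slope(vdiffuw, vdiffdw, sigma, width, width_dif):
    """Limiter term from the momentum advection loop.

    vdiffuw and vdiffdw are the upwind and downwind velocity
    differences. The result is zero at a local extremum, that is,
    when the two differences do not share a sign.
    """
    if vdiffuw * vdiffdw <= 0.0:
        return 0.0
    auw = abs(vdiffuw)
    adw = abs(vdiffdw)
    wind = -1.0 if vdiffdw <= 0.0 else 1.0
    candidate = width * ((2.0 - sigma) * adw / width
                         + (1.0 + sigma) * auw / width_dif) / 6.0
    return wind * min(candidate, auw, adw)


def advected_velocity(vel_donor, sigma, limiter):
    """Velocity carried across the face, as in advec_vel_s."""
    return vel_donor + (1.0 - sigma) * limiter


smooth = limited_slope(1.0, 1.0, 0.5, 1.0, 1.0)
extremum = limited_slope(1.0, -1.0, 0.5, 1.0, 1.0)
falling = limited_slope(-1.0, -1.0, 0.5, 1.0, 1.0)
print(smooth, extremum, falling)
print(advected_velocity(2.0, 0.5, smooth))
```

The branch on the product of the two differences is the only data-dependent control flow in the loop body, which is in keeping with the stated aim of minimal control logic inside kernels.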
advec_cell_kernel
Threads (Time) | IPC per Core | Loads per Cycle | L1 Hits per Cycle | L1 Miss Ratio | L2 Miss Ratio | L3 Miss Ratio | L2 B/W Utilized | L3 B/W Utilized | DRAM B/W Utilized
1 (22.8%) | 1.16 | 0.17 | 0.15 | 4.00% | 40.34% | 95.37% | 11.29% | 33.83% | 37.48%
16 (21.3%) | 0.94 | 0.14 | 0.13 | 3.40% | 40.67% | 93.31% | 3.90% | 7.98% | 57.44%
112 (18.4%) | 0.78 | 0.06 | 0.06 | 3.10% | 43.53% | 87.09% | 5.71% | 43.38% | 118.14%
SUBROUTINE advec_cell_kernel(x_min, &
                             x_max, &
                             y_min, &
                             y_max, &
                             dir, &
                             sweep_number, &
                             vertexdx, &
                             vertexdy, &
                             volume, &
                             density1, &
                             energy1, &
                             mass_flux_x, &
                             vol_flux_x, &
                             mass_flux_y, &
                             vol_flux_y, &
                             pre_vol, &
                             post_vol, &
                             pre_mass, &
                             post_mass, &
                             advec_vol, &
                             post_ener, &
                             ener_flux )

  IMPLICIT NONE

  INTEGER :: x_min,x_max,y_min,y_max
  INTEGER :: sweep_number,dir
  INTEGER :: g_xdir=1,g_ydir=2

  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: volume
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: density1
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: energy1
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+2) :: vol_flux_x
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+3) :: vol_flux_y
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+2) :: mass_flux_x
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+3) :: mass_flux_y
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: pre_vol
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: post_vol
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: pre_mass
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: post_mass
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: advec_vol
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: post_ener
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: ener_flux

  REAL(KIND=8), DIMENSION(x_min-2:x_max+3) :: vertexdx
  REAL(KIND=8), DIMENSION(y_min-2:y_max+3) :: vertexdy

  INTEGER :: j,k,upwind,donor,downwind,dif

  REAL(KIND=8) :: wind,sigma,sigmat,sigmav,sigmam,sigma3,sigma4
  REAL(KIND=8) :: diffuw,diffdw,limiter
  REAL(KIND=8) :: one_by_six=1.0_8/6.0_8
  REAL(KIND=8) :: pre_mass_s,post_mass_s,post_ener_s,advec_vol_s

!$OMP PARALLEL

  IF(dir.EQ.g_xdir) THEN

    IF(sweep_number.EQ.1)THEN
!$OMP DO
      DO k=y_min-2,y_max+2
        DO j=x_min-2,x_max+2
          pre_vol(j,k)=volume(j,k)+(vol_flux_x(j+1,k )-vol_flux_x(j,k) &
                                   +vol_flux_y(j ,k+1)-vol_flux_y(j,k))
          post_vol(j,k)=pre_vol(j,k)-(vol_flux_x(j+1,k )-vol_flux_x(j,k))
        ENDDO
      ENDDO
!$OMP END DO
    ELSE
!$OMP DO
      DO k=y_min-2,y_max+2
        DO j=x_min-2,x_max+2
          pre_vol(j,k)=volume(j,k)+vol_flux_x(j+1,k)-vol_flux_x(j,k)
          post_vol(j,k)=volume(j,k)
        ENDDO
      ENDDO
!$OMP END DO
    ENDIF

!$OMP DO PRIVATE(upwind,donor,downwind,dif,sigmat,sigma3,sigma4, &
!$OMP            sigmav,sigma,sigmam, &
!$OMP            diffuw,diffdw,limiter,wind)
    DO k=y_min,y_max
      DO j=x_min,x_max+2

        IF(vol_flux_x(j,k).GT.0.0)THEN
          upwind =j-2
          donor =j-1
          downwind =j
          dif =donor
        ELSE
          upwind =MIN(j+1,x_max+2)
          donor =j
          downwind =j-1
          dif =upwind
        ENDIF

        sigmat=ABS(vol_flux_x(j,k))/pre_vol(donor,k)
        sigma3=(1.0_8+sigmat)*(vertexdx(j)/vertexdx(dif))
        sigma4=2.0_8-sigmat

        sigma=sigmat
        sigmav=sigmat

        diffuw=density1(donor,k)-density1(upwind,k)
        diffdw=density1(downwind,k)-density1(donor,k)
        wind=1.0_8
        IF(diffdw.LE.0.0) wind=-1.0_8
        IF(diffuw*diffdw.GT.0.0)THEN
          limiter=(1.0_8-sigmav)*wind*MIN(ABS(diffuw),ABS(diffdw)&
                  ,one_by_six*(sigma3*ABS(diffuw)+sigma4*ABS(diffdw)))
        ELSE
          limiter=0.0
        ENDIF
        mass_flux_x(j,k)=vol_flux_x(j,k)*(density1(donor,k)+limiter)

        sigmam=ABS(mass_flux_x(j,k))/(density1(donor,k)*pre_vol(donor,k))
        diffuw=energy1(donor,k)-energy1(upwind,k)
        diffdw=energy1(downwind,k)-energy1(donor,k)
        wind=1.0_8
        IF(diffdw.LE.0.0) wind=-1.0_8
        IF(diffuw*diffdw.GT.0.0)THEN
          limiter=(1.0_8-sigmam)*wind*MIN(ABS(diffuw),ABS(diffdw)&
                  ,one_by_six*(sigma3*ABS(diffuw)+sigma4*ABS(diffdw)))
        ELSE
          limiter=0.0
        ENDIF

        ener_flux(j,k)=mass_flux_x(j,k)*(energy1(donor,k)+limiter)

      ENDDO
    ENDDO
!$OMP END DO

!$OMP DO PRIVATE(pre_mass_s,post_mass_s,post_ener_s,advec_vol_s)
    DO k=y_min,y_max
      DO j=x_min,x_max
        pre_mass_s=density1(j,k)*pre_vol(j,k)
        post_mass_s=pre_mass_s+mass_flux_x(j,k)-mass_flux_x(j+1,k)
        post_ener_s=(energy1(j,k)*pre_mass_s+ener_flux(j,k) &
                     -ener_flux(j+1,k))/post_mass_s
        advec_vol_s=pre_vol(j,k)+vol_flux_x(j,k)-vol_flux_x(j+1,k)
        density1(j,k)=post_mass_s/advec_vol_s
        energy1(j,k)=post_ener_s
      ENDDO
    ENDDO
!$OMP END DO

  ELSEIF(dir.EQ.g_ydir) THEN

    IF(sweep_number.EQ.1)THEN
!$OMP DO
      DO k=y_min-2,y_max+2
        DO j=x_min-2,x_max+2
          pre_vol(j,k)=volume(j,k)+(vol_flux_y(j ,k+1)-vol_flux_y(j,k) &
                                   +vol_flux_x(j+1,k )-vol_flux_x(j,k))
          post_vol(j,k)=pre_vol(j,k)-(vol_flux_y(j ,k+1)-vol_flux_y(j,k))
        ENDDO
      ENDDO
!$OMP END DO
    ELSE
!$OMP DO
      DO k=y_min-2,y_max+2
        DO j=x_min-2,x_max+2
          pre_vol(j,k)=volume(j,k)+vol_flux_y(j ,k+1)-vol_flux_y(j,k)
          post_vol(j,k)=volume(j,k)
        ENDDO
      ENDDO
!$OMP END DO
    ENDIF

!$OMP DO PRIVATE(upwind,donor,downwind,dif,sigmat,sigma3, &
!$OMP            sigma4,sigmav,sigma,sigmam, &
!$OMP            diffuw,diffdw,limiter,wind)
    DO k=y_min,y_max+2
      DO j=x_min,x_max

        IF(vol_flux_y(j,k).GT.0.0)THEN
          upwind =k-2
          donor =k-1
          downwind =k
          dif =donor
        ELSE
          upwind =MIN(k+1,y_max+2)
          donor =k
          downwind =k-1
          dif =upwind
        ENDIF

        sigmat=ABS(vol_flux_y(j,k))/pre_vol(j,donor)
        sigma3=(1.0_8+sigmat)*(vertexdy(k)/vertexdy(dif))
        sigma4=2.0_8-sigmat

        sigma=sigmat
        sigmav=sigmat

        diffuw=density1(j,donor)-density1(j,upwind)
        diffdw=density1(j,downwind)-density1(j,donor)
        wind=1.0_8
        IF(diffdw.LE.0.0) wind=-1.0_8
        IF(diffuw*diffdw.GT.0.0)THEN
          limiter=(1.0_8-sigmav)*wind*MIN(ABS(diffuw),ABS(diffdw)&
                  ,one_by_six*(sigma3*ABS(diffuw)+sigma4*ABS(diffdw)))
        ELSE
          limiter=0.0
        ENDIF
        mass_flux_y(j,k)=vol_flux_y(j,k)*(density1(j,donor)+limiter)

        sigmam=ABS(mass_flux_y(j,k))/(density1(j,donor)*pre_vol(j,donor))
        diffuw=energy1(j,donor)-energy1(j,upwind)
        diffdw=energy1(j,downwind)-energy1(j,donor)
        wind=1.0_8
        IF(diffdw.LE.0.0) wind=-1.0_8
        IF(diffuw*diffdw.GT.0.0)THEN
          limiter=(1.0_8-sigmam)*wind*MIN(ABS(diffuw),ABS(diffdw)&
                  ,one_by_six*(sigma3*ABS(diffuw)+sigma4*ABS(diffdw)))
        ELSE
          limiter=0.0
        ENDIF
        ener_flux(j,k)=mass_flux_y(j,k)*(energy1(j,donor)+limiter)

      ENDDO
    ENDDO
!$OMP END DO

!$OMP DO PRIVATE(pre_mass_s,post_mass_s,post_ener_s,advec_vol_s)
    DO k=y_min,y_max
      DO j=x_min,x_max
        pre_mass_s=density1(j,k)*pre_vol(j,k)
        post_mass_s=pre_mass_s+mass_flux_y(j,k)-mass_flux_y(j,k+1)
        post_ener_s=(energy1(j,k)*pre_mass_s+ener_flux(j,k) - &
                     ener_flux(j,k+1))/post_mass_s
        advec_vol_s=pre_vol(j,k)+vol_flux_y(j,k)-vol_flux_y(j,k+1)
        density1(j,k)=post_mass_s/advec_vol_s
        energy1(j,k)=post_ener_s
      ENDDO
    ENDDO
!$OMP END DO

  ENDIF

!$OMP END PARALLEL

END SUBROUTINE advec_cell_kernel
PdV_kernel
Threads (Time) | IPC per Core | Loads per Cycle | L1 Hits per Cycle | L1 Miss Ratio | L2 Miss Ratio | L3 Miss Ratio | L2 B/W Utilized | L3 B/W Utilized | DRAM B/W Utilized
1 (12.0%) | 0.63 | 0.18 | 0.14 | 7.39% | 37.20% | 95.26% | 16.16% | 47.75% | 52.84%
16 (12.3%) | 0.41 | 0.13 | 0.10 | 6.98% | 37.98% | 92.47% | 4.95% | 10.48% | 75.06%
112 (12.5%) | 0.43 | 0.05 | 0.04 | 6.65% | 44.86% | 81.55% | 6.23% | 53.58% | 149.79%
SUBROUTINE PdV_kernel(predict, &
                      x_min,x_max,y_min,y_max,dt, &
                      xarea,yarea,volume, &
                      density0, &
                      density1, &
                      energy0, &
                      energy1, &
                      pressure, &
                      viscosity, &
                      xvel0, &
                      xvel1, &
                      yvel0, &
                      yvel1, &
                      volume_change )

  IMPLICIT NONE

  LOGICAL :: predict

  INTEGER :: x_min,x_max,y_min,y_max
  REAL(KIND=8) :: dt
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+2) :: xarea
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+3) :: yarea
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: volume
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: density0,energy0
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: pressure
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: density1,energy1
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: viscosity
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: xvel0,yvel0
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: xvel1,yvel1
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: volume_change

  INTEGER :: j,k

  REAL(KIND=8) :: recip_volume,energy_change,min_cell_volume
  REAL(KIND=8) :: right_flux,left_flux,top_flux,bottom_flux,total_flux
  REAL(KIND=8) :: volume_change_s

!$OMP PARALLEL

  IF(predict)THEN

!$OMP DO PRIVATE(right_flux,left_flux,top_flux,bottom_flux, &
!$OMP            total_flux,min_cell_volume, &
!$OMP            energy_change,recip_volume,volume_change_s)
    DO k=y_min,y_max
      DO j=x_min,x_max

        left_flux= (xarea(j ,k )*(xvel0(j ,k )+xvel0(j ,k+1) &
                                 +xvel0(j ,k )+xvel0(j ,k+1)))*0.25_8*dt*0.5
        right_flux= (xarea(j+1,k )*(xvel0(j+1,k )+xvel0(j+1,k+1) &
                                   +xvel0(j+1,k )+xvel0(j+1,k+1)))*0.25_8*dt*0.5
        bottom_flux=(yarea(j ,k )*(yvel0(j ,k )+yvel0(j+1,k ) &
                                  +yvel0(j ,k )+yvel0(j+1,k )))*0.25_8*dt*0.5
        top_flux= (yarea(j ,k+1)*(yvel0(j ,k+1)+yvel0(j+1,k+1) &
                                 +yvel0(j ,k+1)+yvel0(j+1,k+1)))*0.25_8*dt*0.5
        total_flux=right_flux-left_flux+top_flux-bottom_flux

        volume_change_s=volume(j,k)/(volume(j,k)+total_flux)

        min_cell_volume=MIN(volume(j,k)+right_flux-left_flux+ &
                            top_flux-bottom_flux &
                            ,volume(j,k)+right_flux-left_flux &
                            ,volume(j,k)+top_flux-bottom_flux)

        recip_volume=1.0/volume(j,k)

        energy_change=(pressure(j,k)/density0(j,k)+viscosity(j,k)/density0(j,k)) &
                      *total_flux*recip_volume

        energy1(j,k)=energy0(j,k)-energy_change

        density1(j,k)=density0(j,k)*volume_change_s

      ENDDO
    ENDDO
!$OMP END DO

  ELSE

!$OMP DO PRIVATE(right_flux,left_flux,top_flux, &
!$OMP            bottom_flux,total_flux,min_cell_volume, &
!$OMP            energy_change,recip_volume,volume_change_s)
    DO k=y_min,y_max
      DO j=x_min,x_max

        left_flux= (xarea(j ,k )*(xvel0(j ,k )+xvel0(j ,k+1) &
                                 +xvel1(j ,k )+xvel1(j ,k+1)))*0.25_8*dt
        right_flux= (xarea(j+1,k )*(xvel0(j+1,k )+xvel0(j+1,k+1) &
                                   +xvel1(j+1,k )+xvel1(j+1,k+1)))*0.25_8*dt
        bottom_flux=(yarea(j ,k )*(yvel0(j ,k )+yvel0(j+1,k ) &
                                  +yvel1(j ,k )+yvel1(j+1,k )))*0.25_8*dt
        top_flux= (yarea(j ,k+1)*(yvel0(j ,k+1)+yvel0(j+1,k+1) &
                                 +yvel1(j ,k+1)+yvel1(j+1,k+1)))*0.25_8*dt
        total_flux=right_flux-left_flux+top_flux-bottom_flux

        volume_change_s=volume(j,k)/(volume(j,k)+total_flux)

        min_cell_volume=MIN(volume(j,k)+right_flux-left_flux+ &
                            top_flux-bottom_flux &
                            ,volume(j,k)+right_flux-left_flux &
                            ,volume(j,k)+top_flux-bottom_flux)

        recip_volume=1.0/volume(j,k)

        energy_change=(pressure(j,k)/density0(j,k)+viscosity(j,k)/density0(j,k)) &
                      *total_flux*recip_volume

        energy1(j,k)=energy0(j,k)-energy_change

        density1(j,k)=density0(j,k)*volume_change_s

      ENDDO
    ENDDO
!$OMP END DO

  ENDIF

!$OMP END PARALLEL

END SUBROUTINE PdV_kernel
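Both branches of the kernel end with the same per-cell update once `total_flux` is known. The sketch below isolates that update for a single cell to show that the density scaling keeps the cell mass fixed while the volume changes by `total_flux`. The function name and the input values are made up for illustration; only the arithmetic is taken from the listing.

```python
def pdv_cell_update(volume, total_flux, density0, energy0,
                    pressure, viscosity):
    """Per-cell density and energy update from PdV_kernel."""
    volume_change_s = volume / (volume + total_flux)
    recip_volume = 1.0 / volume
    energy_change = ((pressure / density0 + viscosity / density0)
                     * total_flux * recip_volume)
    energy1 = energy0 - energy_change
    density1 = density0 * volume_change_s
    return density1, energy1


# Hypothetical single-cell state: the cell expands by 25%.
volume, total_flux = 1.0, 0.25
density0, energy0 = 2.0, 1.0
density1, energy1 = pdv_cell_update(volume, total_flux, density0, energy0,
                                    pressure=0.8, viscosity=0.0)

mass_before = density0 * volume
mass_after = density1 * (volume + total_flux)
print(density1, energy1)
print(mass_before, mass_after)
```

An expanding cell loses density and internal energy, as expected for work done by the gas, while the product of density and volume is unchanged.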
accelerate_kernel
Threads (Time) | IPC per Core | Loads per Cycle | L1 Hits per Cycle | L1 Miss Ratio | L2 Miss Ratio | L3 Miss Ratio | L2 B/W Utilized | L3 B/W Utilized | DRAM B/W Utilized
1 (4.9%) | 0.84 | 0.19 | 0.15 | 6.16% | 33.84% | 90.45% | 18.59% | 48.41% | 53.25%
16 (5.3%) | 0.62 | 0.14 | 0.11 | 5.74% | 32.98% | 88.08% | 6.01% | 10.37% | 75.25%
112 (5.4%) | 0.49 | 0.05 | 0.05 | 5.56% | 39.20% | 77.40% | 7.22% | 53.23% | 148.75%
SUBROUTINE accelerate_kernel(x_min,x_max,y_min,y_max,dt, &
                             xarea,yarea, &
                             volume, &
                             density0, &
                             pressure, &
                             viscosity, &
                             xvel0, &
                             yvel0, &
                             xvel1, &
                             yvel1 )

  IMPLICIT NONE

  INTEGER :: x_min,x_max,y_min,y_max
  REAL(KIND=8) :: dt

  REAL(KIND=8), DIMENSION(x_min-2:x_max+2 ,y_min-2:y_max+2) :: density0
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2 ,y_min-2:y_max+2) :: volume
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3 ,y_min-2:y_max+2) :: xarea
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2 ,y_min-2:y_max+3) :: yarea
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2 ,y_min-2:y_max+2) :: pressure
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2 ,y_min-2:y_max+2) :: viscosity
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3 ,y_min-2:y_max+3) :: xvel0,yvel0
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3 ,y_min-2:y_max+3) :: xvel1,yvel1

  INTEGER :: j,k
  REAL(KIND=8) :: nodal_mass,stepbymass_s,halfdt

  halfdt=0.5_8*dt

!$OMP PARALLEL

!$OMP DO PRIVATE(j,k,stepbymass_s)
  DO k=y_min,y_max+1
    DO j=x_min,x_max+1
      stepbymass_s=halfdt/((density0(j-1,k-1)*volume(j-1,k-1) &
                           +density0(j ,k-1)*volume(j ,k-1) &
                           +density0(j ,k )*volume(j ,k ) &
                           +density0(j-1,k )*volume(j-1,k )) &
                           *0.25_8)

      xvel1(j,k)=xvel0(j,k)-stepbymass_s*(xarea(j ,k )* &
                 (pressure(j ,k )-pressure(j-1,k )) &
                 +xarea(j ,k-1)*(pressure(j ,k-1)-pressure(j-1,k-1)))
      yvel1(j,k)=yvel0(j,k)-stepbymass_s*(yarea(j ,k )* &
                 (pressure(j ,k )-pressure(j ,k-1)) &
                 +yarea(j-1,k )*(pressure(j-1,k )-pressure(j-1,k-1)))
      xvel1(j,k)=xvel1(j,k)-stepbymass_s*(xarea(j ,k )* &
                 (viscosity(j ,k )-viscosity(j-1,k )) &
                 +xarea(j ,k-1)*(viscosity(j ,k-1)-viscosity(j-1,k-1)))
      yvel1(j,k)=yvel1(j,k)-stepbymass_s*(yarea(j ,k )* &
                 (viscosity(j ,k )-viscosity(j ,k-1)) &
                 +yarea(j-1,k )*(viscosity(j-1,k )-viscosity(j-1,k-1)))
    ENDDO
  ENDDO
!$OMP END DO

!$OMP END PARALLEL

END SUBROUTINE accelerate_kernel
ideal_gas_kernel
Threads (Time) | IPC per Core | Loads per Cycle | L1 Hits per Cycle | L1 Miss Ratio | L2 Miss Ratio | L3 Miss Ratio | L2 B/W Utilized | L3 B/W Utilized | DRAM B/W Utilized
1 (4.3%) | 0.51 | 0.03 | 0.01 | 1.20% | 53.70% | 75.46% | 11.86% | 45.80% | 50.32%
16 (5.4%) | 0.25 | 0.02 | 0.01 | 1.39% | 56.57% | 76.79% | 3.08% | 8.25% | 58.95%
112 (5.2%) | 0.35 | 0.02 | 0.02 | 2.35% | 57.92% | 82.77% | 3.83% | 36.30% | 101.15%
SUBROUTINE ideal_gas_kernel(x_min,x_max,y_min,y_max, &
                            density, &
                            energy, &
                            pressure, &
                            soundspeed )

  IMPLICIT NONE

  INTEGER :: x_min,x_max,y_min,y_max
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: density
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: energy
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: pressure
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: soundspeed

  INTEGER :: j,k

  REAL(KIND=8) :: sound_speed_squared,v,pressurebyenergy,pressurebyvolume

!$OMP PARALLEL
!$OMP DO PRIVATE(v,pressurebyenergy,pressurebyvolume,sound_speed_squared)
  DO k=y_min,y_max
    DO j=x_min,x_max
      v=1.0_8/density(j,k)
      pressure(j,k)=(1.4_8-1.0_8)*density(j,k)*energy(j,k)
      pressurebyenergy=(1.4_8-1.0_8)*density(j,k)
      pressurebyvolume=-density(j,k)*pressure(j,k)
      sound_speed_squared=v*v*(pressure(j,k)*pressurebyenergy-pressurebyvolume)
      soundspeed(j,k)=SQRT(sound_speed_squared)
    ENDDO
  ENDDO
!$OMP END DO
!$OMP END PARALLEL

END SUBROUTINE ideal_gas_kernel
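The loop body can be checked on a single cell. The sketch below repeats the kernel's arithmetic with the same intermediate names and compares the result with the closed form it reduces to: substituting the two partial terms gives sound_speed_squared = 1.4 * pressure / density, the usual ideal-gas sound speed for a ratio of specific heats of 1.4. The function name and the sample inputs are illustrative.

```python
import math

GAMMA = 1.4  # ratio of specific heats hard-coded in the kernel


def ideal_gas_cell(density, energy):
    """Pressure and sound speed for one cell, as in ideal_gas_kernel."""
    v = 1.0 / density
    pressure = (GAMMA - 1.0) * density * energy
    pressurebyenergy = (GAMMA - 1.0) * density
    pressurebyvolume = -density * pressure
    sound_speed_squared = v * v * (pressure * pressurebyenergy
                                   - pressurebyvolume)
    return pressure, math.sqrt(sound_speed_squared)


# Made-up cell state.
density, energy = 2.0, 3.0
pressure, soundspeed = ideal_gas_cell(density, energy)
closed_form = math.sqrt(GAMMA * pressure / density)
print(pressure, soundspeed, closed_form)
```

The kernel performs one divide and one square root per cell and otherwise only reads two arrays and writes two, so very little arithmetic accompanies each byte moved.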
reset_field_kernel
Threads (Time) | IPC per Core | Loads per Cycle | L1 Hits per Cycle | L1 Miss Ratio | L2 Miss Ratio | L3 Miss Ratio | L2 B/W Utilized | L3 B/W Utilized | DRAM B/W Utilized
1 (4.8%) | 0.22 | 0.03 | 0.02 | 8.31% | 49.83% | 95.06% | 9.93% | 38.41% | 42.35%
16 (5.4%) | 0.06 | 0.02 | 0.01 | 11.34% | 50.02% | 94.98% | 2.82% | 7.56% | 55.47%
112 (4.9%) | 0.16 | 0.01 | 0.01 | 10.17% | 55.98% | 93.03% | 4.17% | 40.42% | 113.08%
SUBROUTINE reset_field_kernel(x_min,x_max,y_min,y_max, &
                              density0, &
                              density1, &
                              energy0, &
                              energy1, &
                              xvel0, &
                              xvel1, &
                              yvel0, &
                              yvel1)

  IMPLICIT NONE

  INTEGER :: x_min,x_max,y_min,y_max
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: density0,energy0
  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: density1,energy1
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: xvel0,yvel0
  REAL(KIND=8), DIMENSION(x_min-2:x_max+3,y_min-2:y_max+3) :: xvel1,yvel1

  INTEGER :: j,k

!$OMP PARALLEL
!$OMP DO
  DO k=y_min,y_max
    DO j=x_min,x_max
      density0(j,k)=density1(j,k)
      energy0(j,k)=energy1(j,k)
    ENDDO
  ENDDO
!$OMP END DO

!$OMP DO
  DO k=y_min,y_max+1
    DO j=x_min,x_max+1
      xvel0(j,k)=xvel1(j,k)
      yvel0(j,k)=yvel1(j,k)
    ENDDO
  ENDDO
!$OMP END DO

!$OMP END PARALLEL

END SUBROUTINE reset_field_kernel