At the end of February 2019, I gave a presentation at the SIAM Conference on Computational Science and Engineering (CSE) in Spokane, Washington. I live in the Seattle area, and Spokane is reasonably close, so I decided to drive instead of fly. Unfortunately, the entire nation, including Washington state, was still in the grip of the dreaded “polar vortex.” The night before my drive to Spokane, all of the mountain passes were closed due to heavy snowfall. They opened in time, but the drive was slippery and slow. I probably should have taken a flight instead! On the drive, I came up with this haiku…
Driving to Spokane
Snow whirlwinds on pavement
Must make conference!
The Goal: Adding Parallel SZL Output to SU2
My presentation at the SIAM CSE conference was on the progress made in adding parallel SZL (SubZone Load-on-demand) file output to SU2. The SU2 suite is an open-source collection of C++-based software tools for performing Partial Differential Equation (PDE) analysis and solving PDE-constrained optimization problems. The toolset is designed with Computational Fluid Dynamics (CFD) and aerodynamic shape optimization in mind, but is extensible to treat arbitrary sets of governing equations such as potential flow, elasticity, electrodynamics, chemically-reacting flows, and many others. SU2 is under active development by individuals all around the world on GitHub and is released under an open-source license. For more details, visit SU2 on GitHub.
The Challenge: Building System Compatibility
We implemented parallel SZL output in SU2 using the TecIO-MPI library, available for free download from the TecIO page. In some CFD codes, such as NASA’s FUN3D code, each user site is required to download and link the TecIO library. However, in the case of SU2 we decided to include the obfuscated TecIO source code directly into the distribution of SU2. This makes it much easier for the user – they need only download and build SU2 and they have SZL file output available.
However, this did add some complications on our end.
The main complication is that SU2 is built using the GNU Autotools (configure) system, whereas TecIO is built using CMake. We had to create new automake, autoconf, and m4 script files to build TecIO seamlessly as part of the SU2 build.
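For readers facing a similar integration, the hook is conceptually small. The fragment below is a hypothetical Makefile.am sketch — the paths, target names, and flags are illustrative, not SU2's actual layout — showing the basic idea: compile the bundled TecIO sources into a convenience library that the solver then links against.

```makefile
# Hypothetical sketch -- file paths and target names are illustrative only.
# Build the bundled TecIO sources as a static convenience library.
noinst_LIBRARIES = libtecio.a
libtecio_a_SOURCES = externals/tecio/teciosrc/tecio.cpp  # plus the remaining TecIO sources
libtecio_a_CPPFLAGS = -Iexternals/tecio/teciosrc

# Link the convenience library into the solver executable.
SU2_CFD_LDADD = libtecio.a
```

The m4/autoconf side then only needs to decide whether to enable this target (for example, based on whether MPI was detected) and substitute any compiler flags TecIO requires.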
If you find yourself integrating TecIO source into a CFD code built with the GNU configure system, feel free to shoot me some questions – email@example.com
Implementing Serial vs. Parallel TecIO
Once TecIO was building as part of the SU2 build, it was straightforward to get the serial version of SZL output working. SU2 already included an older version of TecIO, so we simply replaced those calls with the newer TecIO calls.
To get the parallel SZL output (using TecIO-MPI) working was a little more complicated. Specifically, it required knowing which nodes on each MPI rank were ghost nodes. Ghost nodes are nodes that are duplicated between partitions to facilitate the communication of solution data between MPI ranks. We only want the node to show up once in the SZL file, so we need to tell TecIO-MPI which nodes are the ghost nodes. In addition, CFD codes often utilize ghost cells (finite-element cells duplicated between MPI ranks) which must be supplied to TecIO-MPI. This information took a little effort to extract from the SU2 “output” framework.
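To make the ghost-node bookkeeping concrete, here is a minimal, self-contained C++ sketch of one common ownership convention: a node that appears on more than one rank is owned by the lowest-numbered rank that holds it and is flagged as a ghost everywhere else. The function name and the convention itself are illustrative assumptions for this post, not SU2's or TecIO-MPI's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not SU2's actual API): given the global node IDs
// held by each MPI rank, flag as a "ghost" any node that also appears on
// a lower-numbered rank. Under this convention each node is owned by
// exactly one rank, so it is written to the file exactly once.
std::vector<std::vector<int>> FlagGhostNodes(
    const std::vector<std::vector<long>>& nodesPerRank) {
  std::unordered_set<long> claimed;  // nodes already owned by a lower rank
  std::vector<std::vector<int>> isGhost(nodesPerRank.size());
  for (std::size_t rank = 0; rank < nodesPerRank.size(); ++rank) {
    isGhost[rank].reserve(nodesPerRank[rank].size());
    for (long globalId : nodesPerRank[rank]) {
      // 1 = ghost (duplicate of a node owned elsewhere), 0 = owned here.
      isGhost[rank].push_back(claimed.count(globalId) ? 1 : 0);
      claimed.insert(globalId);
    }
  }
  return isGhost;
}
```

With two partitions that share a boundary, the lower rank sees all of its nodes as owned, while the higher rank flags the duplicated boundary nodes as ghosts — and those flags are exactly the kind of information a parallel writer such as TecIO-MPI needs so that each node appears only once in the SZL file.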
How Well Does It Perform?
We now have a version of SU2 that is capable of writing SZL files in parallel while being run on an HPC system. The obvious next question: “How well does it perform?”
Test Case #1: Common Research Model (CRM) in High-Lift Configuration
The first test case is the Common Research Model from the High-Lift Prediction workshop. It was run with 3 grid refinement levels:
- 10 million cells
- 47.5 million cells
- 118 million cells
These refinements allowed us to measure the effect of problem size on the overhead of parallel output. All three cases were run on 640 MPI ranks on the NCSA Blue Waters supercomputer. The results are shown in the following table:
| | 10M Cells | 47.5M Cells | 118M Cells |
|---|---|---|---|
| Time for CFD step | 17.6 sec | 70 sec | 88 sec |
| Time for restart write | 6.1 sec | 10.7 sec | 31.4 sec |
| Time for SZL file write | 43.9 sec | 171 sec | 216 sec |
For comparison, we include the cost of advancing the solution by a single CFD time step and the cost of writing an SU2 restart file. It should be noted that the SU2 restart file contains only the conservative field variables – no grid variables and no auxiliary variables – so far less data is written when creating the restart file. The cost of writing the SZL file is roughly 2.5 times the cost of a single time step. If you write the SZL file infrequently (every 100 steps or so), that cost is amortized over the intervening steps and the overhead is fairly small (about 2.5%).
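The amortization arithmetic behind that last sentence is simple enough to capture in a few lines. This is an illustrative helper — not part of SU2 — that expresses the output cost as a percentage of total solver time:

```cpp
#include <cassert>
#include <cmath>

// Amortized I/O overhead, as a percent of solver time, when a plot file
// costing writeSec seconds is written once every stepsPerWrite CFD steps
// of stepSec seconds each. (Illustrative helper, not part of SU2.)
double OutputOverheadPercent(double writeSec, double stepSec,
                             int stepsPerWrite) {
  return 100.0 * writeSec / (stepsPerWrite * stepSec);
}
```

For the 10M-cell case in the table, a 43.9 sec write every 100 steps of 17.6 sec each works out to roughly 2.5% overhead; the 118M-cell case (216 sec write, 88 sec steps) lands in about the same place.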
Test Case #2: Inlet
The second test case is an inlet like you might find on a next-generation jet fighter. It aggressively compresses the flow to keep the inlet as short as possible.
The inlet was analyzed using 93 million tetrahedral cells and 38 million nodes. As with the CRM case, the inlet case was run on the NCSA Blue Waters computer using 640 MPI ranks.
SU2 takes 74.7 seconds to advance the inlet CFD solution by one time step and 31 seconds to write a restart file. Writing the SZL plot file requires 216 seconds – 2.9 times as long as a single CFD time step.
The parallel SZL file output is currently in the pull-request phase of SU2 development. Once it is accepted, it will be available in the develop branch on GitHub. On occasion (I’m told every six months to a year), the develop branch is merged into the master branch. If you are interested in trying the parallel SZL output from SU2, send me an email (firstname.lastname@example.org) and I’ll let you know which branch to download.
Better yet, subscribe to our TecIO Newsletter and we will send you the updates.