Description of the DLR TAU Code
For parallel computations the grid is partitioned into the requested number of domains at the start of the simulation. To this end a simple bisection algorithm is employed. Load balancing is based on edge and point weights which are adjusted to the needs of the solver, the most time-consuming part of the simulation system. During the migration of the grid domains onto the different processes, communication tables are stored in each grid partition; they contain the information necessary for updating data on points in the overlap region between a domain and its neighbours. For adapted grids the grid hierarchy, which contains the information for grid de-refinement, is distributed over the domains as well.
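The bisection idea above can be illustrated by a minimal sketch of recursive coordinate bisection with point weights. All names and the weighting scheme are illustrative assumptions for this sketch, not TAU's actual partitioner (which also balances edge weights):

```python
import numpy as np

def bisect(points, weights, levels):
    """Recursively split along the axis of largest extent so that each
    half carries roughly equal total point weight; 'levels' halvings
    yield 2**levels domains."""
    if levels == 0:
        return [np.arange(len(points))]
    axis = np.argmax(points.max(axis=0) - points.min(axis=0))
    order = np.argsort(points[:, axis])
    cum = np.cumsum(weights[order])
    split = np.searchsorted(cum, cum[-1] / 2.0)   # weighted median
    left, right = order[:split], order[split:]
    domains = []
    for idx in (left, right):
        for sub in bisect(points[idx], weights[idx], levels - 1):
            domains.append(idx[sub])              # map back to parent ids
    return domains

pts = np.random.default_rng(0).random((1000, 3))
doms = bisect(pts, np.ones(1000), 2)              # 2 levels -> 4 domains
sizes = [len(d) for d in doms]
```

With uniform weights the four domains end up nearly equal in size; non-uniform weights shift the split points accordingly.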
After the grid partitioning, all other modules of TAU, which are described below, compute the requested data for a single domain per process. Grid re-partitioning is performed either if the grid was locally (de-)refined in an adaptation or if the number of domains changes, e.g. when a simulation is restarted on a different number of CPUs.
The pre-processing needs to be performed only once for a given primary grid. It computes the dual grid, composed of general control volumes, from the primary elements. These are stored in an edge-based data structure, which makes the solver independent of the element types of the primary grid. All metrics are given by normal vectors, representing size and orientation of the faces, the geometric coordinates of the grid nodes and the volumes of the auxiliary cells (i.e. dual cells). The connectivity of the grid is given by linking the two nodes on either side of each face to the corresponding edge of the primary grid elements. In order to enable the use of a multi-grid technique, the agglomeration approach is employed to obtain coarse grids by fusing fine-grid control volumes together. As the coarse grids employ the same type of metric description as the fine dual grid, the solution on coarse grids can be computed with the same approach as on the finest grid. The transfer operators needed for the communication between the different grids are obtained directly during the agglomeration process.
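The edge-based data structure described above can be sketched as a loop over dual-grid faces that scatters the same face flux with opposite signs to the two connected nodes. The flux function here is a stand-in central flux for a scalar quantity, chosen only to make the sketch self-contained; TAU's actual flux routines are of course far more involved:

```python
import numpy as np

def accumulate_residual(edges, normals, u, flux):
    """Edge-based residual accumulation: each dual-grid face (edge)
    contributes its flux to one node and subtracts it from the other,
    so conservation holds by construction and the loop never needs to
    know the primary element types."""
    res = np.zeros_like(u)
    for (i, j), n in zip(edges, normals):
        f = flux(u[i], u[j], n)      # numerical flux across the face
        res[i] += f
        res[j] -= f
    return res

# stand-in central flux for a 1-D scalar
flux = lambda ul, ur, n: 0.5 * (ul + ur) * n
r = accumulate_residual([(0, 1), (1, 2)], np.array([1.0, 1.0]),
                        np.array([1.0, 2.0, 3.0]), flux)
```

Because every face flux appears once with each sign, the residuals sum to zero, i.e. the scheme is discretely conservative regardless of the flux chosen.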
In order to optimise solver efficiency, the edges of the dual grid are sorted such that cache loads are minimised in the flux-computation part of the solver. The point indices are re-ordered to optimise memory and cache-line accesses as far as possible. This optimisation reduces the solver runtime to less than half on standard PC architectures.
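One simple way to achieve such locality, shown here purely as an illustration (the source does not specify which reordering TAU uses), is a breadth-first renumbering in the spirit of Cuthill-McKee: neighbouring points receive nearby indices, so an edge loop touches memory that is likely already in cache:

```python
from collections import deque

def bfs_renumber(n, edges):
    """Breadth-first renumbering of n points connected by 'edges':
    returns new_id[old_id], a permutation in which graph neighbours
    get close-together indices."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    new_id, count = [-1] * n, 0
    for start in range(n):
        if new_id[start] != -1:
            continue                 # next connected component
        q = deque([start])
        new_id[start] = count; count += 1
        while q:
            v = q.popleft()
            for w in sorted(adj[v]):
                if new_id[w] == -1:
                    new_id[w] = count; count += 1
                    q.append(w)
    return new_id

# a path graph 0-3-5-1-4-2 gets renumbered into consecutive indices
ids = bfs_renumber(6, [(0, 3), (3, 5), (5, 1), (1, 4), (4, 2)])
```

After renumbering, edges would additionally be sorted by their smaller endpoint so consecutive flux evaluations walk through memory roughly monotonically.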
For the use in turbulence models wall distances are computed for each grid point and regions of laminar flow are flagged depending on user input or on the result of a transition prediction method.
Fig. 1: Convergence behaviour of the hybrid TAU-Code for calculations of viscous flow around a delta wing at M = 0.5, α = 9°. Comparison shows baseline Runge-Kutta scheme (RK) and implicit LU-SGS scheme.
As the explicit approach leads to severe restrictions of the CFL number, which in turn often resulted in slow convergence, especially for large-scale applications, an implicit approximate factorization scheme has recently been implemented in order to improve the performance and robustness of the solver. The LU-SGS (Lower-Upper Symmetric Gauss-Seidel) scheme has been selected because it has low memory requirements and low operation counts and can be parallelized with relative ease. Compared to the explicit Runge-Kutta method, the LU-SGS scheme is stable with almost no time-step restrictions. An example of the performance improvement achieved is given in Fig. 1, where two convergence histories for viscous calculations on a delta wing are shown. The calculations were performed with multi-grid on 16 processors of a Linux cluster. The figure shows the residual and the rolling moment against the iteration count. In terms of iterations, LU-SGS converges approximately twice as fast as the Runge-Kutta scheme. Furthermore, one iteration of LU-SGS costs roughly 80% of one Runge-Kutta step. This results in a reduction of the overall calculation time by a factor of 2.5.
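The building block of LU-SGS is a forward (lower-triangular) sweep followed by a backward (upper-triangular) sweep. The sketch below shows this symmetric Gauss-Seidel structure on a small dense system; it is only a structural illustration, since the actual scheme operates matrix-free on the linearized flux Jacobians, which is what keeps its memory footprint low:

```python
import numpy as np

def sgs_sweep(A, b, x):
    """One symmetric Gauss-Seidel sweep (forward then backward) for
    A x = b, updating x in place with no auxiliary storage -- the
    structural core of an LU-SGS implicit stage."""
    n = len(b)
    for i in range(n):                       # forward (lower) sweep
        x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    for i in reversed(range(n)):             # backward (upper) sweep
        x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])       # diagonally dominant test system
b = np.array([1.0, 2.0])
x = np.zeros(2)
for _ in range(25):
    x = sgs_sweep(A, b, x)
```

In the solver only one (or a few) such sweeps are performed per implicit stage; exact solution of the linear system is not required for the nonlinear iteration to converge.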
For time-accurate computations the dual time stepping approach of Jameson is employed. As the solver also respects the geometric conservation law, both grid deformation and bodies in arbitrary motion can be simulated.
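The dual time stepping idea can be condensed into a few lines: the physical time derivative is discretized implicitly (here with BDF2), and the resulting unsteady residual is driven to zero by iterating in a pseudo-time. The explicit pseudo-time update below is a deliberately simple stand-in for the solver's own inner iteration (Runge-Kutta or LU-SGS with multigrid):

```python
def dual_time_step(u_n, u_nm1, dt, residual, dtau=0.1, iters=200):
    """Advance one physical BDF2 step of du/dt + R(u) = 0: iterate in
    pseudo-time tau until the unsteady residual
    (3u - 4u_n + u_nm1)/(2 dt) + R(u) vanishes."""
    u = u_n                                   # start from the old solution
    for _ in range(iters):
        r_star = (3*u - 4*u_n + u_nm1) / (2*dt) + residual(u)
        u = u - dtau * r_star                 # explicit pseudo-time update
    return u

# scalar model problem du/dt = -u, one step from u = 1
u1 = dual_time_step(1.0, 1.0, dt=0.1, residual=lambda u: u,
                    dtau=0.05, iters=100)
```

For this scalar case the converged value is the exact BDF2 update (4·u_n − u_nm1)/(2dt) divided by (3/(2dt) + 1), i.e. 15/16 here; in the solver the same outer/inner structure decouples the physical time step size from stability limits.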
In order to efficiently resolve detailed flow features, a grid adaptation algorithm for hybrid meshes based on local grid refinement and wall-normal mesh movement in semi-structured near-wall layers was implemented. This algorithm has been extended to also allow de-refinement of previously refined elements, thus enabling the code to perform time-accurate adaptation in unsteady flows. Fig. 2 gives an example of the de-/refinement process for the flow in a shock tube: after a second diaphragm to the right of the depicted area is broken, a complex interaction between shock waves and expansions begins. The figures display the density gradients resulting from the simulation as well as the grid, which is automatically adapted to those gradients in each time step. This local refinement approach greatly reduces the number of grid points needed over the whole simulation compared to globally refined grids. Thus, given a limited computer memory, it allows better resolution at the cost of additional CPU time (usually below 20%) for adaptation per time step. Compared to the same resolution on globally refined grids, this reduces the CPU and memory requirements considerably.
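A toy version of such a sensor-driven flagging step is sketched below. The jump-based indicator and threshold are illustrative assumptions (the source only says the grid adapts to density gradients); the point is merely that refinement and de-refinement candidates are selected edge by edge each time step:

```python
def flag_edges(values, edges, threshold):
    """Flag edges whose sensor jump (e.g. density difference across the
    edge) exceeds 'threshold' for refinement; all others are candidates
    for de-refinement if they were refined earlier."""
    refine, coarsen = [], []
    for e, (i, j) in enumerate(edges):
        if abs(values[i] - values[j]) > threshold:
            refine.append(e)
        else:
            coarsen.append(e)
    return refine, coarsen

# density jump across edge 1 marks it for refinement
refine, coarsen = flag_edges([1.0, 1.0, 3.0, 3.1],
                             [(0, 1), (1, 2), (2, 3)], 0.5)
```

In an unsteady run this selection is repeated every adaptation step, so refinement follows moving features such as the shocks in Fig. 2.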
Fig. 2: Dynamic mesh refinement and de-refinement for the flow in a shock tube 50, 70 and 110 ms after breaking of the second diaphragm (left: computed Schlieren pictures, right: grid development).
A grid deformation tool is used to account for moderate changes of the geometry, defined e.g. by an optimization technique during shape design or by the structural response of the geometry to aerodynamic loads in a coupled simulation. An algebraic method has been developed in order to avoid time-consuming iterative solutions of equations based e.g. on linear elasticity or the spring analogy. The displacements and rotations of surface points, which are the input for the deformation tool, are transported into the interior of the grid by an advancing-front technique. Depending on the ratio between the local point displacement and the cell size, the displacement is reduced by some fraction in each step of the front. This procedure ends when no more grid points are moved during a sweep. In parallel computations, the sweeps are continued until the grid does not change any more, to account for displacements coming from neighbouring domains. Since a single sweep requires almost negligible effort, this is not a significant drawback of the parallel mode, where usually on the order of 10 to 20 sweeps are needed.
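The advancing-front idea can be sketched as a breadth-first sweep from the surface into the volume. The fixed attenuation factor below is a simplifying assumption; in the actual method the reduction depends on the ratio of local displacement to cell size:

```python
from collections import deque

def deform(adj, surface_disp, decay=0.5, tol=1e-6):
    """Advancing-front sweep sketch: displacements prescribed on
    surface points are pushed front by front into the interior, reduced
    by a fixed fraction per front, until the remaining motion drops
    below 'tol'."""
    disp = dict(surface_disp)               # point id -> displacement
    front = deque(surface_disp)
    while front:
        p = front.popleft()
        d = disp[p] * decay
        if abs(d) < tol:
            continue                        # front has effectively died out
        for q in adj[p]:
            if q not in disp:               # first front to reach q sets it
                disp[q] = d
                front.append(q)
    return disp

# a chain of points 0-1-2-3 with the surface point 0 displaced by 1.0
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
disp = deform(adj, {0: 1.0})
```

In a parallel run this sweep would simply be repeated, with displacements exchanged across domain boundaries, until no point moves any more.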
This algebraic method is robust for small and often also for medium displacements and can handle e.g. wing tip deflections of one or several chord lengths. It has been observed that the limit, i.e. the deflection at which the first cell collapses, can be extended considerably if it is accepted that a few more cells collapse. Repairing these cells with another algorithm in a second stage of the deformation increases the robustness considerably and allows for large grid deformations. In this algorithm, each region in the grid containing collapsed cells is marked such that it is bounded by valid cells only. From the shape of this boundary in both the deformed and the non-deformed grid, a transformation can be computed using the volume spline technique, which allows rebuilding the collapsed cells as images of the original ones. As long as these regions remain small, the additional computational cost of the local volume spline is low.
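The volume spline step can be sketched as a radial basis interpolant fitted to the boundary displacements of the marked region and evaluated at the interior points. The linear kernel φ(r) = r and the tiny regularization are assumptions of this sketch, not necessarily the exact formulation used in TAU:

```python
import numpy as np

def volume_spline(b_old, b_new, points):
    """Fit a radial basis interpolant with kernel phi(r) = r to the
    displacements of the region boundary (b_new - b_old) and evaluate
    it at interior 'points', rebuilding them as images of the
    originals."""
    d = b_new - b_old
    # kernel (distance) matrix between boundary nodes
    K = np.linalg.norm(b_old[:, None] - b_old[None, :], axis=-1)
    coef = np.linalg.solve(K + 1e-12 * np.eye(len(b_old)), d)
    # evaluate the interpolant at the points to be repaired
    Kp = np.linalg.norm(points[:, None] - b_old[None, :], axis=-1)
    return points + Kp @ coef

# 2-D example: a quadrilateral boundary shifted rigidly
b_old = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b_new = b_old + np.array([0.2, 0.1])
mapped = volume_spline(b_old, b_new, b_old.copy())
```

Since the spline interpolates the boundary displacements exactly, boundary points map onto their deformed positions, and interior points follow smoothly; the cost is a dense solve of the size of the small marked region only.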
Fig. 3 indicates that this robust approach allows going beyond realistic deformations. It shows the maximum possible wing tip deflections for a hybrid grid of 2.5 million points. The CPU time required on a single Opteron CPU is less than 2 minutes for small deflections (of about a chord length) and less than 10 minutes for the maximum wing tip deflection.
Fig. 3: (Left) Maximum wing tip deflections and details of the deformed hybrid grid for the DLR F6 configuration (available from the second DPWS); (Right) Maximum wing tip deflections in parallel (left) and sequential mode (right)
As the Chimera technique has been recognized as an important feature for efficiently simulating manoeuvring aircraft, it has also been integrated into the TAU-Code. In the context of hybrid meshes, the overlapping-grid technique allows an efficient handling of complex configurations with movable control surfaces. For the point update on Chimera boundaries, linear interpolation based on a finite-element approach is used in the case of tetrahedral mesh elements. For other element types (prisms, hexahedra, pyramids) either linear interpolation is performed after splitting the elements into tetrahedra, or non-linear interpolation for the respective element type is used. The search for the donor cells used for interpolation is performed with an alternating digital tree data structure. The current implementation of the Chimera technique can handle both steady and unsteady simulations for inviscid and viscous flows with multiple moving bodies and is also available in parallel mode. Applications of this technique in TAU can be found in the literature.
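For the tetrahedral donor-cell case, the finite-element interpolation amounts to evaluating barycentric coordinates. The sketch below shows this standard construction; variable names are illustrative:

```python
import numpy as np

def tet_interpolate(verts, vals, p):
    """Linear finite-element interpolation inside a tetrahedron: solve
    for the barycentric coordinates of p with respect to the four
    vertices, then blend the vertex values with those weights."""
    v0, v1, v2, v3 = verts
    T = np.column_stack((v1 - v0, v2 - v0, v3 - v0))
    lam = np.linalg.solve(T, p - v0)        # barycentric coords for v1..v3
    w = np.array([1.0 - lam.sum(), *lam])   # weight for v0 closes the sum
    return w @ np.asarray(vals), w

# unit tetrahedron; vertex values equal the x-coordinate, so the
# interpolant reproduces x exactly (linear precision)
verts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
value, w = tet_interpolate(verts, np.array([0.0, 1.0, 0.0, 0.0]),
                           np.array([0.25, 0.25, 0.25]))
```

The same weights also tell whether the query point lies inside the donor cell (all weights in [0, 1]), which is the acceptance test after the tree search has proposed a candidate cell.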
Transition and Turbulence modelling
The turbulence models implemented within the TAU code include linear as well as non-linear eddy viscosity models spanning both one- and two-equation model families.
The standard turbulence model in TAU is the Spalart-Allmaras model with the Edwards modification, which yields highly satisfactory results for a wide range of applications while being numerically robust. The k-ω model provides the basis for the two-equation models, of which the most widely used is probably the Menter SST model. Besides this, a number of other k-ω variants, such as Wilcox and Kok-TNT, are available. Non-linear explicit algebraic Reynolds stress models (EARSM) and the linearized LEA model have also been integrated. The implementation of Reynolds stress models (RSM) is ongoing work. A number of rotation corrections for vortex-dominated flows are available for the different models.
For a long time only the low-Reynolds formulation was available in TAU, for accuracy reasons, but recently model-specific "universal" wall functions have been introduced to achieve a higher efficiency of the solver, especially for use in design or optimisation, as well as to allow a "first quick look" at new configurations. Although tests are still ongoing, this very promising approach seems able to deliver results nearly as good as the low-Reynolds approach for pressure and skin-friction distributions over a wide range of y+ values of the first cell height at the wall, while saving up to 75 percent of computation time and 40 percent of memory.
Finally, there are options to perform Detached Eddy Simulations (DES) based on the Spalart-Allmaras or the Menter SST model, or the so-called Extra-Large Eddy Simulation (XLES). Since the DES method is a strategy that can be applied, in principle, to any eddy-viscosity turbulence model, the original implementation within the TAU code required only the calculation of additional terms and a suitable switch to activate the DES model within the calculation of the source term of the turbulence model. In computational terms, the overhead for the solution of a time step using Detached Eddy Simulation is negligible compared to URANS in three-dimensional cases. However, the true cost arises from the need to resolve the temporal scales well enough that unsteadiness in a solution can grow in a physical way.
In order to allow modelling of transitional flow, the turbulent production terms are suppressed in regions which are flagged in the grid as laminar. Flagging of laminar regions can be done in the pre-processing by defining polygon lines which encircle the laminar region on the surface grid, together with a maximum height above the surface. The polygon lines can be defined by the user for simulations with fixed transition to turbulent flow, or can be computed by a transition prediction module, which is under ongoing development and is therefore described in more detail in the following chapter.
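The test of whether a surface point lies inside such a polygon line can be done with the standard ray-casting algorithm, shown here as a self-contained 2-D sketch (the source does not state which inside test TAU uses, so this is only one plausible realisation):

```python
def in_polygon(poly, x, y):
    """Ray-casting point-in-polygon test: cast a horizontal ray from
    (x, y) and count crossings with the polygon edges; an odd count
    means the point is inside and would be flagged as laminar."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]          # wrap to close the polygon
        if (y1 > y) != (y2 > y):            # edge straddles the ray
            xi = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < xi:
                inside = not inside
    return inside

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

Combined with the maximum-height criterion above the surface, each grid point can then be flagged laminar or turbulent once in the pre-processing.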
TAU version for hypersonic and reacting flows
To extend the range of applicability of TAU to hypersonic or high-enthalpy flows, additional modifications and extensions were introduced into the code some time ago, enabling e.g. simulations of re-entry vehicles including chemical reactions of air as a five-component gas. These modifications range from stabilization of the solver for high Mach numbers, over additions for thermo-chemical equilibrium flows, to the treatment of non-equilibrium gases. For the latter, additional conservation equations for the partial densities and the vibrational energies of the species are introduced in the code; to close the system, models for the state of the species as well as fits for their viscosity (Blottner) and the resulting heat conductivity (modified Eucken correction) are taken into account. Furthermore, mixture rules (Wilke and Herning/Zipperer), diffusion (following Fick's law) and detailed chemistry based on the Arrhenius ansatz with thermal coupling after Park are implemented, together with thermal relaxation after Landau-Teller, to allow for full thermo-chemical non-equilibrium simulations.
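The Arrhenius ansatz with Park's two-temperature coupling can be written down in a few lines. The functional forms below are the standard textbook expressions; any coefficients supplied to them would come from a reaction database and are not part of this sketch:

```python
import math

def arrhenius(T, A, n, Ta):
    """Arrhenius forward rate coefficient k_f = A * T**n * exp(-Ta/T),
    with Ta the activation temperature."""
    return A * T**n * math.exp(-Ta / T)

def park_rate(T, Tv, A, n, Ta):
    """Park's thermal coupling: for dissociation reactions in thermal
    non-equilibrium, evaluate the Arrhenius rate at the geometric mean
    of translational temperature T and vibrational temperature Tv."""
    Teff = math.sqrt(T * Tv)
    return arrhenius(Teff, A, n, Ta)
```

When translational and vibrational temperatures are equal, the Park rate reduces to the plain Arrhenius rate; when Tv lags behind T (vibrationally cold gas behind a strong shock), the effective temperature and hence the dissociation rate are reduced, which is the physical effect the coupling is meant to capture.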
With respect to boundary conditions, walls with fully catalytic, finite-rate catalytic or non-catalytic surfaces can be considered, as well as radiation-adiabatic walls and effusion-cooled (porous) walls.
Currently, databases are under construction to consider
- air plasma with 11 components (including electrical conductivity for MHD effects),
- a CO2 atmosphere (Mars) as well as
- H2-O2 combustion including probability density functions (PDF) for coupling of chemistry and turbulence.