SiMLInt Data Generation Implementation

Following the structure given in the general data generation case, this page describes the data generation phase of our implementation for the Hasegawa-Wakatani example.

Fine- and coarse-grained resolutions. We chose:
- 1024x1024 for fine-grained simulations; and
- 256x256 for coarse-grained simulations.
For BOUT++ our implementation, we performed 2-d simulations, with guard cells in the x-dimension (1028 or 260 x values), 1 y value, and a periodic z-dimension (1024 or 256 values). Therefore the shapes of the variables used by BOUT++ are (1028x1x1024) for the fine-grained simulations and (260x1x256) for the coarse-grained simultions.

Note: a number of values related to running BOUT++ are hard-coded to match these choices of fine- and coarse-grained simulations. These are given to BOUT++ as command-line arguments in SLURM scripts.

Generate a “fully resolved” simulation

The first step is to create a modified version of the Hasegawa-Wakatani example from BOUT++, which we’ll copy to your work directory.

 # For full compatibility with the test simulations run during the SiMLInt project, the diffusive function is modified.
 cp -r $WORK/BOUT-dev/examples/hasegawa-wakatani $WORK/my-hw
 cd $WORK/my-hw
 sed -i 's/-Dn \* Delp4(n);/+Dn \* Delp2(n);/g' hw.cxx
 sed -i 's/-Dvort \* Delp4(vort);/+Dvort \* Delp2(vort);/g' hw.cxx

Compile the code.

 module load intel-20.4/mpi
 module load intel-20.4/compilers
 module load fftw/3.3.10-intel20.4-impi20.4
 module load netcdf-parallel/4.9.2-intel20-impi20
 module load cmake

 cmake . -B build -Dbout++_DIR=$WORK/BOUT-dev/build -DCMAKE_CXX_FLAGS=-std=c++17 -DCMAKE_BUILD_TYPE=Release
 cmake --build build --target hasegawa-wakatani

Before simulating the training data, a burn-in run must be conducted at the desired resolution. For an example of this, see fine_init.sh.

Submit the burn-in run:

 cd ${SIMLINT_HOME}/files/1-data-generation/
 sbatch fine_init.sh --account $ACCOUNT

Following that, we run a number of sequentially trajectories to generate fine-grained ground-truth data. See fine_trajectories.sh

The initial simulation produces “restart files” BOUT.restart.*.nc from which a simulation can be continued.

Edit fine_trajectories.sh

 for TRAJ_INDEX in {1..10}

to give the desired number of trajectories.

Coarsen selected simulation snapshots.

Fine-grained data must be coarsened to match the desired coarse-grained resolution. This can be done via interpolation for a general solution. Files in files/2-coarsening perform this task. Submit submit-resize.sh via
```
 sbatch submit-resize.sh --account $ACCOUNT
```
Note: this operates an array job on all trajectories simultaneously. Edit
```
 #SBATCH --array=1-10
```
to match the selected number of trajectories.
Single-timestep coarse simulations.

With the previous step having extracted fine-grained data for each time step (and each trajectory for which it was repeated), we now need to run a single-timestep coarse-grained simulation. To do this, see files/3-coarse-simulations. Submitting run_coarse_sims.sh will run a single step simulation for each coarsened timestep created in the previous step:
```
 sbatch run_coarse_sims.sh --account $ACCOUNT
```
Generating training data.

We now have all of the data required to train the ML models, but not in the format we require. Files in files/4-training-data perform this task.

Submit sub_gen_training_nc.sh:
```
 sbatch sub_gen_training_nc.sh --account $ACCOUNT
```
As before, the SLURM array specification in sub_gen_training_nc.sh should match the selected number of trajectories.

Subsequent steps: calculating the error and model training are covered in ML model training implementation.

< Back