How To

This page lists typical use cases supported by SURF-SHYFEM. If you haven't done so already, we recommend following the Getting Started section before exploring the available options and customizing your experiment.

Download Input Data Products From the Internet

When you ran your first experiment, you manually downloaded and used local input data. Luckily, you don't have to do that every time you run a simulation with SURF.

If you configure SURF correctly, the input data will be downloaded for you automatically from the internet, using official server APIs such as the Copernicus Marine Toolbox or the CDSAPI service. The only requirement is a stable and reasonably fast internet connection.

Note that some data providers require you to have a personal account in order to download data from their servers (see the table below). For those services, after creating an account you will have to add your personal credentials to the SURF configuration file.

The currently supported input data products in SURF are:

| Provider | Data type | Account required? | Config section | Product ID | Product type |
| --- | --- | --- | --- | --- | --- |
| CMEMS - Copernicus Marine Data Store | Ocean | Yes (CMEMS) | 🔗 | CMEMS_GLOBAL_MULTIYEAR_PHY_001_030 | Reanalysis, Global |
| | | | | CMEMS_GLOBAL_ANALYSISFORECAST_PHY_001_024 | Analysis and Forecast, Global |
| | | | | CMEMS_MEDSEA_MULTIYEAR_PHY_006_004 | Reanalysis, Mediterranean Sea |
| | | | | CMEMS_MEDSEA_ANALYSISFORECAST_PHY_006_013 | Analysis and Forecast, Mediterranean Sea |
| | | | | CMEMS_BLKSEA_MULTIYEAR_PHY_007_004 | Reanalysis, Black Sea |
| | | | | CMEMS_BLKSEA_ANALYSISFORECAST_PHY_007_001 | Analysis and Forecast, Black Sea |
| ECMWF - Climate Data Store | Atmosphere | Yes (CDS) | 🔗 | ECMWF_ERA5_REANALYSIS | Reanalysis, Global |
| NOAA - Global Forecast System | Atmosphere | No | 🔗 | NCEP_GFS_FORECAST | Analysis and Forecast, Global |
| GEBCO | Bathymetry | No | 🔗 | GEBCO 2024 | Global |
| GSHHG | Coastline | No | 🔗 | GSHHG h | Global |
| TPXO | Tide | No | 🔗 | TPXO8 | Global |

ECMWF Credentials

Configuring your credentials to download ECMWF data can be a little tricky. For this reason, here is a more detailed step-by-step guide:

  • Log in to the Climate Data Store (CDS) website.
  • Browse to the how-to-api page. There, if you are still logged in, you will be able to see your personal access token.
  • Copy your personal access token and paste it into the 🔗Surface Boundary Conditions section of your configuration file (it is the parameter called Password).
  • Additionally, in order to automatically download data from the CDS, you must manually agree to the Terms of Use. To do so, browse to the product download page, scroll down to the Terms of Use section and click the "Agree" button.
  • That's it: you can now launch your experiment and SURF will automatically download ECMWF data for you!
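
For reference, here is a minimal sketch of what this could look like in the JSON configuration file. Only the section name and the Password parameter come from this page; the exact nesting may differ in your configuration file:

{
  "Surface Boundary Conditions": {
    "Password": "<your CDS personal access token>"
  }
}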

Use Local Input Data Products

In addition to downloading data from predefined products, SURF also allows you to provide your own input datasets for a simulation. This can be done for the following components:

  • Coastline
  • Bathymetry
  • Ocean data
  • Atmospheric data

Steps to Use Custom Datasets

  1. Set the Input Product to offline

    In the JSON configuration file, set the desired input product option to offline. For example, to use personalized ocean datasets, in the 🔗Initial and Open Boundary Conditions section of the JSON configuration file, set the Ocean Product parameter to offline. This tells SURF to look for external datasets instead of downloading them.

  2. Ensure Complete Coverage

    The datasets you provide must:

    • Cover the entire spatial domain of the nested grid (consider including some buffer), and
    • Span the full temporal range of the experiment (including spin-up days).

  3. Organize Your Dataset Files

    Place the files within your Base Directory. For example, to provide personalized ocean datasets for the 8-day simulation from the Getting Started section (3 spin-up days + 5 experiment days), you could structure them as follows:

    graph LR
        base[🏠&nbsp Base Directory]:::leaf --> A[📁&nbsp Experiment Directory]:::leaf
        base[🏠&nbsp Base Directory]:::leaf --> A1[📁&nbsp datasets/]:::folder
        A1[📁&nbsp datasets/]:::folder --> A2[📁&nbsp bay_madagascar/]:::folder
        A2[📁&nbsp bay_madagascar/]:::folder --> A3[📁&nbsp ocean/]:::folder
        A3[📁&nbsp ocean/]:::folder --> A4[🗄️&nbsp velu_yYYYYmMMdDD.nc]:::file
        A3[📁&nbsp ocean/]:::folder --> A5[🗄️&nbsp velv_yYYYYmMMdDD.nc]:::file
        A3[📁&nbsp ocean/]:::folder --> A6[🗄️&nbsp tem_yYYYYmMMdDD.nc]:::file
        A3[📁&nbsp ocean/]:::folder --> A7[🗄️&nbsp ssh_yYYYYmMMdDD.nc]:::file
        A3[📁&nbsp ocean/]:::folder --> A8[🗄️&nbsp sal_yYYYYmMMdDD.nc]:::file
        A3[📁&nbsp ocean/]:::folder --> A9[🗄️&nbsp mask_ocean_T.nc]:::file
        A3[📁&nbsp ocean/]:::folder --> A10[🗄️&nbsp mask_ocean_T_bathy.nc]:::file
        A3[📁&nbsp ocean/]:::folder --> A11[🗄️&nbsp mask_ocean_T_coords.nc]:::file
        A --> B[📁&nbsp .../]:::leaf
    
        %% Class definitions for coloring
        classDef folder fill:#d0ebff,stroke:#339af0,stroke-width:1px;
        classDef file fill:#ffe4b2,stroke:#ffa400,stroke-width:1px;
        classDef leaf fill:#f8f9fa,stroke:#adb5bd,stroke-width:1px;
    
    Here, YYYY, MM and DD are placeholders for the year, month and day of each daily file; the files must cover the entire simulation period (e.g. velu_y2021m03d31.nc, velu_y2021m04d01.nc, ...).

  4. Specify Paths and Filenames in the Configuration

    In the JSON configuration file, under the offline section of the relevant product, specify the correct Path and Filename values. Following the example above, in the 🔗Offline product settings section, the Path variable should be specified as datasets/bay_madagascar/ocean (a configuration sketch is given at the end of this section).

    Coastline Dataset Requirements

    Dataset Formatting

    When providing a custom coastline dataset, please ensure that:

    • The shapefile geometry is of type Polygon.
    • The shapefile package includes at least the following files: .shp, .shx, .dbf, and .prj.

    Missing or incorrect files will prevent SURF from correctly reading your coastline data.


    Dataset Preparation

    Unlike the built-in coastline products, custom shapefiles are used exactly as provided, with no preprocessing (e.g., clipping a global dataset to the region of interest). If your dataset is too large or needs adjustments, be sure to simplify or process it beforehand.
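
Putting steps 1 and 4 together for the ocean example above, the relevant entries could look roughly like the sketch below. Only the section names and the Ocean Product, Path and Filename parameters come from this page; the exact nesting depends on your configuration file, and the Filename value is left as a placeholder since it depends on your dataset naming:

{
  "Initial and Open Boundary Conditions": {
    "Ocean Product": "offline"
  },
  "Offline product settings": {
    "Path": "datasets/bay_madagascar/ocean",
    "Filename": "..."
  }
}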

Select the Simulation Domain

To change the simulation domain, you need to adapt the JSON configuration file section 🔗Nested Grid and Land-Sea Mask Settings. There, you can directly choose the domain coordinates, the grid resolution, the distribution of vertical levels, and the land-sea masking dataset.
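
As a purely illustrative sketch of the kind of information this section contains (all parameter names and values below are hypothetical; refer to the linked section for the real ones):

{
  "Nested Grid and Land-Sea Mask Settings": {
    "_comment": "hypothetical parameter names and values, for illustration only",
    "Domain Polygon": [[44.0, -16.5], [45.5, -16.5], [45.5, -15.0], [44.0, -15.0]],
    "Grid Resolution": 0.01,
    "Vertical Levels": [2, 5, 10, 20, 50, 100],
    "Land-Sea Mask Product": "GSHHG"
  }
}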

Best Practices

When defining the domain polygon for your nested grid, it is important to follow some guidelines to prevent mesh issues such as poor element quality and non-conformity.

Refer to the Best Practices section for detailed tips on properly drawing your domain polygon.

Select the Simulation Time Period

To change the simulation time period, you need to adapt the JSON configuration file section 🔗Simulation Period and Time Step Parameters. There, you can directly choose the start and end date of the simulation.

Spin-up Days

You shouldn't be surprised if the actual start date of your simulation does not correspond to the start_date you have chosen in the configuration file.

That's because SURF automatically offsets the start date by a certain number of days, called the spin-up period:

actual_start_date = start_date - n_days_spinup

The purpose of the spin-up period is to allow the model to adjust dynamically to the external forcing and reach a steady state (see Trotta et al. 2016). By default, the spin-up time is set to 3 days. As an advanced option, the spin-up period can be disabled or modified in the Expert JSON configuration file section 🔗Simulation Period and Time Step Parameters.
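
For example, assuming the configured start_date is 2021-04-03 and the default spin-up of 3 days (consistent with the daily files shown earlier, which begin on 2021-03-31):

actual_start_date = 2021-04-03 - 3 days = 2021-03-31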

Execute Experiments in Parallel Using Multiple CPUs (MPI)

To run an experiment with SHYFEM-MPI on multiple CPUs (MPI), you have to:

  • Specify a Number of Parallel Processes greater than 1 in the JSON configuration file section 🔗General Settings.

  • In your Docker resource settings, make sure that you have allocated enough CPUs to match the requested number of processes.
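
As a minimal sketch (assuming the parameter sits directly under the 🔗General Settings section of the JSON configuration file; the exact nesting may differ), requesting 4 parallel processes would look like:

{
  "General Settings": {
    "Number of Parallel Processes": 4
  }
}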

Best Practices

The number of CPUs should be chosen carefully as it directly impacts the scalability of the code. Using too many MPI processes for a given grid may significantly increase communication overhead, which can actually reduce performance rather than improve it. Since scalability also depends on your hardware architecture, it is difficult to provide here universally valid guidelines.

As a rule of thumb, try ensuring that \(\dfrac{n_{nodes}}{n_{proc}} \ge 4000\), where \(n_{proc}\) is the number of MPI processes and \(n_{nodes}\) is the total number of horizontal grid nodes.

Since the number of grid nodes cannot be known before a first complete execution of the grid generation step, you may consider re-running only the Grid Generation task in the SURF Workflow to fine-tune your settings. SURF will also issue warning log messages (see the example below) to help you spot a suboptimal configuration.

WARNING!
The number of CPUs selected (nproc=8) may be excessive for the current grid with 17608 nodes.
Using fewer CPUs could help improve computational efficiency.
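
Applying the rule of thumb to this example: \(17608 / 4000 \approx 4.4\), so at most 4 MPI processes satisfy \(\dfrac{n_{nodes}}{n_{proc}} \ge 4000\), which is why a run requesting nproc=8 is flagged as potentially excessive.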

Keep in mind, however, that these are only general guidelines and MPI communication performance may vary depending on your specific hardware. For more precise tuning, it may be useful to run a few scalability tests on your machine. For further details on the scalability performance of SHYFEM-MPI, see Micaletto et al. (2022).

Include Rivers in the Simulation

To include rivers in your simulation, you must first obtain the related data from a suitable source, such as in-situ observations, remote sensing, or analysis/reanalysis products provided, for example, by a governmental authority or a regional hydrological institute. The data must then be formatted and placed in a designated folder within the Base Directory so that SURF can access it. For detailed instructions on preparing and formatting river data, refer to the River Boundary Conditions section.

Next, ensure that the River Boundary Conditions task is enabled in the JSON configuration file under the 🔗Workflow Tasks section. You can add as many rivers as needed by providing the corresponding data and specifying the following information in the 🔗River Forcing at Domain Boundaries section:

  • River Names: A list of the river names.

  • River Mouth Coordinates: Approximate coordinates for the centre of each river mouth.

  • River Mouth Widths: Estimated widths (bank-to-bank) of each river mouth.

This setup will allow SURF to automatically prepare the forcing data required by SHYFEM-MPI.
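
As a sketch of these three parameters (the parameter names come from the list above, while the nesting, river names, coordinates and widths are purely illustrative; check the 🔗River Forcing at Domain Boundaries section for the expected coordinate order and width units):

{
  "River Forcing at Domain Boundaries": {
    "River Names": ["Betsiboka", "Mahajamba"],
    "River Mouth Coordinates": [[46.31, -15.85], [47.09, -15.42]],
    "River Mouth Widths": [2000.0, 800.0]
  }
}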

Run an Operational Forecast

To run an operational forecast, you have to:

  • Set the simulation start date to null under the JSON configuration file section 🔗Simulation Period and Time Step Parameters. This way, SURF will automatically start the simulation from the current date (excluding spin-up days which are in the past).
  • Select analysis and forecast input products, both for the atmosphere and the ocean (see the available products above).
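
For illustration, and assuming the same hypothetical nesting used in the sketches above (the start_date and Atmosphere Product key names are assumptions), an operational forecast setup could combine entries like:

{
  "Simulation Period and Time Step Parameters": {
    "start_date": null
  },
  "Initial and Open Boundary Conditions": {
    "Ocean Product": "CMEMS_GLOBAL_ANALYSISFORECAST_PHY_001_024"
  },
  "Surface Boundary Conditions": {
    "Atmosphere Product": "NCEP_GFS_FORECAST"
  }
}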

Customize the Output Figures

With SURF, you can customize the output visualization. To do so, modify the JSON configuration file section 🔗Postprocessing and select the desired visualization parameters.

Image Generation for Multiple Days

By default, SURF generates images only for the final simulation day. If you want to generate images for all simulation days, you can set the Plot Only Final Date flag to false in the JSON configuration file section 🔗Postprocessing.
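
A minimal sketch, assuming the flag sits directly under the 🔗Postprocessing section:

{
  "Postprocessing": {
    "Plot Only Final Date": false
  }
}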

Resume an Existing Experiment

Sometimes you may start an experiment, make some progress, and then have to interrupt it, for example due to a network error or missing credentials.

In this case, instead of starting again from scratch, you can simply set the overwrite flag to false in the 🔗Session Settings when running the experiment. This way, SURF will mount the existing experiment directory and resume from the last checkpoint.
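
Along the same lines, here is a minimal sketch assuming the flag sits directly under the 🔗Session Settings section:

{
  "Session Settings": {
    "overwrite": false
  }
}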

Modify the Source Code and Create a Custom Image

💡 Advanced

In some cases, it may be useful to modify the source code directly inside the container to accommodate advanced use cases.

To do so, you have to start an interactive session with SURF:

surf_shyfem [...] --interactive

Once inside the container, you can make your modifications using the installed editors or your favorite IDE. Be careful: if you close the container, all your changes will be lost!

The trick to persisting your modifications is to export the container as a new image. First, open another terminal and find the container ID:

docker ps 

Then, take a snapshot of the container state and export it as a new image:

docker commit [CONTAINER_ID] [NEW_IMAGE_NAME]

You can now safely close the container. You will be able to launch your modified image with the SURF CLI by adding the --image flag:

surf_shyfem [...] --image [NEW_IMAGE_NAME]