import concurrent.futures
import glob
import math
import os
import tempfile

import numpy as np
import pandas as pd
import rasterio
from rasterio.transform import from_bounds
from rasterio.warp import reproject, Resampling

# Import your own helper to open datasets.
from data_utils import open_dataset_file
# Define the working directory and a cache directory.
directory = "F:/Github/ULC/ioniq-geography/gis/data/copernicus/CZ_TRAVNY_ENSEMBLE_DAILYMEAN_02_2025/"
cache_directory = os.path.join(tempfile.gettempdir(), "cache")
os.makedirs(cache_directory, exist_ok=True)

# Get list of GRIB/NetCDF files and create symlinks in the cache.
files = [file for file in glob.glob(os.path.join(directory, "*"))
         if file.endswith(('.nc', '.grib'))]

# Symlink each data file into the cache directory; the loop header and the
# accumulator were missing in the original, leaving the body orphaned.
# NOTE(review): os.symlink on Windows may require admin rights / developer
# mode — confirm, or fall back to shutil.copy if it raises OSError.
cached_files = []
for file in files:
    cache_path = os.path.join(cache_directory, os.path.basename(file))
    if not os.path.exists(cache_path):
        os.symlink(file, cache_path)
    cached_files.append(cache_path)
def process_variable(ds, var_name, file):
    """Resample one variable of an opened dataset into a normalized flipbook GeoTIFF.

    For a given already opened dataset and a variable name:
    - Trim to the specified bounding box.
    - Resample each time frame to an 8x8 grid.
    - Compute the global min/max (across all time frames) and normalize each frame.
    - Assemble the normalized frames into a flipbook.
    - Save the flipbook as a GeoTIFF.
    - Return important properties as a dictionary.

    Parameters
    ----------
    ds : opened dataset (xarray-like) with 'longitude'/'latitude' coords.
    var_name : name of the data variable in ``ds`` to process.
    file : path of the originating file, used for the output filename.

    Returns
    -------
    dict of properties describing the processed variable.
    """
    da = ds[var_name]

    # Determine the time coordinate (some products use 'valid_time').
    time_dim = "time" if "time" in da.dims else "valid_time"
    time_values = da[time_dim].values
    num_frames = len(time_values)

    # --- Define destination grid parameters ---
    # Bounding box: left, bottom, right, top.
    left, bottom, right, top = 17.86595452, 49.14841321, 19.12404888, 49.95775356
    # Destination dimensions (8x8 pixels).
    dst_width, dst_height = 8, 8
    dst_transform = from_bounds(left, bottom, right, top, dst_width, dst_height)

    # --- Build source grid transform from dataset's coordinates ---
    lons = ds['longitude'].values
    lats = ds['latitude'].values
    src_width = len(lons)
    src_height = len(lats)
    # Assuming lats are in descending order; min/max is order-independent.
    src_left, src_right = float(lons.min()), float(lons.max())
    src_bottom, src_top = float(lats.min()), float(lats.max())
    src_transform = from_bounds(src_left, src_bottom, src_right, src_top, src_width, src_height)

    # --- Process each time frame: reproject to the 8x8 grid ---
    # NOTE(review): assumes source and destination are both geographic
    # (EPSG:4326) — confirm against the product metadata.
    reprojected_frames = []
    for t in time_values:
        src_array = da.sel(**{time_dim: t}).values.astype(np.float32)
        dst_array = np.empty((dst_height, dst_width), dtype=np.float32)
        reproject(
            source=src_array,
            destination=dst_array,
            src_transform=src_transform,
            src_crs="EPSG:4326",
            dst_transform=dst_transform,
            dst_crs="EPSG:4326",
            resampling=Resampling.bilinear,
        )
        reprojected_frames.append(dst_array)

    # --- Compute global min and max across all frames ---
    all_frames = np.stack(reprojected_frames, axis=0)
    global_min = float(np.min(all_frames))
    global_max = float(np.max(all_frames))

    # --- Normalize each frame using the global min/max ---
    # Guard against a constant field (max == min) to avoid division by zero.
    value_range = global_max - global_min
    if value_range == 0:
        value_range = 1.0
    normalized_frames = [
        (frame - global_min) / value_range for frame in reprojected_frames
    ]

    # --- Assemble normalized frames into a flipbook ---
    # For this example, we lay out the frames in a square grid.
    cols = math.ceil(math.sqrt(num_frames))
    rows = math.ceil(num_frames / cols)
    orig_width = cols * dst_width
    orig_height = rows * dst_height
    flipbook = np.zeros((orig_height, orig_width), dtype=np.float32)
    for idx, frame in enumerate(normalized_frames):
        # Row-major placement: frame idx goes to grid cell (r, c).
        r, c = divmod(idx, cols)
        flipbook[r*dst_height:(r+1)*dst_height, c*dst_width:(c+1)*dst_width] = frame

    # --- Write the flipbook as a GeoTIFF ---
    start_date = str(time_values[0])[:10]
    end_date = str(time_values[-1])[:10]
    out_filename = f"{os.path.splitext(os.path.basename(file))[0]}__{var_name}__{rows}x{cols}_{start_date}_{end_date}.tif"
    out_file = os.path.join(directory, out_filename)
    with rasterio.open(
        out_file,
        "w",
        driver="GTiff",
        height=flipbook.shape[0],
        width=flipbook.shape[1],
        count=1,
        dtype=str(flipbook.dtype),
        crs="EPSG:4326",
        transform=dst_transform  # Using the destination transform as a placeholder.
    ) as dst:
        dst.write(flipbook, 1)
    print(f"Processed variable '{var_name}' from file '{file}'")

    # Prepare properties to store in the CSV.
    return {
        "file": os.path.basename(file),
        "variable": var_name,
        "start_date": start_date,
        "end_date": end_date,
        "global_min": global_min,
        "global_max": global_max,
        "num_frames": num_frames,
        "rows": rows,
        "cols": cols,
        "dst_width": dst_width,
        "dst_height": dst_height,
    }
# Process each file sequentially; for each file, process variables in parallel.
# Threads are appropriate here: rasterio/GDAL releases the GIL during I/O.
all_properties = []
for file in cached_files:
    print("Processing file:", file)
    ds = open_dataset_file(file)
    futures = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        for var_name in ds.data_vars:
            futures.append(executor.submit(process_variable, ds, var_name, file))
        for future in concurrent.futures.as_completed(futures):
            # One failing variable should not abort the whole file.
            try:
                props = future.result()
                all_properties.append(props)
            except Exception as e:
                print("Error processing variable:", e)

# Save the variable properties to a CSV file.
df = pd.DataFrame(all_properties)
csv_file = os.path.join(directory, "variable_properties.csv")
df.to_csv(csv_file, index=False)
print("Saved variable properties CSV:", csv_file)