Reading Subsets of Datasets
Read in subsets of datasets and scales from standard PSI data files.
This example demonstrates how to use read_hdf_by_index(), read_hdf_by_value(), and read_hdf_by_ivalue().
These functions allow users to read in subsets of datasets based on either index positions or scale values. This is particularly useful for large datasets where only a portion of the data is needed for analysis.
Note
To reiterate, these functions do not read in the entire dataset; instead, they
extract only the specified slices based on the provided indices or values. The one
caveat is that read_hdf_by_value() must first read the scales in full in order to
determine the indices that correspond to the requested values.
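As a rough illustration of that caveat, the sketch below shows how a requested value could be mapped to a pair of bracketing indices with NumPy. This is only a conceptual sketch using a toy scale array; it is not the actual lookup logic used inside read_hdf_by_value().
import numpy as np

def bracketing_indices(scale, value):
    # Return (lo, hi), the indices of the scale entries that bracket `value`.
    # Illustrative only: assumes `scale` is 1D and monotonically increasing.
    hi = int(np.searchsorted(scale, value))
    hi = min(max(hi, 1), len(scale) - 1)  # keep both indices in bounds
    return hi - 1, hi

toy_radius = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
print(bracketing_indices(toy_radius, 1.7))  # (1, 2)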
from pathlib import Path
from psi_io import data, read_hdf_meta, read_hdf_by_index, read_hdf_by_value, read_hdf_by_ivalue
Read in the metadata for a 3D data file (the radial magnetic field data).
br_data_filepath = data.get_3d_data()
metadata = read_hdf_meta(br_data_filepath)
Datasets for file: br.h5
+-----------------+-----------------+-----------------+
| Name | Shape | Type |
+-----------------+-----------------+-----------------+
| Data | (181, 100, 151) | float32 |
+-----------------+-----------------+-----------------+
Scales for file: br.h5
+------------+------------+------------+------------+------------+------------+------------+
| Dataset | Dim | Name | Shape | Type | Min | Max |
+------------+------------+------------+------------+------------+------------+------------+
| Data | 0 | dim1 | (151,) | float32 | 0.99956024 | 30.511646 |
| Data | 1 | dim2 | (100,) | float32 | 0.0 | 3.1415927 |
| Data | 2 | dim3 | (181,) | float32 | 0.0 | 6.2831855 |
+------------+------------+------------+------------+------------+------------+------------+
Using this metadata, we can see that the dataset is a standard PSI 3D data file with scales:
- radius (1st dimension), in units of solar radii (R☉),
- co-latitude (2nd dimension), in radians,
- longitude (3rd dimension), in radians.
Note that the in-memory NumPy array is ordered in reverse, (longitude, co-latitude, radius), which is why the dataset shape is reported as (181, 100, 151).
To read in a subset of this data based on index positions, we can use
read_hdf_by_index(), which returns the sliced data array followed by the sliced
scale arrays (radius, co-latitude, longitude). For example, to read in only
the first position along the radius dimension (index 0) and all
positions along the co-latitude and longitude dimensions, we can do:
subset_at_r0 = read_hdf_by_index(br_data_filepath, 0, None, None)
for r in subset_at_r0:
print(r.shape)
(181, 100, 1)
(1,)
(100,)
(181,)
Or to read in the first five positions along the phi dimension (indices 0 to 4):
subset_phi0_4 = read_hdf_by_index(br_data_filepath, None, None, (0, 5))
for r in subset_phi0_4:
print(r.shape)
(5, 100, 151)
(151,)
(100,)
(5,)
Or to read a single position at the midpoint of each dimension:
mid_indices = (dim.shape[0]//2 for dim in metadata[0].scales)
subset_midpoint = read_hdf_by_index(br_data_filepath, *mid_indices)
for r in subset_midpoint:
print(r.shape)
(1, 1, 1)
(1,)
(1,)
(1,)
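Before moving on, the interpretation of the dimensions given above can be sanity-checked by reading everything back and inspecting the scale ranges. The sketch below assumes that passing None for every dimension returns the full data cube and all three scales (extrapolating from the partial reads above); it is not taken from the original example.
import numpy as np

# Assumption: None for every dimension returns the full dataset and scales.
full_data, r_full, t_full, p_full = read_hdf_by_index(br_data_filepath, None, None, None)
print(full_data.shape)             # (181, 100, 151)
print(r_full.min(), r_full.max())  # ~1.0 to ~30.5 (solar radii)
print(np.degrees(t_full).max())    # ~180 (co-latitude in degrees)
print(np.degrees(p_full).max())    # ~360 (longitude in degrees)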
Alternatively, to read in subsets based on scale values, we can use
read_hdf_by_value(). For example, to read in data
at a radius of 2 R☉ and all co-latitude and longitude values, we can do:
subset_at_r2 = read_hdf_by_value(br_data_filepath, 2.0, None, None)
for r in subset_at_r2:
print(r.shape)
(181, 100, 2)
(2,)
(100,)
(181,)
Note
When using read_hdf_by_value(), the minimum number
of elements returned along each dimension is two, viz. the values that
bracket the requested value. Therefore, in the above example, the radius
scale returned will contain two values: one just below 2.0 R☉ and one just above.
This allows for interpolation between scale values if desired (using the
minimum amount of data necessary).
_, rscale, *_ = subset_at_r2
print(rscale)
[1.9383442 2.004063 ]
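Those two bracketing slices are all that is needed for a simple linear interpolation onto the requested radius. The sketch below is a minimal example of that, assuming (as the shapes above indicate) that the radial dimension is the last axis of the returned array; it is not a psi_io routine.
bdata, rvals, _, _ = subset_at_r2
r_target = 2.0
# Linear weight between the lower (rvals[0]) and upper (rvals[1]) radial slices.
w = (r_target - rvals[0]) / (rvals[1] - rvals[0])
br_at_r2 = (1.0 - w) * bdata[..., 0] + w * bdata[..., 1]
print(br_at_r2.shape)  # (181, 100)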
Suppose we haven’t read in the metadata and don’t know the exact scale values
(or, alternatively, the desired scale values don’t exist in the dataset). In that case,
we can use read_hdf_by_ivalue(), which reads in data based on
the nearest integer value of the requested scale values.
Note
While this function performs a seemingly trivial task (one that is replicable through
read_hdf_by_index()), it is provided for user convenience
and for interpolation routines.
subset_ivalue = read_hdf_by_ivalue(br_data_filepath, 0.5, None, None)
for r in subset_ivalue:
print(r.shape)
_, rscale_ivalue, *_ = subset_ivalue
(181, 100, 2)
(2,)
(100,)
(181,)
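To illustrate the note above, the same slices can be recovered with read_hdf_by_index() once the bracketing indices are known. The sketch below locates the radius values returned by read_hdf_by_ivalue() in the full radius scale and repeats the read by index; it assumes the all-None call returns the full scales and that np.searchsorted finds exact matches.
import numpy as np

# Read the full radius scale (assumption: None for every dimension returns everything).
_, r_full, _, _ = read_hdf_by_index(br_data_filepath, None, None, None)
# Locate the two radius values returned by read_hdf_by_ivalue() in the full scale.
i0, i1 = np.searchsorted(r_full, rscale_ivalue)
# Repeat the read by index; a (start, stop) tuple selects the same bracketing pair.
subset_by_index = read_hdf_by_index(br_data_filepath, (int(i0), int(i1) + 1), None, None)
print(np.array_equal(subset_by_index[0], subset_ivalue[0]))  # expected: True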
Total running time of the script: (0 minutes 0.023 seconds)