diff --git a/docs/user_manual/dataset-terminology.md b/docs/user_manual/dataset-terminology.md index 3068ed4ee..bf2e0165e 100644 --- a/docs/user_manual/dataset-terminology.md +++ b/docs/user_manual/dataset-terminology.md @@ -11,6 +11,92 @@ attribute. For detailed data types used throughout `power-grid-model`, please refer to [Python API Reference](../api_reference/python-api-reference.md). +## Buffer Type + +Defines how component data is ordered in memory. Two buffer types are supported: row-based and columnar-based. + +### Row-based (row-major) + +Attributes of the same component are stored contiguously before moving to the next component. + +### Columnar-based (column-major) + +Attributes are grouped across components by attribute type. + +## Buffer Representation + +Defines whether component data can be interpreted as a dense 2D matrix. + +### Dense + +Dense buffers represent data as a rectangular matrix. +This representation implies that all scenarios contain the same number of component entries. + +### Sparse + +Component data is stored as a flattened 1D buffer. + +Scenario boundaries are defined using an index pointer (`indptr`). +The `indptr` defines how the flattened buffer is segmented into per-scenario ranges. + +Sparse buffers may be either uniform or non-uniform. + +## Component Dataset Independency + +Defines whether all scenarios operate on the same component IDs. + +### Independent + +All scenarios modify the same component IDs in the same order. + +Each scenario starts from the original input dataset, without carrying over changes from previous scenarios, therefore +a reset is required between scenarios. + +### Dependent + +Different scenarios may modify different components. + +Each scenario starts from the original input dataset, without carrying over changes from previous scenarios, therefore +a reset is required between scenarios. 
+ +## Component Data Uniformity + +Defines whether all scenarios contain the same number of component entries. +Uniformity is independent of buffer representation. + +### Uniform + +All scenarios contain the same number of component entries. + +- Dense buffers are always uniform (by construction) +- Sparse buffers may also be uniform + +### Non-uniform + +Scenarios contain different numbers of component entries. + +- Only possible in sparse representation + +## Serialization Representation + +Defines how datasets are serialized. Three serialization representations are supported: compact list, named map, +and mixed. + +### Compact List + +Uses positional arrays instead of named attributes. +The attributes present in the dataset are stored separately. + +Generated when using `use_compact_list=True`. + +### Named Map + +Uses explicit attribute names per component. + +### Mixed + +Combination of compact list and named map (only possible in manual construction, e.g. validation datasets). + ## Data structures ```{mermaid} @@ -75,7 +161,7 @@ graph TD elements of all components) for a single scenario. - **{py:class}`BatchDataset `:** A data type storing update and or output data for one or more scenarios. - A batch dataset can contain sparse or dense data, depending on the component. + A batch dataset can contain dense or sparse representations per component. - **{py:class}`ComponentData `:** The data corresponding to the component. - **{py:class}`DataArray `:** A data array can be a single or a batch array. @@ -85,10 +171,11 @@ graph TD - **{py:class}`BatchArray `:** Multiple batches of data can be represented in sparse or dense forms. - **{py:class}`DenseBatchArray `:** A 2D structured numpy array - containing a list of components of the same type for each scenario. + containing a list of components of the same type for each scenario. This implies all scenarios contain the + same number of components (uniform structure). 
- **{py:class}`SparseBatchArray `:** A typed dictionary with a 1D numpy array of `Indexpointer` type under `indptr` key and `SingleArray` under `data` key which is all components - flattened over all batches. + flattened across scenarios, with scenario boundaries defined by `indptr`. - **{py:class}`ColumnarData `:** A dictionary of attributes as keys and individual numpy arrays as values. This format is described in more detail in @@ -183,9 +270,10 @@ The batch size is the number of scenarios. - **n_scenarios:** The total number of scenarios in the batch. (Same as Batch Size) -- **n_component_elements_per_scenario:** The number of elements of a specific component for each scenario. - This can be an integer (for dense batches), or a list of integers for sparse batches, where each integer in the list - represents the number of elements of a specific component for the scenario corresponding to the index of the integer. +- **n_component_elements_per_scenario:** The number of component instances per scenario, independent of representation + format (dense or sparse). This can be an integer (for dense batches), or a list of integers for sparse batches, + where each integer in the list represents the number of elements of a specific component for the scenario + corresponding to the index of the integer. - **Sub-batch:** When computing in parallel, all scenarios in batch calculation are distributed over threads. Each thread handles a subset of the `Batch`, called a `Sub-batch`. diff --git a/docs/user_manual/serialization.md b/docs/user_manual/serialization.md index 96792ee17..f1286f002 100644 --- a/docs/user_manual/serialization.md +++ b/docs/user_manual/serialization.md @@ -40,16 +40,16 @@ data. #### JSON schema attributes object -[`Attributes`](#json-schema-attributes-object) contains specified attributes per [`Component`](#json-schema-component) -type (e.g.: `"node"`). 
-It is only required for those components that contain `HomogeneousComponentData` objects and that data needs to follow -the attributes listed in this object. -It may be empty if for data for all instances certain component is `InhomogeneousComponentData`. -It reduces compression when a dataset largely follows the exact same pattern. +[`Attributes`](#json-schema-attributes-object) defines the attribute list and ordering +for each [`Component`](#json-schema-component) (e.g.: `"node"`) when component data is represented +using the compact list format (`use_compact_list=True`). + +The order of attributes in this section determines the order of values in the compact list representation. +This is independent of whether the component data is stored as `DenseComponentData` or `SparseComponentData`. - [`Attributes`](#json-schema-attributes-object): `Object` - - [`Component`](#json-schema-component): [`ComponentAttributes`](#json-schema-component-attributes) containing the - desired [`Attribute`](#json-schema-attribute)s for that [`Component`](#json-schema-component). + - [`Component`](#json-schema-component): [`ComponentAttributes`](#json-schema-component-attributes) + defining the ordered list of [`Attribute`](#json-schema-attribute)s for that component. For example, for an `"update"` dataset that contains only updates to the `"from_status"` attribute of `"branch"` components, it may be `{"branch": ["from_status"]}`. @@ -80,8 +80,8 @@ E.g.: `"id"`. 
#### JSON schema dataset object -The [`Dataset`](#json-schema-dataset-object) object is either a [`SingleDataset`](#json-schema-single-dataset-object) if -the [`is_batch`](#json-schema-root-object) field in the [`PowerGridModelRoot`](#json-schema-root-object) object is +The [`Dataset`](#json-schema-dataset-object) object is either a [`SingleDataset`](#json-schema-single-dataset-object) +if the [`is_batch`](#json-schema-root-object) field in the [`PowerGridModelRoot`](#json-schema-root-object) object is `false`, or a [`BatchDataset`](#json-schema-batch-dataset-object) otherwise. - [`Dataset`](#json-schema-dataset-object): [`SingleDataset`](#json-schema-single-dataset-object) | @@ -124,33 +124,30 @@ remains the same. #### JSON schema component data object -A [`ComponentData`](#json-schema-component-data-object) object is either a -[`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) object or an -[`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object) object +A [`ComponentData`](#json-schema-component-data-object) represents the data of a single component instance. 
+ +It can be stored in either dense or sparse representation: -- [`ComponentData`](#json-schema-component-data-object): - [`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) | - [`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object) +- [`DenseComponentData`](#json-schema-component-data-object-dense-representation) +- [`SparseComponentData`](#json-schema-component-data-object-sparse-representation) -#### JSON schema homogeneous component data object +#### JSON schema component data object (dense representation) -A [`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) object contains the actual values of a -certain component following the exact order of the attributes listed in the [`attributes`](#json-schema-root-object) -field in the [`PowerGridModelRoot`](#json-schema-root-object) object. +A dense component data object stores values in a fixed positional order defined by the `attributes` field +in the root object. -- [`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object): `Array` - - [`AttributeValue`](#json-schema-attribute-value): the value of each attribute. +- [`DenseComponentData`](#json-schema-component-data-object-dense-representation): `Array` + - [`AttributeValue`](#json-schema-attribute-value): values in the exact order defined by the component's attribute + list. -#### JSON schema inhomogeneous component data object +#### JSON schema component data object (sparse representation) -An [`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object) object contains actual values per -attribute of a certain component. 
-Contrary to the [`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object), it lists the names of the -attributes for which the values are specified, so the attributes may be in arbitrary order and do not have to follow the -schema listed in the [`attributes`](#json-schema-root-object) field in the -[`PowerGridModelRoot`](#json-schema-root-object) object. +A component data object in sparse representation contains values keyed by explicit attribute names. +Because every value carries its attribute name, no fixed ordering is required. +Unlike the dense representation, attributes may therefore appear in arbitrary order +and vary between components or scenarios. - -- [`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object): `Object` +- [`SparseComponentData`](#json-schema-component-data-object-sparse-representation): `Object` - [`Attribute`](#json-schema-attribute): [`AttributeValue`](#json-schema-attribute-value): the value of each attribute per attribute. @@ -255,11 +252,11 @@ The type is listed for each attribute in [Components](components.md). The following example contains an input dataset. The nodes and sym_loads are represented using -[`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object), -the lines are represented using [`InomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object), +[`DenseComponentData`](#json-schema-component-data-object-dense-representation), +the lines are represented using [`SparseComponentData`](#json-schema-component-data-object-sparse-representation), while the sources are represented using a mixture of -[`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) and -[`InomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object). 
+[`DenseComponentData`](#json-schema-component-data-object-dense-representation) and +[`SparseComponentData`](#json-schema-component-data-object-sparse-representation). ```json { diff --git a/src/power_grid_model/_core/power_grid_model.py b/src/power_grid_model/_core/power_grid_model.py index 3b253026c..c3a140d7e 100644 --- a/src/power_grid_model/_core/power_grid_model.py +++ b/src/power_grid_model/_core/power_grid_model.py @@ -613,11 +613,11 @@ def calculate_power_flow( # noqa: PLR0913 - key: Component type name to be updated in batch. - value: - - For homogeneous update batch (a 2D numpy structured array): + - For dense (uniform) update batch (a 2D numpy structured array): - Dimension 0: Each batch. - Dimension 1: Each updated element per batch for this component type. - - For inhomogeneous update batch (a dictionary containing two keys): + - For sparse (non-uniform) update batch (a dictionary containing two keys): - indptr: A 1D numpy int64 array with length n_batch + 1. Given batch number k, the update array for this batch is data[indptr[k]:indptr[k + 1]]. This is the concept of @@ -800,11 +800,11 @@ def calculate_state_estimation( # noqa: PLR0913 - key: Component type name to be updated in batch. - value: - - For homogeneous update batch (a 2D numpy structured array): + - For dense (uniform) update batch (a 2D numpy structured array): - Dimension 0: Each batch. - Dimension 1: Each updated element per batch for this component type. - - For inhomogeneous update batch (a dictionary containing two keys): + - For sparse (non-uniform) update batch (a dictionary containing two keys): - indptr: A 1D numpy int64 array with length n_batch + 1. Given batch number k, the update array for this batch is data[indptr[k]:indptr[k + 1]]. 
This is the concept of @@ -964,11 +964,11 @@ def calculate_short_circuit( # noqa: PLR0913 - key: Component type name to be updated in batch - value: - - For homogeneous update batch (a 2D numpy structured array): + - For dense (uniform) update batch (a 2D numpy structured array): - Dimension 0: each batch - Dimension 1: each updated element per batch for this component type - - For inhomogeneous update batch (a dictionary containing two keys): + - For sparse (non-uniform) update batch (a dictionary containing two keys): - indptr: A 1D numpy int64 array with length n_batch + 1. Given batch number k, the update array for this batch is data[indptr[k]:indptr[k + 1]]. This is the concept of