From f055342ef42ba45d1a15023f5dd3acc5d250b85b Mon Sep 17 00:00:00 2001 From: harshitha1201 Date: Thu, 6 Apr 2023 19:47:56 +0530 Subject: [PATCH 01/18] added align, broadcast,merge, concatenate, combine --- doc/user-guide/terminology.rst | 60 ++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 24e6ab69927..4d993f4761b 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -112,3 +112,63 @@ complete examples, please consult the relevant documentation.* ``__array_ufunc__`` and ``__array_function__`` protocols are also required. __ https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html + + Aligning + + Aligning refers to the process of making sure that the dimensions + and coordinates of two or more DataArrays or Datasets are consistent with each + other so that they can be combined or compared properly. + + For example, if you have two DataArrays that represent temperature measurements at + different times, but one has a time coordinate in seconds and the other has a time + coordinate in hours, you would need to align the two arrays by converting the time + coordinate in one of the arrays to match the other array. Once the arrays are aligned, + you can perform operations on them that require matching dimensions or coordinates, + such as taking the difference between the two arrays or calculating the mean across time. + + Broadcasting + + Broadcasting is a technique that allows operations to be performed on arrays + with different shapes and dimensions. When performing operations on arrays with different + shapes and dimensions, xarray will automatically broadcast the arrays to a common shape + before the operation is applied. + + For example, if you have two arrays with different shapes, xarray will try to match the + dimensions of the arrays and add new dimensions as necessary. 
This allows for easy element-wise + operations on arrays that might otherwise have incompatible shapes. + + Merging + + Merging refers to the process of combining multiple DataArrays or Dataset objects + along one or more dimensions to create a new Dataset. + + The merge() function allows you to combine multiple DataArrays or Dataset objects into a + single ``Dataset`` along one or more shared dimensions. If the input objects have different values for + the same coordinate, merge() will create a new coordinate with the union of the values from the input objects. + + Suppose you have two datasets, both containing temperature data from different weather + stations over the same time period. You want to combine these two datasets into a single dataset. + Assuming that both datasets have the same coordinates (time, latitude, and longitude), you can merge + them using merge() function. + + Concatenating + + Concatenating refers to the process of combining two or more arrays along a given dimension + to create a new array. The resulting array has the same shape as the input arrays, except for the dimension + along which the concatenation was performed, which is expanded to include the data from all input arrays. + + Concatenation is commonly used when working with multi-dimensional arrays that represent data over time or space. + For example, if you have daily temperature data for multiple years, you can concatenate the arrays along the time + dimension to create a single array with all the data. + + Combining + + Combining refers to the process of merging multiple DataArrays or Datasets along a shared dimension + to create a new object. This can be useful when working with data that has been split into multiple files, or + when wanting to combine data from different sources. + + Suppose we have one dataset containing temperature data and another dataset containing precipitation data, + both measured at the same set of locations and times. 
We can combine these two datasets using the ``combine_by_coords`` + method in xarray to create a single dataset with both temperature and precipitation variables. The resulting dataset + will have the same coordinates as the original datasets and the variables will be combined based on their coordinates. + This allows us to easily analyze and visualize both variables together in a single dataset. From 3c0c0a280fb6552a3c6232da7cd505ef5f0eaae4 Mon Sep 17 00:00:00 2001 From: harshitha1201 Date: Tue, 11 Apr 2023 21:59:02 +0530 Subject: [PATCH 02/18] examples added --- doc/user-guide/terminology.rst | 193 +++++++++++++++++++++++---------- 1 file changed, 134 insertions(+), 59 deletions(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 4d993f4761b..55d5cd9bb23 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -113,62 +113,137 @@ complete examples, please consult the relevant documentation.* __ https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html - Aligning - - Aligning refers to the process of making sure that the dimensions - and coordinates of two or more DataArrays or Datasets are consistent with each - other so that they can be combined or compared properly. - - For example, if you have two DataArrays that represent temperature measurements at - different times, but one has a time coordinate in seconds and the other has a time - coordinate in hours, you would need to align the two arrays by converting the time - coordinate in one of the arrays to match the other array. Once the arrays are aligned, - you can perform operations on them that require matching dimensions or coordinates, - such as taking the difference between the two arrays or calculating the mean across time. - - Broadcasting - - Broadcasting is a technique that allows operations to be performed on arrays - with different shapes and dimensions. 
When performing operations on arrays with different - shapes and dimensions, xarray will automatically broadcast the arrays to a common shape - before the operation is applied. - - For example, if you have two arrays with different shapes, xarray will try to match the - dimensions of the arrays and add new dimensions as necessary. This allows for easy element-wise - operations on arrays that might otherwise have incompatible shapes. - - Merging - - Merging refers to the process of combining multiple DataArrays or Dataset objects - along one or more dimensions to create a new Dataset. - - The merge() function allows you to combine multiple DataArrays or Dataset objects into a - single ``Dataset`` along one or more shared dimensions. If the input objects have different values for - the same coordinate, merge() will create a new coordinate with the union of the values from the input objects. - - Suppose you have two datasets, both containing temperature data from different weather - stations over the same time period. You want to combine these two datasets into a single dataset. - Assuming that both datasets have the same coordinates (time, latitude, and longitude), you can merge - them using merge() function. - - Concatenating - - Concatenating refers to the process of combining two or more arrays along a given dimension - to create a new array. The resulting array has the same shape as the input arrays, except for the dimension - along which the concatenation was performed, which is expanded to include the data from all input arrays. - - Concatenation is commonly used when working with multi-dimensional arrays that represent data over time or space. - For example, if you have daily temperature data for multiple years, you can concatenate the arrays along the time - dimension to create a single array with all the data. - - Combining - - Combining refers to the process of merging multiple DataArrays or Datasets along a shared dimension - to create a new object. 
This can be useful when working with data that has been split into multiple files, or - when wanting to combine data from different sources. - - Suppose we have one dataset containing temperature data and another dataset containing precipitation data, - both measured at the same set of locations and times. We can combine these two datasets using the ``combine_by_coords`` - method in xarray to create a single dataset with both temperature and precipitation variables. The resulting dataset - will have the same coordinates as the original datasets and the variables will be combined based on their coordinates. - This allows us to easily analyze and visualize both variables together in a single dataset. +.. ipython:: python + :suppress: + + import xarray as xr + import numpy as np + +Aligning + Aligning refers to the process of ensuring that two or more DataArrays or Datasets + have the same dimensions and coordinates, so that they can be combined or compared properly. + +.. ipython:: python + + # Two DataArrays with different time coordinates + time1 = np.arange("2022-01-01", "2022-01-06", dtype="datetime64") + time2 = np.arange("2022-01-03", "2022-01-08", dtype="datetime64") + + # Two DataArrays of random temperature values, each with time as a coordinate + temp1 = xr.DataArray( + np.random.rand(len(time1)), coords=[("time", time1)], name="temp" + ) + temp2 = xr.DataArray( + np.random.rand(len(time2)), coords=[("time", time2)], name="temp" + ) + + # Align the two DataArrays along the time dimension using the 'outer' join method + temp1_aligned, temp2_aligned = xr.align(temp1, temp2, join="outer") + + # Print the resulting DataArrays + print(temp1_aligned) + print(temp2_aligned) + +There are two DataArrays 'temp1' and 'temp2' with different time coordinates. We then use the align +method to align the two DataArrays along the time dimension. 
The join parameter is set to 'outer', which means that +the resulting DataArrays will have all time values that are present in either temp1 or temp2. +The align method returns two new DataArrays, temp1_aligned and temp2_aligned now have the same length, and their time +coordinates span the entire range from '2022-01-01' to '2022-01-07'. Any missing values are filled with NaNs. + +Broadcasting + Broadcasting is a technique that allows operations to be performed on arrays with different shapes and dimensions. + When performing operations on arrays with different shapes and dimensions, xarray will automatically broadcast the + arrays to a common shape before the operation is applied. + +.. ipython:: python + + a = xr.DataArray(np.array([1, 2, 3]), dims=["x"]) + b = xr.DataArray(np.array([4, 5, 6, 7]), dims=["y"]) + result = a + b + print(result) + +In this example, 'a' has shape (3,) and 'b' has shape (4,). +If we try to add these two arrays, xarray will automatically broadcast the arrays to a common shape before performing +the addition. It will extend 'a' along the new 'y' dimension and extend 'b' along the new 'x' dimension so that both +arrays have shape (3, 4). +The result is a 2D array with shape (3, 4) where each element is the sum of the corresponding elements +in 'a' and 'b'. Note that xarray has also automatically added coordinates for the new dimensions 'x' and 'y'. + +**In xarray, "merging", "concatenating", and "combining" are all operations used to combine two or more DataArrays or +Datasets into a single** ``DataArray`` **or** ``Dataset``. **However, each of these operations has a slightly different meaning and +purpose.** + +Merging + Merging is used to combine two or more Datasets or DataArrays that have different variables or coordinates along + the same dimensions. 
When merging, xarray aligns the variables and coordinates of the different datasets along + the specified dimensions and creates a new ``Dataset`` containing all the variables and coordinates. + +.. ipython:: python + + # create two 1D arrays with names + arr1 = xr.DataArray([1, 2, 3], dims=["x"], coords={"x": [10, 20, 30]}, name="arr1") + arr2 = xr.DataArray([4, 5, 6], dims=["x"], coords={"x": [20, 30, 40]}, name="arr2") + + # merge the two arrays into a new dataset + merged_ds = xr.Dataset({"arr1": arr1, "arr2": arr2}) + + # print the merged dataset + print(merged_ds) + +Both arrays 'arr1' and 'arr2' have one dimension 'x', which has three coordinate values each. +This code creates a new ``dataset`` 'merged_ds' by merging the two arrays 'arr1' and 'arr2'. +The ``merge()`` function allows you to combine multiple ``DataArray`` or ``Dataset`` objects into a single ``Dataset`` +along one or more shared dimensions. If the input objects have different values for the same coordinate, +``merge()`` will create a new coordinate with the union of the values from the input objects. + +Concatenating + Concatenating is used to combine two or more Datasets or DataArrays along a new dimension. When concatenating, + xarray stacks the datasets or dataarrays along a new dimension, and the resulting ``Dataset`` or ``Dataarray`` + will have the same variables and coordinates along the other dimensions. + +.. ipython:: python + + a = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y")) + b = xr.DataArray([[5, 6], [7, 8]], dims=("x", "y")) + c = xr.concat([a, b], dim="c") + print(c) + +This code creates two 2D arrays 'a' and 'b'. Both arrays have two dimensions "x" and "y", and contain the numbers 1 to 4 +and 5 to 8, respectively. +This code concatenates the two arrays 'a' and 'b' along a new dimension "c" using the ``xr.concat()`` function. The resulting +array 'c' has three dimensions "c", "x", and "y", and contains the numbers 1 to 8 arranged in two 2D arrays. 
+
+Combining
+    Combining in xarray is a general term used to describe the process of combining two or more DataArrays or Datasets
+    into a single ``DataArray`` or ``Dataset``. This can include both merging and concatenating, as well as other operations
+    like arithmetic operations (e.g., adding two arrays together) or stacking (e.g., stacking two arrays along a new
+    dimension).
+
+.. ipython:: python
+
+    # create the first dataset
+    ds1 = xr.Dataset(
+        {"data": xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))},
+        coords={"x": [1, 2], "y": [3, 4]},
+    )
+
+    # create the second dataset
+    ds2 = xr.Dataset(
+        {"data": xr.DataArray([[5, 6], [7, 8]], dims=("x", "y"))},
+        coords={"x": [2, 3], "y": [4, 5]},
+    )
+
+    # combine the datasets
+    combined_ds = xr.combine_by_coords([ds1, ds2])
+
+    # print the combined dataset
+    print(combined_ds)
+
+This code creates two datasets, ds1 and ds2, each containing a 2D array of data with dimensions 'x' and 'y' and
+corresponding coordinate arrays. The datasets have overlapping coordinates on dimension 'x', with values [1, 2] in
+ds1 and [2, 3] in ds2, but no overlapping coordinates on dimension 'y'.
+
+The ``xr.combine_by_coords`` function is then used to combine the datasets by their coordinates. This function combines
+datasets with non-overlapping dimensions and concatenates arrays along overlapping dimensions. In this case, it will
+concatenate the data arrays along dimension 'x' and create a new coordinate array with values [1, 2, 3]. The resulting
+combined dataset 'combined_ds' will have dimensions 'x' and 'y'.
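
The merging and concatenating entries in this patch can also be illustrated with direct calls to ``xr.merge`` and ``xr.concat``. The following is a minimal sketch and not part of the patch: the variable names ``merged``, ``stacked`` and ``extended`` are made up for illustration, and ``join="outer"`` is passed explicitly as an assumption about the intended behaviour, so that the result does not depend on library defaults.

```python
import xarray as xr

# Two named 1D arrays whose "x" coordinates only partly overlap.
arr1 = xr.DataArray([1, 2, 3], dims=["x"], coords={"x": [10, 20, 30]}, name="arr1")
arr2 = xr.DataArray([4, 5, 6], dims=["x"], coords={"x": [20, 30, 40]}, name="arr2")

# Merging: different variables are aligned on the shared "x" coordinate.
# With an outer join, "x" becomes the union of both coordinates and
# positions missing from one input are filled with NaN.
merged = xr.merge([arr1, arr2], join="outer")

# Two 2D arrays with the same dimensions and no coordinates.
a = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))
b = xr.DataArray([[5, 6], [7, 8]], dims=("x", "y"))

# Concatenating along a new dimension "c" stacks the inputs.
stacked = xr.concat([a, b], dim="c")

# Concatenating along the existing dimension "x" extends that dimension.
extended = xr.concat([a, b], dim="x")

print(merged)
print(stacked.dims, extended.shape)
```

The sketch separates the two cases the entries describe: ``xr.merge`` joins different variables on shared coordinates, while ``xr.concat`` joins arrays of the same variable along either a new or an existing dimension.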

From 59b9b18b786b5a33ec0d7012832bbc1d8d5a7fa0 Mon Sep 17 00:00:00 2001
From: Harshitha <97012127+harshitha1201@users.noreply.github.com>
Date: Tue, 25 Jul 2023 11:53:35 +0530
Subject: [PATCH 03/18] Update doc/user-guide/terminology.rst

Co-authored-by: Tom Nicholas
---
 doc/user-guide/terminology.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst
index 55d5cd9bb23..0ce08b81e90 100644
--- a/doc/user-guide/terminology.rst
+++ b/doc/user-guide/terminology.rst
@@ -240,7 +240,7 @@ Combining
     print(combined_ds)
 
 This code creates two datasets, ds1 and ds2, each containing a 2D array of data with dimensions 'x' and 'y' and
-corresponding coordinate arrays. The datasets have overlapping coordinates on dimension 'x', with values [1, 2] in
+corresponding coordinate arrays. The datasets have overlapping coordinates on dimension ``'x'``, with values ``[1, 2]`` in
 ds1 and [2, 3] in ds2, but no overlapping coordinates on dimension 'y'.
 
 The ``xr.combine_by_coords`` function is then used to combine the datasets by their coordinates. This function combines

From f8da298d8ebbf7bd552f1d12424e02926732b4a4 Mon Sep 17 00:00:00 2001
From: Harshitha <97012127+harshitha1201@users.noreply.github.com>
Date: Tue, 25 Jul 2023 11:53:55 +0530
Subject: [PATCH 04/18] Update doc/user-guide/terminology.rst

Co-authored-by: Tom Nicholas
---
 doc/user-guide/terminology.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst
index 0ce08b81e90..4dd582b5561 100644
--- a/doc/user-guide/terminology.rst
+++ b/doc/user-guide/terminology.rst
@@ -243,7 +243,7 @@ This code creates two datasets, ds1 and ds2, each containing a 2D array of data
 corresponding coordinate arrays. The datasets have overlapping coordinates on dimension ``'x'``, with values ``[1, 2]`` in
 ds1 and [2, 3] in ds2, but no overlapping coordinates on dimension 'y'.
-The ``xr.combine_by_coords`` function is then used to combine the datasets by their coordinates. This function combines +The :py:func:`xr.combine_by_coords` function is then used to combine the datasets by their coordinates. This function combines datasets with non-overlapping dimensions and concatenates arrays along overlapping dimensions. In this case, it will concatenate the data arrays along dimension 'x' and create a new coordinate array with values [1, 2, 3]. The resulting combined dataset 'combined_ds' will have dimensions 'x' and 'y'. From cee6dadc211935a74ee39410020eea3b7e805c3c Mon Sep 17 00:00:00 2001 From: Harshitha <97012127+harshitha1201@users.noreply.github.com> Date: Tue, 25 Jul 2023 11:54:20 +0530 Subject: [PATCH 05/18] Update doc/user-guide/terminology.rst Co-authored-by: Tom Nicholas --- doc/user-guide/terminology.rst | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 4dd582b5561..f50838655ff 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -215,9 +215,7 @@ array 'c' has three dimensions "c", "x", and "y", and contains the numbers 1 to Combining Combining in xarray is a general term used to describe the process of combining two or more DataArrays or Datasets - into a single ``DataArray`` or ``Dataset``. This can include both merging and concatenating, as well as other operations - like arithmetic operations (e.g., adding two arrays together) or stacking (e.g., stacking two arrays along a new - dimension). + into a single ``DataArray`` or ``Dataset`` using some combination of merging and concatenation operations. .. 
ipython:: python From b11d72fe10d4075c634d678972e194806a13738a Mon Sep 17 00:00:00 2001 From: harshitha1201 Date: Thu, 10 Aug 2023 19:38:48 +0530 Subject: [PATCH 06/18] changes made --- doc/user-guide/terminology.rst | 154 +++++---------------------------- 1 file changed, 22 insertions(+), 132 deletions(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 8c9f273f9d4..f5e6cd82ba1 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -132,135 +132,25 @@ complete examples, please consult the relevant documentation.* __ https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html -.. ipython:: python - :suppress: - - import xarray as xr - import numpy as np - -Aligning - Aligning refers to the process of ensuring that two or more DataArrays or Datasets - have the same dimensions and coordinates, so that they can be combined or compared properly. - -.. ipython:: python - - # Two DataArrays with different time coordinates - time1 = np.arange("2022-01-01", "2022-01-06", dtype="datetime64") - time2 = np.arange("2022-01-03", "2022-01-08", dtype="datetime64") - - # Two DataArrays of random temperature values, each with time as a coordinate - temp1 = xr.DataArray( - np.random.rand(len(time1)), coords=[("time", time1)], name="temp" - ) - temp2 = xr.DataArray( - np.random.rand(len(time2)), coords=[("time", time2)], name="temp" - ) - - # Align the two DataArrays along the time dimension using the 'outer' join method - temp1_aligned, temp2_aligned = xr.align(temp1, temp2, join="outer") - - # Print the resulting DataArrays - print(temp1_aligned) - print(temp2_aligned) - -There are two DataArrays 'temp1' and 'temp2' with different time coordinates. We then use the align -method to align the two DataArrays along the time dimension. The join parameter is set to 'outer', which means that -the resulting DataArrays will have all time values that are present in either temp1 or temp2. 
-The align method returns two new DataArrays, temp1_aligned and temp2_aligned now have the same length, and their time -coordinates span the entire range from '2022-01-01' to '2022-01-07'. Any missing values are filled with NaNs. - -Broadcasting - Broadcasting is a technique that allows operations to be performed on arrays with different shapes and dimensions. - When performing operations on arrays with different shapes and dimensions, xarray will automatically broadcast the - arrays to a common shape before the operation is applied. - -.. ipython:: python - - a = xr.DataArray(np.array([1, 2, 3]), dims=["x"]) - b = xr.DataArray(np.array([4, 5, 6, 7]), dims=["y"]) - result = a + b - print(result) - -In this example, 'a' has shape (3,) and 'b' has shape (4,). -If we try to add these two arrays, xarray will automatically broadcast the arrays to a common shape before performing -the addition. It will extend 'a' along the new 'y' dimension and extend 'b' along the new 'x' dimension so that both -arrays have shape (3, 4). -The result is a 2D array with shape (3, 4) where each element is the sum of the corresponding elements -in 'a' and 'b'. Note that xarray has also automatically added coordinates for the new dimensions 'x' and 'y'. - -**In xarray, "merging", "concatenating", and "combining" are all operations used to combine two or more DataArrays or -Datasets into a single** ``DataArray`` **or** ``Dataset``. **However, each of these operations has a slightly different meaning and -purpose.** - -Merging - Merging is used to combine two or more Datasets or DataArrays that have different variables or coordinates along - the same dimensions. When merging, xarray aligns the variables and coordinates of the different datasets along - the specified dimensions and creates a new ``Dataset`` containing all the variables and coordinates. - -.. 
ipython:: python - - # create two 1D arrays with names - arr1 = xr.DataArray([1, 2, 3], dims=["x"], coords={"x": [10, 20, 30]}, name="arr1") - arr2 = xr.DataArray([4, 5, 6], dims=["x"], coords={"x": [20, 30, 40]}, name="arr2") - - # merge the two arrays into a new dataset - merged_ds = xr.Dataset({"arr1": arr1, "arr2": arr2}) - - # print the merged dataset - print(merged_ds) - -Both arrays 'arr1' and 'arr2' have one dimension 'x', which has three coordinate values each. -This code creates a new ``dataset`` 'merged_ds' by merging the two arrays 'arr1' and 'arr2'. -The ``merge()`` function allows you to combine multiple ``DataArray`` or ``Dataset`` objects into a single ``Dataset`` -along one or more shared dimensions. If the input objects have different values for the same coordinate, -``merge()`` will create a new coordinate with the union of the values from the input objects. - -Concatenating - Concatenating is used to combine two or more Datasets or DataArrays along a new dimension. When concatenating, - xarray stacks the datasets or dataarrays along a new dimension, and the resulting ``Dataset`` or ``Dataarray`` - will have the same variables and coordinates along the other dimensions. - -.. ipython:: python - - a = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y")) - b = xr.DataArray([[5, 6], [7, 8]], dims=("x", "y")) - c = xr.concat([a, b], dim="c") - print(c) - -This code creates two 2D arrays 'a' and 'b'. Both arrays have two dimensions "x" and "y", and contain the numbers 1 to 4 -and 5 to 8, respectively. -This code concatenates the two arrays 'a' and 'b' along a new dimension "c" using the ``xr.concat()`` function. The resulting -array 'c' has three dimensions "c", "x", and "y", and contains the numbers 1 to 8 arranged in two 2D arrays. 
- -Combining - Combining in xarray is a general term used to describe the process of combining two or more DataArrays or Datasets - into a single ``DataArray`` or ``Dataset`` using some combination of merging and concatenation operations. - -.. ipython:: python - - # create the first dataset - ds1 = xr.Dataset( - {"data": xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))}, - coords={"x": [1, 2], "y": [3, 4]}, - ) - - # create the second dataset - ds2 = xr.Dataset( - {"data": xr.DataArray([[5, 6], [7, 8]], dims=("x", "y"))}, - coords={"x": [2, 3], "y": [4, 5]}, - ) - - # combine the datasets - combined_ds = xr.combine_by_coords([ds1, ds2]) - - # print the combined dataset - print(combined_ds) - -This code creates two datasets, ds1 and ds2, each containing a 2D array of data with dimensions 'x' and 'y' and -corresponding coordinate arrays. The datasets have overlapping coordinates on dimension ``'x'``, with values ``[1, 2]`` in -ds1 and [2, 3] in ds2, but no overlapping coordinates on dimension 'y'. - -The :py:func:`xr.combine_by_coords` function is then used to combine the datasets by their coordinates. This function combines -datasets with non-overlapping dimensions and concatenates arrays along overlapping dimensions. In this case, it will -concatenate the data arrays along dimension 'x' and create a new coordinate array with values [1, 2, 3]. The resulting -combined dataset 'combined_ds' will have dimensions 'x' and 'y'. + Aligning + Aligning refers to the process of ensuring that two or more DataArrays or Datasets + have the same dimensions and coordinates, so that they can be combined or compared properly. + + Broadcasting + A technique that allows operations to be performed on arrays with different shapes and dimensions. + When performing operations on arrays with different shapes and dimensions, xarray will automatically broadcast the + arrays to a common shape before the operation is applied. 
+ + Merging + Merging is used to combine two or more Datasets or DataArrays that have different variables or coordinates along + the same dimensions. When merging, xarray aligns the variables and coordinates of the different datasets along + the specified dimensions and creates a new ``Dataset`` containing all the variables and coordinates. + + Concatenating + Concatenating is used to combine two or more Datasets or DataArrays along a new dimension. When concatenating, + xarray stacks the datasets or dataarrays along a new dimension, and the resulting ``Dataset`` or ``Dataarray`` + will have the same variables and coordinates along the other dimensions. + + Combining + Combining in xarray is a general term used to describe the process of combining two or more DataArrays or Datasets + into a single ``DataArray`` or ``Dataset`` using some combination of merging and concatenation operations. From 6e5aa82e804aa18289979c28b03b422349ccfc8c Mon Sep 17 00:00:00 2001 From: harshitha1201 Date: Thu, 10 Aug 2023 22:37:50 +0530 Subject: [PATCH 07/18] add changes --- doc/user-guide/terminology.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index f5e6cd82ba1..966e5b215b6 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -147,10 +147,10 @@ complete examples, please consult the relevant documentation.* the specified dimensions and creates a new ``Dataset`` containing all the variables and coordinates. Concatenating - Concatenating is used to combine two or more Datasets or DataArrays along a new dimension. When concatenating, - xarray stacks the datasets or dataarrays along a new dimension, and the resulting ``Dataset`` or ``Dataarray`` + Concatenating is used to combine two or more Datasets or DataArrays along a dimension. 
When concatenating, + xarray arranges the datasets or dataarrays along a new dimension, and the resulting ``Dataset`` or ``Dataarray`` will have the same variables and coordinates along the other dimensions. Combining - Combining in xarray is a general term used to describe the process of combining two or more DataArrays or Datasets + Combining in xarray is a general term used to describe the process of arranging two or more DataArrays or Datasets into a single ``DataArray`` or ``Dataset`` using some combination of merging and concatenation operations. From 5bd705cab39ac4201ea74f570336704b14ebf865 Mon Sep 17 00:00:00 2001 From: harshitha1201 Date: Tue, 15 Aug 2023 18:03:23 +0530 Subject: [PATCH 08/18] . --- doc/user-guide/terminology.rst | 79 ++++++++++++++++++++++++++++++++++ 1 file changed, 79 insertions(+) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 966e5b215b6..2ba5dd0544c 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -132,25 +132,104 @@ complete examples, please consult the relevant documentation.* __ https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html + .. ipython:: python + :suppress: + + import numpy as np + import pandas as pd + import xarray as xr + Aligning Aligning refers to the process of ensuring that two or more DataArrays or Datasets have the same dimensions and coordinates, so that they can be combined or compared properly. + .. ipython:: python + + x = xr.DataArray( + [[25, 35], [10, 24]], + dims=("lat", "lon"), + coords={"lat": [35.0, 40.0], "lon": [100.0, 120.0]}, + ) + y = xr.DataArray( + [[20, 5], [7, 13]], + dims=("lat", "lon"), + coords={"lat": [35.0, 42.0], "lon": [100.0, 120.0]}, + ) + x + y + Broadcasting A technique that allows operations to be performed on arrays with different shapes and dimensions. 
When performing operations on arrays with different shapes and dimensions, xarray will automatically broadcast the arrays to a common shape before the operation is applied. + .. ipython:: python + + # 'a' has shape (3,) and 'b' has shape (4,) + a = xr.DataArray(np.array([1, 2, 3]), dims=["x"]) + b = xr.DataArray(np.array([4, 5, 6, 7]), dims=["y"]) + + # 2D array with shape (3, 4) + a + b + Merging Merging is used to combine two or more Datasets or DataArrays that have different variables or coordinates along the same dimensions. When merging, xarray aligns the variables and coordinates of the different datasets along the specified dimensions and creates a new ``Dataset`` containing all the variables and coordinates. + .. ipython:: python + + # create two 1D arrays with names + arr1 = xr.DataArray( + [1, 2, 3], dims=["x"], coords={"x": [10, 20, 30]}, name="arr1" + ) + arr2 = xr.DataArray( + [4, 5, 6], dims=["x"], coords={"x": [20, 30, 40]}, name="arr2" + ) + + # merge the two arrays into a new dataset + merged_ds = xr.Dataset({"arr1": arr1, "arr2": arr2}) + merged_ds + Concatenating Concatenating is used to combine two or more Datasets or DataArrays along a dimension. When concatenating, xarray arranges the datasets or dataarrays along a new dimension, and the resulting ``Dataset`` or ``Dataarray`` will have the same variables and coordinates along the other dimensions. + .. ipython:: python + + a = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y")) + b = xr.DataArray([[5, 6], [7, 8]], dims=("x", "y")) + c = xr.concat([a, b], dim="c") + c + Combining Combining in xarray is a general term used to describe the process of arranging two or more DataArrays or Datasets into a single ``DataArray`` or ``Dataset`` using some combination of merging and concatenation operations. + + .. 
ipython:: python + + ds1 = xr.Dataset( + {"data": xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))}, + coords={"x": [1, 2], "y": [3, 4]}, + ) + ds2 = xr.Dataset( + {"data": xr.DataArray([[5, 6], [7, 8]], dims=("x", "y"))}, + coords={"x": [2, 3], "y": [4, 5]}, + ) + + # combine the datasets + combined_ds = xr.combine_by_coords([ds1, ds2]) + combined_ds + + lazy + When working with xarray, you often deal with big sets of data. Instead of doing + calculations right away, xarray lets you plan what calculations you want to do, like finding the + average temperature in a dataset.This planning is called "lazy evaluation." It's like writing down the + steps you need to follow to build the LEGO spaceship, without actually building it yet.Later, when + you're ready to see the final result, you tell xarray, "Okay, go ahead and do those calculations now!" + That's when xarray starts working through the steps you planned and gives you the answer you wanted.This + lazy approach helps save time and memory because xarray only does the work when you actually need the + results. + + labeled From fba6824c00f15d9115f0d01f10c562f082bbd929 Mon Sep 17 00:00:00 2001 From: harshitha1201 Date: Wed, 16 Aug 2023 16:19:41 +0530 Subject: [PATCH 09/18] . --- doc/user-guide/terminology.rst | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 2ba5dd0544c..b2fa903d0ca 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -225,11 +225,35 @@ complete examples, please consult the relevant documentation.* lazy When working with xarray, you often deal with big sets of data. Instead of doing calculations right away, xarray lets you plan what calculations you want to do, like finding the - average temperature in a dataset.This planning is called "lazy evaluation." 
It's like writing down the - steps you need to follow to build the LEGO spaceship, without actually building it yet.Later, when + average temperature in a dataset.This planning is called "lazy evaluation." Later, when you're ready to see the final result, you tell xarray, "Okay, go ahead and do those calculations now!" That's when xarray starts working through the steps you planned and gives you the answer you wanted.This lazy approach helps save time and memory because xarray only does the work when you actually need the results. labeled + labeled refers to the way data is named with meaningful labels or coordinates.Instead of just having + numerical indices to locate values, xarray allows you to attach labels to each dimension. These labels + provide context and meaning to the data, making it easier to understand and work with. If you have + temperature data for different cities over time. Using xarray, you can label the dimensions: one for + cities and another for time. + + serialization + Serialization is like putting your collection of data into a format that makes it easy to save and share. + When you serialize data in xarray, you're taking all those temperature measurements, along with their + labels and other information, and turning them into a format that can be stored in a file or sent over + the internet. + + indexing + Indexing is way to quickly find and grab the specific pieces of data you're interested in from your + dataset. + Label-based Indexing: You can use labels to specify what you want like "Give me the temperature for New York on July 15th." + Positional Indexing: You can use numbers to refer to positions in the data like "Give me the third temperature in the list." This is useful when you know the order of your data but don't need to remember the exact labels. + Slicing: You can take a "slice" of your data, like you might want all temperatures from July 1st to July 10th. 
+ Boolean Indexing: You can use true/false statements to filter your data. It's like saying "Show me temperatures where it was above 80 degrees." + + backend + "backend" refers to the way xarray stores and manages your data behind the scenes.If you have a bunch + of temperature measurements from different cities. You want to use xarray to organize and analyze this + data. The backend is how xarray decides to store this information in memory so that you can easily + access and manipulate it. From 2faa23ed4caa4fa7dcf3772a06896f19549aece0 Mon Sep 17 00:00:00 2001 From: Harshitha <97012127+harshitha1201@users.noreply.github.com> Date: Thu, 17 Aug 2023 17:03:42 +0530 Subject: [PATCH 10/18] Update doc/user-guide/terminology.rst Co-authored-by: Tom Nicholas --- doc/user-guide/terminology.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index b2fa903d0ca..f1fa62cb8b8 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -245,7 +245,7 @@ complete examples, please consult the relevant documentation.* the internet. indexing - Indexing is way to quickly find and grab the specific pieces of data you're interested in from your + Indexing is how you select subsets of your data which you are interested in. dataset. Label-based Indexing: You can use labels to specify what you want like "Give me the temperature for New York on July 15th." Positional Indexing: You can use numbers to refer to positions in the data like "Give me the third temperature in the list." This is useful when you know the order of your data but don't need to remember the exact labels. 
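The four indexing styles listed in this glossary entry can be sketched on a small array. This is only an illustrative sketch: the `temps` array, its city names, and the integer `day` coordinate are hypothetical values chosen for the example, not part of the patch.

```python
import numpy as np
import xarray as xr

# Hypothetical temperatures for two cities over three days.
temps = xr.DataArray(
    np.array([[70.0, 85.0, 90.0], [60.0, 75.0, 82.0]]),
    dims=("city", "day"),
    coords={"city": ["New York", "Boston"], "day": [1, 2, 3]},
    name="temperature",
)

# Label-based indexing: select by coordinate labels.
ny_day2 = temps.sel(city="New York", day=2)

# Positional indexing: select by integer position along a dimension.
third_day = temps.isel(day=2)

# Slicing by label: label-based slices include both endpoints.
first_two_days = temps.sel(day=slice(1, 2))

# Boolean indexing: keep values above 80, dropping labels with no matches.
hot = temps.where(temps > 80, drop=True)
```

Entries that fail the boolean condition but sit in a row or column that is kept become NaN, which is why `hot` still has a rectangular shape.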
From 0b3a66af2d278ecc83ec93efbea14d17dd12734b Mon Sep 17 00:00:00 2001 From: Harshitha <97012127+harshitha1201@users.noreply.github.com> Date: Thu, 17 Aug 2023 17:04:05 +0530 Subject: [PATCH 11/18] Update doc/user-guide/terminology.rst Co-authored-by: Tom Nicholas --- doc/user-guide/terminology.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index f1fa62cb8b8..874fdc1ab38 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -223,7 +223,7 @@ complete examples, please consult the relevant documentation.* combined_ds lazy - When working with xarray, you often deal with big sets of data. Instead of doing +Instead of doing calculations right away, xarray lets you plan what calculations you want to do, like finding the average temperature in a dataset.This planning is called "lazy evaluation." Later, when you're ready to see the final result, you tell xarray, "Okay, go ahead and do those calculations now!" From 3cc357cb53521d119d689d55fcca441bf2cca52e Mon Sep 17 00:00:00 2001 From: Harshitha <97012127+harshitha1201@users.noreply.github.com> Date: Thu, 17 Aug 2023 17:04:42 +0530 Subject: [PATCH 12/18] Update doc/user-guide/terminology.rst Co-authored-by: Tom Nicholas --- doc/user-guide/terminology.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 874fdc1ab38..e5b7486bfd6 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -239,7 +239,7 @@ Instead of doing cities and another for time. serialization - Serialization is like putting your collection of data into a format that makes it easy to save and share. + Serialization is the process of converting your data into a format that makes it easy to save and share. 
When you serialize data in xarray, you're taking all those temperature measurements, along with their labels and other information, and turning them into a format that can be stored in a file or sent over the internet. From d344641dfbc72d6d2f2ee807f2e3b02242ca844d Mon Sep 17 00:00:00 2001 From: harshitha1201 Date: Fri, 18 Aug 2023 20:24:44 +0530 Subject: [PATCH 13/18] changes done --- doc/user-guide/terminology.rst | 37 +++++++++++++++++----------------- doc/whats-new.rst | 3 +++ 2 files changed, 22 insertions(+), 18 deletions(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index e5b7486bfd6..f58ac039701 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -136,7 +136,6 @@ complete examples, please consult the relevant documentation.* :suppress: import numpy as np - import pandas as pd import xarray as xr Aligning @@ -204,8 +203,8 @@ complete examples, please consult the relevant documentation.* c Combining - Combining in xarray is a general term used to describe the process of arranging two or more DataArrays or Datasets - into a single ``DataArray`` or ``Dataset`` using some combination of merging and concatenation operations. + Combining is the process of arranging two or more DataArrays or Datasets into a single ``DataArray`` or + ``Dataset`` using some combination of merging and concatenation operations. .. ipython:: python @@ -223,8 +222,8 @@ complete examples, please consult the relevant documentation.* combined_ds lazy -Instead of doing - calculations right away, xarray lets you plan what calculations you want to do, like finding the + Lazily-evaluated operations do not load data into memory until necessary.Instead of doing calculations + right away, xarray lets you plan what calculations you want to do, like finding the average temperature in a dataset.This planning is called "lazy evaluation." 
Later, when you're ready to see the final result, you tell xarray, "Okay, go ahead and do those calculations now!" That's when xarray starts working through the steps you planned and gives you the answer you wanted.This @@ -232,8 +231,9 @@ Instead of doing results. labeled - labeled refers to the way data is named with meaningful labels or coordinates.Instead of just having - numerical indices to locate values, xarray allows you to attach labels to each dimension. These labels + Labeled data has metadata describing the context of the data, not just the raw data values. + These can be tick labels (stored as Coordinates) or unique names for each array. labels are + constituted by two main components: coordinates and attributes. These labels provide context and meaning to the data, making it easier to understand and work with. If you have temperature data for different cities over time. Using xarray, you can label the dimensions: one for cities and another for time. @@ -242,18 +242,19 @@ Instead of doing Serialization is the process of converting your data into a format that makes it easy to save and share. When you serialize data in xarray, you're taking all those temperature measurements, along with their labels and other information, and turning them into a format that can be stored in a file or sent over - the internet. + the internet. xarray objects can be serialized into formats which store the labels alongside the data. + "Some supported serialization formats are files that can then be stored or transferred (e.g. netCDF), + whilst others are protocols that allow for data access over a network (e.g. Zarr)." indexing Indexing is how you select subsets of your data which you are interested in. dataset. - Label-based Indexing: You can use labels to specify what you want like "Give me the temperature for New York on July 15th." - Positional Indexing: You can use numbers to refer to positions in the data like "Give me the third temperature in the list." 
This is useful when you know the order of your data but don't need to remember the exact labels. - Slicing: You can take a "slice" of your data, like you might want all temperatures from July 1st to July 10th. - Boolean Indexing: You can use true/false statements to filter your data. It's like saying "Show me temperatures where it was above 80 degrees." - - backend - "backend" refers to the way xarray stores and manages your data behind the scenes.If you have a bunch - of temperature measurements from different cities. You want to use xarray to organize and analyze this - data. The backend is how xarray decides to store this information in memory so that you can easily - access and manipulate it. + + - Label-based Indexing: Selecting data by passing a specific label and comparing it to the labels + stored in the associated coordinates. You can use labels to specify what you want like "Give me the + temperature for New York on July 15th." + + - Positional Indexing: You can use numbers to refer to positions in the data like "Give me the third temperature value" This is useful when you know the order of your data but don't need to remember the exact labels. + + - Slicing: You can take a "slice" of your data, like you might want all temperatures from July 1st + to July 10th. xarray supports slicing for both positional and label-based indexing. diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 564c68bfc35..1590df4d1d6 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -97,6 +97,9 @@ Documentation (:pull:`7999`) By `Tom Nicholas `_. - Fixed broken links in "See also" section of :py:meth:`Dataset.count` (:issue:`8055`, :pull:`8057`) By `Articoking `_. +- Extended the glossary by adding terms Aligning, Broadcasting, Merging, Concatenating, Combining, lazy, + labeled, serialization, indexing (:issue:`3355`, :pull:`7732`) + By `Harshitha `_. 
Internal Changes ~~~~~~~~~~~~~~~~ From 613b5443306ab10d073352f55217a434686010e6 Mon Sep 17 00:00:00 2001 From: Harshitha <97012127+harshitha1201@users.noreply.github.com> Date: Fri, 18 Aug 2023 20:31:17 +0530 Subject: [PATCH 14/18] Update doc/user-guide/terminology.rst Co-authored-by: Tom Nicholas --- doc/user-guide/terminology.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index f58ac039701..fcce5dc315b 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -243,8 +243,8 @@ complete examples, please consult the relevant documentation.* When you serialize data in xarray, you're taking all those temperature measurements, along with their labels and other information, and turning them into a format that can be stored in a file or sent over the internet. xarray objects can be serialized into formats which store the labels alongside the data. - "Some supported serialization formats are files that can then be stored or transferred (e.g. netCDF), - whilst others are protocols that allow for data access over a network (e.g. Zarr)." + Some supported serialization formats are files that can then be stored or transferred (e.g. netCDF), + whilst others are protocols that allow for data access over a network (e.g. Zarr). indexing Indexing is how you select subsets of your data which you are interested in. 
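The point that serialization keeps the labels alongside the data can be sketched without any optional backend. Writing netCDF or Zarr needs extra packages, so the sketch below substitutes a JSON round trip through `DataArray.to_dict` and `DataArray.from_dict`. That substitution is an assumption made for illustration, not the approach the patch describes, and the array contents are hypothetical.

```python
import json

import xarray as xr

# Hypothetical labeled array with a coordinate, a name, and an attribute.
original = xr.DataArray(
    [21.5, 23.0, 19.0],
    dims=("city",),
    coords={"city": ["a", "b", "c"]},
    name="temperature",
    attrs={"units": "degC"},
)

# Serialize: the dictionary form carries dims, coords, name, and attrs.
payload = json.dumps(original.to_dict())

# Deserialize: rebuild the labeled array from the JSON text.
restored = xr.DataArray.from_dict(json.loads(payload))
```

The same idea applies to `to_netcdf` and `to_zarr`, which write the labels into the file or store together with the values.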
From da3bce54badd0f96f2818411182b9b29a34ad180 Mon Sep 17 00:00:00 2001 From: Harshitha <97012127+harshitha1201@users.noreply.github.com> Date: Fri, 18 Aug 2023 21:12:48 +0530 Subject: [PATCH 15/18] Update doc/user-guide/terminology.rst Co-authored-by: Tom Nicholas --- doc/user-guide/terminology.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index fcce5dc315b..63c47868c24 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -232,8 +232,7 @@ complete examples, please consult the relevant documentation.* labeled Labeled data has metadata describing the context of the data, not just the raw data values. - These can be tick labels (stored as Coordinates) or unique names for each array. labels are - constituted by two main components: coordinates and attributes. These labels + This contextual information can be labels for array axes (i.e. dimension names) tick labels along axes (stored as Coordinate variables) or unique names for each array. These labels provide context and meaning to the data, making it easier to understand and work with. If you have temperature data for different cities over time. Using xarray, you can label the dimensions: one for cities and another for time. 
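The cities-over-time example in the "labeled" entry can be sketched as follows. The city names, time values, and temperatures are hypothetical, chosen only to show dimension names, coordinate labels, and an array name working together.

```python
import numpy as np
import xarray as xr

# Hypothetical readings: rows are cities, columns are time steps.
temps = xr.DataArray(
    np.array([[21.5, 23.0], [18.0, 19.5]]),
    dims=("city", "time"),  # labels for the array axes
    coords={"city": ["Delhi", "Mumbai"], "time": [0, 1]},  # tick labels
    name="temperature",  # unique name for the array
)

# Operations can refer to a dimension by name instead of by axis number.
mean_over_time = temps.mean(dim="time")
```

Because the axes are named, `temps.mean(dim="time")` keeps working even if the array is transposed, which an axis number would not guarantee.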
From cbf78fd2d1571cf859bf17e762a0a840965865b6 Mon Sep 17 00:00:00 2001 From: Harshitha <97012127+harshitha1201@users.noreply.github.com> Date: Fri, 18 Aug 2023 21:13:06 +0530 Subject: [PATCH 16/18] Update doc/user-guide/terminology.rst Co-authored-by: Tom Nicholas --- doc/user-guide/terminology.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 63c47868c24..71ea2342042 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -159,7 +159,7 @@ complete examples, please consult the relevant documentation.* Broadcasting A technique that allows operations to be performed on arrays with different shapes and dimensions. - When performing operations on arrays with different shapes and dimensions, xarray will automatically broadcast the + When performing operations on arrays with different shapes and dimensions, xarray will automatically attempt to broadcast the arrays to a common shape before the operation is applied. .. ipython:: python From 2ffe4c9d499f4cbf571b8468e62f524c63067231 Mon Sep 17 00:00:00 2001 From: Harshitha <97012127+harshitha1201@users.noreply.github.com> Date: Fri, 18 Aug 2023 22:04:45 +0530 Subject: [PATCH 17/18] Update doc/user-guide/terminology.rst Co-authored-by: Tom Nicholas --- doc/user-guide/terminology.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 71ea2342042..99f657530aa 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -247,7 +247,6 @@ complete examples, please consult the relevant documentation.* indexing Indexing is how you select subsets of your data which you are interested in. - dataset. - Label-based Indexing: Selecting data by passing a specific label and comparing it to the labels stored in the associated coordinates. 
You can use labels to specify what you want like "Give me the From 34a75f6451c5a43b4de5b3c4ca14ac35d3a504e1 Mon Sep 17 00:00:00 2001 From: Harshitha <97012127+harshitha1201@users.noreply.github.com> Date: Fri, 18 Aug 2023 22:04:59 +0530 Subject: [PATCH 18/18] Update doc/user-guide/terminology.rst Co-authored-by: Tom Nicholas --- doc/user-guide/terminology.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 99f657530aa..d99312643aa 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -246,7 +246,7 @@ complete examples, please consult the relevant documentation.* whilst others are protocols that allow for data access over a network (e.g. Zarr). indexing - Indexing is how you select subsets of your data which you are interested in. + :ref:`Indexing` is how you select subsets of your data which you are interested in. - Label-based Indexing: Selecting data by passing a specific label and comparing it to the labels stored in the associated coordinates. You can use labels to specify what you want like "Give me the