Quick Start
This guide will help you get started with gedixr in just a few minutes.
Prerequisites
Before you begin, you'll need GEDI L2A/L2B v002 files. You can either:
- Download them through the NASA Earthdata Search web interface - see here, or
- Download them using gedixr - see Downloading Data for details
If Earthdata Search provided you with zipped files, please unzip them before proceeding.
Spatial Subsetting
Both NASA Earthdata Search and gedixr's download functionality allow you to subset
GEDI data to an area of interest during download, which can significantly reduce the
amount of data you need to process. You can then use gedixr's spatial subsetting
during extraction for further refinement to one or multiple detailed areas of interest.
Basic Workflow
1. Extract data
The following example will:
- Recursively search for GEDI L2B files in the specified directory
- Extract the default data variables for L2B files: rh100, tcc, fhd, pai. (See Variables for full lists of variables extracted by default.)
- Apply the default quality filtering criteria
- Save results as a GeoParquet file in the extracted/ subdirectory relative to the input directory and log the extraction process in the log/ subdirectory
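A minimal sketch of such an extraction call, assuming the same `extract_data` signature and `(GeoDataFrame, output path)` return value used in the merge example later in this guide. The wrapper name `extract_l2b` is hypothetical and only serves to keep the snippet self-contained.

```python
def extract_l2b(directory):
    """Run a default L2B extraction on `directory` (hypothetical wrapper)."""
    # Imported inside the function so the snippet can be loaded
    # even where gedixr is not installed.
    from gedixr.extract import extract_data

    # Relies on the defaults described above: default variables,
    # default quality filtering, GeoParquet output.
    gdf, out_path = extract_data(directory=directory, gedi_product="L2B")
    return gdf, out_path
```

Calling `extract_l2b("path/to/data")` would then return the extracted GeoDataFrame together with the path of the written GeoParquet file.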
Optional: Check extraction logs
The extraction process logs errors and warnings. Check the log/ subdirectory in your
input directory for detailed information if issues occur.
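To find the most recent log quickly, the files in the log/ subdirectory can be listed newest first. This is a sketch using only the standard library; the helper name `list_log_files` is hypothetical, and no assumption is made about how gedixr names its log files.

```python
from pathlib import Path


def list_log_files(input_dir):
    """Return the files in <input_dir>/log, newest first."""
    log_dir = Path(input_dir) / "log"
    if not log_dir.is_dir():
        # No extraction has been logged for this directory yet.
        return []
    files = [p for p in log_dir.iterdir() if p.is_file()]
    # Sort by modification time so the latest log comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

The first element of the returned list, if any, is the log of the latest extraction run.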
2. Load and merge extracted data
You can load the extracted GeoParquet files back into Python for further analysis using
the load_to_gdf function. If you extracted both L2A and L2B data, you can merge them
into a single GeoDataFrame while loading.
from gedixr.xr import load_to_gdf

# Load both products and merge them while loading
gdf_merged = load_to_gdf(l2a="extracted/20260106_L2A_1.parquet",
                         l2b="extracted/20260106_L2B_1.parquet")

# or load a single product:
gdf_l2b = load_to_gdf(l2b="extracted/20260106_L2B_1.parquet")
You can also merge L2A and L2B data directly after extraction using the merge_gdf
function:
from gedixr.extract import extract_data
from gedixr.xr import merge_gdf

# Extract both products
gdf_l2a, out_path_l2a = extract_data(directory="path/to/data", gedi_product='L2A')
gdf_l2b, out_path_l2b = extract_data(directory="path/to/data", gedi_product='L2B')

# Merge them (using inner join)
gdf_merged = merge_gdf(l2a=gdf_l2a, l2b=gdf_l2b)
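An inner join keeps only the shots that are present in both products. The following pure-Python sketch illustrates that behaviour on plain dictionaries. It is not gedixr's implementation, and the join key `shot_number` as well as the `rh98` column are assumptions made for illustration.

```python
def inner_join_on_shot(l2a_rows, l2b_rows, key="shot_number"):
    """Combine rows that share the same `key` value in both inputs."""
    # Index the L2B rows by the join key for constant-time lookup.
    l2b_by_key = {row[key]: row for row in l2b_rows}
    merged = []
    for row in l2a_rows:
        match = l2b_by_key.get(row[key])
        if match is not None:
            # Shot exists in both products: combine the columns.
            merged.append({**row, **match})
    return merged


# Hypothetical example values
l2a = [{"shot_number": 1, "rh98": 21.5}, {"shot_number": 2, "rh98": 8.0}]
l2b = [{"shot_number": 2, "pai": 1.3}, {"shot_number": 3, "pai": 0.4}]
joined = inner_join_on_shot(l2a, l2b)
```

Here only shot 2 survives the join, because shots 1 and 3 each appear in just one of the two inputs.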
3. Explore / Analyze data
Now that you have the data loaded as a geopandas.GeoDataFrame, you can start exploring
and analyzing it using geopandas and pandas, or other related libraries. For
example, you could use the xvec package to extract other environmental variables from
xarray Datasets based on the GEDI shot locations and acquisition times and then train
a machine learning model for predicting forest structure.
Overview of extraction options
The main extraction function extract_data (or gedixr extract CLI command) provides
various options to customize the extraction process. Here is a quick overview of these
options, which you can combine as needed.
Quality Filtering
Control whether to apply default quality filters:
See Quality Filtering for detailed information on the default quality filters applied as well as an example of how to implement custom filtering after extraction.
Output File Naming
The output filename indicates whether quality filtering was applied through a
0/1 flag placed after the product type:
- YYYYMMDDHHMMSS_L2B_1.parquet (filtered data)
- YYYYMMDDHHMMSS_L2B_0.parquet (unfiltered data)
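The naming pattern can be read back programmatically, for example to sort a folder of outputs into filtered and unfiltered files. This is a sketch based only on the filename patterns shown in this guide; the helper `parse_output_name` is hypothetical and not part of gedixr, and it accepts any run of digits as the timestamp.

```python
import re

# <timestamp>_<product>_<0|1>[_<vector basename>].parquet
_NAME_PATTERN = re.compile(
    r"(?P<timestamp>\d+)_(?P<product>L2[AB])_(?P<flag>[01])"
    r"(?:_(?P<area>.+))?\.parquet"
)


def parse_output_name(filename):
    """Split an output filename into its components, or return None."""
    match = _NAME_PATTERN.fullmatch(filename)
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "product": match.group("product"),
        "filtered": match.group("flag") == "1",
        # Only present when spatial subsetting was used.
        "area": match.group("area"),
    }
```

The optional `area` component covers the case where a vector file basename is appended to the filename during spatial subsetting.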
Spatial Subsetting
Extract data for specific areas using vector files:
from gedixr.extract import extract_data

# Single area
gdf = extract_data(
    directory="path/to/data",
    subset_vector="study_area.geojson"
)

# Multiple areas (returns a dictionary)
result_dict = extract_data(
    directory="path/to/data",
    subset_vector=["area1.geojson", "area2.geojson"]
)

# Access individual results
area1_gdf = result_dict['area1']['gdf']
area2_gdf = result_dict['area2']['gdf']
The output GeoParquet file(s) will be saved in the extracted/ subdirectory with the
vector file basename included in the filename (e.g.,
YYYYMMDDHHMMSS_L2B_1_study_area.parquet for the single area example above).
When using multiple vector files, the output dictionary contains one entry per vector file, keyed by <Vector Basename>, the name of the vector file without the file extension.
Each entry is itself a dictionary with two keys: geo is the geometry of the area, and gdf is the extracted GeoDataFrame for that area.
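The nested layout can be traversed like any other dictionary. The sketch below uses plain stand-in values instead of real geometries and GeoDataFrames so that it runs on its own; `count_shots_per_area` is a hypothetical helper, and the only assumption about the real result is the `geo`/`gdf` layout described above.

```python
def count_shots_per_area(result_dict):
    """Map each vector basename to the number of extracted rows."""
    return {name: len(entry["gdf"]) for name, entry in result_dict.items()}


# Stand-in for a multi-area result: lists replace the GeoDataFrames,
# and strings replace the geometries.
example_result = {
    "area1": {"geo": "<geometry of area1>", "gdf": [101, 102, 103]},
    "area2": {"geo": "<geometry of area2>", "gdf": [201]},
}
counts = count_shots_per_area(example_result)
```

Because `len()` also works on a GeoDataFrame, the same helper should apply unchanged to a real multi-area result.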
Specific Months
Extract only data from certain months (e.g., June to August):
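As an illustration of what such a month window selects, and not of gedixr's own interface, the following sketch keeps acquisition dates whose month falls in an inclusive range. The helper `month_in_window` is hypothetical, and the handling of windows that wrap across the year end (e.g., November to February) is an assumption.

```python
from datetime import date


def month_in_window(month, start, end):
    """Return True if `month` lies in the inclusive window start..end."""
    if start <= end:
        return start <= month <= end
    # Window wraps around the year end, e.g. start=11, end=2.
    return month >= start or month <= end


# Hypothetical acquisition dates
acquisitions = [date(2021, 5, 30), date(2021, 6, 1), date(2021, 8, 31), date(2021, 9, 1)]
# Keep June to August only
summer = [d for d in acquisitions if month_in_window(d.month, 6, 8)]
```

The same check can be applied after extraction to any column of acquisition dates.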
Specific Beams
Extract data from specific beam types:
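For orientation, GEDI records eight beams per orbit: four full-power beams and four coverage beams. The sketch below maps the beam group names found in GEDI files to their type. The helper `beam_type` and the labels "full" and "coverage" are illustrative and do not necessarily match the values gedixr expects for its beam option.

```python
# Beam group names as they appear in GEDI HDF5 files
FULL_POWER_BEAMS = ("BEAM0101", "BEAM0110", "BEAM1000", "BEAM1011")
COVERAGE_BEAMS = ("BEAM0000", "BEAM0001", "BEAM0010", "BEAM0011")


def beam_type(beam_name):
    """Classify a GEDI beam name as 'full' or 'coverage'."""
    if beam_name in FULL_POWER_BEAMS:
        return "full"
    if beam_name in COVERAGE_BEAMS:
        return "coverage"
    raise ValueError(f"Unknown GEDI beam name: {beam_name!r}")
```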
Custom Variables
You can specify custom variables to extract instead of the default variables. See Variables for more details.