Data loader package

The data loader package is used to open station and model data.

Base module

class pymepps.loader.base.BaseLoader(data_path, file_type=None, processes=1, checking=True)
load_data()
class pymepps.loader.datasets.metdataset.MetDataset(file_handlers, data_origin=None, processes=1)

MetDataset is a base class for handling meteorological files.

The normal workflow is:
  1. load the files (use of file handlers)
  2. select the important variables within the files (this object)
  3. post-process the variables (MetData/SpatialData/TSData object)
Parameters:
  • file_handlers (list of children of FileHandler or None) – The loaded file handlers. These instances are used to load the variables. If the file handlers are None, the dataset is used for conversion between SpatialData and TSData.
  • data_origin (optional) – The class where the data comes from. Normally this would be a model or a measurement site. If this is None, no data origin is set. Default is None.
  • processes (int, optional) – The number of processes used for time-consuming functions. A progress bar is shown for these functions. If the number of processes is one, the functions are processed sequentially. For more than one process, the multiprocessing module is used. Default is 1.
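
The effect of the processes parameter can be sketched as follows. This is a minimal illustration and not the pymepps implementation: run_tasks is a hypothetical helper, and the progress bar mentioned above is left out.

```python
import multiprocessing


def run_tasks(func, items, processes=1):
    """Apply func to every item, sequentially or with a process pool.

    Hypothetical helper that only mirrors the documented behaviour of
    the ``processes`` parameter.
    """
    if processes == 1:
        # One process: plain sequential loop.
        return [func(item) for item in items]
    # More than one process: delegate to the multiprocessing module.
    # func and the items have to be picklable in this case.
    with multiprocessing.Pool(processes) as pool:
        return pool.map(func, items)


# Sequential run, no worker processes are started.
sequential_result = run_tasks(abs, [-3, 2, -1], processes=1)
```

With processes greater than one the same call would distribute the items over worker processes, which only pays off for functions that are expensive compared to the cost of starting the pool.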
data_merge(data, var_name)

Method to merge the given data by given metadata into one data structure.

file_handlers
processes
select(var_name, **kwargs)

Method to select a variable from this dataset. If the variable is found in more than one file or message, the method tries to find similarities within the metadata and to combine the data into one array with several dimensions. This method can have a long running time, due to data loading and combination.

Parameters:
  • var_name (str) – The variable which should be extracted. If the variable is not found within the dataset, a ValueError is raised.
  • kwargs (dict) – Additional parameters that are passed to the file handlers.
Returns:

extracted_data – A child instance of MetData with the data of the selected variable as data. If None is returned, the variable wasn’t found within the list of possible variable names.

Return type:

SpatialData, TSData or None
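
How data from several files or messages might be combined into one structure with several dimensions can be sketched as follows. The message layout (plain dicts with time, level and values keys) and the combine_messages helper are assumptions made for illustration only; the actual merge works on the library's own data structures.

```python
def combine_messages(messages, dims=("time", "level")):
    """Arrange single messages along shared metadata dimensions.

    Hypothetical sketch: every message is a dict that holds one value
    per dimension name and a 2D field under the key "values".
    """
    # Collect the sorted unique values of every dimension.
    axes = {dim: sorted({msg[dim] for msg in messages}) for dim in dims}
    # Index the fields by their metadata tuple.
    lookup = {tuple(msg[dim] for dim in dims): msg["values"] for msg in messages}

    def build(prefix, remaining):
        if not remaining:
            # A missing combination of metadata is filled with None.
            return lookup.get(prefix)
        return [build(prefix + (value,), remaining[1:])
                for value in axes[remaining[0]]]

    return axes, build((), tuple(dims))


messages = [
    {"time": 6, "level": 850, "values": [[3.0]]},
    {"time": 0, "level": 850, "values": [[1.0]]},
    {"time": 0, "level": 500, "values": [[2.0]]},
    {"time": 6, "level": 500, "values": [[4.0]]},
]
axes, combined = combine_messages(messages)
```

Here combined[i][j] is the field for the i-th time and the j-th level of axes, independent of the order in which the messages were read.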

select_by_pattern(pattern, return_list=True, **kwargs)

Method to select variables from this dataset by keyword. A list comprehension extracts every variable name that contains the given pattern. Each matching variable is then selected with the select method.

Parameters:
  • pattern (str) – The pattern to search for.
  • return_list (bool) – Whether the return value should be a list (True) or a dictionary (False).
  • kwargs (dict) – Additional parameters that are passed to the file handlers.
Returns:

data_list – A list or dict with SpatialData or TSData instances, one entry for every found variable name. If return_list is False, the keys are the variable names. If None is returned, no variable with this pattern was found.

Return type:

list(SpatialData or TSData), dict(str, SpatialData or TSData) or None
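
The pattern matching and the two return forms can be sketched as follows. This is only an illustration under assumptions: select_matching and its select argument are hypothetical stand-ins, and plain substring matching is assumed for "the pattern is within the variable name".

```python
def select_matching(var_names, pattern, select, return_list=True):
    """Select every variable whose name contains the pattern.

    Hypothetical sketch: ``select`` is any callable that returns the
    data for one variable name.
    """
    # List comprehension over the available variable names.
    found = [name for name in var_names if pattern in name]
    if not found:
        # No variable name contains the pattern.
        return None
    data = {name: select(name) for name in found}
    if return_list:
        return list(data.values())
    return data


var_names = ["temperature_2m", "temperature_850", "wind_10m"]
# str.upper stands in for the real selection of the data.
as_list = select_matching(var_names, "temperature", str.upper)
as_dict = select_matching(var_names, "temperature", str.upper, return_list=False)
nothing = select_matching(var_names, "pressure", str.upper)
```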

select_ds(include=None, exclude=None, **kwargs)

Extract the dataset data into a MetData instance. The include list takes precedence over the exclude list. If both lists are None, all available variables are used.

Parameters:
  • include (iterable or None) – The variable names that should be included in the MetData instance. The iterable is filtered for available variable names. If none of the names is available, a ValueError is raised. If this is None, the include step is skipped and the exclude iterable is used. Default is None.
  • exclude (iterable or None) – If no include iterable is given, this exclude iterable is used. In this case, every available variable name that is not within this iterable is used. If this iterable is also None, all available data variables are used to construct the MetData instance. Default is None.
  • kwargs (dict) – Additional parameters that are passed to the file handlers.
Returns:

extracted_data – The extracted data instance.

Return type:

TSData or SpatialData

Raises:

ValueError – A ValueError is raised if no variable was selected from the dataset.
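
The precedence of include over exclude can be sketched as follows. filter_var_names is a hypothetical helper that only reproduces the rules described above; it is not taken from the pymepps source.

```python
def filter_var_names(available, include=None, exclude=None):
    """Return the variable names that would be extracted.

    Hypothetical sketch of the documented include/exclude rules.
    """
    if include is not None:
        # include wins: keep only the requested names that exist.
        selected = [name for name in include if name in available]
    elif exclude is not None:
        # No include given: drop the excluded names.
        selected = [name for name in available if name not in exclude]
    else:
        # Neither given: use every available variable.
        selected = list(available)
    if not selected:
        raise ValueError("No variable was selected from the dataset.")
    return selected


available = ["t2m", "u10", "v10"]
only_included = filter_var_names(available, include=["t2m", "missing"], exclude=["t2m"])
without_excluded = filter_var_names(available, exclude=["t2m"])
everything = filter_var_names(available)
```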

var_names

Get the available variable names.

variables

Return the variable names and the corresponding file handlers.

Open model files

class pymepps.loader.model.ModelLoader(data_path, file_type=None, grid=None, processes=1, checking=True)

Bases: pymepps.loader.base.BaseLoader

A simplified way to load weather model data into a SpatialDataset. Technically this class is a helper and wrapper around the file handlers and SpatialDataset.

Parameters:
  • data_path (str) – The path to the files. This path may contain a glob-conform path pattern. Every file found with this pattern is used to determine the file type and to generate the SpatialDataset.
  • file_type (str or None, optional) –

    The file type determines which file handler will be used to load the data. If the file type is None it will be determined automatically based on given files. All the files with the majority file type will be used to generate the SpatialDataset. The available file_types are:

    • nc: NetCDF files
    • grib2: Grib2 files
    • grib1: Grib1 files
    • dap: Opendap urls
  • grid (str or Grid or None, optional) – The grid describes the horizontal grid of the spatial data. The given grid will be forwarded to the given SpatialDataset instance. Default is None.
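
The idea of using only the files with the majority file type can be sketched as follows. Detecting the type from the file suffix is an assumption made to keep the example self-contained; SUFFIX_TO_TYPE and majority_file_type are hypothetical names, and how the loader really detects a file type is not described here.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical mapping from file suffix to file_type key.
SUFFIX_TO_TYPE = {".nc": "nc", ".grib2": "grib2", ".grib1": "grib1"}


def majority_file_type(paths):
    """Return the most frequent file type and the files of that type."""
    typed = [(path, SUFFIX_TO_TYPE.get(PurePath(path).suffix)) for path in paths]
    # Count only the files whose type could be determined.
    counts = Counter(ftype for _, ftype in typed if ftype is not None)
    if not counts:
        return None, []
    file_type, _ = counts.most_common(1)[0]
    return file_type, [path for path, ftype in typed if ftype == file_type]


# In practice the paths could come from glob.glob(data_path).
paths = ["run_00.nc", "run_06.nc", "run_12.grib2", "notes.txt"]
file_type, used_files = majority_file_type(paths)
```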
pymepps.loader.model.open_model_dataset(data_path, file_type=None, grid=None, processes=1, checking=True)
class pymepps.loader.datasets.spatialdataset.SpatialDataset(file_handlers, grid=None, data_origin=None, processes=1)

Bases: pymepps.loader.datasets.metdataset.MetDataset

SpatialDataset is a class for a pool of file handlers. Typically a spatial dataset combines the files of one model run, such that it is possible to select a variable and get a SpatialData instance. For memory reasons the data of a variable is only loaded if it is selected.

Parameters:
  • file_handlers (list of children of FileHandler or None) – The spatial dataset is based on these files. The files should be either instances of GribHandler or NetCDFHandler. If file_handlers is None, the dataset is used for conversion from TSData to SpatialData.
  • grid (str or Grid or None) – The grid describes the horizontal grid of the spatial data. The grid will be appended to every created SpatialData instance. If a str is given, it is checked whether the str is a path to a cdo-conform grid file or a cdo-conform grid string. If this is an instance of a child of Grid, it is assumed that the grid is already initialized and this grid will be used. If this is None, the grid is read automatically from the first file handler. Default is None.
  • data_origin (optional) – The data origin. This parameter is important to trace the data flow. If this is None, there is no data origin and this dataset will be the starting point of the data flow. Default is None.
  • processes (int, optional) – The number of processes used for time-consuming functions. A progress bar is shown for these functions. If the number of processes is one, the functions are processed sequentially. For more than one process, the multiprocessing module is used. Default is 1.
select()

Method to select a variable.

selnearest()

Method to select the nearest grid point for given coordinates.

sellonlatbox()

Method to slice a box with the given coordinates.

data_merge(data, var_name)

Method to merge instances of xarray.DataArray into a single xarray.DataArray. The grid is also read and set on the merged xarray.DataArray.

Parameters:
  • data (list of xarray.DataArray) – The data list.
  • var_name (str) – The name of the variable which is selected within the data list.
Returns:

merged_array – The merged DataArray with the grid coordinates and the extracted grid. If the grid could not be extracted, the grid is None and a DataArray without a set grid is returned.

Return type:

xarray.DataArray

get_grid(var_name, data_array=None)

Method to get a Grid instance for the given variable name. If the grid attribute is already a Grid instance, this grid is returned. If the grid attribute is a str, the grid is read from the file the str points to or from the grid string itself. If the grid attribute isn’t set, the grid is read with cdo from the first file handler that corresponds to the selected variable.

Parameters:
  • var_name (str) – The variable name, which should be used to generate the grid.
  • data_array (xarray.DataArray or None, optional) – If the data array is given the method will try to load the grid from the data array’s attributes. If None the DataArray method will be skipped. Default is None.
Returns:

grid – The returned grid. If the returned grid is None, the grid could not be read.

Return type:

Instance of a child of Grid or None
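
The order in which the grid attribute is interpreted can be sketched as follows. This is an illustration under assumptions: the Grid class below is a bare stand-in, resolve_grid and read_from_handler are hypothetical names, and the optional lookup in the data array's attributes is left out because its position in the order is not stated.

```python
import os


class Grid:
    """Bare stand-in for a grid class, used only in this sketch."""

    def __init__(self, description):
        self.description = description


def resolve_grid(grid_attr, read_from_handler):
    """Return a Grid for the three documented forms of the grid attribute.

    ``read_from_handler`` is a hypothetical callable that reads the grid
    from the first matching file handler.
    """
    if isinstance(grid_attr, Grid):
        # Already initialized: use it as it is.
        return grid_attr
    if isinstance(grid_attr, str):
        if os.path.isfile(grid_attr):
            # The str is a path to a grid file.
            with open(grid_attr) as grid_file:
                return Grid(grid_file.read())
        # Otherwise the str itself is treated as the grid description.
        return Grid(grid_attr)
    # Grid attribute not set: fall back to the file handler.
    return read_from_handler()


ready = Grid("already initialized")
from_instance = resolve_grid(ready, lambda: None)
from_string = resolve_grid("gridtype = lonlat", lambda: None)
from_handler = resolve_grid(None, lambda: Grid("read from file handler"))
```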

Open station files

class pymepps.loader.station.StationLoader(data_path, file_type=None, lonlat=None, processes=1, checking=True)

Bases: pymepps.loader.base.BaseLoader

A simplified way to load station data into a TSDataset. Technically this class is a helper and wrapper around the file handlers and TSDataset.

Parameters:
  • data_path (str) – The path to the files. This path may contain a glob-conform path pattern. Every file found with this pattern is used to determine the file type and to generate the TSDataset.
  • file_type (str or None, optional) –

    The file type determines which file handler will be used to load the data. If the file type is None it will be determined automatically based on given files. All the files with the majority file type will be used to generate the TSDataset. The available file_types are:

    • nc: NetCDF files
    • wm: Text files in a specific “Wettermast format”
  • lonlat (tuple(float, float), optional) – The lonlat coordinate tuple describes the position of the station in degrees. If this is None the position is unknown. Default is None.
lon_lat()
pymepps.loader.station.open_station_dataset(data_path, file_type=None, lonlat=None, processes=1, checking=True)
class pymepps.loader.datasets.tsdataset.TSDataset(file_handlers, data_origin=None, lonlat=None, processes=1)

Bases: pymepps.loader.datasets.metdataset.MetDataset

TSDataset is a class for a pool of file handlers. Typically a time series dataset combines the files of a station, such that it is possible to select a variable and get a TSData instance. For memory reasons the data of a variable is only loaded if it is selected.

Parameters:
  • file_handlers (list of children of FileHandler or None) – The time series dataset is based on these files. The files should be either instances of NetCDFHandler or TextHandler. If file_handlers is None, the dataset is used for conversion from SpatialData to TSData.
  • data_origin (optional) – The data origin. This parameter is important to trace the data flow. If this is None, there is no data origin and this dataset will be the starting point of the data flow. Default is None.
  • lonlat (tuple(float, float) or None) – The coordinates (longitude, latitude) where the data is valid. If this is None the coordinates will be set based on data_origin or based on the first file handler.
select()

Method to select a variable.

data_merge(data, var_name)
select_by_pattern(pattern, return_list=False, **kwargs)

File handlers

The file handlers are used to open files with a specific format.

Base file handler

class pymepps.loader.filehandler.filehandler.FileHandler(file_path)
get_messages(var_name, **kwargs)
get_timeseries(var_name, **kwargs)
var_names

Grib file handler

class pymepps.loader.filehandler.gribhandler.GribHandler(file_path)

Bases: pymepps.loader.filehandler.filehandler.FileHandler

close()
get_messages(var_name, **kwargs)

Method to get the data for a given variable message-wise as xr.DataArray.

Parameters:
  • var_name (str) – The name of the variable which should be extracted.
Returns:

data – The list with the message-wise data as DataArray. Each DataArray has six coordinates (analysis, ensemble, time, level, y, x). The shape of a DataArray is normally (1,1,1,1,y_size,x_size).

Return type:

list of xr.DataArray

is_type()
open()

NetCDF file handler

class pymepps.loader.filehandler.netcdfhandler.NetCDFHandler(file_path)

Bases: pymepps.loader.filehandler.filehandler.FileHandler

close()
get_messages(var_name, **kwargs)

Method to imitate the message-like behaviour of grib files.

Parameters:
  • var_name (str) – The variable name, which should be extracted.
  • runtime (np.datetime64, optional) – If the dataset has no runtime, this runtime is used. If the runtime is not set, the runtime will be inferred from the file name.
  • ensemble (int or str, optional) – If the dataset has no ensemble information, this ensemble is used. If the ensemble is not set, the ensemble will be inferred from the file name.
  • sliced_coords (tuple(slice), optional) – Slices that are applied to the cube before it is loaded. This is helpful for large opendap requests. The slices are applied from the back, so (slice(1,2,1), slice(3,5,1)) means […, 1:2, 3:5]. If it is not set, all data is used.
Returns:

data – The list with the message-wise data as DataArray. Each DataArray has six coordinates (analysis, ensemble, time, level, y, x). The shape of a DataArray is normally (1,1,1,1,y_size,x_size).

Return type:

list of xr.DataArray
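
What applying sliced_coords "from the back" means can be sketched with nested lists. apply_trailing_slices is a hypothetical helper written for this illustration; it is not part of the handler and only mimics an index of the form [..., 1:2, 3:5].

```python
def apply_trailing_slices(data, sliced_coords):
    """Apply sliced_coords to the last dimensions of a nested list."""
    # Count the nesting depth, i.e. the number of dimensions.
    depth, probe = 0, data
    while isinstance(probe, list):
        depth += 1
        probe = probe[0]
    # Leading dimensions are kept completely, like "..." in an index.
    n_leading = depth - len(sliced_coords)
    slices = (slice(None),) * n_leading + tuple(sliced_coords)

    def _apply(block, remaining):
        if not remaining:
            return block
        return [_apply(item, remaining[1:]) for item in block[remaining[0]]]

    return _apply(data, slices)


# Cube of shape (2, 4, 6); the value encodes the position as 100*i + 10*j + k.
cube = [[[100 * i + 10 * j + k for k in range(6)] for j in range(4)]
        for i in range(2)]
sliced = apply_trailing_slices(cube, (slice(1, 2, 1), slice(3, 5, 1)))
```

The first dimension is untouched, while the last two are reduced to the index ranges 1:2 and 3:5. With a NumPy array the same selection could be written as array[(Ellipsis, *sliced_coords)].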

get_timeseries(var_name, **kwargs)

Method to get the time series from a NetCDF file. This is designed for measurement site data in NetCDF format. At the moment this method is only tested for Wettermast Hamburg data!

Parameters:
  • var_name (str) – The variable name, which should be extracted.
Returns:

data – The selected variable is extracted as dict with pandas series as values.

Return type:

dict with pandas series

is_type()
load_cube(var_name)

Method to load a variable from the netcdf file and return it as xr.DataArray.

Parameters:
  • var_name (str) – The variable name, which should be extracted.
Returns:

variable – The DataArray of the variable.

Return type:

xr.DataArray

lon_lat
open()
pymepps.loader.filehandler.netcdfhandler.cube_to_series(cube, var_name)

Opendap file handler

class pymepps.loader.filehandler.opendaphandler.OpendapHandler(file_path)

Bases: pymepps.loader.filehandler.netcdfhandler.NetCDFHandler

is_type()
open()