Python meteorological post-processing system

The PYthon Meteorological Post-Processing System could be used to post-process meteorological data. The main purpose of this project is the post-processing of numerical weather forecast data. This package has some modules to load and process meteorological spatial and time series data.

Note

The data structure was changed from composition to accessor structure with version 0.4.0.

Installation

This package is a python package. Further this package is based on different python tools. So there are some dependencies which should be matched. To install this package you should read the installation section.

Requirements

This package is programmed with python 3.5. It is not planned to support python 2.7. In the future this package will be back checked with travis ci for version 3.4 – 3.6. But at the moment we couldn’t guarantee the compatibility of other versions than 3.5.

The following packages are only recommended to use all features. * cdo * cdo bindings

In the future some requirements will be added, e.g. scikit-learn.

Installation

At the moment this package is not available on pypi and conda. So you have to clone this package and install it via pip.

It is recommended to install the requirements via a conda virtual environment, but it is also possible to install them via pip.

Installation via pip

git clone git@github.com:maestrotf/pymepps.git
cd pymepps
pip install -r requirements.txt
pip install .

History

Let’s talk a little about the history and the formation process of this package.

There are so many python packages why a new one?

Python is a rapidly developing programming language. In the last few years has got more fans, especially in the geoscientific community. There are so many different packages for different purposes, but no package matched my requirements.

What are the requirements?

In my bachelor thesis I used a simple method for post-processing of numerical weather model data. This method was based on a linear regression and is called model output statistics in the meteorological community. The work for this thesis needed a system for offline statistical processing of data. Later I developed an operational weather forecast system based on the same methods. The requirements of an online and operational weather forecast system are very different to an offline system. So the requirements for this package are offline and online processing of weather model data.

The biggest part of the online processing is outsourced to a companion project called pymepps-streaming. But this package will be the base for offline and online processing of numerical weather model data.

Data structure

Pymepps is a system to read and process meteorological data. So we defined some base types and tools for an easier read and process workflow.

File handlers

File handlers are used to read in the data. A file handler is working on file basis, such that one file handler could only process one file. There different file handlers for different file types. Some are only to read in spatial data and some could only read in time series data. But all file handlers have three common methods:

  • Method to load the data into the memory
  • Method to get the variable names within the file
  • Method to extract a variable and prepare the variable for the dataset.

The method to extract a variable uses a message based interface so that similar files could be merged within a dataset.

NetCDF handler

The NetCDF handler could be used to read in netcdf files. The NetCDF handler is based on the xarray and the netcdf4 package, so it is also possible to load opendap data streams with this handler. The NetCDF handler could be used to read in spatial and time series files. At the moment the load of time series data with this handler is only tested for measurement data from the “Universtät Hamburg”.

Grib handler

The grib handler could be used to read in grib1 and grib2 files. The grib handler is based on the pygrib package. The grib handler could be only used to read in spatial data, due to the requirements of a grib file.

At the moment there are only these two differnt file handlers, but it is planned to implement some other file handlers to read in hdf4/5 and csv based data.

Dataset

Datasets are used to combine file handlers and to manage the variable selection. A dataset is working at multiple file level. The messages of the file handlers are bundled to spatial or time series data. So the two different dataset types the spatial and the times series dataset have a merge method in common.

Spatial dataset

A spatial dataset is used to combine the file handlers, which are capable to read in spatial data. The spatial dataset interacts on the same level as the climate data operators (cdo). So it is possible to process the data of a spatial dataset with some of the cdos. A method for the general support of the cdos is planned. The spatial dataset also creates the grid for the spatial data. The grid could be either predefined or is read in with the griddes function from the cdo.

Time series dataset

A time series dataset is used to combine the times series file handlers. A time series dataset is valid for a given coordinates, so it is possible to defined a coordinate tuple. If no coordinate tuple is set the time series dataset tries to infer the coordinates from the data origin.

Data

The main data types are based on xarray and pandas. Within this package these data structures are extended by accessors. The accessor could be accessed with a property named pp. With the use of accessors the namespace of the base data structures remains clean.

Spatial data

Spatial data is loaded as xarray.DataArray. The SpatialAccessor extends the DataArray structure by grid capabilities. With the grid it is possible to remap and slice the data based on coordinates and other grids. Every time the xarray.DataArray is modified with xarray internal methods a new grid has to be set. For more information, please see the example `example_set_grid`_.

Time series data

Time series data is loaded as pandas.Series or pandas.DataFrame. The PandasAccessor extends the pandas structure by some station specific capabilities. It is planned to expand the accessor in the future.

Grid

There are several different grids defined for processing of spatial data. The grids are inspired by the grid description of the cdo. So it is possible to read the grid from meteorological files with the cdos. The grids are defined to process xarray.DataArray and numpy.ndarray instances.

LatLonGrid

The LatLonGrid is defined for equal spaced latitude and longitude grids. The grid has two coordinate values for one grid point and could be described by a start, end and step value for both coordinates. If the grid is sliced it will become a new LatLonGrid.

GaussianGrid

The GaussianGrid is almost like a LatLonGrid the single difference is that one of two coordinates are unequal spaced. So for this coordinate the coordinate values have to be given. If the grid is sliced it will become a new GaussianGrid.

ProjectionGrid

The ProjectionGrid could be described by two evenly-distributed coordinates. The coordinates could be translated with a given projection to longitude and latitude coordinates. An example for a ProjectionGrid is a grid with rotated poles. If the grid is sliced it will become an UnstructuredGrid, due to the nature of the grid type.

CurvilinearGrid

The CurvilinearGrid could be described as a grid where four vertices are given for every grid point. The CurvilinearGrid has two coordinates in their own system and at the moment the latitude and longitude values have to be predefined for every grid point. If the grid is sliced it will become an UnstructuredGrid, due to the nature of the grid type.

UnstructuredGrid

The UnstructuredGrid could be described as a grid with n-vertices for every grid point. So internally the UnstructuredGrid has only the number of cells as coordinate. For cell center a latitude and longitude value have to be defined. An example for an UnstructuredGrid is the icosahedral grid of the Icosahedral Nonhydrostatic (ICON) model. If the grid is sliced it will be a new UnstructuredGrid.

GridBuilder

The grid builder is used to encode cdo-conform grid information into a grid.

Examples for pymepps

The examples shows how to use the pymepps module for a simplified post-processing of meteorological data. There are several examples explaining how the accessor and the load functions are working.

Load station data based on NetCDF files

In this example we show how to load station data based on NetCDF files. The data is loaded with the pymepps package. Thanks to Ingo Lange we could use original data from the Wettermast for this example. In the following the data is loaded, plotted and saved as json file.

import pymepps
import matplotlib.pyplot as plt

We could use the global pymepps open_station_dataset function to open the Wettermast data. We have to specify the data path and the data type.

wm_ds = pymepps.open_station_dataset('../data/station/wettermast.nc', 'nc')

print(wm_ds)

Out:

TSDataset
---------
File handlers: 1
Variables: ['TT002_M10', 'lat', 'lon', 'product', 'station_details', 'time_bnds', 'zsl']
Lonlat: None

Now we could extract the temperature in 2 m height. For this we use the select method of the resulted dataset.

t2m = wm_ds.select('TT002_M10')

print(type(t2m))
print(t2m.describe())

Out:

<class 'pandas.core.series.Series'>
count    720.000000
mean      -3.317834
std        4.251115
min       -8.500000
25%       -6.690000
50%       -5.595000
75%        0.802500
max        5.040000
Name: TT002_M10, dtype: float64

We could see that the resulting temperature is a normal pandas.Series. So it is possible to use all pandas methods, e.g. plotting of the Series.

t2m.plot()
plt.xlabel('Date')
plt.ylabel('Temperature in °C')
plt.title('Temperature at the Wettermast Hamburg')
plt.show()
_images/sphx_glr_example_plot_stationnc_001.png

Pymepps uses an accessor to extend the pandas functionality. The accessor could be accessed with Series.pp. At the moment there is only a lonlat attribute, update, save and load method defined, but it is planned to expand the number of additional methods.

print(t2m.pp.lonlat)

Out:

None

We could see that the logitude and latitude are None at the moment, because we haven’t set the yet. We could either set them directly or set the coordintes in the open_station_dataset function with the lonlat argument.

Total running time of the script: ( 0 minutes 0.233 seconds)

Generated by Sphinx-Gallery

How to use Xarray accessor

This example shows how to use the SpatialData accessor to extend the capabilities of xarray.

To extend xarray.DataArray you need only to load also pymepps with “import pymepps”. The extensions could be used with the property xarray.DataArray.pp

import matplotlib.pyplot as plt
import xarray as xr
import pymepps

To use the full power of pymepps, you have to set a grid. If you load the data with the xarray functions you have to set the grid afterwards. So the next step is to load a NetCDF model file with xarray. There are also pymepps functions to load model data. These are shown in another example.

ds = xr.open_dataset('../data/model/GFS_Global_0p25deg_20161219_0600.nc')
t2m_max = ds['Maximum_temperature_height_above_ground_Mixed_intervals_Maximum']
print(t2m_max)

Out:

<xarray.DataArray 'Maximum_temperature_height_above_ground_Mixed_intervals_Maximum' (time: 1, height_above_ground: 1, lat: 721, lon: 1440)>
[1038240 values with dtype=float32]
Coordinates:
  * time                 (time) datetime64[ns] 2016-12-19T09:00:00
  * height_above_ground  (height_above_ground) float32 2.0
  * lat                  (lat) float32 90.0 89.75 89.5 89.25 89.0 88.75 88.5 ...
  * lon                  (lon) float32 0.0 0.25 0.5 0.75 1.0 1.25 1.5 1.75 ...
Attributes:
    long_name:                       Maximum temperature (Mixed_intervals Max...
    units:                           K
    abbreviation:                    TMAX
    grid_mapping:                    LatLon_Projection
    Grib_Statistical_Interval_Type:  Maximum
    Grib_Variable_Id:                VAR_0-0-4_L103_Imixed_S2
    Grib2_Parameter:                 [0 0 4]
    Grib2_Parameter_Discipline:      Meteorological products
    Grib2_Parameter_Category:        Temperature
    Grib2_Parameter_Name:            Maximum temperature
    Grib2_Level_Type:                Specified height level above ground
    Grib2_Generating_Process_Type:   Forecast

The grid definition is inspired by the climate data operators. So you could either generate your own grid (done in this example), or you could load a cdo-conform grid file.

We could see that the grid is a structured latitude and longitude grid with a resolution of 0.25 degree.

grid_dict = dict(
    gridtype='lonlat',
    xsize=t2m_max['lon'].size,
    ysize=t2m_max['lat'].size,
    xfirst=t2m_max['lon'].values[0],
    xinc=0.25,
    yfirst=t2m_max['lat'].values[0],
    yinc=-0.25,
)

We created our grid dict with the information. Now we have to build the grid. In pymepps you could use the GridBuilder to build the grid with given grid_dict.

builder = pymepps.GridBuilder(grid_dict)
grid = builder.build_grid()
print(grid)

Out:

LonLatGrid
----------
gridtype = lonlat
xlongname = longitude
xname = lon
xunits = degrees
ylongname = latitude
yname = lat
yunits = degrees
xsize = 1440
ysize = 721
xfirst = 0.0
xinc = 0.25
yfirst = 90.0
yinc = -0.25

The next step is to set the grid for our dataset. For this we could use the set_grid method of the SpatialAccessor.

t2m_max = t2m_max.pp.set_grid(grid)
print(t2m_max.pp.grid)

Out:

LonLatGrid
----------
gridtype = lonlat
xlongname = longitude
xname = lon
xunits = degrees
ylongname = latitude
yname = lat
yunits = degrees
xsize = 1440
ysize = 721
xfirst = 0.0
xinc = 0.25
yfirst = 90.0
yinc = -0.25

Now we set the grid. It is also possible to normalize the coordinates to allow a consistent processing of the model data.

# Before normalization
print('Before:\n{0:s}\n'.format(str(t2m_max)))

t2m_max = t2m_max.pp.normalize_coords()
# After normalization
print('After:\n{0:s}'.format(str(t2m_max)))

Out:

Before:
<xarray.DataArray 'Maximum_temperature_height_above_ground_Mixed_intervals_Maximum' (time: 1, height_above_ground: 1, lat: 721, lon: 1440)>
[1038240 values with dtype=float32]
Coordinates:
  * time                 (time) datetime64[ns] 2016-12-19T09:00:00
  * height_above_ground  (height_above_ground) float32 2.0
  * lat                  (lat) float64 90.0 89.75 89.5 89.25 89.0 88.75 88.5 ...
  * lon                  (lon) float64 0.0 0.25 0.5 0.75 1.0 1.25 1.5 1.75 ...
Attributes:
    long_name:                       Maximum temperature (Mixed_intervals Max...
    units:                           K
    abbreviation:                    TMAX
    grid_mapping:                    LatLon_Projection
    Grib_Statistical_Interval_Type:  Maximum
    Grib_Variable_Id:                VAR_0-0-4_L103_Imixed_S2
    Grib2_Parameter:                 [0 0 4]
    Grib2_Parameter_Discipline:      Meteorological products
    Grib2_Parameter_Category:        Temperature
    Grib2_Parameter_Name:            Maximum temperature
    Grib2_Level_Type:                Specified height level above ground
    Grib2_Generating_Process_Type:   Forecast

After:
<xarray.DataArray 'Maximum_temperature_height_above_ground_Mixed_intervals_Maximum' (runtime: 1, ensemble: 1, validtime: 1, height: 1, lat: 721, lon: 1440)>
array([[[[[[ 260.700012, ...,  260.700012],
           ...,
           [ 248.800003, ...,  248.800003]]]]]], dtype=float32)
Coordinates:
  * validtime  (validtime) datetime64[ns] 2016-12-19T09:00:00
  * height     (height) float32 2.0
  * lat        (lat) float64 90.0 89.75 89.5 89.25 89.0 88.75 88.5 88.25 ...
  * lon        (lon) float64 0.0 0.25 0.5 0.75 1.0 1.25 1.5 1.75 2.0 2.25 ...
  * ensemble   (ensemble) <U3 'det'
  * runtime    (runtime) object None
Attributes:
    long_name:                       Maximum temperature (Mixed_intervals Max...
    units:                           K
    abbreviation:                    TMAX
    grid_mapping:                    LatLon_Projection
    Grib_Statistical_Interval_Type:  Maximum
    Grib_Variable_Id:                VAR_0-0-4_L103_Imixed_S2
    Grib2_Parameter:                 [0 0 4]
    Grib2_Parameter_Discipline:      Meteorological products
    Grib2_Parameter_Category:        Temperature
    Grib2_Parameter_Name:            Maximum temperature
    Grib2_Level_Type:                Specified height level above ground
    Grib2_Generating_Process_Type:   Forecast

We could see that the height_above_ground and the time variable are renamed to a more common name. The ensemble member is set to the default value ‘det’, while the runtime is set to the missing value None. Now lets plot the data with the xarray internal plot method.

t2m_max.plot()
plt.show()
_images/sphx_glr_example_plot_xr_accessor_001.png

Lets make use of the SpatialAccessor to slice an area over germany. We would also transform the temperature unit to degree celsius. For this we could use the normal xarray.DataArray mathematical operations. After the transformation lets plot the temperature.

# sphinx_gallery_thumbnail_number = 2
ger_t2m_max = t2m_max.pp.sellonlatbox([5, 55, 15, 45])
# K to deg C
ger_t2m_max -= 273.15
ger_t2m_max.plot()
plt.show()
_images/sphx_glr_example_plot_xr_accessor_002.png

If we use a xarray.DataArray method where the DataArray instance is copied, we have to set a new grid. This behaviour coud seen in the following code block.

stacked_array = t2m_max.stack(stacked=('runtime', 'validtime'))
# we have to catch the error for sphinx documentation
try:
    print(stacked_array.pp.grid)
except TypeError:
    print('This DataArray has no grid defined!')

Out:

This DataArray has no grid defined!

This seen behavior arises from the fact that the grid is depending on the grid coordinates of the DataArray and they could be changed with a xarray.DataArray method.

Total running time of the script: ( 0 minutes 0.387 seconds)

Generated by Sphinx-Gallery

Load a thredds dataset

In the following example we will load a thredds dataset from the norwegian met.no thredds server.

import numpy as np
import matplotlib.pyplot as plt

import pymepps

The first step is to load the dataset. This will be performed with pymepps.open_model_dataset. The NetCDF4 backend is also supporting opendap paths. So we could specify nc as data type.

metno_path = 'http://thredds.met.no/thredds/dodsC/meps25files/' \
    'meps_det_pp_2_5km_latest.nc'
metno_ds = pymepps.open_model_dataset(metno_path, 'nc')

The resulting dataset is a SpatialDataset. The dataset has several methods to load a xr.DataArray from the path. It also possible to print the content of the dataset. The content contains the dataset type, the number of file handlers within the dataset and all available data variables.

print(metno_ds)

Out:

SpatialDataset
--------------
File handlers: 1
Variables: ['air_pressure_at_sea_level', 'air_temperature_2m', 'altitude', 'cloud_area_fraction', 'fog_area_fraction', 'forecast_reference_time', 'helicopter_triggered_index', 'high_type_cloud_area_fraction', 'land_area_fraction', 'low_type_cloud_area_fraction', 'medium_type_cloud_area_fraction', 'precipitation_amount', 'precipitation_amount_acc', 'precipitation_amount_high_estimate', 'precipitation_amount_low_estimate', 'precipitation_amount_middle_estimate', 'precipitation_amount_prob_low', 'projection_lambert', 'relative_humidity_2m', 'surface_air_pressure', 'thunderstorm_index_combined', 'wind_speed_maxarea_10m', 'wind_speed_of_gust', 'x_wind_10m', 'y_wind_10m']

The next step is to select/extract a variable from the Dataset. We will select the air temperature in 2 metre height and print the content of the resulting data

metno_t2m = metno_ds.select('air_temperature_2m')
print(metno_t2m)
metno_t2m.isel(validtime=0).plot()
plt.show()
_images/sphx_glr_example_plot_thredds_001.png

Out:

<xarray.DataArray 'air_temperature_2m' (runtime: 1, ensemble: 1, validtime: 67, height: 1, y: 929, x: 719)>
array([[[[[[ 291.478027, ...,  291.090332],
           ...,
           [ 278.397949, ...,  282.883789]]],


         ...,
         [[[ 291.09082 , ...,  290.721008],
           ...,
           [ 278.949707, ...,  279.577637]]]]]])
Coordinates:
  * validtime  (validtime) datetime64[ns] 2017-07-19T06:00:00 ...
  * height     (height) float32 2.0
  * x          (x) float64 -8.974e+05 -8.949e+05 -8.924e+05 -8.899e+05 ...
  * y          (y) float64 -1.104e+06 -1.102e+06 -1.099e+06 -1.097e+06 ...
    longitude  (y, x) float64 1.918 1.954 1.989 2.025 2.06 2.096 2.131 2.167 ...
    latitude   (y, x) float64 52.3 52.31 52.31 52.32 52.32 52.32 52.33 52.33 ...
  * ensemble   (ensemble) int64 0
  * runtime    (runtime) object None
Attributes:
    long_name:                       Screen level temperature (T2M)
    standard_name:                   air_temperature
    units:                           K
    _ChunkSizes:                     [  1   1 929 719]
    Conventions:                     CF-1.6
    institution:                     Norwegian Meteorological Institute, MET ...
    creator_url:                     met.no
    summary:                         MEPS (MetCoOp-Ensemble Prediction System...
    title:                           MEPS 2.5km
    geospatial_lat_min:              51.0
    geospatial_lat_max:              88.0
    geospatial_lon_min:              -20.0
    geospatial_lon_max:              80.0
    references:                      unknown
    license:                         https://www.met.no/en/free-meteorologica...
    comment:                         none
    history:                         2017-07-19 creation by fimex
    min_time:                        2017-07-19 06:00 UTC
    max_time:                        2017-07-22 00:00 UTC
    source:                          meps
    DODS_EXTRA.Unlimited_Dimension:  time
    name:                            air_temperature_2m

We could see that the resulting data is a normal xarray.DataArray and all of the DataArray methods could be used. The coordinates of the DataArray are normalized. The DataArray is expanded with an accessor. Also the coordinates are normalized. We could access the accessor with metno_t2m.pp. The main methods of the accessor are allowing a grid handling. So our next step is to explore the grid of the DataArray.

print(metno_t2m.pp.grid)

Out:

ProjectionGrid
--------------
gridtype = projection
xlongname = x-coordinate in Cartesian system
xname = x
xunits = m
ylongname = y-coordinate in Cartesian system
yname = y
yunits = m
proj4 = +proj=lcc +lat_0=63 +lon_0=15 +lat_1=63 +lat_2=63 +no_defs +R=6.371e+06
gridsize = 667951.0
xsize = 719.0
ysize = 929.0
xdimname = x
ydimname = y
xfirst = -897442.2
xinc = 2500.0
yfirst = -1104322.0
yinc = 2500.0
grid_mapping = projection_lambert
grid_mapping_name = lambert_conformal_conic
standard_parallel = [63.0, 63.0]
longitude_of_central_meridian = 15.0
latitude_of_projection_origin = 63.0
earth_radius = 6371000.0

We could see that the grid is a grid with a defined projection. In our next step we will slice out an area around Hamburg. We will see that a new DataArray with a new grid is created.

hh_bounds = [9, 54, 11, 53]
t2m_hh = metno_t2m.pp.sellonlatbox(hh_bounds)
print(t2m_hh.pp.grid)
print(t2m_hh)

Out:

UnstructuredGrid
----------------
gridtype = unstructured
xlongname = longitude
xname = lon
xunits = degrees
ylongname = latitude
yname = lat
yunits = degrees
gridsize = 2401
<xarray.DataArray (runtime: 1, ensemble: 1, validtime: 67, height: 1, ncells: 2401)>
array([[[[[ 288.434265, ...,  287.86261 ]],

         ...,
         [[ 289.025787, ...,  286.954559]]]]])
Coordinates:
  * runtime    (runtime) object None
  * ensemble   (ensemble) int64 0
  * validtime  (validtime) datetime64[ns] 2017-07-19T06:00:00 ...
  * height     (height) float32 2.0
  * ncells     (ncells) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
Attributes:
    long_name:                       Screen level temperature (T2M)
    standard_name:                   air_temperature
    units:                           K
    _ChunkSizes:                     [  1   1 929 719]
    Conventions:                     CF-1.6
    institution:                     Norwegian Meteorological Institute, MET ...
    creator_url:                     met.no
    summary:                         MEPS (MetCoOp-Ensemble Prediction System...
    title:                           MEPS 2.5km
    geospatial_lat_min:              51.0
    geospatial_lat_max:              88.0
    geospatial_lon_min:              -20.0
    geospatial_lon_max:              80.0
    references:                      unknown
    license:                         https://www.met.no/en/free-meteorologica...
    comment:                         none
    history:                         2017-07-19 creation by fimex
    min_time:                        2017-07-19 06:00 UTC
    max_time:                        2017-07-22 00:00 UTC
    source:                          meps
    DODS_EXTRA.Unlimited_Dimension:  time
    name:                            air_temperature_2m

We sliced a longitude and latitude box around the given grid. So we sliced the data in a longitude and latitude projection. Our original grid was in another projection with unstructured lat lon coordinates. So it is not possible to create a structured grid based on this slice. So the grid becomes an unstructured grid. In the next step we will show the remapping capabilities of the pymepps grid structure.

If we slice the data we have seen that the structured grid could not maintained. So in the next step we will create a structured LonLatGrid from scratch. After the grid building we will remap the raw DataArray basen on the new grid.

The first step is to calculate the model resolution in degree.

res = 2500   # model resolution in metre
earth_radius = 6371000 # Earth radius in metre
res_deg = np.round(res*360/(earth_radius*2*np.pi), 4)
# rounded model resolution equivalent in degree if it where on the equator
print(res_deg)

Out:

0.0225

Our next step is to build the grid. The grid implementation is inspired by the climate data operators. So to build the grid we will use the same format.

grid_dict = dict(
    gridtype='lonlat',
    xsize=int((hh_bounds[2]-hh_bounds[0])/res_deg),
    ysize=int((hh_bounds[1]-hh_bounds[3])/res_deg),
    xfirst=hh_bounds[0],
    xinc=res_deg,
    yfirst=hh_bounds[3],
    yinc=res_deg,
)

Now we use our grid dict together with the GridBuilder to build our grid.

builder = pymepps.GridBuilder(grid_dict)
hh_grid = builder.build_grid()
print(hh_grid)

Out:

LonLatGrid
----------
gridtype = lonlat
xlongname = longitude
xname = lon
xunits = degrees
ylongname = latitude
yname = lat
yunits = degrees
xsize = 88
ysize = 44
xfirst = 9
xinc = 0.0225
yfirst = 53
yinc = 0.0225

Now we created the grid. The next step is a remapping of the raw DataArray to the new Grid. We will use th enearest neighbour approach to remap the data.

t2m_hh_remapped = metno_t2m.pp.remapnn(hh_grid)

print(t2m_hh_remapped)

Out:

<xarray.DataArray (runtime: 1, ensemble: 1, validtime: 67, height: 1, lat: 44, lon: 88)>
array([[[[[[ 288.265228, ...,  288.897858],
           ...,
           [ 287.495361, ...,  289.750275]]],


         ...,
         [[[ 287.590698, ...,  287.320648],
           ...,
           [ 285.674713, ...,  290.341522]]]]]])
Coordinates:
  * runtime    (runtime) object None
  * ensemble   (ensemble) int64 0
  * validtime  (validtime) datetime64[ns] 2017-07-19T06:00:00 ...
  * height     (height) float32 2.0
  * lat        (lat) float64 53.0 53.02 53.05 53.07 53.09 53.11 53.14 53.16 ...
  * lon        (lon) float64 9.0 9.023 9.045 9.068 9.09 9.113 9.135 9.158 ...
Attributes:
    long_name:                       Screen level temperature (T2M)
    standard_name:                   air_temperature
    units:                           K
    _ChunkSizes:                     [  1   1 929 719]
    Conventions:                     CF-1.6
    institution:                     Norwegian Meteorological Institute, MET ...
    creator_url:                     met.no
    summary:                         MEPS (MetCoOp-Ensemble Prediction System...
    title:                           MEPS 2.5km
    geospatial_lat_min:              51.0
    geospatial_lat_max:              88.0
    geospatial_lon_min:              -20.0
    geospatial_lon_max:              80.0
    references:                      unknown
    license:                         https://www.met.no/en/free-meteorologica...
    comment:                         none
    history:                         2017-07-19 creation by fimex
    min_time:                        2017-07-19 06:00 UTC
    max_time:                        2017-07-22 00:00 UTC
    source:                          meps
    DODS_EXTRA.Unlimited_Dimension:  time
    name:                            air_temperature_2m

To plot the data in a map, we have to slice the data. We will select the first validtime as plotting parameter.

t2m_hh_remapped.isel(validtime=0).plot()
plt.show()
_images/sphx_glr_example_plot_thredds_002.png

In the map around Hamburg we could see the north and baltic sea in the top edges. But with the nearest enighbour approach we retain some of the sharp edges at the map. Our last step is a second remap plot, this time with a bilinear approach.

# sphinx_gallery_thumbnail_number = 3
metno_t2m.pp.remapbil(hh_grid).isel(validtime=0).plot()
plt.show()
_images/sphx_glr_example_plot_thredds_003.png

Total running time of the script: ( 1 minutes 8.125 seconds)

Generated by Sphinx-Gallery

Generated by Sphinx-Gallery

API

Accessor package

The accessor package is used to extend the pandas and xarray data structures.

Base module

class pymepps.accessor.base.MetData(data)[source]

MetData is the base class for meteorological data, like station data, nwp forecast data etc.

data

Access the parent object.

static load(load_path)[source]

Load a new instance.

save(save_path)[source]

Save this instance.

Pandas module

class pymepps.accessor.pandas.PandasAccessor(data)[source]

Bases: pymepps.accessor.base.MetData

An accessor to extend the pandas data structure. This could be used more actively in the future to add more post-processing specific features to pandas.

static load(load_path)[source]

Load the given json file and return a TSData instance with the loaded file. The loader uses tries to locate the lonlat and the data keys within the json file. If there are not these keys the loader tries to load the whole json file into pandas.

Parameters:load_path (str) – Path to the json file which should be loaded. It is recommended to load only previously saved TSData instances.
Returns:load_data – The loaded pandas object.
Return type:pandas object
save(save_path)[source]

The data is saved as json file. The pandas to_json method is used to generate convert the data to json. If lonlat was given it will be saved under a lonlat key. Json is used instead of HDF5 due to possible corruption problems.

Parameters:save_path (str) – Path where the json file should be saved.
update(*items)[source]

Update the data.

Spatial module

class pymepps.accessor.spatial.SpatialAccessor(data, grid=None)[source]

Bases: pymepps.accessor.base.MetData

The SpatialAccessor extends a xarray.DataArray for post-processing of meteorological numerical weather model data. The SpatialAccessor works mostly with grid and gridded data.

check_data_coordinates(item)[source]

Check if items grid coordinates shape is the same as those of the grid.

Parameters:

item (xarray.DataArray) – Instance to test for type and grid dimension length.

Returns:

item – The checked item.

Return type:

xarray.DataArray

Raises:
  • TypeError: – The grid is not set.
  • ValueError: – The given item has not the same last coordinates as the grid.
grid

The corresponding grid of this xarray.DataArray instance. This grid is used to interpolate/remap the data and to select the nearest grid point to a given longitude/latitude pair.

grid_to_attrs()[source]
static load(load_path)[source]

Load a NetCDF-based previously saved xarray.DataArray instance. If the NetCDF file has grid attributes they will be decoded as new grid.

Parameters:load_path (str) – The path to the saved xarray.DataArray instance.
Returns:loaded_array – The loaded DataArray instance. If a grid could be created it will be set to the DataArray instance.
Return type:xarray.DataArray
merge(*items)[source]

The merge routine could be used to merge this SpatialData instance with other instances. The merge creates a new merge dimension, named after the variable names. The grid of this instance is used as merged grid.

Parameters:items (xarray.DataArray) – The items are merged with this xarray.DataArray instance. The grid dimensions have to be same as the grid.
Returns:merged_array – The DataArray instance with the merged data.
Return type:xarray.DataArray
merge_analysis_timedelta(analysis_axis='runtime', timedelta_axis='validtime')[source]

The analysis time axis will be merged with the valid time axis, which should be given as timedelta. The merged time coordinate is called time and will be the first coordinate .

Parameters:
  • analysis_axis (str, optional) – The analysis time axis name. This axis will be used as basis for the valid time. Default is runtime.
  • timedelta_axis (str, optional) – The time delta axis name. This axis should contain the difference to the analysis time.
Returns:

merged_array – The DataArray with the merged analysis and timedelta coordinate.

Return type:

xarray.DataArray

normalize_coords(runtime=None, ensemble='det', validtime=None, height=None)[source]

Normalize the coordinates of the DataArray. The number, order and names of the coordinates are normalized. The number of coordinates will be four to six, depending if the DataArray is a merged multi-variable DataArray and the number of grid coordinates. The values of the added coordinates is set to the given values or will be None as filling value. The order and name of the DataArray will be:

  • (variable) (Only if the DataArray is a multi-variable DataArray. This is the variable name)
  • runtime (The analysis time of the model. The model is started at this time. The runtime is np.datetime64 as type)
  • ensemble (The ensemble member of the model.)
  • validtime (The lead time of the model. The model is valid for this times. The validtime timedelta to the runtime.)
  • height (The height information of the model.)
  • first grid coordinate
  • (second grid coordinate) (Only if the grid is not an unstructured grid)
Parameters:
  • runtime (datetime.datetime, np.datetime64 or None, optional) – The runtime of the model. The runtime will be converted to np.datetime64[ns] if it is not already this type. Default is None.
  • ensemble (int or str, optional) – The ensemble member of the model. An integer is indicating the member number, with zero as control run. Default is ‘det’.
  • validtime (datetime.datetime, np.datetiime64, np.timedelta or None,) – optional The validtime of the model. The validtime is converted to np.timedelta. Default is None.
  • height (int, str or None, optional) – The height of the model. Default is None.
Returns:

normalized_array – The DataArray with normalized coordinates.

Return type:

xr.DataArray

remapbil(new_grid)[source]

Remap the horizontal grid with a bilinear approach to a given new grid.

Parameters:new_grid (Child instance of Grid) – The data is remapped to this grid.
Returns:remapped_array – The xarray.DataArray with the replaced grid.
Return type:xarray.DataArray
remapnn(new_grid)[source]

Remap the horizontal grid with a nearest neighbour approach to a given new grid.

Parameters:new_grid (Child instance of Grid) – The data is remapped to this grid.
Returns:remapped_array – The xarray.DataArray with the replaced grid.
Return type:xarray.DataArray
save(save_path)[source]

Save the DataArray and the grid as attributes together. The grid attributes are used by the load method to recreate the grid, but it is also possible to load the data with the normal xarray load functions.

Parameters:save_path (str) – The path where the netcdf file should be saved.
sellonlatbox(lonlatbox)[source]

This DataArray instance is sliced by given lonlatbox. A new grid is created and set based on the sliced coordinates.

Parameters:lonlatbox (tuple(float)) –

The longitude and latitude box with four entries as degree. The entries are handled in the following way:

(left/west, top/north, right/east, bottom/south)
Returns:sliced_array – The sliced data array with the new grid.
Return type:xarray.DataArray

Notes

For some grids the new grid is based on an UnstructuredGrid, due to technical limitations.

selpoint(lonlat)[source]

Select a longitude, latitude point within this DataArray. A new lonlat grid with a single point is created.

Parameters:lonlat (tuple(float)) – The longitude and latitude point as degree. The nearest neighbour point to this given coordinate pair is used.
Returns:sliced_array – The sliced data array with the data for the nearest neighbour point to the given coordinates.
Return type:xarray.DataArray
set_grid(grid=None)[source]

Set the grid to the given grid and set the grid coordinates. It is assumed that the last n dimensions (n=1 for unstructured grid, n=2 all other grids) are the grid coordinates. Please make sure that this assumption is fulfilled!

Parameters:grid (Grid or None, optional) – This grid is used to set the grid and the grid coordinates of the returned array. If this is None, the grid of this DataArray instance is used. Default is None.
Returns:gridded_array – The DataArray with the grid coordinates and the grid.
Return type:xarray.DataArray
Raises:ValueError – A ValueError is raised if the grid of this instance is used and not grid set.
to_pandas(lonlat=None)[source]

Transform the DataArray to Pandas based on given coordinates. If coordinates are given this method selects the nearest neighbour grid point to this coordinates. The data is flatten to a 2d-DataFrame with the time as row axis.

Parameters:lonlat (tuple(float, float) or None) – The nearest grid point to this coordinates (longitude, latitude) is used to generate the pandas data. If lonlat is None no coordinates will be selected and the data is flatten. If the horizontal grid coordinates are not a single point it is recommended to set lonlat.
Returns:extracted_data – The extracted pandas data. The data is based on either a Series (1 Column) or Dataframe (multiple column) depending on the dimensions.
Return type:pandas.Series or pandas.DataFrame
update(*items)[source]

The update routine could be used to update the DataArray, based on other DataArrays. There are some assumptions done:

1. The used data to update this DataArray instance has the same grid coordinates as this instance. 2. Beginning from the left the given items are used to update the data. Such that intersection problems are resolved in favor of the newest data.
Parameters:items (xarray.DataArray) – The items are merged with this xarray.DataArray instance. The grid dimensions have to be same as the grid.
Returns:merged_array – The DataArray instance with the updated data.
Return type:xarray.DataArray

Utilities module

pymepps.accessor.utilities.register_dataframe_accessor(name)[source]

Register a custom accessor on pandas.DataFrame objects.

Parameters:name (str) – Name under which the accessor should be registered. A warning is issued if this name conflicts with a preexisting attribute.
pymepps.accessor.utilities.register_series_accessor(name)[source]

Register a custom accessor on pandas.Series objects.

Parameters:name (str) – Name under which the accessor should be registered. A warning is issued if this name conflicts with a preexisting attribute.

Grid package

The grid package is an extension for the SpatialDataset to support horizontal grid transformations and functions.

Grid builder

class pymepps.grid.builder.GridBuilder(griddes)[source]
build_grid()[source]

This method build up the grid with the griddes attribute.

Returns:grid – The built grid. The class of the grid is defined by the gridtype. The values of the grid are calculated with griddes.
Return type:child instance of Grid
static decode_str(grid_str)[source]

Method to clean the given grid str and to get a python dict. Key and value are separated with =. Every new key value pair needs a new line delimiter. Only alphanumeric characters are allowed as key and value. To delimit a value list use spaces and new lines. Lines with # are used as comment lines.

Steps to decode the grid string:
  1. String splitting by new line delimiter
  2. Clean the lines from unallowed characters
  3. Split the non-comment lines to key, value pairs
  4. Append elements where no key, value pair is available to the previous value
  5. Clean and split the key, value elements from spaces
  6. Convert the values to float numbers
Parameters:grid_str (str or list(str)) – The given grid_str which should be decoded. If this is a string the string will be splitten by new line into a list. It is necessary that every list entry has only one key = value entry.
Returns:grid_dict – The decoded grid dict from the str.
Return type:dict(str, str or float)
griddes
static open_string(path_str)[source]

This method is used to check if the given str is a path or a grid string.

Parameters:path_str (str) – This string is checked and if it is a path it will be read.
Returns:grid_str – The given str or the read str.
Return type:str
Raises:TypeError – If path_str is not a str type.

Grids

BaseGrid
class pymepps.grid.grid.Grid(grid_dict)[source]

The base class for every grid type.

static convert_to_deg(field, unit)[source]

Method to convert given field with given unit into degree.

Parameters:
  • field
  • unit
copy()[source]
get_coord_names()[source]

Returns the name of the coordinates.

Returns:
  • yname (str) – The name of the y-dimension.
  • xname (str) – The name of the x-dimension
get_coords()[source]

Get the coordinates in a xarray-compatible way.

Returns:coords – The coordinates in a xarray compatible coordinates format. The key is the coordinate name. The coordinates have as value a tuple with their own name, indicating that the they are self-describing, and the coordinate values as numpy array.
Return type:dict(str, (str, numpy.ndarray))
get_nearest_point(data, coord)[source]

Get the nearest neighbour grid point for a given coordinate. The distance between the grid points and the given coordinates is calculated with the haversine formula.

Parameters:
  • data (numpy.array or xarray.DataArray) – The return value is extracted from this array. The array should have at least two dimensions. If the array has more than two dimensions the last two dimensions will be used as horizontal grid dimensions.
  • coord (tuple(float, float)) – The data of the nearest grid point to this coordinate (latitude, longitude) will be returned. The coordinate should be in degree.
Returns:

nearest_data – The extracted data for the nearest neighbour grid point. The dimensions of this array are the same as the input data array without the horizontal coordinate dimensions. There is at least one dimension.

Return type:

numpy.ndarray or xarray.DataArray

interpolate(data, other_grid, order=0)[source]

Interpolate the given data to the given other grid.

Parameters:
  • data (numpy.ndarray or xarray.DataArray) – This data is used for the interpolation. The shape of data’s grid axis needs to be the same as this grid.
  • other_grid (Grid instance) – The other_grid is used as target grid for the interpolation.
  • order (int, optional) – Specifies the interpolation order, based on basemap.interp order. 0. order: nearest neighbour 1. order: bilinear interpolation
Returns:

remapped_data – The remapped data with the same type as the input data. If the input data is a xarray.DataArray the output data will use the same attributes and non-grid dimensions as the input data.

Return type:

numpy.ndarray or xarray.DataArray

lat_lon

Get latitudes and longitudes for every grid point as xarray.Dataset.

Returns:lat_lon – The latitude and longitude values for every grid point as xarray.Dataset with latitude and longitude as variables.
Return type:xarray.Dataset
len_coords

Get the number of coordinates for this grid.

Returns:len_coords – Number of coordinates for this grid.
Return type:int
lonlatbox(data, ll_box)[source]

Slice a lonlatbox from the given data.

nearest_point(coord)[source]
static normalize_lat_lon(lat, lon, data=None)[source]

The given coordinates will be normalized and reorder into basemap conform coordinates. If the longitude values are between 0° and 360°, they will be normalized to values between -180° and 180°. Then the coordinates will be reorder, such that they are in an increasing order.

Parameters:
  • lat (numpy.ndarray) – The latitude values. They are representing the first data dimension.
  • lon (numpy.ndarray) – The longitude values. They are representing the second data dimension.
  • data (numpy.ndarray, xarray.DataArray or None, optional) – The data values. They will be also reordered by lat and lon. If this is None, only lat and lon will be reordered and returned. Default is None.
Returns:

  • ordered_lat (numpy.ndarray) – Ordered latitude values.
  • ordered_lon (numpy.ndarray) – Ordered and normalized longitude values.
  • ordered_data (numpy.ndarray, xarray.DataArray or None) – The orderd data based on given latitudes and longitudes. This is None if no other data was given as parameter.

raw_dim

Get the raw dimension values, as they are constructed by the grid description.

Returns:constructed_dim – The constructed dimensions. Depending on the given grid type, it is either a tuple of arrays or a single array.
Return type:tuple(numpy.ndarray) or numpy.ndarray
raw_lat_lon()[source]
shape
pymepps.grid.grid.distance_haversine(p1, p2)[source]

Calculate the great circle distance between two points on the earth. The formula is based on the haversine formula [1].

Parameters:
  • p1 (tuple (array_like, array_like)) – The coordinates (latitude, longitude) of the first point in degrees.
  • p2 (tuple (array_like, array_like)) – The coordinates (latitude, longitude) of the second point in degrees.
Returns:

d – The calculated haversine distance in meters.

Return type:

float

Notes

Script based on: http://stackoverflow.com/a/29546836

References

[1](1, 2) de Mendoza y Ríos, Memoria sobre algunos métodos nuevos de calcular la longitud por las distancias lunares: y aplication de su teórica á la solucion de otros problemas de navegacion, 1795.
LonLat
class pymepps.grid.lonlat.LonLatGrid(grid_dict)[source]

Bases: pymepps.grid.grid.Grid

A LonLatGrid is a grid with evenly distributed longitude and latitude values. This is the right grid if the grid could be described with a evenly distributed range of values for longitude and latitude.

lonlatbox(data, ll_box)[source]

The data is sliced as structured grid with given lonlat box.

Parameters:
  • data (numpy.ndarray or xarray.DataArray) – The data which should be sliced. The shape of the last two dimensions should be the same as the grid dimensions.
  • ll_box (tuple(float)) –

    The longitude and latitude box with four entries as degree. The entries are handled in the following way:

    (left/west, top/north, right/east, bottom/south)
Returns:

  • sliced_data (numpy.ndarray or xarray.DataArray) – The sliced data with the same type as the input data. If the input data is a xarray.DataArray the output data will use the same attributes and non-grid dimensions as the input data.
  • sliced_grid (Grid) – A new child instance of Grid with the sliced coordinates as values and the same Grid type as this grid.

Gaussian
class pymepps.grid.gaussian.GaussianGrid(grid_dict)[source]

Bases: pymepps.grid.lonlat.LonLatGrid

The gaussian grid is similar to the lonlat grid. This is the right grid if longitude and/or latitude could be described with a non-evenly distributed list of values.

Projection
class pymepps.grid.projection.BaseProj[source]

BaseProj is a base class for every projection in a proj4-conform way.

transform_from_latlon(lon, lat)[source]

Transform the given lon, lat arrays to x and y values.

transform_to_latlon(x, y)[source]

Transform the given x and y arrays to latitude and longitude values.

class pymepps.grid.projection.ProjectionGrid(grid_dict)[source]

A projection grid could be defined by a evenly distributed grid. The grid could be translated to a longitude and latitude grid by a predefined projection. At the moment only projections defined by a proj4 string or a rotated latitude and longitude are supported.

get_projection()[source]
lonlatbox(data, ll_box)[source]

The data is sliced as unstructured grid with given lonlat box.

Parameters:
  • data (numpy.ndarray or xarray.DataArray) – The data which should be sliced. The shape of the last two dimensions should be the same as the grid dimensions.
  • ll_box (tuple(float)) –

    The longitude and latitude box with four entries as degree. The entries are handled in the following way:

    (left/west, top/north, right/east, bottom/south)
Returns:

  • sliced_data (numpy.ndarray or xarray.DataArray) – The sliced data with the same type as the input data. If the input data is a xarray.DataArray the output data will use the same attributes and non-grid dimensions as the input data.
  • sliced_grid (UnstructuredGrid) – A new UnstructuredGrid with the sliced coordinates as values.

class pymepps.grid.projection.RotPoleProj(npole_lat, npole_lon)[source]

Class for to calculate the transformation from rotated pole coordinates to normal latitude and longitude coordinates. The rotated pole coordinates are calculated in a cf-conform manner, with a rotated north pole. The calculations are based on [1]. If the resulting latitude coordinate equals -90° or 90° the longitude coordinate will be set to 0°.

Parameters:
  • npole_lat (float) – The latitude of the rotated north pole in degrees.
  • npole_lon (float) – The longitude of the rotated north pole in degrees.

References

[1] http://de.mathworks.com/matlabcentral/fileexchange/43435-rotated-grid-
transform
lonlatbox(data, ll_box)[source]

The data is sliced with given lonlat box to a unstructured grid.

Parameters:
  • data (numpy.ndarray) – The data which should be sliced. The shape of the last two dimensions should be the same as the grid dimensions.
  • ll_box (tuple(float)) –

    The longitude and latitude box with four entries as degree. The entries are handled in the following way:

    (left/west, top/north, right/east, bottom/south)
Returns:

  • sliced_data (numpy.ndarray) – The sliced data. The last two dimensions are flattened and sliced.
  • grid (UnstructuredGrid) – A new instance of UnstructuredGrid with the sliced coordinates as values.

north_pole

Get the north pole for the rotated pole projection.

transform_from_lonlat(lon, lat)[source]

Transform the given lon, lat arrays to x and y values.

transform_to_lonlat(x, y)[source]

Transform the given x and y arrays to latitude and longitude values.

Unstructured
class pymepps.grid.unstructured.UnstructuredGrid(grid_dict)[source]

Bases: pymepps.grid.grid.Grid

In an unstructured grid the grid could have any shape. A famous example is the triangulated ICON grid. At the moment the longitude and latitude values should have been precomputed. The grid could be calculated with the number of vertices and the coordinates of the boundary.

get_coord_names()[source]

Returns the name of the coordinates.

Returns:coord_names – The coordinate name for this unstructured grid. This is always a list, with only one entry: ncells.
Return type:list(str)
len_coords

Get the number of coordinates for this grid.

Returns:len_coords – Number of coordinates for this grid.
Return type:int
lonlatbox(data, ll_box)[source]

The data is sliced as unstructured grid with given lonlat box.

Parameters:
  • data (numpy.ndarray or xarray.DataArray) – The data which should be sliced. The shape of the last two dimensions should be the same as the grid dimensions.
  • ll_box (tuple(float)) –

    The longitude and latitude box with four entries as degree. The entries are handled in the following way:

    (left/west, top/north, right/east, bottom/south)
Returns:

  • sliced_data (numpy.ndarray or xarray.DataArray) – The sliced data with the same type as the input data. If the input data is a xarray.DataArray the output data will use the same attributes and non-grid dimensions as the input data.
  • sliced_grid (UnstructuredGrid) – A new UnstructuredGrid with the sliced coordinates as values.

Curvilinear
class pymepps.grid.curvilinear.CurvilinearGrid(grid_dict)[source]

Bases: pymepps.grid.lonlat.LonLatGrid

A curvilinear grid could be described as special case of a lonlat grid where the number of vertices is 4. The raw grid values are calculated based on the given grid rules. At the moment the lon lat values had to be precomputed.

lonlatbox(data, ll_box)[source]

The data is sliced as unstructured grid with given lonlat box.

Parameters:
  • data (numpy.ndarray or xarray.DataArray) – The data which should be sliced. The shape of the last two dimensions should be the same as the grid dimensions.
  • ll_box (tuple(float)) –

    The longitude and latitude box with four entries as degree. The entries are handled in the following way:

    (left/west, top/north, right/east, bottom/south)
Returns:

  • sliced_data (numpy.ndarray or xarray.DataArray) – The sliced data with the same type as the input data. If the input data is a xarray.DataArray the output data will use the same attributes and non-grid dimensions as the input data.
  • sliced_grid (UnstructuredGrid) – A new UnstructuredGrid with the sliced coordinates as values.

Data loader package

The data loader package is used to open station and model data.

Base module

class pymepps.loader.base.BaseLoader(data_path, file_type=None, processes=1, checking=True)[source]
load_data()[source]
class pymepps.loader.datasets.metdataset.MetDataset(file_handlers, data_origin=None, processes=1)[source]

MetDataset is a base class for handling meteorolgical files.

The normal workroutine would be:
  1. load the files (use of file handlers)
  2. select the important variables within the files (this object)
  3. post-process the variables (MetData/SpatialData/TSData object)
Parameters:
  • file_handlers (list of childs of FileHandler or None.) – The loaded file handlers. This instance load the variables. If the file handlers are None then the dataset is used for conversion between Spatial and TSData.
  • data_origin (optional) – The class where the data comes from. Normally this would be a model or a measurement site. If this is None, this isn’t set. Default is None.
  • processes (int, optional) – This number of processes is used to calculate time-consuming functions. For time-consuming functions a progress bar is shown. If the number of processes is one the functions will be processed sequential. For more processes than one the multiprocessing module will be used. Default is 1.
data_merge(data, var_name)[source]

Method to merge the given data by given metadata into one data structure.

file_handlers
processes
select(var_name, **kwargs)[source]

Method to select a variable from this dataset. If the variable is find in more than one file or message, the method tries to find similarities within the metadata and to combine the data into one array, with several dimensions. This method could have a long running time, due to data loading and combination.

Parameters:
  • var_name (str) – The variable which should be extracted. If the variable is not found within the dataset there would be a value error exception.
  • kwargs (dict) – Additional parameters that are passed to the file handlers.
Returns:

extracted_data – A child instance of MetData with the data of the selected variable as data. If None is returned the variable wasn’t found within the list with possible variable names.

Return type:

SpatialData, TSData or None

select_by_pattern(pattern, return_list=True, **kwargs)[source]

Method to select variables from this dataset by keywords. This method uses list comprehension to extract the variable names where the var_name pattern is within the variable name. If the variable names are found the variable is selected with the select method.

Parameters:
  • pattern (str) – The pattern for which should be searched.
  • return_list (bool) – If the return value should be a list or a dictionary.
  • kwargs (dict) – Additional parameters that are passed to the file handlers.
Returns:

data_list – list(SpatialData or TSData) or None The return value is a dict/list with SpatialData instances, one entry for every found variable name. If return_list is False, are the keys the variable names. If None is returned no variable with this pattern was found.

Return type:

dict(str, SpatialData or TSData) or

select_ds(include=None, exclude=None, **kwargs)[source]

Extract the dataset data into a MetData instance. The include list is handled superior to the exclude list. If both lists are None all available variables are used.

Parameters:
  • include (iterable or None) – Within the include iterable are all variable names, which should be included into the MetData data. The list will be filtered for available variable names. If no variable name is available a ValueError will be raised. If this is None, the include will be skipped and the exclude list will be used. Default is None.
  • exclude (iterable or None) – If no include iterable is given, this exclude iterable is used. In this case, any available variable name, which is not within this list is used. If this iterable is also None, all available data variables are used to construct the MetData instance. Default is None.
  • kwargs (dict) – Additional parameters that are passed to the file handlers.
Returns:

extracted_data – The extracted data instance.

Return type:

TSData or SpatialData

Raises:

ValueError: – A ValueError is raised if no variable was selected from the dataset.

var_names

Get the available variable names.

variables

Return the variable names and the corresponding file handlers.

Open model files

class pymepps.loader.model.ModelLoader(data_path, file_type=None, grid=None, processes=1, checking=True)[source]

Bases: pymepps.loader.base.BaseLoader

A simplified way to load weather model data into a SpatialDataset. Technically this class is a helper and wrapper around the file handlers and SpatialDataset.

Parameters:
  • data_path (str) – The path to the files. This path could have a glob-conform path pattern. Every file found within this pattern will be used to determine the file type and to generate the SpatialDataset.
  • file_type (str or None, optional) –

    The file type determines which file handler will be used to load the data. If the file type is None it will be determined automatically based on given files. All the files with the majority file type will be used to generate the SpatialDataset. The available file_types are:

    nc: NetCDF files grib2: Grib2 files grib1: Grib1 files dap: Opendap urls
  • grid (str or Grid or None, optional) – The grid describes the horizontal grid of the spatial data. The given grid will be forwarded to the given SpatialDataset instance. Default is None.
pymepps.loader.model.open_model_dataset(data_path, file_type=None, grid=None, processes=1, checking=True)[source]
class pymepps.loader.datasets.spatialdataset.SpatialDataset(file_handlers, grid=None, data_origin=None, processes=1)[source]

Bases: pymepps.loader.datasets.metdataset.MetDataset

SpatialDataset is a class for a pool of file handlers. Typically a spatial dataset combines the files of one model run, such that it is possible to select a variable and get a SpatialData instance. For memory reasons the data of a variable is only loaded if it is selected.

Parameters:
  • file_handlers (list of childs of FileHandler or None) – The spatial dataset is based on these files. The files should be either instances of GribHandler or NetCDFHandler. If file handlers is None then the dataset is used for conversion from TSData to SpatialData.
  • grid (str or Grid or None) – The grid describes the horizontal grid of the spatial data. The grid will be appended to every created SpatialData instance. If a str is given it will be checked if the str is a path to a cdo-conform grid file or a cdo-conform grid string. If this is a instance of a child of Grid it is assumed that the grid is already initialized and this grid will be used. If this is None the Grid will be automatically read from the first file handler. Default is None.
  • data_origin (optional) – The data origin. This parameter is important to trace the data flow. If this is None, there is no data origin and this dataset will be the starting point of the data flow. Default is None.
  • processes (int, optional) – This number of processes is used to calculate time-consuming functions. For time-consuming functions a progress bar is shown. If the number of processes is one the functions will be processed sequential. For more processes than one the multiprocessing module will be used. Default is 1.
select()

Method to select a variable.

selnearest()

Method to select the nearest grid point for given coordinates.

sellonlatbox()

Method to slice a box with the given coordinates.

data_merge(data, var_name)[source]

Method to merge instances of xarray.DataArray into a single xarray.DataArray. Also the grid is read and set to the xarray.DataArray.

Parameters:
  • data (list of xarray.DataArray) – The data list.
  • var_name (str) – The name of the variable which is selected within the data list.
Returns:

merged_array – The merged DataArray with the grid coordinates and the extracted grid. If the grid could not extracted the grid is None and a DataArray without set grid is returned.

Return type:

xarray.DataArray

get_grid(var_name, data_array=None)[source]

Method to get for given variable name a Grid instance. If the grid attribute is already a Grid instance this grid will be returned. If the grid attribute is a str instance, the str will be read from file or from the given grid str. If the grid attribute isn’t set the grid instance will be the grid for the variable selected with the first corresponding file handler and cdo.

Parameters:
  • var_name (str) – The variable name, which should be used to generate the grid.
  • data_array (xarray.DataArray or None, optional) – If the data array is given the method will try to load the grid from the data array’s attributes. If None the DataArray method will be skipped. Default is None.
Returns:

grid – The returned grid. If the returned grid is None, the grid could not be read.

Return type:

Instance of child of grid or None

Open station files

class pymepps.loader.station.StationLoader(data_path, file_type=None, lonlat=None, processes=1, checking=True)[source]

Bases: pymepps.loader.base.BaseLoader

A simplified way to load station data into a TSDataset. Technically this class is a helper and wrapper around the file handlers and TSData.

Parameters:
  • data_path (str) – The path to the files. This path could have a glob-conform path pattern. Every file found within this pattern will be used to determine the file type and to generate the TSDataset.
  • file_type (str or None, optional) –

    The file type determines which file handler will be used to load the data. If the file type is None it will be determined automatically based on given files. All the files with the majority file type will be used to generate the TSDataset. The available file_types are:

    nc: NetCDF files wm: Text files in a specific “Wettermast format”
  • lonlat (tuple(float, float), optional) – The lonlat coordinate tuple describes the position of the station in degrees. If this is None the position is unknown. Default is None.
lon_lat()[source]
pymepps.loader.station.open_station_dataset(data_path, file_type=None, lonlat=None, processes=1, checking=True)[source]
class pymepps.loader.datasets.tsdataset.TSDataset(file_handlers, data_origin=None, lonlat=None, processes=1)[source]

Bases: pymepps.loader.datasets.metdataset.MetDataset

TSDataset is a class for a pool of file handlers. Typically a time series dataset combines the files of a station, such that it is possible to select a variable and get a TSData instance. For memory reasons the data of a variable is only loaded if it is selected.

Parameters:
  • file_handlers (list of childs of FileHandler or None) – The spatial dataset is based on these files. The files should be either instances of NetCDFHandler or TextHandler. If file handlers is None then the dataset is used for conversion from SpatialData to TSData.
  • data_origin (optional) – The data origin. This parameter is important to trace the data flow. If this is None, there is no data origin and this dataset will be the starting point of the data flow. Default is None.
  • lonlat (tuple(float, float) or None) – The coordinates (longitude, latitude) where the data is valid. If this is None the coordinates will be set based on data_origin or based on the first file handler.
select()

Method to select a variable.

data_merge(data, var_name)[source]
select_by_pattern(pattern, return_list=False, **kwargs)[source]

File handlers

The file handlers are used to open files with a specific format.

Base file handler
class pymepps.loader.filehandler.filehandler.FileHandler(file_path)[source]
get_messages(var_name, **kwargs)[source]
get_timeseries(var_name, **kwargs)[source]
var_names
Grib file handler
class pymepps.loader.filehandler.gribhandler.GribHandler(file_path)[source]

Bases: pymepps.loader.filehandler.filehandler.FileHandler

close()[source]
get_messages(var_name, **kwargs)[source]

Method to get message-wise the data for a given variable as xr.DataArray.

Parameters:var_name (str) – The name of the variable which should be extracted.
Returns:data – The list with the message-wise data as DataArray. The DataArray have six coordinates (analysis, ensemble, time, level, y, x). The shape of DataArray are normally (1,1,1,1,y_size,x_size).
Return type:list of xr.DataArray
is_type()[source]
open()[source]
NetCDF file handler
class pymepps.loader.filehandler.netcdfhandler.NetCDFHandler(file_path)[source]

Bases: pymepps.loader.filehandler.filehandler.FileHandler

close()[source]
get_messages(var_name, **kwargs)[source]

Method to imitate the message-like behaviour of grib files.

Parameters:
  • var_name (str) – The variable name, which should be extracted.
  • runtime (np.datetime64, optional) – If the dataset has no runtime this runtime is used. If the runtime is not set, the runtime will be inferred from file name.
  • ensemble (int or str, optional) – If the dataset has no ensemble information this ensemble is used. If the ensemble is not set, the ensemble will be inferred from file name.
  • sliced_coords (tuple(slice), optional) – If the cube should be sliced before it is loaded. This is helpful by large opendap requests. These slice will be used from the behind. So (slice(1,2,1), slice(3,5,1)) means […, 1:2, 3:5]. If it is not set all data is used. T
Returns:

data – The list with the message-wise data as DataArray. The DataArray have six coordinates (analysis, ensemble, time, level, y, x). The shape of DataArray are normally (1,1,1,1,y_size,x_size).

Return type:

list of xr.DataArray

get_timeseries(var_name, **kwargs)[source]

Method to get the time series from a NetCDF file. This is designed for measurement site data in netcdf format. At the moment this method is only tested for Wettermast Hamburg data!

Parameters:var_name (str) – The variable name, which should be extracted.
Returns:data – The selected variable is extracted as dict with pandas series as values.
Return type:dict with pandas series
is_type()[source]
load_cube(var_name)[source]

Method to load a variable from the netcdf file and return it as xr.DataArray.

Parameters:var_name (str) – The variable name, which should be extracted.
Returns:variable – The DataArray of the variable.
Return type:xr.DataArray
lon_lat
open()[source]
pymepps.loader.filehandler.netcdfhandler.cube_to_series(cube, var_name)[source]
Opendap file handler
class pymepps.loader.filehandler.opendaphandler.OpendapHandler(file_path)[source]

Bases: pymepps.loader.filehandler.netcdfhandler.NetCDFHandler

is_type()[source]
open()[source]

Utilities package

Cdo functions

pymepps.utilities.cdo_funcs.cdo_path_helper(file_path, new_path=None, inplace=False)[source]
pymepps.utilities.cdo_funcs.griddes(*args, **kwargs)[source]
pymepps.utilities.cdo_funcs.sellonlatbox(ds, lonlatbox, new_path=None, inplace=False, in_opt=None, options=None, processes=1)[source]
pymepps.utilities.cdo_funcs.selnearest(ds, lonlat, new_path=None, inplace=False, in_opt=None, options=None, processes=1)[source]

Multiprocessing utility

class pymepps.utilities.multiproc_util.MultiThread(processes, threads=True)[source]

Bases: object

processes

Path encoder

class pymepps.utilities.path_encoder.PathEncoder(base_path, date=None, undet_numbers=None)[source]

Bases: object

get_encoded()[source]

Encode the path with given data.

Returns:List with encoded paths.
Return type:list of str
get_file_number()[source]

TestCase

class pymepps.utilities.testcase.TestCase(methodName='runTest')[source]
assertAttribute(obj, attr)[source]
assertCallable(obj, method)[source]
assertMethod(obj, method)[source]

Data description

The following text describes the licenses of the used data and how the data was generated. Some of the data might have a large size.

Station data

The following explanation is for the station data folder.

wettermast.nc

The wettermast.nc is extracted from original Wettermast Hamburg data. We get the permission of Ingo Lange to use the data in a sliced manner.

We extracted the data from the original NetCDF-file and sliced a time period of 5 days and used only the 2 metre temperature.

Model data

The model data explanation is for the model data folder.

GFS_Global_0p25deg_20161219_0600.nc

The NetCDF file is a transformed subset of the GFS global weather model from the 19th december 2016. The GFS data is available for free in the public domain under provisions of U.S. law and was as grib2 file from the NCEP ftp server. Later only one variable was extracted and transformed with the climate data operators.

Grid data

All of the grids are generated with the cdo griddes command. In the following the grids and their origin are described.

gaussian_y

The gaussian_y is a gaussian grid with an equal-spaced x coordinate and an unequal-spaced y coordinate. The grid is copied from the cdo documentation.

lcc

The lcc is a lambert conformal conic grid. The proj4 description is used to read in the grid as projection grid. The grid is extracted from the met.no Arome-MetCoOp model and is based on data from MET Norway. The data from the Thredds-OPeNDAP server is used. The license of the original data is: Norwegian license for public data (NLOD) and Creative Commons 4.0 BY Internasjonal.

lon_lat

The lon_lat is a lonlat grid with equal-spaced longitude and latitude coordinates. It is extracted from the GFS model with a resolution of 0.25°x0.25°. The original data was used from the UCAR-edu Thredds server. The original GFS data is available for free in the public domain under provisions of U.S. law.