Comparison of IMERG Precipitation with Station Information with QGIS, Python and Pandas - Tutorial
/There are tools for temporal data analysis like Python, IPython and Jupyter; there are tools for spatial data analysis like QGIS. But, are there tools for spatio-temporal analysis? Unfortunately no, but there are good approaches to manage spatial data in Jupyter or to run IPython in QGIS3. These approaches aren't a complete ansqwe to the current demands of big data processing in few computational time with simple scripts, but by sure it will help to shape better solutions.
QGIS 3 runs with Python 3 and allows us to use some interactive tools as IPython with the IPython QGIS Console. On this tutorial we will show the procedure to do an analysis of IMERG precipitation estimation and observed precipitation values with Python. Input data format is not homogeneous, observed data comes in csv format and IMERG data is on raster format for the entire globe. Comparison is done on a daily basis and uses 245MB in 365 HDF files.
Data analysis was done for the entire 2017, but the code can be adapted to the entire IMERG data (from 2016) or even to the TRMM precipitation estimations. Other environmental variables as temperature or soil cover can be analysed with some fixes on the code.
Tutorial
The tutorial is divided on the following parts
- Part 1: Download the IMERG precipitation estimation as HDF5 files with a procedure similar to this tutorial: https://www.hatarilabs.com/ih-en/direct-nasa-imerg-precipitation-images-download-in-qgis3-with-python
- Part 2 Capture of precipitation station coordinates for a point shapefile
- Part 3: Create a IMERG precipitation file. This is the most complex part of the tutorial since it requires to open every HDF5 file with some commands in GDAL, make some geotransformations, assign projections, identify values on coordinates and store them in a separate text file.
- Part 4: Installation of Jupyter and Pandas and the the IPython QGIS Console plugin. For this you need to run QGIS as a administrator. Python packages are installed inside QGIS with the subprocess module.
- Part 5: Data analysis in IPython. Import data as Pandas Dataframes and the plot the observed and estimated precipitation on a linear graph.
Python code
These are the python codes for every part:
Part 1:
import gdal, os, osr from qgis.core import * import qgis.utils mylayer = qgis.utils.iface.activeLayer() for i, elem in enumerate(mylayer.getFeatures()): geom= elem.geometry() wkt = geom.asWkt() refpoint = geom.asPoint() print(wkt)
Part 2:
os.chdir('C:\\Users\\USER\\Documents\\InfohatarisPosted\\Comparison_IMERPpt_MeasuredPpt_QGIS\\HDF5') totallinks = os.listdir(os.getcwd()) hdflinks = [] for link in totallinks: if link[-4:] == 'HDF5': hdflinks.append(link) i = 0
Part 3:
imergFile = open("../Txt/precipImergAkron.txt","w") imergFile.write('date,precipitation\n') for link in hdflinks: if link[-4:] == 'HDF5': hdf_ds = gdal.Open(link, gdal.GA_ReadOnly) band_ds = gdal.Open(hdf_ds.GetSubDatasets()[2][0], gdal.GA_ReadOnly) #choose the 5th dataset that corresponds to precipitationCal band_array = band_ds.ReadAsArray() band_array[band_array<0] = 0 #filter all NaN values that appear as negative values, specially for the tiff representation geotransform = ([-180,0.1,0,90,0,-0.1]) tempLink = '../Tiff/tempTiff.tif' raster = gdal.GetDriverByName('GTiff').Create(tempLink,3600,1800,1,gdal.GDT_Float32) raster.SetGeoTransform(geotransform) srs=osr.SpatialReference() srs.ImportFromEPSG(4326) raster.SetProjection(srs.ExportToWkt()) raster.GetRasterBand(1).WriteArray(band_array.T[::-1]) raster = None tempraster = QgsRasterLayer(tempLink,'tempraster') ident = tempraster.dataProvider().identify(refpoint, QgsRaster.IdentifyFormatValue) if ident.isValid(): print(str(i) + ' ' + link + ' ' + str(ident.results())) #print(help(ident)) precipday = str(ident.results()).split(': ')[1][:-1] date = link[:4]+'-'+link[4:6]+'-'+link[6:8] imergFile.write(date+','+str(precipday)+'\n') i+=1 imergFile.close()
Part 4:
from subprocess import call call (['python3','-m','pip','install','jupyter'])
Part 5:
#import required packages %matplotlib inline import matplotlib.pyplot as plt import os import pandas as pd print(os.getcwd()) #open observed and values akron = pd.read_csv("../Txt/1352620.csv", index_col = 6,parse_dates=True) akron.head() #open imerg values imergvalues = pd.read_csv("../Txt/precipImergAkron.txt", index_col = 0,parse_dates=True) imergvalues #create a new column in akron akron['IMERG']=imergvalues akron.head() #create a graph observed-imerg plt.figure(figsize=(20,8)) plt.plot(akron['PRCP']) plt.plot(akron['IMERG']) plt.legend() plt.show()
Input files
You can download the input files for this tutorial here.