Comparison of IMERG Precipitation with Station Information with QGIS, Python and Pandas - Tutorial

ComparisonIMERGObserved.PNG

There are tools for temporal data analysis like Python, IPython and Jupyter; there are tools for spatial data analysis like QGIS. But, are there tools for spatio-temporal analysis? Unfortunately no, but there are good approaches to manage spatial data in Jupyter or to run IPython in QGIS3. These approaches aren't a complete ansqwe to the current demands of big data processing in few computational time with simple scripts, but by sure it will help to shape better solutions.

QGIS 3 runs with Python 3 and allows us to use some interactive tools as IPython with the IPython QGIS Console. On this tutorial we will show the procedure to do an analysis of IMERG precipitation estimation and observed precipitation values with Python. Input data format is not homogeneous, observed data comes in csv format and IMERG data is on raster format for the entire globe. Comparison is done on a daily basis and uses 245MB in 365 HDF files.

Data analysis was done for the entire 2017, but the code can be adapted to the entire IMERG data (from 2016) or even to the TRMM precipitation estimations. Other environmental variables as temperature or soil cover can be analysed with some fixes on the code.

ComparisonIMERGObserved2.PNG

 

Tutorial

The tutorial is divided on the following parts

  • Part 1: Download the IMERG precipitation estimation as HDF5 files with a procedure similar to this tutorial:  https://www.hatarilabs.com/ih-en/direct-nasa-imerg-precipitation-images-download-in-qgis3-with-python
  • Part 2 Capture of precipitation station coordinates for a point shapefile
  • Part 3: Create a IMERG precipitation file. This is the most complex part of the tutorial since it requires to open every HDF5 file with some commands in GDAL, make some geotransformations, assign projections, identify values on coordinates and store them in a separate text file.
  • Part 4: Installation of Jupyter and Pandas and the the IPython QGIS Console plugin. For this you need to run QGIS as a administrator. Python packages are installed inside QGIS with the subprocess module.
  • Part 5: Data analysis in IPython. Import data as Pandas Dataframes and the plot the observed and estimated precipitation on a linear graph.

 

Python code

These are the python codes for every part:


Part 1:

import gdal, os, osr
from qgis.core import *
import qgis.utils

mylayer = qgis.utils.iface.activeLayer()

for i, elem in enumerate(mylayer.getFeatures()):
     geom= elem.geometry()
     wkt = geom.asWkt()
     refpoint = geom.asPoint()
     print(wkt)

Part 2:

os.chdir('C:\\Users\\USER\\Documents\\InfohatarisPosted\\Comparison_IMERPpt_MeasuredPpt_QGIS\\HDF5')
totallinks = os.listdir(os.getcwd())
hdflinks = []
for link in totallinks:
    if link[-4:] == 'HDF5':
        hdflinks.append(link)
        
i = 0

Part 3:

imergFile = open("../Txt/precipImergAkron.txt","w")
imergFile.write('date,precipitation\n')

for link in hdflinks:
    if link[-4:] == 'HDF5':
        hdf_ds = gdal.Open(link, gdal.GA_ReadOnly)
       
        band_ds = gdal.Open(hdf_ds.GetSubDatasets()[2][0], gdal.GA_ReadOnly) #choose the 5th dataset that corresponds to precipitationCal

        band_array = band_ds.ReadAsArray()
        band_array[band_array<0] = 0 #filter all NaN values that appear as negative values, specially for the tiff representation

        geotransform = ([-180,0.1,0,90,0,-0.1])
        
        tempLink = '../Tiff/tempTiff.tif'

        raster = gdal.GetDriverByName('GTiff').Create(tempLink,3600,1800,1,gdal.GDT_Float32)
        raster.SetGeoTransform(geotransform)

        srs=osr.SpatialReference()
        srs.ImportFromEPSG(4326)
        raster.SetProjection(srs.ExportToWkt())
        raster.GetRasterBand(1).WriteArray(band_array.T[::-1])
        raster = None
        
        tempraster = QgsRasterLayer(tempLink,'tempraster')
        
        
        ident = tempraster.dataProvider().identify(refpoint, QgsRaster.IdentifyFormatValue)
        if ident.isValid():
            print(str(i) + '   ' + link  + '    ' + str(ident.results()))
            #print(help(ident))
            precipday =  str(ident.results()).split(': ')[1][:-1]
            date = link[:4]+'-'+link[4:6]+'-'+link[6:8]
            imergFile.write(date+','+str(precipday)+'\n')
            
        i+=1
        
imergFile.close()

Part 4:

from subprocess import call
call (['python3','-m','pip','install','jupyter'])

Part 5:

#import required packages
%matplotlib inline
import matplotlib.pyplot as plt
import os
import pandas as pd
print(os.getcwd())

#open observed and values
akron = pd.read_csv("../Txt/1352620.csv", index_col = 6,parse_dates=True)
akron.head()

#open imerg values
imergvalues = pd.read_csv("../Txt/precipImergAkron.txt", index_col = 0,parse_dates=True)
imergvalues

#create a new column in akron
akron['IMERG']=imergvalues
akron.head()

#create a graph observed-imerg
plt.figure(figsize=(20,8))
plt.plot(akron['PRCP'])
plt.plot(akron['IMERG'])
plt.legend()
plt.show()

 

Input files

You can download the input files for this tutorial here.

Saul Montoya

Saul Montoya es Ingeniero Civil graduado de la Pontificia Universidad Católica del Perú en Lima con estudios de postgrado en Manejo e Ingeniería de Recursos Hídricos (Programa WAREM) de la Universidad de Stuttgart con mención en Ingeniería de Aguas Subterráneas y Hidroinformática.

 

Suscribe to our online newsletter

Subscribe for free newsletter, receive news, interesting facts and dates of our courses in water resources.