Fill Missing Precipitation Data with Artificial Intelligence (Python Keras) - Tutorial
The evaluation of hydrological processes such as evapotranspiration, runoff, routing and infiltration requires data such as precipitation, flow, temperature and radiation on a daily basis. The data required for hydrological modeling need to be accurate and complete over the period of study. Historical records from hydrological stations are often incomplete and present many gaps, which can be filled with Artificial Intelligence tools such as the Keras library in Python.
This tutorial shows the procedure to run a complete script that fills missing precipitation records at one station using data from two nearby stations. The Python script runs in a Jupyter Notebook.
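The script expects a daily CSV file with a date index and one column per station (Est1, Est2 and Est3), as read by the pd.read_csv call below. The snippet that follows is only an illustrative check of that expected layout (the sample values shown in the comments are not taken from the real dataset):

import pandas as pd

# load the daily series; the first column holds the dates and is parsed as the index
TodasEstaciones = pd.read_csv('Est1_Est2_Est3.csv', index_col=0, parse_dates=True)

# expected layout (values are illustrative only):
#                 Est1  Est2  Est3
# 2014-11-01       0.0   1.2   0.0
# 2014-11-02       5.4   NaN   3.1
print(TodasEstaciones.columns.tolist())                       # ['Est1', 'Est2', 'Est3']
print(TodasEstaciones.index.min(), TodasEstaciones.index.max())  # period covered by the records
print(TodasEstaciones.isnull().sum())                         # how many gaps each station has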
Tutorial
Code
This is the complete code for the tutorial:
# %pylab inline brings NumPy and Matplotlib names (plot, xticks, ylim, legend, figsize) into the namespace
%pylab inline
import pandas as pd
import numpy as np

# read the daily precipitation records of the three stations
TodasEstaciones = pd.read_csv('Est1_Est2_Est3.csv', index_col=0, parse_dates=True)
TodasEstaciones.head()

# plot the three stations over the period of interest
TodasEstaciones.loc['2014-11-01':'2015-03-31'].plot(subplots=True, figsize=(12, 8))
plt.legend(loc='best')
xticks(rotation='vertical')

# we create a date column to extract the week number
TodasEstaciones['date'] = TodasEstaciones.index
# apply a lambda function to the whole pandas dataframe column
TodasEstaciones['week'] = TodasEstaciones['date'].apply(lambda x: x.isocalendar()[1])
# drop the date column because we don't need it anymore
del TodasEstaciones['date']
# let's see our dataframe
TodasEstaciones.head()

# creation of a correlation plot with seaborn
import seaborn as sns
corr = TodasEstaciones.corr()
sns.heatmap(corr, xticklabels=corr.columns.values, yticklabels=corr.columns.values)

# definition of the training set: Est1, Est3 and the week number as predictors, Est2 as target
X_train = TodasEstaciones.loc['2015-01-20':'2015-03-27', ['Est1', 'Est3', 'week']].astype(np.float32).values
y_train = TodasEstaciones.loc['2015-01-20':'2015-03-27', 'Est2'].astype(np.float32).values

# import StandardScaler from sklearn.preprocessing
from sklearn.preprocessing import StandardScaler
# define the scaler on the training set
scaler = StandardScaler().fit(X_train)
# scale the training set
X_train = scaler.transform(X_train)
X_train[:20]

# build a small feed-forward network: two hidden layers and one output neuron
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(12, activation='linear', input_shape=(3,)))
model.add(Dense(8, activation='linear'))
model.add(Dense(1, activation='linear'))
model.summary()

# mean squared error loss with the Adam optimizer
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=200, verbose=0)

# predictions over the training period
y_pred = model.predict(X_train)
y_pred[:10]

# compare predicted and observed values of Est2 on the training period
plot(TodasEstaciones.loc['2015-01-20':'2015-03-27'].index, y_pred, label='Predicted')
TodasEstaciones['Est2'].loc['2015-01-20':'2015-03-27'].plot()
figsize(12, 8)
ylim(0, 40)
legend(loc='best')

# get the prediction for the period with missing Est2 records
X_missing = TodasEstaciones.loc['2014-11-23':'2015-01-11', ['Est1', 'Est3', 'week']].astype(np.float32).values
# scale the predictors with the scaler fitted on the training set
X_missing = scaler.transform(X_missing)
y_missing = model.predict(X_missing)
# 50 days between 2014-11-23 and 2015-01-11
y_missing = y_missing.reshape([50]).tolist()

# create a completed series and insert the predicted values into the gap
TodasEstaciones['Est2_Completed'] = TodasEstaciones['Est2']
TodasEstaciones.loc['2014-11-23':'2015-01-11', 'Est2_Completed'] = y_missing

# plot the original and completed series
TodasEstaciones.loc['2014-11-01':'2015-03-31', ['Est1', 'Est2', 'Est2_Completed', 'Est3']].plot(subplots=True, figsize=(15, 10))
plt.legend(loc='best')
xticks(rotation='vertical')
ylim(0, 50)
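As a quick sanity check after training, you may want to quantify how closely the network reproduces the observed Est2 values over the training window and save the completed series to disk. This is not part of the original script; it is a minimal sketch that reuses the y_pred and TodasEstaciones objects defined above, and the output file name is arbitrary:

import numpy as np
from sklearn.metrics import mean_squared_error

# compare predictions against observations on the training window (2015-01-20 to 2015-03-27)
y_obs = TodasEstaciones['Est2'].loc['2015-01-20':'2015-03-27'].values
rmse = np.sqrt(mean_squared_error(y_obs, y_pred.flatten()))
print('RMSE on the training window:', rmse)

# export the original and completed series to a new CSV file (file name is arbitrary)
TodasEstaciones[['Est1', 'Est2', 'Est2_Completed', 'Est3']].to_csv('Est2_Completed.csv')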
Input Data
You can download the input data here.