The dark side of Python for Hydrology / Hydrogeology and the nightmare of unmet dependencies
The near future of data processing for Hydrology / Hydrogeology is written in Python 3, and many universities and institutions are shifting from teaching C++, Matlab or Fortran to Python. The number of tools, packages, codes and IPython notebooks available for processing and analyzing water-related data is surprising, and many of them make the analysis highly reproducible.
However, working with Python is not always sunny, especially on Windows. After your first steps in plotting hydrological series, running statistical analyses or even training a neural network, you might want to do more with Python, and that is when the situation can become unbearable.
The unavoidable nature of change
Psychologists say that life happens, and what happens is something called change. Python is very much alive in this sense. Years ago the issue was migrating from Python 2 to Python 3; now the issue is finding out whether a package is compatible with Python 3.5, Python 3.6 or Python 3.7.
The pace of change in Python and QGIS versions, together with the lack of backward compatibility, turns every upgrade into a kind of “Guillotine” that leaves previous code and packages unusable. Of course I am exaggerating, but I guess you get the point.
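Before installing a package, it helps to check which Python version is actually running and whether it falls inside the range the package supports. The sketch below is a minimal illustration; the function name python_in_range and the 3.5 to 3.6 range are hypothetical, chosen only to mirror the kind of constraint described above.

```python
import sys


def python_in_range(minimum, maximum, version=None):
    """Return True if a (major, minor) Python version lies within [minimum, maximum].

    `version` defaults to the interpreter that is running this script.
    """
    if version is None:
        version = sys.version_info[:2]
    return tuple(minimum) <= tuple(version[:2]) <= tuple(maximum)


if __name__ == "__main__":
    # Hypothetical constraint: a package that only supports Python 3.5 to 3.6
    current = "{}.{}".format(*sys.version_info[:2])
    if python_in_range((3, 5), (3, 6)):
        print(f"Python {current} is inside the supported range")
    else:
        print(f"Python {current} is outside the supported range 3.5-3.6")
```

Comparing (major, minor) tuples avoids the classic mistake of comparing version strings, where "3.10" sorts before "3.6".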
Real stories about failure
Surely everyone has their own story, but here are some real stories of failure:
You can struggle for 2 hours trying to install a hydrological delineation package like Pysheds without success, just because the procedure you found on the internet works for Windows 10 but you are on Windows 7.
The Gempy package is amazing for geological modeling, with great 3D visualization tools based on VTK, but the VTK package is not compatible with Python 3.7, so you have to look for an older Anaconda distribution that supports Python 3.6.
Rasterio provides great tools for gridded spatial data processing, analysis and representation. Currently neither pip nor Conda can install it on an Anaconda distribution, so you have to use an unofficial Python binary wheel compiled by a nice guy from a fluorescence laboratory.
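A quick way to see which of these packages are present in the current environment, and in which version, is to query the standard library instead of importing each package (importing a broken install can itself fail). This is a sketch assuming Python 3.8 or later, where importlib.metadata is available; the function name dependency_report is hypothetical, and it assumes the import name and the distribution name of each package are the same.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def dependency_report(packages):
    """Map each package name to its installed version, or None if it is missing.

    Assumes the import name matches the distribution name (true for the
    packages mentioned in this post, but not for every package).
    """
    report = {}
    for name in packages:
        # find_spec locates the module without executing its code
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Importable, but no distribution metadata under the same name
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for pkg, ver in dependency_report(["pysheds", "gempy", "vtk", "rasterio"]).items():
        print(f"{pkg:10s} {ver if ver else 'NOT INSTALLED'}")
```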
Solutions that are not solutions
We are not Python core developers, but we understand that this problem has always existed, so there are some partial solutions for unlocking the full power of Python in water-related topics:
Conda: a package, dependency and environment manager. It finds a package and installs it on any operating system (Windows, macOS, Linux). If the package is not available in the default Conda channels, you can use conda-forge, a community-maintained collection of recipes. However, if the recipe is outdated, your package cannot be installed.
Docker: a higher-level solution, and the alternative that Gempy offers for using the package. Docker is a platform to build and run an application on any computer; it creates a layer on top of your operating system where your application is deployed and run. The problem is that support for Windows versions other than Windows 10 is limited, and it requires turning on virtualization options in the operating system and in the BIOS, steps that are not easy for a non-geek or a typical water resources specialist.
Digital Ocean: or any other cloud infrastructure provider such as Amazon AWS, Google Cloud, Rackspace, etc. You can set up a Linux machine in the cloud, even with Docker, and meet the specific requirements that your package needs. However, this comes with a monthly cost.
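With Conda, the usual workaround for a package stuck on an older Python (like VTK on Python 3.6 above) is a separate environment with a pinned interpreter. The sketch below builds and runs a `conda create` call from Python; the helper names and the environment name "hydro-py36" are hypothetical, and whether a given Python/package combination actually resolves depends on what the channel publishes at the time.

```python
import shutil
import subprocess


def conda_create_command(env_name, python_version, packages,
                         channel="conda-forge", executable="conda"):
    """Build the argument list for a `conda create` call with a pinned Python."""
    return [
        executable, "create",
        "--name", env_name,
        "--channel", channel,
        f"python={python_version}",
        *packages,
        "--yes",  # do not ask for confirmation
    ]


def create_env(env_name, python_version, packages, channel="conda-forge"):
    """Create the environment, failing early if conda is not on the PATH."""
    conda_exe = shutil.which("conda")
    if conda_exe is None:
        raise FileNotFoundError("conda was not found on the PATH")
    cmd = conda_create_command(env_name, python_version, packages,
                               channel, executable=conda_exe)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    print(" ".join(conda_create_command("hydro-py36", "3.6", ["vtk", "rasterio"])))
```

Running the script only prints the command, for example `conda create --name hydro-py36 --channel conda-forge python=3.6 vtk rasterio --yes`, so it can be reviewed before calling create_env.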
A more reasonable solution
Last week we researched the Point Data Abstraction Library (PDAL), which is something like the geospatial library GDAL but for LiDAR or drone point clouds in LAS format. The library has binary distributions for Linux and Windows. By calling its command-line tools from Python with the subprocess module, we could use all the tools of the library without the usual hassle and frustration.
One solution could be to compile advanced packages as executables for every operating system, driven by command-line arguments, so that they work the same way everywhere.
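A minimal sketch of the subprocess approach, assuming the `pdal` executable from a binary distribution is on the PATH. It wraps the `pdal info --summary` command, which prints its result as JSON; the helper names are hypothetical and the file name "survey.las" in the usage line is only a placeholder.

```python
import json
import shutil
import subprocess


def pdal_command(*args, executable="pdal"):
    """Build the argument list for a PDAL command-line call."""
    return [executable, *args]


def run_pdal(*args, executable="pdal"):
    """Run a PDAL command and return its standard output as text.

    Raises FileNotFoundError if the executable is not found, and
    CalledProcessError if the command exits with a non-zero code.
    """
    exe = shutil.which(executable)
    if exe is None:
        raise FileNotFoundError(f"'{executable}' was not found on the PATH")
    result = subprocess.run(
        pdal_command(*args, executable=exe),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def las_summary(las_path, executable="pdal"):
    """Parse the JSON that `pdal info --summary` prints for a point cloud file."""
    return json.loads(run_pdal("info", "--summary", las_path, executable=executable))
```

Usage would be `summary = las_summary("survey.las")`, after which `summary` is an ordinary Python dictionary. Python only passes strings in and reads text out, so no compiled Python bindings have to match the interpreter version.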
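To illustrate what "driven by command-line arguments" could look like, here is a hypothetical tool that summarizes a series of discharge values. The tool and its arguments are invented for this example; the point is the interface. A script like this could then be frozen per operating system with a tool such as PyInstaller (for example `pyinstaller --onefile flow_summary.py`, run once on each target system), after which other programs call it exactly as we call PDAL.

```python
import argparse
import statistics


def build_parser():
    parser = argparse.ArgumentParser(
        description="Hypothetical tool: summarize a series of discharge values."
    )
    parser.add_argument("values", nargs="+", type=float,
                        help="discharge values, e.g. in m3/s")
    return parser


def main(argv=None):
    """Parse arguments (from sys.argv by default), print and return a summary."""
    args = build_parser().parse_args(argv)
    summary = {
        "count": len(args.values),
        "mean": statistics.mean(args.values),
        "max": max(args.values),
    }
    for key, value in summary.items():
        print(f"{key}: {value}")
    return summary


if __name__ == "__main__":
    main()
```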
More solutions are definitely needed to overcome this dark side of Python.