Preface
Recently, I saw the implementation of the shortest path in Chapter 8.5 of the “Python Geospatial Analysis Guide” (Second Edition). It happened that some engineering problems need to be solved by this algorithm, so I delved into it. I took the code to RUN
and found that there are quite a lot of bugs. It took almost a month to solve all kinds of problems, but I can understand it. It seems that this book was published in 14 years, and networkx is iterated . There are several versions. When I checked the official website, I installed several virtual environments to find out the problem. Fortunately, it was successfully solved in the end. I wrote an article here to record the implementation and improvement of this algorithm. All the codes in this article run python Version is 3.8.
Data introduction
The algorithm mainly solves the shortest distance of the road network between two points in the city, so the Point
file of the starting point and the end point is given in the sample data, and a simplified road network shapefile polyline
is also given, as shown in the figure below. Ask to find the shortest vector route between two points.
Design thinking
First guide the package:
|
|
The functions used include the distance weight calculation function and the efficient iterative access function. Among them, the tee
function of itertools
is the most impressive . This is the first time I see that the coordinate pair can be iterated in this way. I have a lot of engineering inspiration. Both functions are available in the original book, and there are currently no bugs. Here is one piece:
|
|
read the vector data here, I use the pyshp
package of the original book, but it is not recommended to use this package. There are many bugs and it is difficult to find solutions on the Internet, but fortunately, it is lightweight and will not be changed for the time being, and then use The networkx
package defines directed graphs and iterates over each node of the shapefile, resulting in a double tuple:
|
|
Next, this place needs to be changed. Look at the original first:
|
|
Here is a very old version of networkx
where the connected_component_subgraphs function has been removed in 2.4, by looking at stackoverflow
and networkx
github issue The solution of the predecessors has been changed to the following way of writing:
|
|
It should be noted here that it is not enough to use the general connected_components
function. I don’t know the specific reason, but the error report prompts to use the connected_components of implement , so to be on the safe side, the enhanced connected graph is used directly.
Then read in the shp point file of the starting point and the ending point, which is consistent with the original writing:
|
|
At this point, the loop iterates to calculate the distance
distance. Both sg.edges_iter
and sg.edge [n0][n1][" dist "]
in the original text have been deprecated, and now need to be changed to the following:
|
|
In fact, there is no information on the Internet at present. I also corrected it by looking at the suggestions in the original function. The reader can compare the original function and what I wrote here. The results obtained by the subtle differences are often very different.
The next series of distance iterative comparisons and networkx
shortest distance function calls are basically no problem, you can use it directly:
|
|
At this point, it is almost over. path
is a list
object with coordinate pairs. In the original text, using pyshp
to output directly, it will report an error because the decimal place after the number of coordinate pairs in list
is too long. , at present, I checked the github issue
and found no solution, so I made some modifications here, and directly use the OGR
and OSR
of GDAL
to output the list
object of path
to Polyline
When it comes to shapefile
, the code is also relatively conventional, which is a common way of writing GDAL
. Here is the rest of the writing:
|
|
Although it is a bit longer than the original code, it is fortunate that it has high precision. After all, OGR
is used directly, and from an engineering perspective, this length is not a big deal.
Finally, let’s put the length of the result picture and the article.
Afterword
Although the code is only about 100 lines in total, the iteration of the version has created a lot of pits in the original code. Write it down here and keep a record. This algorithm will be used in the future to do some more in-depth engineering applications. It is worth mentioning that this networkx
package can only use the dijkstra
algorithm at present, and other algorithms such as A-star
, BFS
and DFS
are not integrated, which is a pity. Then if you want to understand the principle of dijkstra
, you have to check it yourself.