Python Network Analysis: GIS Shortest Paths

Preface

Recently, I saw the implementation of the shortest path in Chapter 8.5 of the “Python Geospatial Analysis Guide” (Second Edition). It happened that some engineering problems need to be solved by this algorithm, so I delved into it. I took the code to RUN and found that there are quite a lot of bugs. It took almost a month to solve all kinds of problems, but I can understand it. It seems that this book was published in 14 years, and networkx is iterated . There are several versions. When I checked the official website, I installed several virtual environments to find out the problem. Fortunately, it was successfully solved in the end. I wrote an article here to record the implementation and improvement of this algorithm. All the codes in this article run python Version is 3.8.

Data introduction

The algorithm mainly solves the shortest distance of the road network between two points in the city, so the Point file of the starting point and the end point is given in the sample data, and a simplified road network shapefile polyline is also given, as shown in the figure below. Ask to find the shortest vector route between two points.

Data introduction

Design thinking

First guide the package:

1
2
3
4
5


import networkx as nx
import math
from itertools import tee
import shapefiles
from osgeo import ogr

The functions used include the distance weight calculation function and the efficient iterative access function. Among them, the tee function of itertools is the most impressive . This is the first time I see that the coordinate pair can be iterated in this way. I have a lot of engineering inspiration. Both functions are available in the original book, and there are currently no bugs. Here is one piece:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


def pairwise( iterable ):
"""Returns an iterable access binary tuple
s -> (s0,s1), (s1,s2), (s2, s3), ..."""
a, b = tee( iterable )
next(b, None)
return zip(a, b)

def haversine(n0, n1):
x1, y1 = n0
x2, y2 = n1
    x_dist = math.radians (x1 - x2)
    y_dist = math.radians (y1 - y2)
y1_rad = math.radians (y1)
y2_rad = math.radians (y2)
a = math.sin ( y_dist /2)**2 + math.sin ( x_dist /2)**2 \
* math.cos (y1_rad) * math.cos (y2_rad)
c = 2 * math.asin ( math.sqrt (a))
distance = c * 6371
return distance

read the vector data here, I use the pyshp package of the original book, but it is not recommended to use this package. There are many bugs and it is difficult to find solutions on the Internet, but fortunately, it is lightweight and will not be changed for the time being, and then use The networkx package defines directed graphs and iterates over each node of the shapefile, resulting in a double tuple:

1
2
3
4
5
6


shp = " road_network.shp "
G = nx.DiGraph ()
r = shapefile.Reader ( shp )
for s in r.shapes ():
for p1, p2 in pairwise( s.points ):
        G.add_edge (tuple(p1), tuple(p2))

Next, this place needs to be changed. Look at the original first:

1

sg = list( nx.connected_component_subgraphs ( G.to_undirected ()))[0]

Here is a very old version of networkx where the connected_component_subgraphs function has been removed in 2.4, by looking at stackoverflow and networkx github issue The solution of the predecessors has been changed to the following way of writing:

1

sg = list( G.to_undirected (c) for c in nx.strongly_connected_components (G))[0]

It should be noted here that it is not enough to use the general connected_components function. I don’t know the specific reason, but the error report prompts to use the connected_components of implement , so to be on the safe side, the enhanced connected graph is used directly.

Then read in the shp point file of the starting point and the ending point, which is consistent with the original writing:

1
2
3


r = shapefile.Reader ( r"start_end.shp ")
start = r.shape (0).points[0]
end = r.shape (1).points[0]

At this point, the loop iterates to calculate the distance distance. Both sg.edges_iter and sg.edge [n0][n1][" dist "] in the original text have been deprecated, and now need to be changed to the following:

1
2
3


for n0, n1 in sg.edges ():
    dist = haversine(n0, n1)
    sg.edges [n0,n1][" dist "] = dist

In fact, there is no information on the Internet at present. I also corrected it by looking at the suggestions in the original function. The reader can compare the original function and what I wrote here. The results obtained by the subtle differences are often very different.

function content

The next series of distance iterative comparisons and networkx shortest distance function calls are basically no problem, you can use it directly:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16


nn_start = None
nn_end = None
start_delta = float("inf")
end_delta = float("inf")
for n in sg.nodes ():
    s_dist = haversine(start, n)
    e_dist = haversine(end, n)
if s_dist < start_delta :
        nn_start = n
        start_delta = s_dist
if e_dist < end_delta :
        nn_end = n
        end_delta = e_dist
        
path = nx.shortest_path (sg, source= nn_start , target= nn_end ,
weight=" dist ") #list

At this point, it is almost over. path is a list object with coordinate pairs. In the original text, using pyshp to output directly, it will report an error because the decimal place after the number of coordinate pairs in list is too long. , at present, I checked the github issue and found no solution, so I made some modifications here, and directly use the OGR and OSR of GDAL to output the list object of path to Polyline When it comes to shapefile, the code is also relatively conventional, which is a common way of writing GDAL. Here is the rest of the writing:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31


multiline = ogr.Geometry ( ogr.wkbMultiLineString )
# build geometry type: line
line = ogr.Geometry ( ogr.wkbLineString )
for point in path:
    line.AddPoint (point[0],point[1])
multiline.AddGeometry (line)
wkt = multiline.ExportToWkt ()
driver = ogr.GetDriverByName ("ESRI Shapefile")
data_source = driver.CreateDataSource (" shortest.shp ")
srs = osr.SpatialReference ()
srs.ImportFromEPSG (4326)
layer = data_source.CreateLayer ("shortest", srs , ogr.wkbLineString )

field_name = ogr.FieldDefn ("Name", ogr.OFTString )
field_name.SetWidth (14)
layer.CreateField ( field_name )

field_name = ogr.FieldDefn ("data", ogr.OFTString )
field_name.SetWidth (14)
layer.CreateField ( field_name )

feature = ogr.Feature ( layer.GetLayerDefn ())
feature.SetField ("Name", "shortest")
feature.SetField ("data", "1.2")

line = ogr.CreateGeometryFromWkt ( wkt )
feature.SetGeometry (line)
layer.CreateFeature (feature)

feature = None
data_source = None

Although it is a bit longer than the original code, it is fortunate that it has high precision. After all, OGR is used directly, and from an engineering perspective, this length is not a big deal. Finally, let’s put the length of the result picture and the article.

shortest path

Afterword

Although the code is only about 100 lines in total, the iteration of the version has created a lot of pits in the original code. Write it down here and keep a record. This algorithm will be used in the future to do some more in-depth engineering applications. It is worth mentioning that this networkx package can only use the dijkstra algorithm at present, and other algorithms such as A-star, BFS and DFS are not integrated, which is a pity. Then if you want to understand the principle of dijkstra , you have to check it yourself.