Conditional tolerance for intersection consolidation #1150

Closed · 3 tasks done
EwoutH opened this issue Mar 26, 2024 · 16 comments · Fixed by #1160

Comments

EwoutH (Contributor) commented Mar 26, 2024

Contributing guidelines

  • I understand the contributing guidelines

Documentation

  • My proposal is not addressed by the documentation or examples

Existing issues

  • Nothing similar appears in an existing issue

What problem does your feature proposal solve?

The current consolidate_intersections function in OSMnx applies a uniform tolerance across the whole network, which may not suit complex urban models where different areas require different levels of detail. For example, a dense city center often needs a lower tolerance due to its more intricate street layout, while suburban areas might warrant higher tolerances.

What is your proposed solution?

I propose enhancing the consolidate_intersections function to accept a tolerance_dict parameter. This dictionary would map node identifiers to tolerance values, allowing conditional consolidation based on node-specific criteria. If a node does not appear in this dictionary, a default tolerance would be used.

What alternatives have you considered?

One workaround is to consolidate subsets of the network separately with different tolerances and then attempt to merge them. However, this is cumbersome and can lead to complex merging issues, particularly at the boundaries of the subsets.

Additional context

A minimal example of the proposed usage:

def consolidate_intersections(G, tolerance=10, rebuild_graph=True, dead_ends=False, reconnect_edges=True, tolerance_dict=None):
    """
    The extended description of parameters...

    tolerance_dict : dict
        A dictionary mapping node identifiers to tolerance values. This enables
        applying different consolidation tolerances to different nodes.
    """
    # Rest of the existing function's code...

    if tolerance_dict:
        # New logic to apply tolerances conditionally based on tolerance_dict
        pass  # Placeholder for the new functionality

    # Rest of the existing function's code...

Example usage with the proposed feature:

# Assume G is a projected graph of an urban area
tolerance_dict = {node: 5 if node in inner_city_nodes else 15 for node in G.nodes()}
G_consolidated = ox.consolidate_intersections(G, tolerance_dict=tolerance_dict)

This example outlines how users might apply a tighter tolerance for a set of nodes (e.g., inner_city_nodes) while using a looser tolerance for the rest, making the function more flexible for many applications.

EwoutH (Contributor, Author) commented Mar 26, 2024

Just tried an implementation, and while doable, it's also quite cumbersome, especially when rebuild_graph=True. I'm curious about alternatives to this problem.

First consolidating and then merging is problematic, because it's difficult to properly connect both networks again after the fact.

EwoutH (Contributor, Author) commented Mar 26, 2024

One option would be an exclude_nodes parameter that skips consolidation for the given nodes. You could then first consolidate the nodes that require a higher tolerance, exclude those nodes, and run the consolidation process again with a lower tolerance for the remaining nodes.
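
A hypothetical sketch of that two-pass workflow (exclude_nodes does not exist; Gp is a projected graph, and inner_city_nodes / suburban_nodes are placeholder node-id sets):

import osmnx as ox

# pass 1: tight tolerance for the dense area, skipping suburban nodes
G1 = ox.consolidate_intersections(Gp, tolerance=5, exclude_nodes=suburban_nodes)

# pass 2: looser tolerance for the rest, skipping the already-consolidated
# inner-city nodes (exclude_nodes is the proposed, hypothetical parameter)
G2 = ox.consolidate_intersections(G1, tolerance=15, exclude_nodes=inner_city_nodes)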


More generally, it would be really nice for OSMnx to allow applying functions to only parts of the network. That could potentially allow many functions to be used (conditionally) on a subset of the network, and would solve use cases like the one presented in #1140.

gboeing (Owner) commented Mar 30, 2024

> Just tried an implementation, and while doable, it's also quite cumbersome, especially when rebuild_graph=True. I'm curious about alternatives to this problem.

There is no straightforward way to do this. Anything truly useful for this use case would require a custom-coded solution because of the numerous ad hoc decisions and local graph characteristics involved in meeting a specific user's analytical needs.

> More generally, it would be really nice for OSMnx to allow applying functions to only parts of the network.

The standard workflow for this in the NetworkX ecosystem is to create subgraphs, apply your function, then compose those subgraphs. OSMnx follows this. If your use case cannot easily be handled this way (as here), it suggests a deeper challenge that requires a custom solution, because a generalizable one is most likely impossible.

I'll leave some sample code here in case others come upon this thread and wonder about that subgraph-function-compose workflow:

import networkx as nx
import osmnx as ox
G = ox.graph.graph_from_place("Piedmont, CA, USA", network_type="drive")
Gp = ox.projection.project_graph(G)

# divide graph into subgraphs based on location
nodes = ox.convert.graph_to_gdfs(Gp, edges=False)
mask_west = nodes.geometry.x < nodes.geometry.unary_union.centroid.x + 100
mask_east = nodes.geometry.x > nodes.geometry.unary_union.centroid.x - 100
G_west = ox.truncate.largest_component(Gp.subgraph(nodes[mask_west].index))
G_east = ox.truncate.largest_component(Gp.subgraph(nodes[mask_east].index))

# then run some function differentially
# then nx.compose() the subgraphs back together
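
For example, those last two steps might be filled in like this (a sketch, with consolidating each subgraph at its own tolerance standing in for "some function"):

G_west_c = ox.simplification.consolidate_intersections(G_west, tolerance=5)
G_east_c = ox.simplification.consolidate_intersections(G_east, tolerance=15)
G_combined = nx.compose(G_west_c, G_east_c)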

gboeing closed this as not planned Mar 30, 2024
EwoutH (Contributor, Author) commented Apr 17, 2024

Thanks for getting back to me. I recently thought more about this, tried some things, and had a nice discussion with @anastassiavybornova.

The basic complication is that some modifications (including consolidation) alter your network. If your network is singular, this most often isn't a problem: node locations might change or some attributes might be modified, but the graph will still be connected (since those connections are saved in the networkx graph itself).

If you have multiple networks that you modify differently (like consolidating with multiple radii), that might not be the case. Nodes get moved, osmid values get merged into lists, and sometimes other attributes change. This makes merging, whether on osmid values or geospatially, difficult.

Put simply: nx.compose() often doesn't work after consolidation of subnetworks. So unfortunately, the solution proposed above doesn't resolve this issue.

If you handle this in OSMnx itself, it can remain a single network. Then moving nodes around and changing osmids isn't a problem, since how they are connected is saved in the (singular) graph.

I think the tolerance_dict that maps node identifiers to tolerance values would still be a good solution to this problem. Another option would be to allow using a node attribute (or a column in the node dataframe) to define the tolerance.

Furthermore, I think there's considerable demand for such a function. While in my use case it's a simple geospatial bound, there could also be researchers who want to consolidate different types of intersections differently. Since those might be distributed all over the network, such a use case would be even harder to merge.

So, considering:

  1. This is a difficult issue to fix outside OSMnx
  2. There is considerable demand for such a solution

I hope you would be willing to discuss this further.

(CC @martinfleis, @jdmcbr)

anastassiavybornova commented
also cc'ing @jGaboardi here

gboeing (Owner) commented Apr 17, 2024

Yes I'm open to considering this further if it's useful to the community. If a dict of per-node tolerance values would be helpful for users, would you perhaps put together some example code showing how to implement this in the codebase in an efficient way?

EwoutH (Contributor, Author) commented Apr 17, 2024

Thanks, and great to hear! Could you reopen this issue?

@anastassiavybornova would you or someone on your team want to give this a go?

gboeing reopened this Apr 17, 2024
gboeing (Owner) commented Apr 17, 2024

FWIW, just to reiterate the challenge: I had tried to implement an "adaptive" or "per-node" tolerance for intersection consolidation a couple of years ago but was consistently stymied by my inability to make it broadly generalizable and computationally efficient with a streamlined API. As I said earlier in this thread:

> There is no straightforward way to do this. Anything truly useful for this use case would require a custom-coded solution because of the numerous ad hoc decisions and local graph characteristics involved in meeting a specific user's analytical needs.

Perhaps if several people are looking at this now, there'll be some wisdom of the crowd and a clever way to push it forward. A few people have inquired about this over the years, but it's consistently been harder to implement than it appears on the surface.

EwoutH (Contributor, Author) commented Apr 17, 2024

Good to know, and thanks for the heads up.

Currently I understand the problem and implementation as follows:

  • The only thing tuned is the tolerance parameter for each node, so that the distance by which each node is buffered can be varied.

    tolerance : float
    nodes are buffered to this distance (in graph's geometry's units) and
    subsequent overlaps are dissolved into a single node

  • This basically means that instead of doing a single buffer operation in _merge_nodes_geometric, it needs to do multiple buffers on subsets of the nodes GeoDataFrame (a rough sketch follows after this list).

    # buffer nodes GeoSeries then get unary union to merge overlaps
    merged = convert.graph_to_gdfs(G, edges=False)["geometry"].buffer(tolerance).unary_union

  • Then, it should do the .unary_union as specified, and other operations can continue as is.

Am I missing something so far?
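
A rough sketch of that multi-buffer idea (a hypothetical helper; tolerances is assumed to be a dict mapping node ids to buffer distances):

from shapely.ops import unary_union

def _merge_nodes_variable(gdf_nodes, tolerances, default=10):
    # buffer each tolerance group separately, then dissolve all overlaps
    groups = gdf_nodes.groupby(gdf_nodes.index.map(lambda n: tolerances.get(n, default)))
    parts = [group["geometry"].buffer(tol).unary_union for tol, group in groups]
    return unary_union(parts)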

EwoutH (Contributor, Author) commented Apr 17, 2024

As for the API, we could go two ways:

  1. Define a tolerance_dict argument: a dict mapping node identifiers to tolerance values (as proposed above)
  2. Define a tolerance_column argument, which contains the name of a column in the nodes geodataframe that holds the tolerance values.

I'm now leaning towards the second one, because it requires passing less complex data.

martinfleis (Contributor) commented

> This basically means that instead of doing a single buffer operation in _merge_nodes_geometric, it needs to do multiple buffers on subsets of the nodes GeoDataFrame.

This can still be a single buffer call, as buffer accepts an array as a parameter. If you get a dictionary of tolerances, you just need to sort it in the same order as the nodes, get the values as a numpy array, and replace the constant tolerance with that. The rest might stay the same? I haven't seen the code in detail, but it should work like that imho.
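
For reference, a minimal standalone demonstration of that array-valued distance behavior:

import geopandas as gpd
from shapely.geometry import Point

pts = gpd.GeoSeries([Point(0, 0), Point(100, 0), Point(200, 0)])
# one buffer call, one distance per geometry
buffered = pts.buffer([5, 10, 15])
print(buffered.area.round(1))  # each area reflects its own buffer distance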

EwoutH (Contributor, Author) commented Apr 17, 2024

> This can still be a single buffer call, as buffer accepts an array as a parameter.

I didn't know that (and hadn't read the docs yet). It's already really useful to have a GeoPandas maintainer here, thanks for joining the conversation :).


In the case 1 API (dict), it could look like this:

    # (inside consolidate_intersections; assumes pandas is imported as pd)
    gdf_nodes = graph_to_gdfs(G, edges=False)

    # If tolerance_dict is provided, create a pandas Series from it with the
    # node index as the Series index. This Series will have the same length as
    # the gdf_nodes and provide tolerance values for each node. Nodes not in
    # the dict will get the default tolerance.
    if tolerance_dict:
        tolerances = pd.Series(tolerance_dict).reindex(gdf_nodes.index, fill_value=tolerance)
    else:
        # If no tolerance_dict is provided, use the default tolerance for all nodes
        tolerances = pd.Series(tolerance, index=gdf_nodes.index)

    # Buffer using the tolerances Series for variable distances
    merged = gdf_nodes['geometry'].buffer(distance=tolerances).unary_union

Case 2 (column) could look like this:

    # (assumes numpy is imported as np)
    gdf_nodes = graph_to_gdfs(G, edges=False)

    if tolerance_column and tolerance_column in gdf_nodes.columns:
        # Use the values from the tolerance_column as an array for buffering
        buffer_distances = gdf_nodes[tolerance_column].values
    else:
        # Use the default tolerance for all nodes
        buffer_distances = np.full(len(gdf_nodes), fill_value=tolerance)

    # Buffer all nodes in a single operation
    merged = gdf_nodes['geometry'].buffer(distance=buffer_distances).unary_union

I think I would find the case 2 API more reliable, since you can't mess up the node indexing by accident.

EwoutH (Contributor, Author) commented Apr 17, 2024

Usage example for case two:

# Calculate the street count for each node and assign the corresponding tolerance directly in the node's attributes
for node, count in ox.stats.streets_per_node(G).items():
    # Nodes with street count of 4 or higher get a tolerance of 5, others get 10
    G.nodes[node]['tolerance'] = 5 if count >= 4 else 10

# When calling consolidate_intersections, the modified version of the function will look for the 'tolerance' column in the nodes GeoDataFrame.
G_consolidated = ox.consolidate_intersections(G, tolerance=10, tolerance_column='tolerance')

And of course there are a lot of other things possible here.
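
For instance, a location-based variant for the geospatial-bound use case could look like this (a sketch; inner_city_polygon is a hypothetical shapely Polygon in the graph's CRS, and tolerance_column is the proposed, not-yet-existing argument):

import networkx as nx
import osmnx as ox

# flag nodes that fall inside the hypothetical inner-city polygon
nodes = ox.convert.graph_to_gdfs(G, edges=False)
inside = nodes.within(inner_city_polygon)

# assign per-node tolerances as a node attribute: 5 inside, 15 elsewhere
nx.set_node_attributes(G, inside.map({True: 5, False: 15}).to_dict(), "tolerance")
G_consolidated = ox.consolidate_intersections(G, tolerance=10, tolerance_column="tolerance")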

I can open a draft PR tomorrow. Curious what everybody thinks!

gboeing (Owner) commented Apr 18, 2024

I'm at the AAG conference so I won't be super responsive this week but I'll take a look when I can.

EwoutH (Contributor, Author) commented Apr 18, 2024

Thanks, and no worries!

I opened a PR in #1160 so others can already start testing and reviewing.

gboeing (Owner) commented Apr 25, 2024

Closed by #1160
