Unary Logical Graph Operators
This section provides an overview of unary graph-to-graph operators, i.e. operators that consume a single graph as input.
Unary Logical Graph Operators |
---|
Aggregation |
Pattern Matching |
Transformation |
Property transformation |
Grouping |
KeyedGrouping |
Subgraph |
Sampling |
Group by RollUp |
Split |
Aggregation
In general, aggregation operators combine N values into a single one. In Gradoop, they reduce some piece of information contained in a graph down to a single value. The aggregate operator is implemented as the aggregate() function of the LogicalGraph class. It takes any number of aggregate functions as input and returns the original graph with the result of each aggregate function attached as a new graph head property. Most aggregate functions require a string denoting the name of the property or label they are applied to; in the following, it is referred to as [propertyName] or [labelName].
Gradoop provides the following aggregate functions:
Function | Description | Returns |
---|---|---|
MinProperty | Minimum of a specified property over all edges and vertices | LogicalGraph with the aggregated property min_[propertyName]: [value] |
MinEdgeProperty | Minimum of a specified property over all edges | LogicalGraph with the aggregated property min_[propertyName]: [value] |
MinVertexProperty | Minimum of a specified property over all vertices | LogicalGraph with the aggregated property min_[propertyName]: [value] |
MaxProperty | Maximum of a specified property over all edges and vertices | LogicalGraph with the aggregated property max_[propertyName]: [value] |
MaxEdgeProperty | Maximum of a specified property over all edges | LogicalGraph with the aggregated property max_[propertyName]: [value] |
MaxVertexProperty | Maximum of a specified property over all vertices | LogicalGraph with the aggregated property max_[propertyName]: [value] |
SumProperty | Sum of a specified property over all edges and vertices | LogicalGraph with the aggregated property sum_[propertyName]: [value] |
SumEdgeProperty | Sum of a specified property over all edges | LogicalGraph with the aggregated property sum_[propertyName]: [value] |
SumVertexProperty | Sum of a specified property over all vertices | LogicalGraph with the aggregated property sum_[propertyName]: [value] |
AverageProperty | Arithmetic mean of a specified property over all edges and vertices | LogicalGraph with the aggregated property avg_[propertyName]: [double] |
AverageEdgeProperty | Arithmetic mean of a specified property over all edges | LogicalGraph with the aggregated property avg_[propertyName]: [double] |
AverageVertexProperty | Arithmetic mean of a specified property over all vertices | LogicalGraph with the aggregated property avg_[propertyName]: [double] |
Count | Element count of a graph | LogicalGraph with the aggregated property count: [long] |
EdgeCount | Edge count of a graph | LogicalGraph with the aggregated property edgeCount: [long] |
VertexCount | Vertex count of a graph | LogicalGraph with the aggregated property vertexCount: [long] |
HasLabel | Checks for the presence of an edge or vertex label in a graph (can also be used as a filter) | LogicalGraph with the aggregated property hasLabel_[labelName]: [bool] |
HasEdgeLabel | Checks for the presence of an edge label in a graph (can also be used as a filter) | LogicalGraph with the aggregated property hasEdgeLabel_[labelName]: [bool] |
HasVertexLabel | Checks for the presence of a vertex label in a graph (can also be used as a filter) | LogicalGraph with the aggregated property hasVertexLabel_[labelName]: [bool] |
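Several aggregate functions can be combined in a single aggregate() call. The following minimal sketch (assuming the quickstart graph n1) attaches the minimum and maximum age, the vertex count and a label check to the graph head; the resulting property names follow the table above:
// Sketch: combine multiple aggregate functions in one call on n1
LogicalGraph aggregated = n1.aggregate(
  new MinVertexProperty("age"),   // adds min_age
  new MaxVertexProperty("age"),   // adds max_age
  new VertexCount(),              // adds vertexCount
  new HasVertexLabel("Person"));  // adds hasVertexLabel_Person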
Consider the social graph from our quickstart section. We can use the aggregate operator to gain further insights into the information contained in that graph. For example, we might be interested in the mean age of all persons. To achieve that we would
- extract a subgraph containing only the vertices labeled "Person",
- aggregate the sum of all the persons' ages,
- perform a graph head transformation that adds the quotient of the summed ages and the total number of persons to the graph head.
// Given the graph n1 from quickstart, we can aggregate the age values and output the result
SumVertexProperty sumPropertyAge = new SumVertexProperty("age");
LogicalGraph meanAgeAggregation =
n1
.subgraph(v -> v.getLabel().equals("Person"), e -> true)
.aggregate(sumPropertyAge, new VertexCount())
.transformGraphHead((currentGraphHead, transformedGraphHead) -> {
int sumAge = currentGraphHead.getPropertyValue("sum_age").getInt();
long numPerson = currentGraphHead.getPropertyValue("vertexCount").getLong();
currentGraphHead.setProperty("meanAge", (double) sumAge / numPerson);
return currentGraphHead;
});
The resulting graph has the following structure:
n1:graph { meanAge: 30.8, vertexCount: 5, sum_age: 154 } [
(p1:Person {name: "Bob", age: 24})-[:friendsWith]->(p2:Person{name: "Alice", age: 30})-[:friendsWith]->(p1)
(p2)-[:friendsWith]->(p3:Person {name: "Jacob", age: 27})-[:friendsWith]->(p2)
(p3)-[:friendsWith]->(p4:Person{name: "Marc", age: 40})-[:friendsWith]->(p3)
(p4)-[:friendsWith]->(p5:Person{name: "Sara", age: 33})-[:friendsWith]->(p4)
]
Pattern Matching
In graph analytics, pattern matching is one of the most important and challenging tasks. In Gradoop, pattern matching can be achieved either by defining a query graph using GDL or by executing a Cypher query. While the former method is straightforward and simple to use, the Cypher query language is more powerful. In the following we showcase the usage and capabilities of both approaches.
The LogicalGraph class implements the method query to apply GDL queries to the data graph. The method returns a GraphCollection containing the matching subgraphs. Internally, an engine is applied that uses vertex homomorphism and edge isomorphism as default morphism strategies. As pattern matching is the purpose of this method, only the MATCH and WHERE keywords of the Cypher query language are supported. The vertex and edge data of the matching data graph elements is attached to the result. In its simpler forms, the method uses no statistics about the data graph, which may result in poor runtime performance; use GraphStatistics to provide statistics for the query planner.
The query method is overloaded. The following table describes each variant in detail:
Signature | Description |
---|---|
query(String query) | Simplest form; works as described above. |
query(String query, String constructionPattern) | Allows the creation of new graph elements based on variable bindings of the match pattern. |
query(String query, GraphStatistics graphStatistics) | Allows the use of existing graph statistics by the query planner. |
query(String query, String constructionPattern, GraphStatistics graphStatistics) | Allows both the creation of new graph elements and the use of existing graph statistics. |
query(String query, boolean attachData, MatchStrategy vertexStrategy, MatchStrategy edgeStrategy, GraphStatistics graphStatistics) | If attachData is set to true, the original vertex and edge data is attached to the result. Possible match strategies are ISOMORPHISM and HOMOMORPHISM. |
query(String query, String constructionPattern, boolean attachData, MatchStrategy vertexStrategy, MatchStrategy edgeStrategy, GraphStatistics graphStatistics) | Same as above, with the additional possibility to define a construction pattern. |
Consider the social graph from the quickstart. Say we are interested in analyzing Marc's circle of friends. Using the query method, we can define a pattern which describes the relation "friend of Marc":
String gdlQuery = "MATCH (a:Person)-[e0:friendsWith]->(b:Person)-[e1:friendsWith]->(c:Person)," +
" (c)-[e2:friendsWith]->(b)-[e3:friendsWith]->(a)" +
" WHERE a <> c AND b.name = \"Marc\"";
GraphCollection friendsOfMarc = n1.query(gdlQuery);
The query above results in the following graph collection:
gc[
g0:graph[
(a:Person {name: Sara, age: 33})-[:friendsWith]->(b:Person {name: Marc, age:40})-[:friendsWith]->(c:Person {name: Jacob, age:27}),
(c)-[:friendsWith]->(b)-[:friendsWith]->(a)
],
g1:graph[
(a:Person {name: Jacob, age: 27})-[:friendsWith]->(b:Person {name: Marc, age:40})-[:friendsWith]->(c:Person {name: Sara, age:33}),
(c)-[:friendsWith]->(b)-[:friendsWith]->(a)
]
]
A useful feature of the query method is the ability to define new graph elements based on variable bindings of the match pattern. Let's define a new relation between Marc's friends called 'acquaintance'.
Note: it is necessary to define the new element with an assigned variable. This could change in the future.
String gdlQuery = "MATCH (a:Person)-[e0:friendsWith]->(b:Person)-[e1:friendsWith]->(c:Person)," +
" (c)-[e2:friendsWith]->(b)-[e3:friendsWith]->(a)" +
" WHERE a <> c AND b.name = \"Marc\"";
String constructionPattern = "(b)<-[e0]-(a)-[e_new0:acquaintance]->(c)<-[e1]-(b)," +
" (c)-[e_new1:acquaintance]->(a)," +
" (c)-[e2]->(b)-[e3]->(a)";
GraphCollection acquaintances = n1.query(gdlQuery, constructionPattern);
Resulting graph collection:
gc[
g0:graph[
(a:Person {name: Sara, age: 33})-[:friendsWith]->(b:Person {name: Marc, age:40})-[:friendsWith]->(c:Person {name: Jacob, age:27}),
(c)-[:friendsWith]->(b)-[:friendsWith]->(a),
(a)-[:acquaintance]->(c)-[:acquaintance]->(a)
],
g1:graph[
(a:Person {name: Jacob, age: 27})-[:friendsWith]->(b:Person {name: Marc, age:40})-[:friendsWith]->(c:Person {name: Sara, age:33}),
(c)-[:friendsWith]->(b)-[:friendsWith]->(a),
(c)-[:acquaintance]->(a)-[:acquaintance]->(c)
]
]
CAPF (Cypher for Apache Flink) aims to bring the openCypher grammar to Gradoop. The operator is currently a work in progress, with the repository available here. CAPF is based on Neo4j's Morpheus project.
The CAPFQuery class enables the execution of a Cypher query on a LogicalGraph via the CAPF API. When instantiating a CAPFQuery, it is always necessary to provide a Cypher query string as well as the current ExecutionEnvironment. Optionally, one can also provide a MetaData object in order to speed up execution.
For a detailed discussion of the Cypher syntax, please look here.
Currently supported clauses:
Clause |
---|
MATCH |
RETURN |
UNION |
Also, all aggregating functions are supported. A detailed discussion of them can be found here.
The CAPFQueryResult class is a wrapper containing the results of a CAPF query. Since the executed Cypher query can result in either a GraphCollection or a Flink Table, this class provides methods to access both representations of the data. Handling a Flink Table is a bit messy at the moment, but there will be better support for it in Gradoop in the near future.
The following examples operate on the combination of the LogicalGraphs n1 and n2 from our quickstart example. The variable env refers to the current ExecutionEnvironment.
Simple matching example
In order to find out who Marc's second-degree friends are, one could write a Cypher query as follows:
String cypherQuery = "MATCH (a:Person {name: \"Marc\"})-[:friendsWith]->(b:Person)-[:friendsWith]->(c:Person)" +
" WHERE a <> c" +
" RETURN c";
// `env` refers to the execution environment
CAPFQuery query = new CAPFQuery(cypherQuery, env);
// `nc` refers to the combination of the LogicalGraph's n1 and n2 from our quickstart
CAPFQueryResult result = query.execute(nc);
GraphCollection resultCollection = result.getGraphs();
The resulting GraphCollection looks as follows:
g0:graph[
(v0:Person {name: "Mike"})
],
g1:graph[
(v1:Person {name: "Alice"})
]
UNION example
String cypherQuery = "MATCH (v:Person) RETURN v.name AS name UNION MATCH (v:Company) RETURN v.name AS name"
CAPFQueryResult result = new CAPFQuery(cypherQuery, env).execute(nc);
// this query does not result in a graph so we need to access the data via Flink's Table API
BatchTableEnvironment tableEnvironment = (BatchTableEnvironment) result.getTable().tableEnv();
DataSet<Row> resultDataSet = tableEnvironment.toDataSet(result.getTable(), TypeInformation.of(Row.class)).javaSet();
List<Row> resultList = resultDataSet.collect();
System.out.println(resultList);
The code results in the following output: [Jil, Mike, Acme Corp, Globex Inc., Marc, Bob, Paul, Sara, Alice, Jacob]
Aggregating example
Compute the average age of all persons in our graph:
String cypherQuery = "MATCH (v:Person) RETURN avg(v.age)"
CAPFQueryResult result = new CAPFQuery(cypherQuery, env).execute(nc);
// this query does not result in a graph so we need to access the data via Flink's Table API
BatchTableEnvironment tableEnvironment = (BatchTableEnvironment) result.getTable().tableEnv();
DataSet<Row> resultDataSet = tableEnvironment.toDataSet(result.getTable(), TypeInformation.of(Row.class)).javaSet();
List<Row> resultList = resultDataSet.collect();
System.out.println(resultList);
The code results in the following output: [30]
Transformation
By using the transformation operator on a LogicalGraph, one can alter the elements of a given graph. This alteration can affect the graph head, the vertices or the edges. One can also think of a Gradoop transformation as a modification of a LogicalGraph that changes an EPGM element's data, but neither its identity nor the topological structure of the graph.
Every transformation method of the LogicalGraph expects a TransformationFunction as input. This interface defines a method called apply. The method takes the current version of the element and a copy of that element as input. The copy is initialized with the current structural information (i.e. identifiers, graph membership, source/target identifiers). The implementation can transform the element either by updating the current version and returning it, or by adding the necessary information to the new entity and returning that.
The following transformation functions are defined as methods of the LogicalGraph class:
Function | Description | Expects |
---|---|---|
transform | Transforms the elements of the logical graph using the given transformation functions. | TransformationFunction<GraphHead>, TransformationFunction<Vertex>, TransformationFunction<Edge> |
transformGraphHead | Transforms the graph head of the logical graph using the given transformation function. | TransformationFunction<GraphHead> |
transformVertices | Transforms the vertices of the logical graph using the given transformation function. | TransformationFunction<Vertex> |
transformEdges | Transforms the edges of the logical graph using the given transformation function. | TransformationFunction<Edge> |
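For instance, transformEdges can be used to add or update a property on every edge. The following is a minimal sketch on the quickstart graph n1; the checked property is made up for illustration:
// Sketch: add a hypothetical "checked" flag to every edge of n1
LogicalGraph edgesTransformed = n1.transformEdges((currentEdge, transformedEdge) -> {
  currentEdge.setProperty("checked", true);
  return currentEdge;
});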
Consider the social graph from the quickstart example. Suppose every person in the graph attended the same university. We can add this new information to the graph like this:
LogicalGraph n1WithUni = n1
.subgraph(v -> v.getLabel().equals("Person"), e -> true)
.transformVertices((currentVertex, transformedVertex) -> {
currentVertex.setProperty("university", "Leipzig University");
return currentVertex;
});
The resulting graph has the following structure:
g1:graph[
(p1:Person {name: "Bob", age: 24, university: "University Leipzig"})-[:friendsWith]->(p2:Person{name: "Alice", age: 30, university: "University Leipzig"})-[:friendsWith]->(p1)
(p2)-[:friendsWith]->(p3:Person {name: "Jacob", age: 27, university: "University Leipzig"})-[:friendsWith]->(p2)
(p3)-[:friendsWith]->(p4:Person{name: "Marc", age: 40, university: "University Leipzig"})-[:friendsWith]->(p3)
(p4)-[:friendsWith]->(p5:Person{name: "Sara", age: 33, university: "University Leipzig"})-[:friendsWith]->(p4)
]
Property transformation
The property transformation operator is a convenience function for a transformation of properties. This alteration can affect the properties of the graph head, the vertices or the edges.
Every property transformation method of the LogicalGraph expects a PropertyTransformationFunction as input. This interface defines a method called execute. The method takes the current version of the PropertyValue as input; the implementation transforms the PropertyValue by updating the current version and returning it.
The following property transformation functions are defined as methods of the LogicalGraph class:
Function | Description | Expects |
---|---|---|
transformGraphHeadProperties | Transforms the value of a graph head property matching the specified property key using the given transformation function. | String propertyKey, PropertyTransformationFunction |
transformVertexProperties | Transforms the value of a vertex property matching the specified property key using the given transformation function. | String propertyKey, PropertyTransformationFunction |
transformEdgeProperties | Transforms the value of an edge property matching the specified property key using the given transformation function. | String propertyKey, PropertyTransformationFunction |
Consider the social graph from the quickstart example. Suppose we want to round the age of every person in the graph to the nearest ten. We can update the age property like this:
LogicalGraph n1WithRoundedAge = n1
.subgraph(v -> v.getLabel().equals("Person"), e -> true)
.callForGraph(new PropertyTransformation<>("age", null,
  // round the age property value to the nearest ten
  property -> {
    property.setLong(Math.round(property.getLong() / 10.0) * 10);
    return property;
  }, null));
The resulting graph has the following structure:
g1:graph[
(p1:Person {name: "Bob", age: 20})-[:friendsWith]->(p2:Person{name: "Alice", age: 30})-[:friendsWith]->(p1)
(p2)-[:friendsWith]->(p3:Person {name: "Jacob", age: 30})-[:friendsWith]->(p2)
(p3)-[:friendsWith]->(p4:Person{name: "Marc", age: 40})-[:friendsWith]->(p3)
(p4)-[:friendsWith]->(p5:Person{name: "Sara", age: 30})-[:friendsWith]->(p4)
]
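Alternatively, the same result can be obtained with the transformVertexProperties convenience method listed above. A minimal sketch, assuming the PropertyTransformationFunction can be given as a lambda:
// Sketch: round the "age" property of all Person vertices using the convenience method
LogicalGraph n1WithRoundedAge = n1
  .subgraph(v -> v.getLabel().equals("Person"), e -> true)
  .transformVertexProperties("age", property -> {
    property.setLong(Math.round(property.getLong() / 10.0) * 10);
    return property;
  });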
Grouping
The grouping operator achieves a structural grouping of vertices and edges to condense a graph and thus helps to uncover insights about patterns and statistics hidden in it.
Gradoop defines two strategies applicable for grouping: GROUP_REDUCE and GROUP_COMBINE. By default, GROUP_REDUCE is used. In some cases with a bad distribution of vertex grouping keys, it may be desirable to combine vertex tuples using a CombineGroup transformation to potentially reduce network traffic. This behaviour is represented by the GROUP_COMBINE strategy. For more information see this.
For grouping, any of the following functions can be applied to a LogicalGraph object.
Function | Description | Expects |
---|---|---|
groupBy | Groups vertices based on the given vertex grouping keys, which can be property keys or labels. | List<String> vertexGroupingKeys |
groupBy | Groups vertices and edges based on the given grouping keys, which can be property keys or labels. | List<String> vertexGroupingKeys, List<String> edgeGroupingKeys |
groupBy | Groups vertices and edges based on the given grouping keys, which can be property keys or labels. Additionally, aggregates are calculated using the given aggregate functions (e.g. MaxProperty, MinProperty, SumProperty, Count) and a grouping strategy has to be chosen. | List<String> vertexGroupingKeys, List<AggregateFunction> vertexAggregateFunctions, List<String> edgeGroupingKeys, List<AggregateFunction> edgeAggregateFunctions, GroupingStrategy groupingStrategy |
Consider the social graph from the quickstart example. For our grouping example, we first extract all vertices of type Person and then calculate a rounded age which we will use for grouping:
LogicalGraph preprocessedGraph = n1
.vertexInducedSubgraph(new ByLabel<>("Person"))
.transformVertices((currentVertex, transformedVertex) -> {
int age = currentVertex.getPropertyValue("age").getInt();
currentVertex.setProperty("age_rounded", (int) MathUtils.round(age, -1));
return currentVertex;
});
LogicalGraph summarizedGraph = preprocessedGraph
.groupBy(Arrays.asList(Grouping.LABEL_SYMBOL, "age_rounded"),
singletonList(new Count("count")),
singletonList(Grouping.LABEL_SYMBOL),
singletonList(new Count("count")),
GroupingStrategy.GROUP_REDUCE);
The resulting graph is shown below the next example.
Alternatively you can use the GroupingBuilder which offers more options. The following example leads to the same result as the traditional grouping operation shown above.
// Perform grouping on preprocessedGraph
LogicalGraph summarizedGraph = new Grouping.GroupingBuilder()
.setStrategy(GroupingStrategy.GROUP_REDUCE)
.addVertexGroupingKey("age_rounded")
.useVertexLabel(true)
.useEdgeLabel(true)
.addVertexAggregateFunction(new Count())
.addEdgeAggregateFunction(new Count())
.<...>build()
.execute(preprocessedGraph);
The resulting graph has the following structure:
Imagine we had not removed the vertices that are not of type Person in the preprocessing step, but still want to group the Person vertices by the age_rounded property. For this job we can define a VertexLabelGroup, as the following example shows:
// Perform grouping on preprocessedGraph
LogicalGraph summarizedGraph = new Grouping.GroupingBuilder()
.setStrategy(GroupingStrategy.GROUP_REDUCE)
.useVertexLabel(true)
.useEdgeLabel(true)
.addVertexLabelGroup("Person", "PersonAge", Arrays.asList("age_rounded"), Arrays.asList(new Count()))
.<...>build()
.execute(preprocessedGraph);
This would lead to the following resulting graph:
For adding a VertexLabelGroup you can use one of the following four functions:
Function | Description | Expects |
---|---|---|
addVertexLabelGroup | Groups vertices with the specified label based on the given grouping keys, which can be property keys or labels. | String label, List<String> groupingKeys |
addVertexLabelGroup | Groups vertices with the specified label based on the given grouping keys, which can be property keys or labels. Super vertices are created using the supplied super vertex label. | String label, String superVertexLabel, List<String> groupingKeys |
addVertexLabelGroup | Groups vertices with the specified label based on the given grouping keys, which can be property keys or labels, and calculates aggregates using the supplied aggregate functions. | String label, List<String> groupingKeys, List<AggregateFunction> aggregateFunctions |
addVertexLabelGroup | Groups vertices with the specified label based on the given grouping keys, which can be property keys or labels, and calculates aggregates using the supplied aggregate functions. Super vertices are created using the supplied super vertex label. | String label, String superVertexLabel, List<String> groupingKeys, List<AggregateFunction> aggregateFunctions |
This can be applied analogously to edges using an EdgeLabelGroup, as sketched below.
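A minimal sketch, assuming addEdgeLabelGroup mirrors the addVertexLabelGroup variants shown above; the edge property since is made up for illustration:
// Sketch: group "friendsWith" edges by a hypothetical "since" property
LogicalGraph edgeGrouped = new Grouping.GroupingBuilder()
  .setStrategy(GroupingStrategy.GROUP_REDUCE)
  .useVertexLabel(true)
  .useEdgeLabel(true)
  .addEdgeLabelGroup("friendsWith", Arrays.asList("since"), Arrays.asList(new Count()))
  .<...>build()
  .execute(preprocessedGraph);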
KeyedGrouping
The keyed grouping operator provides an alternative implementation of graph grouping that can be used to gain insight into the structure of a graph. While the Grouping operator only supports grouping on labels, properties and label groups, KeyedGrouping allows any attribute (or derived attribute) of an element to be used for grouping. This is achieved through the use of user-definable key-functions.
Key-functions extract a key from an element. This key may have any type supported by Flink. Elements that have the same key according to all key-functions set on the operator are reduced to a single super-element. Users may define their own key-functions by implementing the KeyFunction interface. Some key-functions are also provided by Gradoop; they may be instantiated via the GroupingKeys factory class.
Currently the following key-functions are provided:
Function | Description | Parameters | Key type |
---|---|---|---|
label ¹ | Extracts the label of an element. | None. | String |
labelSpecific | Extracts different keys for elements, based on the label of the element. | A Map mapping labels to Lists of key-functions and (optionally) a Map from old label to new label, which may be used to change the labels of elements. | A Tuple containing an internal reference to the label of the element and the values of the key-functions defined for that label. |
nothing ¹ | Extracts a constant value for each element. This results in all elements having the same key, which may be used to reduce all elements to a single element. | None. | Boolean |
property ¹ | Extracts a property from each element. | The property key. | byte[] (the internal representation of a PropertyValue) |
Note: labelSpecific only supports key-functions that provide a default value. Functions annotated with ¹ are examples of such functions.
The KeyedGrouping operator has to be called using the callForGraph method. It may be instantiated using the constructor of the KeyedGrouping class or via the createInstance methods of KeyedGroupingUtils; the latter provides compatibility with the older Grouping operator.
The constructor takes the following parameters:
- A List of vertex grouping keys (KeyFunctions)
- A List of vertex aggregate functions (AggregateFunctions)
- A List of edge grouping keys (KeyFunctions)
- A List of edge aggregate functions (AggregateFunctions)
All of these parameters are optional and may be null or empty lists.
// Assuming the same input as the Grouping example.
List<KeyFunction<EPGMVertex, ?>> vertexKeys = Arrays.asList(
GroupingKeys.label(), GroupingKeys.property("age_rounded"));
LogicalGraph summarizedGraph = preprocessedGraph.callForGraph(new KeyedGrouping<>(
vertexKeys,
singletonList(new Count("count")),
singletonList(GroupingKeys.label()),
singletonList(new Count("count"))));
Taking the same input graph as with the Grouping example and running the same preprocessing steps, the result should be identical to that example.
When users want to use a custom key, they have to implement the KeyFunction interface. This interface contains the following methods:
Name | Description | Parameters | Return value |
---|---|---|---|
getKey | Extracts the key object from an element. | The element from which the key is extracted. | The key object (may not be null). |
getType | Gets type information about the keys extracted by the getKey method. | None. | A TypeInformation object used by Flink to determine the type of extracted keys. |
addKeyToElement | Adds an extracted key back to an element. This method is called on super-elements to store the key of the group on the super-element. | An element and a key. | Nothing (void). |
Note: Implementing the addKeyToElement method is optional. A default implementation, which simply discards the key and leaves the element unchanged, is provided by the interface. It is, however, recommended to override this method in order to be able to identify element groups in the resulting summary graph of the KeyedGrouping operator.
To support label-specific grouping with the labelSpecific key-function, KeyFunctionWithDefaultValue has to be implemented instead. This interface extends the KeyFunction interface by only one method, namely getDefaultKey. This method has no parameters and returns the default key used for all elements. This key has to be of the same type as the one returned by getKey and may also not be null.
The grouping example above uses a preprocessing step before executing the grouping step. The same result may be achieved by defining a custom key-function:
// Define the key-function.
KeyFunction<EPGMVertex, Integer> customKey = new KeyFunction<EPGMVertex, Integer>() {
  @Override public Integer getKey(EPGMVertex v) {
    return (int) MathUtils.round(v.getPropertyValue("age").getInt(), -1);
  }
  @Override public TypeInformation<Integer> getType() {
    return BasicTypeInfo.INT_TYPE_INFO;
  }
  @Override public void addKeyToElement(EPGMVertex element, Object key) {
    element.setProperty("age_rounded", PropertyValue.create(key));
  }
};
// This key function extracts the property "age" and rounds it.
// The rounded property is stored as "age_rounded" on the super-vertex.
List<KeyFunction<EPGMVertex, ?>> vertexKeys = Arrays.asList(
GroupingKeys.label(), customKey);
LogicalGraph summarizedGraph = n1
.vertexInducedSubgraph(new ByLabel<>("Person"))
.callForGraph(new KeyedGrouping<>(
vertexKeys,
singletonList(new Count("count")),
singletonList(GroupingKeys.label()),
singletonList(new Count("count"))));
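If this custom key should also be usable inside labelSpecific, the KeyFunctionWithDefaultValue interface can be implemented instead. A minimal sketch; the default key of 0 is an assumption chosen for illustration:
// Sketch: the same rounded-age key with a default value, assuming
// KeyFunctionWithDefaultValue extends KeyFunction only by getDefaultKey()
KeyFunctionWithDefaultValue<EPGMVertex, Integer> customKeyWithDefault =
  new KeyFunctionWithDefaultValue<EPGMVertex, Integer>() {
    @Override public Integer getKey(EPGMVertex v) {
      return (int) MathUtils.round(v.getPropertyValue("age").getInt(), -1);
    }
    @Override public Integer getDefaultKey() {
      return 0; // assumed default; same type as getKey, must not be null
    }
    @Override public TypeInformation<Integer> getType() {
      return BasicTypeInfo.INT_TYPE_INFO;
    }
  };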
Subgraph
A subgraph is a logical graph which contains only the vertices and edges that fulfill a given vertex and/or edge predicate. The associated operators are implemented as the functions .subgraph(), .vertexInducedSubgraph() and .edgeInducedSubgraph(), which can be called on a LogicalGraph instance. The first one also allows one of five available execution strategies to be chosen, which are listed in the table below.
Strategy | Description |
---|---|
Subgraph.Strategy.BOTH | Applies both filter functions on the input vertex and edge data sets. This is the default strategy of the .subgraph() function. Note that this strategy does not verify the consistency of the resulting graph. |
Subgraph.Strategy.BOTH_VERIFIED | Applies both filter functions on the input vertex and edge data sets. In addition, this strategy verifies the consistency of the output graph. |
Subgraph.Strategy.VERTEX_INDUCED | Only applies the vertex filter function and adds the incident edges connecting those vertices via a join. This strategy is applied by the .vertexInducedSubgraph() function. |
Subgraph.Strategy.EDGE_INDUCED | Only applies the edge filter function and computes the connecting vertices. This strategy is applied by the .edgeInducedSubgraph() function. |
Subgraph.Strategy.EDGE_INDUCED_PROJECT_FIRST | Only applies the edge filter function and computes the resulting vertices via DISTINCT(π_source(E) ∪ π_target(E)) ⋈ V. |
Frequently used filter expressions are predefined in Gradoop. Make sure to check whether a filter is applicable to vertices (V), edges (E), or both.
The following code shows examples of the different subgraph functions and the use of combined filters:
// Apply vertex and edge filter to the given graph n1 from quickstart
LogicalGraph subgraph = n1
.subgraph(
new LabelIsIn<>("Person", "Company"),
new ByLabel<>("worksAt")
);
// Apply vertex and edge filter while verifying the consistency (no "Company" vertices included)
LogicalGraph verifiedSubgraph = n1
.subgraph(
new LabelIsIn<>("Person", "Company"),
new ByLabel<>("friendsWith")
Subgraph.Strategy.BOTH
).verify();
// Apply only a vertex filter function
LogicalGraph vertexInduced = n1
.vertexInducedSubgraph(new ByProperty<>("age"));
// Apply only an edge filter function
LogicalGraph edgeInduced = n1
.edgeInducedSubgraph(new BySourceId<>(myGradoopId));
// Apply a combined vertex filter function:
LogicalGraph fromCombinedFilters = n1
.vertexInducedSubgraph(
new ByLabel<EPGMVertex>("Person").and(new ByProperty<>("Name", PropertyValue.create("Marc")).negate())
.or(new ByLabel<>("Company")));
Sampling
Graph sampling is a technique to pick a subset of vertices and/or edges from the original graph. This can be used to obtain a smaller graph if the whole graph is known or, if the graph is unknown, to explore the graph. The provided techniques are either vertex based, edge based, traversal based or a combination of these. They are implemented as operators on Gradoop's LogicalGraph and return the sampled graph. The following sampling methods are provided:
Method | Description |
---|---|
RandomVertexSampling | Computes a vertex sampling of the graph. Retains randomly chosen vertices of a given relative amount and all edges whose source and target vertices were chosen. Some unconnected vertices may remain in the sampled graph. Parameters: sampleSize: relative amount of vertices in the sampled graph; randomSeed: seed value for the random number generator |
RandomNonUniformVertexSampling | Computes a vertex sampling of the graph. Retains randomly chosen vertices of a given relative amount and all edges whose source and target vertices were chosen. A degree-dependent value is taken into account to introduce a bias towards high-degree vertices. Some unconnected vertices may remain in the sampled graph. Parameters: sampleSize: relative amount of vertices in the sampled graph; randomSeed: seed value for the random number generator |
RandomLimitedDegreeVertexSampling | Computes a vertex sampling of the graph. Retains all vertices with a degree greater than a given degree threshold for a given degree type, as well as randomly chosen vertices with a degree smaller than or equal to this threshold. Retains all edges whose source and target vertices were chosen. Parameters: sampleSize: relative amount of vertices with a degree smaller than or equal to the given degreeThreshold; randomSeed: seed value for the random number generator; degreeThreshold: threshold for the degrees of sampled vertices; degreeType: type of degree considered for sampling, see VertexDegree |
RandomVertexNeighborhoodSampling | Computes a vertex sampling of the graph. Retains randomly chosen vertices of a given relative amount and includes all neighbors of those vertices in the sample. Retains all edges whose source and target vertices were chosen. Parameters: sampleSize: relative amount of vertices in the sampled graph; randomSeed: seed value for the random number generator; neighborType: type of neighborhood considered for sampling, see Neighborhood |
RandomEdgeSampling | Computes an edge sampling of the graph. Retains randomly chosen edges of a given relative amount and their associated source and target vertices. No unconnected vertices remain in the sampled graph. Parameters: sampleSize: relative amount of edges in the sampled graph; randomSeed: seed value for the random number generator |
RandomVertexEdgeSampling | Computes an edge sampling of the graph. First selects randomly chosen vertices of a given relative amount and all edges whose source and target vertices were chosen. Then randomly chooses edges of a given relative amount from this set of edges, together with their associated source and target vertices. No unconnected vertices remain in the sampled graph. Parameters: vertexSampleSize: relative amount of vertices in the sampled graph; edgeSampleSize: relative amount of edges in the sampled graph; randomSeed: seed value for the random number generator; vertexEdgeSamplingType: can be simple, nonUniform or nonUniformHybrid |
PageRankSampling | Computes a PageRank sampling of the graph using the Gradoop wrapper of Flink's PageRank algorithm. Retains all vertices with a PageRank score greater than or equal/smaller than a given sampling threshold, depending on the parameters set. If all vertices get the same PageRank score, it can be decided whether to sample all vertices or none of them. Retains all edges whose source and target vertices were chosen. Some unconnected vertices may remain in the sampled graph. Parameters: dampeningFactor: probability of following an out-link, otherwise a jump to a random vertex is made; maxIteration: maximum number of algorithm iterations; threshold: threshold for the PageRank score determining if a vertex is sampled, e.g. 0.5; sampleGreaterThanThreshold: sample vertices with a PageRank score greater than (true) or equal/smaller than (false) the threshold; keepVerticesIfSameScore: sample all vertices (true) or none of them (false) if all have the same PageRank score |
// read the original graph
LogicalGraph graph = readLogicalGraph("path/to/graph", "json");
// define sampling parameters
float sampleSize = 0.3f; // retain vertices with degree smaller than or equal to degreeThreshold with a probability of 30%
long randomSeed = 0L; // random seed can be 0
long degreeThreshold = 3L; // sample all vertices with degree of degreeType greater than 3
VertexDegree degreeType = VertexDegree.IN; // consider in-degree for sampling
// call sampling method
LogicalGraph sample = new RandomLimitedDegreeVertexSampling(
sampleSize, randomSeed, degreeThreshold, degreeType).sample(graph);
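Other samplers are applied in the same way. A minimal sketch using RandomEdgeSampling, assuming a constructor that only takes the relative sample size:
// Sketch: keep roughly 20% of the edges plus their source and target vertices
LogicalGraph edgeSample = new RandomEdgeSampling(0.2f).sample(graph);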
Group by RollUp
The rollUp operation allows aggregations at multiple levels using a single query. The operator can be used by instantiating the class directly with the desired options or by calling one of the following convenience methods of LogicalGraph:
Function | Description | Returns |
---|---|---|
groupVerticesByRollUp | Runs the rollUp operation with a changing aggregation level via varying combinations of the vertex grouping keys. | GraphCollection with the differently grouped graphs |
groupEdgesByRollUp | Runs the rollUp operation with a changing aggregation level via varying combinations of the edge grouping keys. | GraphCollection with the differently grouped graphs |
Both methods require the same parameters:
- A list containing the vertex property keys used for grouping.
- A list of property value aggregators used for grouping on vertices.
- A list containing the edge property keys used for grouping.
- A list of property value aggregators used for grouping on edges.
For the following examples, consider a graph containing persons as vertices and phone calls between those persons as edges. The vertices contain country, state and city as separate properties; the edges contain the time of the call, with year, month, day, hour and minute as separate properties.
The following code leads to a graph collection containing graphs grouped by {(:label, country, state, city), (:label, country, state), (:label, country), (:label)}. With a single query, you can see how many call-initiating persons live in the exact same city, the same state, the same country, and how many there are in total.
// Apply vertex based rollUp on previously described graph
List<String> vertexGroupingKeys =
Arrays.asList(Grouping.LABEL_SYMBOL, "country", "state", "city");
List<AggregateFunction> vertexAggregators =
Collections.singletonList(new Count("count"));
List<String> edgeGroupingKeys = Collections.emptyList();
List<AggregateFunction> edgeAggregators = Collections.emptyList();
GraphCollection output =
n1.groupVerticesByRollUp(
vertexGroupingKeys,
vertexAggregators,
edgeGroupingKeys,
edgeAggregators);
Analogously, the following code leads to an edge-based rollUp and therefore to a graph collection containing graphs grouped by {(year, month, day, hour, minute), (year, month, day, hour), (year, month, day), (year, month), (year)}. You would be able to see how many persons initiated a phone call within the exact same minute, hour, day, month and year.
// Apply edge based rollUp on previously described graph
List<String> vertexGroupingKeys = Collections.emptyList();
List<AggregateFunction> vertexAggregators = Collections.emptyList();
List<String> edgeGroupingKeys =
Arrays.asList("year", "month", "day", "hour", "minute");
List<AggregateFunction> edgeAggregators =
Collections.singletonList(new Count("count"));
GraphCollection output =
n1.groupEdgesByRollUp(
vertexGroupingKeys,
vertexAggregators,
edgeGroupingKeys,
edgeAggregators);
Split
The split operator splits a given LogicalGraph into a GraphCollection based on the values of a pre-defined property that is present on a set of vertices in the graph. The operator supports overlapping logical graphs, where a vertex can belong to more than one logical graph. Edges whose source and target vertex have no graph in common are removed from the resulting collection.
The following code showcases the simplest application of the split operator, the LogicalGraph's splitBy method. Given is an undirected graph which models the relationships of employees. The graph consists of five vertices labeled employee which are connected by edges labeled worksWith.
String coworkingGraph = "coworking[" +
"(v0:employee {age: 30, employedSince: 2010, role: \"developer\"})" +
"(v1:employee {age: 25, employedSince: 2010, role: \"developer\"})" +
"(v2:employee {age: 36, employedSince: 2013, role: \"manager\"})" +
"(v3:employee {age: 29, employedSince: 2013, role: \"developer\"})" +
"(v4:employee {age: 35, employedSince: 2013, role: \"manager\"})" +
"(v0)-[:worksWith]->(v1)-[:worksWith]->(v0)" +
"(v0)-[:worksWith]->(v3)-[:worksWith]->(v0)" +
"(v1)-[:worksWith]->(v2)-[:worksWith]->(v1)" +
"(v1)-[:worksWith]->(v4)-[:worksWith]->(v1)" +
"(v2)-[:worksWith]->(v4)-[:worksWith]->(v2)" +
"(v2)-[:worksWith]->(v3)-[:worksWith]->(v2)" +
"(v3)-[:worksWith]->(v4)-[:worksWith]->(v1)" +
"]";
After we have initialized the database and retrieved the LogicalGraph, we can use splitBy as follows:
GraphCollection result = coworking.splitBy("role");
This yields the following GraphCollection:
gc [
g0: [
(v2)-[:worksWith]->(v4)
(v4)-[:worksWith]->(v2)
]
g1: [
(v0)-[:worksWith]->(v1)
(v1)-[:worksWith]->(v0)
(v3)-[:worksWith]->(v0)
(v0)-[:worksWith]->(v3)
]
]