
SPARQL / RDF Machine Learning guides #252

Merged Feb 22, 2022 (50 commits)

Commits
e99dac7
added RDF ML notebooks
charlesivie Jan 11, 2022
92437ec
cleaning up the code a bit
charlesivie Jan 11, 2022
c864266
completed regression and classification for SPARQL ML
charlesivie Jan 27, 2022
0441b79
added RDF ML notebooks
charlesivie Jan 11, 2022
3d77b3b
cleaning up the code a bit
charlesivie Jan 11, 2022
6180887
completed regression and classification for SPARQL ML
charlesivie Jan 27, 2022
eace0a6
Complete notebooks for SPARQL Classification - Regression and Link Pr…
charlesivie Jan 27, 2022
2af95af
Complete notebooks for SPARQL Classification - Regression and Link Pr…
charlesivie Jan 27, 2022
a17c5f8
Merge branch 'RDF-ML' of github.com:aws/graph-notebook into RDF-ML
charlesivie Jan 27, 2022
0d4eb53
Complete notebooks for SPARQL Classification - Regression and Link Pr…
charlesivie Jan 27, 2022
1a46a82
Complete notebooks for SPARQL Classification - Regression and Link Pr…
charlesivie Jan 27, 2022
2202df7
Rename gremlin profile args for clarity (#249)
michaelnchin Feb 2, 2022
1dc71f8
Add --results-per-page query option (#242)
michaelnchin Feb 2, 2022
b5b48d0
Pin ipython<7.17.0 to patch vulnerability (#250)
michaelnchin Feb 2, 2022
12a5408
Disable root logger output (#248)
michaelnchin Feb 2, 2022
bf8f49d
Fix OC Bolt metadata (#255)
michaelnchin Feb 3, 2022
76c3c04
Add groupby raw node result option (#253)
michaelnchin Feb 3, 2022
659b590
Add Gremlin group-by-depth (#251)
michaelnchin Feb 4, 2022
328df92
addressing comment left in PR https://github.com/aws/graph-notebook/p…
charlesivie Feb 7, 2022
2f07062
Update ChangeLog.md
michaelnchin Feb 7, 2022
486c441
Suffix all doubles with d. Node batch reduced to 40 (#257)
krlawrence Feb 8, 2022
b5d1885
Convert Decimal type results to float for Gremlin (#256)
michaelnchin Feb 8, 2022
8b611ea
removed hard coded genre deletes, and moved them into the notebook.
charlesivie Feb 8, 2022
96cf4b3
removed hard coded genre deletes, and moved them into the notebook.
charlesivie Feb 8, 2022
f23ffcc
removed hard coded ciritcScore deletes, and moved them into the noteb…
charlesivie Feb 8, 2022
deb3475
removed hard coded link prediction retractions, and moved them into t…
charlesivie Feb 9, 2022
f7c1387
completed the Getting Started notebook with the pretrained models.
charlesivie Feb 10, 2022
6ec410a
completed the Getting Started notebook with the pretrained models.
charlesivie Feb 10, 2022
e0bcb5c
added RDF ML notebooks
charlesivie Jan 11, 2022
fc044c9
cleaning up the code a bit
charlesivie Jan 11, 2022
525517f
completed regression and classification for SPARQL ML
charlesivie Jan 27, 2022
3aa1c71
Complete notebooks for SPARQL Classification - Regression and Link Pr…
charlesivie Jan 27, 2022
2b8a6c5
Complete notebooks for SPARQL Classification - Regression and Link Pr…
charlesivie Jan 27, 2022
5db47ba
added RDF ML notebooks
charlesivie Jan 11, 2022
95f1ab8
cleaning up the code a bit
charlesivie Jan 11, 2022
2c15347
completed regression and classification for SPARQL ML
charlesivie Jan 27, 2022
6b1c763
Complete notebooks for SPARQL Classification - Regression and Link Pr…
charlesivie Jan 27, 2022
9368594
Complete notebooks for SPARQL Classification - Regression and Link Pr…
charlesivie Jan 27, 2022
1a3efdc
addressing comment left in PR https://github.com/aws/graph-notebook/p…
charlesivie Feb 7, 2022
b285ab0
removed hard coded genre deletes, and moved them into the notebook.
charlesivie Feb 8, 2022
899d0f8
removed hard coded genre deletes, and moved them into the notebook.
charlesivie Feb 8, 2022
bed78fa
removed hard coded ciritcScore deletes, and moved them into the noteb…
charlesivie Feb 8, 2022
c98d96f
removed hard coded link prediction retractions, and moved them into t…
charlesivie Feb 9, 2022
a6f6330
completed the Getting Started notebook with the pretrained models.
charlesivie Feb 10, 2022
3e1ee57
completed the Getting Started notebook with the pretrained models.
charlesivie Feb 10, 2022
e4bd2fb
Merge branch 'RDF-ML' of github.com:aws/graph-notebook into RDF-ML
charlesivie Feb 10, 2022
41c88c1
tests passing for new SPARQL ML notebooks
charlesivie Feb 11, 2022
e4aa9e6
Added detail fo new SPARQL/RDF ML notebooks to the change log
charlesivie Feb 11, 2022
c1ae8ab
fixed bugs in link prediction, amde small improvements after review
charlesivie Feb 15, 2022
56061bd
updated location of pre-trained models
charlesivie Feb 17, 2022
14 changes: 14 additions & 0 deletions ChangeLog.md
@@ -2,6 +2,20 @@

Starting with v1.31.6, this file will contain a record of major features and updates made in each release of graph-notebook.

## Upcoming
- Updated the airports property graph seed files to the latest level and suffixed all doubles with 'd'. ([Link to PR](https://github.com/aws/graph-notebook/pull/257))
- Added grouping by depth for Gremlin and openCypher queries ([PR #241](https://github.com/aws/graph-notebook/pull/241)) ([PR #251](https://github.com/aws/graph-notebook/pull/251))
- Added grouping by raw node results ([Link to PR](https://github.com/aws/graph-notebook/pull/253))
- Added `--no-scroll` option for disabling truncation of query result pages ([Link to PR](https://github.com/aws/graph-notebook/pull/243))
- Added `--results-per-page` option ([Link to PR](https://github.com/aws/graph-notebook/pull/242))
- Added relaxed seed command error handling ([Link to PR](https://github.com/aws/graph-notebook/pull/246))
- Renamed Gremlin profile query options for clarity ([Link to PR](https://github.com/aws/graph-notebook/pull/249))
- Suppressed default root logger error output ([Link to PR](https://github.com/aws/graph-notebook/pull/248))
- Fixed Gremlin visualizer bug with handling non-string node IDs ([Link to PR](https://github.com/aws/graph-notebook/pull/245))
- Fixed error in openCypher Bolt query metadata output ([Link to PR](https://github.com/aws/graph-notebook/pull/255))
- Fixed handling of Decimal type properties when rendering Gremlin query results ([Link to PR](https://github.com/aws/graph-notebook/pull/256))
- Added new notebooks: guides for using SPARQL and RDF with Neptune ML ([Link to PR](https://github.com/aws/graph-notebook/pull/252))
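Taken together, the paging and grouping options listed above change how query output is rendered in a notebook cell. A hypothetical usage sketch (the flag names come from this changelog; the endpoint, query, and data are illustrative, not from this PR):

```
%%gremlin --results-per-page 20 --no-scroll --group-by-depth
g.V().hasLabel('airport').out('route').path().limit(50)
```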

## Release 3.1.1 (December 21, 2021)
- Added new dataset for DiningByFriends, and associated notebook ([Link to PR](https://github.com/aws/graph-notebook/pull/235))
- Added new Neptune ML Sample Application for People Analytics ([Link to PR](https://github.com/aws/graph-notebook/pull/235))
2 changes: 1 addition & 1 deletion requirements.txt
@@ -8,7 +8,7 @@ jupyter-contrib-nbextensions
widgetsnbextension
gremlinpython>=3.5.1
requests==2.24.0
ipython>=7.16.1
ipython>=7.16.1,<7.17.0
ipykernel==5.3.4
neo4j==4.2.1
rdflib~=5.0.0
2 changes: 1 addition & 1 deletion setup.py
@@ -79,7 +79,7 @@ def get_version():
'requests-aws4auth==1.0.1',
'botocore>=1.19.37',
'boto3>=1.17.58',
'ipython>=7.16.1,<=7.19.0',
'ipython>=7.16.1,<7.17.0',
'neo4j==4.3.2',
'rdflib==5.0.0',
'ipykernel==5.3.4',
83 changes: 64 additions & 19 deletions src/graph_notebook/magics/graph_magic.py
@@ -59,6 +59,7 @@
UNRESTRICTED_LAYOUT = widgets.Layout()

logging.basicConfig()
root_logger = logging.getLogger()
logger = logging.getLogger("graph_magic")

DEFAULT_MAX_RESULTS = 1000
@@ -140,6 +141,15 @@ def query_type_to_action(query_type):
return 'sparqlupdate'


def results_per_page_check(results_per_page):
if results_per_page < 1:
return 1
elif results_per_page > 1000:
return 1000
else:
return int(results_per_page)

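The `results_per_page_check` helper added above can be read as a single min/max clamp. A standalone sketch of the boundary behavior (not code from the PR; `argparse` with `type=int` guarantees an integer input in the actual magic):

```python
def results_per_page_check(results_per_page):
    """Clamp the requested page size to the supported 1..1000 range."""
    return max(1, min(1000, int(results_per_page)))

# Boundary behavior matches the if/elif/else version in the diff:
assert results_per_page_check(10) == 10      # in range: returned as-is
assert results_per_page_check(0) == 1        # too small: clamped up to 1
assert results_per_page_check(5000) == 1000  # too large: clamped down to 1000
```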

# TODO: refactor large magic commands into their own modules like what we do with %neptune_ml
# noinspection PyTypeChecker
@magics_class
@@ -160,6 +170,7 @@ def __init__(self, shell):
self.max_results = DEFAULT_MAX_RESULTS
self.graph_notebook_vis_options = OPTIONS_DEFAULT_DIRECTED
self._generate_client_from_config(self.graph_notebook_config)
root_logger.setLevel(logging.CRITICAL)
logger.setLevel(logging.ERROR)

def _generate_client_from_config(self, config: Configuration):
@@ -260,6 +271,8 @@ def sparql(self, line='', cell='', local_ns: dict = None):
choices=['text/csv', 'text/html'])
parser.add_argument('-g', '--group-by', type=str, default='',
help='Property used to group nodes.')
parser.add_argument('-gr', '--group-by-raw', action='store_true', default=False,
help="Group nodes by the raw binding")
parser.add_argument('-d', '--display-property', type=str, default='',
help='Property to display the value of on each node.')
parser.add_argument('-de', '--edge-display-property', type=str, default='',
@@ -279,6 +292,8 @@ def sparql(self, line='', cell='', local_ns: dict = None):
parser.add_argument('-sd', '--simulation-duration', type=int, default=1500,
help='Specifies maximum duration of visualization physics simulation. Default is 1500ms')
parser.add_argument('--silent', action='store_true', default=False, help="Display no query output.")
parser.add_argument('-r', '--results-per-page', type=int, default=10,
help='Specifies how many query results to display per page in the output. Default is 10')
parser.add_argument('--no-scroll', action='store_true', default=False,
help="Display the entire output without a scroll bar.")
args = parser.parse_args(line.split())
@@ -344,7 +359,8 @@ def sparql(self, line='', cell='', local_ns: dict = None):
label_max_length=args.label_max_length,
edge_label_max_length=args.edge_label_max_length,
ignore_groups=args.ignore_groups,
expand_all=args.expand_all)
expand_all=args.expand_all,
group_by_raw=args.group_by_raw)

sn.extract_prefix_declarations_from_query(cell)
try:
@@ -365,8 +381,10 @@ def sparql(self, line='', cell='', local_ns: dict = None):
rows_and_columns = sparql_get_rows_and_columns(results)
if rows_and_columns is not None:
table_id = f"table-{str(uuid.uuid4())[:8]}"
visible_results = results_per_page_check(args.results_per_page)
first_tab_html = sparql_table_template.render(columns=rows_and_columns['columns'],
rows=rows_and_columns['rows'], guid=table_id)
rows=rows_and_columns['rows'], guid=table_id,
amount=visible_results)

# Handling CONSTRUCT and DESCRIBE on their own because we want to maintain the previous result
# pattern of showing a tsv with each line being a result binding in addition to new ones.
@@ -455,6 +473,10 @@ def gremlin(self, line, cell, local_ns: dict = None):
parser.add_argument('-p', '--path-pattern', default='', help='path pattern')
parser.add_argument('-g', '--group-by', type=str, default='T.label',
help='Property used to group nodes (e.g. code, T.region) default is T.label')
parser.add_argument('-gd', '--group-by-depth', action='store_true', default=False,
help="Group nodes based on path hierarchy")
parser.add_argument('-gr', '--group-by-raw', action='store_true', default=False,
help="Group nodes by the raw result")
parser.add_argument('-d', '--display-property', type=str, default='T.label',
help='Property to display the value of on each node, default is T.label')
parser.add_argument('-de', '--edge-display-property', type=str, default='T.label',
@@ -471,21 +493,23 @@ def gremlin(self, line, cell, local_ns: dict = None):
help='Specifies max length of edge labels, in characters. Default is 10')
parser.add_argument('--store-to', type=str, default='', help='store query result to this variable')
parser.add_argument('--ignore-groups', action='store_true', default=False, help="Ignore all grouping options")
parser.add_argument('--no-results', action='store_false', default=True,
parser.add_argument('--profile-no-results', action='store_false', default=True,
help='Display only the result count. If not used, all query results will be displayed in '
'the profile report by default.')
parser.add_argument('--chop', type=int, default=250,
parser.add_argument('--profile-chop', type=int, default=250,
help='Property to specify max length of profile results string. Default is 250')
parser.add_argument('--serializer', type=str, default='application/json',
parser.add_argument('--profile-serializer', type=str, default='application/json',
help='Specify how to serialize results. Allowed values are any of the valid MIME type or '
'TinkerPop driver "Serializers" enum values. Default is application/json')
parser.add_argument('--indexOps', action='store_true', default=False,
parser.add_argument('--profile-indexOps', action='store_true', default=False,
help='Show a detailed report of all index operations.')
parser.add_argument('-sp', '--stop-physics', action='store_true', default=False,
help="Disable visualization physics after the initial simulation stabilizes.")
parser.add_argument('-sd', '--simulation-duration', type=int, default=1500,
help='Specifies maximum duration of visualization physics simulation. Default is 1500ms')
parser.add_argument('--silent', action='store_true', default=False, help="Display no query output.")
parser.add_argument('-r', '--results-per-page', type=int, default=10,
help='Specifies how many query results to display per page in the output. Default is 10')
parser.add_argument('--no-scroll', action='store_true', default=False,
help="Display the entire output without a scroll bar.")

@@ -518,18 +542,18 @@ def gremlin(self, line, cell, local_ns: dict = None):
else:
first_tab_html = pre_container_template.render(content='No explain found')
elif mode == QueryMode.PROFILE:
logger.debug(f'results: {args.no_results}')
logger.debug(f'chop: {args.chop}')
logger.debug(f'serializer: {args.serializer}')
logger.debug(f'indexOps: {args.indexOps}')
if args.serializer in serializers_map:
serializer = serializers_map[args.serializer]
logger.debug(f'results: {args.profile_no_results}')
logger.debug(f'chop: {args.profile_chop}')
logger.debug(f'serializer: {args.profile_serializer}')
logger.debug(f'indexOps: {args.profile_indexOps}')
if args.profile_serializer in serializers_map:
serializer = serializers_map[args.profile_serializer]
else:
serializer = args.serializer
profile_args = {"profile.results": args.no_results,
"profile.chop": args.chop,
serializer = args.profile_serializer
profile_args = {"profile.results": args.profile_no_results,
"profile.chop": args.profile_chop,
"profile.serializer": serializer,
"profile.indexOps": args.indexOps}
"profile.indexOps": args.profile_indexOps}
res = self.client.gremlin_profile(query=cell, args=profile_args)
res.raise_for_status()
query_res = res.content.decode('utf-8')
@@ -555,6 +579,8 @@ def gremlin(self, line, cell, local_ns: dict = None):
logger.debug(f'label_max_length: {args.label_max_length}')
logger.debug(f'ignore_groups: {args.ignore_groups}')
gn = GremlinNetwork(group_by_property=args.group_by, display_property=args.display_property,
group_by_raw=args.group_by_raw,
group_by_depth=args.group_by_depth,
edge_display_property=args.edge_display_property,
tooltip_property=args.tooltip_property,
edge_tooltip_property=args.edge_tooltip_property,
@@ -581,7 +607,9 @@ def gremlin(self, line, cell, local_ns: dict = None):
f'unable to create gremlin network from result. Skipping from result set: {value_error}')

table_id = f"table-{str(uuid.uuid4()).replace('-', '')[:8]}"
first_tab_html = gremlin_table_template.render(guid=table_id, results=query_res)
visible_results = results_per_page_check(args.results_per_page)
first_tab_html = gremlin_table_template.render(guid=table_id, results=query_res,
amount=visible_results)

if not args.silent:
metadata_output = widgets.Output(layout=gremlin_layout)
@@ -1581,10 +1609,12 @@ def on_button_clicked(b=None):
@line_magic
def enable_debug(self, line):
logger.setLevel(logging.DEBUG)
root_logger.setLevel(logging.ERROR)

@line_magic
def disable_debug(self, line):
logger.setLevel(logging.ERROR)
root_logger.setLevel(logging.CRITICAL)

@line_magic
def graph_notebook_version(self, line):
@@ -1638,6 +1668,10 @@ def handle_opencypher_query(self, line, cell, local_ns):
parser = argparse.ArgumentParser()
parser.add_argument('-g', '--group-by', type=str, default='~labels',
help='Property used to group nodes (e.g. code, ~id) default is ~labels')
parser.add_argument('-gd', '--group-by-depth', action='store_true', default=False,
help="Group nodes based on path hierarchy")
parser.add_argument('-gr', '--group-by-raw', action='store_true', default=False,
help="Group nodes by the raw result")
parser.add_argument('mode', nargs='?', default='query', help='query mode [query|bolt]',
choices=['query', 'bolt'])
parser.add_argument('-d', '--display-property', type=str, default='~labels',
@@ -1661,6 +1695,8 @@ def handle_opencypher_query(self, line, cell, local_ns):
parser.add_argument('-sd', '--simulation-duration', type=int, default=1500,
help='Specifies maximum duration of visualization physics simulation. Default is 1500ms')
parser.add_argument('--silent', action='store_true', default=False, help="Display no query output.")
parser.add_argument('-r', '--results-per-page', type=int, default=10,
help='Specifies how many query results to display per page in the output. Default is 10')
parser.add_argument('--no-scroll', action='store_true', default=False,
help="Display the entire output without a scroll bar.")
args = parser.parse_args(line.split())
@@ -1689,6 +1725,8 @@ def handle_opencypher_query(self, line, cell, local_ns):
query_time=query_time)
try:
gn = OCNetwork(group_by_property=args.group_by, display_property=args.display_property,
group_by_raw=args.group_by_raw,
group_by_depth=args.group_by_depth,
edge_display_property=args.edge_display_property,
tooltip_property=args.tooltip_property,
edge_tooltip_property=args.edge_tooltip_property,
@@ -1706,7 +1744,12 @@ def handle_opencypher_query(self, line, cell, local_ns):
logger.debug(f'Unable to create network from result. Skipping from result set: {res}')
logger.debug(f'Error: {network_creation_error}')
elif args.mode == 'bolt':
res = self.client.opencyper_bolt(cell)
query_start = time.time() * 1000
res = self.client.opencyper_bolt(cell)
query_time = time.time() * 1000 - query_start
if not args.silent:
oc_metadata = build_opencypher_metadata_from_query(query_type='bolt', results=res,
query_time=query_time)
# Need to eventually add code to parse and display a network for the bolt format here

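The bolt branch above now measures wall-clock latency the same way the HTTP query path does. The timing pattern in isolation (a sketch; `timed_call` is an illustrative name, and the lambda stands in for `self.client.opencyper_bolt(cell)`):

```python
import time

def timed_call(fn):
    """Return (result, elapsed_ms) for fn(), mirroring the query_start /
    query_time bookkeeping added to the bolt branch above."""
    query_start = time.time() * 1000
    result = fn()
    query_time = time.time() * 1000 - query_start
    return result, query_time

# Stub standing in for the bolt client call:
result, elapsed_ms = timed_call(lambda: "ok")
assert elapsed_ms >= 0
```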
if not args.silent:
@@ -1723,8 +1766,10 @@ def handle_opencypher_query(self, line, cell, local_ns):
titles.append('Console')
if rows_and_columns is not None:
table_id = f"table-{str(uuid.uuid4())[:8]}"
visible_results = results_per_page_check(args.results_per_page)
table_html = opencypher_table_template.render(columns=rows_and_columns['columns'],
rows=rows_and_columns['rows'], guid=table_id)
rows=rows_and_columns['rows'], guid=table_id,
amount=visible_results)

# Display Graph Tab (if exists)
if force_graph_output:
16 changes: 12 additions & 4 deletions src/graph_notebook/magics/metadata.py
@@ -212,15 +212,23 @@ def build_gremlin_metadata_from_query(query_type: str, results: any, res: Respon
gremlin_metadata = set_gremlin_profile_metrics(gremlin_metadata=gremlin_metadata, profile_str=results)
return gremlin_metadata
else: # default Gremlin query
return build_propertygraph_metadata_from_default_query(results=results, query_time=query_time)
return build_propertygraph_metadata_from_default_query(results=results,
query_type=query_type,
query_time=query_time)


def build_opencypher_metadata_from_query(query_type: str, results: any, res: Response = None, query_time: float = None) -> Metadata:
return build_propertygraph_metadata_from_default_query(results=results['results'], query_time=query_time)
if query_type == 'bolt':
res_final = results
else:
res_final = results['results']
return build_propertygraph_metadata_from_default_query(results=res_final,
query_type=query_type,
query_time=query_time)


def build_propertygraph_metadata_from_default_query(results: any, query_time: float = None) -> Metadata:
propertygraph_metadata = create_propertygraph_metadata_obj('query')
def build_propertygraph_metadata_from_default_query(results: any, query_type: str = 'query', query_time: float = None) -> Metadata:
propertygraph_metadata = create_propertygraph_metadata_obj(query_type)
propertygraph_metadata.set_metric_value('request_time', query_time)
propertygraph_metadata.set_metric_value('resp_size', sys.getsizeof(results))
propertygraph_metadata.set_metric_value('results', len(results))
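The dispatch added to `build_opencypher_metadata_from_query` above boils down to: bolt responses are already a flat result set, while HTTP responses nest the rows under a `'results'` key. A minimal sketch of that selection (the function name is illustrative, not from the PR):

```python
def select_result_rows(query_type, results):
    # Bolt responses arrive as a flat result set; non-bolt (HTTP) responses
    # wrap the rows under a 'results' key, as in the diff above.
    return results if query_type == 'bolt' else results['results']

assert select_result_rows('bolt', [1, 2, 3]) == [1, 2, 3]
assert select_result_rows('query', {'results': [1, 2, 3]}) == [1, 2, 3]
```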