WIP: Add serve_web function #11

dali99 · 2024-10-25T06:09:02Z

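This isn't complete yet, but the following script (thrown together from other tests) runs a query against the engine and then serves it over HTTP with `engine.serve_web("0.0.0.0:3000")`: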

```python
# First part
import duckdb
import polars as pl

class MyDuckDB():
    def __init__(self):
        con = duckdb.connect()
        con.execute("SET TIME ZONE 'UTC';")
        con.execute("""CREATE TABLE ts1 ("timestamp" TIMESTAMPTZ, "value" INTEGER)""")
        ts_1 = pl.read_csv("ts1.csv", try_parse_dates=True).with_columns(pl.col("timestamp").dt.replace_time_zone("UTC"))
        con.append("ts1", df=ts_1.to_pandas())
        con.execute("""CREATE TABLE ts2 ("timestamp" TIMESTAMPTZ, "value" INTEGER)""")
        ts_2 = pl.read_csv("ts2.csv", try_parse_dates=True).with_columns(pl.col("timestamp").dt.replace_time_zone("UTC"))
        con.append("ts2", df=ts_2.to_pandas())
        self.con = con

    def query(self, sql:str) -> pl.DataFrame:
        # We execute the query and return it as a Polars DataFrame.
        # Chrontext expects this method to exist in the provided class.
        df = self.con.execute(sql).pl()
        return df

my_db = MyDuckDB()

# Second part
from sqlalchemy import MetaData, Table, Column, bindparam
metadata = MetaData()
ts1_table = Table(
    "ts1",
    metadata,
    Column("timestamp"),
    Column("value")
)
ts2_table = Table(
    "ts2",
    metadata,
    Column("timestamp"),
    Column("value")
)
ts1 = ts1_table.select().add_columns(
    bindparam("id1", "ts1").label("id"),
)
ts2 = ts2_table.select().add_columns(
    bindparam("id2", "ts2").label("id"),
)
sql = ts1.union(ts2)

# Third part
from chrontext import VirtualizedPythonDatabase

vdb = VirtualizedPythonDatabase(
    database=my_db,
    resource_sql_map={"my_resource": sql},
    sql_dialect="postgres"
)

# Fourth part
from chrontext import Prefix, Variable, Template, Parameter, RDFType, Triple, XSD
ct = Prefix("ct", "https://github.com/DataTreehouse/chrontext#")
xsd = XSD()
id = Variable("id")
timestamp = Variable("timestamp")
value = Variable("value")
dp = Variable("dp")
resources = {
    "my_resource": Template(
        iri=ct.suf("my_resource"),
        parameters=[
            Parameter(id, rdf_type=RDFType.Literal(xsd.string)),
            Parameter(timestamp, rdf_type=RDFType.Literal(xsd.dateTime)),
            Parameter(value, rdf_type=RDFType.Literal(xsd.double)),
        ],
        instances=[
            Triple(id, ct.suf("hasDataPoint"), dp),
            Triple(dp, ct.suf("hasValue"), value),
            Triple(dp, ct.suf("hasTimestamp"), timestamp)
        ]
    )}

# Fifth part
from chrontext import Engine, SparqlEmbeddedOxigraph
oxigraph_store = SparqlEmbeddedOxigraph(rdf_file="my_graph.ttl", path="oxigraph_db_tutorial")
engine = Engine(
    resources,
    virtualized_python_database=vdb,
    sparql_embedded_oxigraph=oxigraph_store)
engine.init()

# Sixth part
q = """
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX chrontext:<https://github.com/DataTreehouse/chrontext#>
PREFIX types:<http://example.org/types#>
SELECT ?w (SUM(?v) as ?sum_v) WHERE {
    ?w types:hasSensor ?s .
    ?s a types:ThingCounter .
    ?s chrontext:hasTimeseries ?ts .
    ?ts chrontext:hasDataPoint ?dp .
    ?dp chrontext:hasTimestamp ?t .
    ?dp chrontext:hasValue ?v .
    FILTER(?t > "2022-06-01T08:46:53Z"^^xsd:dateTime) .
} GROUP BY ?w
"""
df = engine.query(q)
assert df.shape == (2,2)
print(df)

engine.serve_web("0.0.0.0:3000")
input("Press Enter to exit")
```
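To exercise the endpoint while the script is waiting for input, a client can send an ordinary SPARQL request over HTTP. The sketch below is std-only Rust and assumes the server follows the SPARQL 1.1 Protocol "query via GET" operation at a hypothetical `/sparql` path; neither the path nor the protocol details are confirmed by this PR, so adjust both to whatever `serve_web` actually registers.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Percent-encodes a string for a URL query component: RFC 3986 unreserved
/// characters are kept as-is, every other byte becomes %XX.
fn percent_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len() * 3);
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Builds an HTTP/1.1 request for the SPARQL 1.1 Protocol "query via GET"
/// operation. `path` is a hypothetical route; use the one serve_web exposes.
fn sparql_get_request(host: &str, path: &str, query: &str) -> String {
    format!(
        "GET {}?query={} HTTP/1.1\r\nHost: {}\r\nAccept: application/sparql-results+json\r\nConnection: close\r\n\r\n",
        path,
        percent_encode(query),
        host
    )
}

/// Sends the request and returns the raw response (status line, headers, body).
/// `Connection: close` makes the server end the stream so read_to_string returns.
#[allow(dead_code)]
fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    let query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1";
    let request = sparql_get_request("localhost:3000", "/sparql", query);
    print!("{}", request);
    // With the Python script still running, the request could be sent with:
    // println!("{}", send("127.0.0.1:3000", &request).unwrap());
}
```

Building the request by hand keeps the sketch dependency-free; a real client would more likely use an HTTP crate and a JSON parser for the response.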

TODO:

- Move into a separate crate?
- Gate behind a feature?
- Vendor or build yasgui from source
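If the web server ends up gated behind a Cargo feature, one common pattern is a pair of `#[cfg]` items so the public function exists either way. This is a sketch assuming a hypothetical feature named `web`; the fallback stub is a design choice (it turns a disabled feature into a clear runtime error instead of a missing function) and not something decided in this PR.

```rust
// Assumes a hypothetical Cargo feature declared in Cargo.toml, e.g.
//   [features]
//   web = []   # plus any optional HTTP-server dependencies
// Build with `cargo build --features web` to get the real implementation.

/// Compiled only when the hypothetical "web" feature is enabled.
#[cfg(feature = "web")]
pub fn serve_web(addr: &str) -> Result<(), String> {
    // The actual HTTP server setup would live here.
    let _ = addr;
    Ok(())
}

/// Fallback compiled when the "web" feature is disabled, so callers get an
/// explicit error message rather than a missing symbol.
#[cfg(not(feature = "web"))]
pub fn serve_web(addr: &str) -> Result<(), String> {
    Err(format!(
        "cannot serve on {}: chrontext was built without the `web` feature",
        addr
    ))
}

fn main() {
    match serve_web("0.0.0.0:3000") {
        Ok(()) => println!("server started"),
        Err(e) => eprintln!("{}", e),
    }
}
```

The alternative is to omit `serve_web` entirely without the feature, which fails at compile time (or, through Python bindings, as a missing attribute) instead of at call time.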

dali99

Some specific questions below; other feedback is welcome too.

lib/chrontext/src/web.rs

dali99 · 2024-10-25T06:27:46Z

lib/chrontext/src/web.rs

```diff
+
+#[derive(Clone)]
+struct AppState {
+    sparql_engine: Arc<(dyn SparqlQueryable)>,
```
Is there a reason I might be missing for not implementing this over `Engine` instead of `SparqlQueryable`?

As far as I can tell, this only lets you run queries against the underlying SPARQL database, and I'm worried about how that interacts with the virtualization.

The `Engine::query` return type, `(DataFrame, HashMap<String, RDFNodeType>, Vec<Context>)`, is fairly complicated, so I need to take a deeper look at how it works.

magbak · Oct 26, 2024

Should be implemented over `Engine`, yes.
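A minimal, framework-agnostic sketch of `AppState` holding the engine instead of a `SparqlQueryable`, so every request goes through virtualization. `Engine` here is a hypothetical stand-in that returns a `String` instead of the real `(DataFrame, HashMap<String, RDFNodeType>, Vec<Context>)`; the `Mutex` assumes querying needs `&mut` access, which this thread does not confirm, and `handle_query` is an illustrative name, not part of the PR.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for chrontext's `Engine`.
struct Engine {
    queries_run: usize,
}

impl Engine {
    /// Assumes `&mut self`; if the real method takes `&self`, the Mutex in
    /// AppState can be dropped in favour of a plain Arc<Engine>.
    fn query(&mut self, sparql: &str) -> Result<String, String> {
        if sparql.trim().is_empty() {
            return Err("empty query".to_string());
        }
        self.queries_run += 1;
        Ok(format!("result #{}", self.queries_run))
    }
}

/// Shared server state: cloning it is cheap and every clone points at the
/// same Engine (the Arc is cloned, not the engine).
#[derive(Clone)]
struct AppState {
    engine: Arc<Mutex<Engine>>,
}

/// What an HTTP handler might do with the state.
fn handle_query(state: &AppState, sparql: &str) -> Result<String, String> {
    let mut engine = state
        .engine
        .lock()
        .map_err(|_| "engine lock poisoned".to_string())?;
    engine.query(sparql)
}

fn main() {
    let state = AppState {
        engine: Arc::new(Mutex::new(Engine { queries_run: 0 })),
    };
    let handler_state = state.clone(); // e.g. what a router hands each request
    println!("{:?}", handle_query(&handler_state, "SELECT * WHERE { ?s ?p ?o }"));
    println!("{:?}", handle_query(&state, ""));
}
```

A single lock serializes queries; whether that is acceptable depends on how the real `Engine` handles concurrency.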

The `(DataFrame, HashMap<String, RDFNodeType>)` pair is a column-based encoding of a result.
Each variable has a column, and the map holds that column's RDF type. If a variable has multiple types, its column is a Struct column with one inner column per type.
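To make the encoding concrete, here is a simplified model in plain Rust (no Polars) that turns the columns plus type map into row-based solutions. All type names are hypothetical and not chrontext's own; values are kept as strings, and a multi-typed variable is modelled as one inner column per type with nulls where the row's value has another type.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified RDF type (not chrontext's RDFNodeType).
#[derive(Clone, Debug, PartialEq)]
enum RdfType {
    Iri,
    Literal(String), // datatype IRI
}

/// Type map entry: one type, or several for a multi-typed variable.
enum VarType {
    Single(RdfType),
    Multi(Vec<RdfType>),
}

/// A variable's column. `Struct` holds one inner column per type, in the
/// same order as the matching `VarType::Multi`.
enum Column {
    Plain(Vec<Option<String>>),
    Struct(Vec<Vec<Option<String>>>),
}

/// One solution: variable -> (lexical value, type). Unbound variables are absent.
type Row = HashMap<String, (String, RdfType)>;

fn to_rows(
    height: usize,
    columns: &[(String, Column)],
    types: &HashMap<String, VarType>,
) -> Result<Vec<Row>, String> {
    let mut rows: Vec<Row> = vec![HashMap::new(); height];
    for (var, col) in columns {
        let ty = types
            .get(var)
            .ok_or_else(|| format!("no type for ?{}", var))?;
        // Pair each (inner) column with the RDF type of its values.
        let parts: Vec<(&RdfType, &Vec<Option<String>>)> = match (col, ty) {
            (Column::Plain(values), VarType::Single(t)) => vec![(t, values)],
            (Column::Struct(inner), VarType::Multi(ts)) if inner.len() == ts.len() => {
                ts.iter().zip(inner.iter()).collect()
            }
            _ => return Err(format!("column/type mismatch for ?{}", var)),
        };
        for (t, values) in parts {
            for (i, value) in values.iter().enumerate().take(height) {
                // None: unbound in this row, or (in a struct column) the
                // row's value has one of the other types.
                if let Some(v) = value {
                    rows[i].insert(var.clone(), (v.clone(), t.clone()));
                }
            }
        }
    }
    Ok(rows)
}

fn main() {
    let xsd = |local: &str| format!("http://www.w3.org/2001/XMLSchema#{}", local);
    // Made-up data: ?v is always a double; ?x is an IRI in row 0, a string in row 1.
    let columns = vec![
        ("v".to_string(), Column::Plain(vec![Some("1.5".into()), None])),
        (
            "x".to_string(),
            Column::Struct(vec![
                vec![Some("http://example.org/a".into()), None], // IRI part
                vec![None, Some("b".into())],                     // string part
            ]),
        ),
    ];
    let mut types = HashMap::new();
    types.insert("v".to_string(), VarType::Single(RdfType::Literal(xsd("double"))));
    types.insert(
        "x".to_string(),
        VarType::Multi(vec![RdfType::Iri, RdfType::Literal(xsd("string"))]),
    );
    for row in to_rows(2, &columns, &types).unwrap() {
        println!("{:?}", row);
    }
}
```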

There is https://github.com/DataTreehouse/maplib/blob/main/lib/representation/src/polars_to_rdf.rs, which maps the DataFrame and its types to the kind of row-based result we need here. It might need a bit of cleanup, but it should be fairly well tested.
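For an HTTP SPARQL endpoint, the row-based result would typically be serialized in a standard results format. Assuming the W3C SPARQL 1.1 Query Results JSON Format (the thread does not pin the format down), a std-only sketch of that last step looks like this. `Term` and `to_sparql_json` are hypothetical names; blank nodes and language tags are omitted, and real code would more likely use `serde_json`.

```rust
use std::collections::HashMap;

/// Hypothetical row-level RDF term.
enum Term {
    Iri(String),
    Literal { value: String, datatype: String },
}

/// Escapes a string for inclusion in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// One RDF term as a SPARQL JSON term object.
fn term_json(t: &Term) -> String {
    match t {
        Term::Iri(iri) => format!(r#"{{"type":"uri","value":"{}"}}"#, json_escape(iri)),
        Term::Literal { value, datatype } => format!(
            r#"{{"type":"literal","value":"{}","datatype":"{}"}}"#,
            json_escape(value),
            json_escape(datatype)
        ),
    }
}

/// Serializes rows as SPARQL 1.1 Query Results JSON. Variables unbound in a
/// row are simply left out of that row's binding object.
fn to_sparql_json(vars: &[&str], rows: &[HashMap<String, Term>]) -> String {
    let head: Vec<String> = vars.iter().map(|v| format!("\"{}\"", json_escape(v))).collect();
    let bindings: Vec<String> = rows
        .iter()
        .map(|row| {
            let fields: Vec<String> = vars
                .iter()
                .filter_map(|v| {
                    row.get(*v)
                        .map(|t| format!("\"{}\":{}", json_escape(v), term_json(t)))
                })
                .collect();
            format!("{{{}}}", fields.join(","))
        })
        .collect();
    format!(
        r#"{{"head":{{"vars":[{}]}},"results":{{"bindings":[{}]}}}}"#,
        head.join(","),
        bindings.join(",")
    )
}

fn main() {
    // Made-up example values, not output from the script above.
    let mut row = HashMap::new();
    row.insert("w".to_string(), Term::Iri("http://example.org/w1".into()));
    row.insert(
        "sum_v".to_string(),
        Term::Literal {
            value: "42".into(),
            datatype: "http://www.w3.org/2001/XMLSchema#integer".into(),
        },
    );
    println!("{}", to_sparql_json(&["w", "sum_v"], &[row]));
}
```

Iterating over `vars` rather than the `HashMap` keeps the output order deterministic.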

Allows querying the database via http

run rustfmt

60cc657


WIP: Implement serve_web

609faad

Allows querying the database via http

dali99 force-pushed the sparql-web branch from 20174eb to 609faad on October 25, 2024 10:51
