Data Pipelines with Markdown Templates? #593
Comments
Thinking about this a bit more over the weekend, we might want to go with an even simpler solution:
# A list of cars
---
# this can be just the URL of a known data source, the pipeline will know how to fetch the JSON representation
hlx_data: https://docs.google.com/spreadsheets/d/1IX0g5P74QnHPR3GW1AMCdTk_-m954A-FKZRT2uOZY7k/edit#gid=0
# a JSON pointer expression that points to the root node to start iterating
hlx_root: /sheets[1]
# a sift https://www.npmjs.com/package/sift expression to filter from the root
hlx_filter:
maker:
$eq: Tesla
---
## {{maker}} {{model}}
{{model}} was first released in {{year}}
In this example, the iteration would be implicit (no |
I would stick to a proper, well-documented template engine like Handlebars and not come up with our own simple solution. In the example above, how would you mix repetition with fixed content?
In the example above, # A list of cars would be fixed content (because it's outside of the section) and ## {{maker}} {{model}}
{{model}} was first released in {{year}} would be repeated for each matching element in |
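The split between fixed content and a repeated section could be sketched roughly as follows. This is a minimal illustration, not the pipeline's implementation: `fillPlaceholders` and `renderDocument` are hypothetical helpers, and the sample rows are made up for the example.

```javascript
// Hypothetical helper: replace {{key}} placeholders with values from one row.
// Placeholders without a matching key are left untouched (an assumption).
function fillPlaceholders(template, row) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) => (
    key in row ? String(row[key]) : match
  ));
}

// Hypothetical helper: emit the fixed content once, then repeat the
// templated section once per matching row.
function renderDocument(fixedContent, sectionTemplate, rows) {
  const repeated = rows.map((row) => fillPlaceholders(sectionTemplate, row));
  return [fixedContent, ...repeated].join('\n\n');
}

// Illustrative sample rows, standing in for the already-filtered spreadsheet data.
const rows = [
  { maker: 'Tesla', model: 'Model S', year: 2012 },
  { maker: 'Tesla', model: 'Model 3', year: 2017 },
];

const output = renderDocument(
  '# A list of cars',
  '## {{maker}} {{model}}\n{{model}} was first released in {{year}}',
  rows,
);
console.log(output);
```

The heading appears once, while the section below it is emitted once per row.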
Don't you think this is too much magic?
I can imagine that a very simple solution will soon not be enough, and that developers will ask for a more comprehensive template language.
I'm not sure about this yet. My proposal:
You don't. If you need to combine multiple sources, write a
You loop only through the current section. You may argue with me that there can be multiple sections, each with a data source of its own to be looped.
I don't think I'm violating the letter of the law (no mockups here) or the spirit. I'd even argue that my proposal does not support the creation of looped Markdown tables, so I'm feeling safe.
The YAML front matter definitely has potential for weirdness, and so do the
At the moment I see the most viable choices to be Handlebars and MDX (Markdown + JSX). Handlebars looks like the original example:
---
# this can be just the URL of a known data source, the pipeline will know how to fetch the JSON representation
hlx_data: https://docs.google.com/spreadsheets/d/1IX0g5P74QnHPR3GW1AMCdTk_-m954A-FKZRT2uOZY7k/edit#gid=0
# TBD, we might want to use a different templating language
hlx_template: handlebars
# a JSON pointer expression that points to the root node to start iterating
hlx_root: /sheets[1]
# a sift https://www.npmjs.com/package/sift expression to filter from the root
hlx_filter:
maker:
$eq: Tesla
---
# A list of cars
{{#each row}}
## {{this.maker}} {{this.model}}
{{this.model}} was first released in {{this.year}}
{{/each}}
MDX would look like this:
import { Each, Text } from "helix-mdx";
# A list of cars
<Each data="https://docs.google.com/spreadsheets/d/1IX0g5P74QnHPR3GW1AMCdTk_-m954A-FKZRT2uOZY7k/edit#gid=0" root="/sheets[1]">
<Text src="{{model}}"/> was first released in <Text src="{{year}}"/>
</Each>
Using MDX would technically not introduce another templating language (as Helix already supports JSX), but I just can't see this being a good idea. Handlebars may be a viable choice, but I think we might also add this functionality later when we need it.
Even simpler (suggested by @davidnuescheler):
# A list of cars
https://docs.google.com/spreadsheets/d/1IX0g5P74QnHPR3GW1AMCdTk_-m954A-FKZRT2uOZY7k/edit?root=sheets[1]&filter=maker:Tesla
## {{maker}} {{model}}
{{model}} was first released in {{year}}
---
That was a cool list, wasn't it? we would detect the |
Another question from @davidnuescheler:
This would have the advantage of Markdown that is decidedly non-weird (no templates at all), but it raises the question of how to specify the template (we cannot assume that we would guess the correct formatting) and how to run it, especially in light of Helix Pages' paradigm of no server-side imperative code.
injects values from data embeds, but does not support sections or deep object access yet fix #593
@davidnuescheler I've implemented your penultimate idea (couldn't come up with a way to implement the last suggestion) in #611 – please take a look at the documentation here: https://github.com/adobe/helix-pipeline/blob/data-embeds/docs/markdown.md
We could (quite easily) also implement the front matter syntax that I suggested, but I'd leave that for a separate PR.
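For illustration, the URL-style embed suggested earlier in the thread could be taken apart roughly like this. This is only a sketch and not the code from #611: `parseDataEmbed` is a hypothetical helper, and the `root` and `filter` parameter names and the `field:value` filter format are taken from the example URL above.

```javascript
// Hypothetical parser for a URL-style data embed. It separates the data
// source from the `root` and `filter` query parameters used in the example.
function parseDataEmbed(line) {
  const url = new URL(line.trim());
  const root = url.searchParams.get('root');
  const filterParam = url.searchParams.get('filter');

  // Assumption: a filter is written as "field:value" and means equality.
  let filter = null;
  if (filterParam && filterParam.includes(':')) {
    const idx = filterParam.indexOf(':');
    filter = { [filterParam.slice(0, idx)]: filterParam.slice(idx + 1) };
  }

  // Drop the query string so that only the data source URL remains.
  url.search = '';
  return { source: url.toString(), root, filter };
}

const embed = parseDataEmbed(
  'https://docs.google.com/spreadsheets/d/1IX0g5P74QnHPR3GW1AMCdTk_-m954A-FKZRT2uOZY7k/edit?root=sheets[1]&filter=maker:Tesla',
);
console.log(embed);
```

Keeping the source URL separate from the iteration parameters would let the pipeline fetch the data first and apply `root` and `filter` afterwards.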
# [6.8.0](v6.7.5...v6.8.0) (2020-03-18)
### Bug Fixes
* **data-sections:** ensure that unist map can handle async callbacks ([4f80666](4f80666))
* **data-sections:** ensure that unist map can handle async callbacks ([13dca79](13dca79))
* **embeds:** add proper error handling (logging) for failed data embed downloads ([a221714](a221714)), closes [#593](#593)
* **embeds:** add proper error handling (logging) for failed data embed downloads ([79d194e](79d194e)), closes [#593](#593)
* **embeds:** data embeds update the surrogate key based on the source URL ([638415b](638415b))
* **embeds:** remove dataEmbed nodes from mdast after detection ([ae6445c](ae6445c))
* **embeds:** remove dataEmbed nodes from mdast after detection ([0ab4dba](0ab4dba))
* **embeds:** use a proper logger when fetching data embeds ([aca4798](aca4798))
* **embeds:** use a proper logger when fetching data embeds ([2f959a0](2f959a0))
### Features
* **embeds:** add data section extraction step ([e7bc55a](e7bc55a))
* **embeds:** add data section extraction step ([bd721a5](bd721a5))
* **embeds:** detect data embeds ([155df67](155df67)), closes [/github.com//issues/593#issuecomment-590956631](https://github.com//github.com/adobe/helix-pipeline/issues/593/issues/issuecomment-590956631)
* **embeds:** detect data embeds ([fec11d4](fec11d4)), closes [/github.com//issues/593#issuecomment-590956631](https://github.com//github.com/adobe/helix-pipeline/issues/593/issues/issuecomment-590956631)
* **embeds:** implement data embeds for sections ([de54ccb](de54ccb)), closes [#593](#593)
* **embeds:** implement data embeds for sections ([366a7e9](366a7e9)), closes [#593](#593)
* **embeds:** provide basic data injection ([e733e24](e733e24)), closes [#593](#593)
* **embeds:** provide basic data injection ([355b2ba](355b2ba)), closes [#593](#593)
* **embeds:** support dot notation `{{foo.bar}}` in data embed templates ([283972a](283972a)), closes [#593](#593)
* **embeds:** support dot notation `{{foo.bar}}` in data embed templates ([ace965e](ace965e)), closes [#593](#593)
* **utils:** add cache-utils for merging cache-control headers ([3568d39](3568d39))
🎉 This issue has been resolved in version 6.8.0 🎉
The release is available on:
Your semantic-release bot 📦🚀
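The dot notation mentioned in the release notes (`{{foo.bar}}`) could work along these lines. This is a sketch under assumptions rather than the released code: `lookup` and `inject` are hypothetical names, and leaving unresolved placeholders untouched is an assumed behavior.

```javascript
// Hypothetical helper: resolve a dotted path such as "foo.bar" against a
// data object, returning undefined as soon as a step cannot be resolved.
function lookup(data, path) {
  return path.split('.').reduce(
    (value, key) => (value !== null && value !== undefined ? value[key] : undefined),
    data,
  );
}

// Hypothetical helper: replace {{path}} placeholders with the looked-up
// values. Placeholders that do not resolve are left as they are (assumption).
function inject(template, data) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, path) => {
    const value = lookup(data, path);
    return value === undefined ? match : String(value);
  });
}

console.log(inject('Value: {{foo.bar}}', { foo: { bar: 'baz' } }));
```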
I'd like to discuss two aspects:
Since we are injecting Markdown, the document gets sanitized the same way as normal Markdown content.
An alternative to Handlebars would be using HTL expressions, e.g.:
But I think Handlebars is more versatile, since it would allow for more control flow in the future.
That's my assumption, too.
We should add a test to confirm it :)
spoiler alert: yes, they are #593 (comment)
Since #356 we don't have the XSS sanitizer enabled.
That's fine for markup from a Markdown file, but bad for markup from a large spreadsheet.
I'd think so, too. How would we fix this? with |
fb116cd adds another test, this time trying to brute-force inject |
I'd go with filtering everything regardless, and wait for customers to complain.
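One way to "filter everything regardless" would be to escape HTML-significant characters in every value before it is injected into the template. The following is a minimal sketch of that idea, not what fb116cd or the pipeline actually does; `escapeHtml` and `sanitizeRow` are hypothetical helpers.

```javascript
// Map of HTML-significant characters to their entity equivalents.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Hypothetical helper: escape a single value so it cannot open HTML tags.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical helper: escape every value of a data row before injection.
function sanitizeRow(row) {
  return Object.fromEntries(
    Object.entries(row).map(([key, value]) => [key, escapeHtml(value)]),
  );
}

console.log(sanitizeRow({ maker: 'Tesla', model: '<script>alert(1)</script>' }));
```

Note that this only neutralizes raw HTML. Markdown syntax inside a value (for example a link) would still be interpreted, so it is not a replacement for a full sanitizer.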
As a continuation of the discussion we had around query-backed pipelines (adobe/helix-home#90) I'm trying to write up a mini-spec around an offhand remark @davidnuescheler made along the lines of "I think query-backed pipelines should be data pipelines – and they should use Markdown templates". This is an obvious riff on @dylandepass's original headless CMS POC, so I'd love to hear his feedback.
Use case
Some content is just not hypertext-shaped, but effectively spreadsheet-shaped, for instance:
Current Solution
In the current solution (see theblog for an example), we would fetch the JSON representation of the data on the client side, then navigate to "the big array", iterate over it, and apply HTML templates (or some other templating language) to update the DOM.
This has the drawback of creating additional engineering complexity and runtime complexity, because you now have to deal with two templating systems (HTL on the server side and whatever you have on the client side).
Proposed solution
Instead of using a separate pipeline which would come with the problem of selecting the correct extension or selector, we supercharge the normal Markdown to HTML pipeline, so that it can detect some special directives in the Markdown front matter:
Upon detecting a hlx_data tag in the front matter, the pipeline will:
* use hlx_root to navigate to an array node
* use sift and hlx_filter to remove elements from the array that don't match
* use hlx_template (I'm picking Handlebars here, @dylandepass suggested Squirrelly) to generate a Markdown document
Intended Limitations
In the spirit of "how simple can we make things and still get away with it", there are some limitations:
bindings like in @dylandepass's original design. If you want to combine multiple sources, use ESI or a cgi-bin script.
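The steps listed under "Proposed solution" (navigate with hlx_root, filter with hlx_filter, render a template per element) could be sketched as below. This is an illustration under assumptions, not the pipeline's code: `resolveRoot`, `matches`, and `runDataPipeline` are hypothetical helpers, `matches` is a tiny stand-in that only understands `$eq` and does not use the sift package, and the shape of the sample data is invented. Note also that `/sheets[1]` follows the notation used in this proposal. A strict RFC 6901 JSON Pointer would be written `/sheets/1`.

```javascript
// Hypothetical helper: resolve a path such as "/sheets[1]" by splitting it
// into property names and array indices and walking the data with them.
function resolveRoot(data, path) {
  const tokens = path
    .split('/')
    .filter(Boolean)
    .flatMap((segment) => segment.split(/[\[\]]/).filter(Boolean));
  return tokens.reduce(
    (node, token) => (node === null || node === undefined ? undefined : node[token]),
    data,
  );
}

// Minimal stand-in for a sift-style filter. Only `$eq` conditions are handled.
function matches(row, filter) {
  return Object.entries(filter).every(
    ([field, condition]) => row[field] === condition.$eq,
  );
}

// Hypothetical driver: navigate to the array, drop non-matching rows, and
// fill the {{key}} placeholders of the template once per remaining row.
function runDataPipeline(frontMatter, template, data) {
  const rows = resolveRoot(data, frontMatter.hlx_root) || [];
  return rows
    .filter((row) => matches(row, frontMatter.hlx_filter || {}))
    .map((row) => template.replace(
      /\{\{(\w+)\}\}/g,
      (match, key) => (key in row ? String(row[key]) : match),
    ))
    .join('\n\n');
}

// Invented sample data: the second sheet holds the list of cars.
const data = {
  sheets: [
    [],
    [
      { maker: 'Tesla', model: 'Model S', year: 2012 },
      { maker: 'Nissan', model: 'Leaf', year: 2010 },
    ],
  ],
};

const markdown = runDataPipeline(
  { hlx_root: '/sheets[1]', hlx_filter: { maker: { $eq: 'Tesla' } } },
  '## {{maker}} {{model}}\n{{model}} was first released in {{year}}',
  data,
);
console.log(markdown);
```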