Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HTML schema for separator block #914

Closed
wants to merge 5 commits into from
Closed

Add HTML schema for separator block #914

wants to merge 5 commits into from

Conversation

nylen
Copy link
Member

@nylen nylen commented May 26, 2017

This PR is an initial step towards handling editor parsing as described in #391. It adds the concept of a "block schema" with which a block can define the markup shape(s) it understands how to handle.

This functionality is eventually intended to replace most of the attributes and save callbacks for static blocks (blocks that do not need to fetch data from external sources other than post_content; as described in #391, these blocks should store their data inside semantically meaningful markup that does not change from render to render).

Goals of this PR

Here is what I hope to achieve in this initial PR:

  • The core/separator block defines a very simple schema (a single <hr /> element). This schema resembles the HTML markup that it matches.
  • If an HTML tag in between block delimiters matches a schema defined by a block, then this tag will be "upgraded" to the corresponding block type.
  • If a block defines a schema, then it will be used to validate the HTML content that appears inside block delimiters for that block type. If the schema does not validate against the HTML content, then the block will be "downgraded" to the core/freeform block.
  • Improve the build process of the phs library - this should contain a minimum of webpack-generated code.
  • Improve the way JSX element creation is handled for Schema tags - we shouldn't have to import the createSchemaElement variable and then tell eslint to ignore it.

How it works

This PR introduces two new libraries. First, babel-plugin-transform-jsx-flexible, inspired by this article about alternative uses of JSX syntax:

This is a drop-in replacement for babel-plugin-transform-react-jsx, with the additional feature that multiple JSX handler functions can be used in the same file.

Second, phs (Parameterized HTML Schemas):

This is a library that can be used to describe and validate chunks of HTML. It is inspired by RELAX NG, but with a few very different goals [accept placeholders for parameters; be more concise than RELAX NG; and resemble the HTML it describes].

Combining these two libraries, this PR uses a new JSX handler function (separate from React) to create a very simple schema that validates a single <hr /> tag. The schema is comprised of a single <hr /> element. As described in the phs readme, this is a shorthand syntax for <Element name="hr" /> with the benefit of resembling the HTML it describes.

Finally, this PR uses the newly added schema functionality to perform the tasks described above in "Goals of this PR".

I expect to add a lot more features to these schemas in following PRs, but as noted in the phs readme, the schema library currently contains the bare minimum that is at all useful.

(JSX is not strictly necessary for the task of creating schemas; however, I have not yet created an alternative, and I think this is a very logical and intuitive way to express criteria for validating trees of HTML content, much like RELAX NG but simpler to use for common cases.)

Goals for future PRs

A declarative, schema-based approach for parsing and serialization where the schemas also contain placeholders for serialization parameters has the potential to solve a lot of problems at once. In particular:

  • Define where block parameters are expected to appear in the markup, including their types and allowed values.
  • Store information about which branch of the schema was matched, and replace the save callback for blocks that are simple enough.
  • Using this information about how the schema was matched, preserve existing markup attributes like user-added CSS classes across deserialization and serialization.
  • Drop many of the block delimiters entirely. (Theoretically, we could now remove the block delimiters for the core/separator blocks, but this requires further discussion and is out of scope for this PR).

Caveats / Questions

This PR uses the TinyMCE-based parser because it returns a full tree of HTML tags along with the block delimiters. This is important for the functionality in this PR and for other reasons, such as detecting and recovering from invalid markup, and planning for future features like shortcode support. See also discussion at #844 (comment).

I would like to remove the TinyMCE parser, but first we need to enhance our grammar-based parser to return a full tree of HTML content like wp-post-grammar. For now, using the existing parser seems the simplest way forward.

As noted in several TODO comments in the new code introduced here, there will be some algorithmic complexity related to matching schemas against parsed HTML due to the large number of combinations. The phs schema library will also likely become pretty complicated and I will need some help developing it. For now, though, it is fairly simple and has excellent test coverage.

I am not sure the best way to accomplish the last two tasks in this PR (better package phs for consumption by another webpack build; improve the way JSX element creation is handled for Schema tags). Any tips there?

nylen added 5 commits May 25, 2017 23:21
@nylen nylen added the [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f label May 26, 2017
@nylen nylen requested review from youknowriad, mtias and aduth May 26, 2017 16:58
@nylen
Copy link
Member Author

nylen commented May 26, 2017

Commit b732903 (add failing test fixtures with <hr> tags in between blocks) happened before commit c1455a2 (upgrade content in between blocks using schemas). I am not sure why GitHub is switching them around.

@youknowriad
Copy link
Contributor

I've been thinking about this too. And I think we can't decide whether this is a good approach (simple enough to understand for block authors and addresses all our needs) unless we implement the complex use-cases. Do you have an idea, what the schema would look like for the image for example?

I'm personally afraid the JSX here will add more confusion than it solves, it's not easy to figure out that these are not React Elements when in the same file we use JSX for React Elements. What if someone, somewhere else in the code creates a Schema component?


I've been thinking we could write something like for the image schema (example).

const schema = s( 'figure', {}, [
	s( 'img', { placeholders: { src: attr( 'src' ), alt: attr( 'alt' ) } } ),
	s( 'figcaption', { placeholders: { content: children() } } ),
] );

This uses hyperscript notation to avoid the confusion with JSX, it's still concise enough IMO and I think it kind of use simpler hpq handlers (dropping the query handler).

@nylen
Copy link
Member Author

nylen commented May 27, 2017

Do you have an idea, what the schema would look like for the image for example?

I think it would look a lot like #391 (comment). Everything Andrew described there is possible, and I think it will work, but the implementation will get pretty complicated. So, before we spend a lot of time doing that, how can we decide whether this is something we want to do?

Worth noting that I looked for an existing solution for this, but didn't find anything that could handle what we need, particularly the parameter placeholders part.

I'm personally afraid the JSX here will add more confusion than it solves, it's not easy to figure out that these are not React Elements when in the same file we use JSX for React Elements.

I don't think this matters as long as we document what this is and how it works, provide good error handling, etc. I still think JSX is the way to go here, because we're going to end up with a lot of nesting, and for this task, tags are much clearer than lots of } ).

What if someone, somewhere else in the code creates a Schema component?

We can use a more specific name, for example <ParameterizedHTMLSchema>, or add a namespace, like <phs.Schema>. The latter is probably a better solution, though it will require some changes to the Babel plugin (nylen/babel-plugin-transform-jsx-flexible#2).

Though we can certainly try a non-JSX syntax without too much trouble. It needs to be a bit more sophisticated, because there will be a lot more structures than simply s( 'elementname' ).

@aduth
Copy link
Member

aduth commented Jun 1, 2017

I want to revisit this after I've had some time to digest and weigh the direction. In the meantime I've left some commentary on my internal state of conflict at #886 (comment) where similar suggestions for server-side schemas have surfaced.

@aduth
Copy link
Member

aduth commented Jun 1, 2017

The schema is comprised of a single <hr /> element. As described in the phs readme, this is a shorthand syntax for <Element name="hr" /> with the benefit of resembling the HTML it describes.

How do you envision the shorthand looking when matching attributes of a given element? Like this?

<Schema>
	<img>
		<Attribute name="src" as="url" />
	</img>
</Schema>

@nylen
Copy link
Member Author

nylen commented Jun 1, 2017

How do you envision the shorthand looking when matching attributes of a given element? Like this?

Like that, or possibly:

<Schema>
	<img src={ <Param name="url" /> } />
</Schema>

@nylen
Copy link
Member Author

nylen commented Jun 1, 2017

Though maybe what this says (and you touched on this point in #391 (comment)) is that the shorthand isn't such a good idea after all.

@nylen
Copy link
Member Author

nylen commented Aug 4, 2017

This is out of date, and subsequent PRs like #1929 and #2210 have achieved most of what I wanted to do here in a different way.

@nylen nylen closed this Aug 4, 2017
@nylen nylen deleted the add/html-schemas branch August 4, 2017 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants