Design a file format to store Playground site metadata across all runtimes #1659

Open
10 of 13 tasks
bgrgicak opened this issue Jul 31, 2024 · 21 comments
@bgrgicak
Collaborator

bgrgicak commented Jul 31, 2024

This task is a part of the Web app redesign project.

We need to create a way to store metadata of sites available to a user.

  • Site list: each site in OPFS has a metadata.json file
    • Site title
    • Site slug
    • Favicon (ideally a data url)
    • Blueprint
    • Date created
    • Date last active (used to determine the current site)
    • Logs
    • Storage (browser | none | device)
  • Add site with default settings
  • Update site
  • Delete site
  • Get site
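
To make the task list above concrete, here is a minimal sketch of what the metadata and the add/update/delete/get operations could look like. Every name in it (`SiteMetadata`, `InMemorySiteList`, the field names) is hypothetical and only mirrors the bullets above; logs are left out, and a real implementation would persist to metadata.json in OPFS rather than a `Map`.

```typescript
// Hypothetical shape for metadata.json; field names are illustrative only.
type SiteStorage = 'browser' | 'none' | 'device';

interface SiteMetadata {
    title: string;
    slug: string;
    favicon?: string; // ideally a data URL
    blueprint: unknown; // the Blueprint used to create the site
    dateCreated: string; // ISO 8601 timestamp
    dateLastActive: string; // ISO 8601 timestamp
    storage: SiteStorage;
}

// In-memory stand-in for the site list (assumption: slugs are unique keys).
class InMemorySiteList {
    private sites = new Map<string, SiteMetadata>();

    // Add a site with default settings, optionally overriding some fields.
    add(slug: string, overrides: Partial<SiteMetadata> = {}): SiteMetadata {
        const now = new Date().toISOString();
        const site: SiteMetadata = {
            title: slug,
            blueprint: {},
            dateCreated: now,
            dateLastActive: now,
            storage: 'browser',
            ...overrides,
            slug,
        };
        this.sites.set(slug, site);
        return site;
    }

    update(slug: string, changes: Partial<SiteMetadata>): SiteMetadata {
        const existing = this.sites.get(slug);
        if (!existing) {
            throw new Error(`Unknown site: ${slug}`);
        }
        const updated: SiteMetadata = { ...existing, ...changes, slug };
        this.sites.set(slug, updated);
        return updated;
    }

    delete(slug: string): boolean {
        return this.sites.delete(slug);
    }

    get(slug: string): SiteMetadata | undefined {
        return this.sites.get(slug);
    }

    // The most recently active site is treated as the current one.
    current(): SiteMetadata | undefined {
        return Array.from(this.sites.values()).sort((a, b) =>
            b.dateLastActive.localeCompare(a.dateLastActive)
        )[0];
    }
}
```

Storing dates as ISO 8601 strings keeps the file plain JSON and lets "date last active" be compared with a simple string sort.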
@adamziel
Collaborator

adamziel commented Jul 31, 2024

Let's take this opportunity to design a solid metadata format. Some questions to ponder:

  • How many JSON schemas will we need? There's site metadata, runtime setup options, Blueprints. Is it fine to have three files? Would it be useful to fold it into one?
  • Should the metadata file be exported with the site and easily reusable across different Playground apps?
  • Is there something we can reuse from the JSON format used by Studio?
  • Should it contain the PHP and WordPress version? That would duplicate what's inside the Blueprint. Or should it not contain it? But then, we'd have to update the stored copy of the Blueprint used to create a site whenever the user changes the PHP or WordPress version.

cc @brandonpayton

@brandonpayton
Member

Let's take this opportunity to design a solid metadata format. Some questions to ponder:

This sounds like a great idea. I haven't had time to look at this today but plan to begin tomorrow morning.

@brandonpayton
Member

First task:

@brandonpayton
Member

brandonpayton commented Aug 7, 2024

I considered making a simple PR starting with @bgrgicak's OPFS-reading function from the site manager view PR. But we will outgrow that interface almost immediately, and it wouldn't be better than the local function that is there now.

I started thinking about the interface we'd actually want for loading sites from a variety of sources. Here are my thoughts.

Today, we save and load sites from:

  • OPFS
  • Local FileSystem

But we would also like to be able to load sites from other sources like:

  • Git repo containing multiple sites on a single branch
  • Git repo with a site version per branch
  • List of External ZIP file URLs
  • Third-party HTTP API

Each source would define its own Load/Save operations, and since both Load and Save operations can be slow, the interface would need to support progress updates that can be reflected to users.

For Load, each source query would yield an event stream. We could read each source's stream individually or optionally compose them into a single loading event stream. (Perhaps we could use the Streams API. I don't know enough yet to know whether it can be used to yield streams of objects, but it looks like it might.)

If using the Streams API, each load event might look something like:

interface SiteListingEvent {
    value: {
        source: sourceSlug,
        // Site schema TBD
        sites: Site[];
        // This is the projected total site count
        // and could be used as the basis for progress UI.
        // This number could change as a source is being read
        // and more sites are being discovered.
        expectedTotal: number;
    }
    done: boolean;
}

Ideally, the loading streams would be cancelable.
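
On the open question above: the Streams API can carry plain objects, not only bytes. Here is a minimal sketch, assuming a global `ReadableStream` (modern browsers, Node 18+). The names `createListingStream` and `readSites` and the batching behavior are hypothetical, and `source` is simplified to a string.

```typescript
interface Site {
    slug: string;
}

interface SiteListing {
    source: string;
    sites: Site[];
    expectedTotal: number;
}

// Hypothetical source that emits already-known sites in batches.
function createListingStream(
    source: string,
    allSites: Site[],
    batchSize: number
): ReadableStream<SiteListing> {
    let offset = 0;
    return new ReadableStream<SiteListing>({
        pull(controller) {
            if (offset >= allSites.length) {
                controller.close();
                return;
            }
            const sites = allSites.slice(offset, offset + batchSize);
            offset += sites.length;
            controller.enqueue({ source, sites, expectedTotal: allSites.length });
        },
        cancel() {
            // A real source would abort pending network or disk reads here.
            offset = allSites.length;
        },
    });
}

// Read listings until at least `maxSites` sites were seen, then cancel.
async function readSites(
    stream: ReadableStream<SiteListing>,
    maxSites: number
): Promise<Site[]> {
    const reader = stream.getReader();
    const found: Site[] = [];
    while (found.length < maxSites) {
        const result = await reader.read();
        if (result.done) {
            break;
        }
        found.push(...result.value.sites);
    }
    await reader.cancel();
    return found;
}
```

Cancellation goes through `reader.cancel()`, which invokes the source's `cancel()` callback, so each source decides how to stop its own work.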

Before loading, Playground sources would be configured via APIs like query string, Blueprints, etc.

Some possible lenses for the site sourcing and listing:

  • Combine Site Sources into one group
  • Read multiple site sources and group by source
  • Single source, a source equivalent to the "seamless" UI option

These are my thoughts, and I hope to start prototyping tomorrow.

@bgrgicak
Collaborator Author

bgrgicak commented Aug 7, 2024

@brandonpayton This sounds like a good solution to the load/save problem we discussed a few times and I like that it would enable Playground to use more sources.

I'm unsure if this project is the right time to do it as it seems unrelated and we could add it later, but prototyping it now sounds like a good next step.

How do temporary sites fit into this interface?
Today we need to show the current temporary site in the sidebar. For example, if you open playground.wordpress.net it should show up as a temporary site.
Long term, we would like to allow people to save temporary sites as templates. For example, I test a plugin frequently and then have a template that spins up the setup in a temporary site.
Would templates even be a good fit for the site storage?

@brandonpayton
Member

Let's take this opportunity to design a solid metadata format. Some questions to ponder:

  • How many JSON schemas will we need? There's site metadata, runtime setup options, Blueprints. Is it fine to have three files? Would it be useful to fold it into one?
  • Should it contain the PHP and WordPress version? That would duplicate what's inside the Blueprint. Or should it not contain it? But then, we'd have to update the stored copy of the Blueprint used to create a site whenever the user changes the PHP or WordPress version.

My initial thought is that site state is site state and is separate from the idea of Blueprints. Current Playground site state can be initialized with a Blueprint and subsequently changed through manual interactions with the user. At that point, we can have a site state that has diverged far from its initial state.

It might be useful metadata to remember the Blueprint and other ingredients that went into creating a site, but unless we can represent every subsequent state change in terms of a Blueprint, expressing and sharing overall site state seems more like export and site-transfer-protocol territory.

It would be great if we can explore some of that space as part of this effort, but for the first iteration of the site metadata format, let's start with something flat and simple. Site metadata could include the initial Blueprint and WP settings but do so as historical information rather than a representation of current site state and config.

  • Should the metadata file be exported with the site and easily reusable across different Playground apps?

This seems like a good idea. Providing Playground site information with an export opens them to unforeseen possibilities. Is there a downside?

  • Is there something we can reuse from the JSON format used by Studio?

For reference, here is a sample of the per-site format used by Studio app:
(the adminPassword is fine to share as this is a throwaway local site)

    {
      "id": "89d73cb6-154f-41bb-9590-9a270edceedd",
      "name": "My Noble Website 2",
      "path": "/Users/brandon/Studio/my-noble-website-2",
      "adminPassword": "STclSmd5bm5zc0lIZHRqUyFuZGslRUM4",
      "port": 8882,
      "phpVersion": "8.1",
      "themeDetails": {
        "name": "Twenty Twenty-Four",
        "path": "/var/www/html/wp-content/themes/twentytwentyfour",
        "slug": "twentytwentyfour",
        "isBlockTheme": true,
        "supportsWidgets": false,
        "supportsMenus": false
      }
    }

So far, I don't see much here that seems good or necessary to include in our site metadata. Maybe:

    {
      "id": "89d73cb6-154f-41bb-9590-9a270edceedd",
      "name": "My Noble Website 2",
      "phpVersion": "8.1"
    }

For now, let's sculpt site metadata that makes sense for Playground and then consider how it might be useful outside of Playground.

@brandonpayton
Member

@brandonpayton This sounds like a good solution to the load/save problem we discussed a few times and I like that it would enable Playground to use more sources.

Glad to hear it. ☺️

I'm unsure if this project is the right time to do it as it seems unrelated and we could add it later, but prototyping it now sounds like a good next step.

You're right. I started considering an interface for retrieving a list of sites along with some ideas @adamziel had mentioned about importing sets of sites from sources like Git repos, and I lost a bit of focus on the purpose of this issue.

How do temporary sites fit into this interface? Today we need to show the current temporary site in the sidebar.

An in-memory site wouldn't be retrieved from anywhere but would be treated as part of the current site list.

Probably the UI should work with sites through the redux store. There will likely be source-specific interfaces under the covers, but we can abstract those operations with redux actions and state updates.

For example, if you open playground.wordpress.net it should show up as a temporary site. Long term, we would like to allow people to save temporary sites as templates. For example, I test a plugin frequently and then have a template that spins up the setup in a temporary site. Would templates even be a good fit for the site storage?

A template just seems like a different category of persisted site. Is there something that makes them fundamentally different?

Perhaps templates could be edited directly or perhaps not. But at least a template could be used to create a new temporary site, and that temporary site could be modified and used to create yet another template or saved as a regular, persisted site.

@brandonpayton
Member

I'm unsure if this project is the right time to do it as it seems unrelated and we could add it later, but prototyping it now sounds like a good next step.

You're right. I started considering an interface for retrieving a list of sites along with some ideas @adamziel had mentioned about importing sets of sites from sources like Git repos, and I lost a bit of focus on the purpose of this issue.

I hope to play with the streaming idea as we go, but next, I plan to focus on providing a redux-based interaction with the site list, sculpting the site metadata format, and considering a simple interface for I/O operations that can support different site sources.

Aiming for a Draft PR for this work tomorrow.

@bgrgicak
Collaborator Author

bgrgicak commented Aug 8, 2024

It would be great if we can explore some of that space as part of this effort, but for the first iteration of the site metadata format, let's start with something flat and simple. Site metadata could include the initial Blueprint and WP settings but do so as historical information rather than a representation of current site state and config.

Starting with a simple flat format sounds good to me.

My initial idea behind including blueprints was that they already could include all settings data so we wouldn't duplicate it.
Browser and device storage don't need the blueprint after the first run, but temporary storage will need it every time it loads.

@bgrgicak
Collaborator Author

bgrgicak commented Aug 8, 2024

Random thought:
For browser and device storage, it would be nice to access the URL from the blueprint so that it goes to that page when the site opens.
But we couldn't use it today because the new UI doesn't allow users to edit blueprints, so it's better if we open / than open a blueprint URL every time.
Editing blueprints and figuring out how they are applied to browser and device storage could be a good thing to work on in the future.

@bgrgicak
Collaborator Author

bgrgicak commented Aug 8, 2024

What if we reused the query API format for this? All settings are in that format, it supports blueprints and we need all that data to reconstruct a site.
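
As a rough illustration of that idea, here is a sketch that round-trips site settings through a query string. Only `storage` and `site-slug` come from this thread; the `SiteSettings` shape, the function names, and the use of a `blueprint-url` parameter are assumptions made for the example.

```typescript
// Hypothetical settings shape used only for this sketch.
interface SiteSettings {
    storage: 'browser' | 'device' | 'none';
    siteSlug: string;
    blueprintUrl?: string; // assumed parameter, for illustration
}

// Serialize settings into a query-string style record.
function toQueryString(settings: SiteSettings): string {
    const params = new URLSearchParams();
    params.set('storage', settings.storage);
    params.set('site-slug', settings.siteSlug);
    if (settings.blueprintUrl) {
        params.set('blueprint-url', settings.blueprintUrl);
    }
    return params.toString();
}

// Parse the stored string back into settings, rejecting unknown storage types.
function fromQueryString(query: string): SiteSettings {
    const params = new URLSearchParams(query);
    const storage = params.get('storage');
    if (storage !== 'browser' && storage !== 'device' && storage !== 'none') {
        throw new Error(`Unsupported storage: ${storage}`);
    }
    const settings: SiteSettings = {
        storage,
        siteSlug: params.get('site-slug') ?? '',
    };
    const blueprintUrl = params.get('blueprint-url');
    if (blueprintUrl) {
        settings.blueprintUrl = blueprintUrl;
    }
    return settings;
}
```

One trade-off this makes visible: a query string is flat and string-typed, so anything nested (an inline Blueprint, for example) would need its own encoding.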

bgrgicak added a commit that referenced this issue Aug 8, 2024
**Warning** This PR contains a lot of TODOs because I didn't want it to
get too large. We can decide if it's worth shipping as is or if we need
to implement some of these missing features first.

## Motivation for the change, related issues

Implements #1656

Browser storage in Playground supports multiple sites. You switch
between them by adding a `site-slug` query string.
This is a powerful feature that's hard to discover.

As a first step in the [Web app redesign
project,](#1655)
this PR implements switching sites in browser storage.

Other site management features like adding and deleting sites will be
added in future PRs.

![Screenshot 2024-08-01 at 12 20
55](https://github.com/user-attachments/assets/b820a040-d9a8-4eb9-aaa0-7d480385e979)
![Screenshot 2024-08-01 at 12 21
08](https://github.com/user-attachments/assets/3ace3f63-7edf-4ab5-b332-78be3ef692a9)

## Implementation details

The goal of this PR is to set the groundwork for the [Web app
redesign](#1655)
project by allowing users to switch between views.

The feature is only available while using browser storage.

The current view is now called `site-view` and the new management view
is called `site-manager`.

In this iteration, switching is done by redirecting to a URL with a
`site-manager=true` query string.
A future iteration will remove the need for reloads.

When in the site manager a list of sites is loaded from OPFS, clicking
on a site (or preview) will redirect to that site.
This is a temporary implementation that will be removed [once we add
site storage](#1659).

## Testing Instructions (or ideally a Blueprint)

- Checkout this branch
- [Open Playground with browser
storage](http://127.0.0.1:5400/website-server/?storage=browser)
- Confirm that a new site manager icon is available in the upper left
corner
- Click it and confirm that the site manager loads
- [Open Playground with browser storage and a custom
slug](http://127.0.0.1:5400/website-server/?storage=browser&site-slug=test)
- Click on the site manager icon and confirm that the site manager loads
- Confirm that both sites are visible

---------

Co-authored-by: Brandon Payton <[email protected]>
@brandonpayton
Member

@bgrgicak thanks for all your thoughts!

My initial idea behind including blueprints was that they already could include all settings data so we wouldn't duplicate it.
Browser and device storage don't need the blueprint after the first run, but temporary storage will need it every time it loads.

By "temporary storage will need it every time it loads", do you mean that we naturally have to start with a Blueprint every time we create a temporary site?

Random thought:
For browser and device storage, it would be nice to access the URL from the blueprint so that it goes to that page when the site opens.
But we couldn't use it today because the new UI doesn't allow users to edit blueprints, so it's better if we open / than open a blueprint URL every time.
Editing blueprints and figuring out how they are applied to browser and device storage could be a good thing to work on in the future.

I wonder if this is naturally heading in the direction where some things belong to site metadata and other things belong to Blueprints. There can be some overlap. Blueprints currently describe initial platform decisions and configuration and setup preferences, but once a persistent site is initialized, some of those things belong to site metadata which may be changed.

So we render an initial site with a Blueprint and then maintain separate site metadata after that.

I guess an alternative might be to just use a Blueprint as site metadata and always update it as changes are made to the site configuration. Intuitively, I'm uncomfortable with that because I think it might be conflating things that should be separate concepts, but it's something to sleep on.

What if we reused the query API format for this? All settings are in that format, it supports blueprints and we need all that data to reconstruct a site.

This is an interesting thought! My first reaction is that the query API is a kind of user interface and offers different, mutually exclusive options that may not translate well to making a clearly defined data format. But I could be mistaken. This seems like a good one to sleep on as well.

@brandonpayton
Member

Today, I started a rough draft of site storage APIs and redux plumbing for working with stored sites, #1679.

It's basic and incomplete. It only considers OPFS, and we'll eventually need to support Local FS sites and remote site sources. But it's something to work with and sculpt into something better.

@adamziel
Collaborator

adamziel commented Aug 9, 2024

Each source would define its own Load/Save operations, and since both Load and Save operations can be slow, the interface would need to support progress updates that can be reflected to users.

@brandonpayton good thinking. Also, some sources might only save a diff, a zip, or require streaming data in a re-entrant way for a few days. Also, we'll want to eventually support the same data sources for downloading plugins, themes etc. in the PHP Blueprints library, likely using the WIP StreamChain API. We might eventually have a PHP<->JS stream interop layer. Let's not actually implement any of that today, but let's keep it in mind for the interface design.

For Load, each source query would yield an event stream. We could read each source's stream individually or optionally compose them into a single loading event stream.

@dmsnell Thinking about WordPress core, that's a nice use-case for the StreamChain API.

If using the Streams API, each load event might look something like:

Does done: boolean stand for the last SiteListingEvent in the stream? If so, it looks a lot like the generator.next() data structure that's described by TypeScript's IteratorYieldResult type:

interface IteratorYieldResult<TYield> {
  done?: false;
  value: TYield;
}

We could lean on that and immediately make it interoperable with generators and iterators:

type SiteListingSource = IterableIterator<{
    source: sourceSlug,
    // Site schema TBD
    sites: Site[];
    // This is the projected total site count
    // and could be used as the basis for progress UI.
    // This number could change as a source is being read
    // and more sites are being discovered.
    expectedTotal: number;
}>;
// Placeholder: a concrete source would be created here.
declare const listingSource: SiteListingSource;
// Manual iteration yields { done, value }.
const { done, value } = listingSource.next();
// An IterableIterator (unlike a bare Iterator) also works with for...of.
for (const listing of listingSource) {
    console.log(listing.sites);
}

@adamziel
Collaborator

adamziel commented Aug 9, 2024

My initial thought is that site state is site state and is separate from the idea of Blueprints.

+1, Blueprints are just the initial site recipes. An incomplete site-to-Blueprint export is possible, but Blueprints are still a fundamentally separate concept. Like Dockerfile and a docker image.

It would be great if we can explore some of that space as part of this effort, but for the first iteration of the site metadata format, let's start with something flat and simple.

+1

This seems like a good idea. Providing Playground site information with an export opens them to unforeseen possibilities. Is there a downside?

As long as that format is designed with interop in mind and is not super specific to in-browser Playground, I don't see any downsides today.

For now, let's sculpt site metadata that makes sense for Playground and then consider how it might be useful outside of Playground.

Good call and much agreed. Let me also CC @wojtekn and @sejas.

An in-memory site wouldn't be retrieved from anywhere but would be treated as part of the current site list.

There could be a TemporaryListingSource to keep it all within a single interface and reduce special casing.
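
A rough sketch of how that could look, assuming listing sources are modeled as generators that all yield the same listing shape. The function names, the simplified `Site` type, and the `'temporary'` source label are hypothetical.

```typescript
interface Site {
    slug: string;
    storage: 'browser' | 'device' | 'none';
}

interface SiteListing {
    source: string;
    sites: Site[];
    expectedTotal: number;
}

type SiteListingSource = IterableIterator<SiteListing>;

// The in-memory site is exposed through the same interface as stored sites.
function* temporaryListingSource(current?: Site): SiteListingSource {
    const sites = current ? [current] : [];
    yield { source: 'temporary', sites, expectedTotal: sites.length };
}

// Stand-in for a persisted source such as OPFS; yields one site per listing.
function* storedListingSource(source: string, stored: Site[]): SiteListingSource {
    for (const site of stored) {
        yield { source, sites: [site], expectedTotal: stored.length };
    }
}

// Compose several sources into a single listing sequence.
function* combineSources(...sources: SiteListingSource[]): SiteListingSource {
    for (const source of sources) {
        yield* source;
    }
}
```

With this shape, the UI consumes `combineSources(...)` and never needs a special case for the temporary site.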

A template just seems like a different category of persisted site. Is there something that makes them fundamentally different?

A template could be either a ZIP snapshot or a Blueprint.

Starting a new site from the template would mean cloning the template and creating another site from that initial state. Working with that site would not alter the site template. However, you could explicitly choose to save the current site state as a ZIP snapshot template.

For browser and device storage, it would be nice to access the URL from the blueprint so that it goes to that page when the site opens.

Ideally I'd like to use the slug instead of the scope so that you could have stable Playground URLs, e.g. https://playground.wordpress.net/my-site-23/wp-admin/

@bgrgicak
Collaborator Author

bgrgicak commented Aug 9, 2024

By "temporary storage will need it every time it loads", do you mean that we naturally have to start with a Blueprint every time we create a temporary site?

Yes, this is the same as today, except that the blueprint is stored in the URL and not in site storage.

So we render an initial site with a Blueprint and then maintain separate site metadata after that.

I like this. We would still need to keep the blueprint to support the reset site feature and temporary sites, but the blueprint could be immutable.

My first reaction is that the query API is a kind of user interface and offers different, mutually exclusive options that may not translate well to making a clearly defined data format. But I could be mistaken.

If you look at the features of the query API and the new UI they are mostly the same.

brandonpayton added a commit that referenced this issue Aug 22, 2024
## Motivation for the change, related issues

This is a PR to start exploring site storage. In order to start from a
concrete place, this PR starts by setting up the site manager sidebar to
interact with the sites list via redux.

Related to #1659

## Implementation details

This PR adds a `site-storage` module that provides functions for adding,
removing, and listing sites. It currently only supports writing to sites
stored in OPFS.

Today, Safari only appears to support writing to OPFS files from worker
threads, so this PR adds a `site-storage-metadata-worker.ts` module that
the UI thread can spawn to write site metadata to OPFS.

This PR adds a `siteListing` property to our web app's redux state. It
looks like:
```ts
{
    status: SiteListingStatus;
    sites: SiteInfo[];
}
```

`SiteListingStatus` can reflect whether the listing is loading, loaded,
or in an error state.

The site-manager-sidebar has been updated to select sites state from
redux and interact with the sites list via redux actions. Currently, the
loading status is ignored, but we should show it to the user in a
follow-up PR.
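
For illustration only, a reducer over that state shape might look like the sketch below. It is not the code from this PR: the action names, the string values of `SiteListingStatus`, and the reduced `SiteInfo` type are assumptions, and the reducer is written without the redux package so the example stays self-contained.

```typescript
// Assumed encoding of the three states described above.
type SiteListingStatus = 'loading' | 'loaded' | 'error';

interface SiteInfo {
    slug: string;
}

interface SiteListingState {
    status: SiteListingStatus;
    sites: SiteInfo[];
}

// Hypothetical actions; the real action names may differ.
type SiteListingAction =
    | { type: 'listing/loading' }
    | { type: 'listing/loaded'; sites: SiteInfo[] }
    | { type: 'listing/failed' }
    | { type: 'listing/siteAdded'; site: SiteInfo }
    | { type: 'listing/siteRemoved'; slug: string };

const initialListingState: SiteListingState = { status: 'loading', sites: [] };

// Pure reducer: returns a new state object and never mutates the old one.
function siteListingReducer(
    state: SiteListingState = initialListingState,
    action: SiteListingAction
): SiteListingState {
    switch (action.type) {
        case 'listing/loading':
            return { ...state, status: 'loading' };
        case 'listing/loaded':
            return { status: 'loaded', sites: action.sites };
        case 'listing/failed':
            return { ...state, status: 'error' };
        case 'listing/siteAdded':
            return { ...state, sites: [...state.sites, action.site] };
        case 'listing/siteRemoved':
            return {
                ...state,
                sites: state.sites.filter((site) => site.slug !== action.slug),
            };
        default:
            return state;
    }
}
```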

## Testing Instructions (or ideally a Blueprint)

Run `npm run dev` and interact with the sites list in Chrome and Safari.
(Unfortunately, Firefox does not yet support loading Service Worker
modules, and that feature is required for our current dev setup)
@brandonpayton
Member

Ideally I'd like to use the slug instead of the scope so that you could have stable Playground URLs, e.g. https://playground.wordpress.net/my-site-23/wp-admin/

I really like this idea.

@brandonpayton
Member

There are a few items left here:

  • Favicon (ideally a data url)
  • Date last active (used to determine the current site)
  • Logs

For the logs, I wonder if we could make it so entries are appended to the log file(s) rather than representing the whole file as JSON and having to re-serialize and write the whole thing each time. Maybe each log entry could be a line of JSON, so that adding a new entry just means writing an additional line to the log file.
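
To sketch the line-per-entry idea: each entry is serialized as one JSON object followed by a newline, so appending never requires rewriting earlier entries. The `LogEntry` fields and function names below are hypothetical, not an agreed format.

```typescript
// Hypothetical entry shape; the real log format is still to be decided.
interface LogEntry {
    timestamp: string; // ISO 8601
    severity: 'debug' | 'info' | 'warn' | 'error';
    message: string;
}

// One entry becomes exactly one line. JSON.stringify escapes embedded
// newlines, so a multi-line message still occupies a single line.
function serializeLogEntry(entry: LogEntry): string {
    return JSON.stringify(entry) + '\n';
}

// Parse a whole log file back into entries, skipping blank lines.
function parseLogFile(contents: string): LogEntry[] {
    return contents
        .split('\n')
        .filter((line) => line.trim() !== '')
        .map((line) => JSON.parse(line) as LogEntry);
}
```

Adding an entry is then just appending `serializeLogEntry(entry)` to the file, and a reader can process the log line by line without loading a single large JSON document.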

@bgrgicak
Collaborator Author

bgrgicak commented Oct 1, 2024

For the logs, I wonder if we could make it so entries are appended to the log file(s) rather than representing the whole file as JSON and having to re-serialize and write the whole thing each time. Maybe each log entry could be a line of JSON, so that adding a new entry just means writing an additional line to the log file.

Yes! My initial idea was to store logs in JSON and use a subset of the OpenTelemetry logging standard for the format.
It introduced some extra work in the first phase of the project, so we ended up just using raw logs.

When we collect Playground logs (JS, WASM) it should be easy to format them as JSON. Until recently we missed support for parsing debug.log but we have it working reliably in the Playground tester, so it's just a matter of adding the parser.

I feel like this is a good time to work on it. We currently have a few tasks related to compatibility and it would be great if we could create a long-term format for logs before working on it.

@adamziel
Collaborator

adamziel commented Oct 7, 2024

if we could make it so entries are appended to the log file(s)

Makes sense, with the caveat that we'd need something to avoid losing logs if Playground crashes. Maybe write to a log file AND an in-memory buffer?

it would be great if we could create a long-term format for logs before working on it.

Ideally we could use the same format as WordPress. That way all the tools would be immediately compatible with a regular WordPress site that already collects logs.

@adamziel changed the title from "Site storage" to "File format to store Playground site metadata across all runtimes" on Oct 7, 2024
@adamziel changed the title from "File format to store Playground site metadata across all runtimes" to "Design a file format to store Playground site metadata across all runtimes" on Oct 7, 2024
@bgrgicak
Collaborator Author

bgrgicak commented Oct 8, 2024

Makes sense, with the caveat that we'd need something to avoid losing logs if Playground crashes. Maybe write to a log file AND an in-memory buffer?

We already do this today. All logs are stored in-memory. The PHP error log is just internal storage for PHP logs and the logger pulls PHP logs from there.

Ideally we could use the same format as WordPress. That way all the tools would be immediately compatible with a regular WordPress site that already collects logs.

We already do that. PHP logs are formatted like that out of the box and we format JS logs to match PHP logs.

This approach allowed us to parse Playground logs in the Tester without the need to separately parse JS and PHP logs.
