
Apache Daffodil™ Extension for Visual Studio Code: Roadmap

Davin Shearer edited this page Oct 3, 2023 · 28 revisions



The Future of the Apache Daffodil™ Extension for Visual Studio Code

While the most recent release of the Apache Daffodil™ Extension for Visual Studio Code focused on the schema and the infoset, the theme of the next version will place additional emphasis on the input data. The input data could be any kind of file, with different byte sizes, byte ordering, and alignments, so having robust hex editing capabilities is important.

It is also important to be able to set breakpoints not only in the schema but also in the data, and to manipulate the data and watch how that affects the parse outcome. In other words, what happens to the parse when the data changes in some way? While stepping through the debugger, the schema, infoset, and data views need to be kept in sync.


Desired Features of the Input Data Editor

For organizational purposes, the desired features for the Apache Daffodil™ Extension for Visual Studio Code are broken down into ten functional areas.

1. File Type Support (FTS)

1.1 The data editor needs to support any fixed-length (non-streaming) file that Daffodil is capable of opening. Generally, any file type can be opened and displayed by a hex editor; the file type and extension do not influence how the file is rendered in hex or binary formats.

2. User Interface (UI)

2.1 The data editor needs to be responsive and provide a good VS Code user experience. Existing third-party VS Code hex editors become less responsive when rendering medium to large files. This editor will handle the file sizes common to Daffodil without impacting overall usability.

2.2 The data editor needs to be designed as a composition of display panels that allow multiple data representations to be rendered on the same screen. A data file may be segmented into multiple representations of data, differing in anything from byte boundaries to endianness. The editor will render these differing representations within the same user interface.

2.3 The data editor needs to allow individual display panels to maintain their own position in the data to allow viewing different segments of data in different display panels. The editor will manage each composable view as a separate Viewport capable of displaying a view into the data at a specified offset and capacity.
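A minimal sketch of the viewport idea, assuming each viewport is just an offset and a capacity over a shared in-memory byte array. The `Viewport` class and `ViewportState` shape are hypothetical, not the extension's actual API; an editor for large files would page data in rather than hold it all in memory.

```typescript
// Hypothetical Viewport sketch: names and shape are illustrative only.
interface ViewportState {
  offset: number;   // first byte of the file shown in this viewport
  capacity: number; // maximum number of bytes this viewport can show
}

class Viewport {
  constructor(private readonly data: Uint8Array, private state: ViewportState) {}

  // Bytes currently visible, clipped at the end of the file.
  visibleBytes(): Uint8Array {
    const start = Math.min(this.state.offset, this.data.length);
    const end = Math.min(start + this.state.capacity, this.data.length);
    return this.data.subarray(start, end);
  }

  // Scroll by a signed number of bytes, clamped to the file bounds.
  scrollBy(delta: number): void {
    const maxOffset = Math.max(0, this.data.length - 1);
    const next = Math.min(Math.max(0, this.state.offset + delta), maxOffset);
    this.state = { ...this.state, offset: next };
  }

  get offset(): number {
    return this.state.offset;
  }
}

// Two viewports over the same data, each keeping its own position.
const fileBytes = new Uint8Array(64).map((_, i) => i);
const upper = new Viewport(fileBytes, { offset: 0, capacity: 16 });
const lower = new Viewport(fileBytes, { offset: 32, capacity: 16 });
lower.scrollBy(8);
```

Because each `Viewport` keeps its own state, scrolling one panel leaves the other where it was.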

2.4 The data editor viewports need to be interactive, supporting mouse and keyboard interactions such as scrolling and context menus. User interaction drives the function of the editor, so the ability to interpret keyboard and mouse actions on individual and block data selections is critical.

2.5 The data editor needs to include a Properties View component. The Properties View will provide a static region on the display for file and selection metadata. The Properties View is not associated with a specific region in the file, so it is not a viewport component. Instead, it is tied to events, such as selection events, and is updated when it is notified that those events have occurred.

2.6 The data editor needs to include a property display mode for a single-unit selection. The Properties View will allow multiple representations of a single unit (e.g., a byte) to be displayed simultaneously.
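For a single selected byte, the Properties View could derive each representation directly from the byte value. The helper below is a hypothetical sketch; the particular set of representations (hex, unsigned and signed decimal, octal, binary, printable ASCII) is an assumption chosen for illustration.

```typescript
// Hypothetical helper: the representations chosen here are illustrative.
interface ByteProperties {
  hex: string;
  decimal: number; // unsigned value, 0..255
  signed: number;  // two's complement value, -128..127
  octal: string;
  binary: string;
  ascii: string;   // printable ASCII character, or "." otherwise
}

function byteProperties(value: number): ByteProperties {
  if (!Number.isInteger(value) || value < 0 || value > 0xff) {
    throw new RangeError(`not a byte: ${value}`);
  }
  return {
    hex: value.toString(16).toUpperCase().padStart(2, "0"),
    decimal: value,
    signed: value > 0x7f ? value - 0x100 : value,
    octal: value.toString(8).padStart(3, "0"),
    binary: value.toString(2).padStart(8, "0"),
    ascii: value >= 0x20 && value < 0x7f ? String.fromCharCode(value) : ".",
  };
}
```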

2.7 The data editor needs to include a property display mode for multiple-unit selections. A selection of up to some limit of bytes, for example four, could still be rendered in the Properties View. For example, selecting four bytes could render a 32-bit integer value.
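Decoding a multi-byte selection is a natural fit for the standard `DataView` API, which reads 8-, 16-, and 32-bit integers in either byte order. `selectionAsIntegers` is a hypothetical helper, and limiting it to 1, 2, or 4 bytes is an assumption of this sketch.

```typescript
// Hypothetical helper: interprets a selection of 1, 2, or 4 bytes as integers.
function selectionAsIntegers(
  selection: Uint8Array,
  littleEndian: boolean,
): { unsigned: number; signed: number } | undefined {
  // Respect byteOffset so that subarray selections decode correctly.
  const view = new DataView(selection.buffer, selection.byteOffset, selection.byteLength);
  switch (selection.byteLength) {
    case 1:
      return { unsigned: view.getUint8(0), signed: view.getInt8(0) };
    case 2:
      return { unsigned: view.getUint16(0, littleEndian), signed: view.getInt16(0, littleEndian) };
    case 4:
      return { unsigned: view.getUint32(0, littleEndian), signed: view.getInt32(0, littleEndian) };
    default:
      return undefined; // other sizes are not rendered in this sketch
  }
}
```

For example, selecting the bytes `00 00 01 00` yields 256 when read big endian and 65536 when read little endian.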

3. Persisting Edits (PER)

3.1 The data editor needs to allow edits to be saved as a new file. The editor will not attempt to write the file that is held open by Daffodil. Instead, a copy of the file will be written to disk.

3.2 The data editor needs to provide an auto-incremented file revision number so that edits can be saved without prompting the user. When saving edits to a file, it may be preferable for the save-as-new-file operation to be transparent to the user. In this case, the user will not be prompted for a file name; an autogenerated name will be used instead.
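One way to auto-increment a revision number is to probe candidate names until an unused one is found. The `<base>-rev<N><ext>` naming scheme and the `nextRevisionName` helper are hypothetical; the sketch also assumes it receives a bare file name (no directory) and the set of names already present.

```typescript
// Hypothetical naming scheme "<base>-rev<N><ext>"; the real extension may differ.
function nextRevisionName(original: string, existing: ReadonlySet<string>): string {
  const dot = original.lastIndexOf(".");
  const hasExt = dot > 0; // treat leading-dot names like ".bashrc" as extensionless
  const base = hasExt ? original.slice(0, dot) : original;
  const ext = hasExt ? original.slice(dot) : "";
  let rev = 1;
  // Probe revision numbers until one is not already taken.
  while (existing.has(`${base}-rev${rev}${ext}`)) {
    rev++;
  }
  return `${base}-rev${rev}${ext}`;
}
```

For example, if `data-rev1.bin` already exists, saving edits to `data.bin` would produce `data-rev2.bin`.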

3.3 The data editor needs to provide a save-as option to name a new file. When saving edits to a file, the user may want to specify where the edited file will be saved. In this case, a file picker dialog or something similar can be used to let the user specify the location of the saved file.

3.4 The data editor will provide a convenient way of restarting the Daffodil debugger with the specified edits. After the edits are saved to a file, the debugger can be restarted and automatically set to use the new file's path as the input. This convenience spares the user from editing their launch profile to point to the new file.

4. Data Representations (DATAREP)

Hex and binary representations for both viewing and editing.

4.1 The data editor needs to implement support for multiple data representations. The editor will use the viewport component design to deliver a composable multiple representation rendering capability.

4.2 The data editor needs to provide a viewport for viewing byte-delimited data. The viewport will display hex bytes similar to a typical hex editor display.

4.3 The data editor needs to provide a viewport for viewing data as individual bits. The viewport will render a binary (1s and 0s) display. Rendering details, such as unit length, can be modified using properties associated with the viewport.
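A bit-level viewport can render bytes as binary digits and regroup them by a configurable unit length. This is a hypothetical sketch; `renderBinary` and its `unitBits` property are illustrative names, and bits are emitted most significant bit first.

```typescript
// Hypothetical renderer: unitBits controls how bits are grouped for display.
function renderBinary(bytes: Uint8Array, unitBits = 8): string {
  if (!Number.isInteger(unitBits) || unitBits <= 0) {
    throw new RangeError("unitBits must be a positive integer");
  }
  // Concatenate all bits MSB-first, then split them into display units.
  const bits = Array.from(bytes, (b) => b.toString(2).padStart(8, "0")).join("");
  const units: string[] = [];
  for (let i = 0; i < bits.length; i += unitBits) {
    units.push(bits.slice(i, i + unitBits));
  }
  return units.join(" ");
}
```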

4.4 The data editor needs to provide configurable rendering properties for any given representation. The UI will allow the user to view and edit viewport properties.

4.5 The data editor needs to provide configurable endianness properties for viewport rendering, so that each viewport can be configured as big or little endian.
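Per-viewport endianness mainly changes how bytes are ordered when they are shown as multi-byte units. The sketch below is hypothetical (`renderHexUnits` is not an existing API) and assumes a trailing partial unit is shown as-is.

```typescript
// Hypothetical renderer: shows bytes as hex words of `unitBytes` bytes each.
function renderHexUnits(bytes: Uint8Array, unitBytes: number, littleEndian: boolean): string[] {
  if (!Number.isInteger(unitBytes) || unitBytes <= 0) {
    throw new RangeError("unitBytes must be a positive integer");
  }
  const words: string[] = [];
  for (let i = 0; i < bytes.length; i += unitBytes) {
    const unit = Array.from(bytes.subarray(i, i + unitBytes));
    // In little-endian order the last byte in memory is the most significant,
    // so reverse the unit to print the most significant byte first.
    const ordered = littleEndian ? unit.reverse() : unit;
    words.push(ordered.map((b) => b.toString(16).padStart(2, "0")).join(""));
  }
  return words;
}
```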

4.6 The data editor needs to be able to represent data with either the most significant bit (MSB) or the least significant bit (LSB) displayed first. Users will be able to view and edit bytes represented in binary where the most significant bit is either the first or the last bit of the byte.
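Bit order affects both rendering and editing, so the same setting has to be applied in both directions. These two helpers are a hypothetical sketch of that round trip for a single byte.

```typescript
// Hypothetical helper: renders one byte with MSB-first or LSB-first bit order.
function byteToBits(value: number, msbFirst: boolean): string {
  const msb = (value & 0xff).toString(2).padStart(8, "0");
  return msbFirst ? msb : msb.split("").reverse().join("");
}

// Editing: parse 8 displayed bits back into a byte using the same bit order.
function bitsToByte(bits: string, msbFirst: boolean): number {
  if (!/^[01]{8}$/.test(bits)) {
    throw new Error(`expected 8 binary digits, got "${bits}"`);
  }
  const msb = msbFirst ? bits : bits.split("").reverse().join("");
  return parseInt(msb, 2);
}
```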

5. Editing (EDT)

5.1 The data editor needs to implement inline editing within a viewport. The viewport will support mouse and keyboard interaction to initiate editing a value.

5.2 The data editor needs to default to editing in the same representation as the view. The editor will allow editing in the viewport's own representation (e.g., hex in a hex viewport, binary in a binary viewport) using the native rendering logic of the viewport.

5.3 The data editor needs to provide undo / redo capability related to edits. Users commonly expect editors such as this to provide commands to undo and redo edits that have been made.
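A common way to implement undo/redo is to record each edit with enough information to reverse it and keep two stacks. This hypothetical `EditHistory` sketch handles only overwrites on a fixed-size buffer; inserts and deletes change the data length and would need their own reversible records.

```typescript
// Hypothetical sketch: every edit records enough to reverse itself.
interface Edit {
  offset: number;
  before: Uint8Array; // bytes that were overwritten
  after: Uint8Array;  // bytes that replaced them
}

class EditHistory {
  private undoStack: Edit[] = [];
  private redoStack: Edit[] = [];

  constructor(private readonly data: Uint8Array) {}

  overwrite(offset: number, bytes: Uint8Array): void {
    if (offset < 0 || offset + bytes.length > this.data.length) {
      throw new RangeError("overwrite out of bounds");
    }
    const before = this.data.slice(offset, offset + bytes.length);
    this.data.set(bytes, offset);
    this.undoStack.push({ offset, before, after: bytes.slice() });
    this.redoStack = []; // a new edit invalidates the redo history
  }

  undo(): boolean {
    const edit = this.undoStack.pop();
    if (!edit) return false;
    this.data.set(edit.before, edit.offset);
    this.redoStack.push(edit);
    return true;
  }

  redo(): boolean {
    const edit = this.redoStack.pop();
    if (!edit) return false;
    this.data.set(edit.after, edit.offset);
    this.undoStack.push(edit);
    return true;
  }
}
```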

5.4 The data editor needs to support editing in representations that differ from the view. The editor could provide something similar to a pop-out component that allows editing a value in a format that differs from the viewport representation, e.g., editing binary from the hex view.
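A pop-out editor mainly needs to parse a value typed in one radix and check that it fits in a byte. The `parseByte` helper and its radix names are hypothetical and cover only single-byte values.

```typescript
// Hypothetical pop-out editor helper: radix names are illustrative.
type Radix = "hex" | "binary" | "decimal" | "octal";

const RADIX: Record<Radix, { base: number; pattern: RegExp }> = {
  hex: { base: 16, pattern: /^[0-9a-f]{1,2}$/i },
  binary: { base: 2, pattern: /^[01]{1,8}$/ },
  decimal: { base: 10, pattern: /^[0-9]{1,3}$/ },
  octal: { base: 8, pattern: /^[0-7]{1,3}$/ },
};

function parseByte(text: string, radix: Radix): number {
  const { base, pattern } = RADIX[radix];
  const trimmed = text.trim();
  // Reject characters that are not valid digits for this radix.
  if (!pattern.test(trimmed)) {
    throw new Error(`"${text}" is not a valid ${radix} byte`);
  }
  const value = parseInt(trimmed, base);
  if (value > 0xff) {
    throw new RangeError(`${text} does not fit in a byte`);
  }
  return value;
}
```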

6. Debugger integration (DBG)

6.1 The debugger needs to provide extension points that allow debug commands to be executed from the editor. Certain non-standard operations, such as setting breakpoints on data locations, need to be supported. This will require the debugger to provide extension points that allow the editor to pass instructions that augment the debugger flow.

6.2 The debugger will support setting breakpoints at data positions in the input file. A data breakpoint tells the debugger to break execution, as if it had hit a code breakpoint, when the input stream reaches the specified point in the file.
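Conceptually, a data breakpoint fires when the parser consumes the byte at a marked offset. The `DataBreakpoints` class is a hypothetical sketch that assumes the debugger can report each advance of the input position as a half-open range `[from, to)` of consumed bytes; how Daffodil actually exposes that position is not specified here.

```typescript
// Hypothetical sketch: byte offsets at which the debugger should stop.
class DataBreakpoints {
  private readonly offsets = new Set<number>();

  add(offset: number): void {
    this.offsets.add(offset);
  }

  remove(offset: number): void {
    this.offsets.delete(offset);
  }

  // Returns the first breakpoint whose byte was consumed while the input
  // position advanced over [from, to), or undefined if none was reached.
  hit(from: number, to: number): number | undefined {
    const reached = Array.from(this.offsets).filter((o) => o >= from && o < to);
    return reached.length > 0 ? Math.min(...reached) : undefined;
  }
}
```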

6.3 The data editor will allow breakpoints to be set at data positions in the input file. The data editor will allow data breakpoints to be created and will render them similarly to how code breakpoints are set and rendered.

6.4 The data editor will support starting a debug session from a specified position. The editor will provide a context menu function that marks a starting point in the file for the input stream. All bytes before this location will be dropped when debugging starts.

6.5 The data editor will support stopping a debug session at a specified position. The editor will provide a context menu function that marks the stopping point in the input stream. All data after this point will be ignored by the input stream, ending the debug session at the specified point.
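Starting and stopping at chosen positions amounts to handing the parser a sub-range of the file. This is a hypothetical sketch; `debugInputRange` is an illustrative name and treating `stop` as exclusive is an assumption. Offsets reported during such a session would be relative to `start`, so adding `start` maps them back to file offsets.

```typescript
// Hypothetical helper: `start` and `stop` are byte offsets picked in the editor;
// `stop` is exclusive in this sketch.
function debugInputRange(data: Uint8Array, start = 0, stop = data.length): Uint8Array {
  if (start < 0 || stop > data.length || start > stop) {
    throw new RangeError(`invalid range [${start}, ${stop}) for ${data.length} bytes`);
  }
  // Bytes before `start` are dropped; bytes from `stop` onward are ignored.
  return data.subarray(start, stop);
}
```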

6.6 The debugger will support the latest released version of Apache Daffodil™. The extension will be kept up to date with the latest version of Apache Daffodil™.

7. Editing Commands (CMD)

In this section a “block” is defined as a range that has been selected by the user.

7.1 The data editor needs to support adding individual bytes. The editor will provide a function to insert a single byte at a position in the file.

7.2 The data editor needs to support adding blocks of bytes. The editor will provide a function to insert multiple bytes starting at a position in the file.

7.3 The data editor needs to support deleting individual bytes. The editor will provide a function to delete a single byte from the file.

7.4 The data editor needs to support deleting blocks of bytes. The editor will provide a function to delete blocks of bytes from the file.

7.5 The data editor needs to support modifying the value of an individual byte. The editor will provide a function to overwrite the value of a byte in the file.

7.6 The data editor needs to support modifying the value of a block of bytes. The editor will provide a function to overwrite the value of a block of bytes in the file.
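The insert, delete, and overwrite commands in 7.1 through 7.6 can be expressed as three primitives, where the single-byte variants are just blocks of length one. This hypothetical sketch works on an in-memory copy and returns a new buffer; a large-file implementation would need a different backing store.

```typescript
// Hypothetical primitives over an in-memory copy; names are illustrative.
function insertBytes(data: Uint8Array, offset: number, bytes: Uint8Array): Uint8Array {
  checkOffset(offset, data.length); // inserting at the end is allowed
  const out = new Uint8Array(data.length + bytes.length);
  out.set(data.subarray(0, offset), 0);
  out.set(bytes, offset);
  out.set(data.subarray(offset), offset + bytes.length);
  return out;
}

function deleteBytes(data: Uint8Array, offset: number, count: number): Uint8Array {
  if (!Number.isInteger(count) || count < 0) {
    throw new RangeError(`invalid count ${count}`);
  }
  checkOffset(offset, data.length - count);
  const out = new Uint8Array(data.length - count);
  out.set(data.subarray(0, offset), 0);
  out.set(data.subarray(offset + count), offset);
  return out;
}

function overwriteBytes(data: Uint8Array, offset: number, bytes: Uint8Array): Uint8Array {
  checkOffset(offset, data.length - bytes.length);
  const out = data.slice();
  out.set(bytes, offset);
  return out;
}

// Throws unless offset is an integer in [0, max].
function checkOffset(offset: number, max: number): void {
  if (!Number.isInteger(offset) || offset < 0 || offset > max) {
    throw new RangeError(`offset ${offset} out of range [0, ${max}]`);
  }
}
```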

7.7 The data editor needs to support copying byte(s). The editor will provide the ability to select and copy a range of bytes to the clipboard for convenience and interoperability. The number of bytes that can be copied will need an upper limit that depends on the file size and available system memory.

7.8 The data editor needs to support pasting byte(s). The editor will provide the ability to paste bytes from the system clipboard into the file at a specified position for convenience and interoperability.

7.9 The data editor needs to support searching for patterns. The editor will provide a search function similar to a text editor's find feature, using literal text. The pattern will be searched for literally in each given representation.
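A literal search can convert the pattern from the chosen representation into bytes and then scan for it. This is a hypothetical sketch: only hex and ASCII representations are handled, and the naive scan (which reports overlapping matches) stands in for whatever search algorithm the editor actually uses.

```typescript
// Hypothetical sketch: the pattern is typed in a representation and converted to bytes.
function patternBytes(pattern: string, representation: "hex" | "ascii"): Uint8Array {
  if (representation === "ascii") {
    return Uint8Array.from(pattern, (ch) => {
      const code = ch.charCodeAt(0);
      if (code > 0x7f) throw new Error(`non-ASCII character: ${ch}`);
      return code;
    });
  }
  // Hex: ignore whitespace, then require complete pairs of hex digits.
  const digits = pattern.replace(/\s+/g, "");
  if (!/^([0-9a-f]{2})*$/i.test(digits)) throw new Error(`invalid hex pattern: ${pattern}`);
  return Uint8Array.from(digits.match(/../g) ?? [], (pair) => parseInt(pair, 16));
}

// Naive scan returning every offset where `needle` occurs.
function findAll(haystack: Uint8Array, needle: Uint8Array): number[] {
  const hits: number[] = [];
  if (needle.length === 0) return hits;
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    hits.push(i);
  }
  return hits;
}
```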

7.10 The data editor needs to support replacing search results with new patterns. The editor will provide a function similar to a text editor's find-and-replace feature, using literal text to replace the found text with alternate text. The pattern will be searched for literally in each given representation and replaced using text that is valid within that representation.
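At the byte level, replace can substitute the replacement bytes for each non-overlapping match, scanning left to right. `replaceAll` is a hypothetical helper, and it assumes the search and replacement text have already been converted from the chosen representation into bytes.

```typescript
// Hypothetical replace-all over bytes; matches are found left to right and do not overlap.
function replaceAll(data: Uint8Array, search: Uint8Array, replacement: Uint8Array): Uint8Array {
  if (search.length === 0) return data.slice();
  const out: number[] = [];
  let i = 0;
  while (i < data.length) {
    // Check whether `search` matches starting at position i.
    let match = i + search.length <= data.length;
    for (let j = 0; match && j < search.length; j++) {
      match = data[i + j] === search[j];
    }
    if (match) {
      replacement.forEach((b) => out.push(b));
      i += search.length; // skip past the match so matches never overlap
    } else {
      out.push(data[i]);
      i++;
    }
  }
  return Uint8Array.from(out);
}
```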

7.11 The data editor needs to use the native clipboard provided by the operating system for interoperability with other applications. The editor will use the operating system clipboard for copy and paste operations to improve interoperability with other applications.

7.12 The data editor needs to support applying a bit mask to an individual byte. The editor will provide a function to apply a mask to a byte at a position in the file.

7.13 The data editor needs to support applying a bit mask to a block of bytes. The editor will provide a function to apply a mask to a selection of bytes in the file.
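Applying a mask to one byte or to a block differs only in the range covered. The requirements do not say which bitwise operation a mask applies, so this hypothetical `applyMask` sketch offers AND, OR, and XOR as an assumption.

```typescript
// Hypothetical: which mask operations the editor offers is an assumption.
type MaskOp = "and" | "or" | "xor";

// Applies `mask` to bytes in [start, end); a single byte is end = start + 1.
function applyMask(data: Uint8Array, start: number, end: number, mask: number, op: MaskOp): Uint8Array {
  if (start < 0 || end > data.length || start > end) {
    throw new RangeError(`invalid range [${start}, ${end})`);
  }
  const m = mask & 0xff;
  const out = data.slice();
  for (let i = start; i < end; i++) {
    switch (op) {
      case "and": out[i] &= m; break;
      case "or": out[i] |= m; break;
      case "xor": out[i] ^= m; break;
    }
  }
  return out;
}
```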

8. Test Data Markup Language integration (TDML)

8.1 All external files needed by the TDML file will be referenced in the TDML file using relative paths.

8.2 TDML features need to be as modular as possible. Modularization allows TDML support to be removed from the DFDL extension's repository in the future and added to a library that can be shared by the DFDL repository.

8.3 TDML features need to be written in Scala and will read/write XML using XML bindings (e.g., JAXB/scalaxb).

8.4 The extension needs to provide an item in the command palette (ctrl + shift + p) for ‘Generate TDML File’.

Selecting this command will display menus allowing the user to select the following:

  • TDML File Name
  • Name for the test case
  • Description for the test case
  • DFDL Schema
  • Data Document

Selection works the same way as in the DFDL debugger: if the user invokes the command from a DFDL Schema, that schema is used automatically in place of a selection.

  • The TDML File will be created in the workspace directory.
  • The DFDL Schema and Data Document files will be referenced by file name only.
  • These file names will be relative to the workspace directory. It will be the responsibility of the user to organize everything when creating a TDML file and to package the files up for distribution.
  • The name of the TDML file will be the name of the DFDL schema used with ‘.tdml’ appended to the end.

8.5 The extension needs to provide an item in the command palette (ctrl + shift + p) for ‘Add Test Case to TDML File’.

Selecting this command will display menus allowing the user to select the following:

  • TDML File Name
  • Name for the test case
  • Description for the test case
  • DFDL Schema
  • Data Document

Selection works the same way as in the DFDL debugger: if the user invokes the command from a DFDL Schema, that schema is used automatically in place of a selection.

8.6 The extension needs to provide an item in the command palette (ctrl + shift + p) for ‘Run Test Case in TDML File’.

Selecting this command will display menus allowing the user to select the following:

  • TDML File Name
  • Test Case to run (this list will be populated with data in the selected TDML File)

This command will start the Daffodil process in run mode. This command will provide an option to start the Daffodil process in debug mode. The location of the DFDL Schema is expected to be relative to the location of the TDML File. It will be the responsibility of the user who created the TDML file to ensure that packaging of their TDML file is correct.

9. IntelliSense Auto Completion (INT)

9.1 The extension needs to provide context-sensitive auto-completion suggestions (IntelliSense) based on the DFDL language.

9.2 IntelliSense suggestions for attributes need to supply an appropriate list of choices where applicable.

9.3 IntelliSense for element tags needs to supply the attributes appropriate for that specific tag.

9.4 IntelliSense for element tags needs to supply attribute suggestions for newly inserted tags as well as when editing existing tags.

9.5 IntelliSense needs to supply suggestions based on the contextual cursor position.

9.6 IntelliSense suggestions need to work when multiple tags are on a single line as well as when each tag is on its own line.

9.7 IntelliSense needs to supply a closing tag when a closing tag is missing.

9.8 IntelliSense suggestions need to work when attributes are split across multiple lines.

10. DFDL Schema Syntax Colorization (SYN)

10.1 Provide DFDL syntax colorization.

10.2 Matching tags within the DFDL schema need to be highlighted.

10.3 XPath expressions embedded within the DFDL schema should be highlighted.


Release Plan (Proposed)

The goal is to release these Apache Daffodil VS Code Extension capabilities incrementally, publishing to the Marketplace every few months.

The following table will be updated as new releases are published, or the themes/emphasis of a release change.

However, this is all highly subject to change based on the needs of the user community and on what community developers choose to work on.

The semantic versioning release identifications are also subject to change.

| Release | Target | Published to Marketplace? | Description | Issues |
|---|---|---|---|---|
| 1.1.0 | July 2022 | ✅ Yes | UI wireframes showing a vision of the data editor have been posted for discussion and feedback. The main editing viewport now supports the delete and insert editing primitives in addition to overwrite. Support for multiple viewports, undo and redo, cut and paste, and file saving is implemented. | Issues |
| 1.2.0 | December 2022 | ✅ Yes | Search and replace is implemented. Full-stack testing is in place. | Issues |
| 1.3.0 | July 2023 | ✅ Yes | Improvements to DFDL auto-completion (a.k.a. "IntelliSense"). Basic support for TDML. Editing is permitted in any of several viewports. Each viewport can display data in different formats (e.g., binary, hex, ASCII, big and little endian integers). | Issues |
| 1.3.1 | August 2023 | ✅ Yes | Refinement of DFDL auto-completion (a.k.a. "IntelliSense"), data editor large file support, mode simplification, incremental search and replace, updates to views and selections, multitasking support, a data profiler, and content discovery and editing additions. | Issues |
| 1.4.0 | November 2023 | ❌ Not yet | Unicode detection and profiling, language guessing, adjustable viewports, additional data display features, segment saving to file, and a streaming transforms MVP. Breakpoints can be set at data offsets, and debugging can start and stop at specified offsets. | Issues |

Beyond 1.4.0:

Support for:

  • A Properties View component.
  • Automated checkpoints.
  • Transformations of a byte range (with checkpoints allowing undo/redo).
  • Additional encodings in the data editor.

More to come...