Apache Daffodil™ Extension for Visual Studio Code: v1.3.1
The Apache Daffodil™ Extension for Visual Studio Code is an extension to the Microsoft® Visual Studio Code (VS Code) editor which enables Data Format Description Language (DFDL) syntax highlighting, code completion, and the interactive debugging of DFDL Schema parsing operations using Apache Daffodil™.
DFDL is a data modeling language used to describe file formats. The DFDL language is a subset of eXtensible Markup Language (XML) Schema Definition (XSD). Just as file formats are rich and complex, so is the modeling language to describe them. Developing DFDL Schemas can be challenging, requiring a lot of iterative development and testing.
The purpose of the Apache Daffodil™ Extension for Visual Studio Code is to ease the burden on DFDL Schema developers, enabling them to develop high-quality DFDL Schemas in less time. VS Code is free, open source, cross-platform, well-maintained, extensible, and ubiquitous in the developer community. These attributes align well with the Apache Daffodil™ project and the Apache Daffodil™ Extension for Visual Studio Code.
DFDL is rich and complex. Developers using modern code editors expect some degree of built-in language support for the language in which they are developing, and DFDL should be no different. The Apache Daffodil™ Extension for Visual Studio Code provides syntax highlighting to improve the readability and context of the text. In addition, the syntax highlighting provides feedback to the developer indicating the structure and code appear syntactically correct.
The Apache Daffodil™ Extension for Visual Studio Code provides code completion, also known as “Intellisense”, offering context-aware code segment predictions that can dramatically speed up DFDL Schema development by reducing keyboard input, memorization by the developer, and typos.
The Apache Daffodil™ Extension for Visual Studio Code provides a Daffodil Data Parse Debugger which enables the developer to carefully control the execution of Apache Daffodil™ parse operations. Given a DFDL Schema and a target data file, the developer can step through the execution of a parse line by line, or until the parse reaches some developer-defined location, known as a break point, in the DFDL Schema. What is particularly helpful is that the developer can watch the parsed output, known as the "infoset", as it’s being created by the parser, and see where the parser is parsing in the data file. This enables the developer to quickly discover and correct issues, improving DFDL Schema development and testing cycles.
The Apache Daffodil™ Extension for Visual Studio Code provides an integrated data editor. It is akin to a hex editor, but tuned specifically for challenging Daffodil use cases. As an editor designed for Daffodil developers by Daffodil developers, features of the tool will evolve quickly to address the specific needs of the Daffodil community.
This guide assumes VS Code and a Java Runtime Environment (Java 8 or greater) are installed.
- Install VS Code
- Install Java Runtime 8 or greater
- On Linux, glibc 2.31 or greater is required
The Apache Daffodil™ Extension for Visual Studio Code can be installed using one of two methods.
Option 1: Install the Apache Daffodil™ Extension for Visual Studio Code From the Visual Studio Code Extension Marketplace
The Apache Daffodil™ Extension for Visual Studio Code is available in the Visual Studio Code Extension Marketplace. This option is recommended for most users.
Option 2: Install the Latest .vsix File From the Apache Daffodil™ Extension for Visual Studio Code Release Page
The latest .vsix file (the file extension used for VS Code extensions) can also be downloaded from the Apache Daffodil™ Extension for Visual Studio Code releases page and installed by either:
- Using the command line via code --install-extension <path-to-downloaded-vsix-file>; or
- Using the "Extensions: Install from VSIX" command from within VS Code by opening the Command Palette (Mac = Command+Shift+P, Windows/Linux = Ctrl+Shift+P), typing vsix to bring up the command, and pointing it at the downloaded .vsix file.
Since DFDL Schema files end with .xsd (XML Schema Definition or XSD), the editor needs to be informed specifically that DFDL mode is desired over the more general XML mode. The mode is selected in the status bar at the bottom of the editor window.
Auto suggest is triggered by pressing Ctrl+Space or by typing the beginning characters of an item. Typing one or more unique characters will further limit the results.
📝 NOTE: Intellisense is context aware, so there is no need to begin a block with <, just start typing the tag name and code completion will automatically handle it as appropriate.
Code completion can be used to add a schema block with just a couple of keystrokes. Code completion can make short work of completing a DFDL Format Block, offering context-sensitive suggestions for attribute values.
The > or / characters are used to close XML tags. Use tab to select an item from the drop down and to exit double quotes.
Code completion supports creating self-defined dfdl:complextypes and dfdl:simpleTypes.
The tab key can be used to complete an auto-complete item within an XML tag. After auto-complete is triggered, typing the initial character or characters will limit the suggestion results. Inside an XML tag, a space or carriage return will trigger a list of context-sensitive attribute suggestions.
XPath expressions can be code completed.
- The Apache Daffodil™ Extension for Visual Studio Code uses a clunky method to auto complete curly braces within quotes. It is anticipated that this will be better addressed in the future. The auto complete method blocks suggestions while typing between the beginning quote, opening curly brace and the closing curly brace, ending quote.
Debugging a DFDL Schema Using the Apache Daffodil™ Extension for Visual Studio Code’s Bundled Daffodil Data Parse Debugger
Debugging a DFDL Schema needs both the DFDL Schema to use and a data file to parse. Instead of having to select the DFDL Schema and the data file each time from a file picker, a "launch configuration" can be created, which is a JSON description of the debugging session.
To create the launch profile:
- Select Run -> Open Configurations from the VS Code menubar. This will load a launch.json file into the editor. There may be existing configurations, or it may be empty.
- Press Add Configuration... and select the Daffodil Debug - Launch option.
Once the launch.json file has been created, it will look something like this:
{
  "type": "dfdl",
  "request": "launch",
  "name": "Ask for file name",
  "program": "${command:AskForProgramName}",
  "stopOnEntry": true,
  "data": "${command:AskForDataName}",
  "infosetOutput": {
    "type": "file",
    "path": "${workspaceFolder}/infoset.xml"
  },
  "debugServer": 4711
}
This default configuration will prompt the user to select the DFDL Schema and data files. If desired, the "program" and "data" elements can be mapped specifically to the user's files to avoid being prompted each time.
📝 Note: Use ${workspaceFolder} for files in the VS Code workspace, and use absolute paths for files outside of the workspace.
{
  "type": "dfdl",
  "request": "launch",
  "name": "DFDL parse: My Data",
  "program": "${workspaceFolder}/schema.dfdl.xsd",
  "stopOnEntry": true,
  "data": "/path/to/my/data",
  "infosetOutput": {
    "type": "file",
    "path": "${workspaceFolder}/infoset.xml"
  },
  "debugServer": 4711
}
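When many schema and data pairs need their own entries, a configuration like the one above can be generated rather than typed by hand. The following is a minimal Python sketch and is not part of the extension: the helper name make_launch_config is hypothetical, and the keys and default values simply mirror the example above.

```python
import json


def make_launch_config(name, schema_path, data_path,
                       infoset_path="${workspaceFolder}/infoset.xml",
                       debug_server=4711):
    """Build a launch configuration dict with the same keys as the example above."""
    return {
        "type": "dfdl",
        "request": "launch",
        "name": name,
        "program": schema_path,      # DFDL Schema to debug
        "stopOnEntry": True,
        "data": data_path,           # data file to parse
        "infosetOutput": {"type": "file", "path": infoset_path},
        "debugServer": debug_server,
    }


config = make_launch_config(
    "DFDL parse: My Data",
    "${workspaceFolder}/schema.dfdl.xsd",
    "/path/to/my/data",
)
launch_json = json.dumps(config, indent=2)
print(launch_json)
```

The printed JSON can be pasted into the configurations list of launch.json.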
With the launch profile above, a DFDL parse: My Data menu item will be displayed at the top of the Run and Debug pane (Command-Shift-D). Press the play button to start the debugging session.
In the Terminal, log output from the DFDL debugger backend service will display. If something is not working as expected, check the output in this Terminal window for hints.
The DFDL Schema file will also be loaded in VS Code and there should be a visible marking at the beginning where the debugger has paused upon entry to the debugging session. Control the debugger using the available VS Code debugger controls such as setting breakpoints, removing breakpoints, continue, step over, step into, and step out.
- Option 1:
  - Open the DFDL Schema file to debug
  - From inside the file, open the Command Palette (Mac = Command+Shift+P, Windows/Linux = Ctrl+Shift+P)
  - Once the Command Palette is opened, start typing Daffodil Debug:
    - Option 1 = Daffodil Debug: Debug File - This will allow the user to fully step through the DFDL Schema. Once fully completed, it will write the infoset to a file named SCHEMA-infoset.xml, which it then opens as well.
    - Option 2 = Daffodil Debug: Run File - This will run the DFDL Schema, writing the infoset to a file named SCHEMA-infoset.xml.
- Option 2:
  - Open the schema file to debug
  - Click the play button in the top right; two options will be provided:
    - Option 1 = Debug File - This will allow the user to fully step through the schema (WIP). Once fully completed, it will write the infoset to a file named SCHEMA-infoset.xml, which it then opens as well.
    - Option 2 = Run File - This will run the DFDL Schema, writing the infoset to a file named SCHEMA-infoset.xml, which it then opens as well.
Find the infoset tools from the command menu (Mac = Command+Shift+P, Windows/Linux = Ctrl+Shift+P)
Find the hex view from the command menu (Mac = Command+Shift+P, Windows/Linux = Ctrl+Shift+P)
When uploading files to the mailing list, it may be easier to upload a zip file containing a TDML file, the DFDL Schema file, the input data file, and, optionally, the infoset file. Sending this file to the mailing list will allow other users to unpack your zip file and run your test case. It becomes even easier if you have multiple test cases.
To generate a TDML file, use steps similar to those for launching a DFDL parse debugging session:
- Open the DFDL Schema file
- From inside the file, open the Command Palette (Mac = Command+Shift+P, Windows/Linux = Ctrl+Shift+P)
- Once the Command Palette is opened, select the Daffodil Debug: Generate TDML command
- From there, you will be asked to provide the input data file, the TDML test case name, the TDML test case description, and the location/name for the TDML file.
Once the Daffodil Parse has finished, an infoset and a TDML file will be created. The TDML file contains relative paths to the DFDL Schema file, input data file, and infoset file. When creating an archive for these files, preserve the directory structure in the archive.
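Preserving the directory structure matters because the TDML file refers to the other files by relative path. As an illustration only, the following Python sketch builds such an archive with the standard zipfile module; the function name archive_test_case and the file names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def archive_test_case(root, relative_files, archive_path):
    """Zip the given files, storing each under its path relative to root."""
    root = Path(root)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for rel in relative_files:
            # arcname keeps the relative directory layout inside the archive,
            # so relative paths recorded in the TDML file still resolve
            # after the archive is unpacked elsewhere.
            zf.write(root / rel, arcname=Path(rel).as_posix())
    return archive_path
```

For example, archive_test_case(workspace, ["test.tdml", "schema.dfdl.xsd", "data/input.bin"], "case.zip") would store data/input.bin under the data/ directory inside the archive, assuming those files exist under workspace.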
To append a new test case to an existing TDML file, use steps similar to those for generating a TDML file:
- Open the DFDL Schema file
- From inside the file, open the Command Palette (Mac = Command+Shift+P, Windows/Linux = Ctrl+Shift+P)
- Once the Command Palette is opened, select the Daffodil Debug: Append TDML command
- From there, you will be asked to provide the input data file, the TDML test case name, the TDML test case description, and the TDML file
Once the Daffodil Parse has finished, an infoset will be created, and a test case will be added to the existing TDML file. Test cases may share a TDML test case name or a description, but no two test cases should share both. To create an archive for a TDML file with multiple test cases, follow the same guidelines as for an archive built from a 'Generate TDML' operation: add all DFDL schema files, input data files, the TDML file, and, optionally, the infosets to the archive, and preserve any directory structure so that the relative paths in the TDML file can be resolved.
When running a test case from a zip archive created by another user, extract the archive into your workspace folder. If there is an infoset in the zip archive that you wish to compare with your infoset, make sure that the infoset from the zip archive is not located at the same place as the default infoset for the Daffodil Parse that will be run when executing a test case from the TDML file. This is because the Daffodil Parse run by executing the TDML test case uses the default location for its infoset and will overwrite anything that already exists there.
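One way to check for such a clash before extracting is sketched below in Python. This is an assumption-laden illustration, not a feature of the extension: the function name members_colliding_with is hypothetical, and the default infoset location is not derived here, so the caller must supply it as protected_path.

```python
import zipfile
from pathlib import Path


def members_colliding_with(archive_path, workspace, protected_path):
    """Return archive members that would be extracted onto protected_path."""
    workspace = Path(workspace).resolve()
    protected = Path(protected_path).resolve()
    with zipfile.ZipFile(archive_path) as zf:
        return [
            name for name in zf.namelist()
            # Compare where each member would land against the protected path.
            if (workspace / name).resolve() == protected
        ]
```

If the returned list is not empty, the archived infoset can be extracted to a different folder or renamed before the TDML test case is executed.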
To execute a test case from a TDML file, use the following steps:
- Open a DFDL Schema file
- From inside the file, open the Command Palette (Mac = Command+Shift+P, Windows/Linux = Ctrl+Shift+P)
- Once the Command Palette is opened, select the Daffodil Debug: Execute TDML command
- From there, you will be asked to provide the TDML file, TDML test case name, and TDML test case description
A Daffodil Parse will then be launched. The DFDL Schema file and input data file to be used are determined by the selected test case in the TDML file. The infoset that is generated from this parse can optionally be compared to an infoset included in the zip archive the TDML file was extracted from.
This version of the Apache Daffodil™ Extension for Visual Studio Code includes a new Data Editor. To use the Data Editor, open the VS Code command palette and select Daffodil Debug: Data Editor.
A notification message will appear stating where the Data Editor will write its logs. If problems occur, check this log file for clues.
Once the extension is connected to the server, the bottom left corner of the Data Editor shows the version of the Ωedit™ server powering the editor, and the port it is connected to. Hovering over the filled circle shows the CPU load average, the memory usage of the server in bytes, the server session count, the server uptime measured in seconds, and the round trip latency measured in milliseconds.
After selecting a file to edit, there will be a table with controls at the top of the Data Editor.
The first section of the table is called File Metrics and it contains the path of the file being edited, its initial size in bytes, the size as the file is being edited, and the detected Content Type. When changes are committed, the Save button will become enabled, allowing the changes to be saved to file. The Redo and Undo buttons will redo and undo edit change transactions that have been applied. The Revert All button will revert all edit changes that have been applied since the file was opened. The Profile button will open the Data Profiler and allow profiling of all or a portion of the edited file.
The Data Profiler allows for byte frequency profiling of all or a section of the file, starting at an editable start offset and ending either at an editable end offset or after an editable length of bytes. The offsets and lengths use the chosen Address Radix. The frequency scale can be either Linear or Logarithmic. The graph can have either an ASCII overlay that appears behind the graph, or None for no overlay. Hover over the bars to see the byte frequency and value. The frequency data can be downloaded as a Comma Separated Value (CSV) file using the Profile as CSV button. Click anywhere outside the Data Profiler to close it.
📝 Note: The maximum length of bytes that can be profiled in this version is capped at 10,000,000 (10M).
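To make "byte frequency profiling" concrete, the following Python sketch counts how often each byte value occurs in a range and writes the counts as CSV. It illustrates the idea only and is not the Data Editor's implementation: the function names and the CSV column names are assumptions, and only the 10,000,000 byte cap comes from the note above.

```python
import csv
import io

MAX_PROFILE_LENGTH = 10_000_000  # cap stated in the note above


def byte_frequencies(data, start=0, length=None):
    """Count occurrences of each byte value in data[start:start + length]."""
    if length is None:
        length = len(data) - start
    if length > MAX_PROFILE_LENGTH:
        raise ValueError("profile length exceeds the 10,000,000 byte cap")
    counts = [0] * 256  # one slot per possible byte value
    for b in data[start:start + length]:
        counts[b] += 1
    return counts


def frequencies_to_csv(counts):
    """Serialize counts as CSV text; the column names are an assumption."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["byte", "frequency"])
    for value, count in enumerate(counts):
        writer.writerow([value, count])
    return out.getvalue()


counts = byte_frequencies(b"hello world", start=0, length=5)
print(counts[ord("l")])  # 2, since "hello" contains two "l" bytes
```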
The second section of the table is called Search, and it allows for seeking to a desired offset and searching for byte sequences in the given Edit Encoding in the edited file. The Seek input box uses the selected Address Radix as the seek radix. If the Edit Encoding can be case-insensitive, a Case Insensitive toggle (located inside the Search input box) will be displayed, allowing that option to be enabled. The found sequences can be examined using the First, Prev, Next, and Last buttons found in this section. The search can be canceled using the Cancel button.
Found sequences can also be replaced in the given Edit Encoding by filling in a replacement sequence and clicking the Replace... button.
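Conceptually, a search encodes the query in the chosen encoding and scans the file bytes for matches. The Python sketch below illustrates that idea under stated assumptions; it is not how the Ωedit™ server implements search, and the function name find_all is hypothetical.

```python
def find_all(data, query, encoding="utf-8", case_insensitive=False):
    """Return the offsets of every occurrence of query (encoded) in data."""
    needle = query.encode(encoding)
    if not needle:
        return []
    haystack = data
    if case_insensitive:
        # bytes.lower() folds ASCII letters only, which is why this option
        # is only meaningful for encodings with ASCII-compatible letters.
        needle = needle.lower()
        haystack = data.lower()
    offsets = []
    pos = haystack.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = haystack.find(needle, pos + 1)  # pos + 1 allows overlapping matches
    return offsets


matches = find_all(b"Data data DATA", "data", case_insensitive=True)
print(matches)  # [0, 5, 10]
```

The first and last entries of the returned list correspond to what the First and Last buttons would jump to in this simplified model.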
The third section of the table is called Settings, and it allows for setting the Display Radix, Edit Encoding, and Editing mode.
The Display Radix can be one of Hexadecimal, Decimal, Octal, or Binary, and will affect the bytes displayed in the Physical viewport.
The Edit Encoding can be one of Hexadecimal, Binary, ASCII (7-Bit), Latin-1 (8-bit), UTF-8, or UTF-16LE and will affect the selected bytes being edited in the Edit viewport.
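The difference between the two settings can be illustrated with a short Python sketch: the Display Radix changes how each byte is written out, while a text Edit Encoding changes how a run of bytes is interpreted as characters. The zero-padded widths and the mapping to Python codec names below are assumptions made for illustration, not the Data Editor's actual rendering rules.

```python
# Format specs per Display Radix; the zero-padded widths are an assumption.
RADIX_FORMATS = {16: "{:02x}", 10: "{:03d}", 8: "{:03o}", 2: "{:08b}"}

# Python codec names assumed to correspond to the text Edit Encodings.
TEXT_CODECS = {
    "ASCII (7-Bit)": "ascii",
    "Latin-1 (8-bit)": "latin-1",
    "UTF-8": "utf-8",
    "UTF-16LE": "utf-16-le",
}


def render_bytes(data, radix):
    """Render each byte of data in the given radix, separated by spaces."""
    fmt = RADIX_FORMATS[radix]
    return " ".join(fmt.format(b) for b in data)


def decode_bytes(data, encoding_name):
    """Decode data as text using the named Edit Encoding."""
    return data.decode(TEXT_CODECS[encoding_name])


selection = b"DFDL"
print(render_bytes(selection, 16))       # 44 46 44 4c
print(render_bytes(selection, 8))        # 104 106 104 114
print(decode_bytes(selection, "UTF-8"))  # DFDL
```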
In Single Byte Edit Mode, individual bytes may be deleted, inserted (to the left or to the right of the selected byte), and overwritten in the Single Byte Edit Window that appears when a byte in the Physical or Logical viewports is clicked.
Hover over the buttons of the Ephemeral Edit Window to see what each button does. Hover over the Input Box to see the byte offset position in the selected Address Radix. Buttons will become enabled or disabled depending on whether there is valid input in the Input Box. Values entered in the Input Box must match the format set by the byte Display Radix when editing bytes in the Physical viewport, or be in Latin-1 (8-bit ASCII) format when editing bytes in the Logical viewport.
When a single byte is clicked in either the Physical or Logical viewport, the Data Inspector will populate, giving the value of the byte in Latin-1 and in various integer formats with respect to the selected endianness. The Data Inspector will also show the byte offset position in the selected Address Radix. All of the values in the Data Inspector are editable by clicking on the value and entering a new value.
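How the same bytes yield different integer values depending on endianness can be sketched in Python as follows. The set of widths shown (8, 16, 32, and 64 bits, signed and unsigned) is an assumption for illustration, since the text above only says "various integer formats", and the function name inspect_bytes is hypothetical.

```python
def inspect_bytes(data, offset, byteorder="little"):
    """Interpret the bytes at offset as Latin-1 and as several integer widths."""
    result = {"latin-1": data[offset:offset + 1].decode("latin-1")}
    for width in (1, 2, 4, 8):
        chunk = data[offset:offset + width]
        if len(chunk) < width:
            break  # not enough bytes remain for this width or any larger one
        bits = width * 8
        result[f"uint{bits}"] = int.from_bytes(chunk, byteorder, signed=False)
        result[f"int{bits}"] = int.from_bytes(chunk, byteorder, signed=True)
    return result


little = inspect_bytes(b"\xff\x01\x00\x00", 0, "little")
big = inspect_bytes(b"\xff\x01\x00\x00", 0, "big")
print(little["uint16"], big["uint16"])  # 511 65281
```

The single byte 0xff reads as 255 unsigned and -1 signed in either byte order; only the multi-byte values change with endianness.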
In Multiple Byte Edit Mode, a segment of bytes is selected from either the Physical or Logical viewports, then the selected segment of bytes is edited in the Edit viewport using the selected Edit Encoding.
Now changes are made in the selected Edit Encoding.
When valid changes have been made to the segment of bytes in the Edit viewport, the Apply button will become enabled.
Once editing of the selected segment is complete and valid, press the Apply button to replace the selected segment with the edited segment. As with changes made in Single Byte Mode, changes in Multiple Byte Edit Mode are also applied as edit transactions that can be undone and redone.
Byte addresses can be expressed in hexadecimal, decimal, or octal. The selected Address Radix is also used when entering an offset into the Offset input and for offsets and lengths in the Data Profiler. If an offset was entered in the Offset input and the Address Radix is changed, the offset will automatically be converted into the selected radix.
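The conversion described above amounts to parsing the offset text in the old radix and re-formatting the value in the new one. A minimal Python sketch of that arithmetic, with the hypothetical helper name convert_offset:

```python
# Python format codes for the three address radices named above.
_FORMAT_CODES = {16: "x", 10: "d", 8: "o"}


def convert_offset(text, from_radix, to_radix):
    """Re-express an offset string written in from_radix using to_radix."""
    value = int(text, from_radix)  # raises ValueError on invalid digits
    return format(value, _FORMAT_CODES[to_radix])


print(convert_offset("ff", 16, 10))  # 255
print(convert_offset("255", 10, 8))  # 377
```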
The Data Editor supports light and dark modes. The mode follows the VS Code theme: a light theme puts the Data Editor in light mode, and a dark theme puts it in dark mode.
The Data Editor can be navigated using the mouse or keyboard.
Clicking on the File Progress Indicator Bar will navigate to the position in the file that corresponds to the position clicked.
Below the File Progress Indicator Bar are a series of buttons that allow for navigating the file. The Home button will take you to the beginning of the file, the Page Up button will take you to the previous page of the file, the Page Down button will take you to the next page of the file, and the End button will take you to the end of the file. The Line Up button will take you to the previous line of the file, and the Line Down button will take you to the next line of the file.
The following keyboard shortcuts are available in the Data Editor:
For any input box, including the input box for Single Byte Editing Mode, ENTER will submit the input, and ESC will cancel the input.
When using Single Byte Editing Mode, CTRL-ENTER will insert a byte to the left of the selected byte, SHIFT-ENTER will insert a byte to the right of the selected byte, and DELETE will delete the selected byte.
When browsing the data in the Physical or Logical viewports, Home will take you to the top of the edited file, End will take you to the end of the edited file, Page-Up will give you the previous page of the edited file, Page-Down will give you the next page of the edited file, Arrow-Up will give you the previous line of the edited file, and Arrow-Down will give you the next line of the edited file.
- The current profiling length limit is 10,000,000 (10M) bytes.
- In Single Byte Editing Mode, there is no Insert Left button when the cursor is at the beginning of the file, and there is no Insert Right button when the cursor is at the end of the file. There are three workarounds for this limitation:
  - Instead of using the Single Byte Editing Mode buttons, use the keybindings (CTRL-ENTER for Insert Left and SHIFT-ENTER for Insert Right).
  - Use Insert-Right to insert a byte next to the start of the file, then move the cursor back to the start of the file and edit the byte. Use Insert-Left to insert a byte next to the end of the file, then move the cursor to the end of the file and edit the byte.
  - Use Multiple Byte Editing Mode to insert bytes at the beginning or end of the file.
- On Windows (both Windows 10 and 11), if a file of size <= 1 is selected to be loaded into the Data Editor, it will cause a backend server failure. As a result, the file's data will not be presented properly, and the server will not terminate properly when the Data Editor instance associated with this file is closed. See Issue #824 for failure resolutions and more information.
If problems are encountered or new features are desired, create tickets here.
If additional help or guidance on using Daffodil and its tooling is needed, please engage with the community on mailing lists and/or review the archives.
Copyright © 2023 The Apache Software Foundation. Licensed under the Apache License, Version 2.0.
Apache, Apache Daffodil, Daffodil, and the Apache Daffodil logo are trademarks of The Apache Software Foundation.