ted now has a live demo! Try it out here.
Once, I was presented with the following file (abridged):
INFO:2024-12-07 13:01:40:Trace:198d079c-af9a-45b2-8236-7fbb2a012f69:Starting Procedure foo
ERROR:2024-12-07 13:01:41:Trace:198d079c-af9a-45b2-8236-7fbb2a012f69:Error 1
INFO:2024-12-07 13:01:41:Trace:198d079c-af9a-45b2-8236-7fbb2a012f69:Ending Procedure foo
INFO:2024-12-07 13:01:41:Trace:198d079c-af9a-45b2-8236-7fbb2a012f69:Starting Procedure bar
INFO:2024-12-07 13:01:41:Trace:198d079c-af9a-45b2-8236-7fbb2a012f69:Error 2
INFO:2024-12-07 13:01:41:Trace:198d079c-af9a-45b2-8236-7fbb2a012f69:Success
INFO:2024-12-07 13:01:42:Trace:198d079c-af9a-45b2-8236-7fbb2a012f69:Ending Procedure bar
INFO:2024-12-07 13:01:42:Trace:30019fff-7645-4d07-9fc4-0bbb39aa09db:Starting Procedure foo
INFO:2024-12-07 13:01:42:Trace:30019fff-7645-4d07-9fc4-0bbb39aa09db:Success
INFO:2024-12-07 13:01:42:Trace:30019fff-7645-4d07-9fc4-0bbb39aa09db:Ending Procedure foo
INFO:2024-12-07 13:01:43:Trace:30019fff-7645-4d07-9fc4-0bbb39aa09db:Starting Procedure bar
ERROR:2024-12-07 13:01:43:Trace:30019fff-7645-4d07-9fc4-0bbb39aa09db:Error 3
ERROR:2024-12-07 13:01:43:Trace:30019fff-7645-4d07-9fc4-0bbb39aa09db:Error 4
INFO:2024-12-07 13:01:44:Trace:30019fff-7645-4d07-9fc4-0bbb39aa09db:Ending Procedure bar
I wanted only the errors from procedures that did not log a success. In this case, we should only get Errors 1, 3, and 4:
ERROR:2024-12-07 13:01:41:Trace:198d079c-af9a-45b2-8236-7fbb2a012f69:Error 1
ERROR:2024-12-07 13:01:43:Trace:30019fff-7645-4d07-9fc4-0bbb39aa09db:Error 3
ERROR:2024-12-07 13:01:43:Trace:30019fff-7645-4d07-9fc4-0bbb39aa09db:Error 4
I created an awk program to keep track of things and get the correct output. But I thought it was easier to express what I wanted as a state machine. Thus was born ted, a language for specifying state machines and using them to process files.
An equivalent ted program:
startstate: /Starting.Procedure/ -> capture_begin
capture_begin: {
start capture
-> lookforsuccessorending
/Success/ -> startstate
}
lookforsuccessorending: /Success/ -> startstate
lookforsuccessorending: /Ending.Procedure/ {
stop capture
print
-> startstate
}
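For readers who do not know ted yet, the filtering rule described above can also be sketched in Python. This is only an illustration of the state-machine idea, not how ted or the original awk program is implemented. The function name errors_without_success is hypothetical, and the matching on "Starting Procedure", "Ending Procedure", ":Success", and the "ERROR:" prefix is assumed from the sample log.

```python
def errors_without_success(lines):
    """Return ERROR lines from procedures that never logged Success.

    Assumes the log layout of the sample above (an assumption, not a spec).
    """
    kept = []             # errors from procedures that ended without a Success
    pending = []          # errors seen so far in the current procedure
    in_procedure = False
    succeeded = False
    for line in lines:
        text = line.rstrip("\n")
        if "Starting Procedure" in text:
            # Enter a procedure and forget anything left over from before.
            in_procedure, succeeded, pending = True, False, []
        elif not in_procedure:
            continue
        elif "Ending Procedure" in text:
            # Leave the procedure; keep its errors only if it never succeeded.
            if not succeeded:
                kept.extend(pending)
            in_procedure = False
        elif text.endswith(":Success"):
            succeeded = True
        elif text.startswith("ERROR:"):
            pending.append(line)
    return kept
```

Called with the lines of the sample log, this sketch returns the three lines for Error 1, Error 3, and Error 4.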
Requires Go 1.22.
git clone git@github.com:ahalbert/ted.git
cd ted
go install
You can build and test the code using:
make build
make test
Given the input:
baz
foo
baz
bar
baz
And you only want to change the final baz into bang, use this command:
$ printf 'baz\nfoo\nbaz\nbar\nbaz\n' | ted '/foo/ -> /bar/ -> do s/baz/bang/'
Results in:
baz
foo
baz
bar
bang
Given the input:
DO NOT PRINT THIS LINE
baz - DO NOT PRINT THIS EITHER
foo
bar
baz - DO NOT PRINT THIS EITHER
DO NOT PRINT THIS LINE
And you only want to print what's between the baz lines:
$ ted -n 'stop:/baz/ -> start start:/baz/ -> 1 start: print' < file.txt
Results in:
foo
bar
Given the input:
beep
boop
buzz
cheater
beep
boop
cheater
And you want to change cheater to nose only after you have seen a beep followed by a boop, but if there's a buzz, start looking for /beep/ again:
$ ted '/beep/ -> {/boop/ -> /buzz/ -> 1} {do s/cheater/nose/ /buzz/ -> 1}' < file.txt
Results in:
beep
boop
buzz
cheater
beep
boop
nose
Capturing enables you to read input into a variable rather than printing it on the screen.
Given the input:
beep
boop
foo
bar
baz
buzz
You can capture one line like so:
1: /beep/ ->
2: {capture mycapture -> }
3: do s/THIS.IS.CAPTURED/CAPTURED/ mycapture
3: /buzz/ ->
4: print mycapture
This program removes the boop, captured into the variable $_:
beep
boop
foo
buzz
CAPTURED
bar
CAPTURED
baz
Given the input:
beep
boop - CAPTURED
foo - CAPTURED
bar - CAPTURED
baz
buzz
And running this ted program with the --no-print option:
/beep/ ->
/boop/ {start capture ->}
/baz/ {stop capture print -> 1}
Yields:
boop - CAPTURED
foo - CAPTURED
bar - CAPTURED
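The behaviour of this example can be approximated in Python to show how capturing diverts lines into a buffer instead of printing them. This is a rough sketch based only on the input and output shown above, not ted's actual implementation. The function name run_capture_example and the list named captured (standing in for $_) are hypothetical.

```python
import re

def run_capture_example(lines):
    """Mimic the three-state capture program above on a list of lines."""
    state = 1        # 1: wait for beep, 2: wait for boop, 3: capture until baz
    captured = []    # stands in for $_ while capturing is on
    printed = []     # what would reach the screen with --no-print
    for line in lines:
        if state == 1:
            if re.search(r"beep", line):
                state = 2
        elif state == 2:
            if re.search(r"boop", line):
                captured = [line]   # start capture: the boop line is captured first
                state = 3
        else:
            if re.search(r"baz", line):
                printed.extend(captured)   # stop capture, then print the buffer
                captured = []
                state = 1
            else:
                captured.append(line)      # redirected into the buffer, not printed
    return printed
```

With the six input lines above, the sketch returns the boop, foo, and bar lines and drops everything else, matching the output shown.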
You can store capture groups in a variable and refer to them later.
Given the input:
beep
boop
i want these and those
foo
bar
baz
buzz
And this program with the --no-print option:
/i.*want.(these).and.(those)/ ->
{println $1 println $2 ->}
Yields:
these
those
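For comparison, the same numbering of capture groups can be reproduced with Python's re module. This is an analogy only: ted is a separate tool and its regex dialect may differ from Python's. The helper name capture_variables is hypothetical.

```python
import re

def capture_variables(pattern, line):
    """Return [$0, $1, ..., $N] for the first match, or None if nothing matches.

    Index 0 holds the whole matched text; indices 1..N hold the capture groups.
    """
    match = re.search(pattern, line)
    if match is None:
        return None
    return [match.group(0), *match.groups()]

variables = capture_variables(r"i.*want.(these).and.(those)", "i want these and those")
print(variables[1])  # these
print(variables[2])  # those
```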
You can rewind or fast-forward the input file to any point matching /regex/.
Given the file:
beep
boop
buzz
foo
bar
baz
And this program:
{ capture fastforward /buzz/ -> }
/baz/ { rewind /beep/ -> }
Usage: ted [--fsa-file FSAFILE] [--no-print] [--debug] [--var key=value] [PROGRAM [INPUTFILE [INPUTFILE ...]]]
Positional arguments:
PROGRAM Program to run.
INPUTFILE File to use as input.
Options:
--fsa-file FSAFILE, -f FSAFILE
Finite State Autonoma file to run.
--no-print, -n Do not print lines by default.
--debug Provides Lexer and Parser information.
--var key=value Variable in the format name=value.
--help, -h display this help and exit
A ted program consists of states, which contain actions. During each iteration, ted will:
- Read a line from the input.
- Execute each action for the current state in the order parsed.
- If an action requires a state change, stop executing actions and move to the next line.
- Print the line unless --no-print or capturing is on.
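The steps above can be sketched as a small interpreter loop in Python. This is a simplified model built from the description in this README, not ted's real implementation. All names here (run, goto_if, substitute, EXAMPLE_STATES, the ctx dictionary) are hypothetical, and an action is modelled as a function that returns the next state or None.

```python
import re

def run(states, start, lines, no_print=False):
    """Drive a table of states over the input, one line per iteration."""
    state = start
    ctx = {"line": None, "capturing": False}
    output = []
    for line in lines:                      # read a line from the input
        ctx["line"] = line
        for action in states[state]:        # run the state's actions in order
            next_state = action(ctx)
            if next_state is not None:      # a transition ends this iteration
                state = next_state
                break
        if not no_print and not ctx["capturing"]:
            output.append(ctx["line"])      # default print of the current line
    return output

def goto_if(pattern, target):
    """Action: move to target when the current line matches pattern."""
    def action(ctx):
        return target if re.search(pattern, ctx["line"]) else None
    return action

def substitute(pattern, replacement):
    """Action: replace the first match in the current line, stay in the same state."""
    def action(ctx):
        ctx["line"] = re.sub(pattern, replacement, ctx["line"], count=1)
        return None
    return action

# Rough equivalent of: /foo/ -> /bar/ -> do s/baz/bang/
EXAMPLE_STATES = {
    1: [goto_if(r"foo", 2)],
    2: [goto_if(r"bar", 3)],
    3: [substitute(r"baz", "bang")],
}
```

Running run(EXAMPLE_STATES, 1, ["baz", "foo", "baz", "bar", "baz"]) returns ["baz", "foo", "baz", "bar", "bang"], which matches the first example in this README. Capturing is represented only by the ctx["capturing"] flag; nothing in this sketch sets it.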
[<statename>:] Action
Binds the Action to the state statename. If a state is not specified, it is an anonymous state and is assigned a name from 1..N, incrementing each time a new state is created. Multiple actions in a statement can be combined using { }. If you want to specify multiple different rules for the same state, use a comma (,).
Various actions can be specified in a state:
let variable = expression
Assigns the value of expression to variable. Supports addition, subtraction, multiplication, and division. Attempts to coerce strings to integers when doing math.
/<regex>/ Action
Performs Action if the current line matches regex. If capture groups are used, you can refer to them afterwards as $0, $1, $2, and so on.
do s/sed/command/g [variable]
Executes the sed command on variable. If no variable is specified, assumes the current line or capture.
dountil s/sed/command/g [variable] Action
Performs the sed command and runs Action on a successful substitution.
-> [statename]
Changes the current state to statename. If a state is not specified, assumes the next state listed in the program. If this is the last state, goes to state "0".
-->
Transitions to the start state.
{ Action... }
Runs all the actions between the { and }.
print [variable]
Prints variable. If a variable is not specified, uses $_, which can be the current line or capture.
println [variable]
Prints variable with a newline. If a variable is not specified, uses $_, which can be the current line or capture.
[start|stop] capture [variable]
Starts/stops capturing to variable. When capturing is started, input lines are redirected to variable. If variable is not specified, defaults to $_. If start|stop is not given, only captures the current line.
rewind|fastforward /regex/
Moves the head backwards/forward to the first line matching regex. Stops if it hits the beginning, or halts if it hits the end of file.
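One way to picture head movement is as an index into a list of lines. The Python sketch below is a guess at the behaviour described above, not ted's implementation. It assumes the search starts at the line next to the current head and excludes the current line, that hitting the beginning leaves the head on the first line, and that hitting the end of file is reported as None. The function names mirror the action names but are otherwise hypothetical.

```python
import re

def fastforward(lines, head, pattern):
    """Return the index of the first matching line after head, or None at end of file."""
    for index in range(head + 1, len(lines)):
        if re.search(pattern, lines[index]):
            return index
    return None   # assumption: the caller halts when the end of file is reached

def rewind(lines, head, pattern):
    """Return the index of the nearest matching line before head, or 0 if none matches."""
    for index in range(head - 1, -1, -1):
        if re.search(pattern, lines[index]):
            return index
    return 0      # assumption: the head stops at the beginning of the input

lines = ["beep", "boop", "buzz", "foo", "bar", "baz"]
print(fastforward(lines, 0, r"buzz"))  # 2
print(rewind(lines, 5, r"beep"))       # 0
```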
if BoolExpr Action [else Action]
Executes Action only if the condition in BoolExpr is true. An optional else clause is also possible.
Special pre-defined states exist as well.
BEGIN: Actions in this state are run once before consuming any input. Transitioning will stop executing the action.
END: Actions in this state are run after all input is consumed. Cannot transition in this state.
ALL: Actions that are run after every state, even if the state transitioned during that cycle. Does not apply to BEGIN and END.
$_: The default variable used by arguments. At the beginning of an iteration, stores the current line in $_ unless it is being used to capture.
$@: Contains the original line read in during the iteration.
$0: Contains the matched text of the last regex compared.
$1..$N: Contains the first to Nth capture groups in the last regex compared.
Feedback is always appreciated. You can contact me at armand (dot) halbert (at) gmail.com