Buffered rune reader. Rehosted from jimsmart/bufrr

bufrr - a buffered rune reader

Language: Go

Synopsis

Package bufrr provides a buffered rune reader, with both PeekRune and UnreadRune. It takes an io.Reader providing the source, buffers it by wrapping with a bufio.Reader, and creates a new Reader implementing the bufrr.RunePeeker interface (an io.RuneScanner interface plus an additional PeekRune method).

Additionally, bufrr.Reader translates the io.EOF error into the invalid rune value of -1 (defined as bufrr.EOF).

Internally, bufrr.Reader is a bufio.Reader plus a single-rune peek buffer and a single-rune unread buffer.
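To make the internal layout described above concrete, here is a minimal sketch of a wrapper built from a bufio.Reader, a single-rune peek buffer and a single-rune unread buffer. It is not the package's actual source: the names peekReader, slot, fetch, eof and demo are hypothetical, and the real implementation may handle edge cases differently.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// eof stands in for bufrr.EOF: an invalid rune value marking end of input.
const eof rune = -1

// slot is a one-rune buffer: a rune, its width in bytes, and an in-use flag.
type slot struct {
	r    rune
	w    int
	full bool
}

// peekReader is a hypothetical re-creation of the described design,
// not the package's actual source.
type peekReader struct {
	br     *bufio.Reader
	unread slot // rune pushed back by UnreadRune, served first
	peeked slot // rune fetched ahead by PeekRune, served next
	last   slot // most recently consumed rune, restored by UnreadRune
}

// fetch pulls one rune from the bufio.Reader, mapping io.EOF to eof.
func (p *peekReader) fetch() (slot, error) {
	r, w, err := p.br.ReadRune()
	if err == io.EOF {
		r, w, err = eof, 0, nil
	}
	return slot{r, w, err == nil}, err
}

// PeekRune returns the next rune without consuming it.
func (p *peekReader) PeekRune() (rune, int, error) {
	if p.unread.full {
		return p.unread.r, p.unread.w, nil
	}
	if !p.peeked.full {
		s, err := p.fetch()
		if err != nil {
			return 0, 0, err
		}
		p.peeked = s
	}
	return p.peeked.r, p.peeked.w, nil
}

// ReadRune consumes the next rune, draining the unread buffer before the peek buffer.
func (p *peekReader) ReadRune() (rune, int, error) {
	r, w, err := p.PeekRune()
	if err != nil {
		return 0, 0, err
	}
	if p.unread.full {
		p.last, p.unread = p.unread, slot{}
	} else {
		p.last, p.peeked = p.peeked, slot{}
	}
	return r, w, nil
}

// UnreadRune pushes the most recently consumed rune back; only one level is kept.
func (p *peekReader) UnreadRune() error {
	if !p.last.full {
		return errors.New("peekReader: no rune to unread")
	}
	p.unread, p.last = p.last, slot{}
	return nil
}

// demo records a read, a peek, an unread, and then every rune read until eof.
func demo(input string) string {
	p := &peekReader{br: bufio.NewReader(strings.NewReader(input))}
	var out []rune
	r, _, _ := p.ReadRune()
	out = append(out, r)
	r, _, _ = p.PeekRune()
	out = append(out, r)
	_ = p.UnreadRune()
	for {
		r, _, err := p.ReadRune()
		if err != nil || r == eof {
			return string(out)
		}
		out = append(out, r)
	}
}

func main() {
	fmt.Println(demo("ab"))
}
```

Keeping the two buffers separate is what lets an unread succeed even after a peek: the pushed-back rune is served first, and the peeked rune follows it.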

Code Example

import (
	"github.com/SteelSeries/bufrr"
	"strings"
)

func ExampleBufrr() {

	// example input
	in := strings.NewReader("abc")

	// construct buffered rune reader
	buf := bufrr.NewReader(in)

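	var err error
	var r, p rune

	// common sequence of operations when lexing an awkward grammar
	r, _, err = buf.ReadRune() // next: consume a rune
	// [...]
	p, _, err = buf.PeekRune() // peek: look ahead without consuming
	// [...]
	err = buf.UnreadRune() // backup: push the last read rune back
	// [...]

	// reference the results so this skeleton compiles as written
	_, _, _ = r, p, err
}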

Motivation

When writing Unicode/UTF-8 parsers/lexers/tokenizers in Go, it is preferable to work with the higher-level native rune type instead of []byte.

A common sequence of operations that a tokenizer performs on its input stream is:

  1. next (read)
  2. peek (look-ahead)
  3. backup (unread)

Requirement: a simple API providing ReadRune(), PeekRune() and UnreadRune().

  • bufio.Reader has ReadRune and UnreadRune -- but no PeekRune (it does have Peek, which returns bytes). Furthermore, bufio.Reader behaves unexpectedly when peeks are combined with unreads: in current Go releases, for instance, calling Peek prevents UnreadRune from succeeding until the next read.
  • scanner.Scanner (from text/scanner) is rune-based, with Next and Peek -- but no unread.
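The first point can be reproduced with the standard library alone. The sketch below assumes a current Go release; peekRune and readPeekUnread are hypothetical helpers written for this illustration. It emulates a rune peek with bufio.Reader.Peek and utf8.DecodeRune, then shows that the following UnreadRune reports an error.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unicode/utf8"
)

// peekRune is a hypothetical helper that emulates a rune-level peek using
// bufio.Reader.Peek, which works in bytes.
func peekRune(br *bufio.Reader) (rune, int, error) {
	b, err := br.Peek(utf8.UTFMax) // may return fewer bytes near end of input
	if len(b) == 0 {
		return utf8.RuneError, 0, err
	}
	r, w := utf8.DecodeRune(b)
	return r, w, nil
}

// readPeekUnread reads one rune, peeks at the next, then attempts an unread.
func readPeekUnread(input string) (read, peeked rune, unreadErr error) {
	br := bufio.NewReader(strings.NewReader(input))
	read, _, _ = br.ReadRune()
	peeked, _, _ = peekRune(br)
	unreadErr = br.UnreadRune() // Peek came after the last ReadRune
	return read, peeked, unreadErr
}

func main() {
	r, p, err := readPeekUnread("héllo")
	fmt.Printf("read %q, peeked %q, unread error: %v\n", r, p, err)
}
```

The read and the peek both succeed, but the unread does not, which is exactly the read/peek/backup sequence a tokenizer wants to perform.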

I considered adding PeekRune() to bufio.Reader, as the easiest option. But once I got halfway through the implementation I realised there were some edge cases where things became trickier than I'd expected (due to bufio.Reader's current implementation).

I considered adding Unread() to scanner.Scanner, but decided this would introduce unnecessary complexity. Besides, scanner.Scanner is higher-level than needed: it carries extra functionality I do not require, and implementing a tokenizer on top of it would duplicate too much of that functionality.

After all this, I finally decided the easiest option was to implement a simple wrapper for bufio.Reader with the functionality I needed - it was the least amount of work I could do: my API requirement is only 3 methods.

As two of my methods are already covered by the io.RuneScanner interface, the bufrr.RunePeeker interface simply extends this with the addition of a PeekRune() method.

Why bufio.Reader? Tokenizers arguably/usually work over a buffered input stream (supporting both peek and unread implies at least a minimal amount of buffering, i.e. two runes - plus buffered I/O is generally a good thing).

An eventual end-of-file is an expected condition when parsing, lexing or tokenizing. Therefore, representing EOF as a token/marker in the rune stream, distinct from any error conditions encountered while reading the stream, is preferable, and leads to cleaner client code.

To this end, when bufrr.Reader reaches EOF, both ReadRune() and PeekRune() will return an invalid rune value of -1 (defined as bufrr.EOF), and will never return an io.EOF error.
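As an illustration of why an EOF rune leads to cleaner client code, the sketch below folds io.EOF into a -1 marker and handles it as one more case in the lexing switch. It uses only the standard library; eof, next and countWords are hypothetical names, with eof standing in for bufrr.EOF.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// eof stands in for bufrr.EOF: an invalid rune value marking end of input.
const eof rune = -1

// next reads one rune and folds io.EOF into the eof marker, so that a
// non-nil error always means a genuine read failure.
func next(rr io.RuneReader) (rune, error) {
	r, _, err := rr.ReadRune()
	if err == io.EOF {
		return eof, nil
	}
	return r, err
}

// countWords counts whitespace-separated words, treating eof as just
// another case in the switch rather than as an error path.
func countWords(input string) (int, error) {
	rr := strings.NewReader(input)
	words, inWord := 0, false
	for {
		r, err := next(rr)
		if err != nil {
			return words, err // real I/O error only
		}
		switch {
		case r == eof:
			return words, nil
		case r == ' ' || r == '\t' || r == '\n':
			inWord = false
		case !inWord:
			inWord = true
			words++
		}
	}
}

func main() {
	n, err := countWords("one two  three")
	fmt.Println(n, err)
}
```

With this shape, the error check at the top of the loop never has to distinguish an expected end of input from a failure.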

Installation

Fetch the code:

go get github.com/SteelSeries/bufrr

Import the package into your code:

import (
	...
	"github.com/SteelSeries/bufrr"
	...
)

API Reference

See autogenerated documentation at: http://godoc.org/github.com/SteelSeries/bufrr

API Overview

Constructors

func NewReader(rd io.Reader) *bufrr.Reader
func NewReaderSize(rd io.Reader, size int) *bufrr.Reader

bufrr.Reader methods

bufrr.Reader implements all the methods of interface bufrr.RunePeeker, namely:

ReadRune() (r rune, w int, err error)
PeekRune() (r rune, w int, err error)
UnreadRune() error

bufrr.RunePeeker interface

type RunePeeker interface {
	io.RuneScanner
	PeekRune() (r rune, w int, err error)
}
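Code written against the interface rather than the concrete reader can be exercised with any implementation. The sketch below declares a local copy of the interface (runePeeker) and a hypothetical in-memory implementation (fakePeeker) that returns -1 at end of input. Neither is part of the package, and scanDigits and splitNumber are illustrative names only.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"unicode"
	"unicode/utf8"
)

// runePeeker mirrors the bufrr.RunePeeker definition shown above.
type runePeeker interface {
	io.RuneScanner
	PeekRune() (r rune, w int, err error)
}

// fakePeeker is a hypothetical slice-backed implementation for this sketch.
// Like bufrr.Reader, it reports end of input as the rune -1 with a nil error.
type fakePeeker struct {
	runes []rune
	pos   int
}

func (f *fakePeeker) PeekRune() (rune, int, error) {
	if f.pos >= len(f.runes) {
		return -1, 0, nil
	}
	r := f.runes[f.pos]
	return r, utf8.RuneLen(r), nil
}

func (f *fakePeeker) ReadRune() (rune, int, error) {
	r, w, err := f.PeekRune()
	if r != -1 {
		f.pos++
	}
	return r, w, err
}

func (f *fakePeeker) UnreadRune() error {
	if f.pos == 0 {
		return errors.New("fakePeeker: nothing to unread")
	}
	f.pos--
	return nil
}

// scanDigits consumes a run of digits and stops before the first non-digit,
// using PeekRune to decide whether to consume.
func scanDigits(rp runePeeker) (string, error) {
	var digits []rune
	for {
		r, _, err := rp.PeekRune()
		if err != nil || !unicode.IsDigit(r) {
			return string(digits), err
		}
		if _, _, err := rp.ReadRune(); err != nil {
			return string(digits), err
		}
		digits = append(digits, r)
	}
}

// splitNumber returns the leading digits of input and the rune that follows them.
func splitNumber(input string) (digits string, next rune, err error) {
	rp := &fakePeeker{runes: []rune(input)}
	digits, err = scanDigits(rp)
	if err != nil {
		return digits, 0, err
	}
	next, _, err = rp.PeekRune()
	return digits, next, err
}

func main() {
	digits, next, err := splitNumber("123+4")
	fmt.Printf("%q %q %v\n", digits, next, err)
}
```

Because scanDigits only depends on the three interface methods, the same function should work unchanged when handed a reader from bufrr.NewReader.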

Tests

To run the tests:

cd $GOPATH/src/github.com/SteelSeries/bufrr
go test

The tests could do with improvement. They only test the basic API functionality and do not test all of the edge cases. But this is not to say that the code is not fully tested, per se; it is in fact well exercised by several file parsers I have written.

Contributors

Bug reports and pull requests are most welcome!

License

This work is distributed under an MIT License (Wikipedia: MIT License) - see LICENSE file for details.
