This is a bug that I discovered with @colin-alexa while working on datasets & tables. Our current implementation of `extractSlices` does not handle nested or overlapping slices, so it duplicates the text that those slices share. To illustrate, a document may have slices inside of slices, which is very possible inside of tables:
Listen on Bandcamp
Debut is the debut studio album by Icelandic recording artist Björk as an international singer, released in July 1993 by One Little Independent Records and Elektra Entertainment. It was produced by Björk and Nellee Hooper.
It was Björk's first recording following the dissolution of her previous band, the Sugarcubes. The album departed from the rock style of her previous work and drew from an eclectic variety of styles, including electronic pop, house music, jazz, and trip-hop.
Listen on Bandcamp
Post is the second studio album by Icelandic recording artist Björk, released in 1995 in the United Kingdom by One Little Independent Records and in the United States by Elektra Entertainment.
Whereas Björk's previous album Debut (1993) was produced almost entirely by Nellee Hooper, Björk produced Post herself with co-producers including Hooper, 808 State's Graham Massey, and former Massive Attack member Tricky.
Continuing the style developed on Debut, Post is considered an important exponent of art-pop. It features an eclectic mixture of electronic and dance styles such as techno, trip-hop, IDM, and house, but also ambient, jazz, industrial, and experimental music.
Björk wrote most of the songs after moving to London and intended Post to convey the city's pace, urban culture, and underground club culture.
Here we have some captions on images, inside of tables.
When we represent this content for rendering, we want the records in the dataset to have well-structured documents, so that the first cell for album art looks like:
And the caption slice would be:
Currently, the data cell for this would have the caption slice included, which will result in duplicate text in rendering:
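As a rough illustration of the duplication (not the actual `extractSlices` code or document format), here is a minimal sketch that assumes slices are plain `start`/`end` offsets into a string. The `Slice` shape, both function names, and the sample text are hypothetical. The naive version copies each slice's full range, so a caption nested inside a cell is copied twice; the second version assigns each character to the innermost slice covering it.

```typescript
// Hypothetical shape for illustration only; not the project's real types.
interface Slice {
  id: string;
  start: number; // inclusive offset into the text
  end: number; // exclusive offset into the text
}

// Naive extraction: every slice copies its full range, so text inside a
// nested slice is copied once for each slice that encloses it.
function extractNaive(text: string, slices: Slice[]): Map<string, string> {
  const out = new Map<string, string>();
  for (const slice of slices) {
    out.set(slice.id, text.slice(slice.start, slice.end));
  }
  return out;
}

// Overlap-aware extraction: each character goes to the smallest slice that
// covers it (first one wins on a tie), so no text is emitted twice.
function extractWithoutDuplicates(
  text: string,
  slices: Slice[]
): Map<string, string> {
  const out = new Map<string, string>(
    slices.map((s): [string, string] => [s.id, ""])
  );
  for (let i = 0; i < text.length; i++) {
    let owner: Slice | undefined;
    for (const s of slices) {
      const covers = s.start <= i && i < s.end;
      if (!covers) continue;
      if (owner === undefined || s.end - s.start < owner.end - owner.start) {
        owner = s;
      }
    }
    if (owner !== undefined) {
      out.set(owner.id, (out.get(owner.id) ?? "") + text.charAt(i));
    }
  }
  return out;
}

// A table cell whose content contains an image caption.
const cellText = "Album art: Listen on Bandcamp";
const captionStart = cellText.indexOf("Listen");
const slices: Slice[] = [
  { id: "cell", start: 0, end: cellText.length },
  { id: "caption", start: captionStart, end: cellText.length },
];

const naive = extractNaive(cellText, slices);
const deduped = extractWithoutDuplicates(cellText, slices);

console.log(naive.get("cell")); // "Album art: Listen on Bandcamp" (caption duplicated)
console.log(deduped.get("cell")); // "Album art: " (caption lives only in its own slice)
console.log(deduped.get("caption")); // "Listen on Bandcamp"
```

The per-character scan is quadratic and only meant to make the ownership rule obvious. A real fix would also need to leave a reference to the nested slice inside the outer one so that rendering can put the caption back in place.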