Drop xmlparsedata and replace it with getParseData #659

Merged on Feb 23, 2024 (35 commits)
Commits
d98819c
wip: retrieve output as csv
Ellpeck Feb 15, 2024
f4ff857
refactor: use a separate function for retrieving code as csv
Ellpeck Feb 15, 2024
f32a7e8
wip: csv parsing
Ellpeck Feb 16, 2024
be89ea5
feat-fix: fixed csv parsing
Ellpeck Feb 16, 2024
12c22b7
refactor: cleaned up object parse test
Ellpeck Feb 16, 2024
8107ff9
refactor: moved csv parsing to csv folder
Ellpeck Feb 16, 2024
4276e06
refactor: cleaned up small xml inconsistencies
Ellpeck Feb 16, 2024
30d2c71
wip: retrieve output as csv
Ellpeck Feb 15, 2024
c1d4c5c
refactor: use a separate function for retrieving code as csv
Ellpeck Feb 15, 2024
148e6ad
wip: csv parsing
Ellpeck Feb 16, 2024
0986afd
feat-fix: fixed csv parsing
Ellpeck Feb 16, 2024
992b5d9
refactor: cleaned up object parse test
Ellpeck Feb 16, 2024
9d7b7fd
refactor: moved csv parsing to csv folder
Ellpeck Feb 16, 2024
34558cb
refactor: cleaned up small xml inconsistencies
Ellpeck Feb 16, 2024
0926a9f
Merge remote-tracking branch 'origin/653-drop-xmlparsedata' into 653-…
Ellpeck Feb 22, 2024
59048d8
feat: basic csv -> xml based json conversion
Ellpeck Feb 22, 2024
a447d49
refactor: murder the token map
Ellpeck Feb 22, 2024
8a1cae6
refactor: only removeTokenMapQuotationMarks from tokens
Ellpeck Feb 22, 2024
11163a0
feat-fix: relax quotes when parsing csv
Ellpeck Feb 22, 2024
5fb123e
feat-fix: there can be multiple root expressions
Ellpeck Feb 22, 2024
7dfe392
feat-fix: fix quote hell
Ellpeck Feb 22, 2024
9a886f8
refactor: only strip quotes in our code
Ellpeck Feb 22, 2024
edd4e2a
refactor: escape quotes better
Ellpeck Feb 22, 2024
90f0a69
refactor: sort children the same way as xmlparsedata did
Ellpeck Feb 22, 2024
6285305
feat-fix: only store text in xml if there is text
Ellpeck Feb 22, 2024
e24c78b
feat-fix: fixed text not being included in parsed data
Ellpeck Feb 22, 2024
0f444bc
test-fix: fixed csv parsing tests
Ellpeck Feb 22, 2024
c7b8f7a
refactor: remove xmlparsedata & package installation stuff
Ellpeck Feb 22, 2024
4eb2f41
refactor: remove debug prints
Ellpeck Feb 22, 2024
1868842
refactor: delete xml options
Ellpeck Feb 23, 2024
a7ac591
refactor: rename keys
Ellpeck Feb 23, 2024
92f9690
dep: removed unnecessary xml2js dependency
Ellpeck Feb 23, 2024
25d0c57
refactor: use legacy xmlparsedata code for parsing the statistics xml
Ellpeck Feb 23, 2024
fdf03cf
Merge branch 'main' into 653-drop-xmlparsedata
EagleoutIce Feb 23, 2024
b950109
lint-fix: handle linter braces
EagleoutIce Feb 23, 2024
5 changes: 0 additions & 5 deletions .github/actions/setup/action.yaml
@@ -29,8 +29,3 @@ runs:
uses: r-lib/actions/setup-r@v2
with:
r-version: ${{ inputs.r-version }}

- name: Install R packages
if: ${{ inputs.r-version != '' }}
shell: Rscript {0}
run: install.packages("xmlparsedata", repos="https://cloud.r-project.org/")
36 changes: 0 additions & 36 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 0 additions & 2 deletions package.json
@@ -352,7 +352,6 @@
"@types/n3": "^1.16.4",
"@types/object-hash": "^3.0.6",
"@types/tmp": "^0.2.6",
"@types/xml2js": "^0.4.14",
"@typescript-eslint/eslint-plugin": "^6.13.2",
"chai": "^4.3.10",
"chai-as-promised": "^7.1.1",
@@ -394,7 +393,6 @@
"ts-essentials": "^9.4.1",
"tslog": "^4.9.2",
"ws": "^8.16.0",
"xml2js": "^0.6.2",
"xpath-ts2": "^1.4.2"
}
}
2 changes: 0 additions & 2 deletions scripts/Dockerfile
@@ -18,8 +18,6 @@ LABEL author="Florian Sihler" git="https://github.com/Code-Inspect/flowr"
WORKDIR /app

RUN apk add --no-cache R
# install the xmlparsedata package because we need it
RUN Rscript -e "install.packages('xmlparsedata',repos='https://cloud.r-project.org/')"

# we keep the package.json for module resolution
COPY ./package.json ./scripts/demo.R LICENSE /app/
34 changes: 5 additions & 29 deletions src/benchmark/slicer.ts
@@ -5,7 +5,7 @@

import type {
NormalizedAst,
RParseRequestFromFile, RParseRequestFromText, TokenMap
RParseRequestFromFile, RParseRequestFromText
} from '../r-bridge'
import {
collectAllIds,
@@ -47,8 +47,6 @@ export const benchmarkLogger = log.getSubLogger({ name: 'benchmark' })
export interface BenchmarkSlicerStats extends MergeableRecord {
/** the measurements obtained during the benchmark */
stats: SlicerStats
/** the used token map when translating what was parsed from R */
tokenMap: Record<string, string>
/** the initial and unmodified AST produced by the R side/the 'parse' step */
parse: string
/** the normalized AST produced by the 'normalization' step, including its parent decoration */
@@ -86,7 +84,6 @@ export class BenchmarkSlicer {
private readonly shell: RShell
private stats: SlicerStats | undefined
private loadedXml: string | undefined
private tokenMap: Record<string, string> | undefined
private dataflow: DataflowInformation | undefined
private ai: DataflowInformation | undefined
private normalizedAst: NormalizedAst | undefined
@@ -101,10 +98,6 @@ export class BenchmarkSlicer {
'initialize R session',
() => new RShell()
)
this.commonMeasurements.measure(
'inject home path',
() => this.shell.tryToInjectHomeLibPath()
)
}

/**
@@ -114,27 +107,11 @@ export class BenchmarkSlicer {
public async init(request: RParseRequestFromFile | RParseRequestFromText) {
guard(this.stats === undefined, 'cannot initialize the slicer twice')


await this.commonMeasurements.measureAsync(
'ensure installation of xmlparsedata',
() => this.shell.ensurePackageInstalled('xmlparsedata', true),
)

this.tokenMap = await this.commonMeasurements.measureAsync(
'retrieve token map',
// with this being the first time, there is no preexisting caching!
() => this.shell.tokenMap()
)

this.stepper = new SteppingSlicer({
shell: this.shell,
request: {
...request,
ensurePackageInstalled: true
},
shell: this.shell,
request: {...request},
stepOfInterest: LAST_STEP,
criterion: [],
tokenMap: this.tokenMap
criterion: []
})

this.loadedXml = await this.measureCommonStep('parse', 'retrieve AST from R code')
@@ -316,8 +293,7 @@ export class BenchmarkSlicer {
stats: this.stats,
parse: this.loadedXml as string,
dataflow: this.dataflow as DataflowInformation,
normalize: this.normalizedAst as NormalizedAst,
tokenMap: this.tokenMap as TokenMap,
normalize: this.normalizedAst as NormalizedAst
}
}

2 changes: 0 additions & 2 deletions src/benchmark/stats/print.ts
@@ -91,7 +91,6 @@ export function stats2string(stats: SummarizedSlicerStats): string {
let result = `
Request: ${JSON.stringify(stats.request)}
Shell init time: ${print(stats.commonMeasurements,'initialize R session')}
Retrieval of token map: ${print(stats.commonMeasurements,'retrieve token map')}
AST retrieval: ${print(stats.commonMeasurements,'retrieve AST from R code')}
AST normalization: ${print(stats.commonMeasurements,'normalize R AST')}
Dataflow creation: ${print(stats.commonMeasurements,'produce dataflow information')}
@@ -137,7 +136,6 @@ export function ultimateStats2String(stats: UltimateSlicerStats): string {
return `
Summarized: ${stats.totalRequests} requests and ${stats.totalSlices} slices
Shell init time: ${formatSummarizedTimeMeasure(stats.commonMeasurements.get('initialize R session'))}
Retrieval of token map: ${formatSummarizedTimeMeasure(stats.commonMeasurements.get('retrieve token map'))}
AST retrieval: ${formatSummarizedTimeMeasure(stats.commonMeasurements.get('retrieve AST from R code'))}
AST normalization: ${formatSummarizedTimeMeasure(stats.commonMeasurements.get('normalize R AST'))}
Dataflow creation: ${formatSummarizedTimeMeasure(stats.commonMeasurements.get('produce dataflow information'))}
2 changes: 1 addition & 1 deletion src/benchmark/stats/stats.ts
@@ -2,7 +2,7 @@ import type { SingleSlicingCriterion, SlicingCriteria } from '../../slicing'
import type { NodeId, RParseRequestFromFile, RParseRequestFromText } from '../../r-bridge'
import type { ReconstructionResult } from '../../slicing'

export const CommonSlicerMeasurements = ['initialize R session', 'inject home path', 'ensure installation of xmlparsedata', 'retrieve token map', 'retrieve AST from R code', 'normalize R AST', 'produce dataflow information', 'run abstract interpretation', 'close R session', 'total'] as const
export const CommonSlicerMeasurements = ['initialize R session', 'retrieve AST from R code', 'normalize R AST', 'produce dataflow information', 'run abstract interpretation', 'close R session', 'total'] as const
export type CommonSlicerMeasurements = typeof CommonSlicerMeasurements[number]

export const PerSliceMeasurements = ['static slicing', 'reconstruct code', 'total'] as const
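The measurement names in this file are declared as an `as const` tuple, and the type of the same name is derived from it, so deleting an entry from the tuple also removes it from the union and forces every lookup of that name to be updated. A minimal sketch of that pattern follows; `Measurements`, `Measurement`, and `isMeasurement` are hypothetical names used only for illustration and are not part of this PR.

```typescript
// Hypothetical, shortened list; the real list lives in CommonSlicerMeasurements.
const Measurements = ['initialize R session', 'retrieve AST from R code', 'total'] as const

// Union of the literal strings above: 'initialize R session' | 'retrieve AST from R code' | 'total'
type Measurement = typeof Measurements[number]

// Runtime check that narrows an arbitrary string to the union.
function isMeasurement(name: string): name is Measurement {
    return Measurements.some(m => m === name)
}

// Keys are restricted to the union, so a removed name no longer type-checks.
const timings = new Map<Measurement, number>()
timings.set('total', 0)
```

With this shape, a call such as `timings.get('retrieve token map')` is rejected by the compiler once the entry is gone from the tuple, which is why the corresponding output lines in `print.ts` are removed in the same PR.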
7 changes: 1 addition & 6 deletions src/cli/export-quads-app.ts
@@ -23,13 +23,9 @@ const options = processCommandLineArgs<QuadsCliOptions>('export-quads', [],{
})

const shell = new RShell()
shell.tryToInjectHomeLibPath()

async function writeQuadForSingleFile(request: RParseRequestFromFile, output: string) {
const normalized = await retrieveNormalizedAstFromRCode({
...request,
ensurePackageInstalled: true
}, shell)
const normalized = await retrieveNormalizedAstFromRCode(request, shell)
const serialized = serialize2quads(normalized.ast, { context: request.content })
log.info(`Appending quads to ${output}`)
fs.appendFileSync(output, serialized)
@@ -51,4 +47,3 @@ async function getQuads() {
}

void getQuads()

33 changes: 15 additions & 18 deletions src/cli/repl/commands/parse.ts
@@ -1,27 +1,25 @@
import type {
XmlBasedJson,
XmlParserConfig
} from '../../../r-bridge'
import type { XmlBasedJson} from '../../../r-bridge'
import {childrenKey} from '../../../r-bridge'
import {attributesKey, contentKey} from '../../../r-bridge'
import {
DEFAULT_XML_PARSER_CONFIG,
getKeysGuarded, RawRType,
requestFromInput
parseCSV
} from '../../../r-bridge'
import {getKeysGuarded, RawRType, requestFromInput} from '../../../r-bridge'
import {
extractLocation,
getTokenType,
objectWithArrUnwrap,
xlm2jsonObject
} from '../../../r-bridge/lang-4.x/ast/parser/xml/internal'
import type { OutputFormatter } from '../../../statistics'
import { FontStyles } from '../../../statistics'
import type { ReplCommand } from './main'
import { SteppingSlicer } from '../../../core'
import { deepMergeObject } from '../../../util/objects'
import {csvToRecord} from '../../../r-bridge/lang-4.x/ast/parser/csv/format'
import {convertToXmlBasedJson} from '../../../r-bridge/lang-4.x/ast/parser/csv/parser'

type DepthList = { depth: number, node: XmlBasedJson, leaf: boolean }[]

function toDepthMap(xml: XmlBasedJson, config: XmlParserConfig): DepthList {
function toDepthMap(xml: XmlBasedJson): DepthList {

const root = getKeysGuarded<XmlBasedJson>(xml, RawRType.ExpressionList)
const visit = [ { depth: 0, node: root } ]
const result: DepthList = []
@@ -32,7 +30,7 @@
continue
}

const children = current.node[config.childrenName] as XmlBasedJson[] | undefined ?? []
const children = current.node[childrenKey] as XmlBasedJson[] | undefined ?? []
result.push({ ...current, leaf: children.length === 0 })
children.reverse()

@@ -86,7 +84,7 @@
}
}

function depthListToTextTree(list: Readonly<DepthList>, config: XmlParserConfig, f: OutputFormatter): string {
function depthListToTextTree(list: Readonly<DepthList>, f: OutputFormatter): string {

let result = ''

const deadDepths = new Set<number>()
@@ -100,14 +98,14 @@
result += f.reset()

const raw = objectWithArrUnwrap(node)
const content = raw[config.contentName] as string | undefined
const locationRaw = raw[config.attributeName] as XmlBasedJson | undefined
const content = raw[contentKey] as string | undefined
const locationRaw = raw[attributesKey] as XmlBasedJson | undefined

let location = ''
if(locationRaw !== undefined) {
location = retrieveLocationString(locationRaw)
}

const type = getTokenType(config.tokenMap, node)
const type = getTokenType(node)


if(leaf) {
const suffix = `${f.format(content ? JSON.stringify(content) : '', { style: FontStyles.Bold })}${f.format(location, { style: FontStyles.Italic })}`
@@ -134,9 +132,8 @@
request: requestFromInput(remainingLine.trim())
}).allRemainingSteps()

const config = deepMergeObject<XmlParserConfig>(DEFAULT_XML_PARSER_CONFIG, { tokenMap: await shell.tokenMap() })
const object = xlm2jsonObject(config, result.parse)
const object = convertToXmlBasedJson(csvToRecord(parseCSV(result.parse)))


output.stdout(depthListToTextTree(toDepthMap(object, config), config, output.formatter))
output.stdout(depthListToTextTree(toDepthMap(object), output.formatter))

}
}
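The REPL command now builds its tree from the CSV form of R's `getParseData` output instead of from XML. The implementations of `parseCSV`, `csvToRecord`, and `convertToXmlBasedJson` are not part of this excerpt, so the following is only a rough sketch of the general idea, assuming rows that carry the standard `getParseData` columns (`id`, `parent`, `token`, `text`, `line1`, `col1`, `line2`, `col2`). The names `ParseRow`, `TreeNode`, and `buildTree` are hypothetical.

```typescript
// Hypothetical row shape mirroring the columns of R's utils::getParseData().
interface ParseRow {
    id:     number
    parent: number // 0 marks a top-level expression
    token:  string
    text:   string
    line1:  number
    col1:   number
    line2:  number
    col2:   number
}

// Hypothetical tree node; the real XmlBasedJson layout is not shown in this excerpt.
interface TreeNode {
    token:    string
    text?:    string
    location: string
    children: TreeNode[]
}

function buildTree(rows: readonly ParseRow[]): TreeNode[] {
    // first pass: create one node per row, storing text only if there is any
    const nodes = new Map<number, TreeNode>()
    for(const row of rows) {
        nodes.set(row.id, {
            token:    row.token,
            ...(row.text !== '' ? { text: row.text } : {}),
            location: `${row.line1}:${row.col1}-${row.line2}:${row.col2}`,
            children: []
        })
    }
    // second pass: attach children in source order; wider spans sort first on ties
    const ordered = [...rows].sort((a, b) =>
        a.line1 - b.line1 || a.col1 - b.col1 || b.line2 - a.line2 || b.col2 - a.col2
    )
    const roots: TreeNode[] = []
    for(const row of ordered) {
        const node = nodes.get(row.id) as TreeNode
        const parent = nodes.get(row.parent)
        if(parent === undefined) {
            roots.push(node) // there can be multiple root expressions
        } else {
            parent.children.push(node)
        }
    }
    return roots
}
```

Creating all nodes before linking them means the rows may arrive in any order, and returning a list of roots reflects the commit note that a file can contain several top-level expressions.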
12 changes: 4 additions & 8 deletions src/cli/repl/server/connection.ts
@@ -1,7 +1,6 @@
import type { StepResults} from '../../../core'
import { LAST_STEP, printStepResult, SteppingSlicer, STEPS_PER_SLICE } from '../../../core'
import type { NormalizedAst, RShell, XmlParserConfig } from '../../../r-bridge'
import { DEFAULT_XML_PARSER_CONFIG } from '../../../r-bridge'
import type { NormalizedAst, RShell } from '../../../r-bridge'
import { sendMessage } from './send'
import { answerForValidationError, validateBaseMessageFormat, validateMessage } from './validate'
import type {
@@ -25,7 +24,6 @@ import {
} from './messages/repl'
import { replProcessAnswer } from '../core'
import { ansiFormatter, voidFormatter } from '../../../statistics'
import { deepMergeObject } from '../../../util/objects'
import { LogLevel } from '../../../util/log'
import type { ControlFlowInformation} from '../../../util/cfg/cfg'
import { cfg2quads, extractCFG } from '../../../util/cfg/cfg'
@@ -132,7 +130,6 @@ export class FlowRServerConnection {
}

const config = (): QuadSerializationConfiguration => ({ context: message.filename ?? 'unknown', getId: defaultQuadIdGenerator() })
const parseConfig = deepMergeObject<XmlParserConfig>(DEFAULT_XML_PARSER_CONFIG, { tokenMap: await this.shell.tokenMap() })

if(message.format === 'n-quads') {
sendMessage<FileAnalysisResponseMessageNQuads>(this.socket, {
@@ -141,7 +138,7 @@ export class FlowRServerConnection {
id: message.id,
cfg: cfg ? cfg2quads(cfg, config()) : undefined,
results: {
parse: await printStepResult('parse', results.parse as string, StepOutputFormat.RdfQuads, config(), parseConfig),
parse: await printStepResult('parse', results.parse as string, StepOutputFormat.RdfQuads, config()),
normalize: await printStepResult('normalize', results.normalize as NormalizedAst, StepOutputFormat.RdfQuads, config()),
dataflow: await printStepResult('dataflow', results.dataflow as DataflowInformation, StepOutputFormat.RdfQuads, config()),
ai: ''
@@ -170,9 +167,8 @@ export class FlowRServerConnection {
shell: this.shell,
// we have to make sure, that the content is not interpreted as a file path if it starts with 'file://' therefore, we do it manually
request: {
request: message.content === undefined ? 'file' : 'text',
content: message.content ?? message.filepath as string,
ensurePackageInstalled: false
request: message.content === undefined ? 'file' : 'text',
content: message.content ?? message.filepath as string
},
criterion: [] // currently unknown
})
2 changes: 0 additions & 2 deletions src/cli/statistics-helper-app.ts
@@ -57,7 +57,6 @@ if(options.compress) {
const processedFeatures = new Set<FeatureKey>(options.features as FeatureKey[])

const shell = new RShell()
shell.tryToInjectHomeLibPath()

initFileProvider(options['output-dir'])

@@ -111,4 +110,3 @@ async function getStatsForSingleFile() {
}

void getStatsForSingleFile()

13 changes: 8 additions & 5 deletions src/core/print/parse-printer.ts
@@ -1,7 +1,10 @@
import type { QuadSerializationConfiguration} from '../../util/quads'
import { serialize2quads } from '../../util/quads'
import { xlm2jsonObject } from '../../r-bridge/lang-4.x/ast/parser/xml/internal'
import type { XmlBasedJson, XmlParserConfig } from '../../r-bridge'
import type { XmlBasedJson} from '../../r-bridge'
import {attributesKey, childrenKey, contentKey} from '../../r-bridge'
import {parseCSV} from '../../r-bridge'
import {csvToRecord} from '../../r-bridge/lang-4.x/ast/parser/csv/format'
import {convertToXmlBasedJson} from '../../r-bridge/lang-4.x/ast/parser/csv/parser'

function filterObject(obj: XmlBasedJson, keys: Set<string>): XmlBasedJson[] | XmlBasedJson {
if(typeof obj !== 'object') {
@@ -24,11 +27,11 @@

}

export function parseToQuads(code: string, config: QuadSerializationConfiguration, parseConfig: XmlParserConfig): string{
const obj = xlm2jsonObject(parseConfig, code)
export function parseToQuads(code: string, config: QuadSerializationConfiguration): string{
const obj = convertToXmlBasedJson(csvToRecord(parseCSV(code)))

// recursively filter so that if the object contains one of the keys 'a', 'b' or 'c', all other keys are ignored
return serialize2quads(
filterObject(obj, new Set([parseConfig.attributeName, parseConfig.childrenName, parseConfig.contentName])) as XmlBasedJson,
filterObject(obj, new Set([attributesKey, childrenKey, contentKey])) as XmlBasedJson,
config
)
}
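The body of `filterObject` is collapsed in this diff, so only its comment is visible: if an object contains one of the given keys, all other keys are ignored, recursively. One possible reading of that comment is sketched below. `keepOnly` and `Json` are hypothetical names, and the behaviour for objects that contain none of the keys (recursing into all of their values) is an assumption.

```typescript
// Hypothetical loose JSON object type used only for this sketch.
type Json = { [key: string]: unknown }

function keepOnly(value: unknown, keys: ReadonlySet<string>): unknown {
    if(Array.isArray(value)) {
        return value.map(v => keepOnly(v, keys))
    }
    if(typeof value !== 'object' || value === null) {
        return value // primitives are returned unchanged
    }
    const obj = value as Json
    const present = Object.keys(obj).filter(k => keys.has(k))
    // assumption: without any of the wanted keys, every key is kept and searched
    const selected = present.length > 0 ? present : Object.keys(obj)
    const result: Json = {}
    for(const k of selected) {
        result[k] = keepOnly(obj[k], keys)
    }
    return result
}
```

Under this reading, calling it with the keys `a`, `b`, and `c` on `{ a: 1, x: 2, b: [{ c: 3, y: 4 }] }` drops `x` and `y` at their respective levels.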