Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migrate processors from common assets #931

Merged
merged 29 commits into from
Nov 13, 2024
Merged
Changes from 1 commit
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
c56faa7
migrate copy_metadata_field
jsnoble Oct 24, 2024
0e5a689
migrate count_unique
jsnoble Oct 24, 2024
1242635
fix spacing
jsnoble Oct 24, 2024
e5966ba
add drop_docs
jsnoble Oct 28, 2024
659f96c
add remove_empty_fields
jsnoble Oct 28, 2024
81d433c
add set_field_conditional
jsnoble Oct 28, 2024
6263e89
migrate filter_by_required_fields
jsnoble Oct 28, 2024
9900dac
migrate filter_by_unknown_fields
jsnoble Oct 28, 2024
649e673
migreate json_parser
jsnoble Oct 30, 2024
a3dd149
migreate date_guard
jsnoble Oct 30, 2024
85bdd92
migrate filter function
jsnoble Oct 31, 2024
c65022b
move drop docs to sample
jsnoble Oct 31, 2024
6dc94c0
update bunlded asset index to include new processors
jsnoble Oct 31, 2024
5955e5d
fix tests
jsnoble Nov 4, 2024
b19bea8
fix lint issues
jsnoble Nov 4, 2024
8fb10d0
bump asset version
jsnoble Nov 4, 2024
42baf8e
update from master
jsnoble Nov 4, 2024
ce99604
finish merge from master
jsnoble Nov 4, 2024
b4c321e
split sample to sample_exact and sample_random
jsnoble Nov 5, 2024
115fe89
fix lint issues
jsnoble Nov 5, 2024
f8bcda2
adjust range of results for the probability tests
jsnoble Nov 5, 2024
eef12f8
update package versions to use ~, update hash library
jsnoble Nov 5, 2024
dde7608
update test reanges
jsnoble Nov 5, 2024
90b4b62
initial docs for many of the processors
jsnoble Nov 7, 2024
edac052
general fixes
jsnoble Nov 12, 2024
3d44a75
fix filter_by_date tests and logic, use ms library instead of custum …
jsnoble Nov 12, 2024
eed3560
fix tests
jsnoble Nov 13, 2024
cf25dfc
update packages
jsnoble Nov 13, 2024
6e4c95d
update docs
jsnoble Nov 13, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
migreate json_parser
jsnoble committed Oct 30, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
commit 649e673d9a34e2d1ae18ef46e3069f9a88d7cd04
30 changes: 30 additions & 0 deletions asset/src/json_parser/processor.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
import { BatchProcessor, DataEntity } from '@terascope/job-components';

/**
Expects an array of data entities and attempts to transform the buffer data to json
ciorg marked this conversation as resolved.
Show resolved Hide resolved
Uses the _dead_letter_queue options to handle parsing errors which are none (ignore), log, throw
or sends bad docs to a kafka topic specified in the api property of the job.
see https://terascope.github.io/teraslice/docs/jobs/dead-letter-queue#docsNav
and https://github.com/terascope/kafka-assets/blob/master/docs/apis/kafka_dead_letter.md for dead letter queue details
*/

export default class JSONParser extends BatchProcessor {
// @ts-expect-error TODO: fix this type issue
onBatch(docArray: DataEntity[]) {
return docArray.reduce<DataEntity[]>((parsedDocs, doc) => {
try {
const dataString = Buffer.from(doc.getRawData()).toString('utf8').trim();

const toJson = JSON.parse(dataString);

parsedDocs.push(DataEntity.make(toJson, doc.getMetadata()));
// TODO: fix this type issue
} catch (err: any) {
this.rejectRecord(doc.getRawData(), err.message);
}

return parsedDocs;
}, []);
}
}
14 changes: 14 additions & 0 deletions asset/src/json_parser/schema.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
import { ConvictSchema, OpConfig } from '@terascope/job-components';

export default class Schema extends ConvictSchema<OpConfig> {
build() {
return {
// maybe document its an inbuilt setting?
_dead_letter_action: {
doc: 'action to take if a doc can not be transformed to JSON; accepts none, throw, log, or an api name',
ciorg marked this conversation as resolved.
Show resolved Hide resolved
default: 'log',
value: 'required_String'
}
};
}
}
85 changes: 85 additions & 0 deletions test/json_parser/processor-spec.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@

import { cloneDeep, DataEntity, isString } from '@terascope/utils';
import { WorkerTestHarness } from 'teraslice-test-harness';
import { OpConfig } from '@terascope/job-components';


describe('json_parser', () => {
let harness: WorkerTestHarness;

async function makeTest(config: Partial<OpConfig> = {}) {
const baseConfig = {
_op: 'json_parser',
};

const opConfig = Object.assign({}, baseConfig, config);
harness = WorkerTestHarness.testProcessor(opConfig);

await harness.initialize();

return harness;
}

afterEach(async () => {
if (harness) await harness.shutdown();
});


it('should return empty array if input is an empty array', async () => {
const harness = await makeTest();
const results = await harness.runSlice([]);

expect(results.length).toBe(0);
});

it('should parse valid json', async () => {
const data = [
{
_key: 1,
name: 'bob'
},
{
_key: 2,
name: 'joe'
},
];

const rawData = makeRawDataEntities(cloneDeep(data));

const harness = await makeTest();
const results = await harness.runSlice(rawData);

expect(results).toEqual(data);
});

it('should only return the good json', async () => {
const data = [
'somebadjson',
{
_key: 2,
name: 'joe'
},
];

const rawData = makeRawDataEntities(cloneDeep(data));

const harness = await makeTest();
const results = await harness.runSlice(rawData);

expect(results).toEqual([{ _key: 2, name: 'joe' }]);
});
});

function makeRawDataEntities(dataArray: any[]) {
return dataArray.map((doc) => {
let d = doc;
if (isString(doc)) d = {};
const entity = DataEntity.make(d, { _key: doc._key });

const buf = Buffer.from(JSON.stringify(doc), 'utf8');

entity.setRawData(buf);

return entity;
});
}
Empty file added test/json_parser/schema-spec.ts
Empty file.