Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support legacy seed in perf #1256

Merged
merged 12 commits into from
Nov 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .github/workflows/test-src.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
with:
fetch-depth: '0' # Used to test `seed_tools create --revision <sha1>`

- name: npm install
run: npm install
Expand Down
50 changes: 0 additions & 50 deletions .github/workflows/test-testing-fieldtrials-gen.yml

This file was deleted.

31 changes: 24 additions & 7 deletions src/seed_tools/commands/create.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -17,15 +17,20 @@ const compareProtobuf = (actual: Uint8Array, expectedFilename: string) => {
if (fs_sync.existsSync(expectedFilename)) {
expectedJson = fs_sync.readFileSync(expectedFilename, 'utf-8');
}
const expected = VariationsSeed.fromJsonString(expectedJson);
const expected = VariationsSeed.fromJsonString(expectedJson, {
ignoreUnknownFields: false,
});
const actualObj = VariationsSeed.fromBinary(actual);
try {
expect(actualObj).toEqual(expected);
} catch (error) {
const failedFilename = expectedFilename + '.failed';
fs_sync.writeFileSync(
failedFilename,
VariationsSeed.toJsonString(actualObj, { prettySpaces: 2 }) + '\n',
VariationsSeed.toJsonString(actualObj, {
useProtoFieldName: true,
prettySpaces: 2,
}) + '\n',
);
throw error;
}
Expand Down Expand Up @@ -111,12 +116,14 @@ describe('create command', () => {
expect(VariationsSeed.fromBinary(output).version).toEqual('1');
};

fs_sync.readdirSync(validSeedsDir).forEach((testCase) => {
it(testCase, () => runTest(testCase, undefined));
it('test1', () => runTest('test1', undefined));

// Check creating seed using git history.
it(`${testCase}_old_revision`, () => runTest(testCase, 'HEAD'));
});
// Check creating seed using git history.
it('test1_git_revision', () => runTest('test1', 'HEAD'));

// Check creating seed using git history for legacy seed.json.
it('legacy_seed', () =>
runTest('legacy_seed', '3f3eb03e12eb7f37a315f66f735d3decb483a90d'));
});

test('set seed version', async () => {
Expand Down Expand Up @@ -203,6 +210,16 @@ describe('create command', () => {
serialNumberPath,
]),
).rejects.toThrowError('process.exit(1)');

const expectedError = (
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That part was missed and the tests ignored expected_errors.txt

await fs.readFile(
path.join(testCaseDir, 'expected_errors.txt'),
'utf-8',
)
).trim();
expect(errorMock).toHaveBeenCalledWith(
expect.stringContaining(expectedError),
);
},
);
});
Expand Down
64 changes: 4 additions & 60 deletions src/seed_tools/commands/split_seed_json.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,7 @@

import { Command } from '@commander-js/extra-typings';
import { promises as fs } from 'fs';
import DefaultMap from '../../base/containers/default_map';
import { type Study } from '../../proto/generated/study';
import { VariationsSeed } from '../../proto/generated/variations_seed';
import { parseLegacySeedJson } from '../utils/legacy_json_to_seed';
import * as study_json_utils from '../utils/study_json_utils';

export default function createCommand() {
Expand All @@ -19,21 +17,8 @@ export default function createCommand() {
}

async function main(seedPath: string, outputDir: string) {
const seedJson = preprocessSeedJson(
JSON.parse(await fs.readFile(seedPath, 'utf8')),
);

// Parse the seed as protobuf json representation. The parse will fail if any
// unknown fields or values are present in the json.
const parsedSeed = VariationsSeed.fromJson(seedJson, {
ignoreUnknownFields: false,
});

const studies = new DefaultMap<string, Study[]>(() => []);
for (const study of parsedSeed.study) {
studies.get(study.name).push(study);
}

const seedContent = await fs.readFile(seedPath, 'utf8');
const { studiesMap } = parseLegacySeedJson(seedContent);
await fs.mkdir(outputDir, { recursive: true });

// Remove all files in the output directory.
Expand All @@ -45,51 +30,10 @@ async function main(seedPath: string, outputDir: string) {
}

// Write study files.
for (const study of studies) {
for (const study of studiesMap) {
const studyName = study[0];
const studyArray = study[1];
const studyFile = `${outputDir}/${studyName}.json5`;
await study_json_utils.writeStudyFile(studyArray, studyFile);
}
}

function preprocessSeedJson(json: any): any {
json.study = json.studies;
delete json.studies;

for (const study of json.study) {
if (study.experiments !== undefined) {
study.experiment = study.experiments;
delete study.experiments;
}
for (const experiment of study.experiment) {
if (experiment.parameters !== undefined) {
experiment.param = experiment.parameters;
delete experiment.parameters;
}
}
if (study.filter !== undefined) {
if (study.filter.channel !== undefined) {
study.filter.channel = study.filter.channel.map((channel: string) => {
switch (channel) {
case 'NIGHTLY':
return 'CANARY';
case 'RELEASE':
return 'STABLE';
default:
return channel;
}
});
}
if (study.filter.platform !== undefined) {
study.filter.platform = study.filter.platform.map(
(platform: string) => {
return 'PLATFORM_' + platform;
},
);
}
}
}

return json;
}
69 changes: 69 additions & 0 deletions src/seed_tools/utils/legacy_json_to_seed.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
// Copyright (c) 2024 The Brave Authors. All rights reserved.
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this file,
// You can obtain one at https://mozilla.org/MPL/2.0/.

import DefaultMap from '../../base/containers/default_map';
import { type Study } from '../../proto/generated/study';
import { VariationsSeed } from '../../proto/generated/variations_seed';

export function parseLegacySeedJson(seedContent: string): {
parsedSeed: VariationsSeed;
studiesMap: DefaultMap<string, Study[]>;
} {
const seedJson = preprocessSeedJson(JSON.parse(seedContent));

// Parse the seed as protobuf json representation. The parse will fail if any
// unknown fields or values are present in the json.
const parsedSeed = VariationsSeed.fromJson(seedJson, {
ignoreUnknownFields: false,
});

const studiesMap = new DefaultMap<string, Study[]>(() => []);
for (const study of parsedSeed.study) {
studiesMap.get(study.name).push(study);
}

return { parsedSeed, studiesMap };
}

function preprocessSeedJson(json: any): any {
json.study = json.studies;
delete json.studies;

for (const study of json.study) {
if (study.experiments !== undefined) {
study.experiment = study.experiments;
delete study.experiments;
}
for (const experiment of study.experiment) {
if (experiment.parameters !== undefined) {
experiment.param = experiment.parameters;
delete experiment.parameters;
}
}
if (study.filter !== undefined) {
if (study.filter.channel !== undefined) {
study.filter.channel = study.filter.channel.map((channel: string) => {
switch (channel) {
case 'NIGHTLY':
return 'CANARY';
case 'RELEASE':
return 'STABLE';
default:
return channel;
}
});
}
if (study.filter.platform !== undefined) {
study.filter.platform = study.filter.platform.map(
(platform: string) => {
return 'PLATFORM_' + platform;
},
);
}
}
}

return json;
}
46 changes: 36 additions & 10 deletions src/seed_tools/utils/studies_to_seed.ts
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@ import diffStrings from '../utils/diff_strings';
import * as file_utils from '../utils/file_utils';
import * as seed_validation from '../utils/seed_validation';
import * as study_json_utils from '../utils/study_json_utils';
import { parseLegacySeedJson } from './legacy_json_to_seed';
import { validateName } from './study_validation';

export async function readStudiesToSeed(
studiesDir: string,
Expand Down Expand Up @@ -71,20 +73,44 @@ async function readStudiesAtRevision(
}> {
const basePath = wsPath('//');
studiesDir = path.relative(basePath, studiesDir);
const files = execSync(`git show "${revision}":"${studiesDir}"`, {
encoding: 'utf8',
}).split('\n');

const filesWithContent = [];
for (const file of files) {
if (!file.endsWith('.json5')) continue;
const content = execSync(`git show ${revision}:"${studiesDir}/${file}"`, {
// Validate revision format.
if (!/^[a-z0-9]+$/.test(revision) && revision !== 'HEAD') {
return {
studies: [],
studyFileBaseNameMap: new Map(),
errors: [`Invalid revision: ${revision}`],
};
}

try {
const files = execSync(`git show "${revision}":"${studiesDir}"`, {
encoding: 'utf8',
}).split('\n');

const filesWithContent = [];
for (const file of files) {
if (!file.endsWith('.json5')) continue;
if (!validateName(file, 'filename', [])) continue;

const content = execSync(`git show ${revision}:"${studiesDir}/${file}"`, {
encoding: 'utf8',
});
filesWithContent.push({ path: file, content });
}
return await readStudies(filesWithContent, false);
} catch {
console.log(`Failed to read studies ${revision}, use seed.json fallback:`);
const seedContent = execSync(`git show "${revision}":seed/seed.json`, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected calls to child_process from a function argument revision. This could lead to a command injection if the input is user controllable. Try to avoid calls to child_process, and if it is needed ensure user input is correctly sanitized or sandboxed.

Source: https://semgrep.dev/r/javascript.lang.security.detect-child-process.detect-child-process


Cc @thypon @kdenhartog

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execSync should be called with an array, instead of a string. This would avoid command injections

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execSync() doesn't support string[]. I've made a wrapper over spawnSync() to address this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, git show <rev>:<filename> has to be called in shell=True anyway. spawnSync() actually doesn't provide more security.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add a check to ensure the parameter is a git hash [a-z0-9]+ so it's safe to pass it as is with shell=True.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected calls to child_process from a function argument revision. This could lead to a command injection if the input is user controllable. Try to avoid calls to child_process, and if it is needed ensure user input is correctly sanitized or sandboxed.

Source: https://semgrep.dev/r/javascript.lang.security.detect-child-process.detect-child-process


Cc @thypon @kdenhartog

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation code has been added above.
Also we have a follow up issue for sanitizing all the args: #1250

encoding: 'utf8',
});
filesWithContent.push({ path: file, content });
const { parsedSeed } = parseLegacySeedJson(seedContent);
return {
studies: parsedSeed.study,
studyFileBaseNameMap: new Map(),
errors: [],
};
}

return await readStudies(filesWithContent, false);
}

async function readStudies(
Expand Down
24 changes: 2 additions & 22 deletions src/seed_tools/utils/study_validation.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,7 @@ describe('getStudyErrors', () => {
);
});

test.each(['exp😀', 'exp<', 'exp*'])(
test.each(['exp😀', 'exp<', 'exp*', 'exp,'])(
'should error if experiment name has invalid char %s',
(experimentName) => {
const study = Study.fromJson({
Expand All @@ -131,26 +131,6 @@ describe('getStudyErrors', () => {
},
);

test('should not error if experiment name has comma', () => {
const study = Study.fromJson({
name: 'study',
experiment: [
{
name: 'experiment1,',
probability_weight: 100,
},
],
filter: {
platform: ['PLATFORM_LINUX'],
channel: ['BETA'],
},
});

expect(study_validation.getStudyErrors(study, studyFileBaseName)).toEqual(
[],
);
});

test('should error if duplicate experiment names are found', () => {
const study = Study.fromJson({
name: 'study',
Expand Down Expand Up @@ -225,7 +205,7 @@ describe('getStudyErrors', () => {
study_validation.getStudyErrors(study, studyFileBaseName),
).toContainEqual(
expect.stringContaining(
`Invalid feature name for experiment experiment: ${featureName}`,
`Invalid feature name: ${featureName} (use only 0-9,a-z,A-Z,_,-)`,
),
);
}
Expand Down
Loading