Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support legacy seed in perf #1256

Merged
merged 12 commits into from
Nov 19, 2024
11 changes: 6 additions & 5 deletions src/seed_tools/commands/create.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -111,12 +111,13 @@ describe('create command', () => {
expect(VariationsSeed.fromBinary(output).version).toEqual('1');
};

fs_sync.readdirSync(validSeedsDir).forEach((testCase) => {
it(testCase, () => runTest(testCase, undefined));
it('test1', () => runTest('test1', undefined));

// Check creating seed using git history.
it(`${testCase}_old_revision`, () => runTest(testCase, 'HEAD'));
});
// Check creating seed using git history.
it('test1_git_revision', () => runTest('test1', 'HEAD'));

// Check creating seed using git history for legacy seed.json.
it('legacy_seed', () => runTest('legacy_seed', '3f3eb03e'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls use full revision.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

});

test('set seed version', async () => {
Expand Down
65 changes: 4 additions & 61 deletions src/seed_tools/commands/split_seed_json.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,7 @@

import { Command } from '@commander-js/extra-typings';
import { promises as fs } from 'fs';
import DefaultMap from '../../base/containers/default_map';
import { type Study } from '../../proto/generated/study';
import { VariationsSeed } from '../../proto/generated/variations_seed';
import * as study_json_utils from '../utils/study_json_utils';
import { parseLegacySeedJson } from '../utils/legacy_json_to_seed';

export default function createCommand() {
return new Command('split_seed_json')
Expand All @@ -19,21 +16,8 @@ export default function createCommand() {
}

async function main(seedPath: string, outputDir: string) {
const seedJson = preprocessSeedJson(
JSON.parse(await fs.readFile(seedPath, 'utf8')),
);

// Parse the seed as protobuf json representation. The parse will fail if any
// unknown fields or values are present in the json.
const parsedSeed = VariationsSeed.fromJson(seedJson, {
ignoreUnknownFields: false,
});

const studies = new DefaultMap<string, Study[]>(() => []);
for (const study of parsedSeed.study) {
studies.get(study.name).push(study);
}

const seedContent = await fs.readFile(seedPath, 'utf8');
const { studiesMap } = parseLegacySeedJson(seedContent);
await fs.mkdir(outputDir, { recursive: true });

// Remove all files in the output directory.
Expand All @@ -45,51 +29,10 @@ async function main(seedPath: string, outputDir: string) {
}

// Write study files.
for (const study of studies) {
for (const study of studiesMap) {
const studyName = study[0];
const studyArray = study[1];
const studyFile = `${outputDir}/${studyName}.json5`;
await study_json_utils.writeStudyFile(studyArray, studyFile);
}
}

function preprocessSeedJson(json: any): any {
json.study = json.studies;
delete json.studies;

for (const study of json.study) {
if (study.experiments !== undefined) {
study.experiment = study.experiments;
delete study.experiments;
}
for (const experiment of study.experiment) {
if (experiment.parameters !== undefined) {
experiment.param = experiment.parameters;
delete experiment.parameters;
}
}
if (study.filter !== undefined) {
if (study.filter.channel !== undefined) {
study.filter.channel = study.filter.channel.map((channel: string) => {
switch (channel) {
case 'NIGHTLY':
return 'CANARY';
case 'RELEASE':
return 'STABLE';
default:
return channel;
}
});
}
if (study.filter.platform !== undefined) {
study.filter.platform = study.filter.platform.map(
(platform: string) => {
return 'PLATFORM_' + platform;
},
);
}
}
}

return json;
}
69 changes: 69 additions & 0 deletions src/seed_tools/utils/legacy_json_to_seed.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
// Copyright (c) 2024 The Brave Authors. All rights reserved.
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this file,
// You can obtain one at https://mozilla.org/MPL/2.0/.

import DefaultMap from '../../base/containers/default_map';
import { type Study } from '../../proto/generated/study';
import { VariationsSeed } from '../../proto/generated/variations_seed';

export function parseLegacySeedJson(seedContent: string): {
parsedSeed: VariationsSeed;
studiesMap: DefaultMap<string, Study[]>;
} {
const seedJson = preprocessSeedJson(JSON.parse(seedContent));

// Parse the seed as protobuf json representation. The parse will fail if any
// unknown fields or values are present in the json.
const parsedSeed = VariationsSeed.fromJson(seedJson, {
ignoreUnknownFields: false,
});

const studiesMap = new DefaultMap<string, Study[]>(() => []);
for (const study of parsedSeed.study) {
studiesMap.get(study.name).push(study);
}

return { parsedSeed, studiesMap };
}

function preprocessSeedJson(json: any): any {
json.study = json.studies;
delete json.studies;

for (const study of json.study) {
if (study.experiments !== undefined) {
study.experiment = study.experiments;
delete study.experiments;
}
for (const experiment of study.experiment) {
if (experiment.parameters !== undefined) {
experiment.param = experiment.parameters;
delete experiment.parameters;
}
}
if (study.filter !== undefined) {
if (study.filter.channel !== undefined) {
study.filter.channel = study.filter.channel.map((channel: string) => {
switch (channel) {
case 'NIGHTLY':
return 'CANARY';
case 'RELEASE':
return 'STABLE';
default:
return channel;
}
});
}
if (study.filter.platform !== undefined) {
study.filter.platform = study.filter.platform.map(
(platform: string) => {
return 'PLATFORM_' + platform;
},
);
}
}
}

return json;
}
35 changes: 24 additions & 11 deletions src/seed_tools/utils/studies_to_seed.ts
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ import diffStrings from '../utils/diff_strings';
import * as file_utils from '../utils/file_utils';
import * as seed_validation from '../utils/seed_validation';
import * as study_json_utils from '../utils/study_json_utils';
import { parseLegacySeedJson } from './legacy_json_to_seed';

export async function readStudiesToSeed(
studiesDir: string,
Expand Down Expand Up @@ -71,20 +72,32 @@ async function readStudiesAtRevision(
}> {
const basePath = wsPath('//');
studiesDir = path.relative(basePath, studiesDir);
const files = execSync(`git show "${revision}":"${studiesDir}"`, {
encoding: 'utf8',
}).split('\n');

const filesWithContent = [];
for (const file of files) {
if (!file.endsWith('.json5')) continue;
const content = execSync(`git show ${revision}:"${studiesDir}/${file}"`, {
try {
const files = execSync(`git show "${revision}":"${studiesDir}"`, {
encoding: 'utf8',
}).split('\n');

const filesWithContent = [];
for (const file of files) {
if (!file.endsWith('.json5')) continue;
const content = execSync(`git show ${revision}:"${studiesDir}/${file}"`, {
encoding: 'utf8',
});
filesWithContent.push({ path: file, content });
}
return await readStudies(filesWithContent, false);
} catch {
console.log(`Failed to read studies ${revision}, use seed.json fallback:`);
const seedContent = execSync(`git show "${revision}":seed/seed.json`, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected calls to child_process from a function argument revision. This could lead to a command injection if the input is user controllable. Try to avoid calls to child_process, and if it is needed ensure user input is correctly sanitized or sandboxed.

Source: https://semgrep.dev/r/javascript.lang.security.detect-child-process.detect-child-process


Cc @thypon @kdenhartog

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execSync should be called with an array, instead of a string. This would avoid command injections

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execSync() doesn't support string[]. I've made a wrapper over spawnSync() to address this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, git show <rev>:<filename> has to be called in shell=True anyway. spawnSync() actually doesn't provide more security.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add a check to ensure the parameter is a git hash [a-z0-9]+ so it's safe to pass it as is with shell=True.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected calls to child_process from a function argument revision. This could lead to a command injection if the input is user controllable. Try to avoid calls to child_process, and if it is needed ensure user input is correctly sanitized or sandboxed.

Source: https://semgrep.dev/r/javascript.lang.security.detect-child-process.detect-child-process


Cc @thypon @kdenhartog

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation code has been added above.
Also we have a follow up issue for sanitizing all the args: #1250

encoding: 'utf8',
});
filesWithContent.push({ path: file, content });
const { parsedSeed } = parseLegacySeedJson(seedContent);
return {
studies: parsedSeed.study,
studyFileBaseNameMap: new Map(),
errors: [],
};
}

return await readStudies(filesWithContent, false);
}

async function readStudies(
Expand Down
Loading
Loading