Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Call noise correction python binary from kotlin. #1803

Merged
merged 30 commits into from
Oct 11, 2024

Conversation

ple13
Copy link
Contributor

@ple13 ple13 commented Sep 12, 2024

No description provided.

@wfa-reviewable
Copy link

This change is Reviewable

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 9 files at r1, 3 of 4 files at r2, all commit messages.
Reviewable status: 7 of 9 files reviewed, 4 unresolved discussions (waiting on @kungfucraig and @ple13)


experimental/dp_consistency/src/main/kotlin/tools/BUILD.bazel line 7 at r2 (raw file):

    srcs = ["CorrectOriginReport.kt"],
    data = [
        "//experimental/dp_consistency/src/main/python/tools:correct_origin_report",

This needs to be in resources rather than data so it gets embedded in the final JAR. This means you need to use ClassLoader.getJarResourcePath from common-jvm and copy the file out to a temp location on the local file system before running.


experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 19 at r2 (raw file):

import java.util.logging.Logger

class CorrectOriginReport {

Add KDoc to all public symbols


experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 20 at r2 (raw file):

class CorrectOriginReport {
  fun correctReport(inputPath: String, unnoisedEdp: String, outputPath: String) {

Use specific types, e.g. Path or File.

Code quote:

String

experimental/dp_consistency/src/test/kotlin/tools/CorrectOriginReportTest.kt line 30 at r2 (raw file):

    val reportCorrector: CorrectOriginReport = CorrectOriginReport()
    val outputPath =
      "experimental/dp_consistency/src/test/kotlin/tools/example_origin_corrected_report.xlsx"

I thought we were operating on the JSON serialization of a Report message?

Copy link
Contributor Author

@ple13 ple13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 4 unresolved discussions (waiting on @kungfucraig and @SanjayVas)


experimental/dp_consistency/src/main/kotlin/tools/BUILD.bazel line 7 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

This needs to be in resources rather than data so it gets embedded in the final JAR. This means you need to use ClassLoader.getJarResourcePath from common-jvm and copy the file out to a temp location on the local file system before running.

I've replaced data with resources. The code still works without ClassLoader.getJarResourcePath. I tried to load the binary with ClassLoader.getJarResourcePath, but keep seeing the error "invalid ELF headler". I had a look at this PR and notice that there is the java_native_libraries rule to handle native library, but it didn't work for the python binary. I wonder if we need to have new rules for python to make it work with ClassLoader.getJarResourcePath?


experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 19 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Add KDoc to all public symbols

Done.


experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 20 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Use specific types, e.g. Path or File.

Done.


experimental/dp_consistency/src/test/kotlin/tools/CorrectOriginReportTest.kt line 30 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

I thought we were operating on the JSON serialization of a Report message?

This will be in the next PR. This PR is to load a python binary and runs it in kotlin.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 9 files at r1, 3 of 3 files at r3.
Reviewable status: 7 of 9 files reviewed, 4 unresolved discussions (waiting on @kungfucraig and @ple13)


experimental/dp_consistency/src/main/kotlin/tools/BUILD.bazel line 7 at r2 (raw file):

I've replaced data with resources. The code still works without ClassLoader.getJarResourcePath

I'm willing to bet this would fail after a bazel clean, as removing it from data means that it is no longer copied into the runfiles tree.

I had a look at this PR and notice that there is the java_native_libraries rule to handle native library, but it didn't work for the python binary. I wonder if we need to have new rules for python to make it work with ClassLoader.getJarResourcePath?

No, because we're not embedding a C++ native shared library. Instead, we're embedding a self-executing Python archive. The issue here is probably that you're not using the "python ZIP" as the resource and instead using the default output of py_binary. I believe this will just embed the top-level class, where you actually need to embed all py deps (and even a Python interpreter).

The way to get the python zip is to use the python_zip_file output group. e.g.

filegroup(
    name = "correct_origin_report_pyzip",
    srcs = ["//experimental/dp_consistency/src/main/python/tools:correct_origin_report"],
    output_group = "python_zip_file",
)

experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 31 at r3 (raw file):

   *
   * @param inputPath The path to the input report.
   * @param unnoisedEdp The list of EDPs that do not add noise to their measurements.
  1. This says a list but the type is String
  2. How are the EDPs referenced? Meaning is this a list of DataProvider resource names?

Code quote:

The list of EDPs

experimental/dp_consistency/src/test/kotlin/tools/CorrectOriginReportTest.kt line 30 at r2 (raw file):

Previously, ple13 (Phi) wrote…

This will be in the next PR. This PR is to load a python binary and runs it in kotlin.

The binary it should load is the one that operates on the JSON file. Meaning we shouldn't be checking in a test Excel spreadsheet.


experimental/dp_consistency/src/test/kotlin/tools/example_origin_report.xlsx line 0 at r3 (raw file):
This must be in JSON format.

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 9 files reviewed, 5 unresolved discussions (waiting on @ple13 and @SanjayVas)


experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 25 at r4 (raw file):

 * This depends on the python library correct_origin_report for noise correction.
 */
class CorrectOriginReport {

Is this the inteface we expect Origin to interact with?

Copy link
Contributor Author

@ple13 ple13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 20 files reviewed, 5 unresolved discussions (waiting on @kungfucraig and @SanjayVas)


experimental/dp_consistency/src/main/kotlin/tools/BUILD.bazel line 7 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

I've replaced data with resources. The code still works without ClassLoader.getJarResourcePath

I'm willing to bet this would fail after a bazel clean, as removing it from data means that it is no longer copied into the runfiles tree.

I had a look at this PR and notice that there is the java_native_libraries rule to handle native library, but it didn't work for the python binary. I wonder if we need to have new rules for python to make it work with ClassLoader.getJarResourcePath?

No, because we're not embedding a C++ native shared library. Instead, we're embedding a self-executing Python archive. The issue here is probably that you're not using the "python ZIP" as the resource and instead using the default output of py_binary. I believe this will just embed the top-level class, where you actually need to embed all py deps (and even a Python interpreter).

The way to get the python zip is to use the python_zip_file output group. e.g.

filegroup(
    name = "correct_origin_report_pyzip",
    srcs = ["//experimental/dp_consistency/src/main/python/tools:correct_origin_report"],
    output_group = "python_zip_file",
)

When I tried to load the binary or zip file with Runtime.getRuntime().load(/path/to/binary), it complains that the ELF header is not valid. It looks like the issue is due to incorrect dependencies used. I'm fixing this.
Detailed error message:
"java.lang.UnsatisfiedLinkError: /some/path/native8861144412253610922correct_origin_report: invalid ELF header
at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:388)
at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:232)
at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:174)
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2394)
at java.base/java.lang.Runtime.load0(Runtime.java:755)
at java.base/java.lang.System.load(System.java:1957)
at org.wfanet.measurement.common.ClassLoaderResourcesKt.load(ClassLoaderResources.kt:59)
at experimental.dp_consistency.src.main.kotlin.tools.CorrectOriginReport.(CorrectOriginReport.kt:159)
at experimental.dp_consistency.src.test.kotlin.tools.CorrectOriginReportTest.run correct report successfully(CorrectOriginReportTest.kt:161)"


experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 31 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
  1. This says a list but the type is String
  2. How are the EDPs referenced? Meaning is this a list of DataProvider resource names?

This has been changed. When reading the report directly, we will get the standard deviation for each of the measurement. So we do not need a list of unnoised edps any more.


experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 25 at r4 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

Is this the inteface we expect Origin to interact with?

They requested that the input is a json representation of the report and the corrected report also in json format.


experimental/dp_consistency/src/test/kotlin/tools/CorrectOriginReportTest.kt line 30 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

The binary it should load is the one that operates on the JSON file. Meaning we shouldn't be checking in a test Excel spreadsheet.

Done. Now the binary read a report in JSON format.


experimental/dp_consistency/src/test/kotlin/tools/example_origin_report.xlsx line at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

This must be in JSON format.

Done.

@SanjayVas SanjayVas requested a review from kungfucraig October 2, 2024 22:47
Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 18 files at r5, all commit messages.
Reviewable status: 4 of 20 files reviewed, 5 unresolved discussions (waiting on @kungfucraig and @ple13)


experimental/dp_consistency/src/main/kotlin/tools/BUILD.bazel line 7 at r2 (raw file):

Previously, ple13 (Phi) wrote…

When I tried to load the binary or zip file with Runtime.getRuntime().load(/path/to/binary), it complains that the ELF header is not valid. It looks like the issue is due to incorrect dependencies used. I'm fixing this.
Detailed error message:
"java.lang.UnsatisfiedLinkError: /some/path/native8861144412253610922correct_origin_report: invalid ELF header
at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:388)
at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:232)
at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:174)
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2394)
at java.base/java.lang.Runtime.load0(Runtime.java:755)
at java.base/java.lang.System.load(System.java:1957)
at org.wfanet.measurement.common.ClassLoaderResourcesKt.load(ClassLoaderResources.kt:59)
at experimental.dp_consistency.src.main.kotlin.tools.CorrectOriginReport.(CorrectOriginReport.kt:159)
at experimental.dp_consistency.src.test.kotlin.tools.CorrectOriginReportTest.run correct report successfully(CorrectOriginReportTest.kt:161)"

You're calling the wrong function. Call ClassLoader.getJarResourcePath and copy out the file the same way that Runtime.load does. You don't actually want to call Runtime.load as that is for loading a native library, where all you're trying to do is extract a file from the JAR.


experimental/dp_consistency/src/test/kotlin/tools/CorrectOriginReportTest.kt line 172 at r5 (raw file):

    val measurementFrequencyInSeconds = 604800L // 7 days has 604800 seconds.
    val REPORT_TEMPLATE =
      """

Don't have giant textproto or JSON strings in the test. Put these in files referenced in the data attribute and load them from Runfiles.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 20 files reviewed, 9 unresolved discussions (waiting on @kungfucraig and @ple13)


experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 15 at r3 (raw file):

// limitations under the License.

package experimental.dp_consistency.src.main.kotlin.tools

nit: We use the tools package name to refer to CLI tools by convention.

Code quote:

tools

experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 27 at r5 (raw file):

import org.wfanet.measurement.reporting.v2alpha.report

/** Corrects noisy measurements in a report using the Python library `correct_origin_report`. */

nit: implementation detail that doesn't belong in public class documentation

Code quote:

using the Python library `correct_origin_report`

experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 28 at r5 (raw file):

/** Corrects noisy measurements in a report using the Python library `correct_origin_report`. */
class CorrectOriginReport {

nit: I mentioned this on the chat thread: I thought we agreed to stop using "correct" or "correction" as that implies that our Report is incorrect when it's not. IIUC this is just doing consistency adjustments. I'm not happy that we've continued to use the phrase "noise correction" in meetings.

@kungfucraig


experimental/dp_consistency/src/main/kotlin/tools/ReportConverter.kt line 31 at r5 (raw file):

data class SetOperationSummary(val isCumulative: Boolean, val setOperation: String)

fun getReportFromJsonString(reportAsJsonString: String): Report {

Group these into a ReportConversion object rather than polluting the namespace with symbols. See https://github.com/world-federation-of-advertisers/cross-media-measurement/blob/main/docs/code-style.md#kotlin (the advice for constants and static properties also applies to non-extension functions)

Copy link
Contributor Author

@ple13 ple13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 22 files reviewed, 9 unresolved discussions (waiting on @kungfucraig and @SanjayVas)


experimental/dp_consistency/src/test/kotlin/tools/CorrectOriginReportTest.kt line 172 at r5 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Don't have giant textproto or JSON strings in the test. Put these in files referenced in the data attribute and load them from Runfiles.

Done.


experimental/dp_consistency/src/main/kotlin/tools/BUILD.bazel line 7 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

You're calling the wrong function. Call ClassLoader.getJarResourcePath and copy out the file the same way that Runtime.load does. You don't actually want to call Runtime.load as that is for loading a native library, where all you're trying to do is extract a file from the JAR.

Thanks a lot. I copied the package to the local directory and it works.


experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 15 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

nit: We use the tools package name to refer to CLI tools by convention.

I've updated the package name to reporting.


experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 27 at r5 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

nit: implementation detail that doesn't belong in public class documentation

I've removed the implement details.


experimental/dp_consistency/src/main/kotlin/tools/CorrectOriginReport.kt line 28 at r5 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

nit: I mentioned this on the chat thread: I thought we agreed to stop using "correct" or "correction" as that implies that our Report is incorrect when it's not. IIUC this is just doing consistency adjustments. I'm not happy that we've continued to use the phrase "noise correction" in meetings.

@kungfucraig

Name is updated to ReportPostProcessing.


experimental/dp_consistency/src/main/kotlin/tools/ReportConverter.kt line 31 at r5 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Group these into a ReportConversion object rather than polluting the namespace with symbols. See https://github.com/world-federation-of-advertisers/cross-media-measurement/blob/main/docs/code-style.md#kotlin (the advice for constants and static properties also applies to non-extension functions)

Done.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 13 files at r6, all commit messages.
Reviewable status: 5 of 22 files reviewed, 13 unresolved discussions (waiting on @kungfucraig and @ple13)


experimental/dp_consistency/src/main/kotlin/reporting/BUILD.bazel line 5 at r7 (raw file):

package(default_visibility = ["//visibility:public"])

kt_jvm_binary(

We're providing a library, not a binary. This means there should also not be a main function.


experimental/dp_consistency/src/main/kotlin/reporting/ReportConverter.kt line 1 at r7 (raw file):

// Copyright 2024 The Cross-Media Measurement Authors

Rename this file ReportConversion to match the top level object.


experimental/dp_consistency/src/main/kotlin/reporting/ReportConverter.kt line 58 at r7 (raw file):

}

fun getMeasurementPolicy(tag: String): String {

All of the non-extension functions need to be moved inside an object. For this file, that means the ReportConversion object above.


experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 15 at r7 (raw file):

// limitations under the License.

package experimental.dp_consistency.src.main.kotlin.reporting
  1. Kotlin packages mirror the directory structure under src/main/kotlin or src/test/kotlin
  2. The package path should mirror wherever this will end up when graduated out of experimental. This would be some package under org.wfanet.measurement.reporting

Suggestion:

package org.wfanet.measurement.reporting.postprocessing

experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 35 at r7 (raw file):

/** Corrects noisy measurements in a report. */
class ReportPostProcessing {

There doesn't appear to be any instance state, so this could be an object.

Suggestion:

object

experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 37 at r7 (raw file):

class ReportPostProcessing {
  /**
   * Corrects the inconsistent measurements in the [reportAsJsonString] and returns a corrected

nit: use KDoc tags (e.g. @param, @return) as appropriate.

I'd expect the param to be described as "standard JSON serialization of a Report message".


experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 40 at r7 (raw file):

   * report in JSON format.
   */
  fun processReport(reportAsJsonString: String): String {

nit: Since the parameter type is not specific (i.e. String), indicate the JSON part in the function name rather than the parameter name.

If we had a specific type here, then we could just make all functions overloads of processReport. Instead, just the protobuf one would be called processReport and anything that deals with strings would need a different name. e.g. if we added something that used an unparsed XML string, it would be named processReportXml.

Suggestion:

  fun processReportJson(report: String): String {

experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 46 at r7 (raw file):

    for (reportSummary in reportSummaries) {
      correctedMeasurementsMap.putAll(
        processReportSummary(JsonFormat.printer().print(reportSummary))

JSON should only be used a the edges. We should immediately convert to protobuf.

I'd expect this function to call a processReport function that takes and returns a Report message.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 22 files reviewed, 14 unresolved discussions (waiting on @kungfucraig and @ple13)


experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 164 at r7 (raw file):

    init {
      // Copies python zip package from JAR to local directory.
      Files.copy(

Don't assume that any directory other than tmp is writable. This should be copied to a temporary file and deleted on exit.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 18 files at r5, 5 of 9 files at r8, all commit messages.
Reviewable status: 17 of 22 files reviewed, 11 unresolved discussions (waiting on @kungfucraig and @ple13)


experimental/dp_consistency/src/main/python/tools/BUILD.bazel line 1 at r8 (raw file):

load("@pip//:requirements.bzl", "requirement")

This should mirror the Kotlin package path. e.g. be under src/main/python/wfa/measurement/reporting/processing. (The top-level package would always be wfa, except for JVM languages where the convention is to reverse a domain that you own. Hence Kotlin uses org.wfanet.)


experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 46 at r7 (raw file):

Previously, ple13 (Phi) wrote…

I've added a function processReport that takes input as a report proto, and output a report proto.

This comment is not resolved as this still converted to JSON again. This means that processReportSummary and the Python executable must operate on binary protobuf.


experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 62 at r7 (raw file):

Previously, ple13 (Phi) wrote…

It looks like this depends on the OS. For linux, we can check the max length using the cli xargs --show-limits. It looks like the limit on the argument length is a bit less than 2MB.
xargs --show-limits Your environment variables take up 4425 bytes POSIX upper limit on argument length (this system): 2090679 POSIX smallest allowable upper limit on argument length (all systems): 4096 Maximum length of command we could actually use: 2086254 Size of command buffer we are actually using: 131072 Maximum parallelism (--max-procs must be no greater): 2147483647

I believe what @kungfucraig was implying was that the data should be read from stdin to avoid any issues with length limits.


experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 62 at r7 (raw file):

Previously, ple13 (Phi) wrote…

It didn't work without python3 even after explicitly setting the zip to be executable.

Looks like this is a bug. See bazelbuild/bazel#17629

Add a todo referencing that bug. e.g.

// TODO(bazelbuild/bazel#17629): Execute the Python zip directly once this is fixed.

experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 35 at r8 (raw file):

/** Corrects noisy measurements in a report. */
object ReportPostProcessing {
  val logger: Logger = Logger.getLogger(this::class.java.name)

It looks like these properties can be private


experimental/dp_consistency/src/test/kotlin/reporting/BUILD.bazel line 1 at r8 (raw file):

load("@wfa_rules_kotlin_jvm//kotlin:defs.bzl", "kt_jvm_test")

The package path under src/test must mirror the package path under src/main

Copy link
Contributor Author

@ple13 ple13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 34 files reviewed, 11 unresolved discussions (waiting on @kungfucraig and @SanjayVas)


experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 46 at r7 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

This comment is not resolved as this still converted to JSON again. This means that processReportSummary and the Python executable must operate on binary protobuf.

ah I see. I've updated the code.


experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 62 at r7 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

I believe what @kungfucraig was implying was that the data should be read from stdin to avoid any issues with length limits.

Thanks for the clarification. I've updated the python code to read the input from stdin other than taking an argument.


experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 62 at r7 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Looks like this is a bug. See bazelbuild/bazel#17629

Add a todo referencing that bug. e.g.

// TODO(bazelbuild/bazel#17629): Execute the Python zip directly once this is fixed.

Done.


experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 35 at r8 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

It looks like these properties can be private

Done.


experimental/dp_consistency/src/main/python/tools/BUILD.bazel line 1 at r8 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

This should mirror the Kotlin package path. e.g. be under src/main/python/wfa/measurement/reporting/processing. (The top-level package would always be wfa, except for JVM languages where the convention is to reverse a domain that you own. Hence Kotlin uses org.wfanet.)

Done.


experimental/dp_consistency/src/test/kotlin/reporting/BUILD.bazel line 1 at r8 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

The package path under src/test must mirror the package path under src/main

Done.

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 9 files at r8, 30 of 30 files at r9, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @SanjayVas)


experimental/dp_consistency/src/main/kotlin/reporting/ReportConverter.kt line 27 at r6 (raw file):

Previously, ple13 (Phi) wrote…

Yes, the measurement policy is one of these.

A bit of documentation for the data classes would be helpful.


experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 46 at r7 (raw file):

Previously, ple13 (Phi) wrote…

ah I see. I've updated the code.

I'm happy either way, but the text proto has a major benefit of being readable and it's not big, so...

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 15 of 30 files at r9, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @ple13)


experimental/dp_consistency/src/main/kotlin/org/wfanet/measurement/reporting/postprocess/ReportPostProcessing.kt line 96 at r9 (raw file):

    writer.write(reportSummary)
    writer.flush()
    writer.close()

Use a try-with-resources construct to guarantee close gets called. In Kotlin, this is the use function.

Suggestion:

    OutputStreamWriter(process.outputStream).use { writer ->
      writer.write(reportSummary)
      writer.flush()
    }

experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 46 at r7 (raw file):
This is still not correct. processReportSummary must take in a param of type ReportSummary, not String.

I'm happy either way, but the text proto has a major benefit of being readable and it's not big, so...

There's no opportunity for a human to read it.


experimental/dp_consistency/src/main/proto/wfa/measurement/internal/reporting/report_summary.proto line 1 at r9 (raw file):

// Copyright 2024 The Cross-Media Measurement Authors

internal/reporting is for the internal Reporting API. You probably want this to be under wfa/measurement/reporting/postprocess.

Copy link
Contributor Author

@ple13 ple13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 22 of 34 files reviewed, 4 unresolved discussions (waiting on @kungfucraig and @SanjayVas)


experimental/dp_consistency/src/main/kotlin/org/wfanet/measurement/reporting/postprocess/ReportPostProcessing.kt line 96 at r9 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Use a try-with-resources construct to guarantee close gets called. In Kotlin, this is the use function.

Done. Thanks.


experimental/dp_consistency/src/main/kotlin/reporting/ReportConverter.kt line 27 at r6 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

A bit of documentation for the data classes would be helpful.

Done.


experimental/dp_consistency/src/main/kotlin/reporting/ReportPostProcessing.kt line 46 at r7 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

This is still not correct. processReportSummary must take in a param of type ReportSummary, not String.

I'm happy either way, but the text proto has a major benefit of being readable and it's not big, so...

There's no opportunity for a human to read it.

Done.


experimental/dp_consistency/src/main/proto/wfa/measurement/internal/reporting/report_summary.proto line 1 at r9 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

internal/reporting is for the internal Reporting API. You probably want this to be under wfa/measurement/reporting/postprocess.

Done.

@SanjayVas SanjayVas requested a review from kungfucraig October 9, 2024 20:28
Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 13 files at r6, 3 of 30 files at r9, 7 of 12 files at r10, all commit messages.
Reviewable status: 29 of 34 files reviewed, 3 unresolved discussions (waiting on @kungfucraig and @ple13)


experimental/dp_consistency/src/main/kotlin/org/wfanet/measurement/reporting/postprocess/ReportPostProcessing.kt line 95 at r10 (raw file):

      writer.write(Base64.getEncoder().encodeToString(reportSummary.toByteArray()))
      writer.flush()
    }

Don't base64 encode. Just write the bytes.

Suggestion:

    reportSummary.writeTo(process.outputStream)

experimental/dp_consistency/src/test/kotlin/org/wfanet/measurement/reporting/postprocess/ReportPostProcessingTest.kt line 32 at r10 (raw file):

    val reportAsJson =
      Files.readString(
        File(

This file is not guaranteed to be in this location. You need to use Runfiles to read it. Ignore the deprecation warning, as it's blocked on a bug outside of our control in rules_kotlin.


experimental/dp_consistency/src/test/python/wfa/measurement/reporting/postprocess/tools/example_origin_report.xlsx line 0 at r10 (raw file):
Delete this file

Copy link
Contributor Author

@ple13 ple13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 26 of 34 files reviewed, 3 unresolved discussions (waiting on @kungfucraig and @SanjayVas)


experimental/dp_consistency/src/main/kotlin/org/wfanet/measurement/reporting/postprocess/ReportPostProcessing.kt line 95 at r10 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Don't base64 encode. Just write the bytes.

Done.


experimental/dp_consistency/src/test/kotlin/org/wfanet/measurement/reporting/postprocess/ReportPostProcessingTest.kt line 32 at r10 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

This file is not guaranteed to be in this location. You need to use Runfiles to read it. Ignore the deprecation warning, as it's blocked on a bug outside of our control in rules_kotlin.

Done.


experimental/dp_consistency/src/test/python/wfa/measurement/reporting/postprocess/tools/example_origin_report.xlsx line at r10 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Delete this file

Done.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 12 files at r10, 4 of 6 files at r11, all commit messages.
Reviewable status: 32 of 34 files reviewed, 5 unresolved discussions (waiting on @kungfucraig and @ple13)


experimental/dp_consistency/src/main/kotlin/org/wfanet/measurement/reporting/postprocess/ReportConversion.kt line 45 at r11 (raw file):

      try {
        val protoBuilder = Report.newBuilder()
        JsonFormat.parser().ignoringUnknownFields().merge(reportAsJsonString, protoBuilder)

Why?

Code quote:

ignoringUnknownFields()

experimental/dp_consistency/src/main/kotlin/org/wfanet/measurement/reporting/postprocess/ReportConversion.kt line 47 at r11 (raw file):

        JsonFormat.parser().ignoringUnknownFields().merge(reportAsJsonString, protoBuilder)
        protoBuilder.build()
      } catch (e: InvalidProtocolBufferException) {

Bubble up the exception rather than hiding it. Passing an invalid message should indeed be an error.


experimental/dp_consistency/src/main/kotlin/org/wfanet/measurement/reporting/postprocess/ReportConversion.kt line 95 at r11 (raw file):

fun Report.toReportSummaries(): List<ReportSummary> {
  require(this.state == Report.State.SUCCEEDED) { "Unsucceeded report is not supported." }

nit: redundant qualifier, here and elsewhere.

We only use the qualifier when necessary

Code quote:

this.

experimental/dp_consistency/src/test/kotlin/org/wfanet/measurement/reporting/postprocess/ReportConversionTest.kt line 91 at r11 (raw file):

  companion object {
    private val REPORT_WITH_UNSPECIFIED_STATE =

Extract these to separate data files.


experimental/dp_consistency/src/test/kotlin/org/wfanet/measurement/reporting/postprocess/ReportPostProcessingTest.kt line 32 at r10 (raw file):

Previously, ple13 (Phi) wrote…

Done.

This is still not correct. By "use Runfiles" I meant the functions defined in Runfiles.kt in common-jvm. Specifically, org.wfanet.measurement.common.getRuntimePath. The function is marked as deprecated, but we can't actually fix it until another issue is resolved (see #1530) which is why I said to ignore the deprecation warning.

Copy link
Contributor Author

@ple13 ple13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 27 of 37 files reviewed, 4 unresolved discussions (waiting on @kungfucraig and @SanjayVas)


experimental/dp_consistency/src/main/kotlin/org/wfanet/measurement/reporting/postprocess/ReportConversion.kt line 45 at r11 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Why?

Removed.


experimental/dp_consistency/src/main/kotlin/org/wfanet/measurement/reporting/postprocess/ReportConversion.kt line 47 at r11 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Bubble up the exception rather than hiding it. Passing an invalid message should indeed be an error.

Done.


experimental/dp_consistency/src/main/kotlin/org/wfanet/measurement/reporting/postprocess/ReportConversion.kt line 95 at r11 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

nit: redundant qualifier, here and elsewhere.

We only use the qualifier when necessary

Done.


experimental/dp_consistency/src/test/kotlin/org/wfanet/measurement/reporting/postprocess/ReportConversionTest.kt line 91 at r11 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Extract these to separate data files.

Done.


experimental/dp_consistency/src/test/kotlin/org/wfanet/measurement/reporting/postprocess/ReportPostProcessingTest.kt line 32 at r10 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

This is still not correct. By "use Runfiles" I meant the functions defined in Runfiles.kt in common-jvm. Specifically, org.wfanet.measurement.common.getRuntimePath. The function is marked as deprecated, but we can't actually fix it until another issue is resolved (see #1530) which is why I said to ignore the deprecation warning.

Done. Thanks.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 30 files at r9, 8 of 8 files at r12, all commit messages.
Reviewable status: 35 of 37 files reviewed, 3 unresolved discussions (waiting on @kungfucraig and @ple13)


experimental/dp_consistency/src/test/kotlin/org/wfanet/measurement/reporting/postprocess/BUILD.bazel line 29 at r12 (raw file):

    srcs = ["ReportConversionTest.kt"],
    data = [
        "//experimental/dp_consistency/src/test/kotlin/org/wfanet/measurement/reporting/postprocess:report_with_failed_measurement.json",

Isn't this file in the same package? You can just reference the file names directly, or even create a filegroup for convenience.

filegroup(
    name = "sample_reports",
    srcs = glob(["*.json"]),
)

then you can just reference :sample_reports.

Code quote:

//experimental/dp_consistency/src/test/kotlin/org/wfanet/measurement/reporting/postprocess:report_with_failed_measurement.json

experimental/dp_consistency/src/test/kotlin/org/wfanet/measurement/reporting/postprocess/ReportConversionTest.kt line 36 at r12 (raw file):

  fun `report as json string is successfully converted to report summary proto`() {
    val reportFile = TEST_DATA_RUNTIME_DIR.resolve("sample_report_small.json").toFile()
    val reportAsJson = Files.readString(reportFile.toPath())

Use the File.readText extension.

Suggestion:

    val reportFile = TEST_DATA_RUNTIME_DIR.resolve("sample_report_small.json").toFile()
    val reportAsJson = reportFile.readText()

Copy link
Contributor Author

@ple13 ple13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 32 of 37 files reviewed, 3 unresolved discussions (waiting on @kungfucraig and @SanjayVas)


experimental/dp_consistency/src/test/kotlin/org/wfanet/measurement/reporting/postprocess/BUILD.bazel line 29 at r12 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Isn't this file in the same package? You can just reference the file names directly, or even create a filegroup for convenience.

filegroup(
    name = "sample_reports",
    srcs = glob(["*.json"]),
)

then you can just reference :sample_reports.

Done.


experimental/dp_consistency/src/test/kotlin/org/wfanet/measurement/reporting/postprocess/ReportConversionTest.kt line 36 at r12 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Use the File.readText extension.

Done.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved for BUILD and Kotlin code. Assuming someone else is looking at the Python.

Reviewed 2 of 13 files at r6, 3 of 3 files at r13, all commit messages.
Reviewable status: 35 of 37 files reviewed, all discussions resolved (waiting on @kungfucraig)

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the Python, and the the touch points between Kotlin and it.

Reviewable status: 35 of 37 files reviewed, all discussions resolved (waiting on @ple13)

Copy link
Contributor Author

@ple13 ple13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Reviewable status: 35 of 37 files reviewed, all discussions resolved (waiting on @kungfucraig)

Copy link
Contributor Author

@ple13 ple13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 18 files at r5, 2 of 13 files at r6, 1 of 9 files at r8, 16 of 30 files at r9, 4 of 12 files at r10, 5 of 6 files at r11, 5 of 8 files at r12, 3 of 3 files at r13, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ple13)

@ple13 ple13 merged commit 33df695 into main Oct 11, 2024
4 checks passed
@ple13 ple13 deleted the lephi-noise-correction-works-with-report-proto branch October 11, 2024 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants