Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(minipipeline): DNSLookupFailures must be a slice #1395

Merged
merged 63 commits into from
Nov 28, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
63 commits
Select commit Hold shift + click to select a range
766d4d0
chore: start investigating LTE vs v0.4
bassosimone Nov 22, 2023
f3701a0
document why some QA tests with redirects are broken
bassosimone Nov 23, 2023
f1cc4bb
document more doubts about emmitting events
bassosimone Nov 23, 2023
0b2203d
document more caveats
bassosimone Nov 23, 2023
8eb1ba1
[ci skip] remember to update files in sync
bassosimone Nov 23, 2023
92eb7b7
doc: document more doubts that I have
bassosimone Nov 23, 2023
ea0d3bf
[ci skip] more documentation on what to do
bassosimone Nov 23, 2023
fd06406
feat: progress towards fixing some fundamental issues
bassosimone Nov 23, 2023
ad75714
resolve one more test case
bassosimone Nov 23, 2023
d6cbfd9
more fixes
bassosimone Nov 23, 2023
41fbd3f
doc: explain issues caused by adding HTTP response
bassosimone Nov 23, 2023
e5e4c37
try to sketch out an ooni/data-inspired pipeline
bassosimone Nov 23, 2023
6aff4f0
convert more of v0.5's analysis to the ooni/data-like style
bassosimone Nov 24, 2023
94f9fd7
some more progress
bassosimone Nov 24, 2023
ff42f3c
break the code in a different way
bassosimone Nov 24, 2023
5ad88d5
feat: rewrite the pipeline to match ooni/data more closely
bassosimone Nov 24, 2023
132ba4d
also implement the analysis
bassosimone Nov 24, 2023
be18947
work
bassosimone Nov 24, 2023
18da855
we're mostly done in terms of passing the existing QA tests
bassosimone Nov 25, 2023
e7764c8
tests now green
bassosimone Nov 25, 2023
dac4170
make more test cases work with LTE
bassosimone Nov 26, 2023
258d7fb
we now pass all tests
bassosimone Nov 27, 2023
c6a49ff
[ci skip] remove TODO
bassosimone Nov 27, 2023
8d25a65
fix tricky case with order of DNS processing
bassosimone Nov 27, 2023
44541ea
adjust test case where actually dns is consistent with lte
bassosimone Nov 27, 2023
183f524
make all lte tests pass consistently
bassosimone Nov 27, 2023
a4bedcc
x
bassosimone Nov 27, 2023
3ded283
start generating test cases for the minipipeline
bassosimone Nov 27, 2023
1eaaac0
start adding tests for the minipipeline
bassosimone Nov 27, 2023
df33632
add tests for the minipipeline command
bassosimone Nov 27, 2023
5ad8387
more testing
bassosimone Nov 27, 2023
9fc77fc
more minipipeline tests
bassosimone Nov 27, 2023
c7c310a
x
bassosimone Nov 27, 2023
281e38d
x
bassosimone Nov 27, 2023
0ea4803
add more test cases
bassosimone Nov 27, 2023
9ec20fc
x
bassosimone Nov 27, 2023
66364ed
x
bassosimone Nov 27, 2023
329b2c8
x
bassosimone Nov 27, 2023
72b2be9
x
bassosimone Nov 27, 2023
7f8c143
start documenting code and existing bugs
bassosimone Nov 27, 2023
14840ac
attempt to fix the model problems
bassosimone Nov 27, 2023
55c05ac
commit the measurements
bassosimone Nov 27, 2023
c45a2f6
okay, this looks relatively good
bassosimone Nov 27, 2023
28f93aa
other changes
bassosimone Nov 27, 2023
d849894
x
bassosimone Nov 27, 2023
5091608
add measurements
bassosimone Nov 27, 2023
f1d7137
x
bassosimone Nov 27, 2023
c228c4e
add measurements
bassosimone Nov 27, 2023
8d668fe
x
bassosimone Nov 27, 2023
29bfdd4
x
bassosimone Nov 27, 2023
310ab28
x
bassosimone Nov 27, 2023
dfdf673
meas
bassosimone Nov 27, 2023
9110384
meas
bassosimone Nov 27, 2023
1a8c235
obs
bassosimone Nov 27, 2023
29fec45
x
bassosimone Nov 27, 2023
05f8838
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
b6c643f
x
bassosimone Nov 28, 2023
cc60691
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
0956494
fix potential bug with failed DNS lookups
bassosimone Nov 28, 2023
f457597
x
bassosimone Nov 28, 2023
1565ff2
x
bassosimone Nov 28, 2023
8199685
x
bassosimone Nov 28, 2023
4fe692b
x
bassosimone Nov 28, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 6 additions & 3 deletions internal/cmd/minipipeline/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,9 @@ var (
// destdirFlag is the -destdir flag
destdirFlag = flag.String("destdir", ".", "destination directory to use")

// helpFlag is the -help flag
helpFlag = flag.Bool("help", false, "Show the help message")

// measurementFlag is the -measurement flag
measurementFlag = flag.String("measurement", "", "measurement file to analyze")

Expand All @@ -30,15 +33,15 @@ var (

func main() {
flag.Parse()
if *measurementFlag == "" {
if *helpFlag || *measurementFlag == "" {
fmt.Fprintf(os.Stderr, "\n")
fmt.Fprintf(os.Stderr, "usage: %s -measurement <file> [-prefix <prefix>]\n", filepath.Base(os.Args[0]))
fmt.Fprintf(os.Stderr, "usage: %s -measurement <file> [-destdir <dir>] [-prefix <prefix>]\n", filepath.Base(os.Args[0]))
fmt.Fprintf(os.Stderr, "\n")
fmt.Fprintf(os.Stderr, "Mini measurement processing pipeline to reprocess recent probe measurements\n")
fmt.Fprintf(os.Stderr, "and align results calculation with ooni/data.\n")
fmt.Fprintf(os.Stderr, "\n")
fmt.Fprintf(os.Stderr, "Analyzes the <file> provided using -measurement <file> and writes the\n")
fmt.Fprintf(os.Stderr, "observations.json and analysis.json files in the -destdir <destdir> directory,\n")
fmt.Fprintf(os.Stderr, "observations.json and analysis.json files in the -destdir <dir> directory,\n")
fmt.Fprintf(os.Stderr, "which must already exist.\n")
fmt.Fprintf(os.Stderr, "\n")
fmt.Fprintf(os.Stderr, "Use -prefix <prefix> to add <prefix> in front of the generated files names.\n")
Expand Down
20 changes: 10 additions & 10 deletions internal/cmd/minipipeline/testdata/observations.json
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
{
"DNSLookupFailures": {
"2": {
"DNSTransactionID": 2,
"DNSLookupFailures": [
{
"DNSTransactionID": 3,
"DNSDomain": "nexa.polito.it",
"DNSLookupFailure": "dns_no_answer",
"DNSQueryType": "AAAA",
"DNSEngine": "doh",
"DNSEngine": "udp",
"IPAddress": null,
"IPAddressASN": null,
"IPAddressOrg": null,
Expand Down Expand Up @@ -38,12 +38,12 @@
"ControlHTTPResponseHeadersKeys": null,
"ControlHTTPResponseTitle": null
},
"3": {
"DNSTransactionID": 3,
{
"DNSTransactionID": 2,
"DNSDomain": "nexa.polito.it",
"DNSLookupFailure": "dns_no_answer",
"DNSQueryType": "AAAA",
"DNSEngine": "udp",
"DNSEngine": "doh",
"IPAddress": null,
"IPAddressASN": null,
"IPAddressOrg": null,
Expand Down Expand Up @@ -76,7 +76,7 @@
"ControlHTTPResponseHeadersKeys": null,
"ControlHTTPResponseTitle": null
}
},
],
"DNSLookupSuccesses": [
{
"DNSTransactionID": 3,
Expand Down Expand Up @@ -233,7 +233,7 @@
"X-Generator": true
},
"HTTPResponseLocation": null,
"HTTPResponseTitle": "Nexa Center for Internet & Society | Il centro Nexa è un centro di ricerca del Dipartimento di Automatica e Informatica del Politecnico di Torino",
"HTTPResponseTitle": "Nexa Center for Internet \u0026 Society | Il centro Nexa è un centro di ricerca del Dipartimento di Automatica e Informatica del Politecnico di Torino",
"HTTPResponseIsFinal": true,
"ControlDNSDomain": null,
"ControlDNSLookupFailure": null,
Expand All @@ -260,7 +260,7 @@
"X-Frame-Options": true,
"X-Generator": true
},
"ControlHTTPResponseTitle": "Nexa Center for Internet & Society | Il centro Nexa è un centro di ricerca del Dipartimento di Automatica e Informatica del Politecnico di Torino"
"ControlHTTPResponseTitle": "Nexa Center for Internet \u0026 Society | Il centro Nexa è un centro di ricerca del Dipartimento di Automatica e Informatica del Politecnico di Torino"
}
}
}
11 changes: 7 additions & 4 deletions internal/measurexlite/web.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,12 +6,15 @@ package measurexlite

import "regexp"

// webTitleRegexp is the regexp to extract the title
//
// MK used {1,128} but we're making it larger here to get longer titles
// e.g. <http://www.isa.gov.il/Pages/default.aspx>'s one
var webTitleRegexp = regexp.MustCompile(`(?i)<title>([^<]{1,512})</title>`)

// WebGetTitle returns the title or an empty string.
func WebGetTitle(measurementBody string) string {
// MK used {1,128} but we're making it larger here to get longer titles
// e.g. <http://www.isa.gov.il/Pages/default.aspx>'s one
re := regexp.MustCompile(`(?i)<title>([^<]{1,512})</title>`)
v := re.FindStringSubmatch(measurementBody)
v := webTitleRegexp.FindStringSubmatch(measurementBody)
if len(v) < 2 {
return ""
}
Expand Down
12 changes: 6 additions & 6 deletions internal/minipipeline/analysis_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,8 @@ import (
func TestWebAnalysisComputeDNSExperimentFailure(t *testing.T) {
t.Run("when there's no DNSDomain", func(t *testing.T) {
container := &WebObservationsContainer{
DNSLookupFailures: map[int64]*WebObservation{
1: {
DNSLookupFailures: []*WebObservation{
{
DNSTransactionID: optional.Some(int64(1)),
DNSDomain: optional.None[string](), // explicitly set
DNSLookupFailure: optional.Some("dns_no_answer"),
Expand All @@ -30,8 +30,8 @@ func TestWebAnalysisComputeDNSExperimentFailure(t *testing.T) {

t.Run("when DNSDomain does not match ControlDNSDomain", func(t *testing.T) {
container := &WebObservationsContainer{
DNSLookupFailures: map[int64]*WebObservation{
1: {
DNSLookupFailures: []*WebObservation{
{
DNSTransactionID: optional.Some(int64(1)),
DNSDomain: optional.Some("dns.google.com"),
DNSLookupFailure: optional.Some("dns_no_answer"),
Expand All @@ -52,8 +52,8 @@ func TestWebAnalysisComputeDNSExperimentFailure(t *testing.T) {

t.Run("when the failure is dns_no_answer for AAAA", func(t *testing.T) {
container := &WebObservationsContainer{
DNSLookupFailures: map[int64]*WebObservation{
1: {
DNSLookupFailures: []*WebObservation{
{
DNSTransactionID: optional.Some(int64(1)),
DNSDomain: optional.Some("dns.google.com"),
DNSLookupFailure: optional.Some("dns_no_answer"),
Expand Down
9 changes: 5 additions & 4 deletions internal/minipipeline/observation.go
Original file line number Diff line number Diff line change
Expand Up @@ -223,8 +223,9 @@ type WebObservationsContainer struct {
// ID space, i.e., you can't see the same transaction ID in both. Transaction IDs
// are strictly positive unique numbers within the same OONI measurement. Note
// that the A and AAAA events for the same DNS lookup uses the same transaction ID
// until we fix the https://github.com/ooni/probe/issues/2624 issue.
DNSLookupFailures map[int64]*WebObservation
// until we fix the https://github.com/ooni/probe/issues/2624 issue. For this
// reason DNSLookupFailure and DNSLookupSuccesses MUST be slices.
DNSLookupFailures []*WebObservation

// DNSLookupSuccesses contains all the successful transactions.
DNSLookupSuccesses []*WebObservation
Expand All @@ -240,7 +241,7 @@ type WebObservationsContainer struct {
// NewWebObservationsContainer constructs a [*WebObservationsContainer].
func NewWebObservationsContainer() *WebObservationsContainer {
return &WebObservationsContainer{
DNSLookupFailures: map[int64]*WebObservation{},
DNSLookupFailures: []*WebObservation{},
DNSLookupSuccesses: []*WebObservation{},
KnownTCPEndpoints: map[int64]*WebObservation{},
knownIPAddresses: map[string]*WebObservation{},
Expand Down Expand Up @@ -271,7 +272,7 @@ func (c *WebObservationsContainer) ingestDNSLookupFailures(evs ...*model.Archiva
}

// add record
c.DNSLookupFailures[ev.TransactionID] = obs
c.DNSLookupFailures = append(c.DNSLookupFailures, obs)
}
}

Expand Down
8 changes: 4 additions & 4 deletions internal/minipipeline/observation_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ func TestLoadWebObservations(t *testing.T) {
func TestWebObservationsContainerIngestTLSHandshakeEvents(t *testing.T) {
t.Run("when we don't have any known TCP endpoint", func(t *testing.T) {
container := &WebObservationsContainer{
DNSLookupFailures: map[int64]*WebObservation{},
DNSLookupFailures: []*WebObservation{},
KnownTCPEndpoints: map[int64]*WebObservation{}, // this map must be empty in this test
knownIPAddresses: map[string]*WebObservation{},
}
Expand Down Expand Up @@ -87,7 +87,7 @@ func TestWebObservationsContainerIngestTLSHandshakeEvents(t *testing.T) {
func TestWebObservationsContainerIngestHTTPRoundTripEvents(t *testing.T) {
t.Run("when we don't have any known TCP endpoint", func(t *testing.T) {
container := &WebObservationsContainer{
DNSLookupFailures: map[int64]*WebObservation{},
DNSLookupFailures: []*WebObservation{},
KnownTCPEndpoints: map[int64]*WebObservation{}, // this map must be empty in this test
knownIPAddresses: map[string]*WebObservation{},
}
Expand Down Expand Up @@ -117,7 +117,7 @@ func TestWebObservationsContainerIngestHTTPRoundTripEvents(t *testing.T) {
func TestWebObservationsContainerIngestControlMessages(t *testing.T) {
t.Run("we don't set MatchWithControlIPAddressASN when we don't have probe ASN info", func(t *testing.T) {
container := &WebObservationsContainer{
DNSLookupFailures: map[int64]*WebObservation{},
DNSLookupFailures: []*WebObservation{},
KnownTCPEndpoints: map[int64]*WebObservation{
1: {
DNSDomain: optional.Some("dns.google"),
Expand Down Expand Up @@ -164,7 +164,7 @@ func TestWebObservationsContainerIngestControlMessages(t *testing.T) {

t.Run("we don't save TLS handshake failures when the SNI is different", func(t *testing.T) {
container := &WebObservationsContainer{
DNSLookupFailures: map[int64]*WebObservation{},
DNSLookupFailures: []*WebObservation{},
KnownTCPEndpoints: map[int64]*WebObservation{
1: {
IPAddress: optional.Some("8.8.8.8"),
Expand Down
Loading