Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Periodic: Add basic metrics in the library #3237

Merged
merged 2 commits into from
Oct 17, 2019

Conversation

karampok
Copy link
Contributor

@karampok karampok commented Oct 9, 2019

Adds for every periodic task

bs_beacon_cleaner_periodic_period_duration_seconds 30
bs_beacon_cleaner_periodic_runtime_duration_seconds_total 0.028704568000000007
bs_beacon_cleaner_periodic_runtime_timestamp_seconds 1.570616739e+09

Other:

  • adds interface for metrics to allow testing.
  • modifies the periodic.StartTaskfix to accept interval, and metric
    prefixthe namespace,subsystem to be inserted in startTask.
  • refactors tests to support the above change.

Fixes #3121


This change is Reviewable

@karampok karampok force-pushed the pub-add-periodic-metrics branch from ab7f479 to 98d13e5 Compare October 9, 2019 11:23
@karampok karampok force-pushed the pub-add-periodic-metrics branch from 98d13e5 to 0d4bd04 Compare October 9, 2019 15:21
Copy link
Collaborator

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 20 of 20 files at r1.
Reviewable status: all files reviewed, 24 unresolved discussions (waiting on @karampok)


go/beacon_srv/main.go, line 317 at r1 (raw file):

	t.beaconCleaner = periodic.StartTask(
		beaconstorage.NewBeaconCleaner(t.store),
		30*time.Second, 30*time.Second, "bs_beacon_cleaner")

hm we should make sure that the cleaner metrics also use the same namespace. Otherwise this will be confusing I guess.
Can you please make sure that all occurrences of Cleaner also use the namespace of the periodic task they are run in?


go/beacon_srv/main.go, line 320 at r1 (raw file):

	t.revCleaner = periodic.StartTask(
		beaconstorage.NewRevocationCleaner(t.store),
		5*time.Second, 5*time.Second, "bs_revocation_cleaner")

As discussed in person we can probably use the name of the task an just pass the metric namespace.
For this we should also make sure that all task name use snake_case. Also please adapt the logging in periodic to print namespace and task name so that we can easily correlate logs and metrics


go/lib/healthpool/pool.go, line 58 at r1 (raw file):

	}
	p.expirer = periodic.StartTask((*expirer)(p), time.Second,
		time.Second, "healthpool_expirer")

This is problematic as there can be multiple healthpools I guess. So we would need some differentiation from outside I think, I'm not too familiar with this code.


go/lib/infra/modules/idiscovery/idiscovery.go, line 258 at r1 (raw file):

	}
	r.fetcher = periodic.StartTask(fetcher, cfg.Interval.Duration,
		cfg.Timeout.Duration, "idiscovery_fetcher")

Also here I think there can be multiple with same name


go/lib/infra/modules/itopo/cleaner.go, line 30 at r1 (raw file):

// StartCleaner starts a periodic task that removes expired dynamic topologies.
func StartCleaner(period, timeout time.Duration) *periodic.Runner {
	return periodic.StartTask(cleaner{}, period, timeout, "itopo_cleaner")

Ditto Multiple with same name.


go/lib/periodic/periodic.go, line 27 at r1 (raw file):

// Ticker interface to improve testability of this periodic task code.
type Ticker interface {

I don't think we need this anymore right?


go/lib/periodic/periodic.go, line 65 at r1 (raw file):

	cancelF      context.CancelFunc
	trigger      chan struct{}
	export       metrics.ExportMetric

why exportcan't we call it metrics?


go/lib/periodic/periodic.go, line 69 at r1 (raw file):

 The ticker regulates the periodicity. 

Please update the doc according to your changes


go/lib/periodic/periodic.go, line 72 at r1 (raw file):

// The timeout can be larger than the periodicity of the ticker. That means if a tasks takes a long
// time it will be immediately retriggered.
func StartTask(task Task, period, timeout time.Duration, prefix string) *Runner {

I think we could even just call this Start it looks nicer in place of usage:
periodic.Start(pathdb.NewCleaner(...)...)
periodic.Start(syncer,....)
etc.


go/lib/periodic/periodic.go, line 103 at r1 (raw file):

// Kill is like stop but it also cancels the context of the current running method.
func (r *Runner) Kill() {
	if r == nil {

uhm why do you remove this?


go/lib/periodic/periodic_test.go, line 42 at r1 (raw file):

func TestPeriodicExecution(t *testing.T) {
	met := initMetrics(t)

that doesn't seem to be required. Why do we have this?


go/lib/periodic/periodic_test.go, line 51 at r1 (raw file):

		cnt++
	})
	p := 10 * time.Millisecond

I guess 3 times would also be enough right?


go/lib/periodic/periodic_test.go, line 69 at r1 (raw file):

		errChan <- ctx.Err()
	})
	p := 50 * time.Millisecond

I think the period can be shorter in this test, 1ms should be enough


go/lib/periodic/periodic_test.go, line 74 at r1 (raw file):

	fn := taskFunc(func(ctx context.Context) {
		// Simulate long work by blocking on the done channel.
		xtest.AssertReadReturnsBefore(t, ctx.Done(), time.Second)

why did you replace this? This was already doing what is done by the code below now.


go/lib/periodic/periodic_test.go, line 90 at r1 (raw file):

	go func() {
		for {
			if cnt == 1 {

A bit simpler would be to use a channel.

	executedOnce := make(chan struct{}, 10)
	fn := taskFunc(func(ctx context.Context) {
		executedOnce <- struct{}{}
	})

	r := StartTask(fn, p, 2*p, "TestValue")
	go func() {
		<-executedOnce
		r.Kill()
		return
	}()

go/lib/periodic/periodic_test.go, line 97 at r1 (raw file):

	}()
	time.Sleep(5 * p)
	assert.Equal(t, cnt, 1, "Must run only once within 5 period time")

Using cnt without locks is racy.
I changed the test as following:

	var cntMtx sync.Mutex
	cnt := 0
	executedOnce := make(chan struct{}, 10)
	fn := taskFunc(func(ctx context.Context) {
		cntMtx.Lock()
		cnt++
		cntMtx.Unlock()
		executedOnce <- struct{}{}
	})
	p := 20 * time.Millisecond
	r := StartTask(fn, p, 2*p, "TestValue")
	go func() {
		<-executedOnce
		r.Kill()
		return
	}()
	time.Sleep(5 * p)
	cntMtx.Lock()
	assert.Equal(t, cnt, 1, "Must run only once within 5 period time")
	cntMtx.Unlock()

It uses locks, the channel, and a bit sharper timers to reduce overall time.


go/lib/periodic/periodic_test.go, line 100 at r1 (raw file):

}

func TestTriggerNow(t *testing.T) {

This test could be without sleeping:

func TestTriggerNow(t *testing.T) {
	cnt := 0
	fn := taskFunc(func(ctx context.Context) {
		cnt++
	})
	want := 10
	p := 10 * time.Hour
	r := StartTask(fn, p, 3*p, "testValue")
	triggerDone := make(chan struct{})
	go func() {
		for i := 0; i < want+1; i++ {
			r.TriggerRun()
		}
		close(triggerDone)
		return
	}()
	<-triggerDone
	assert.GreaterOrEqual(t, cnt, want, "Must run want times within 2 period time")
}

still requires locking on cnt though


go/lib/periodic/periodic_test.go, line 110 at r1 (raw file):

	go func() {
		for {
			if cnt == 1 {

instead of busy looping we could also use a channel here.


go/lib/periodic/periodic_test.go, line 119 at r1 (raw file):

	}()
	time.Sleep(2 * p)
	assert.GreaterOrEqual(t, cnt, want, "Must run want times within 2 period time")

Also race detector complains about cnt. guard it with a lock.


go/lib/periodic/metrics/metrics.go, line 27 at r1 (raw file):

const (
	//EventStop indicates a stop event took place.

space after // for all comments


go/lib/periodic/metrics/metrics.go, line 38 at r1 (raw file):

// NewMetric return a struct with the metrics counters.
var NewMetric = newMetric

Why not just make the function itself public? How is this pattern better?


go/lib/periodic/metrics/metrics.go, line 41 at r1 (raw file):

// ExportMetric is the interface to export periodic metrics.
type ExportMetric interface {

hm what's this export about? Wouldn't RunnerMetric or PeriodicMetric by a better name?


go/lib/periodic/metrics/metrics_test.go, line 31 at r1 (raw file):

		EventLabels{},
	}
	for _, test := range tests {

I personally don't see the gain of using a loop here. it's more lines, and the check call is very simple usually.


go/lib/periodic/metrics/metrics_test.go, line 38 at r1 (raw file):

func TestNewMetric(t *testing.T) {
	n, sn := "randomSnakeName", "random_snake_name"
	assert.NotContains(t, counters, sn)

Not sure I like testing of the internals. Wouldn't it be enough if we just tested that doing NewMetric with the same name twice works?


go/lib/periodic/metrics/metrics_test.go, line 65 at r1 (raw file):

		want := `
# HELP test_me_periodic_runtime_duration_seconds_total Total time spend on every periodic run.
# TYPE test_me_periodic_runtime_duration_seconds_total counter

Cool I like this. This will be amazing

@karampok karampok force-pushed the pub-add-periodic-metrics branch from 0d4bd04 to fec6cf6 Compare October 10, 2019 13:50
Copy link
Contributor Author

@karampok karampok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 24 unresolved discussions (waiting on @lukedirtwalker)


go/beacon_srv/main.go, line 317 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

hm we should make sure that the cleaner metrics also use the same namespace. Otherwise this will be confusing I guess.
Can you please make sure that all occurrences of Cleaner also use the namespace of the periodic task they are run in?

at /go/lib/infra/modules/cleaner/cleaner.go#L50 we can insert a subsystem, so in beacon code we call the new with the right value.
Done


go/beacon_srv/main.go, line 320 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

As discussed in person we can probably use the name of the task an just pass the metric namespace.
For this we should also make sure that all task name use snake_case. Also please adapt the logging in periodic to print namespace and task name so that we can easily correlate logs and metrics

done

bs_revocation_cleaner_deleted_total 0
bs_revocation_cleaner_periodic_period_duration_seconds 5
bs_revocation_cleaner_periodic_runtime_duration_seconds_total 0.011446651000000004
bs_revocation_cleaner_periodic_runtime_timestamp_seconds 1.570707374e+09
bs_revocation_cleaner_results_total{result="ok"} 44
bs_beacon_cleaner_deleted_total 0
bs_beacon_cleaner_periodic_period_duration_seconds 30
bs_beacon_cleaner_periodic_runtime_duration_seconds_total 0.005863332000000001
bs_beacon_cleaner_periodic_runtime_timestamp_seconds 1.570707374e+09
bs_beacon_cleaner_results_total{result="ok"} 7

go/lib/healthpool/pool.go, line 58 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

This is problematic as there can be multiple healthpools I guess. So we would need some differentiation from outside I think, I'm not too familiar with this code.

I am not sure we actually use that part (yet or we use in anapaya branch), so I cannot have solution. Left a todo


go/lib/infra/modules/idiscovery/idiscovery.go, line 258 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Also here I think there can be multiple with same name

that is hard, not done yet


go/lib/infra/modules/itopo/cleaner.go, line 30 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Ditto Multiple with same name.

this is only being used by idiscovery, probably can be solved together


go/lib/periodic/periodic.go, line 27 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

I don't think we need this anymore right?

Done.


go/lib/periodic/periodic.go, line 65 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

why exportcan't we call it metrics?

overload term, i call it metricExporter


go/lib/periodic/periodic.go, line 69 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…
 The ticker regulates the periodicity. 

Please update the doc according to your changes

Done.


go/lib/periodic/periodic.go, line 72 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

I think we could even just call this Start it looks nicer in place of usage:
periodic.Start(pathdb.NewCleaner(...)...)
periodic.Start(syncer,....)
etc.

Done.


go/lib/periodic/periodic.go, line 103 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

uhm why do you remove this?

if the r==nil, the code before that it would panic, right? you cannot call a method on nil


go/lib/periodic/periodic_test.go, line 42 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

that doesn't seem to be required. Why do we have this?

work in progress, I can now make sure that we call the metric function as many times we need.
What do you think of the whole approach?


go/lib/periodic/periodic_test.go, line 51 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

I guess 3 times would also be enough right?

sure, but I increase my confidence that it cannot be flaky.


go/lib/periodic/periodic_test.go, line 69 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

I think the period can be shorter in this test, 1ms should be enough

Done.


go/lib/periodic/periodic_test.go, line 74 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

why did you replace this? This was already doing what is done by the code below now.

At first I did that for readability, it helped me to read the code when first time.
And I kept for the same reason, plus we use it only once and since it is three lines of code for I prefer not to import a dependency.
No strong preference, what you prefer


go/lib/periodic/periodic_test.go, line 90 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

A bit simpler would be to use a channel.

	executedOnce := make(chan struct{}, 10)
	fn := taskFunc(func(ctx context.Context) {
		executedOnce <- struct{}{}
	})

	r := StartTask(fn, p, 2*p, "TestValue")
	go func() {
		<-executedOnce
		r.Kill()
		return
	}()

indeed, looks better


go/lib/periodic/periodic_test.go, line 97 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Using cnt without locks is racy.
I changed the test as following:

	var cntMtx sync.Mutex
	cnt := 0
	executedOnce := make(chan struct{}, 10)
	fn := taskFunc(func(ctx context.Context) {
		cntMtx.Lock()
		cnt++
		cntMtx.Unlock()
		executedOnce <- struct{}{}
	})
	p := 20 * time.Millisecond
	r := StartTask(fn, p, 2*p, "TestValue")
	go func() {
		<-executedOnce
		r.Kill()
		return
	}()
	time.Sleep(5 * p)
	cntMtx.Lock()
	assert.Equal(t, cnt, 1, "Must run only once within 5 period time")
	cntMtx.Unlock()

It uses locks, the channel, and a bit sharper timers to reduce overall time.

done, refactored to use channel and not locks.
-race does not complain anymore


go/lib/periodic/periodic_test.go, line 100 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

This test could be without sleeping:

func TestTriggerNow(t *testing.T) {
	cnt := 0
	fn := taskFunc(func(ctx context.Context) {
		cnt++
	})
	want := 10
	p := 10 * time.Hour
	r := StartTask(fn, p, 3*p, "testValue")
	triggerDone := make(chan struct{})
	go func() {
		for i := 0; i < want+1; i++ {
			r.TriggerRun()
		}
		close(triggerDone)
		return
	}()
	<-triggerDone
	assert.GreaterOrEqual(t, cnt, want, "Must run want times within 2 period time")
}

still requires locking on cnt though

Using sleep is more intuitive in this context
This tests:

  1. Start periodic task every p seconds
  2. Open another window where trigger 10 times on the row, after the first run
  3. Wait 2 periods
  4. I observe that task runs at least 10 times

If we have the triggerDone channel, then we could also do not use goroutine


go/lib/periodic/periodic_test.go, line 110 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

instead of busy looping we could also use a channel here.

Done.


go/lib/periodic/periodic_test.go, line 119 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Also race detector complains about cnt. guard it with a lock.

Done.


go/lib/periodic/metrics/metrics.go, line 27 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

space after // for all comments

Done.


go/lib/periodic/metrics/metrics.go, line 38 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Why not just make the function itself public? How is this pattern better?

This is what allow to mock the metric interface in _test files and control that the function (e.g. StartTimestamp) is executed at least once.


go/lib/periodic/metrics/metrics.go, line 41 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

hm what's this export about? Wouldn't RunnerMetric or PeriodicMetric by a better name?

In prometheus there is usually the exporter/export in documentation (e.g. https://prometheus.io/docs/instrumenting/exporters/)
The idea is we "export the metric starttimestamp" to prometheus. (=task.exportMetric.StartTimestamp())
Not strong preference.


go/lib/periodic/metrics/metrics_test.go, line 31 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

I personally don't see the gain of using a loop here. it's more lines, and the check call is very simple usually.

Done.


go/lib/periodic/metrics/metrics_test.go, line 38 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Not sure I like testing of the internals. Wouldn't it be enough if we just tested that doing NewMetric with the same name twice works?

I refactored a bit.
I add this line mainly because the counters is global status, and could potential another test in parallel (in the future) make the test green.


go/lib/periodic/metrics/metrics_test.go, line 65 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Cool I like this. This will be amazing

Happy to hear that.

Copy link
Collaborator

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 25 of 25 files at r2.
Reviewable status: all files reviewed, 22 unresolved discussions (waiting on @karampok and @lukedirtwalker)


go/beacon_srv/internal/beaconing/originator.go, line 69 at r2 (raw file):

// Name returns the tasks name.
func (o *Originator) Name() string {
	return "beaconing_originator"

hm some metrics use bs_xxx and others don't have bs_... prefix.


go/beacon_srv/internal/beaconing/propagator.go, line 85 at r2 (raw file):

// Name returns the tasks name.
func (p *Propagator) Name() string {
	return "beaconing_propagator"

bs_


go/beacon_srv/internal/ifstate/revoker.go, line 79 at r2 (raw file):

// Name returns the tasks name.
func (r *Revoker) Name() string {
	return "ifstate_revoker"

bs_


go/beacon_srv/internal/keepalive/sender.go, line 45 at r2 (raw file):

// Name returns the tasks name.
func (s *Sender) Name() string {
	return "keepalive_sender"

bs_


go/lib/infra/modules/cleaner/cleaner.go, line 91 at r2 (raw file):

	}
	m.registered[subsystem] = &metric{
		resultsTotal: *prom.NewCounterVec(subsystem, MetricsNamespace, "results_total",

it's a bit confusing that subsystem is now used as namespace and namespace as subsystem. Please rename.


go/lib/periodic/periodic.go, line 65 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

overload term, i call it metricExporter

How is it overloaded? I mean this is just scoped to the struct so it shouldn't be a problem with the import.


go/lib/periodic/periodic.go, line 103 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

if the r==nil, the code before that it would panic, right? you cannot call a method on nil

No you can: https://play.golang.org/p/BT8lhlR39B8
Removing this will cause panics


go/lib/periodic/periodic_test.go, line 51 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

sure, but I increase my confidence that it cannot be flaky.

Sure, sorry the comment actually is on the wrong line. I meant we don't need to do Sleep(11) and cnt >= 10 we could instead do Sleep(4) and cnt >= 3


go/lib/periodic/periodic_test.go, line 100 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Using sleep is more intuitive in this context
This tests:

  1. Start periodic task every p seconds
  2. Open another window where trigger 10 times on the row, after the first run
  3. Wait 2 periods
  4. I observe that task runs at least 10 times

If we have the triggerDone channel, then we could also do not use goroutine

  1. is already tested by another test. But anyway. We can also leave it like this, I just generally dislike sleeps in tests.

go/lib/periodic/periodic_test.go, line 91 at r2 (raw file):

		<-cnt // discard the first normal periodic run
		r.Kill()
		return

return is not needed.


go/lib/periodic/periodic_test.go, line 93 at r2 (raw file):

		return
	}()
	time.Sleep(5 * p)

Actually Kill blocks until the periodic loop is done. So we can leverage this.
Use killed := make(chan struct{}) and in the go routine above instead of return put close(killed) and here <-killed

Then we don't need sleep anymore -> faster test.


go/lib/periodic/metrics/metrics.go, line 38 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

This is what allow to mock the metric interface in _test files and control that the function (e.g. StartTimestamp) is executed at least once.

Uh okay. I guess fine then.


go/lib/periodic/metrics/metrics.go, line 81 at r2 (raw file):

func (e exporter) StartTimestamp(t time.Time) {
	e.timestamp.Set(float64(t.UnixNano() / 1e9))

use SetToCurrentTime() and don't take an arg in this method.


go/lib/periodic/metrics/metrics.go, line 85 at r2 (raw file):

func (e exporter) Period(d time.Duration) {
	e.period.Set(float64(d) / 1e9)

uhm why not d.Seconds() ?


go/lib/periodic/metrics/metrics_test.go, line 35 at r2 (raw file):

func TestNewMetric(t *testing.T) {
	t.Run("Returns valid exporter", func(t *testing.T) {
		t.Parallel()

why is this?


go/lib/periodic/metrics/metrics_test.go, line 43 at r2 (raw file):

		v, _ := counters[sn]
		assert.NotNil(t, v.period)

this still tests internal details.


go/lib/periodic/metrics/metrics_test.go, line 49 at r2 (raw file):

	})

	t.Run("Same name does not panic", func(t *testing.T) {

Is that a usual pattern? To me it feels a bit weird that this depends on the state of the previous subtest.


go/path_srv/main.go, line 230 at r2 (raw file):

		}
	}
	t.pathDBCleaner = periodic.Start(pathdb.NewCleaner(t.args.PathDB),

pathdb.NewCleaner should also take a prefix like the revcache constructor.


go/path_srv/internal/cryptosyncer/cryptosyncer.go, line 42 at r2 (raw file):

func (c *Syncer) Name() string {
	return "cryptosyncer_syncer"

ps_


go/path_srv/internal/segsyncer/segsyncer.go, line 88 at r2 (raw file):

func (s *SegSyncer) Name() string {
	return fmt.Sprintf("segsyncer_segSyncer_%s", s.dstIA)

ps_

@karampok karampok force-pushed the pub-add-periodic-metrics branch from fec6cf6 to c69f362 Compare October 11, 2019 13:21
Copy link
Contributor Author

@karampok karampok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 14 of 30 files reviewed, 22 unresolved discussions (waiting on @lukedirtwalker)


go/beacon_srv/internal/beaconing/originator.go, line 69 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

hm some metrics use bs_xxx and others don't have bs_... prefix.

Done.


go/beacon_srv/internal/beaconing/propagator.go, line 85 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

bs_

Done.


go/beacon_srv/internal/ifstate/revoker.go, line 79 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

bs_

Done.


go/beacon_srv/internal/keepalive/sender.go, line 45 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

bs_

Done.


go/lib/infra/modules/cleaner/cleaner.go, line 91 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

it's a bit confusing that subsystem is now used as namespace and namespace as subsystem. Please rename.

Done.


go/lib/periodic/periodic.go, line 65 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

How is it overloaded? I mean this is just scoped to the struct so it shouldn't be a problem with the import.

Using metrics in general, it compiles of course.
Anyway, I changed it to metric.


go/lib/periodic/periodic.go, line 103 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

No you can: https://play.golang.org/p/BT8lhlR39B8
Removing this will cause panics

reverted and interesting, I can not really see the use case.
My questions

  1. what is the critiria to have this check?
  2. Should we put it to all methods that have a pointer receiver? e.g. the Stop some lines above

go/lib/periodic/periodic_test.go, line 51 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Sure, sorry the comment actually is on the wrong line. I meant we don't need to do Sleep(11) and cnt >= 10 we could instead do Sleep(4) and cnt >= 3

done


go/lib/periodic/periodic_test.go, line 100 at r1 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…
  1. is already tested by another test. But anyway. We can also leave it like this, I just generally dislike sleeps in tests.

Okay, even if it feels intuitive having sleeps in testing a periodic task, I can agree that having sleeps is not good idea. I refactored again and I removed all sleeps apart (except one).
I have to admit that removing the ticker might not have been the best idea.
If you think it is more complicated now, we can revert.


go/lib/periodic/periodic_test.go, line 91 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

return is not needed.

Done.


go/lib/periodic/periodic_test.go, line 93 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Actually Kill blocks until the periodic loop is done. So we can leverage this.
Use killed := make(chan struct{}) and in the go routine above instead of return put close(killed) and here <-killed

Then we don't need sleep anymore -> faster test.

Done.


go/lib/periodic/metrics/metrics.go, line 81 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

use SetToCurrentTime() and don't take an arg in this method.

If I do not take args, then i cannot test-driven it.
Any idea if we can testdriven ?


go/lib/periodic/metrics/metrics.go, line 85 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

uhm why not d.Seconds() ?

Done.


go/lib/periodic/metrics/metrics_test.go, line 35 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

why is this?

testing , removed


go/lib/periodic/metrics/metrics_test.go, line 43 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

this still tests internal details.

Done.


go/lib/periodic/metrics/metrics_test.go, line 49 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Is that a usual pattern? To me it feels a bit weird that this depends on the state of the previous subtest.

It does not depend on the previous test or anything. Do I miss something?


go/path_srv/main.go, line 230 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

pathdb.NewCleaner should also take a prefix like the revcache constructor.

Done.


go/path_srv/internal/cryptosyncer/cryptosyncer.go, line 42 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

ps_

Done.


go/path_srv/internal/segsyncer/segsyncer.go, line 88 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

ps_

Done.

Copy link
Collaborator

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 16 of 17 files at r3.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @karampok and @lukedirtwalker)


go/lib/infra/modules/cleaner/cleaner.go, line 30 at r3 (raw file):

namespace

subsystem


go/lib/periodic/periodic_test.go, line 100 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Okay, even if it feels intuitive having sleeps in testing a periodic task, I can agree that having sleeps is not good idea. I refactored again and I removed all sleeps apart (except one).
I have to admit that removing the ticker might not have been the best idea.
If you think it is more complicated now, we can revert.

It's fine like this. The API is nicer now.


go/lib/periodic/periodic_test.go, line 86 at r3 (raw file):

 timeoute

timeout without e

Also this might be confusing. It could be read as until the first run with timeout but it is blocks until first run, or a timeout


go/lib/periodic/periodic_test.go, line 161 at r3 (raw file):

// introduce flakeness. To be removed unless we think something
//
//func TestMetrics(t *testing.T) {

Before merging we should remove this.


go/lib/periodic/metrics/metrics.go, line 81 at r2 (raw file):

Previously, karampok (Konstantinos) wrote…

If I do not take args, then i cannot test-driven it.
Any idea if we can testdriven ?

well time is always hard to test. on the other hand this is so simple code not sure if it needs a test. You can leave it as is or change whatever you prefer.


go/lib/periodic/metrics/metrics_test.go, line 49 at r2 (raw file):

Previously, karampok (Konstantinos) wrote…

It does not depend on the previous test or anything. Do I miss something?

Ah sorry nevermind me. I thought this would register the same name as above and test that it doesn't panic, I didn't read correctly.


go/sciond/main.go, line 163 at r3 (raw file):

sciond

sd

@karampok karampok force-pushed the pub-add-periodic-metrics branch from c69f362 to 8b963a7 Compare October 11, 2019 14:32
Copy link
Contributor Author

@karampok karampok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 26 of 30 files reviewed, 6 unresolved discussions (waiting on @lukedirtwalker)


go/lib/infra/modules/cleaner/cleaner.go, line 30 at r3 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…
namespace

subsystem

Done.


go/lib/periodic/periodic_test.go, line 86 at r3 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…
 timeoute

timeout without e

Also this might be confusing. It could be read as until the first run with timeout but it is blocks until first run, or a timeout

i prefer to remove it completely if slightly confusing


go/lib/periodic/periodic_test.go, line 161 at r3 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Before merging we should remove this.

Done.


go/sciond/main.go, line 163 at r3 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…
sciond

sd

Done.

Copy link
Collaborator

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r4.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @lukedirtwalker)

@karampok karampok force-pushed the pub-add-periodic-metrics branch 2 times, most recently from 022842e to 31e7da8 Compare October 14, 2019 12:36
Copy link
Contributor Author

@karampok karampok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 24 of 31 files reviewed, 2 unresolved discussions (waiting on @lukedirtwalker)


go/lib/infra/modules/idiscovery/idiscovery.go, line 258 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

that is hard, not done yet

Done.


go/lib/infra/modules/itopo/cleaner.go, line 30 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

this is only being used by idiscovery, probably can be solved together

Done.

Copy link
Collaborator

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 7 of 7 files at r5.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@karampok karampok force-pushed the pub-add-periodic-metrics branch from 31e7da8 to e7e6911 Compare October 15, 2019 10:51
Copy link
Collaborator

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r6.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

Adds for every periodic task
```
bs_beacon_cleaner_periodic_period_duration_seconds 30
bs_beacon_cleaner_periodic_runtime_duration_seconds_total 0.028704568000000007
bs_beacon_cleaner_periodic_runtime_timestamp_seconds 1.570616739e+09
```

Other:
* modifies the periodic.StartTask to accept interval not ticker.
* periodic metrics get the .Name() from the task
* unit-tests for the metric file

Fixes scionproto#3121
@karampok karampok force-pushed the pub-add-periodic-metrics branch from e7e6911 to 710db9a Compare October 17, 2019 07:29
Copy link
Collaborator

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r7.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@karampok karampok merged commit 5ed6cba into scionproto:master Oct 17, 2019
@karampok karampok deleted the pub-add-periodic-metrics branch October 17, 2019 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lib/periodic task metrics
2 participants