Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Dimensional metric reporting #376

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions lib/new_relic.ex
Original file line number Diff line number Diff line change
Expand Up @@ -229,6 +229,17 @@ defmodule NewRelic do
defdelegate report_custom_event(type, attributes),
to: NewRelic.Harvest.Collector.CustomEvent.Harvester

@doc """
Report a Dimensional Metric.
Valid types: `:count`, `:gauge`, and `:summary`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should add some docs (or a link) on when to use each of the types.


```elixir
NewRelic.report_dimensional_metric(:count, "my_metric_name", 1, %{some: "attributes"})
```
"""
defdelegate report_dimensional_metric(type, name, value, attributes \\ %{}),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there should be other functions that call this, as well. Similar to how incrementing a timeslice metric works.

to: NewRelic.Harvest.TelemetrySdk.DimensionalMetrics.Harvester

@doc """
Report a Custom metric.

Expand Down
8 changes: 8 additions & 0 deletions lib/new_relic/harvest/telemetry_sdk/api.ex
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,14 @@ defmodule NewRelic.Harvest.TelemetrySdk.API do
|> maybe_retry(url, payload)
end

def dimensional_metric(metrics) do
url = url(:metric)
payload = {:metrics, metrics, generate_request_id()}

request(url, payload)
|> maybe_retry(url, payload)
end

def request(url, payload) do
post(url, payload)
end
Expand Down
10 changes: 8 additions & 2 deletions lib/new_relic/harvest/telemetry_sdk/config.ex
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,8 @@ defmodule NewRelic.Harvest.TelemetrySdk.Config do

@default %{
logs_harvest_cycle: 5_000,
spans_harvest_cycle: 5_000
spans_harvest_cycle: 5_000,
dimensional_metrics_harvest_cycle: 5_000
}
def lookup(key) do
Application.get_env(:new_relic_agent, key, @default[key])
Expand All @@ -18,7 +19,8 @@ defmodule NewRelic.Harvest.TelemetrySdk.Config do

%{
log: "https://#{env}log-api.#{region}newrelic.com/log/v1",
trace: trace_domain(env, region)
trace: trace_domain(env, region),
metric: metric_domain(env, region)
}
end

Expand All @@ -34,4 +36,8 @@ defmodule NewRelic.Harvest.TelemetrySdk.Config do
defp trace_domain(_env, _region, infinite_tracing_host) do
"https://#{infinite_tracing_host}/trace/v1"
end

defp metric_domain(env, region) do
"https://#{env}metric-api.#{region}newrelic.com/metric/v1"
end
end
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
defmodule NewRelic.Harvest.TelemetrySdk.DimensionalMetrics.Harvester do
use GenServer

@moduledoc false

alias NewRelic.Harvest
alias NewRelic.Harvest.TelemetrySdk

@interval_ms TelemetrySdk.Config.lookup(:dimensional_metrics_harvest_cycle)

@valid_types [:count, :gauge, :summary]

def start_link(_) do
GenServer.start_link(__MODULE__, [])
end

def init(_) do
{:ok,
%{
start_time_ms: System.system_time(:millisecond),
metrics: []
}}
end

# API

@spec report_dimensional_metric(:count | :gauge | :summary, atom() | binary(), any, map()) ::
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I remove the typespec?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like a pretty detailed spec would be worth it on the function inside NewRelic module, I'd go beyond what's here to help folks understand what the value should be for the various types

:ok
def report_dimensional_metric(type, name, value, attributes) when type in @valid_types do
Copy link
Collaborator

@binaryseed binaryseed Jan 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main risk I'm noticing in this PR is that report_dimensional_metric collects all the metrics and sends every one up, while report_custom_metric will aggregate each metric during the harvest cycle and only sends each one once.

Which behavior do you want this function to have? I suspect most folks would assume the aggregated behavior. For example, if someone calls this function for every single request, we wouldn't want to copy the name & attributes potentially thousands of times a second..

If you did want that behavior, perhaps send_dimensional_metric is more expressive of what will happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I definitely agree that it's not what I would expect.

What are your thoughts on efficiently storing the map of attributes along with the metric name?

Eg I report metric_a with attributes a, b, and c. Then I report metric_a with attributes c, d, and e.

Do I merge the maps? Or does that remain as two metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use something like phash2 on the attributes map and then aggregate matching ones.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's right, the whole of the attributes makes up a unique metric. The other metric harvester has an optimized aggregation implementation you could steal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it would be keyed off of a tuple of the name and the hash of the attributes? The timeslice metric harvester uses the name and the scope, it seems.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

TelemetrySdk.DimensionalMetrics.HarvestCycle
|> Harvest.HarvestCycle.current_harvester()
|> GenServer.cast({:report, %{type: type, name: name, value: value, attributes: attributes}})
end

def gather_harvest,
do:
TelemetrySdk.DimensionalMetrics.HarvestCycle
|> Harvest.HarvestCycle.current_harvester()
|> GenServer.call(:gather_harvest)

# do not accept more report messages when harvest has already been reported
def handle_cast(_late_msg, :completed), do: {:noreply, :completed}

def handle_cast({:report, metric}, state) do
# TODO: merge metrics with the same type/name/attributes?
# Take phash2 of attributes and store them as a map with the
# key being {type, name, phash2(attributes)} => [metrics]
{:noreply, %{state | metrics: [metric | state.metrics]}}
end

# do not resend metrics when harvest has already been reported
def handle_call(_late_msg, _from, :completed), do: {:reply, :completed, :completed}

def handle_call(:send_harvest, _from, state) do
send_harvest(state)
{:reply, :ok, :completed}
end

def handle_call(:gather_harvest, _from, state) do
{:reply, build_dimensional_metric_data(state.metrics, state), state}
end

# Helpers

defp send_harvest(state) do
TelemetrySdk.API.log(build_dimensional_metric_data(state.metrics, state))
log_harvest(length(state.metrics))
end

defp log_harvest(harvest_size) do
NewRelic.log(
:debug,
"Completed TelemetrySdk.DimensionalMetrics harvest - size: #{harvest_size}"
)
end

defp build_dimensional_metric_data(metrics, state) do
[
%{
metrics: metrics,
common: common(state)
}
]
end

defp common(%{start_time_ms: start_time_ms}) do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the start time the common timestamp for any metrics in this harvest cycle. I don't think there is any accuracy below (or even near to) 5 seconds when querying.

%{"timestamp" => start_time_ms, "interval.ms" => @interval_ms}
tpitale marked this conversation as resolved.
Show resolved Hide resolved
end
end
3 changes: 2 additions & 1 deletion lib/new_relic/harvest/telemetry_sdk/supervisor.ex
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,8 @@ defmodule NewRelic.Harvest.TelemetrySdk.Supervisor do
def init(_) do
children = [
data_supervisor(TelemetrySdk.Logs, :logs_harvest_cycle),
data_supervisor(TelemetrySdk.Spans, :spans_harvest_cycle)
data_supervisor(TelemetrySdk.Spans, :spans_harvest_cycle),
data_supervisor(TelemetrySdk.DimensionalMetrics, :dimensional_metrics_harvest_cycle)
]

Supervisor.init(children, strategy: :one_for_all)
Expand Down
28 changes: 28 additions & 0 deletions test/dimensional_metric_test.exs
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
defmodule DimensionalMetricTest do
use ExUnit.Case

test "reports dimensional metrics" do
TestHelper.restart_harvest_cycle(
NewRelic.Harvest.TelemetrySdk.DimensionalMetrics.HarvestCycle
)

NewRelic.report_dimensional_metric(:count, "memory.foo_baz", 100, %{cpu: 1000})
NewRelic.report_dimensional_metric(:summary, "memory.foo_bar", 50, %{cpu: 2000})

[%{common: common, metrics: metrics}] =
TestHelper.gather_harvest(NewRelic.Harvest.TelemetrySdk.DimensionalMetrics.Harvester)

assert common["interval.ms"] > 0
assert common["timestamp"] > 0

assert length(metrics) == 2
[metric1, metric2] = metrics
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This points out that the metrics are coming in reverse order because that's how they're added. Not sure if it is worth swapping the order, or reversing the metrics at gather time.

assert metric1.name == "memory.foo_bar"
assert metric1.type == :summary

assert metric2.name == "memory.foo_baz"
assert metric2.type == :count

TestHelper.pause_harvest_cycle(NewRelic.Harvest.TelemetrySdk.DimensionalMetrics.HarvestCycle)
end
end
59 changes: 59 additions & 0 deletions test/telemetry_sdk/dimensional_metrics_harvester_test.exs
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
defmodule TelemetrySdk.DimensionalMetricsHarvesterTest do
use ExUnit.Case

alias NewRelic.Harvest
alias NewRelic.Harvest.TelemetrySdk

test "Harvester collects dimensional metrics" do
{:ok, harvester} =
DynamicSupervisor.start_child(
TelemetrySdk.DimensionalMetrics.HarvesterSupervisor,
TelemetrySdk.DimensionalMetrics.Harvester
)

metric1 = %{}
GenServer.cast(harvester, {:report, metric1})

metrics = GenServer.call(harvester, :gather_harvest)
assert length(metrics) > 0
end

test "harvest cycle" do
Application.put_env(:new_relic_agent, :dimensional_metrics_harvest_cycle, 300)
TestHelper.restart_harvest_cycle(TelemetrySdk.DimensionalMetrics.HarvestCycle)

first = Harvest.HarvestCycle.current_harvester(TelemetrySdk.DimensionalMetrics.HarvestCycle)
Process.monitor(first)

# Wait until harvest swap
assert_receive {:DOWN, _ref, _, ^first, :shutdown}, 1000

second = Harvest.HarvestCycle.current_harvester(TelemetrySdk.DimensionalMetrics.HarvestCycle)
Process.monitor(second)

refute first == second
assert Process.alive?(second)

TestHelper.pause_harvest_cycle(TelemetrySdk.DimensionalMetrics.HarvestCycle)
Application.delete_env(:new_relic_agent, :dimensional_metrics_harvest_cycle)

# Ensure the last harvester has shut down
assert_receive {:DOWN, _ref, _, ^second, :shutdown}, 1000
end

test "Ignore late reports" do
TestHelper.restart_harvest_cycle(TelemetrySdk.DimensionalMetrics.HarvestCycle)

harvester =
TelemetrySdk.DimensionalMetrics.HarvestCycle
|> Harvest.HarvestCycle.current_harvester()

assert :ok == GenServer.call(harvester, :send_harvest)

GenServer.cast(harvester, {:report, :late_msg})

assert :completed == GenServer.call(harvester, :send_harvest)

TestHelper.pause_harvest_cycle(TelemetrySdk.DimensionalMetrics.HarvestCycle)
end
end