Modify tsdb track to allow ingesting into a data stream #275
Merged: martijnvg merged 22 commits into elastic:master from martijnvg:add_data_stream_tsdb_track on Jun 29, 2022.
Commits (22, all by martijnvg):
- d05ed9d wip
- 720e4b6 fixed mistake
- 63dc47c added pipeline
- 139318f default pipeline
- 56de329 fixed typo
- 0b8b6af track now works with --test-mode
- c2b4e5a Use ilm policy and no manual force merge
- b127de7 experiment with pipeline
- bfddf93 tweaked look ahead time
- 33ff93d fix mistake in setting name
- 315b3bc iter
- bb8f790 iter
- b44ab85 remove ilm/rollover aspect from benchmark track
- 88cbd35 update lookahead time
- a1fc96d undo mistake
- ad84077 update track to use deduped data set
- 3906fb2 added dedupe script
- 9c5077e add ingest_mode track param and inlined tsbd data stream track into e…
- c0e211e remove dedupe function
- fc67579 added docs
- 83dc567 fix readme
- ef538a7 reuse generate_node_key(...) function
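Commit 9c5077e introduces an `ingest_mode` track param. A minimal sketch of how the track might be invoked with it; the param name comes from the commit message, while the value `data_stream` and the exact Rally flags are assumptions, not something this PR page confirms:

```sh
# Hypothetical invocation: "ingest_mode" is taken from commit 9c5077e;
# the value "data_stream" is an assumption about what the param accepts.
esrally race --track=tsdb \
  --track-params="ingest_mode:data_stream" \
  --pipeline=benchmark-only
```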
Files changed: one new file, the dedupe script (111 lines added).
```python
#!/usr/bin/env python3

####################################################################
#
# A tool that dedupes a sorted anonymized metricbeat dump.
#
####################################################################
#
# Expects a sorted anonymized metricbeat dump as input via standard
# in and returns a deduped sorted anonymized metricbeat output via
# standard out. Also separately generates 'dupes-' prefixed files
# per metric set name containing the dupes for manual inspection.
#
####################################################################

import json
import sys


def generate_event_key(parsed_line):
    return parsed_line['kubernetes']['event']['involved_object']['uid']


def generate_state_container_key(parsed_line):
    key = parsed_line['kubernetes']['container']['name']
    key += parsed_line['kubernetes']['pod']['name']
    key += parsed_line['kubernetes']['node']['name']
    container_id = parsed_line.get('kubernetes', {}).get('container', {}).get('id')
    if container_id is not None:
        key += container_id
    return key


def generate_state_pod_key(parsed_line):
    return parsed_line['kubernetes']['pod']['name'] + generate_node_key(parsed_line)


def generate_container_key(parsed_line):
    return parsed_line['kubernetes']['container']['name'] + parsed_line['kubernetes']['pod']['name'] + generate_node_key(parsed_line)


def generate_volume_key(parsed_line):
    return parsed_line['kubernetes']['volume']['name'] + parsed_line['kubernetes']['pod']['name'] + generate_node_key(parsed_line)


def generate_pod_key(parsed_line):
    return parsed_line['kubernetes']['pod']['name'] + generate_node_key(parsed_line)


def generate_node_key(parsed_line):
    return parsed_line['kubernetes']['node']['name']


def generate_system_key(parsed_line):
    return generate_node_key(parsed_line) + parsed_line['kubernetes']['system']['container']


def generate_state_node_key(parsed_line):
    return generate_node_key(parsed_line)


# Maps each metricset name to the function that builds its dedupe key.
generate_key_functions = {
    'event': generate_event_key,
    'state_container': generate_state_container_key,
    'state_pod': generate_state_pod_key,
    'container': generate_container_key,
    'volume': generate_volume_key,
    'pod': generate_pod_key,
    'node': generate_node_key,
    'system': generate_system_key,
    'state_node': generate_state_node_key
}

in_count = 0
error_count = 0
out_count = 0
current_timestamp = None
keys = set()

dupe_files = {}

with open('error_lines.json', 'a') as error_file:
    for line in sys.stdin:
        in_count += 1
        try:
            parsed = json.loads(line)
            line_timestamp = parsed['@timestamp']
            metric_set_name = parsed['metricset']['name']
            # Documents carrying an 'error' field are diverted to error_lines.json.
            if parsed.get('error') is not None:
                error_count += 1
                print(line, file=error_file)
                continue

            generate_key_function = generate_key_functions[metric_set_name]
            key = metric_set_name + generate_key_function(parsed)
            if current_timestamp == line_timestamp:
                if key in keys:
                    # Duplicate within the same timestamp: write it to a
                    # per-metricset dupe file instead of standard out.
                    dupe_file_name = f"dupes-{metric_set_name}.json"
                    dupe_file = dupe_files.get(dupe_file_name)
                    if dupe_file is None:
                        dupe_file = open(dupe_file_name, 'a')
                        dupe_files[dupe_file_name] = dupe_file

                    print(line, file=dupe_file)
                    continue
                else:
                    keys.add(key)
            else:
                # New timestamp: reset the key set. This only works because
                # the input is sorted by @timestamp.
                current_timestamp = line_timestamp
                keys = set()
                keys.add(key)

            print(line, end='')
            out_count += 1
            if out_count % 100000 == 0:
                print(f"in {in_count:012d} docs, out {out_count:012d} docs, errors {error_count:012d}", file=sys.stderr)
        except Exception as e:
            raise Exception(f"Error processing {line}") from e

for dupe_file in dupe_files.values():
    dupe_file.close()
```
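Since the script reads the dump on stdin and writes the deduped stream to stdout, usage is a simple pipe or redirect. A sketch, assuming the dump is already sorted by `@timestamp` (the file names here are hypothetical):

```sh
# Hypothetical file names; dupes-<metricset>.json and error_lines.json
# are written to the current directory as side effects.
./dedupe.py < metricbeat-sorted.json > metricbeat-deduped.json
```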
Review comment:
A minor thing, but this function and `generate_state_node_key` are identical. There are also several places where you could replace `parsed_line['kubernetes']['node']['name']` with `generate_node_key(parsed_line)` just to cut down on duplication if you'd like. Not a big deal, though.
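Taken together, the suggestion amounts to dropping the redundant wrapper and routing the remaining direct lookups through `generate_node_key` (the final commit, ef538a7, reuses `generate_node_key(...)`). A sketch of the reviewer's idea applied, not necessarily the exact change that was merged:

```python
def generate_state_container_key(parsed_line):
    key = parsed_line['kubernetes']['container']['name']
    key += parsed_line['kubernetes']['pod']['name']
    key += generate_node_key(parsed_line)  # was parsed_line['kubernetes']['node']['name']
    container_id = parsed_line.get('kubernetes', {}).get('container', {}).get('id')
    if container_id is not None:
        key += container_id
    return key


# 'state_node' can point straight at generate_node_key, making the
# identical generate_state_node_key wrapper unnecessary.
generate_key_functions['state_node'] = generate_node_key
```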