[#1187] feat(lakehouse-iceberg): the basic framework of storing Iceberg metrics #1164

Merged
merged 6 commits into from
Dec 28, 2023

Contributor

@FANNG1 FANNG1 commented Dec 14, 2023

What changes were proposed in this pull request?

  1. Introduce MetricsManager to manage metrics storages.
  2. Introduce IcebergMetricsFormatter to format metrics; only JSON is supported for now.

Why are the changes needed?

Fix: #1187

Does this PR introduce any user-facing change?

no

How was this patch tested?

  1. local env

@FANNG1 FANNG1 marked this pull request as draft December 14, 2023 13:41
@jerryshao
Contributor

What's the purpose of writing metrics to FS, not db?

@FANNG1
Contributor Author

FANNG1 commented Dec 15, 2023

> What's the purpose of writing metrics to FS, not db?

Metrics written to the FS are easy to analyze with Spark or Python.

@FANNG1 FANNG1 changed the title [#453] feat(lakehouse-iceberg): store Iceberg metrics in file storage [#453] feat(lakehouse-iceberg): the basic framework of storing Iceberg metrics Dec 18, 2023
@FANNG1 FANNG1 changed the title [#453] feat(lakehouse-iceberg): the basic framework of storing Iceberg metrics [#1187] feat(lakehouse-iceberg): the basic framework of storing Iceberg metrics Dec 18, 2023
@FANNG1
Contributor Author

FANNG1 commented Dec 18, 2023

Removed the specific storage implementation, leaving only the basic framework.

@FANNG1 FANNG1 marked this pull request as ready for review December 18, 2023 12:03
@FANNG1
Contributor Author

FANNG1 commented Dec 19, 2023

@Clearvive @jerryshao, could you help review this PR?


public static final ConfigEntry<Integer> ICEBERG_METRICS_STORAGE_RETAIN_DAYS =
new ConfigBuilder(IcebergMetricsManager.ICEBERG_METRICS_STORAGE_RETAIN_DAYS)
.doc("The retain days of Iceberg metrics")
Contributor

Is a day-based retention setting reasonable as the default? Would it be more reasonable to configure retention in hours?

Contributor Author

Configuring the retention in days seems sufficient.
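
To make the days-versus-hours question concrete, here is a minimal sketch of how a day-based retention setting could translate into an expiry cutoff. `RetentionPolicy` and its methods are hypothetical names used only for illustration; they are not part of this PR.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (not from the PR): decides whether a stored metrics
// record has outlived a day-based retention setting.
final class RetentionPolicy {
  private final int retainDays;

  RetentionPolicy(int retainDays) {
    if (retainDays <= 0) {
      throw new IllegalArgumentException("retainDays must be positive: " + retainDays);
    }
    this.retainDays = retainDays;
  }

  // Records with a timestamp strictly before this instant are expired.
  Instant expireBefore(Instant now) {
    return now.minus(Duration.ofDays(retainDays));
  }

  boolean isExpired(Instant recordTime, Instant now) {
    return recordTime.isBefore(expireBefore(now));
  }

  public static void main(String[] args) {
    RetentionPolicy policy = new RetentionPolicy(7);
    Instant now = Instant.now();
    System.out.println("Records before " + policy.expireBefore(now) + " would be cleaned up");
  }
}
```

Because the cutoff is computed through `Duration`, an hour-based setting would only change how the duration is built, so the granularity choice stays a configuration concern.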

@qqqttt123
Contributor

Do we need a metrics storage for the Gravitino server?

@FANNG1
Contributor Author

FANNG1 commented Dec 20, 2023

> Do we need a metrics storage for the Gravitino server?

No, it is used to store Iceberg query and commit metrics, not server metrics.

@FANNG1
Contributor Author

FANNG1 commented Dec 20, 2023

@Clearvive @yuqi1129, the comments are addressed; please review again. Thanks.

@Clearvive
Contributor

LGTM except for a few comments.

mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
mapper.setPropertyNamingStrategy(new PropertyNamingStrategies.KebabCaseStrategy());
RESTSerializers.registerAll(mapper);
return mapper;
Contributor

What is the purpose of the change here? Why do you need a new ObjectMapper?

Contributor Author

The previous implementation (JsonUtils.objectMapper()) gets a global ObjectMapper and applies Iceberg REST properties to it. This is not a good design, since it may affect other uses of the ObjectMapper.

LocalDateTime currentDateTime =
LocalDateTime.ofInstant(currentTimestamp, ZoneId.systemDefault());
LocalDateTime nextHourStartTime = currentDateTime.plusHours(1).withMinute(0).withSecond(0);
return nextHourStartTime.atZone(ZoneId.systemDefault()).toInstant();
Contributor

@jerryshao jerryshao Dec 21, 2023

What is the purpose of using LocalDateTime? It is quite simple to calculate the next hour's timestamp with a modulo operation, like (curr_time + 3600000) - (curr_time + 3600000) % 3600000.

Contributor Author

LocalDateTime seems more maintainable.
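
For comparison, the two approaches in this thread can be sketched side by side. This is an illustration under stated assumptions, not the code that was merged: `HourBoundary` and its method names are hypothetical, the calendar variant uses `truncatedTo(ChronoUnit.HOURS)` (which also clears nanoseconds, something `withMinute(0).withSecond(0)` alone does not do), and the modulo variant assumes a non-negative epoch-millisecond timestamp.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.temporal.ChronoUnit;

// Hypothetical helper comparing the two ways of finding the next hour boundary.
final class HourBoundary {
  private static final long HOUR_MS = 3_600_000L;

  // Calendar-based variant: the boundary is the next full hour in the given zone.
  static Instant nextHourByCalendar(Instant now, ZoneId zone) {
    LocalDateTime local = LocalDateTime.ofInstant(now, zone);
    return local.plusHours(1).truncatedTo(ChronoUnit.HOURS).atZone(zone).toInstant();
  }

  // Modulo variant from the review comment: the boundary is the next full hour in UTC.
  static Instant nextHourByModulo(Instant now) {
    long shifted = now.toEpochMilli() + HOUR_MS;
    return Instant.ofEpochMilli(shifted - shifted % HOUR_MS);
  }

  public static void main(String[] args) {
    Instant now = Instant.now();
    System.out.println("calendar: " + nextHourByCalendar(now, ZoneId.systemDefault()));
    System.out.println("modulo:   " + nextHourByModulo(now));
  }
}
```

The two variants agree whenever the zone offset is a whole number of hours. They differ in zones with a fractional offset such as Asia/Kolkata (UTC+05:30), where the local hour boundary falls on the UTC half hour.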

this.icebergObjectMapper = IcebergObjectMapper.getInstance();
}

public String toJsonOrToString(MetricsReport metricsReport) {
Contributor

The method name is weird.

Contributor Author

How about toPrintableString()?
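
As a rough sketch of what a method with that name suggests (try JSON first, fall back to `toString()` if serialization fails), assuming a hypothetical `JsonSerializer` interface in place of the Iceberg ObjectMapper so the example needs no external libraries:

```java
// Hypothetical stand-in for a JSON serializer that may fail with a checked
// exception, as Jackson's writeValueAsString does.
interface JsonSerializer<T> {
  String toJson(T value) throws Exception;
}

// Hypothetical formatter: always returns something printable for logging.
final class PrintableFormatter<T> {
  private final JsonSerializer<T> serializer;

  PrintableFormatter(JsonSerializer<T> serializer) {
    this.serializer = serializer;
  }

  String toPrintableString(T report) {
    try {
      return serializer.toJson(report);
    } catch (Exception e) {
      // Serialization failed: fall back to the object's own string form.
      return String.valueOf(report);
    }
  }

  public static void main(String[] args) {
    PrintableFormatter<Integer> formatter =
        new PrintableFormatter<>(value -> "{\"value\":" + value + "}");
    System.out.println(formatter.toPrintableString(42));
  }
}
```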

}

private void logMetrics(String message, MetricsReport metricsReport) {
LOG.info("{} {}", message, icebergMetricsFormatter.toJsonOrToString(metricsReport));
Contributor

It looks like IcebergMetricsFormatter is only used here. Is it necessary to define a class for this?

Contributor Author

IcebergMetricsFormatter is designed to provide different formats for other storages, such as KVStorage or SQLStorage.

@FANNG1
Contributor Author

FANNG1 commented Dec 21, 2023

@jerryshao, all comments are addressed; please help review again.

@FANNG1
Contributor Author

FANNG1 commented Dec 25, 2023

Planning to do:

  1. remove the async mode configuration
  2. add a queue capacity configuration
  3. use a service loader to load the specified metrics storage.

@FANNG1
Contributor Author

FANNG1 commented Dec 26, 2023

> Planning to do:
>
>   1. remove the async mode configuration
>   2. add a queue capacity configuration
>   3. use a service loader to load the specified metrics storage.

These are supported in the latest PR, cc @jerryshao
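
A queue capacity setting of the kind listed above could bound memory use along these lines. This is only a sketch: `BoundedMetricsQueue` is a hypothetical name, and dropping reports when the queue is full is an assumption made for illustration, since this conversation does not say how the PR handles a full queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical bounded buffer between request threads and a metrics writer thread.
final class BoundedMetricsQueue<T> {
  private final BlockingQueue<T> queue;

  BoundedMetricsQueue(int capacity) {
    this.queue = new ArrayBlockingQueue<>(capacity);
  }

  // Non-blocking submit: returns false (report dropped) when the queue is full,
  // so a slow storage backend cannot stall the caller.
  boolean submit(T report) {
    return queue.offer(report);
  }

  // Blocking take, intended for the writer thread.
  T take() throws InterruptedException {
    return queue.take();
  }

  int size() {
    return queue.size();
  }

  public static void main(String[] args) {
    BoundedMetricsQueue<String> metrics = new BoundedMetricsQueue<>(1);
    System.out.println("first accepted:  " + metrics.submit("scan-report"));
    System.out.println("second accepted: " + metrics.submit("commit-report"));
  }
}
```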

@jerryshao
Contributor

I think there's no need to use service loader to make it pluggable, using reflection should be enough.

@FANNG1
Contributor Author

FANNG1 commented Dec 26, 2023

> I think there's no need to use service loader to make it pluggable, using reflection should be enough.

Replaced the service loader with reflection.
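
For readers unfamiliar with the pattern, loading a pluggable implementation by class name through reflection typically looks like the sketch below. `MetricsStore`, `InMemoryStore`, and `StoreLoader` are hypothetical names for illustration and are not the classes in this PR; the sketch also assumes the implementation has a no-argument constructor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical storage interface.
interface MetricsStore {
  void save(String report);
}

// Hypothetical implementation with an implicit no-argument constructor.
final class InMemoryStore implements MetricsStore {
  final List<String> reports = new ArrayList<>();

  @Override
  public void save(String report) {
    reports.add(report);
  }
}

final class StoreLoader {
  // Instantiates the configured class by name; no META-INF/services file is needed.
  static MetricsStore load(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      return clazz.asSubclass(MetricsStore.class).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalArgumentException("Cannot load metrics store: " + className, e);
    }
  }

  public static void main(String[] args) {
    MetricsStore store = load(InMemoryStore.class.getName());
    store.save("example report");
    System.out.println("Loaded " + store.getClass().getName());
  }
}
```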

public String toJson(MetricsReport metricsReport) throws JsonProcessingException {
return icebergObjectMapper.writeValueAsString(metricsReport);
}
}
Contributor

If this class is only used to print logs, I don't think it is necessary to have a class; some helper methods should be enough.

Contributor Author

I will use this class to serialize the metrics report to a JSON string in KVMetricsStorage in the next PR. Should I keep it as a class?

}
}

public void save(MetricsReport metricsReport) {
Contributor

This save method is easily confused with the method above; please choose a better name.

Also, you should put all the public methods together, followed by all the private methods.

metricsWriterThread.join();
} catch (InterruptedException e) {
LOG.warn("Iceberg metrics manager is interrupted while join metrics writer thread.");
return;
Contributor

You should not return directly here; you still need to execute the follow-up logic.

Contributor Author

If IcebergMetricsManager is interrupted, I think it's better to quit as quickly as possible in case the other cleanup actions block the close.

Contributor

At least you should close the iceberg metrics store, right?

Contributor Author

done
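
One common way to satisfy both concerns in this thread (exit promptly on interrupt, but still release the store) is a `finally` block around the join. The sketch below is an assumption about the general shape, not the merged code: `WriterLifecycle` and its fields are hypothetical, and interrupting the writer thread to stop it is an illustrative choice.

```java
// Hypothetical lifecycle holder for a metrics writer thread and its store.
final class WriterLifecycle implements AutoCloseable {
  private final Thread writerThread;
  private final AutoCloseable store;

  WriterLifecycle(Thread writerThread, AutoCloseable store) {
    this.writerThread = writerThread;
    this.store = store;
  }

  @Override
  public void close() {
    writerThread.interrupt(); // ask the writer to stop
    try {
      writerThread.join();
    } catch (InterruptedException e) {
      // Do not wait any longer, but remember that we were interrupted.
      Thread.currentThread().interrupt();
    } finally {
      // Runs on both the normal and the interrupted path.
      closeStoreQuietly();
    }
  }

  private void closeStoreQuietly() {
    try {
      store.close();
    } catch (Exception e) {
      System.err.println("Failed to close metrics store: " + e);
    }
  }

  public static void main(String[] args) {
    Thread writer = new Thread(() -> {
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException ignored) {
        // Interrupted by close(): exit the writer loop.
      }
    });
    writer.start();
    new WriterLifecycle(writer, () -> System.out.println("store closed")).close();
  }
}
```

Restoring the interrupt flag lets callers further up the stack see that the shutdown was interrupted, while the `finally` block guarantees the store is closed either way.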

@jerryshao jerryshao merged commit eff38b8 into apache:main Dec 28, 2023
11 checks passed
Development

Successfully merging this pull request may close these issues.

[Subtask] the basic framework of storing Iceberg metrics
6 participants