Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 32: Garbage collect completed files #29

Merged

Conversation

dada-dell-emc
Copy link
Contributor

@dada-dell-emc dada-dell-emc commented Dec 11, 2023

Made changes to consider only those file which are last modified by more than minTimeInMillisToUpdateFile milliseconds, this reduce risk of reading partially updated files in ingestion.
Added garbage collection of completed files by moving it into completed_files directory, parallel to database file or failed_files. In this change completed files garbage collection is not scheduled, it will execute if DELETE_COMPLETED_FILES configuration set to true.
Scheduling of completed files deletion and unit tests will be added in next checkin.

Fixes #32

…ed_files directory parallel to database file or failed_files. In this commit completed files garbage collection is not scheduled, it will execute if DELETE_COMPLETED_FILES configuration set to true.

Made changes to consider only those file which are last modified by more than minTimeInMillisToUpdateFile, this reduce risk of reading partially updated files in ingestion.
…-emc/pravega-sensor-collector into garbage-completed-files

# Conflicts:
#	pravega-sensor-collector/src/main/java/io/pravega/sensor/collector/file/FileProcessor.java
#	pravega-sensor-collector/src/main/java/io/pravega/sensor/collector/util/FileUtils.java
#	pravega-sensor-collector/src/test/java/io/pravega/sensor/collector/file/FileProcessorTests.java
#	pravega-sensor-collector/src/test/java/io/pravega/sensor/collector/file/parquet/ParquetEventGeneratorTests.java
Copy link
Contributor

@ShwethaSNayak ShwethaSNayak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added few review comments.

Copy link
Contributor

@sachin-j-joshi sachin-j-joshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unblocking

Please ask @abhinb to merge once done

PRAVEGA_SENSOR_COLLECTOR_RAW1_DELETE_COMPLETED_FILES=false
PRAVEGA_SENSOR_COLLECTOR_RAW1_TRANSACTION_TIMEOUT_MINUTES=2.0
PRAVEGA_SENSOR_COLLECTOR_RAW1_CREATE_SCOPE=false
PRAVEGA_SENSOR_COLLECTOR_RAW1_MIN_TIME_IN_MILLIS_TO_UPDATE_FILE=5000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this param to other file properties

Files.move(Paths.get(fileEntry.fileName), failedFile, StandardCopyOption.REPLACE_EXISTING);
log.info("moveFailedFiles: Moved file to {}", failedFile);
Files.move(sourcePath, targetPath, StandardCopyOption.REPLACE_EXISTING);
log.info("movedFile: Moved file from {} to {}", sourcePath, targetPath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change log level to debug

import java.nio.file.*;
import java.sql.Connection;
import java.sql.SQLException;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove .* and import only the required classes?

} else {
try {
FileUtils.moveCompletedFile(dirFile, movedFilesDirectory);
log.warn("File: {} already marked as completed, moving now", dirFile.fileName);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need warn here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, This file movement is taken care as it was not succeeded at first time.

Copy link
Contributor

@ShwethaSNayak ShwethaSNayak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Minor comments which will be taken care as part of Unit test PR

@shshashwat
Copy link

LGTM, Minor comments which will be taken care as part of Unit test PR

based on this, I am merging this

@shshashwat shshashwat merged commit d1e25e4 into pravega:stability-improvements Dec 12, 2023
1 check passed
@shshashwat shshashwat changed the title Garbage collect completed files Issue 32: Garbage collect completed files Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants