-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for .tgz files in GeoIpDownloader #70725
Conversation
Pinging @elastic/es-core-features (Team:Core/Features) |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Left a few questions.
|
||
import java.util.Objects; | ||
|
||
public class TarEntry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should use a library that provides tar ball support? Like plexus-archiver or Apache commons-compress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it but both libraries adds quite big footprint (common-cmpress over 600kB, plexus over 200kB) just for this one use case.
I think tar is simple enough (at least for basic cases but I don't expect anything crazy from infra) to parse it by ourselves. That said I can reconsider if you are really against creating our own solution here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that plexus-archiver
actually depends on common-compress
. I agree it is a lot of additional bytes for just for extracting a tarball and the code itself isn't complex. On the hand, about half of this PR is about adding support for extracting a tarball and comparing the size of all our other dependencies (all jdk libraries and the jdk) I think about 600kb for common-compress is just a small part our total distribution.
I did find a lightweight library called jtar
that adds support for just tar files, but that project doesn't seem very active and maintained by basically a single developer.
I think I'm ok with maintaining this tar code ourselves, but it should be moved to ingest-geoipv2 module and explained that it is not a general tar library and meant to be used for extracting the tar files provided by the infra service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it to geoip module and made it simpler by skipping everything that is not needed for our use case.
modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpTaskState.java
Outdated
Show resolved
Hide resolved
...est-geoip/src/internalClusterTest/java/org/elasticsearch/ingest/geoip/GeoIpDownloaderIT.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* {@link InputStream} with very basic support for tar format, just enough to parse archives provided by GeoIP database service from Infra. | ||
* This class is not suitable for general purpose tar processing! | ||
*/ | ||
class TarInputStream extends FilterInputStream { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/DatabaseRegistry.java
Show resolved
Hide resolved
if (name.startsWith(databaseName)) { | ||
Files.copy(is, databaseTmpFile, StandardCopyOption.REPLACE_EXISTING); | ||
} else { | ||
Files.copy(is, geoipTmpDirectory.resolve(databaseName + "_" + name), StandardCopyOption.REPLACE_EXISTING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removeStaleEntries()
method doesn't remove these files, but I think that is ok.
These are small files and each update overwrites these files, during node startup the
geoip tmp dir is purged and the removeStaleEntries()
is only invoked if at some
point in time we stop distributing databases by default or custom dbs are shipped
by a third party database webserver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was exactly my thinking, I think we can leave them behind as they are small and doesn't interfere with geoip processor itself.
@elasticmachine run elasticsearch-ci/1 |
We have to ship COPYRIGHT.txt and LICENSE.txt files alongside .mmdb files for legal compliance. Infra will pack these in single .tgz (gzipped tar) archive provided by GeoIP databases service. This change adds support for that format to GeoIpDownloader and DatabaseRegistry # Conflicts: # modules/ingest-geoip/src/internalClusterTest/java/org/elasticsearch/ingest/geoip/GeoIpDownloaderIT.java # modules/ingest-geoip/src/test/java/org/elasticsearch/ingest/geoip/GeoIpDownloaderTests.java
* Add support for .tgz files in GeoIpDownloader (#70725) We have to ship COPYRIGHT.txt and LICENSE.txt files alongside .mmdb files for legal compliance. Infra will pack these in single .tgz (gzipped tar) archive provided by GeoIP databases service. This change adds support for that format to GeoIpDownloader and DatabaseRegistry
We have to ship
COPYRIGHT.txt
andLICENSE.txt
files alongside.mmdb
files for legal compliance. Infra will pack these in single.tgz
(gzipped tar) archive provided by GeoIP databases service.This change adds support for that format to
GeoIpDownloader
andDatabaseRegistry