We want to be able to send commands to FSCrawler that fetch a file from any provider, such as the local FS where FSCrawler is running, or S3...
FSCrawler supports the following services:
- local: reads a file from the server where FSCrawler is running (a local file)
- http: reads a file from a URL
- s3: reads a file from an S3-compatible service
To upload a binary from a third-party service, you can call the POST /_document endpoint and pass a JSON document which describes the service settings:

curl -XPOST http://127.0.0.1:8080/fscrawler/_document -H 'Content-Type: application/json' -d '{
  "type": "<TYPE>",
  "<TYPE>": {
    // Settings for the <TYPE>
  }
}'
Local plugin

The local plugin reads a file from the server where FSCrawler is running (a local file). It needs the following parameter:
- url: link to the local file

For example, we can read the file bar.txt from the /path/to/foo directory with:
curl -XPOST http://127.0.0.1:8080/fscrawler/_document -H 'Content-Type: application/json' -d '{
  "type": "local",
  "local": {
    "url": "/path/to/foo/bar.txt"
  }
}'
HTTP plugin

The http plugin reads a file from a given URL. It needs the following parameter:
- url: link to the file

For example, we can read the file robots.txt from the https://www.elastic.co/ website with:
curl -XPOST http://127.0.0.1:8080/fscrawler/_document -H 'Content-Type: application/json' -d '{
  "type": "http",
  "http": {
    "url": "https://www.elastic.co/robots.txt"
  }
}'
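Just as an illustration of how the result could be checked once such an endpoint exists (not part of the proposal): assuming FSCrawler's usual behaviour of indexing into an Elasticsearch index named after the job, with a hypothetical job name myjob and Elasticsearch assumed to run on localhost:9200, something like the following would show whether the fetched document was indexed:

# myjob is a hypothetical job/index name; adjust host and port to your setup.
curl "http://localhost:9200/myjob/_search?q=robots"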
S3 plugin

The s3 plugin reads a file from an S3-compatible service. It needs the following parameters:
- url: url for the S3 service
- bucket: bucket name
- object: object to read from the bucket
- access_key: access key (or login)
- secret_key: secret key (or password)
For example, we can read the file foo.txt from the bucket foo running on https://s3.amazonaws.com:
curl -XPOST http://127.0.0.1:8080/fscrawler/_document -H 'Content-Type: application/json' -d '{
  "type": "s3",
  "s3": {
    "url": "https://s3.amazonaws.com",
    "bucket": "foo",
    "object": "foo.txt",
    "access_key": "ACCESS",
    "secret_key": "SECRET"
  }
}'
If you are using Minio, you can use:
curl -XPOST http://127.0.0.1:8080/fscrawler/_document -H 'Content-Type: application/json' -d '{
  "type": "s3",
  "s3": {
    "url": "http://localhost:9000",
    "bucket": "foo",
    "object": "foo.txt",
    "access_key": "minioadmin",
    "secret_key": "minioadmin"
  }
}'
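A usage note rather than part of the proposal: if the credentials should not appear directly on the command line, they could be read from environment variables instead. A minimal sketch, assuming hypothetical variable names MINIO_ACCESS_KEY and MINIO_SECRET_KEY exported in the shell:

# Assumes MINIO_ACCESS_KEY and MINIO_SECRET_KEY are exported in the environment.
curl -XPOST http://127.0.0.1:8080/fscrawler/_document -H 'Content-Type: application/json' -d '{
  "type": "s3",
  "s3": {
    "url": "http://localhost:9000",
    "bucket": "foo",
    "object": "foo.txt",
    "access_key": "'"$MINIO_ACCESS_KEY"'",
    "secret_key": "'"$MINIO_SECRET_KEY"'"
  }
}'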
This could maybe also solve #805.
#1897 proposes one implementation for the http URL case.