
Read from any FS Provider using the REST Service #1247

Closed
dadoonet opened this issue Sep 6, 2021 · 2 comments · Fixed by #1937
dadoonet commented Sep 6, 2021

We want to be able to send commands to FSCrawler that fetch a file from any provider, such as the local filesystem where FSCrawler is running, or S3...

FSCrawler supports the following services:

  • local: reads a file from the server where FSCrawler is running (a local file)
  • http: reads a file from a URL
  • s3: reads a file from an S3 compatible service

To upload a binary from a third-party service, you can call the POST /_document endpoint and pass
a JSON document that describes the service settings:

curl -XPOST http://127.0.0.1:8080/fscrawler/_document -H 'Content-Type: application/json' -d '{
  "type": "<TYPE>",
  "<TYPE>": {
    // Settings for the <TYPE>
  }
}'
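Any HTTP client can build this payload and call the endpoint. A minimal Python sketch using only the standard library (`build_document_request` is an illustrative helper, not part of FSCrawler; the host and port match the curl example above):

```python
import json

def build_document_request(provider_type: str, settings: dict) -> str:
    """Build the JSON body for POST /fscrawler/_document.

    The payload has a "type" field naming the provider, plus a key of
    the same name holding the provider-specific settings.
    """
    return json.dumps({"type": provider_type, provider_type: settings})

body = build_document_request("local", {"url": "/path/to/foo/bar.txt"})
print(body)

# To actually send it (requires a running FSCrawler REST service):
# import urllib.request
# req = urllib.request.Request(
#     "http://127.0.0.1:8080/fscrawler/_document",
#     data=body.encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
```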

Local plugin

The local plugin reads a file from the server where FSCrawler is running (a local file).
It needs the following parameter:

  • url: link to the local file

For example, we can read the file bar.txt from the /path/to/foo directory with:

curl -XPOST http://127.0.0.1:8080/fscrawler/_document -H 'Content-Type: application/json' -d '{
  "type": "local",
  "local": {
    "url": "/path/to/foo/bar.txt"
  }
}'

HTTP plugin

The http plugin reads a file from a given URL.
It needs the following parameter:

  • url: link to the file

For example, we can read the file robots.txt from the https://www.elastic.co/ website with:

curl -XPOST http://127.0.0.1:8080/fscrawler/_document -H 'Content-Type: application/json' -d '{
  "type": "http",
  "http": {
    "url": "https://www.elastic.co/robots.txt"
  }
}'

S3 plugin

The s3 plugin reads a file from an S3 compatible service.
It needs the following parameters:

  • url: URL of the S3 service
  • bucket: bucket name
  • object: object to read from the bucket
  • access_key: access key (or login)
  • secret_key: secret key (or password)

For example, we can read the file foo.txt from the bucket foo running on https://s3.amazonaws.com:

curl -XPOST http://127.0.0.1:8080/fscrawler/_document -H 'Content-Type: application/json' -d '{
  "type": "s3",
  "s3": {
    "url": "https://s3.amazonaws.com",
    "bucket": "foo",
    "object": "foo.txt",
    "access_key": "ACCESS",
    "secret_key": "SECRET"
  }
}'

If you are using MinIO, you can use:

curl -XPOST http://127.0.0.1:8080/fscrawler/_document -H 'Content-Type: application/json' -d '{
  "type": "s3",
  "s3": {
    "url": "http://localhost:9000",
    "bucket": "foo",
    "object": "foo.txt",
    "access_key": "minioadmin",
    "secret_key": "minioadmin"
  }
}'
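The AWS and MinIO examples above differ only in endpoint and credentials. A small sketch that assembles the s3 request body and rejects blank fields (`s3_document_body` is a hypothetical helper for illustration, not FSCrawler code):

```python
import json

def s3_document_body(url: str, bucket: str, obj: str,
                     access_key: str, secret_key: str) -> str:
    """Assemble the s3 body for POST /fscrawler/_document.

    All five settings are required by the s3 plugin, so raise early
    if any of them is empty.
    """
    settings = {"url": url, "bucket": bucket, "object": obj,
                "access_key": access_key, "secret_key": secret_key}
    missing = [k for k, v in settings.items() if not v]
    if missing:
        raise ValueError(f"missing s3 settings: {missing}")
    return json.dumps({"type": "s3", "s3": settings})

# The same call covers AWS S3 and a local MinIO endpoint:
aws = s3_document_body("https://s3.amazonaws.com", "foo", "foo.txt",
                       "ACCESS", "SECRET")
minio = s3_document_body("http://localhost:9000", "foo", "foo.txt",
                         "minioadmin", "minioadmin")
```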
dadoonet added the feature_request label Sep 6, 2021
dadoonet added this to the 2.8 milestone Sep 6, 2021
dadoonet commented Sep 6, 2021

This could maybe also solve #805.

dadoonet modified the milestones: 2.8, 2.9 Dec 14, 2021
dadoonet modified the milestones: 2.9, 2.10 Jan 10, 2022
dadoonet commented:
#1897 proposes one implementation for the http URL case.

dadoonet self-assigned this Sep 20, 2024