feat: Make SSAPI receiver storage optional (#2099)
* make storage extn optional

* update storage test

* fix: update readme
Caleb-Hurshman authored Jan 9, 2025
1 parent f9b4576 commit d24c17b
Showing 4 changed files with 8 additions and 13 deletions.
12 changes: 6 additions & 6 deletions receiver/splunksearchapireceiver/README.md
@@ -9,7 +9,7 @@ This receiver collects Splunk events using the [Splunk Search API](https://docs.
- Configured storage extension

## Use Case
-Unlike other receivers, the SSAPI receiver is not built to collect live data. Instead, it collects a finite set of historical data and transfers it to a destination, preserving the timestamp from the source. For this reason, the SSAPI recevier only needs to be left running until all Splunk events have been migrated, which is denoted by the log message: "all search results exported". Until this log message or some other error is printed, avoid cancelling the collector for any reason, as it will unnecessarily interfere with the receiver's ability to protect against writing duplicate events.
+Unlike other receivers, the SSAPI receiver is not built to collect live data. Instead, it collects a finite set of historical data and transfers it to a destination, preserving the timestamp from the source. For this reason, the SSAPI receiver only needs to be left running until all Splunk events have been migrated, which is denoted by the log message: "all search results exported". Until this log message or some other error is printed, avoid cancelling the collector for any reason, as it will unnecessarily interfere with the receiver's ability to protect against writing duplicate events.

## Configuration
| Field | Type | Default | Description |
@@ -24,7 +24,7 @@ Unlike other receivers, the SSAPI receiver is not built to collect live data. In
| searches.earliest_time | string | `required (no default)` | The earliest timestamp to collect logs. Only logs that occurred at or after this timestamp will be collected. Must be in 'yyyy-MM-ddTHH:mm' format (UTC). |
| searches.latest_time | string | `required (no default)` | The latest timestamp to collect logs. Only logs that occurred at or before this timestamp will be collected. Must be in 'yyyy-MM-ddTHH:mm' format (UTC). |
| searches.event_batch_size | int | `100` | The amount of events to query from Splunk for a single request. |
-| storage | component | `required (no default)` | The component ID of a storage extension which can be used when polling for `logs`. The storage extension prevents duplication of data after an exporter error by remembering which events were previously exported. |
+| storage | component | `(no default)` | The component ID of a storage extension which can be used when polling for `logs`. The storage extension prevents duplication of data after an exporter error by remembering which events were previously exported. This should be configured in all production environments. |

### Example Configuration
```yaml
@@ -60,10 +60,10 @@ extensions:
- `latest_time: "2024-12-31T23:59:59.999-05:00"`
- Note: By default, GCL will not accept logs with a timestamp older than 30 days. Contact Google to modify this rule.
3. Repeat steps 1 & 2 for each index you wish to collect from
-3. Configure a storage extension to store checkpointing data for the receiver.
-4. Configure the rest of the receiver fields according to your Splunk environment.
-5. Add a `googlecloud` exporter to your config. Configure the exporter to send to a GCP project where your service account has Logging Admin role. To check the permissions of service accounts in your project, go to the [IAM page](https://console.cloud.google.com/iam-admin/iam).
-6. Disable the `sending_queue` field on the GCP exporter. The sending queue introduces an asynchronous step to the pipeline, which will jeopardize the receiver's ability to checkpoint correctly and recover from errors. For this same reason, avoid using any asynchronous processors (e.g., batch processor).
+4. Configure a storage extension to store checkpointing data for the receiver.
+5. Configure the rest of the receiver fields according to your Splunk environment.
+6. Add a `googlecloud` exporter to your config. Configure the exporter to send to a GCP project where your service account has Logging Admin role. To check the permissions of service accounts in your project, go to the [IAM page](https://console.cloud.google.com/iam-admin/iam).
+7. Disable the `sending_queue` field on the GCP exporter. The sending queue introduces an asynchronous step to the pipeline, which will jeopardize the receiver's ability to checkpoint correctly and recover from errors. For this same reason, avoid using any asynchronous processors (e.g., batch processor).

After following these steps, your configuration should look something like this:
```yaml
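For orientation, here is a minimal sketch of a collector config that matches the updated README: storage is still wired up (as the new table text recommends for production) and the exporter's `sending_queue` is disabled per the renumbered steps above. Only the field names shown in the diff are taken from the README; the receiver type name, the `file_storage` settings, the omitted Splunk endpoint and credential fields, and the exporter details are assumptions, not part of this commit.

```yaml
extensions:
  file_storage:                  # assumed storage extension and settings
    directory: /otelcol/checkpoints

receivers:
  splunksearchapi:               # assumed component type name
    # Splunk endpoint and credential fields omitted in this sketch
    searches:
      - query: search index=my_index
        earliest_time: "2024-01-01T00:00"
        latest_time: "2024-01-31T23:59"
        event_batch_size: 100
    storage: file_storage        # optional after this change; recommended in production

exporters:
  googlecloud:
    sending_queue:
      enabled: false             # step 7: keep the pipeline synchronous

service:
  extensions: [file_storage]
  pipelines:
    logs:
      receivers: [splunksearchapi]
      exporters: [googlecloud]
```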
4 changes: 0 additions & 4 deletions receiver/splunksearchapireceiver/config.go
@@ -79,10 +79,6 @@ func (cfg *Config) Validate() error {
return errors.New("at least one search must be provided")
}

-if cfg.StorageID == nil {
-return errors.New("storage configuration is required for this receiver")
-}

for _, search := range cfg.Searches {
if search.Query == "" {
return errors.New("missing query in search")
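With the `cfg.StorageID == nil` check deleted above, a receiver block that omits `storage` entirely now passes `Validate()`. A minimal sketch of such a config (the receiver type name and search values are assumptions; without a storage extension, checkpoints are not persisted, so a restart can re-export events that were already sent):

```yaml
receivers:
  splunksearchapi:               # assumed component type name
    searches:
      - query: search index=my_index
        earliest_time: "2024-01-01T00:00"
        latest_time: "2024-01-31T23:59"
    # no storage field: accepted after this change, at the cost of
    # losing checkpoints across collector restarts
```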
3 changes: 1 addition & 2 deletions receiver/splunksearchapireceiver/config_test.go
@@ -140,8 +140,7 @@ func TestValidate(t *testing.T) {
LatestTime: "2024-10-30T14:00",
},
},
-errExpected: true,
-errText: "storage configuration is required for this receiver",
+errExpected: false,
},
{
desc: "Missing searches",
2 changes: 1 addition & 1 deletion receiver/splunksearchapireceiver/receiver.go
@@ -247,7 +247,7 @@ func (ssapir *splunksearchapireceiver) pollSearchCompletion(ctx context.Context,
}

func (ssapir *splunksearchapireceiver) createSplunkSearch(search Search) (string, error) {
timeFormat := "%Y-%m-%dT%H:%M:%S"
timeFormat := "%Y-%m-%dT%H:%M"
searchQuery := fmt.Sprintf("%s starttime=\"%s\" endtime=\"%s\" timeformat=\"%s\"", search.Query, search.EarliestTime, search.LatestTime, timeFormat)
ssapir.logger.Info("creating search", zap.String("query", searchQuery))
resp, err := ssapir.client.CreateSearchJob(searchQuery)
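For reference, a standalone sketch of the search string that `createSplunkSearch` now builds. The format string is copied from the diff above; the query and time values are made up. Dropping the seconds from `timeFormat` lines up with the minute-granularity `yyyy-MM-ddTHH:mm` format the README requires for `earliest_time` and `latest_time`.

```go
package main

import "fmt"

func main() {
	// Illustrative values; only the Sprintf format mirrors receiver.go.
	query := "search index=my_index"
	earliest := "2024-01-01T00:00"
	latest := "2024-01-31T23:59"
	timeFormat := "%Y-%m-%dT%H:%M" // updated value: no seconds component

	searchQuery := fmt.Sprintf("%s starttime=\"%s\" endtime=\"%s\" timeformat=\"%s\"",
		query, earliest, latest, timeFormat)
	fmt.Println(searchQuery)
	// Prints:
	// search index=my_index starttime="2024-01-01T00:00" endtime="2024-01-31T23:59" timeformat="%Y-%m-%dT%H:%M"
}
```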
