Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Filebeat][Salesforce] Add Salesforce filebeat module #2

Merged

Conversation

yug-crest
Copy link

@yug-crest yug-crest commented Dec 8, 2021

  • Enhancement

What does this PR do?

This PR contains event parsing, ECS Field Mappings and ingest pipelines for the filesets Login and Logout as a part of Salesforce Filebeat module.
The fields from Salesforce REST API and the Salesforce Streaming API have been taken into consideration.

Why is it important?

  • It is a prerequisite for the collection and parsing of the Login and Logout event data for the Salesforce Observability Connector.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Copy link

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one, @yug-crest!

It looks correct to me, the only missing part is the ingest pipelines tests. It would be beneficial to add them to check if ingest pipelines work correctly. I understand that you can't run the full salesforce module without the Kush's PR, right?

I also acknowledge that there isn't any CHANGELOG.next entry as this PR will be merged into a feature branch. Please remember to add an appropriate entry while merging into the main branch.

- module: salesforce
# All logs
login:
enabled: false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that any of these filesets should be enabled by default? I mean, whenever user enables the filebeat module.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, I thought of the same and tried that out earlier. But I faced the following, and hence decided to go with false.
salesforce module dataset login must be explicitly disabled (needs enabled: false)

# Set custom paths for the log files. If left empty,
# Filebeat will choose the paths depending on your OS.
# Oauth Client ID
# var.clientid: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: var.client_id

# var.clientid: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Oauth Client Secret
# var.clientsecret: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: var.client_secret

- name: access_mode
type: keyword
description: >
The mode of collecting logs from salesforce - "rest" or "stream".
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Salesforce (capital)

- name: salesforce.login
type: group
# TODO: What should be the appropriate release value?
release: ga
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not a GA, let's start with "experimental" or "beta"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks!

Comment on lines 104 to 110
field: json.LOGIN_KEY
target_field: salesforce.login.login_key
ignore_missing: true
- rename:
field: json.LoginKey
target_field: salesforce.login.login_key
ignore_missing: true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question: LOGIN_KEY, LoginKey

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOGIN_KEY: comes from httpjson (REST API)
LoginKey: comes from cometd (Streaming API)
This was the case. But the filesets would be updated to login-rest and login-stream to separate them and it should not be an issue anymore.

Comment on lines 219 to 225
field: json.USER_NAME
target_field: user.email
ignore_missing: true
- rename:
field: json.Username
target_field: user.email
ignore_missing: true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just double-checking: is username ALWAYS the email?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per this article, the username must be in the format of an email address, for example, [email protected]. The email used in your username need not function or match the email address used for the account. In addition to this, in all the live responses that we have seen so far, the username was an email address.

I hope this clarifies, please let me know if you think we should still change the mapping to user.name. Thanks!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking this! I recommend leaving a comment in this file to preserve any confusion in the future.

- name: salesforce.logout
type: group
# TODO: What should be the appropriate release value?
release: ga
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: let's go with experimental or beta

enabled: true
client.id: {{ .clientid }}
client.secret: {{ .clientsecret}}
token_url: https://login.salesforce.com/services/oauth2/token
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My knowledge might be outdated, but Salesforce still hasn't provided any self-hosted options, correct? Is it always https://login.salesforce.com/?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. And yes, it's always https://login.salesforce.com/ (reference). Anyways, let's add token_url as a user-configurable variable with the default value as https://login.salesforce.com/services/oauth2/token. Does that make sense?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it makes sense. Thanks for clarifying it!

If you want to mock API in integration tests, it would be a good option to use.

Comment on lines 2 to 3
- id: Filebeat-salesforce-login-Dashboard
file: Filebeat-salesforce-login.json
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is missing. I suggest to remove this entry until you create a dashboard presenting collected Salesforce data.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, will do

@kush-elastic kush-elastic changed the title Add Salesforce filebeat module [Filebeat][Salesforce] Add Salesforce filebeat module Dec 9, 2021
@yug-crest yug-crest self-assigned this Dec 9, 2021
@yug-crest yug-crest marked this pull request as ready for review December 15, 2021 14:32
# var.client_id: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Oauth Client Secret
# var.client_secret: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marc-gr Do you know if there is an option to set secrets in one place instead of per-config?

copy_from: salesforce.setup_audit_trail.created_by_id
ignore_failure: true
- set:
field: temp
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe let's name it more accurately in case we need another temp :) temp_setup_audit_trail_display?

# URL, should include the instance_url
# var.url: "https://instance_id.my.salesforce.com"

setupaudittrail-rest:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should name it: setup-audit-trail. @yug-crest Did you face any problems with dashes there? I'm just considering which form fits better: setupaudittrail vs setup-audit-trail.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid point, I thought of using setupaudittrail as we are using the convention of -rest and -stream suffix in all the filesets so as to avoid hyphen inconsistencies. To answer your question, I did not face any issues with the dashes, we can definitely go with setup-audit-trail-rest. Another option would be setup_audit_trail-rest.
Please let me know if you think I should make any changes. Thanks!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could please ping @akshay-saraswat about the preference?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a preference. setup-audit-trail-rest is certainly more readable. My suggestion would be to align ourselves with how Salesforce handles it (i.e. setupaudittrail) because customers would be familiar with that. Please let me know your thoughts.

https://developer.salesforce.com/docs/atlas.en-us.object_reference.meta/object_reference/sforce_api_objects_setupaudittrail.htm

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, setupaudittrail might be easier to grep for :) Let's stick to that.

@@ -0,0 +1,58 @@
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you copy manually or generate -expected.json files?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generated the -expected.json file by running the pipeline after importing the processors and adding the sample document in the Stack Management > Ingest Pipelines > Create Pipeline section in Kibana. We have added the generated files out here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but it isn't the right way to generate them. You should use Python integration tests to generate them. See this comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried running the following command and faced an issue:

Command: PYTEST_ADDOPTS="-k test_xpack_modules" GENERATE=1 TESTING_FILEBEAT_MODULES=salesforce mage pythonIntegTest

Issue:

<?xml version="1.0" encoding="utf-8"?>
<testsuites>
    <testsuite name="pytest" errors="0" failures="11" skipped="0" tests="11" time="116.958" timestamp="2021-12-17T09:55:04.390402" hostname="ec0cb65ff549">
        <testcase classname="x-pack.filebeat.tests.system.test_xpack_modules.XPackTest" name="test_fileset_file_00_salesforce" file="filebeat/tests/system/test_modules.py" line="77" time="10.512">
            <failure message="beat.beat.TimeoutError: Timeout waiting for 'cond' to be true. Waited 10 seconds.">

@kush-elastic was also facing this error before as well and we were trying to resolve it.
Are you aware of any specific reason why this could be happening?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check for any errors in the Docker logs?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me drop a comment from @ChrsMark:

Hey! This
Timeout waiting for 'cond' to be true. Waited 10 seconds.
message looks like the beats does not actually generate events that the test expects to find, no?
In such cases Filebeat’s logs would help, and iirc they can find them under _build directory where the beats started by the tests actually log.

Copy link
Author

@yug-crest yug-crest Dec 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found this issue in the log. We analysed that it's the first parameter of the first fileset inside the module. Is this error because it's running in our local machines because of some reason?
We were thinking of letting it run in the CI once and verifying if face the same issue there.

{
    "log.level": "error",
    "@timestamp": "2021-12-17T10:41:48.922-0200",
    "log.origin": {
        "file.name": "instance/beat.go",
        "file.line": 994
    },
    "message": "Exiting: error getting config for fileset salesforce/apex-rest: Error interpreting the template of the input: template: text:7:16: executing \"text\" at <.client_id>: map has no entry for key \"client_id\"",
    "service.name": "filebeat",
    "ecs.version": "1.6.0"
}

Copy link

@sayden sayden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all log files must end in newline (even if it's empty)

- `logout-stream` fileset: supports Salesforce Logout logs received from the Streaming API.
- `apex-rest` fileset: supports Salesforce Apex logs received from the REST API.
- `setupaudittrail-rest` fileset: supports logs generated when admins make in your org’s Setup area.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be helpful to explain somewhere that for the rest of the events, users could use inputs provided above?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, let me update that. Thanks!

@mtojek
Copy link

mtojek commented Dec 23, 2021

I can't approve this PR as I don't have rights, but let's assume it's "approved" and merge this.

What hasn't been tested is integration tests and we'll focus on them, while contributing to elastic/beats repository.

@kush-elastic kush-elastic merged this pull request into salesforce_observability_connector Dec 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants