Add oplog metricset to mongodb module #7604

Merged
merged 83 commits into from Aug 20, 2018

Conversation

a3dho3yn
Contributor

@a3dho3yn a3dho3yn commented Jul 15, 2018

Oplog size and window are two important indicators of replication health.
With this metricset, we get the following information about replication:

{
  "mongodb": {
    "oplog": {
      "size": {
        "allocated": 2605587456,
        "used": 2616684138
      },
      "first": {
        "ts": 6515806468564845000
      },
      "last": {
        "ts": 6578335797915681000
      },
      "window": 62529329350836220
    }
  }
}
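The `ts` values above look like raw BSON timestamps, where the upper 32 bits hold seconds since the Unix epoch and the lower 32 bits hold an ordinal. The test output later in this thread reports timestamps in seconds (1534161289 and 1534165281) with a window of 3992. A minimal sketch of how such a window could be derived, assuming that encoding; the `mongoTimestamp` type and the helper names are hypothetical stand-ins and are not taken from the PR:

```go
package main

import "fmt"

// mongoTimestamp is a hypothetical local stand-in for a BSON timestamp held
// in an int64: high 32 bits are seconds since the Unix epoch, low 32 bits
// are an ordinal ordering operations within the same second.
type mongoTimestamp int64

// newMongoTimestamp packs seconds and an ordinal into a single value.
func newMongoTimestamp(seconds, ordinal uint32) mongoTimestamp {
	return mongoTimestamp(uint64(seconds)<<32 | uint64(ordinal))
}

// seconds extracts the seconds part, discarding the ordinal.
func (ts mongoTimestamp) seconds() int64 {
	return int64(uint64(ts) >> 32)
}

// oplogWindow returns the time span, in seconds, between the first and the
// last oplog entries.
func oplogWindow(first, last mongoTimestamp) int64 {
	return last.seconds() - first.seconds()
}

func main() {
	first := newMongoTimestamp(1534161289, 1)
	last := newMongoTimestamp(1534165281, 7)
	fmt.Println(oplogWindow(first, last)) // prints 3992
}
```

Subtracting the seconds parts rather than the raw 64-bit values keeps the ordinal bits out of the result, so the window reads directly as a duration in seconds.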

@elasticmachine
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

used := event["size"].(common.MapStr)["used"].(int64)
assert.True(t, used > 0)

first_ts := event["first"].(common.MapStr)["ts"].(int64)

don't use underscores in Go names; var first_ts should be firstTs


// get first and last items in the oplog
oplog_iter := collection.Find(nil).Sort("$natural").Iter()
oplog_reverse_iter := collection.Find(nil).Sort("-$natural").Iter()

don't use underscores in Go names; var oplog_reverse_iter should be oplogReverseIter

used := int64(oplogStatus["size"].(float64))

// get first and last items in the oplog
oplog_iter := collection.Find(nil).Sort("$natural").Iter()

don't use underscores in Go names; var oplog_iter should be oplogIter

return false
}

func New(base mb.BaseMetricSet) (mb.MetricSet, error) {

exported function New should have comment or be unexported

mb.DefaultMetricSet())
}

type MetricSet struct {

exported type MetricSet should have comment or be unexported

"gopkg.in/mgo.v2/bson"
)

const oplog_col = "oplog.rs"

don't use underscores in Go names; const oplog_col should be oplogCol

@kvch
Contributor

kvch commented Jul 16, 2018

jenkins test this

@ruflin
Contributor

ruflin commented Jul 16, 2018

Could you add a changelog entry?

Member

@jsoriano jsoriano left a comment

Thanks for working on this! :) It looks quite good. I have added some comments; the only serious issue is the failing test.

var debugf = logp.MakeDebug("mongodb.oplog")

func init() {
logp.Info("initializing oplog")
Member

We don't usually log initializations.


--

*`mongodb.oplog.last.ts`*::
Member

I wouldn't abbreviate timestamp field names; what about first.time or first.timestamp?

}

firstTs := int64(first.(bson.M)["ts"].(bson.MongoTimestamp))
lastTs := int64(last.(bson.M)["ts"].(bson.MongoTimestamp))
Member

Add checks for type conversions here
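A checked conversion can use Go's comma-ok form of the type assertion, which reports failure instead of panicking. A minimal sketch, assuming a plain `map[string]interface{}` in place of `bson.M` and a hypothetical local `mongoTimestamp` type in place of `bson.MongoTimestamp`; the helper name `timestampOf` is also made up for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// mongoTimestamp is a hypothetical stand-in for bson.MongoTimestamp.
type mongoTimestamp int64

// timestampOf returns the "ts" field of an oplog entry. It returns an error
// instead of panicking when the entry or the field has an unexpected type.
func timestampOf(entry interface{}) (int64, error) {
	doc, ok := entry.(map[string]interface{})
	if !ok {
		return 0, errors.New("oplog entry is not a document")
	}
	ts, ok := doc["ts"].(mongoTimestamp)
	if !ok {
		return 0, errors.New("oplog entry has no valid ts field")
	}
	return int64(ts), nil
}

func main() {
	good := map[string]interface{}{"ts": mongoTimestamp(42)}
	bad := map[string]interface{}{"ts": "not a timestamp"}

	ts, err := timestampOf(good)
	fmt.Println(ts, err) // prints: 42 <nil>

	ts, err = timestampOf(bad)
	fmt.Println(ts, err) // prints: 0 oplog entry has no valid ts field
}
```

In the metricset itself the assertions would target `bson.M` and `bson.MongoTimestamp`; the stand-ins only keep the sketch free of external dependencies.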

// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
Member

Add this build tag so that the file is compiled and executed only when running integration tests:

// +build integration

I think this is the reason why CI builds are failing.

Contributor Author

Oh yes! My bad :(

// New creates a new instance of the MetricSet
// Part of new is also setting up the configuration by processing additional
// configuration entries if needed.
func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
Member

Oh, also add the experimental warning here, something like:

cfgwarn.Experimental("The mongodb oplog metricset is experimental.")

@a3dho3yn
Contributor Author

@jsoriano Thanks for your comments. I've fixed these issues and pushed them to my branch.

@jsoriano
Member

jsoriano commented Jul 17, 2018

@a3dho3yn unfortunately I cannot see the fixes, could you check that you pushed to the branch used for this PR?

@a3dho3yn
Contributor Author

@jsoriano We just finished our work, but I have a problem with the integration tests.
When I run make test-module on my machine, everything goes well:

=== RUN   TestFetch
time="2018-08-13T17:31:23+04:30" level=info msg="[0/35] [mongodb]: Starting "
time="2018-08-13T17:31:23+04:30" level=warning msg="Error while reading .dockerignore (/home/ho3yn/go/src/github.com/elastic/beats/metricbeat/module/mongodb/_meta/.dockerignore) : open /home/ho3yn/go/src/github.com/elastic/beats/metricbeat/module/mongodb/_meta/.dockerignore: no such file or directory"
time="2018-08-13T17:31:23+04:30" level=info msg="Building metricbeat_mongodb..."
time="2018-08-13T17:31:23+04:30" level=info msg="Recreating mongodb"
time="2018-08-13T17:31:23+04:30" level=info msg="[1/35] [mongodb]: Started "
--- PASS: TestFetch (3.76s)
        replstatus_integration_test.go:49: mongodb/replstatus event: {"headroom":{"max":null,"min":null},"lag":{"max":null,"min":null},"members":{"arbiter":{"count":0,"hosts":null},"down":{"count":0,"hosts":null},"primary":{"host":"9c951bf3e380:27017","optime":1534165281},"recovering":{"count":0,"hosts":null},"rollback":{"count":0,"hosts":null},"secondary":{"count":0,"hosts":null,"optimes":null},"startup2":{"count":0,"hosts":null},"unhealthy":{"count":0,"hosts":null},"unknown":{"count":0,"hosts":null}},"oplog":{"first":{"timestamp":1534161289},"last":{"timestamp":1534165281},"size":{"allocated":1038090240,"used":38484},"window":3992},"optimes":{"applied":1534165281,"durable":1534165281,"last_committed":1534165281},"server_date":"2018-08-13T17:31:27.055+04:30","set_name":"beats"}
=== RUN   TestData
--- SKIP: TestData (1.13s)
        data_generator.go:44: skip data generation tests
PASS
ok      github.com/elastic/beats/metricbeat/module/mongodb/replstatus   (cached)

But the tests are failing in CI with a "no reachable servers" error :(

@jsoriano
Member

jsoriano commented Aug 14, 2018

I have been trying this out: the tests fail if they run right after the container starts, and they pass if the container was already running (or was started in a previous execution). So this can probably be solved by improving the healthcheck, which currently only checks whether the port is open.

On the other hand, I have also seen that the tests only fail in the replstatus metricset, the only one that sets the session mode to strong with mongoSession.SetMode(mgo.Strong, true). I wonder if this is really needed. If this line is removed, the tests pass too.

@a3dho3yn
Contributor Author

a3dho3yn commented Aug 14, 2018 via email

@jsoriano
Member

jenkins, test this please

Member

@jsoriano jsoriano left a comment

It is looking good, I see it being merged soon 🙂
Only some small comments left.

@@ -0,0 +1,30 @@
{
Member

Could this file be updated?

func init() {
	mb.Registry.MustAddMetricSet("mongodb", "replstatus", New,
		mb.WithHostParser(mongodb.ParseURL),
		mb.DefaultMetricSet())
Member

If this requires a replica set to work, maybe it'd be better to make it non-default.

myState, ok := status["myState"].(int)
t.Logf("Mongodb state is %d", myState)
if ok && myState == 1 {
	time.Sleep(5 * time.Second) // hack, wait more for replica set to become stable
Member

Is there any way to detect this stability? 🙂

In any case, we can leave the sleep in for now and revisit it later.

if ok && myState == 1 {
	time.Sleep(5 * time.Second) // hack, wait more for replica set to become stable
	break
}
Member

Could we add a sleep after every retry?

Contributor Author

Yes we can, but I see no reason to do this.
As you mentioned before, we should wait for some condition instead of sleeping. At first I thought that waiting for a primary node would be sufficient for running the test, but in practice it needs something more than a node in the primary state. As I couldn't find any condition to wait for, I used this sleep expression.

If you're worried about this loop burning CPU, I should note that the state changes very fast (like 5 3 2 2 1), so it doesn't seem to be an issue.

Member

Ok, let's leave it like this for now.

@jsoriano
Member

jenkins, test this

@jsoriano jsoriano merged commit ca8f56b into elastic:master Aug 20, 2018
jsoriano pushed a commit to jsoriano/beats that referenced this pull request Aug 23, 2018
Add metrics about replication health.

Co-authored-by: Hossein Taleghani <[email protected]>
Co-authored-by: Bahar Taghavi <[email protected]>
(cherry picked from commit ca8f56b)
ruflin pushed a commit that referenced this pull request Aug 24, 2018
Add metrics about replication health.

Co-authored-by: Hossein Taleghani <[email protected]>
Co-authored-by: Bahar Taghavi <[email protected]>
(cherry picked from commit ca8f56b)
6 participants