-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add oplog metricset to mongodb module #7604
Conversation
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually? |
used := event["size"].(common.MapStr)["used"].(int64) | ||
assert.True(t, used > 0) | ||
|
||
first_ts := event["first"].(common.MapStr)["ts"].(int64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use underscores in Go names; var first_ts should be firstTs
|
||
// get first and last items in the oplog | ||
oplog_iter := collection.Find(nil).Sort("$natural").Iter() | ||
oplog_reverse_iter := collection.Find(nil).Sort("-$natural").Iter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use underscores in Go names; var oplog_reverse_iter should be oplogReverseIter
used := int64(oplogStatus["size"].(float64)) | ||
|
||
// get first and last items in the oplog | ||
oplog_iter := collection.Find(nil).Sort("$natural").Iter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use underscores in Go names; var oplog_iter should be oplogIter
return false | ||
} | ||
|
||
func New(base mb.BaseMetricSet) (mb.MetricSet, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported function New should have comment or be unexported
mb.DefaultMetricSet()) | ||
} | ||
|
||
type MetricSet struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported type MetricSet should have comment or be unexported
"gopkg.in/mgo.v2/bson" | ||
) | ||
|
||
const oplog_col = "oplog.rs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use underscores in Go names; const oplog_col should be oplogCol
jenkins test this |
Could you add a changelog entry? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! :) It looks quite good, I have added some comments, the only serious thing is the failing test.
var debugf = logp.MakeDebug("mongodb.oplog") | ||
|
||
func init() { | ||
logp.Info("initializing oplog") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use to log initializations.
metricbeat/docs/fields.asciidoc
Outdated
|
||
-- | ||
|
||
*`mongodb.oplog.last.ts`*:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't abbreviate timestamps field names, what about first.time
or first.timestamp
?
} | ||
|
||
firstTs := int64(first.(bson.M)["ts"].(bson.MongoTimestamp)) | ||
lastTs := int64(last.(bson.M)["ts"].(bson.MongoTimestamp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add checks for type conversions here
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
// KIND, either express or implied. See the License for the | ||
// specific language governing permissions and limitations | ||
// under the License. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this selector so this is executed only when running integration tests:
// +build integration
I think this is the reason why CI builds are failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes! My bad :(
// New creates a new instance of the MetricSet | ||
// Part of new is also setting up the configuration by processing additional | ||
// configuration entries if needed. | ||
func New(base mb.BaseMetricSet) (mb.MetricSet, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, add also here the experimental warning, something like:
cfgwarn.Experimental("The mongodb oplog metricset is experimental.")
@jsoriano Thanks for your comments. I've fixed these issues and pushed them to my branch. |
@a3dho3yn unfortunately I cannot see the fixes, could you check that you pushed to the branch used for this PR? |
@jsoriano We just finished our work but I have a problem with the integration tests.
But tests are failing in the CI due to |
I have been trying and the tests fail if they are run just after starting the container and it passes if the container was already started beforehand (or in a previous execution), so this is something that can be probably solved by improving the healthcheck, that currently only checks if the port is open. On the other hand, I have also seen that tests only fail in the |
Optimes should be calculated from primray's view. That's why we used the
strong mode.
When there is one node and we initiate the replica set, this member should
become primary and there should be no problem with forcing commands to run
against primary.
May be initiating takes time and we should wait for it. I will check this.
Thanks for your insights:)
…On Tue, Aug 14, 2018 at 7:53 PM Jaime Soriano Pastor < ***@***.***> wrote:
I have been trying and the tests fails if the test is run just after start
the container and it passes if the container was already started
beforehand, so this is something that can be probably solved by improving
the healthcheck, that currently only checks if the port is open.
On the other hand, I have also seen that tests only fail in the replstatus
metricset, the only one setting the session mode to strong with mongoSession.SetMode(mgo.Strong,
true). I wonder if it this is really needed. If this line is removed,
tests pass too.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7604 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AFk_xLFKaNQkjdHhLmwSLdOXr9RHBhakks5uQutagaJpZM4VQKYb>
.
|
jenkins, test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is looking good, I see it being merged soon 🙂
Only some small comments left.
@@ -0,0 +1,30 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this file be updated?
func init() { | ||
mb.Registry.MustAddMetricSet("mongodb", "replstatus", New, | ||
mb.WithHostParser(mongodb.ParseURL), | ||
mb.DefaultMetricSet()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this requires replicaset to work maybe it'd be better to make it non-default.
myState, ok := status["myState"].(int) | ||
t.Logf("Mongodb state is %d", myState) | ||
if ok && myState == 1 { | ||
time.Sleep(5 * time.Second) // hack, wait more for replica set to become stable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to detect this stability? 🙂
We can leave it by now with the sleep in any case and revisit it later.
if ok && myState == 1 { | ||
time.Sleep(5 * time.Second) // hack, wait more for replica set to become stable | ||
break | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a sleep after every retry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can, but I see no reason to do this.
As you mentioned before, we should wait for some condition instead of sleeping. First I thought we should wait for a primary node, and I expected this condition to be sufficient for running the test. But then -in action- I figured out it needs something more than a node in the primary state. As I didn't find any condition to wait for, I used this sleep expresion.
If you're worried about blowing up CPU with this loop, I should note that state changes very fast (like 5 3 2 2 1) and it doesn't seem to be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's leave it like this by now.
jenkins, test this |
Add metrics about replication health. Co-authored-by: Hossein Taleghani <[email protected]> Co-authored-by: Bahar Taghavi <[email protected]> (cherry picked from commit ca8f56b)
Add metrics about replication health. Co-authored-by: Hossein Taleghani <[email protected]> Co-authored-by: Bahar Taghavi <[email protected]> (cherry picked from commit ca8f56b)
Oplog size and window are two important metrics which show replication health.
With this metric set, we can have this information about replication: