all-in-one with non-memory storage (Kubernetes) #740

Closed
emailtovamos opened this issue Oct 30, 2019 · 30 comments
Comments

@emailtovamos

I could find an example of an all-in-one Jaeger instance with in-memory storage, but there is no such example for doing it with Elasticsearch. Where can I find one?
I understand that one has to have Elasticsearch running already, and THEN one can incorporate the corresponding changes into that YAML file. But is there a simple way/file to get both the Jaeger instance AND Elasticsearch from the same file?

For someone like me who mainly wants persistent storage (with default options) and doesn't want to figure out/manage the details of Elasticsearch, this would really help.
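Roughly what I mean: a CR along these lines, assuming Elasticsearch is already reachable somewhere (the instance name and URL below are just placeholders):

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: with-elasticsearch
spec:
  storage:
    type: elasticsearch
    options:
      es:
        # placeholder URL; assumes an Elasticsearch cluster is already running
        server-urls: http://elasticsearch:9200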

@objectiser
Contributor

There is another example that uses Badger local storage - this should give you what you are looking for. We need to update the documentation to clearly outline this option.

@emailtovamos
Author

Thanks, I will try it out. But is there any other setup needed for Badger, like creating a volume separately? Or does your example already take care of it?

@objectiser
Contributor

@emailtovamos No additional setup - the example yaml sets up the volume for local storage.
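For reference, the relevant parts of that example look roughly like this (a sketch, not the exact file; the emptyDir volume is what backs the local storage):

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: with-badger
spec:
  storage:
    type: badger
    options:
      badger:
        ephemeral: false
        directory-key: /badger/key
        directory-value: /badger/data
  volumeMounts:
    - name: data
      mountPath: /badger
  volumes:
    - name: data
      emptyDir: {}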

@emailtovamos
Author

Thanks @objectiser - I tried the Badger storage as per the file you mentioned. It worked fine, as I could see my services in the UI. But just to test the persistence, I deleted the Jaeger instance pod. Another pod sprang back up as expected, but this time I could no longer see my services in the UI. Is this expected behaviour? I was expecting it to still show the old data.

@jpkrohling
Contributor

jpkrohling commented Oct 31, 2019

For production purposes, you would probably want to provision the storage yourself and specify the volume/volume mount in the Jaeger CR. The Jaeger Operator will only create emptyDir volumes, which effectively makes it only slightly better than ephemeral storage.

edit: I meant to say that our examples are using emptyDir, not that the operator will create emptyDir volumes (which doesn't make any sense...)

@emailtovamos
Author

You mean like the options shown here: https://www.jaegertracing.io/docs/1.14/operator/#storage-options ?

@jpkrohling
Contributor

The example that @objectiser mentioned, and that you are probably using, is the right way; just replace the emptyDir in the volume definition with a production-quality concrete storage type: https://kubernetes.io/docs/concepts/storage/volumes/#types-of-volumes
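Concretely, assuming the volume in that example is named data, the change is limited to the volumes entry, e.g. (the claim name here is just an example):

  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: jaeger-badger-data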

@emailtovamos
Author

OK, so I created a PVC:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: jaegerpvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "1Gi"
  storageClassName: "my-storageclass"

Then I referenced it in the Jaeger instance with Badger:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest
spec: 
  storage:
    type: badger
    options:
      badger:
        ephemeral: false
        directory-key: "/badger/key"
        directory-value: "/badger/data"
    volumeMounts:
    - name: data
      mountPath: /badger
    volumes:
    - name: data
      persistentVolumeClaim:
            claimName: jaegerpvc

I can see the PVC and PV fine when I do e.g. kubectl get pv --all-namespaces.
I was expecting the pod to restart and/or mention the PVC, but nothing changed. How can I make sure it is working as expected? Or am I missing another step?

@jpkrohling
Contributor

The indentation looks odd. Do you get any error messages when you try to apply this resource? Could you please start the operator with --log-level=debug and share the logs?
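For example, assuming the operator runs as the usual jaeger-operator Deployment, the flag can be added to its container args (an excerpt of the Deployment spec; the names here are assumptions):

      containers:
        - name: jaeger-operator
          args: ["start", "--log-level=debug"]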

@emailtovamos
Author

{"level":"debug","ts":1572625156.8930852,"caller":"app/span_processor.go:124","msg":"Span written to the storage by the collector","trace-id":"65c73a5837feb416","span-id":"65c73a5837feb416"}
{"level":"debug","ts":1572625157.892143,"caller":"processors/thrift_processor.go:116","msg":"Span(s) received by the agent","bytes-received":331}
{"level":"debug","ts":1572625157.8931003,"caller":"app/span_processor.go:124","msg":"Span written to the storage by the collector","trace-id":"5e915efb5e9f8483","span-id":"5e915efb5e9f8483"}
{"level":"debug","ts":1572625158.8920205,"caller":"processors/thrift_processor.go:116","msg":"Span(s) received by the agent","bytes-received":329}
{"level":"debug","ts":1572625158.892912,"caller":"app/span_processor.go:124","msg":"Span written to the storage by the collector","trace-id":"23c6edea6faf6a10","span-id":"23c6edea6faf6a10"}

@emailtovamos
Author

The above logs look as expected, right? I just deleted the pod and it restarted, but in the UI I could no longer see the older traces.

@objectiser
Contributor

objectiser commented Nov 4, 2019

@emailtovamos Can you try with the modified indentation as below:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest
spec: 
  storage:
    type: badger
    options:
      badger:
        ephemeral: false
        directory-key: "/badger/key"
        directory-value: "/badger/data"
  volumeMounts:
  - name: data
    mountPath: /badger
  volumes:
  - name: data
    persistentVolumeClaim:
       claimName: jaegerpvc

The volumes and volumeMounts should be at the same level as storage.

@emailtovamos
Author

emailtovamos commented Nov 4, 2019

Thanks @objectiser! It works now! When I delete the Jaeger pod and a new pod gets created, I can now see the traces from the old pod in the Jaeger UI.

One last question:
Is there any option so that it only keeps the latest 1GB of data, or the latest 7 days of data, or something similar? No matter how much storage I give my PVC, it will eventually fill up. What's the usual way to deal with this? I couldn't find such an option in the documentation: https://www.jaegertracing.io/docs/1.13/deployment/#badger-local-storage

@objectiser
Contributor

There is a badger.span-store-ttl option, which defaults to 72 hours; it can be found here: https://www.jaegertracing.io/docs/1.14/cli/#jaeger-all-in-one-badger
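In the CR it would go alongside the other badger options, for example (the value shown is just the default):

  storage:
    type: badger
    options:
      badger:
        ephemeral: false
        directory-key: /badger/key
        directory-value: /badger/data
        span-store-ttl: 72h0m0s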

@emailtovamos
Author

Thanks!
span-store-ttl: "72h0m0s"
Is the above formatting OK? I mean, with the quotes around the duration value.

@objectiser
Contributor

Yes, I believe so - let us know if you run into any problems with it.

@emailtovamos
Author

Thanks.
Since Badger isn't mentioned in the Storage Options section of this page (https://www.jaegertracing.io/docs/1.14/operator/#storage-options), can I add information about setting up Badger as discussed above here (https://github.com/jaegertracing/documentation/blob/master/content/docs/1.14/operator.md#storage-options) and open a pull request?

@objectiser
Contributor

Yes please!

@emailtovamos
Author

Sure, will do.

BTW, I tried to re-apply the YAML with the span-store-ttl value, and now it is no longer running; it gives the following Badger-related error ("Failed to init storage factory"). I am not sure whether this can be resolved without deleting the data, and if the data has to be deleted, how.

{ 
   "level":"fatal",
   "ts":1572870354.2647974,
   "caller":"all-in-one/main.go:105",
   "msg":"Failed to init storage factory",
   "error":"Unable to replay value log: \"/badger/data/000006.vlog\": Value log truncate required to run DB. This might result in data loss.",
   "errorVerbose":"Value log truncate required to run DB. This might result in data loss.\ngithub.com/jaegertracing/jaeger/vendor/github.com/dgraph-io/badger.init\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/dgraph-io/badger/errors.go:98\nruntime.doInit\n\t/home/travis/.gimme/versions/go1.13.4.linux.amd64/src/runtime/proc.go:5222\nruntime.doInit\n\t/home/travis/.gimme/versions/go1.13.4.linux.amd64/src/runtime/proc.go:5217\nruntime.doInit\n\t/home/travis/.gimme/versions/go1.13.4.linux.amd64/src/runtime/proc.go:5217\nruntime.doInit\n\t/home/travis/.gimme/versions/go1.13.4.linux.amd64/src/runtime/proc.go:5217\nruntime.doInit\n\t/home/travis/.gimme/versions/go1.13.4.linux.amd64/src/runtime/proc.go:5217\nruntime.main\n\t/home/travis/.gimme/versions/go1.13.4.linux.amd64/src/runtime/proc.go:190\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.13.4.linux.amd64/src/runtime/asm_amd64.s:1357\nUnable to replay value log: \"/badger/data/000006.vlog\"\ngithub.com/jaegertracing/jaeger/vendor/github.com/dgraph-io/badger.(*valueLog).Replay\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/dgraph-io/badger/value.go:772\ngithub.com/jaegertracing/jaeger/vendor/github.com/dgraph-io/badger.Open\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/dgraph-io/badger/db.go:306\ngithub.com/jaegertracing/jaeger/plugin/storage/badger.(*Factory).Initialize\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/badger/factory.go:119\ngithub.com/jaegertracing/jaeger/plugin/storage.(*Factory).Initialize\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/factory.go:108\nmain.main.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/all-in-one/main.go:104\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:826\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:914\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:864\nmain.main\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/all-in-one/main.go:171\nruntime.main\n\t/home/travis/.gimme/versions/go1.13.4.linux.amd64/src/runtime/proc.go:203\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.13.4.linux.amd64/src/runtime/asm_amd64.s:1357",
   "stacktrace":"main.main.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/all-in-one/main.go:105\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:826\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:914\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:864\nmain.main\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/all-in-one/main.go:171\nruntime.main\n\t/home/travis/.gimme/versions/go1.13.4.linux.amd64/src/runtime/proc.go:203"
}

@objectiser
Contributor

@emailtovamos An option to truncate was added recently. You could try this out by using the jaegertracing/all-in-one:latest image in the CR (under the allInOne node).
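For example (assuming badger.truncate is the option name exposed for the new flag):

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest
spec:
  allInOne:
    image: jaegertracing/all-in-one:latest
  storage:
    type: badger
    options:
      badger:
        ephemeral: false
        truncate: true
        directory-key: /badger/key
        directory-value: /badger/data
  # plus the same volumeMounts / volumes as in the CR above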

@objectiser
Contributor

If this is reproducible, could you provide the log of the pod that fails, from before you restart it and hit this "Failed to init storage factory" error? It might help us detect and avoid this failure.

@emailtovamos
Author

Thanks @objectiser. I added the truncate option, although I have yet to check whether it avoids that problem.
BTW, since I am setting a PVC for Badger with some amount of storage (e.g. 50GB), what happens once this storage limit is reached? Is there any automated way to handle this, or does one have to manually create more storage?

@objectiser
Contributor

@burmanm would you be able to answer?

@burmanm

burmanm commented Nov 6, 2019

I'm not sure what the question really is. If the database runs out of disk space, there's nothing it can do. It can't free space, since it can't write the deletes, and it can't rearrange the data either.

@emailtovamos
Author

Thanks @burmanm. A practical scenario that happened to me today:
The Badger database filled up and no new traces were being saved. But if I have set my span-store-ttl option to, say, 48h, and 48 hours have already passed since a high-frequency update, I should expect the pod to start writing traces again, right? Since some data would be "old" enough to get deleted.

@burmanm

burmanm commented Nov 7, 2019

No, it would not continue. Writes never happen in place; the SST files are immutable. So when the TTL expires, the next compaction will remove the old data (and write new SST files without the expired data). But since there's no disk space, the compaction cannot run.

Also, I would assume that at that point the WAL contains operations that are also in the memtable, which can't be flushed for proper compaction, so the WAL can't be cleaned either. To stay consistent, compaction can't really proceed, since it can't write all the data to disk and produce correctly sorted SST files.

So you should always keep enough free disk space to ensure that compactions can take place. The same applies to the Cassandra backend, as both are based on LSM trees.

@emailtovamos
Author

Thanks @burmanm for the detailed explanation.
How can I check whether the deletion is really happening? I have now given it enough storage and set 24 hours as the span-store-ttl. But when I checked the disk space used by the pod in the Google Cloud console, I did not see any drop in usage 24 hours after starting.

@emailtovamos
Author

Actually, when I search for the old traces I can't find them, which is the expected behaviour.

The only thing I'm worried about is the constant increase in disk usage. I was expecting it to stay around the level it reached at the end of the first 24 hours, but it has almost always been increasing, apart from a few drops. So no matter how much space I assign, there is always a chance of hitting the limit!
[screenshot: pod disk usage in the Google Cloud console, steadily increasing over time]

@pavolloffay
Member

pavolloffay commented Nov 11, 2019

@emailtovamos could you please open an issue in the main repository regarding Badger not cleaning up the data properly?
