jaeger-spark-dependencies failing to access AWS Elasticsearch #668

mehstg · 2019-09-20T08:44:19Z

Currently using the Jaeger Operator version 1.13.1 with an Amazon Elasticsearch v6.8 backend.

The 'jaeger-spark-dependencies' pods are currently erroring out with the following message:
19/09/17 23:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/09/17 23:56:27 INFO ElasticsearchDependenciesJob: Running Dependencies job for 2019-09-17T00:00Z, reading from jaeger-span-2019-09-17 index, result storing to jaeger-dependencies-2019-09-17
19/09/17 23:57:28 ERROR NetworkClient: Node [https://10.3.146.146:9200] failed (java.net.SocketTimeoutException: connect timed out); no other nodes left - aborting...
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

All other pods can communicate with the ES backend successfully. Any ideas?

pavolloffay · 2019-09-20T09:50:40Z

Did you try configuring ES_NODES_WAN_ONLY/elasticsearchNodesWanOnly in the dependencies spec?

mehstg · 2019-09-20T10:55:16Z

Hi there

I am unsure where that would go in the spec. The documentation does not seem clear on this. What do you mean by 'dependencies spec'?

pavolloffay · 2019-09-20T11:53:45Z

It goes in the Jaeger CR, inside the dependencies node: https://godoc.org/github.com/jaegertracing/jaeger-operator/pkg/apis/jaegertracing/v1#JaegerDependenciesSpec

mehstg · 2019-09-23T10:42:23Z

Like this?

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: tracing
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: <ES URL>
  dependencies:
    ElasticsearchNodesWanOnly: true

pavolloffay · 2019-09-23T12:54:48Z

It looks correct. You can verify the config by inspecting the cron job spec created by the operator. There should be an environment variable defined for this option.
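
For anyone who wants to script that check, here is a minimal Python sketch. It assumes the CronJob manifest has already been fetched as JSON (for example with kubectl get cronjob <name> -n <namespace> -o json) and parsed into a dict. The helper name cronjob_env and the sample manifest are hypothetical and only illustrate the idea; the only thing taken as given is the standard Kubernetes CronJob layout, spec.jobTemplate.spec.template.spec.containers[].env.

```python
def cronjob_env(manifest: dict) -> dict:
    """Return {name: value} for every env var on every container of a CronJob."""
    pod_spec = manifest["spec"]["jobTemplate"]["spec"]["template"]["spec"]
    env = {}
    for container in pod_spec.get("containers", []):
        for var in container.get("env", []):
            # Variables populated via valueFrom have no literal "value" field.
            env[var["name"]] = var.get("value")
    return env


# Hypothetical, trimmed-down manifest used purely for illustration.
sample = {
    "spec": {
        "jobTemplate": {
            "spec": {
                "template": {
                    "spec": {
                        "containers": [
                            {
                                "name": "jaeger-spark-dependencies",
                                "env": [
                                    {"name": "STORAGE", "value": "elasticsearch"},
                                    {"name": "ES_NODES_WAN_ONLY", "value": "false"},
                                ],
                            }
                        ]
                    }
                }
            }
        }
    }
}

env = cronjob_env(sample)
print("ES_NODES_WAN_ONLY =", env.get("ES_NODES_WAN_ONLY"))
```

If the printed value does not match what the CR asks for, the option was most likely not picked up from the CR.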

mehstg · 2019-09-24T08:09:39Z

Confirmed that this still shows false in the cron job. The documentation does not seem to say anywhere where this should go or how it should be formatted in the YAML.

pavolloffay · 2019-09-24T08:19:12Z

We haven't documented every possible option in the CR. I often point folks to godoc to see all the possible options, however it might be harder to read for non-Go developers.

Unfortunately there is a mistake: the property has to start with a lowercase letter. The name of the property is in the json: annotation in the godoc.

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: tracing
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: <ES URL>
  dependencies:
    elasticsearchNodesWanOnly: true

mehstg · 2019-09-24T08:20:19Z

Perfect! Yes, I am not a Go developer, hence struggling to understand it. Thanks for your help.

pavolloffay · 2019-09-24T12:02:38Z

Please let us know if it worked.

mehstg · 2019-09-25T16:36:21Z

Hi Pavol

Unfortunately not. I can still see:

      ES_CLIENT_NODE_ONLY:  false
      ES_NODES_WAN_ONLY:    false

when I describe the job.

pavolloffay · 2019-09-26T07:49:22Z

Alright, we are making one more mistake: the dependencies node should be nested under storage.

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: tracing
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: <ES URL>
    dependencies:
      elasticsearchNodesWanOnly: true
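
The two mistakes hit so far in this thread (the node placed directly under spec, and a property name starting with a capital letter) could be caught with a small check before applying the CR. Below is a minimal Python sketch, assuming the CR has already been loaded into a dict. The function name lint_dependencies and its messages are hypothetical and not part of the operator.

```python
def lint_dependencies(cr: dict) -> list:
    """Return human-readable problems with the dependencies node of a Jaeger CR."""
    problems = []
    spec = cr.get("spec", {})
    # In this thread, options placed directly under spec were not picked up.
    if "dependencies" in spec:
        problems.append("found spec.dependencies; expected spec.storage.dependencies")
    deps = spec.get("storage", {}).get("dependencies", {})
    for key in deps:
        # Property names follow the json: annotations and start lowercase.
        if key[:1].isupper():
            problems.append(f"{key!r} should start with a lowercase letter")
    return problems


misplaced = {
    "spec": {
        "storage": {"type": "elasticsearch"},
        "dependencies": {"elasticsearchNodesWanOnly": True},
    }
}
fixed = {
    "spec": {
        "storage": {
            "type": "elasticsearch",
            "dependencies": {"elasticsearchNodesWanOnly": True},
        }
    }
}

print(lint_dependencies(misplaced))
print(lint_dependencies(fixed))
```

The first call reports the misplaced node and the second returns an empty list.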

mehstg · 2019-09-30T09:34:22Z

Thanks for that. I can now see:

      ES_NODES_WAN_ONLY:    true

Unfortunately my job is still failing with the same error:

kubectl logs jaeger-spark-dependencies-1569801300-57frm -n tracing
19/09/30 00:03:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/09/30 00:03:27 INFO ElasticsearchDependenciesJob: Running Dependencies job for 2019-09-30T00:00Z, reading from jaeger-span-2019-09-30 index, result storing to jaeger-dependencies-2019-09-30
19/09/30 00:04:28 ERROR NetworkClient: Node [https://vpc-prod-eu-west-1-prod01-jaeger-pzzdivavqjkmvtkxccfwua6ekm.eu-west-1.es.amazonaws.com:9200] failed (java.net.SocketTimeoutException: connect timed out); no other nodes left - aborting...
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

pavolloffay · 2019-09-30T09:36:23Z

@mehstg could you please paste the full Jaeger CR? I would like to see the full configuration, especially whether you are using TLS.

mehstg · 2019-09-30T10:06:37Z

Of course.

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: tracing
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://vpc-prod-eu-west-1-prod01-jaeger-pzzdivavqjkmvtkxccfwua6ekm.eu-west-1.es.amazonaws.com
    dependencies:
      elasticsearchNodesWanOnly: true
  ingress:
    enabled: false
  agent:
    strategy: DaemonSet
  collector:
    image: jaegertracing/jaeger-collector:1.14.0
  query:
    image: jaegertracing/jaeger-query:1.14.0
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
    iam.amazonaws.com/role: prod-eu-west-1-prod01-pod-tracing-jaeger

I have just noticed we are pinning particular versions of the collector/query. Not sure if this could cause an issue.

pavolloffay · 2019-09-30T11:43:42Z

The version should not matter much in this case, but I would not recommend pinning the image versions. The Jaeger operator knows best which version of each component should be used.

I cannot debug your use case, but you can try the following configuration:

Use http://vpc-prod-eu-west-1-prod01-jaeger-pzzdivavqjkmvtkxccfwua6ekm.eu-west-1.es.amazonaws.com (note http instead of https) as the URL for spark-dependencies
Set ES_CLIENT_NODE_ONLY to true

You can either edit the job spec manually, which requires undeploying the operator, or deploy spark-dependencies manually without the operator.

https://github.com/jaegertracing/spark-dependencies#elasticsearch

mehstg · 2019-10-02T07:54:38Z

I have no issue undeploying/redeploying. I cannot connect to ES via http though. It is blocked on the security group and we will not be able to modify that in production for security reasons.
I did see on another thread that someone seemed to fix this by using the flag es.net.ssl = true; however, I haven't managed to find that in the operator yet.

mehstg · 2019-10-04T13:03:08Z

@pavolloffay Is there any way I can just disable the jaeger-spark-dependencies? I am not sure it is even functionality I am using.

pavolloffay · 2019-10-04T16:03:48Z

Yes, set enabled: false in the dependencies node.

Crevil · 2019-10-13T12:44:29Z

I also have problems getting the dependencies job to run. I've tried combinations of elasticsearchNodesWanOnly and elasticsearchClientNodeOnly with no luck. I'm unsure whether this is an issue with the operator setting up the jobs or with the jobs themselves.
I'm thinking it's the latter, as the configuration changes are reflected in the job environment variables just fine.

Any leads on what I can do to debug further?

Crevil · 2019-10-13T14:06:26Z

I think I might be on to something. If server-urls contains a port number, it is not propagated to ES_NODES, which then defaults to 9200.
For AWS hosts the port should be 443 when accessing them over HTTPS.

I'll see if I can replicate this in a simple setup.

pavolloffay · 2019-10-14T09:09:03Z

ES_NODES should contain exactly the value of server-urls

jaeger-operator/pkg/cronjob/spark_dependencies.go

Line 137 in a059701

{Name: "ES_NODES", Value: sFlagsMap["es.server-urls"]},

Could you please paste here the Jaeger CR and the job spec that do not contain the same values for the ES URLs?

Crevil · 2019-10-14T09:56:38Z

I didn't get any further with it yesterday. Just wanted to add a bit more context. I'm back at my computer tomorrow and will make sure to post the info you asked for.

Giving this more thought: could it be that the two components have different defaults? We specified the host name in server-urls without a port number and it worked for everything except the dependencies job.

Crevil · 2019-10-15T06:27:00Z

OK. Specifying the port number did in fact work as you expected. I had a bad configuration of the Jaeger instance, so the change was not reflected in the jobs.

In other words: setting elasticsearchNodesWanOnly: true and making sure to specify the port of the AWS Elasticsearch host makes the jobs work as expected.

spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://vpc-jaeger-tracing-unique.region.es.amazonaws.com:443
    dependencies:
      enabled: true
      elasticsearchNodesWanOnly: true

This leaves my assumption from above that the two components handle default values differently.
The collector worked fine without port 443 in the host, but the jobs did not, much like the host @mehstg specified in this issue.

Crevil · 2019-10-15T07:14:08Z

It does indeed look to be the case.

In the Jaeger collector no port number is added if none is specified (elastic.SetURL(c.Servers...)), but the spark-dependencies job adds a default port number of 9200:

* `ES_NODES`: A comma separated list of elasticsearch hosts advertising http. Defaults to
              localhost. Add port section if not listening on port 9200. (...)

So this confirms the odd behaviour of the collector working while the jobs do not.

I guess the collector is covered by the jaeger-collector documentation, which states that a full URL must be specified:

--es.server-urls string    The comma-separated list of Elasticsearch servers, must be full url i.e. http://localhost:9200 (default "http://127.0.0.1:9200")

It might be nice if the operator guarded against this as well or maybe the collector would fail to start if the port is missing. What do you think?

pavolloffay · 2019-10-15T08:41:02Z

@Crevil thanks for digging into this!

I do not understand how the collector can work when the port number is missing.

pavolloffay · 2019-10-15T09:50:15Z

Note that port 9200 in spark-dependencies is added automatically by the Spark ES connector.

pavolloffay · 2019-10-18T09:19:45Z

I think the job should automatically set es.nodes.wan.only if ES_NODES is specified. Most people are struggling with this.

Then we can check the hosts string and generate a warning if the port is missing.
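
A rough illustration of such a check, written in Python rather than in the job's own code. The function name hosts_missing_port and the example host es.example.com are hypothetical; the only behaviour assumed from this thread is that ES_NODES is a comma-separated list and that the ES connector falls back to port 9200 when no port is given.

```python
from urllib.parse import urlsplit


def hosts_missing_port(es_nodes: str) -> list:
    """Return the entries of a comma-separated host list that carry no explicit port."""
    missing = []
    for node in es_nodes.split(","):
        node = node.strip()
        if not node:
            continue
        # urlsplit only recognises the host part when a scheme or leading // is present.
        parsed = urlsplit(node if "://" in node else "//" + node)
        if parsed.port is None:
            missing.append(node)
    return missing


for node in hosts_missing_port("https://es.example.com,http://localhost:9200"):
    print(f"warning: {node} has no port; the ES connector will assume 9200")
```

Here only the first entry triggers a warning, since the second one names its port explicitly.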

Crevil · 2019-10-18T10:02:44Z

Setting es.nodes.wan.only would indeed have made my issues easier. This would be a breaking change, right?

olivere/elastic does not modify the URL before connecting (it just calls url.Parse() on the provided values), and by default it uses an http.Client with a TLS-configured http.Transport underneath.

Go's http.Transport uses connectMethodForRequest to connect. This in turn uses canonicalAddr to get the outbound address, which adds a default port based on the scheme. So HTTPS URLs will use port 443.

As the AWS Elasticsearch Service exposes its servers over HTTPS, this default port is applied and the connection works.
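
To make the difference concrete, here is a small Python model of the two defaulting rules described in this thread. It is not the Go or ES-Hadoop code; the function names and the constants table are illustrative assumptions, and the URL is the placeholder host used earlier in this issue.

```python
from urllib.parse import urlsplit

# Scheme-based defaults, as a typical HTTP client applies them.
SCHEME_DEFAULT_PORTS = {"http": 80, "https": 443}
# Fallback described for the ES connector used by spark-dependencies.
ES_CONNECTOR_DEFAULT_PORT = 9200


def port_by_scheme(url: str) -> int:
    """Port used when a missing port falls back to the scheme default."""
    parsed = urlsplit(url)
    if parsed.port is not None:
        return parsed.port
    return SCHEME_DEFAULT_PORTS[parsed.scheme]


def port_by_es_default(url: str) -> int:
    """Port used when a missing port falls back to 9200 regardless of scheme."""
    parsed = urlsplit(url)
    if parsed.port is not None:
        return parsed.port
    return ES_CONNECTOR_DEFAULT_PORT


url = "https://vpc-jaeger-tracing-unique.region.es.amazonaws.com"
print(port_by_scheme(url), port_by_es_default(url))
print(port_by_scheme(url + ":443"), port_by_es_default(url + ":443"))
```

Without an explicit port the two rules disagree (443 versus 9200), which matches the collector connecting while the job times out. With :443 appended, both agree.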

pavolloffay · 2019-10-18T11:54:43Z

> Setting es.nodes.wan.only would indeed have made my issues easier. This would be a breaking change, right?

It should not be a breaking change. The clients will still be able to connect; the ES client will just switch off auto-discovery.

To make that work I will have to submit a PR to the operator so that it does not set es.nodes.wan.only if the value was not specified in the CR.

pavolloffay · 2019-10-18T11:56:08Z

I have submitted #708 for the operator and jaegertracing/spark-dependencies#79 in spark-dependencies.

Crevil · 2019-10-18T12:08:36Z

Great. Thanks for taking time to add these changes. It should make it a lot easier for future users of this operator. 🙏

pavolloffay · 2019-10-18T12:47:02Z

No problem, thanks for digging into this. You helped the most!

pavolloffay · 2019-10-18T15:48:22Z

done in #708

pavolloffay mentioned this issue Sep 24, 2019

Document CR with all the options #671

Closed

pavolloffay mentioned this issue Oct 18, 2019

Pass only specified options to spark dependencies #708

Merged

pavolloffay closed this as completed Oct 18, 2019

tyree731 mentioned this issue Mar 31, 2023

[Bug]: jaeger-spark-dependency unable to connect to AWS Opensearch 2.5 #2206

Open
