Unable to configure read autoscaling for dynamoDB #1267

Closed
eraac opened this issue Nov 15, 2019 · 10 comments
Labels
stale A stale issue or PR that will automatically be closed.

Comments

@eraac
Contributor

eraac commented Nov 15, 2019

Describe the bug
DynamoDB autoscaling doesn't work for reads.

To Reproduce
Steps to reproduce the behavior:

  1. Download this config and use it for Loki
  2. Start Loki (latest commit)

Expected behavior
Autoscaling is enabled for both read and write.

Environment:

  • Infrastructure: ec2 instance (with iam role attached to the instance)

Screenshots dynamodb console
image

logs

level=info ts=2019-11-18T10:16:21.962837889Z caller=table_manager.go:220 msg="synching tables" expected_tables=1
level=info ts=2019-11-18T10:16:24.044363479Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)

Question
Can DynamoDB with autoscaling configured work without #1226? Everything I have tried with Loki 0.4.0 has failed. Am I missing something in the configuration, or has no one tried this? (I would be a bit surprised.)

@eraac
Contributor Author

eraac commented Nov 15, 2019

More information

Loki logs (UTC+0)

level=info ts=2019-11-15T10:10:08.392559986Z caller=dynamodb_table_client.go:301 msg="updating provisioned throughput on table" table=loki_index_2602 old_read=30 old_write=10 new_read=30 new_write=300
level=info ts=2019-11-15T10:11:08.392703076Z caller=dynamodb_table_client.go:301 msg="updating provisioned throughput on table" table=loki_index_2602 old_read=30 old_write=10 new_read=30 new_write=300
level=info ts=2019-11-15T10:42:02.824300182Z caller=dynamodb_table_client.go:301 msg="updating provisioned throughput on table" table=loki_index_2602 old_read=30 old_write=10 new_read=30 new_write=300
level=info ts=2019-11-15T11:13:22.33312897Z caller=dynamodb_table_client.go:301 msg="updating provisioned throughput on table" table=loki_index_2602 old_read=30 old_write=10 new_read=30 new_write=300
level=info ts=2019-11-15T11:15:22.298819647Z caller=dynamodb_table_client.go:301 msg="updating provisioned throughput on table" table=loki_index_2602 old_read=30 old_write=10 new_read=30 new_write=300
level=info ts=2019-11-15T12:15:22.320476983Z caller=dynamodb_table_client.go:301 msg="updating provisioned throughput on table" table=loki_index_2602 old_read=30 old_write=10 new_read=30 new_write=300

I have the feeling that the new_write and the old_write are reversed.

level=info ts=2019-11-15T15:41:30.605722793Z caller=dynamodb_table_client.go:308 msg="updating provisioned throughput on table" table=loki_index_2602 old_read=30 old_write=10 new_read=30 new_write=100
level=info ts=2019-11-15T15:52:30.607094963Z caller=dynamodb_table_client.go:308 msg="updating provisioned throughput on table" table=loki_index_2602 old_read=30 old_write=16 new_read=30 new_write=100

dynamodb console (UTC+0)
Metrics
image

Scaling activities
image

Scaling activities when table is inactive
image

cloudtrail log (UTC+1)
Cloudtrail logs from 12h10 to 12h20 for event UpdateTable -> https://gist.github.com/Eraac/8c6336119fae9e4bd164a4118dcd7d3f

Some entries are weird, like this one

@bboreham
Contributor

bboreham commented Nov 28, 2019

It looks like the table manager is ignoring your request to use AWS auto-scaling and simply overwriting the write capacity with the static value of 300 each time it sees that the provisioned throughput has changed.
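
The overwrite behavior described above can be pictured with a small simulation. This is only an illustrative sketch, not the actual table manager code: `reconcile`, `simulate`, `STATIC_WRITE`, and the sequence of autoscaler values are hypothetical, chosen to mirror the `new_write=300` seen in the logs.

```python
from typing import List, Optional

STATIC_WRITE = 300  # assumed static value, taken from new_write=300 in the logs


def reconcile(observed_write: int, configured_write: int = STATIC_WRITE) -> Optional[int]:
    """Return the capacity to set, or None when the table already matches."""
    if observed_write != configured_write:
        return configured_write
    return None


def simulate(autoscaler_values: List[int]) -> List[int]:
    """For each value the autoscaler leaves behind, record what the reconciler sets."""
    updates = []
    for observed in autoscaler_values:
        new_value = reconcile(observed)
        if new_value is not None:
            updates.append(new_value)
    return updates


if __name__ == "__main__":
    # Hypothetical sequence: the autoscaler scales down to 10, later to 16,
    # and once the table happens to match the static value.
    print(simulate([10, 16, 300]))
```

Under these assumptions, every scale-down by the autoscaler triggers another update back to the static value, which would match the repeated "updating provisioned throughput" lines.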

Possibly because applicationautoscaling needs a url sub-field? That shouldn't be necessary.

I gave up on AWS auto-scaling long ago and wrote the "metrics-based scaling".

@bboreham
Contributor

Given more logs, especially from the beginning of the run, it's possible that something might give a clue.

I have the feeling that the new_write and the old_write are reversed.

Old is what it found; new is what it's about to set it to.
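
To read the old/new pairs out of the log lines without eyeballing them, a small logfmt parser is enough. A minimal sketch using only the standard library; `parse_logfmt` and `write_change` are hypothetical helper names, and the parser assumes simple `key=value` tokens with optional double quotes, as in the lines posted in this thread.

```python
import shlex
from typing import Dict, Tuple


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split a logfmt line into key -> value pairs (surrounding quotes removed)."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value
            fields[key] = value
    return fields


def write_change(line: str) -> Tuple[int, int]:
    """Return (found, about_to_set) write capacity from an update log line."""
    fields = parse_logfmt(line)
    return int(fields["old_write"]), int(fields["new_write"])


if __name__ == "__main__":
    line = (
        'level=info ts=2019-11-15T15:52:30.607094963Z caller=dynamodb_table_client.go:308 '
        'msg="updating provisioned throughput on table" table=loki_index_2602 '
        'old_read=30 old_write=16 new_read=30 new_write=100'
    )
    found, target = write_change(line)
    print(f"found {found}, setting {target}")
```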

@eraac
Contributor Author

eraac commented Nov 28, 2019

Old is what it found; new is what it's about to set it to.

Just now from the log

level=info ts=2019-11-28T14:55:27.960803543Z caller=dynamodb_table_client.go:308 msg="updating provisioned throughput on table" table=loki_index_2604 old_read=100 old_write=47 new_read=100 new_write=30
level=info ts=2019-11-28T15:13:25.962475431Z caller=dynamodb_table_client.go:308 msg="updating provisioned throughput on table" table=loki_index_2604 old_read=100 old_write=35 new_read=100 new_write=30

graph
image

I've configured a cooldown of 1h, but the last two updates are less than 20 minutes apart (generating a LimitExceededException).
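
The gap between the two updates can be computed directly from the `ts` fields. A minimal sketch with the standard library only; `parse_ts`, `gap_seconds`, and `inside_cooldown` are hypothetical helpers, and the 3600-second cooldown is the 1h value mentioned above. Fractional seconds are truncated to microseconds because the log timestamps carry up to nine digits.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2019-11-28T14:55:27.960803543Z."""
    whole, _, frac = ts.rstrip("Z").partition(".")
    base = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Pad or truncate the fraction to exactly six digits (microseconds).
    micros = int((frac + "000000")[:6]) if frac else 0
    return base + timedelta(microseconds=micros)


def gap_seconds(first: str, second: str) -> float:
    """Seconds elapsed between two log timestamps."""
    return (parse_ts(second) - parse_ts(first)).total_seconds()


def inside_cooldown(first: str, second: str, cooldown_seconds: int) -> bool:
    """True when the second update happened before the cooldown elapsed."""
    return gap_seconds(first, second) < cooldown_seconds


if __name__ == "__main__":
    a = "2019-11-28T14:55:27.960803543Z"
    b = "2019-11-28T15:13:25.962475431Z"
    print(f"gap: {gap_seconds(a, b):.0f} s, inside 1h cooldown: {inside_cooldown(a, b, 3600)}")
```

For the two lines quoted above, the gap is just under 18 minutes, well inside a one-hour window.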

Given more logs, especially from the beginning of the run

I have 268840 lines of logs, but I can grep for some keywords.

autoscaling

level=info ts=2019-11-28T15:19:27.038040931Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:20:24.083825356Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:20:27.015308895Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:21:24.013114472Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:21:27.031446917Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:22:22.036986715Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:22:25.034468798Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:23:22.022567013Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:23:27.541360812Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:24:24.035129993Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:24:27.025391016Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:25:24.172227843Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:25:27.026707703Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:26:24.016891186Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:26:27.03978903Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:27:24.036532193Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:27:27.014925951Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:28:24.010808289Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:28:27.02810959Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)
level=info ts=2019-11-28T15:29:22.031204169Z caller=aws_autoscaling.go:144 msg="enabling autoscaling on table" table=(MISSING)

@mostlyAtNight

Hi all,

I'm experiencing similar problems here. I thought I'd set up Loki to set write (and read) autoscaling on active tables (and the first inactive one), but it's only setting write autoscaling for some reason.

I thought this was because I updated my configuration after my tables were initially created, but I checked the settings in AWS on a newly created active index table last night and can see that it's still missing the read autoscaling settings:

image

The logs show this when the new table was created:

level=info ts=2019-12-18T23:50:49.046121842Z caller=table_manager.go:220 msg="synching tables" expected_tables=8
level=info ts=2019-12-18T23:50:49.068163415Z caller=table_manager.go:363 msg="creating table" table=loki_prod_index_2607
level=info ts=2019-12-18T23:52:49.04608598Z caller=table_manager.go:220 msg="synching tables" expected_tables=8
level=info ts=2019-12-18T23:54:49.046061857Z caller=table_manager.go:220 msg="synching tables" expected_tables=8

... and my config is shown below. Any help you can give is much appreciated. I'm happy to attach more information if it helps.

config:
  schema_config:
    configs:
    - from: 2019-11-01
      store: aws
      object_store: aws
      schema: v9
      index:
        prefix: loki_prod_index_
        period: 168h
  storage_config:
    aws:
      s3: s3://REDACTED:REDACTED@eu-west-1/lutra-loki-prod
      dynamodbconfig:
        dynamodb: dynamodb://REDACTED:REDACTED@eu-west-1
        applicationautoscaling: https://REDACTED:REDACTED@eu-west-1
  table_manager:
    retention_deletes_enabled: true
    retention_period: 26208h
    index_tables_provisioning:
      # Active tables
      # Active tables will use provisioned throughput mode, not on-demand
      provisioned_throughput_on_demand_mode: false
      # Starting provisioned throughput values for new tables
      provisioned_read_throughput: 1
      provisioned_write_throughput: 1
      read_scale:
        enabled: true
        role_arn: arn:aws:iam::REDACTED:role/aws-service-role/dynamodb.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable
        # 1 unit is $0.000147 / hr = ~$0.1 / mo. / table
        min_capacity: 1
        # ~$0.5 / mo. on 1 active table
        max_capacity: 10
        # DynamoDB minimum seconds between each autoscale up.
        out_cooldown: 1800
        # DynamoDB minimum seconds between each autoscale down.
        in_cooldown: 3600
        # DynamoDB target ratio of consumed capacity to provisioned capacity.
        target: 80
      write_scale:
        enabled: true
        role_arn: arn:aws:iam::REDACTED:role/aws-service-role/dynamodb.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable
        # 1 unit is $0.000735 / hr = ~$0.5 / mo. / table
        min_capacity: 1
        # ~$5 / mo. on 1 active table
        max_capacity: 10
        # DynamoDB minimum seconds between each autoscale up. (default)
        out_cooldown: 1800
        # DynamoDB minimum seconds between each autoscale down. (default)
        in_cooldown: 3600
        # DynamoDB target ratio of consumed capacity to provisioned capacity.
        target: 80
      # Inactive tables
      # The most recent inactive table will still be auto-scaled but the 
      # rest should be set to use on-demand throughput
      inactive_throughput_on_demand_mode: true
      inactive_read_throughput: 1
      inactive_write_throughput: 1
      inactive_write_scale_lastn: 0
      inactive_read_scale_lastn: 1
      inactive_read_scale:
        enabled: true
        role_arn: arn:aws:iam::REDACTED:role/aws-service-role/dynamodb.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable
        # 1 unit is $0.000147 / hr = ~$0.1 / mo. / table
        min_capacity: 1
        # ~$0.5 / mo. on 1 inactive table
        max_capacity: 10
        # DynamoDB minimum seconds between each autoscale up.
        out_cooldown: 1800
        # DynamoDB minimum seconds between each autoscale down.
        in_cooldown: 3600
        # DynamoDB target ratio of consumed capacity to provisioned capacity.
        target: 80
      #inactive_write_scale:
      #  enabled: true
      #  role_arn: arn:aws:iam::REDACTED:role/aws-service-role/dynamodb.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable
      #  # 1 unit is $0.000147 / hr = ~$0.1 / mo. / table
      #  min_capacity: 1
      #  # ~$0.5 / mo. on 1 inactive table
      #  max_capacity: 1
      #  # DynamoDB minimum seconds between each autoscale up.
      #  out_cooldown: 1800
      #  # DynamoDB minimum seconds between each autoscale down.
      #  in_cooldown: 3600
      #  # DynamoDB target ratio of consumed capacity to provisioned capacity.
      #  target: 80
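
The cost comments in the config above can be checked with a few lines of arithmetic. This sketch assumes roughly 730 hours per month (an assumption, not stated in the config) and reuses the hourly unit prices quoted in the comments; `monthly_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # assumption: 365 days * 24 h / 12 months

READ_UNIT_PER_HOUR = 0.000147   # from the read_scale comment
WRITE_UNIT_PER_HOUR = 0.000735  # from the write_scale comment


def monthly_cost(units: int, price_per_unit_hour: float) -> float:
    """Approximate monthly cost in dollars for a constant number of capacity units."""
    return units * price_per_unit_hour * HOURS_PER_MONTH


if __name__ == "__main__":
    for label, price in (("read", READ_UNIT_PER_HOUR), ("write", WRITE_UNIT_PER_HOUR)):
        for units in (1, 10):
            print(f"{label} x{units}: ${monthly_cost(units, price):.2f} / month")
```

With these inputs the per-unit figures agree with the comments (about $0.1 for reads and $0.5 for writes), while 10 read units comes out nearer $1 per month than the ~$0.5 noted next to `max_capacity`.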

@mostlyAtNight

Hi people,

Any thoughts on this issue? Let me know if I can provide more useful information.

Kind regards,

Pete

@bboreham
Contributor

bboreham commented Jan 19, 2020

Setting the read scaling parameters on DynamoDB is not implemented, sorry.

Note that I plan to remove the AWS auto-scaling code from Cortex entirely.

EDIT: I think what happened is there was no way to set read scaling parameters, then they were added for the metrics-based scaling, and never implemented for AWS auto-scaling.
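
Given the statement above that read scaling is not implemented for AWS auto-scaling, a config can be scanned for read-scaling sections that would have no effect. A minimal sketch, assuming the `index_tables_provisioning` section has already been loaded into a plain dict (the YAML loading step is omitted) and using the key names from the config posted earlier; `ineffective_read_scaling` is a hypothetical helper, not part of Loki or Cortex.

```python
from typing import Dict, List

# Key names as they appear under index_tables_provisioning in the posted config.
READ_SCALE_KEYS = ("read_scale", "inactive_read_scale")


def ineffective_read_scaling(provisioning: Dict) -> List[str]:
    """List the read-scaling sections that are marked enabled."""
    return [
        key
        for key in READ_SCALE_KEYS
        if provisioning.get(key, {}).get("enabled", False)
    ]


if __name__ == "__main__":
    provisioning = {
        "read_scale": {"enabled": True, "min_capacity": 1, "max_capacity": 10},
        "write_scale": {"enabled": True, "min_capacity": 1, "max_capacity": 10},
        "inactive_read_scale": {"enabled": True, "min_capacity": 1, "max_capacity": 10},
    }
    for key in ineffective_read_scaling(provisioning):
        print(f"warning: {key} is enabled, but per this thread it is not applied")
```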

@mostlyAtNight

Hi @bboreham - thanks for your reply. Would read/write scaling work if I switch to metrics-based scaling?

@eraac
Contributor Author

eraac commented Jan 21, 2020

Makes sense. If I understand correctly, Loki uses Cortex in AWS autoscaling mode, so the Loki team has to update the code to use the "metrics-based" scaling mode (which is implemented in Cortex independently of AWS)?

I guess this will fix the weird behavior with write scaling? (See my second post.)

@stale

stale bot commented Feb 20, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Feb 20, 2020
@stale stale bot closed this as completed Feb 27, 2020
cyriltovena pushed a commit to cyriltovena/loki that referenced this issue Jun 11, 2021