Too large memory usage in last version #103

Closed
asmyasnikov opened this issue May 20, 2020 · 35 comments

@asmyasnikov commented May 20, 2020

Hi!
I'm using cgroups memory limits from docker-compose.yml:

  mbtiles:
    image: mbtiles
    environment:
      - TILE_DIR=/external,/internal
      - PORT=80
    networks:
      phoenix:
        ipv4_address: 172.18.0.21
    read_only: True
    restart: always
    volumes: ['/opt/mbtiles/:/internal', '/opt/usb/mbtiles/:/external']
    logging:
      driver: "none"
    deploy:
      restart_policy:
        condition: any
      resources:
        limits:
          memory: 30M

The 30M limit was not exceeded with the older version, but the latest version (ca4a94e) wants about 1 GB of memory when run without a limit =(
This is critical for single-board computers...
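For reference, a quick way to confirm the configured limit and the live usage of the running container (a sketch only; it assumes the compose service comes up with the container name docker_mbtiles_1, as in the log below):

docker inspect -f '{{.HostConfig.Memory}}' docker_mbtiles_1   # configured limit, in bytes
docker stats --no-stream docker_mbtiles_1                     # current usage vs. limit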
Log:

mbtiles_1               | time="2020-05-20T15:45:59Z" level=info msg="Searching for tilesets in /external\n"
mbtiles_1               | time="2020-05-20T15:45:59Z" level=info msg="Searching for tilesets in /internal\n"
mbtiles_1               | time="2020-05-20T15:46:01Z" level=info msg="Searching for tilesets in /external\n"
mbtiles_1               | time="2020-05-20T15:46:01Z" level=info msg="Searching for tilesets in /internal\n"
docker_mbtiles_1 exited with code 137
mbtiles_1               | time="2020-05-20T15:46:04Z" level=info msg="Searching for tilesets in /external\n"
mbtiles_1               | time="2020-05-20T15:46:04Z" level=info msg="Searching for tilesets in /internal\n"

The internal and external paths contain big mbtiles files:

root@debian:/etc/docker# du -sh /opt/mbtiles/*
9,2M    /opt/mbtiles/countries-raster.mbtiles
18G     /opt/mbtiles/world_yandex.mbtiles
root@debian:/etc/docker# du -sh /opt/usb/mbtiles/*
52G     /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles
@brendan-ward (Collaborator)

@asmyasnikov thanks for reporting this. I haven't seen that to be the case in my own usage so far.

Can you help me understand some of the factors involved here?

Are you building your own Docker container for mbtileserver? If so, which version of Go?

Does it run out of memory immediately while searching for tiles, or does the server run for a while before running out of memory?

How many tiles are in internal vs external folders?

@asmyasnikov (Author)

The internal and external paths contain big mbtiles files:

root@debian:/etc/docker# du -sh /opt/mbtiles/*
9,2M    /opt/mbtiles/countries-raster.mbtiles
18G     /opt/mbtiles/world_yandex.mbtiles
root@debian:/etc/docker# du -sh /opt/usb/mbtiles/*
52G     /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles

@asmyasnikov (Author) commented May 20, 2020

Are you building your own Docker container for mbtileserver? If so, which version of Go?

I build mbtileserver with go1.14.
And yes, I'm running mbtileserver inside a Docker container.
My Dockerfile:

# Multi-stage build: runtime base image, QEMU static binaries for
# cross-building, a Go build stage, and the final minimal image.
ARG ARCH=arm32v7
ARG QARCH=arm
ARG GOARCH=arm
ARG ORIGIN=github.com/consbio/mbtileserver

# Runtime base: busybox with glibc for the target architecture
FROM ${ARCH}/busybox:glibc AS base
ARG QARCH=arm
ARG GOARCH=arm
ARG ORIGIN=None

# QEMU static binaries so the target-arch stages can run on the build host
FROM multiarch/qemu-user-static:x86_64-${QARCH} AS qemu
ARG ARCH=arm32v7
ARG GOARCH=arm
ARG ORIGIN=None

# Build stage: clone and compile mbtileserver with its vendored dependencies
FROM --platform=linux/${GOARCH} ${ARCH}/golang:latest AS build
COPY --from=qemu /usr/bin/ /usr/bin/
ARG ORIGIN=None
WORKDIR /build
RUN git clone https://${ORIGIN}.git .
RUN go build -mod=vendor -ldflags="-w -s" -o ./mbtileserver
# Copy libdl, which the compiled binary needs at runtime
RUN cp $(ldd ./mbtileserver | grep libdl.so | awk '{print $3}') /build/libdl.so.2

# Final image: only the binary and the shared library it needs
FROM base
ARG VERSION=None
ENV VERSION=$VERSION
COPY --from=build /build/libdl.so.2 /lib/libdl.so.2
COPY --from=build /build/mbtileserver /mbtileserver
ENTRYPOINT ["/mbtileserver"]
CMD []

The arch with this bug is amd64 (qemu: x86_64).

@asmyasnikov (Author)

Does it run out of memory immediately while searching for tiles, or does the server run for a while before running out of memory?

No. The familiar message "Use Ctrl-C to exit the server" never appears.

@asmyasnikov (Author) commented May 20, 2020

Without the memory limit I get:

mbtiles_1               | time="2020-05-20T16:13:10Z" level=info msg="Searching for tilesets in /external\n"
mbtiles_1               | time="2020-05-20T16:13:10Z" level=info msg="Searching for tilesets in /internal\n"
mbtiles_1               |
mbtiles_1               | --------------------------------------
mbtiles_1               | Use Ctrl-C to exit the server
mbtiles_1               | --------------------------------------
mbtiles_1               | HTTP server started on port 80
mbtiles_1               | time="2020-05-20T16:13:28Z" level=info msg="Published 3 services"

And after that, memory usage decreased from 100% to normal.

@asmyasnikov (Author)

[Screenshot from 2020-05-20 19-21-36: start of the process]

@brendan-ward (Collaborator)

Thanks for the additional info. We are allocating a number of objects at startup, but most of those are proportional to the number of tilesets rather than their size.

I don't have very large tilesets available for testing at the moment, but I could rig up a test setup where memory is similarly limited during startup. I can't promise I can get to this immediately.

You might also be able to do some minimal testing here that could help (a sketch of these runs follows the list):

  1. try a test where you start the server but point it to an empty directory. If this fails, this tells us we're grabbing a lot of memory during startup that isn't related to tiles.
  2. try a test where you start the server only pointing at your internal directory.
  3. try another test where you start the server only pointing at your external directory. If this fails but 2 does not, then please try an additional test where you have the 56GB tileset on an internal directory instead.
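For example, tests 2 and 3 could look roughly like this (a sketch only; it assumes the same image, memory limit, and TILE_DIR/PORT environment variables used in the docker-compose.yml above):

# Test 2: only the internal directory, with the same 30M limit
docker run --rm -m 30m -e TILE_DIR=/internal -e PORT=80 \
  -v /opt/mbtiles/:/internal mbtiles

# Test 3: only the external directory
docker run --rm -m 30m -e TILE_DIR=/external -e PORT=80 \
  -v /opt/usb/mbtiles/:/external mbtiles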

@asmyasnikov (Author)

try a test where you start the server but point it to an empty directory. If this fails, this tells us we're grabbing a lot of memory during startup that isn't related to tiles.

No excessive memory usage.

@asmyasnikov (Author)

try a test where you start the server only pointing at your internal directory.

[Screenshot from 2020-05-20 19-36-16]

@brendan-ward (Collaborator)

Sorry, I couldn't understand what you meant by:

No too memory usage

Do you mean it started correctly?

@asmyasnikov (Author)

try another test where you start the server only pointing at your external directory.

No memory excess. Yes, it starts correctly.

@asmyasnikov (Author)

But I cannot view the map from /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles (pbf vector tiles), even though this tileset is present in the /services response.
The tileset with the excessive memory usage, /opt/mbtiles/world_yandex.mbtiles (from internal), is viewable on my map (mbtileserver returns tiles for z/x/y requests). The tileset /opt/mbtiles/countries-raster.mbtiles (from internal) is also viewable on my map.
Damn!
I'm still figuring out the details...

@asmyasnikov (Author) commented May 20, 2020

When actively moving the map (intensive z/x/y requests), mbtileserver also uses too much memory.
[Screenshot from 2020-05-20 20-00-26]
And I get some log messages:

mbtiles_1               |
mbtiles_1               | --------------------------------------
mbtiles_1               | Use Ctrl-C to exit the server
mbtiles_1               | --------------------------------------
mbtiles_1               | HTTP server started on port 80
mbtiles_1               | time="2020-05-20T16:58:14Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/88.png: write tcp 172.18.0.21:80->172.18.0.14:48110: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:17Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/90.png: write tcp 172.18.0.21:80->172.18.0.14:48244: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:21Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/139/87.png: write tcp 172.18.0.21:80->172.18.0.14:48322: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:21Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/139/86.png: write tcp 172.18.0.21:80->172.18.0.14:48324: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/138/88.png: write tcp 172.18.0.21:80->172.18.0.14:48314: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/7/66/44.png: write tcp 172.18.0.21:80->172.18.0.14:48048: write: broken pipe"  
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/7/65/43.png: write tcp 172.18.0.21:80->172.18.0.14:48068: write: broken pipe"  
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/135/90.png: write tcp 172.18.0.21:80->172.18.0.14:48088: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/89.png: write tcp 172.18.0.21:80->172.18.0.14:48224: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/10/540/346.png: write tcp 172.18.0.21:80->172.18.0.14:48384: write: broken pipe"
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/9/272/175.png: write tcp 172.18.0.21:80->172.18.0.14:48332: write: broken pipe"
mbtiles_1               | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/132/84.png: write tcp 172.18.0.21:80->172.18.0.14:48564: write: broken pipe" 
mbtiles_1               | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/9/253/169.png: write tcp 172.18.0.21:80->172.18.0.14:48670: write: broken pipe"
mbtiles_1               | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/9/252/171.png: write tcp 172.18.0.21:80->172.18.0.14:48666: write: broken pipe"
mbtiles_1               | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/130/84.png: write tcp 172.18.0.21:80->172.18.0.14:48576: write: broken pipe" 
mbtiles_1               | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/9/251/170.png: write tcp 172.18.0.21:80->172.18.0.14:48676: write: broken pipe"
mbtiles_1               | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/84.png: write tcp 172.18.0.21:80->172.18.0.14:48568: write: broken pipe" 
mbtiles_1               | time="2020-05-20T17:00:07Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/85.png: write tcp 172.18.0.21:80->172.18.0.14:48558: write: broken pipe" 

But this behaviour may be because the single-board computer (LattePanda 4/64) is too slow on network connections. Maybe.

@brendan-ward (Collaborator)

I usually see broken pipe messages when the client (browser map library) has canceled the tile requests; usually when aggressively panning / zooming in the web map.

Are you able to test with the prior commit (f2305b5) just against your internal directory? It would be interesting to know if the refactor introduced a major memory regression.

@asmyasnikov (Author) commented May 20, 2020

Are you able to test with the prior commit (f2305b5) just against your internal directory? It would be interesting to know if the refactor introduced a major memory regression.

Memory usage is also too high with f2305b5. With the limit, the process is always killed by the cgroups policy. Without the limit I get the following picture:
[Screenshot from 2020-05-20 20-00-26]

@asmyasnikov (Author)

I don't understand anything now.
I will look for my stable version (in terms of memory usage) tomorrow.

@brendan-ward (Collaborator)

Thanks @asmyasnikov for running these tests!

It looks like the memory on startup is stable between the prior and latest commits, so we didn't introduce a memory regression as far as I can tell. That's good news, at least.

I wonder if Go is not respecting the cgroups memory policy? Are you indeed seeing OOM errors in the log, as described here: https://fabianlee.org/2020/01/18/docker-placing-limits-on-container-memory-using-cgroups/
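One quick way to check whether the limit is actually visible from inside the container (a sketch; it assumes cgroup v1, the default on Docker hosts at the time, and the container name from the log above):

# From the host: did the kernel OOM-kill the process?
dmesg | grep -iE "killed process|out of memory"

# Inside the container: the limit the memory cgroup actually enforces
docker exec docker_mbtiles_1 cat /sys/fs/cgroup/memory/memory.limit_in_bytes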

It could also be another issue:

@asmyasnikov (Author) commented May 20, 2020

I found a small explanation for why the map from the vector (pbf) mbtiles file cannot be viewed:

root@debian:/# du -s /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles    # used file
53492336        /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles
root@debian:/# du -s /opt/usb2/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles   # original file
53492348        /opt/usb2/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles

The file was corrupted, maybe because the storage is a USB device.
I will try to restore the file and retry running mbtileserver with read-only volume mounting.
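A quick way to confirm whether the used copy really is corrupted (a sketch; it assumes md5sum and the sqlite3 CLI are available on the host):

# Compare checksums of the used copy and the original
md5sum /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles \
       /opt/usb2/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles

# Let SQLite verify the mbtiles database itself
sqlite3 /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles "PRAGMA integrity_check;"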

@asmyasnikov (Author) commented May 20, 2020

[screenshot]

This is the latest stable version ( https://github.com/consbio/mbtileserver/commit/8c21f5ac4c1a6fa3e3a0d6115f86881fba359ddc ) of mbtileserver, which I have used for a long time. I rebuilt this version a few minutes ago with my Dockerfile (same golang, sqlite, and everything else). docker-compose.yml was not modified. I loaded my `/internal` path with two mbtiles files (9 MB and 18 GB). When I drag the map actively, there are no big CPU or memory peaks.

@asmyasnikov (Author)

I wonder if Go is not respecting the cgroups memory policy? Are you indeed seeing OOM errors in the log, as described here: https://fabianlee.org/2020/01/18/docker-placing-limits-on-container-memory-using-cgroups/

root@debian:/etc/docker# dmesg | grep -Ei "killed process"
[  531.528371] Killed process 7577 (mbtileserver) total-vm:1169360kB, anon-rss:13744kB, file-rss:0kB, shmem-rss:0kB
[  593.809625] Killed process 8035 (mbtileserver) total-vm:1169104kB, anon-rss:15044kB, file-rss:0kB, shmem-rss:0kB
[  656.035194] Killed process 8506 (mbtileserver) total-vm:1167952kB, anon-rss:11648kB, file-rss:0kB, shmem-rss:0kB
[  718.287512] Killed process 8960 (mbtileserver) total-vm:1168652kB, anon-rss:15032kB, file-rss:0kB, shmem-rss:0kB
[  780.535074] Killed process 9438 (mbtileserver) total-vm:1242836kB, anon-rss:13756kB, file-rss:0kB, shmem-rss:0kB
[ 1058.944100] Killed process 11774 (mbtileserver) total-vm:1168640kB, anon-rss:15004kB, file-rss:4kB, shmem-rss:13780kB
[ 1061.036764] Killed process 11892 (mbtileserver) total-vm:1242600kB, anon-rss:17232kB, file-rss:4kB, shmem-rss:16652kB
[ 1063.157830] Killed process 12010 (mbtileserver) total-vm:1167460kB, anon-rss:13328kB, file-rss:4kB, shmem-rss:16372kB
[ 1065.414487] Killed process 12129 (mbtileserver) total-vm:1168612kB, anon-rss:17240kB, file-rss:4kB, shmem-rss:16468kB
[ 1068.108419] Killed process 12246 (mbtileserver) total-vm:1242600kB, anon-rss:13160kB, file-rss:4kB, shmem-rss:16564kB
[ 1071.586636] Killed process 12372 (mbtileserver) total-vm:1242856kB, anon-rss:15192kB, file-rss:4kB, shmem-rss:16364kB
[ 1076.798350] Killed process 12506 (mbtileserver) total-vm:1316588kB, anon-rss:15220kB, file-rss:4kB, shmem-rss:16436kB
[ 1085.463741] Killed process 12657 (mbtileserver) total-vm:1242600kB, anon-rss:15160kB, file-rss:4kB, shmem-rss:16464kB

@brendan-ward (Collaborator)

Interesting! And confusing!

There really aren't substantive differences between that commit and the previous one you tested above. It seems that f2305b5 should have had the same memory profile as that branch, because it was before the substantive changes were merged in.

Would you mind trying with the common ancestor (a613fe5) between that branch and master?
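For what it's worth, pinning the build to a specific commit only needs a small change to the clone step in the Dockerfile above (a sketch, using a613fe5 as the example commit):

RUN git clone https://${ORIGIN}.git . && git checkout a613fe5
RUN go build -mod=vendor -ldflags="-w -s" -o ./mbtileserver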

@asmyasnikov (Author)

Would you mind trying with the common ancestor (a613fe5) between that branch and master?

It is a stable version (in terms of memory and CPU, both on start and when moving the map).

@asmyasnikov (Author)

ca4a94e is also OK.

@asmyasnikov (Author) commented May 20, 2020

[screenshot]

With 446c5e4 there are some peaks when moving the map, and you can see some differences between memory and cache.

@asmyasnikov (Author)

It's a mystery...

@brendan-ward (Collaborator)

ca4a94e is the tip of master. Is that correct? This is the one that started all the issues at the top of the thread.

@asmyasnikov (Author)

[screenshot]

ca4a94e looked like this.

@asmyasnikov (Author)

I also rebuilt f2305b5, tested it, and got this:
[screenshot]
I have no idea what changed.

@asmyasnikov (Author)

Three hours ago I saw very large memory usage. Only mbtileserver (out of 24 containers) was using that much memory.
Sorry. I will look for other causes...

@asmyasnikov (Author)

Oh! I remembered: I ran all of the last tests with a read-only volume, /internal:ro.
Now I will try the tests without the ro flag.
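For context, the only difference between the two setups is the mount mode suffix on the volume in docker-compose.yml (a sketch of both variants):

    # read-only mount (used in the recent tests)
    volumes: ['/opt/mbtiles/:/internal:ro']

    # read-write mount (the original configuration at the top of this issue)
    volumes: ['/opt/mbtiles/:/internal']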

@asmyasnikov (Author)

[screenshot]

f2305b5 looks wrong.

@asmyasnikov (Author)

[screenshot]

This is my stable version (8c21f5a). Sorry. This bug is only partly about mbtileserver; most of the cause is Docker mounting ro/rw volumes. Sorry again.

@asmyasnikov (Author) commented May 20, 2020

For your peace of mind, here are the stats for the latest (head) version with ro volumes:
[screenshot]
Sorry
[screenshot]

@brendan-ward (Collaborator)

Thanks for your detailed investigation @asmyasnikov ! I appreciate all the effort you invested in this and in reporting back on it here.

I'm glad that our recent changes didn't introduce a memory regression, and that in general our memory profile is very low.

@asmyasnikov (Author)

[screenshot]

=)))
