refactor: improve memory usage of concatenation of files
hmatalonga committed Sep 13, 2019
1 parent e0cadc8 commit f9a7f48
Showing 3 changed files with 8 additions and 3 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -5,7 +5,7 @@
 # tags from Docker Hub.
 FROM python:3.7-slim
 
-LABEL Name=dataset-converter Version=0.2.0
+LABEL Name=dataset-converter Version=0.2.1
 LABEL maintainer="Hugo Matalonga <[email protected]>"
 
 ARG UID=1000
8 changes: 6 additions & 2 deletions app/app.py
@@ -165,8 +165,12 @@ def load_multiple(options):
 
     if not options['partition']:
         print('Merging all processed chunks')
-        # concat the list into dataframe
-        return pd.concat(chunk_list)
+        # concat the list into dataframe
+        df = None
+        while chunk_list:
+            df = pd.concat([df, chunk_list.pop(0)], ignore_index=True)
+
+        return df
 
 
 def convert_df(params):
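The change trades the single `pd.concat(chunk_list)` call for pairwise concatenation: `chunk_list.pop(0)` removes each chunk from the list before merging it, so the list drops its reference and the chunk becomes garbage-collectible as soon as it has been folded into the accumulator. This lowers peak memory at the cost of re-copying the accumulator on every iteration. A minimal self-contained sketch of the pattern (the `merge_chunks` wrapper name is hypothetical; `pd.concat` silently ignores `None` entries, so the first iteration just copies the first chunk):

```python
import pandas as pd

def merge_chunks(chunk_list):
    """Fold DataFrame chunks into one result, one chunk at a time.

    Illustrative sketch of the commit's pattern: each pop(0) releases
    the list's reference to the chunk, so it can be freed once merged.
    """
    df = None
    while chunk_list:
        # pd.concat drops None objects silently, so no special case
        # is needed for the first iteration.
        df = pd.concat([df, chunk_list.pop(0)], ignore_index=True)
    return df

chunks = [pd.DataFrame({'x': [1, 2]}), pd.DataFrame({'x': [3, 4]})]
print(merge_chunks(chunks)['x'].tolist())  # [1, 2, 3, 4]
```

Note that `ignore_index=True` rebuilds a fresh 0..n-1 index on the merged frame, which the original single-call version did not do.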
1 change: 1 addition & 0 deletions config/samples.yml.example
@@ -4,6 +4,7 @@ chunksize: 1000000
 compression: true
 partition: false
 usecols:
+  - id
   - device_id
   - timestamp
   - app_version
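For context, these config keys follow the parameter names of pandas' chunked CSV reader. A hedged sketch of how `chunksize` and `usecols` plausibly drive the chunked load (the actual wiring in `app/app.py` is not shown in this diff, and the inline sample CSV is invented):

```python
import io
import pandas as pd

# Invented sample data; note the 'extra' column that usecols filters out.
csv = io.StringIO(
    "id,device_id,timestamp,app_version,extra\n"
    "1,7,100,1.0,x\n"
    "2,8,101,1.0,y\n"
)

# chunksize makes read_csv return an iterator of DataFrame chunks;
# usecols restricts parsing to the listed columns, cutting memory.
reader = pd.read_csv(
    csv,
    chunksize=1,
    usecols=['id', 'device_id', 'timestamp', 'app_version'],
)
chunk_list = list(reader)
print(len(chunk_list))  # 2
```

Restricting `usecols` this way reduces memory before concatenation even starts, complementing the incremental merge in `load_multiple`.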

0 comments on commit f9a7f48