Releases: innosat-mats/rac-extract-payload
v1.4.0 More restrictive partitioning
v1.3.1
v1.3.0 Partition by hour
What's Changed
- Bump certifi from 2022.9.24 to 2022.12.7 in /raclambda by @dependabot in #164
- Refactor lambda to expect and handle a single file at a time by @e-larsson in #167
- Increase memory and storage by @skymandr in #172
- Partition by hour by @skymandr in #174
New Contributors
- @dependabot made their first contribution in #164
Full Changelog: v1.2.0...v1.3.0
v1.2.0: Parquet partition update
🌟 Features
This changes the output directory structure (the "partitioning scheme") when writing Parquet from `y/m/d/STREAM_filename.parquet` to `STREAM/y/m/d/filename.parquet`.
This structure is preferable, since it makes it very easy for e.g. a Lambda function listening for new files to know whether it should wake up. The downside is that reading data from different sources at the same time becomes slightly less convenient, but no more so than with different CSVs, so in a sense this brings the Parquet writing back in line with the CSV/PNG/JSON pipeline.
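A minimal sketch of the new path layout, assuming the partition components are zero-padded dates (the exact padding used by the real writer, and the stream and file names below, are illustrative assumptions):

```python
from datetime import datetime, timezone

def parquet_path(stream: str, filename: str, when: datetime) -> str:
    """Build a v1.2.0-style partition path: STREAM/y/m/d/filename.parquet."""
    return f"{stream}/{when:%Y/%m/%d}/{filename}.parquet"

t = datetime(2022, 12, 7, tzinfo=timezone.utc)
# Old scheme would interleave streams under the date directories;
# the new scheme groups by stream first, so a listener can match on the prefix:
path = parquet_path("CCD", "MATS_OPS_0001", t)  # "CCD/2022/12/07/MATS_OPS_0001.parquet"
```

With the stream as the top-level directory, a notification filter only needs a prefix match on `CCD/` to decide whether to wake up.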
🐛️ Bugs
No changes
📋 Documentation
No changes
🛠 System
No changes
👷 Chore / Maintenance
No changes
v1.1.2: Optional image data
🌟 Features
No changes
🐛️ Bugs
The Parquet schema now allows `ImageData` to be empty. This is necessary because we want the metadata associated with an image, even if the image itself is broken.
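The behaviour can be sketched as follows, with a stand-in parser in place of the real image decoding (the function and field names besides `ImageName`/`ImageData` are hypothetical):

```python
def parse_png(raw: bytes) -> bytes:
    """Stand-in parser: a real implementation would decode the PNG payload."""
    if not raw.startswith(b"\x89PNG"):
        raise ValueError("corrupt image")
    return raw

def image_row(name: str, raw: bytes) -> dict:
    """Keep the image's metadata even when the payload is broken."""
    try:
        data = parse_png(raw)
    except ValueError:
        data = None  # ImageData may now be empty in the Parquet schema
    return {"ImageName": name, "ImageData": data}

good = image_row("img_001.png", b"\x89PNG...")
bad = image_row("img_002.png", b"JUNK")  # metadata survives, payload is None
```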
📋 Documentation
No changes
🛠 System
No changes
👷 Chore / Maintenance
No changes
v1.1.1: Better error-handling when parsing JPEGs
🌟 Features
No changes
🐛️ Bugs
The default error handling in libjpeg is designed to exit the program on certain errors where we just want to skip that step and continue processing. This led to `.rac` files containing corrupt JPEG data breaking the entire processing run (see #153). This release fixes that by implementing a custom error handler.
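The actual fix is a custom libjpeg error handler in the Go code; the skip-and-continue pattern it enables can be sketched in Python with a stand-in decoder (all names here are illustrative):

```python
def decode_jpeg(raw: bytes) -> bytes:
    """Stand-in decoder: the real code calls libjpeg, whose default
    error handler would exit the whole process on corrupt data."""
    if raw == b"corrupt":
        raise ValueError("invalid JPEG data")
    return raw

def process(packets):
    """Skip packets whose JPEG data cannot be decoded, instead of
    aborting the whole run (the behaviour fixed in #153, sketched)."""
    images, skipped = [], 0
    for raw in packets:
        try:
            images.append(decode_jpeg(raw))
        except ValueError:
            skipped += 1  # note the failure and continue with the next packet
    return images, skipped

images, skipped = process([b"ok1", b"corrupt", b"ok2"])
```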
📋 Documentation
No changes
🛠 System
No changes
👷 Chore / Maintenance
No changes
v0.2.8: Better error-handling when parsing JPEGs
🌟 Features
No changes
🐛️ Bugs
The default error handling in libjpeg is designed to exit the program on certain errors where we just want to skip that step and continue processing. This led to `.rac` files containing corrupt JPEG data breaking the entire processing run (see #153). This release fixes that by implementing a custom error handler.
📋 Documentation
No changes
🛠 System
No changes
👷 Chore / Maintenance
No changes
v1.1.0 Day one patch: New file name and schema conventions
🌟 Features
This release changes the output so that Parquet data is written to separate files based on both packet type and original file, using a different Parquet schema for each packet type. This is preferable, since otherwise Go will helpfully add default values (e.g. 0) to columns that belong to another packet type, making the origin of rows hard to disambiguate and making it hard to filter.
The new file naming scheme also makes it easy to filter what files to read in PyArrow, which should be useful to save on resources.
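A sketch of the kind of file-level filtering this enables, using stdlib pattern matching before handing the selection to a reader such as PyArrow (the stream and file names are illustrative assumptions):

```python
from fnmatch import fnmatch

# Hypothetical directory listing under the new naming scheme, where the
# packet type (stream) is part of each file name:
files = [
    "CCD_MATS_OPS_0001.parquet",
    "HTR_MATS_OPS_0001.parquet",
    "CCD_MATS_OPS_0002.parquet",
]

# Select only one packet type's files before reading, so rows from other
# packet types (and their zero-filled default columns) are never loaded:
ccd_files = [f for f in files if fnmatch(f, "CCD_*.parquet")]
```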
🐛️ Bugs
Updated the time convention from microseconds to nanoseconds for `EXPDate` output.
📋 Documentation
`rac -help` has been updated with information about the updated outputs.
🛠 System
No changes
👷 Chore / Maintenance
No changes
v1.0.0 Major release: New output format
🌟 Features
This release introduces a new output format, Parquet. This is a compact binary format, similar to CSV, but made for high-throughput data processing. It has excellent support in Python through e.g. PyArrow. To write to Parquet, use the `-parquet` flag.
Also note that the option to write to AWS directly has been removed in this release; see System below.
Details on the outputs
The parquet files follow the same naming conventions used in the CSVs, but the header row is stored as meta-data instead. Parquet files support variable length rows, so instead of one file per packet type, one file per input file is produced.
In addition, the parquet files are written using a partitioning scheme so that data for each day is written to a file in a directory for that day. This means that files with the same name may occur in directories for subsequent days, if the original RAC-file covers two days. Partitioning is performed based on the CUC time of the source packet.
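The day-partitioning rule can be sketched as grouping packets by the day of their timestamp, where a `datetime` stands in for the decoded CUC time (the timestamps below are illustrative):

```python
from collections import defaultdict
from datetime import datetime, timezone

def day_dir(ts: datetime) -> str:
    """Directory for the day a packet's (stand-in) CUC time falls on."""
    return ts.strftime("%Y/%m/%d")

# Two packets from one RAC file that straddles midnight end up in
# different day directories, under the same output file name:
packets = [
    datetime(2022, 12, 7, 23, 59, tzinfo=timezone.utc),
    datetime(2022, 12, 8, 0, 1, tzinfo=timezone.utc),
]
partitions = defaultdict(list)
for ts in packets:
    partitions[day_dir(ts)].append(ts)
```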
When writing to Parquet, the PNG files are stored in the Parquet files themselves, rather than as separate files. This introduces two new columns:
- `ImageName`: the name of the PNG image, if it had been written to disk
- `ImageData`: the parsed PNG data
The capability to write CSV/PNG/JSON is retained as the default and produces the same files as before, but some headers etc. have been updated in the resulting files, so any scripts parsing those may have to be updated.
For further details, see `rac -help`!
🐛️ Bugs
No changes
📋 Documentation
`rac -help` has been updated with information about the updated and new output formats.
🛠 System
- AWS is no longer supported directly from the RAC binary; see #148 for the rationale. AWS sync will henceforth be handled by the calling outer layer, such as a Lambda function.
👷 Chore / Maintenance
- Minor fixes of typographical errors and the like.
v0.2.7 Continue interrupted multi packet
🌟 Features
Added a new command line argument, `-dregs <path>`. This lets the user specify a path where temporary "dregs" files [0] are read and written, used for saving and continuing multi-packet data between batch runs.
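The mechanism can be sketched as persisting the unfinished tail of a multi-packet group between runs; the file name and byte contents below are illustrative assumptions, not the tool's actual format:

```python
import tempfile
from pathlib import Path

def load_dregs(path: Path) -> bytes:
    """Read leftover bytes from the previous batch run, if any."""
    return path.read_bytes() if path.exists() else b""

def save_dregs(path: Path, remainder: bytes) -> None:
    """Persist the unfinished tail of a multi-packet group for the next run."""
    path.write_bytes(remainder)

# Illustrative batch boundary: the first run ends mid-group; the second
# run prepends the saved dregs before parsing continues.
dregs = Path(tempfile.mkdtemp()) / "dregs.bin"
save_dregs(dregs, b"\x01\x02")          # end of run 1: two bytes left over
data = load_dregs(dregs) + b"\x03\x04"  # start of run 2: continue the group
```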
🐛️ Bugs
No changes
📋 Documentation
No changes
🛠 System
No changes
👷 Chore / Maintenance
No changes
[0]: Data Remaining after Extracting Group of Source packets