Releases: innosat-mats/rac-extract-payload

v1.4.0 More restrictive partitioning

01 Mar 13:43
f38baa3

What's Changed

Full Changelog: v1.3.1...v1.4.0

v1.3.1

01 Feb 11:44
6e20c68

What's Changed

Full Changelog: v1.3.0...v1.3.1

v1.3.0 Partition by hour

01 Feb 10:17
aadb4b8

What's Changed

  • Bump certifi from 2022.9.24 to 2022.12.7 in /raclambda by @dependabot in #164
  • Refactor lambda to expect and handle a single file at a time by @e-larsson in #167
  • Increase memory and storage by @skymandr in #172
  • Partition by hour by @skymandr in #174

Full Changelog: v1.2.0...v1.3.0

v1.2.0: Parquet partition update

07 Dec 09:52
20b4dcc

🌟 Features

This changes the output directory structure (the "partitioning scheme") when writing Parquet from

y/m/d/STREAM_filename.parquet

to

STREAM/y/m/d/filename.parquet.

This structure is preferable, since it makes it very easy for e.g. a Lambda function listening for new files to know whether it should wake up. The downside is that it becomes a little less convenient to read data from different sources at the same time, though no less convenient than reading from different CSVs, so in a sense this brings the Parquet writing back in line with the CSV/PNG/JSON pipeline.
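
To make the change concrete, the following Python sketch maps a path in the old layout to the new one and shows how a listener could decide from the key prefix alone whether a new file concerns it. The stream names and the file name are made-up examples, the sketch assumes that the stream name itself contains no underscore, and neither function is part of rac.

```python
from pathlib import PurePosixPath


def to_new_layout(old_path: str) -> str:
    """Map y/m/d/STREAM_filename.parquet to STREAM/y/m/d/filename.parquet."""
    path = PurePosixPath(old_path)
    year, month, day = path.parts[-4:-1]
    # Assumption: the stream name is everything before the first underscore.
    stream, _, filename = path.name.partition("_")
    return str(PurePosixPath(stream, year, month, day, filename))


def concerns_stream(key: str, stream: str) -> bool:
    """With the new layout the stream can be read off the key prefix."""
    return key.startswith(stream + "/")


# Hypothetical stream and file names, for illustration only.
new_key = to_new_layout("2022/12/07/CCD_example.parquet")
print(new_key)
print(concerns_stream(new_key, "CCD"))
```

With the old layout the same decision requires parsing the file name at the end of the key, which is why the prefix-based layout is easier for an event filter.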

🐛️ Bugs

📋 Documentation

No changes

🛠 System

No changes

👷 Chore / Maintenance

No changes

v1.1.2: Optional image data

30 Nov 14:27
06c89d7

🌟 Features

No changes

🐛️ Bugs

The Parquet schema now allows ImageData to be empty. This is necessary because we want the metadata associated with an image, even if the image itself is broken.

📋 Documentation

No changes

🛠 System

No changes

👷 Chore / Maintenance

No changes

v1.1.1: Better error-handling when parsing JPEGs

30 Nov 11:17
336e345

🌟 Features

No changes

🐛️ Bugs

The default error handling in libjpeg exits the program on certain errors where we would rather skip that step and continue processing. This led to .rac files containing corrupt JPEG data breaking the entire processing run (see #153). This release fixes that by implementing a custom error handler.

📋 Documentation

No changes

🛠 System

No changes

👷 Chore / Maintenance

No changes

v0.2.8: Better error-handling when parsing JPEGs

30 Nov 15:23
b7c1192

🌟 Features

No changes

🐛️ Bugs

The default error handling in libjpeg exits the program on certain errors where we would rather skip that step and continue processing. This led to .rac files containing corrupt JPEG data breaking the entire processing run (see #153). This release fixes that by implementing a custom error handler.

📋 Documentation

No changes

🛠 System

No changes

👷 Chore / Maintenance

No changes

v1.1.0 Day one patch: New file name and schema conventions

25 Nov 09:09

🌟 Features

As of this release, Parquet output is written to separate files based on both packet type and original file, with a Parquet schema specific to each packet type. This is preferable, since otherwise Go will helpfully add default values (e.g. 0) to columns that belong to another packet type, making the origin of rows hard to disambiguate and the data hard to filter.

The new file naming scheme also makes it easy to filter which files to read in PyArrow, which should help save resources.
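
As a rough illustration of such filtering, the sketch below selects files for one packet type by name before anything is opened. It assumes names of the form STREAM_filename.parquet, the form quoted in the v1.2.0 notes. The stream and file names are invented for the example, and `select_files` is a hypothetical helper, not part of rac or PyArrow.

```python
from fnmatch import fnmatchcase


def select_files(names, stream):
    """Keep only the Parquet files whose name starts with the given stream."""
    # Assumption: files are named STREAM_filename.parquet.
    pattern = f"{stream}_*.parquet"
    return [name for name in names if fnmatchcase(name, pattern)]


# Made-up file names for illustration.
names = ["CCD_a.parquet", "HTR_a.parquet", "CCD_b.parquet"]
print(select_files(names, "CCD"))
```

The resulting list can then be handed to whichever reader is in use, so that files for other packet types are never loaded.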

🐛️ Bugs

Updated the time convention for EXPDate output from microseconds to nanoseconds.
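
For scripts that consume this value, the practical effect is a factor of 1000. The sketch below shows the conversion under two loudly stated assumptions that the release notes do not confirm: that the value is an integer count, and that it is counted from the Unix epoch. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps count from the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_nanoseconds(ns: int) -> datetime:
    """Convert integer nanoseconds since EPOCH to a datetime."""
    # datetime resolves microseconds only, so sub-microsecond digits are dropped.
    return EPOCH + timedelta(microseconds=ns // 1_000)


moment = datetime(2022, 11, 25, 9, 9, tzinfo=timezone.utc)
microseconds = int((moment - EPOCH).total_seconds()) * 1_000_000
nanoseconds = microseconds * 1_000  # the same instant in the new convention
print(from_nanoseconds(nanoseconds) == moment)
```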

📋 Documentation

  • rac -help has been updated with information about the updated outputs.

🛠 System

No changes

👷 Chore / Maintenance

No changes

v1.0.0 Major release: New output format

24 Nov 13:18

🌟 Features

This release introduces a new output format, Parquet. This is a compact binary format that serves a similar purpose to CSV but is made for high-throughput data processing. It has excellent support in Python through e.g. PyArrow. To write Parquet, use the -parquet flag.

Also note that the option to write to AWS directly has been removed in this release; see System below.

Details on the outputs

The parquet files follow the same naming conventions used in the CSVs, but the header row is stored as metadata instead. Parquet files support variable-length rows, so instead of one file per packet type, one file per input file is produced.

In addition, the parquet files are written using a partitioning scheme so that data for each day is written to a file in a directory for that day. This means that files with the same name may occur in directories for subsequent days, if the original RAC-file covers two days. Partitioning is performed based on the CUC time of the source packet.
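
The effect of day-based partitioning can be sketched in Python as follows. The sketch assumes the CUC time has already been converted to a timezone-aware datetime and that days are counted in UTC. Both are assumptions, and `partition_by_day` is a hypothetical helper rather than rac's implementation.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_by_day(packets):
    """Group (timestamp, payload) pairs by the UTC day of the timestamp."""
    partitions = defaultdict(list)
    for timestamp, payload in packets:
        # Assumption: the day boundary is taken in UTC.
        day = timestamp.astimezone(timezone.utc).date()
        partitions[(day.year, day.month, day.day)].append(payload)
    return dict(partitions)


# Two packets from the same input file, on either side of midnight.
packets = [
    (datetime(2022, 11, 24, 23, 59, 58, tzinfo=timezone.utc), "first"),
    (datetime(2022, 11, 25, 0, 0, 2, tzinfo=timezone.utc), "second"),
]
for key, payloads in sorted(partition_by_day(packets).items()):
    print(key, payloads)
```

Because the two packets fall on different days, they end up in two partitions, which is how one input file can produce files with the same name in two directories.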

When writing to parquet the PNG-files are stored in the parquet files themselves, rather than as separate files. This introduces two new columns:

  • ImageName: the name of the PNG-image, if it had been written to disk,
  • ImageData: the parsed PNG data.
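
A consumer that still wants the images as separate files can restore them from these two columns. The sketch below works on rows represented as plain dictionaries, an assumed in-memory form, and skips rows without image bytes, since ImageData may be empty as of v1.1.2. `export_images` is a hypothetical helper, not part of rac.

```python
import tempfile
from pathlib import Path


def export_images(rows, out_dir):
    """Write ImageData to out_dir/ImageName for every row that has image bytes."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in rows:
        data = row.get("ImageData")
        if not data:  # None or empty bytes: there is no image to restore
            continue
        # Use only the final path component so files stay inside out_dir.
        target = out_dir / Path(row["ImageName"]).name
        target.write_bytes(data)
        written.append(target.name)
    return written


# Placeholder rows; the bytes are not a valid PNG.
rows = [
    {"ImageName": "a.png", "ImageData": b"\x89PNG placeholder bytes"},
    {"ImageName": "b.png", "ImageData": None},
]
with tempfile.TemporaryDirectory() as tmp:
    print(export_images(rows, tmp))
```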

The capability to write CSV/PNG/JSON is retained as the default and produces the same files as before, but some headers etc. have been updated in the resulting files, so any scripts parsing them may have to be updated.

For further details, see rac -help!

🐛️ Bugs

No changes

📋 Documentation

  • rac -help has been updated with information about the updated and new output formats.

🛠 System

  • AWS is no longer supported directly from the RAC binary; see #148 for the rationale. AWS sync will henceforth be handled by the calling outer layer, such as a Lambda function.

👷 Chore / Maintenance

  • Minor fixes of typographical errors and the like.

v0.2.7 Continue interrupted multi packet

09 Nov 12:56
dfe2f71

🌟 Features

Added a new command line argument, -dregs <path>. This lets the user specify a path for reading and writing the temporary "dregs" files [0] used for saving and continuing multi-packet data between batch runs.

🐛️ Bugs

No changes

📋 Documentation

No changes

🛠 System

No changes

👷 Chore / Maintenance

No changes


[0]: Data Remaining after Extracting Group of Source packets