Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
…nto qualip-ocr_bitmap-last_font_tag
  • Loading branch information
jstrot committed Dec 31, 2024
2 parents 69a115e + b08c5fa commit 81550c3
Show file tree
Hide file tree
Showing 82 changed files with 15,638 additions and 1,248 deletions.
16 changes: 0 additions & 16 deletions .github/workflows/build_mac.yml
Original file line number Diff line number Diff line change
Expand Up @@ -74,22 +74,6 @@ jobs:
working-directory: build
- name: Display version information
run: ./build/ccextractor --version
cmake_ocr_hardsubx:
runs-on: macos-latest
steps:
- uses: actions/checkout@v4
- name: Install dependencies
run: brew install pkg-config autoconf automake libtool tesseract leptonica gpac ffmpeg
- name: cmake
run: |
mkdir build && cd build
cmake -DWITH_OCR=ON -DWITH_HARDSUBX=ON ../src
- name: build
run: |
make -j$(nproc)
working-directory: build
- name: Display version information
run: ./build/ccextractor --version
build_rust:
runs-on: macos-latest
steps:
Expand Down
8 changes: 4 additions & 4 deletions .github/workflows/build_windows.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Check out repository
uses: actions/checkout@v4
- name: Setup MSBuild.exe
uses: microsoft/setup-msbuild@v1.3.1
uses: microsoft/setup-msbuild@v2.0.0
with:
msbuild-architecture: x64
- name: Install gpac
Expand All @@ -38,7 +38,7 @@ jobs:
run: mkdir C:\vcpkg\.cache
- name: Cache vcpkg
id: cache
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: |
C:\vcpkg\.cache
Expand Down Expand Up @@ -80,7 +80,7 @@ jobs:
- name: Check out repository
uses: actions/checkout@v4
- name: Setup MSBuild.exe
uses: microsoft/setup-msbuild@v1.3.1
uses: microsoft/setup-msbuild@v2.0.0
with:
msbuild-architecture: x64
- name: Install gpac
Expand All @@ -89,7 +89,7 @@ jobs:
run: mkdir C:\vcpkg\.cache
- name: Cache vcpkg
id: cache
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: |
C:\vcpkg\.cache
Expand Down
15 changes: 9 additions & 6 deletions .github/workflows/format.yml
Original file line number Diff line number Diff line change
Expand Up @@ -26,19 +26,22 @@ jobs:
git diff-index --quiet HEAD -- || (git diff && exit 1)
format_rust:
runs-on: ubuntu-latest
strategy:
matrix:
workdir: ['./src/rust', './src/rust/lib_ccxr']
defaults:
run:
working-directory: ./src/rust
working-directory: ${{ matrix.workdir }}
steps:
- uses: actions/checkout@v4
- name: cache
uses: actions/cache@v4
with:
path: |
src/rust/.cargo/registry
src/rust/.cargo/git
src/rust/target
key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
${{ matrix.workdir }}/.cargo/registry
${{ matrix.workdir }}/.cargo/git
${{ matrix.workdir }}/target
key: ${{ runner.os }}-cargo-${{ hashFiles('${{ matrix.workdir }}/Cargo.lock') }}
restore-keys: ${{ runner.os }}-cargo-
- uses: actions-rs/toolchain@v1
with:
Expand All @@ -51,4 +54,4 @@ jobs:
run: cargo fmt --all -- --check
- name: clippy
run: |
cargo clippy --all-features -- -D warnings
cargo clippy -- -D warnings
41 changes: 41 additions & 0 deletions .github/workflows/test_rust.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
name: Unit Test Rust
on:
push:
paths:
- ".github/workflows/test.yml"
- "src/rust/**"
tags-ignore:
- "*.*"
pull_request:
types: [opened, synchronize, reopened]
paths:
- ".github/workflows/test.yml"
- "src/rust/**"
jobs:
test_rust:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ./src/rust
steps:
- uses: actions/checkout@v4
- name: cache
uses: actions/cache@v4
with:
path: |
src/rust/.cargo/registry
src/rust/.cargo/git
src/rust/target
src/rust/lib_ccxr/target
key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
restore-keys: ${{ runner.os }}-cargo-
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
- name: Test main module
run: cargo test
working-directory: ./src/rust
- name: Test lib_ccxr module
run: cargo test
working-directory: ./src/rust/lib_ccxr
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,10 @@ CVS
mac/ccextractor
linux/ccextractor
linux/depend
windows/x86_64-pc-windows-msvc/**
windows/Debug/**
windows/Debug-OCR/**
windows/release-with-debug/**
windows/Release/**
windows/Release-Full/**
windows/Release-OCR/**
Expand Down Expand Up @@ -154,3 +156,4 @@ windows/ccx_rust.lib
windows/*/debug/*
windows/*/CACHEDIR.TAG
windows/.rustc_info.json
linux/configure~
61 changes: 61 additions & 0 deletions docker/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
# CCExtractor Docker image

This dockerfile prepares a minimalist Docker image with CCExtractor. It compiles CCExtractor from sources following instructions from the [Compilation Guide](https://github.com/CCExtractor/ccextractor/blob/master/docs/COMPILATION.MD).

You can install the latest build of this image by running `docker pull CCExtractor/ccextractor`

## Build

You can build the Docker image directly from the Dockerfile provided in [docker](https://github.com/CCExtractor/ccextractor/tree/master/docker) directory of CCExtractor source

```bash
$ git clone https://github.com/CCExtractor/ccextractor.git && cd ccextractor
$ cd docker/
$ docker build -t ccextractor .
```

## Usage

The CCExtractor Docker image can be used in several ways, depending on your needs.

```bash
# General usage
$ docker run ccextractor:latest <features>
```

1. Process a local file & use `-o` flag

To process a local video file, mount a directory containing the input file inside the container:

```bash
# Use `-o` to specifying output file
$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) ccextractor:latest input.mp4 -o output.srt

# Alternatively use `--stdout` feature
$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) ccextractor:latest input.mp4 --stdout > output.srt
```

Run this command from where your input video file is present, and change `input.mp4` & `output.srt` with the actual name of files.

2. Enter an interactive environment

If you need to run CCExtractor with additional options or perform other tasks within the container, you can enter an interactive environment:
bash

```bash
$ docker run --rm -it --entrypoint='sh' ccextractor:latest
```

This will start a Bash shell inside the container, allowing you to run CCExtractor commands manually or perform other operations.

### Example

I run help command in image built from `dockerfile`

```bash
$ docker build -t ccextractor .
$ docker run --rm ccextractor:latest --help
```

This will show the `--help` message of CCExtractor tool
From there you can see all the features and flags which can be used.
48 changes: 48 additions & 0 deletions docker/dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
FROM alpine:latest as base

FROM base as builder

RUN apk add --no-cache --update git curl gcc cmake glew glfw \
tesseract-ocr-dev leptonica-dev clang-dev llvm-dev make pkgconfig \
zlib-dev libpng-dev libjpeg-turbo-dev openssl-dev freetype-dev libxml2-dev

RUN cd && git clone https://github.com/gpac/gpac
WORKDIR root/gpac/
RUN ./configure && make && make install-lib && cd && rm -rf /root/gpac

WORKDIR /root
RUN git clone https://github.com/CCExtractor/ccextractor.git
RUN apk add bash cargo
RUN export LIB_CLANG_PATH=$(find / -name 'libclang*.so*' 2>/dev/null | grep -v 'No such file' | head -n 1 | xargs dirname)
RUN cd /root/ccextractor/linux && ./pre-build.sh && ./build

RUN cp /root/ccextractor/linux/ccextractor /ccextractor && rm -rf ~/ccextractor

FROM base as final

COPY --from=builder /lib/ld-musl-x86_64.so.1 /lib/
COPY --from=builder /usr/lib/libtesseract.so.5 /usr/lib/
COPY --from=builder /usr/lib/libleptonica.so.6 /usr/lib/
COPY --from=builder /usr/local/lib/libgpac.so.12 /usr/local/lib/
COPY --from=builder /usr/lib/libstdc++.so.6 /usr/lib/
COPY --from=builder /usr/lib/libgcc_s.so.1 /usr/lib/
COPY --from=builder /usr/lib/libgomp.so.1 /usr/lib/
COPY --from=builder /usr/lib/libpng16.so.16 /usr/lib/
COPY --from=builder /usr/lib/libjpeg.so.8 /usr/lib/
COPY --from=builder /usr/lib/libgif.so.7 /usr/lib/
COPY --from=builder /usr/lib/libtiff.so.6 /usr/lib/
COPY --from=builder /usr/lib/libwebp.so.7 /usr/lib/
COPY --from=builder /usr/lib/libwebpmux.so.3 /usr/lib/
COPY --from=builder /lib/libz.so.1 /lib/
COPY --from=builder /lib/libssl.so.3 /lib/
COPY --from=builder /lib/libcrypto.so.3 /lib/
COPY --from=builder /usr/lib/liblzma.so.5 /usr/lib/
COPY --from=builder /usr/lib/libzstd.so.1 /usr/lib/
COPY --from=builder /usr/lib/libsharpyuv.so.0 /usr/lib/

COPY --from=builder /ccextractor /

ENTRYPOINT [ "/ccextractor" ]

CMD [ "/ccextractor" ]

15 changes: 13 additions & 2 deletions docs/CHANGES.TXT
Original file line number Diff line number Diff line change
@@ -1,5 +1,12 @@
0.95 (to be released)
-----------------
1.0 (to be released)
-----------------
- New: Create unit test for rust code (#1615)
- Breaking: Major argument flags revamp for CCExtractor (#1564 & #1619)
- New: Create a Docker image to simplify the CCExtractor usage without any environmental hustle (#1611)
- New: Add time units module in lib_ccxr (#1623)
- New: Add bits and levenshtein module in lib_ccxr (#1627)
- New: Add constants module in lib_ccxr (#1624)
- New: Add log module in lib_ccxr (#1622)
- New: Create `lib_ccxr` and `libccxr_exports` (#1621)
- Fix: Unexpected behavior of get_write_interval (#1609)
- Update: Bump rsmpeg to latest version for ffmpeg bindings (#1600)
Expand All @@ -26,6 +33,10 @@
- Cleanup: Remove the (unmaintained) Nuklear GUI code
- Cleanup: Reduce the amount of Windows build options in the project file
- Fix: infinite loop in MP4 file type detector.
- Improvement: Use Corrosion to build Rust code
- Improvement: Ignore MXF Caption Essence Container version byte to enhance SRT subtitle extraction compatibility
- New: Add tesseract page segmentation modes control with `--psm` flag
- Fix: Resolve compile-time error about implicit declarations (#1646)
- Fix: fatal out of memory error extracting from a VOB PS

0.94 (2021-12-14)
Expand Down
3 changes: 3 additions & 0 deletions docs/COMPILATION.MD
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,9 @@ Clone the latest repository from Github
git clone https://github.com/CCExtractor/ccextractor.git
```

## Docker
You can now use docker image to build latest source of CCExtractor without any environmental hustle. Follow these [instructions](https://github.com/CCExtractor/ccextractor/tree/master/docker/README.md) for building docker image & usage of it.

## Linux

1. Make sure all the dependencies are met.
Expand Down
2 changes: 1 addition & 1 deletion docs/FFMPEG.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ Note:If you installed ffmpeg on non-standard location, please change/update your

### On Windows:
#### Set preprocessor flag `ENABLE_FFMPEG=1`
1. In visual studio 2013 right click <Project> and select property.
1. In visual studio 2022 right click <Project> and select property.
2. In the left panel, select Configuration Properties, C/C++, Preprocessor.
3. In the right panel, in the right-hand column of the Preprocessor Definitions property, open the drop-down menu and choose Edit.
4. In the Preprocessor Definitions dialog box, add `ENABLE_FFMPEG=1`. Choose OK to save your changes.
4 changes: 2 additions & 2 deletions docs/OCR.md
Original file line number Diff line number Diff line change
Expand Up @@ -93,15 +93,15 @@ Download prebuild library of leptonica and tesseract from following link
https://drive.google.com/file/d/0B2ou7ZfB-2nZOTRtc3hJMHBtUFk/view?usp=sharing

put the path of libs/include of leptonica and tesseract in library paths.
1. In visual studio 2013 right click <Project> and select property.
1. In visual studio 2022 right click <Project> and select property.
2. Select Configuration properties in left panel(column) of property.
3. Select VC++ Directory.
4. In the right pane, in the right-hand column of the VC++ Directory property, open the drop-down menu and choose Edit.
5. Add path of Directory where you have kept uncompressed library of leptonica and tesseract.


Set preprocessor flag ENABLE_OCR=1
1. In visual studio 2013 right click <Project> and select property.
1. In visual studio 2022 right click <Project> and select property.
2. In the left panel, select Configuration Properties, C/C++, Preprocessor.
3. In the right panel, in the right-hand column of the Preprocessor Definitions property, open the drop-down menu and choose Edit.
4. In the Preprocessor Definitions dialog box, add ENABLE_OCR=1. Choose OK to save your changes.
Expand Down
71 changes: 71 additions & 0 deletions docs/Rust_migration_guide.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
# C to Rust Migration Guide

## Porting C Functions to Rust

This guide outlines the process of migrating C functions to Rust while maintaining compatibility with existing C code.

### Step 1: Identify the C Function

First, identify the C function you want to port. For example, let's consider a function named `net_send_cc()` in a file called `networking.c`:

```c
void net_send_cc() {
// Some C code
}
```

### Step 2: Create a Pure Rust Equivalent

Write an equivalent function in pure Rust within the `lib_ccxr` module:

```rust
fn net_send_cc() {
// Rust equivalent code to `net_send_cc` function in `networking.c`
}
```

### Step 3: Create a C-Compatible Rust Function

In the `libccxr_exports` module, create a new function that will be callable from C:

```rust
#[no_mangle]
pub extern "C" fn ccxr_net_send_cc() {
net_send_cc() // Call the pure Rust function
}
```

### Step 4: Declare the Rust Function in C

In the original C file (`networking.c`), declare the Rust function as an external function:

```rust
extern void ccxr_net_send_cc();
```

### Step 5: Modify the Original C Function

Update the original C function to use the Rust implementation when available:

```c
void net_send_cc() {
#ifndef DISABLE_RUST
return ccxr_net_send_cc(); // Use the Rust implementation
#else
// Original C code
#endif
}
```

## Rust module system

- `lib_ccxr` crate -> **The Idiomatic Rust layer**

- Path: `src/rust/lib_ccxr`
- This layer will contain the migrated idiomatic Rust. It will have complete documentation and tests.

- `libccxr_exports` module -> **The C-like Rust layer**

- Path: `src/rust/src/libccxr_exports`
- This layer will have function names the same as defined in C but with the prefix `ccxr_`. These are the functions defined in the `lib_ccx` crate under appropriate modules. And these functions will be provided to the C library.
- Ex: `extern "C" fn ccxr_<function_name>(<args>) {}`
Loading

0 comments on commit 81550c3

Please sign in to comment.