Release 0.0.2
jmpaz committed Aug 19, 2024
1 parent 19efff0 commit 4a7a1db
Showing 3 changed files with 49 additions and 3 deletions.
39 changes: 39 additions & 0 deletions README.md
@@ -0,0 +1,39 @@
# catalog

`catalog` is a Python library and CLI for managing and processing media.

The project was initially conceived to facilitate batch transcription of media files (voice notes, specifically) with Whisper.

Its features were designed with this use case in mind, but the working plan is to generalize the library to be useful for a broader range of media types (e.g., images, webpages, API connections).



## Installation

Install with pip (or pipx for a global installation):
```bash
pip install git+https://github.com/jmpaz/catalog.git
```


## Usage

### CLI

The following actions are available via the CLI:
- importing, managing, transcribing, and grouping/tagging media objects
- processing resultant transcriptions ("entries") externally, i.e., with LLMs (not fully implemented)
- searching (keyword, fuzzy, vector) across all entries for a given query
- inspecting media object metadata and entries
- writing Markdown files with the highest-quality textual representation available (LLM-processed transcript > lightly-formatted raw transcript)

For a full list of commands and options, use `catalog --help`.
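The preference order used when writing Markdown (LLM-processed transcript first, lightly formatted raw transcript as the fallback) can be sketched as follows. This is only an illustration: the `best_text` helper and the `processed` / `raw` keys are hypothetical names, not the library's actual data model.

```python
def best_text(entry):
    """Return the highest-quality text available for an entry.

    Assumption: an entry is a dict that may hold a "processed" (LLM-processed)
    and/or a "raw" (lightly formatted) transcript. Both key names are made up
    for this sketch.
    """
    for key in ("processed", "raw"):  # ordered from most to least preferred
        text = entry.get(key)
        if text:
            return text
    return None  # nothing usable to write


if __name__ == "__main__":
    print(best_text({"processed": "Cleaned-up note.", "raw": "uh cleaned up note"}))
    print(best_text({"raw": "uh cleaned up note"}))
```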


## Library

`catalog` currently stores:
- metadata and transcriptions in a single JSON file located at `$XDG_CONFIG_HOME/catalog/library.json`.
- copies of imported media files in `~/.local/share/catalog/datastore`.
- embeddings for entry text in `~/.local/share/catalog/embeddings.json`.
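As a rough sketch, the three locations listed above could be resolved like this. The `catalog_paths` helper is hypothetical, and the fallback to `~/.config` when `XDG_CONFIG_HOME` is unset follows the XDG default rather than anything confirmed about `catalog` itself.

```python
import os
from pathlib import Path


def catalog_paths(env=None):
    """Resolve the storage locations named in the README.

    Assumption: an unset or empty XDG_CONFIG_HOME falls back to ~/.config.
    """
    env = os.environ if env is None else env
    config_home = Path(env.get("XDG_CONFIG_HOME") or Path.home() / ".config")
    data_dir = Path.home() / ".local" / "share" / "catalog"
    return {
        "library": config_home / "catalog" / "library.json",  # metadata and transcriptions
        "datastore": data_dir / "datastore",                  # copies of imported media
        "embeddings": data_dir / "embeddings.json",           # embeddings for entry text
    }


if __name__ == "__main__":
    for name, path in catalog_paths().items():
        status = "exists" if path.exists() else "missing"
        print(f"{name}: {path} ({status})")
```

Running it is a quick way to see which of the files are present on a given machine.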

2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,4 +1,4 @@
-git+https://github.com/m-bain/whisperX.git
+whisperx==3.1.5
 contextualize==0.0.3
 yt-dlp==2023.11.16
 click>=8.1.7
11 changes: 9 additions & 2 deletions setup.py
@@ -8,13 +8,20 @@ def get_requirements():
     return [req for req in required if re.match(r"^(?!git\+)[\w-]+", req)]


+with open("README.md", "r", encoding="utf-8") as fh:
+    long_description = fh.read()
+
+
 setup(
     name="catalog",
-    version="0.0.1",
+    version="0.0.2",
     packages=find_packages(),
     install_requires=get_requirements(),
     entry_points={"console_scripts": ["catalog = catalog.cli:cli"]},
     author="jmpaz",
-    url="https://github.com/jmpaz/transcribe",
+    description="Library and CLI for managing and processing media",
+    long_description=long_description,
+    long_description_content_type="text/markdown",
+    url="https://github.com/jmpaz/catalog",
     python_requires=">=3.6",
 )
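
The filter in `get_requirements()` suggests why the requirements.txt change matters: entries beginning with `git+` are dropped before they reach `install_requires`, so the previous whisperX URL would not have been declared as a dependency, while the pinned `whisperx==3.1.5` is. A minimal sketch, assuming `required` holds the lines of requirements.txt (the `filter_requirements` name is hypothetical; only the list comprehension comes from the diff):

```python
import re


def filter_requirements(required):
    # Same expression as in setup.py: keep entries that start with a plain
    # package name and reject anything that begins with "git+".
    return [req for req in required if re.match(r"^(?!git\+)[\w-]+", req)]


old = [
    "git+https://github.com/m-bain/whisperX.git",
    "contextualize==0.0.3",
    "yt-dlp==2023.11.16",
    "click>=8.1.7",
]
new = ["whisperx==3.1.5"] + old[1:]

print(filter_requirements(old))  # the git+ URL is filtered out
print(filter_requirements(new))  # all four entries are kept
```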
