Designed by Agile Lab, witboost is a versatile platform that addresses a wide range of sophisticated data engineering challenges. It enables businesses to discover, enhance, and productize their data, fostering the creation of automated data platforms that adhere to the highest standards of data governance. Want to know more about witboost? Check it out here or contact us!
This repository is part of our Starter Kit meant to showcase witboost's integration capabilities and provide a "batteries-included" product.
This project implements a Witboost Data Catalog Plugin for Collibra using Java & SpringBoot.
A Data Catalog Plugin is an extension point for Witboost that allows publishing entities on an external, pluggable Data Catalog. It is invoked at the end of the provisioning flow and receives the complete information about the entity: its descriptor, provisioning info, etc.
You can learn more about how Data Catalog plugins fit in the broader picture here.
This microservice is written in Java 17, using SpringBoot for the HTTP layer. The project is built with Apache Maven and supports packaging as JAR artifacts and as a Docker image, which is ideal for Kubernetes deployments (the preferred option).
Hooks are programs you can place in a hooks directory to trigger actions at certain points in git’s execution. Hooks that don’t have the executable bit set are ignored.
The hooks are all stored in the hooks subdirectory of the Git directory. In most projects, that’s .git/hooks.
Out of the many hooks supported by Git, we use the pre-commit hook to check the code changes before each commit. If the hook returns a non-zero exit status, the commit is aborted.
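To make the exit-status rule concrete, here is a minimal sketch of a hand-written check, independent of the pre-commit framework. The function name no_trailing_whitespace and the check itself are illustrative assumptions and are not part of this repository.

```shell
#!/bin/sh
# Illustrative sketch only: this repository relies on the pre-commit framework,
# not on a hand-written hook like this one.

# Hypothetical helper: returns 0 when none of the given files has a line
# ending in whitespace, and 1 otherwise.
no_trailing_whitespace() {
  status=0
  for file in "$@"; do
    if grep -q '[[:space:]]$' "$file" 2>/dev/null; then
      echo "trailing whitespace found in $file" >&2
      status=1
    fi
  done
  return "$status"
}

# Saved as .git/hooks/pre-commit with the executable bit set, the script could
# end with a line like the following (file names containing spaces are not
# handled in this sketch):
#   no_trailing_whitespace $(git diff --cached --name-only --diff-filter=ACM) || exit 1
```

Because Git only looks at the exit status, any program that exits with a non-zero status when the check fails is enough to abort the commit.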
To use the pre-commit hook, you can rely on the pre-commit framework, which sets up and manages multi-language pre-commit hooks.
To set up pre-commit hooks, follow the steps below:
- Install the pre-commit framework using either pip or Homebrew (if your operating system is macOS):
  - Using pip:
    pip install pre-commit
  - Using Homebrew:
    brew install pre-commit
- Once pre-commit is installed, you can execute the following:
  pre-commit --version
  If you see something like pre-commit 3.3.3, your installation is ready to use!
- To use pre-commit, create a file named .pre-commit-config.yaml inside the project directory. This file tells pre-commit which hooks need to be installed based on your inputs. Below is an example configuration:
  repos:
    - repo: https://github.com/pre-commit/pre-commit-hooks
      rev: v4.4.0
      hooks:
        - id: trailing-whitespace
  The above configuration says to download the pre-commit-hooks project and run its trailing-whitespace hook on the project.
- Run the command below to install pre-commit into your Git hooks. pre-commit will then run on every commit.
  pre-commit install
Requirements:
- Java 17
- Apache Maven 3.9+
Build:
The project uses the openapi-generator Maven plugin to generate the API endpoints from the interface specification located in src/main/resources/interface-specification.yml. For more information on the documentation, check API docs.
mvn compile
Style checks are handled by Checkstyle:
mvn checkstyle:check
Bug checks are handled by SpotBugs:
mvn spotbugs:check
Tests are handled by JUnit:
mvn test
Artifacts & Docker image: the project leverages Maven for packaging. Build artifacts (normal and fat jar) with:
mvn package spring-boot:repackage
The Docker image can be built with:
docker build .
More details can be found here.
Note: when running in the CI/CD pipeline, the version for the project is automatically computed using information gathered from Git, namely the branch name and tags. Unless you are on a release branch such as 1.2.x or a tag such as v1.2.3, it will end up being 0.0.0. You can follow this branch/tag convention or update the version computation to match your preferred strategy. When running locally, if you do not care about the version (i.e., nothing gets published or similar), you can manually set the environment variable PROVISIONER_VERSION to avoid warnings and oddly-named artifacts; as an example, you can set it to the build time like this:
export PROVISIONER_VERSION=$(date +%Y%m%d-%H%M%S);
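The branch/tag convention described in the note can be sketched as a small shell function. This is not the pipeline's actual implementation: the function name compute_version is hypothetical, the glob patterns are deliberately loose, and the string produced for a release branch (1.2.0-SNAPSHOT) is an assumption, since only the 0.0.0 fallback is stated above.

```shell
#!/bin/sh
# Hypothetical helper: maps a Git branch or tag name to a version string.
compute_version() {
  ref="$1"
  case "$ref" in
    v[0-9]*.[0-9]*.[0-9]*)
      # Tag such as v1.2.3: strip the leading "v".
      echo "${ref#v}"
      ;;
    [0-9]*.[0-9]*.x)
      # Release branch such as 1.2.x: the output format is an assumption.
      echo "${ref%.x}.0-SNAPSHOT"
      ;;
    *)
      # Any other branch or tag falls back to 0.0.0, as described in the note.
      echo "0.0.0"
      ;;
  esac
}
```

For a local build you could then run, for example, export PROVISIONER_VERSION=$(compute_version "$(git rev-parse --abbrev-ref HEAD)") instead of using the build time.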
CI/CD: the pipeline is based on GitLab CI as that's what we use internally. It's configured by the .gitlab-ci.yaml file in the root of the repository. You can use that as a starting point for your customizations.
Configuration is handled via the Spring Boot application.yaml file. It allows setting custom domain and asset types for the provisioned descriptors, as well as the set of relation types and attributes related to each asset. Check Configuration for more information.
To run the server locally, use:
mvn -pl collibra-data-catalog-plugin-server spring-boot:run
By default, the server binds to port 8888 on localhost. After it's up and running you can make provisioning requests to this address. You can access the running application here.
SwaggerUI is configured and hosted on the path /docs. You can access it here.
This microservice is meant to be deployed to a Kubernetes cluster with the included Helm chart and the scripts that can be found in the helm subdirectory. You can find more details here.
This project is available under the Apache License, Version 2.0; see LICENSE for full details.
Witboost is a cutting-edge Data Experience platform that streamlines complex data projects across various platforms, enabling seamless data production and consumption. This unified approach empowers you to fully utilize your data without platform-specific hurdles, fostering smoother collaboration across teams.
It seamlessly blends business-relevant information, data governance processes, and IT delivery, ensuring technically sound data projects aligned with strategic objectives. Witboost facilitates data-driven decision-making while maintaining data security, ethics, and regulatory compliance.
Moreover, Witboost maximizes data potential through automation, freeing resources for strategic initiatives. Apply your data for growth, innovation and competitive advantage.