This project is an end-to-end deep learning classification project, built as part of a video tutorial by the Krish Naik team: https://youtu.be/p1bfK8ZJgkE
Create a new GitHub repo with a name of your choice (I have chosen "Chicken-Disease-Classification-project"). Include the README and .gitignore files, and choose Python as the language for the .gitignore template.
Start the project locally by cloning the repository onto your computer:
git clone <link copied from the HTTPS option in the Code section>
Inside it we will write a script that automates creation of the project structure (the required folders and files). You can create all of this manually on your own if you want, but for now check out the template.py file:
# This file is used to create the folder structure automatically
import os
from pathlib import Path
import logging

# Create a logger
logging.basicConfig(level=logging.INFO, format='[%(asctime)s] - %(message)s')

project_name = "cnnClassifier"

list_of_folders = [
    ".github/workflows/.gitkeep",  # .gitkeep keeps the folder on GitHub even if it is empty; once we start using this folder we will remove this file
    f"src/{project_name}/__init__.py",  # this is going to be the local package
    f"src/{project_name}/components/__init__.py",  # this is going to be the local package folder
    f"src/{project_name}/utils/__init__.py",  # all the utility functions will be here
    f"src/{project_name}/config/__init__.py",  # all the configuration code will be here
    f"src/{project_name}/config/configuration.py",
    f"src/{project_name}/pipeline/__init__.py",  # all the pipeline files will be here
    f"src/{project_name}/entity/__init__.py",  # all the entity files will be here
    f"src/{project_name}/constants/__init__.py",  # all the constants will be here
    "config/config.yaml",  # project configuration (file paths etc.)
    "dvc.yaml",  # integrating MLOps with DVC (Data Version Control)
    "params.yaml",
    "requirements.txt",  # all the dependencies will be here
    "setup.py",  # this is the file which will be used to install the package
    "research/trails.ipynb",  # this notebook will be used for research and experiments
    "templates/index.html"
]

for filepath in list_of_folders:
    filepath = Path(filepath)  # converting the string to a Path object
    filedir, filename = os.path.split(filepath)  # splitting the path into directory and filename
    if filedir != "":
        os.makedirs(filedir, exist_ok=True)  # creating the directory if it does not exist
        logging.info(f"Created the directory {filedir} for the file: {filename}")
    if (not os.path.exists(filepath)) or (os.path.getsize(filepath) == 0):
        with open(filepath, "w") as f:
            pass  # creates an empty file
        logging.info(f"Created the file: {filepath}")
    else:
        logging.info(f"File already exists: {filepath}")
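Once template.py is saved, a single run from the repo root lays out the whole skeleton (standard Python invocation, nothing project-specific):
python template.py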
This setup.py file will help us host the project as a package later on a site like PyPI, or you can just use it locally to organize all of your work.
Inside setup.py I have given all the necessary metadata so anyone can understand the package properly (it will just display this README.md file if the package is hosted):
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

__version__ = "0.0.0"

REPO_NAME = "Chicken-Disease-Classification-project"
AUTHOR_USER_NAME = "Rahul-lalwani-learner"
SRC_REPO = "cnnClassifier"
AUTHOR_EMAIL = "[email protected]"

setuptools.setup(
    name=SRC_REPO,
    version=__version__,
    author=AUTHOR_USER_NAME,
    author_email=AUTHOR_EMAIL,
    description="A small Python package for a CNN classifier",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url=f"https://github.com/{AUTHOR_USER_NAME}/{REPO_NAME}",
    project_urls={
        'Bug Tracker': f"https://github.com/{AUTHOR_USER_NAME}/{REPO_NAME}/issues",
    },
    package_dir={"": "src"},
    packages=setuptools.find_packages(where="src"),
)
All the necessary module names are listed in requirements.txt so they can be installed together.
You will see why each of these modules is needed as you go along with the project:
tensorflow
pandas
dvc
notebook
numpy
matplotlib
seaborn
python-box==6.0.2
pyYAML
tqdm
ensure==1.0.2
joblib
types-pyYAML
scipy
Flask
Flask-Cors
ipykernel
-e .
-e . is also present here even though it is not a module: it tells pip to run setup.py automatically and install the project as an editable local package when you install requirements.txt. setup.py will pick up every folder that contains an __init__.py.
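As a side note, the editable install that -e . triggers can also be run on its own at any time; this is standard pip behaviour, not something specific to this project:
pip install -e .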
Here I will be using Python version 3.8 for the project. You can create a new virtual environment by writing (chicken is the name of the environment):
conda create -n chicken python=3.8 -y
Make sure you have Miniconda or Anaconda installed on your computer.
After creating this environment, simply activate it:
conda activate chicken
Now you can simply install all the requirements:
pip install -r requirements.txt
This automatically installs all the requirements for this project.
📝 Note: Don't forget to commit your source code from time to time for proper management of the code.
Now the setup is complete. Go ahead and set up logging so we get logs for each process we perform.
I have not created an exception.py module for exception handling, since I'll be using the python-box exceptions (BoxValueError and friends); you can write your own exception module if you prefer.
We have created the logger inside the __init__.py of src/cnnClassifier so that it is easily accessible from any file via "from cnnClassifier import logger".
In the logging config I have added a stream handler since I also want the logging output printed to the terminal (console).
You can check out the src/cnnClassifier/__init__.py file:
import os
import sys
import logging

logging_str = "[%(asctime)s: %(levelname)s: %(module)s] %(message)s"

log_dir = "logs"
log_filepath = os.path.join(log_dir, "running_logs.log")
os.makedirs(log_dir, exist_ok=True)

logging.basicConfig(
    level=logging.INFO,
    format=logging_str,
    handlers=[
        logging.FileHandler(log_filepath),
        logging.StreamHandler(sys.stdout)  # print to console
    ]
)

logger = logging.getLogger("cnnClassifierLogger")
We can check that logging is working properly in main.py, where I have printed a simple custom log message to the console.
This setup will also write to a running_logs.log file in the logs folder, where we can review all of our logs; if something goes wrong we can easily trace it.
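As a minimal sketch, the check in main.py can be as simple as the following (the exact log message is up to you):

from cnnClassifier import logger

logger.info("Welcome to our custom logging")  # shows up on the console and in logs/running_logs.log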
I have created utility functions under common.py in the utils folder. These are the functions that we are going to use frequently in this project.
The methods inside utils are:
- read_yaml -> reads a yaml file (used throughout the project, including by the CI/CD pipeline and GitHub Actions)
- create_directories -> simply creates the directories at the given paths
- save_json -> the results of a prediction will be saved in JSON format
- load_json -> to load those predictions back from a JSON file
- save_bin -> save binary files
- load_bin -> load binary files
- get_size -> get the size of a particular file
- decodeImage -> decodes a base64 image string and writes it to an image file
- encodeImageIntoBase64 -> encodes a binary image file to base64
import os
import json
import base64
from pathlib import Path
from typing import Any

import yaml
import joblib
from box import ConfigBox
from box.exceptions import BoxValueError
from ensure import ensure_annotations

from cnnClassifier import logger


@ensure_annotations
def read_yaml(path_to_yaml: Path) -> ConfigBox:
    """
    Reads a yaml file and returns a ConfigBox object

    Args:
        path_to_yaml (Path): Path to yaml file

    Returns:
        ConfigBox: ConfigBox object

    Raises:
        ValueError: if the yaml file is empty (ConfigBox raises BoxValueError on None)
        e: any other exception while reading
    """
    try:
        with open(path_to_yaml, "r") as yaml_file:
            yaml_dict = yaml.safe_load(yaml_file)
        logger.info(f"yaml file: {path_to_yaml} loaded successfully")
        return ConfigBox(yaml_dict)
    except BoxValueError:
        raise ValueError("yaml file is empty")
    except Exception as e:
        logger.info(e)
        raise e
@ensure_annotations
def create_directories(path_to_directories: list, verbose=True):
    """
    Create a list of directories

    Args:
        path_to_directories (list): list of paths to directories
        verbose (bool, optional): log each created directory. Defaults to True.
    """
    for path in path_to_directories:
        os.makedirs(path, exist_ok=True)
        if verbose:
            logger.info(f"directory created at: {path}")
@ensure_annotations
def save_json(path: Path, data: dict):
    """
    Save a dictionary as a json file

    Args:
        path (Path): path to json file
        data (dict): dictionary to save
    """
    with open(path, "w") as json_file:
        json.dump(data, json_file, indent=4)
    logger.info(f"json file saved at: {path}")
@ensure_annotations
def load_json(path: Path) -> ConfigBox:
    """
    Load a json file

    Args:
        path (Path): path to json file

    Returns:
        ConfigBox: ConfigBox object
    """
    with open(path, "r") as json_file:
        data = json.load(json_file)
    logger.info(f"json file loaded from: {path}")
    return ConfigBox(data)
@ensure_annotations
def save_bin(data: Any, path: Path):
    """
    Save a binary file

    Args:
        data (Any): data to save
        path (Path): path to save at
    """
    joblib.dump(value=data, filename=path)
    logger.info(f"binary file saved at: {path}")
@ensure_annotations
def load_bin(path: Path) -> Any:
    """
    Load a binary file

    Args:
        path (Path): path to load from

    Returns:
        Any: the loaded data
    """
    data = joblib.load(filename=path)
    logger.info(f"binary file loaded from: {path}")
    return data
@ensure_annotations
def get_size(path: Path) -> str:
    """
    Get size of file in KB

    Args:
        path (Path): path to file

    Returns:
        str: size of file in KB
    """
    size_in_kb = round(os.path.getsize(path) / 1024)
    return f"~ {size_in_kb} KB"
def decodeImage(imgstring, filename):
    """Decode a base64 image string and write it to `filename`."""
    imgdata = base64.b64decode(imgstring)
    with open(filename, 'wb') as f:
        f.write(imgdata)  # the with-block closes the file automatically


def encodeImageIntoBase64(image_path):
    """Read an image file and return its base64-encoded bytes."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read())
Testing of these functions, along with questions like why ConfigBox is used instead of a plain dictionary and what the ensure_annotations decorator does, is covered in the research/trails.ipynb file. Do check it out.
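For a quick taste of what the notebook covers, here is a small self-contained sketch (assuming python-box and ensure are installed, as listed in requirements.txt):

from box import ConfigBox
from ensure import ensure_annotations

# A plain dict only allows bracket access; ConfigBox also allows attribute access
cfg = ConfigBox({"root_dir": "artifacts", "data": {"path": "artifacts/data"}})
print(cfg.root_dir)    # artifacts
print(cfg.data.path)   # artifacts/data

# ensure_annotations enforces the type hints at call time
@ensure_annotations
def get_product(x: int, y: int) -> int:
    return x * y

print(get_product(2, 3))   # 6
# get_product(2, "3")      # raises an error instead of silently returning "33"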
The workflow we will follow for every stage:
- Update config.yaml
  All filepath-related settings for data ingestion and the other pipeline stages are written here.
- Update secrets.yaml [Optional]
  If you have credentials or other secret information, you can keep it in secrets.yaml.
- Update params.yaml
  During model configuration I'll be updating these params.
- Update the entity
  An entity is the return type of a function (if no built-in return type fits, you can create your own custom return type).
- Update the configuration manager in src/config
  This will help us properly connect the yaml files and read content from them.
- Update the components
  Writing the different components, such as data ingestion and data prediction.
- Update the pipeline
  Creating the prediction and training pipelines.
- Update main.py
  After writing code in a modular way, update main.py for the CI/CD implementation.
- Update dvc.yaml
  For pipeline tracking.
First we will perform this in 01_data_ingestion.ipynb in the research folder, so that we can confirm everything runs properly; then we will upgrade it to a modular programming style. Go through the workflow above to do it properly.
For data ingestion we store the link to the data plus the paths of the folders where the downloaded and extracted data will live. You can look at the code:
artifacts_root: artifacts

data_ingestion:
  root_dir: artifacts/data_ingestion
  source_URL: https://github.com/entbappy/Branching-tutorial/raw/master/Chicken-fecal-images.zip
  local_data_file: artifacts/data_ingestion/data.zip
  unzip_dir: artifacts/data_ingestion
This yaml file will be read in ConfigBox format, which looks and behaves much like a dictionary but is more convenient: we can use its entries like variables inside different files, somewhat like global variables for paths.
We can skip updating params.yaml and secrets.yaml for now, since we are still in the testing phase.
The new entity we require here is DataIngestionConfig, which has the same attributes as the data_ingestion block of config.yaml:
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class DataIngestionConfig:
    root_dir: Path
    source_URL: str
    local_data_file: Path
    unzip_dir: Path
The configuration manager is going to extract the values from the yaml file and return them in this entity format, which we can use in the next step for components/data_ingestion. Before that we also have to update constants/__init__.py inside src, which holds the paths of config.yaml and params.yaml:
from pathlib import Path
CONFIG_FILE_PATH = Path("config/config.yaml")
PARAMS_FILE_PATH = Path("params.yaml")
Now we can import the constants from cnnClassifier and use them:
from cnnClassifier.constants import *
from cnnClassifier.utils.common import read_yaml, create_directories
from cnnClassifier.entity.config_entity import DataIngestionConfig

class ConfigurationManager:
    def __init__(
        self,
        config_filepath = CONFIG_FILE_PATH,
        params_filepath = PARAMS_FILE_PATH
    ):
        self.config = read_yaml(config_filepath)
        self.params = read_yaml(params_filepath)
        create_directories([self.config.artifacts_root])

    def get_data_ingestion_config(self) -> DataIngestionConfig:
        config = self.config.data_ingestion
        create_directories([config.root_dir])

        data_ingestion_config = DataIngestionConfig(
            root_dir=config.root_dir,
            source_URL=config.source_URL,
            local_data_file=config.local_data_file,
            unzip_dir=config.unzip_dir,
        )
        return data_ingestion_config
For the modular version we will create a file inside components for data ingestion, which downloads the data and extracts it into the artifacts folder:
import os
import zipfile
import urllib.request as request
from pathlib import Path

from cnnClassifier.utils.common import get_size
from cnnClassifier.entity.config_entity import DataIngestionConfig
from cnnClassifier import logger

class DataIngestion:
    def __init__(self, config: DataIngestionConfig):
        self.config = config

    def download_file(self):
        if not os.path.exists(self.config.local_data_file):
            filename, headers = request.urlretrieve(
                url=self.config.source_URL,
                filename=self.config.local_data_file
            )
            logger.info(f"File downloaded at: {filename} with following headers: {headers}")
        else:
            logger.info(f"File already exists of size: {get_size(Path(self.config.local_data_file))}")

    def extract_zip_file(self):
        unzip_path = self.config.unzip_dir
        os.makedirs(unzip_path, exist_ok=True)
        with zipfile.ZipFile(self.config.local_data_file, 'r') as zip_ref:
            zip_ref.extractall(unzip_path)
Here we downloaded the data from the URL using urllib.request.urlretrieve(url, filename), and then wrote a function that extracts it to local storage under the <local_data_file> name.
Now you can run this code inside a try/except block to check that everything runs properly:
try:
    config = ConfigurationManager()
    data_ingestion_config = config.get_data_ingestion_config()
    data_ingestion = DataIngestion(data_ingestion_config)
    data_ingestion.download_file()
    data_ingestion.extract_zip_file()
except Exception as e:
    raise e
In theory this should download our data and extract it to local storage; you can verify it in your file explorer.
You just have to follow the steps mentioned above one by one. Let me write them out for you:
- Update config.yaml (that is already updated)
- Update params.yaml and secrets.yaml (for now you can skip them)
- Update the entity: inside entity, create a file named config_entity.py and paste the code where you created the DataIngestionConfig entity
- Update the ConfigurationManager in src/config/configuration.py: paste the code of the ConfigurationManager class
- Update the components: inside components, create data_ingestion.py, then copy the DataIngestion class code and paste it there
With that, all your code is converted to the modular approach.
Create stage_01_data_ingestion.py inside the pipeline folder. Here we import DataIngestion from components and create a class to run it:
from cnnClassifier.config.configuration import ConfigurationManager
from cnnClassifier.components.data_ingestion import DataIngestion
from cnnClassifier import logger

STAGE_NAME = "Data Ingestion Stage"

class DataIngestionTrainingPipeline:
    def __init__(self):
        pass

    def main(self):
        config = ConfigurationManager()
        data_ingestion_config = config.get_data_ingestion_config()
        data_ingestion = DataIngestion(data_ingestion_config)
        data_ingestion.download_file()
        data_ingestion.extract_zip_file()

if __name__ == "__main__":
    try:
        logger.info(">>> Stage {} started <<<".format(STAGE_NAME))
        obj = DataIngestionTrainingPipeline()
        obj.main()
        logger.info(">>> Stage {} completed <<<\n\nX===========X".format(STAGE_NAME))
    except Exception as e:
        logger.exception(e)
        raise e
Until we adopt DVC, we'll use main.py to run the pipelines and components. Here is the simple code to run DataIngestionTrainingPipeline:
from cnnClassifier import logger
from cnnClassifier.pipeline.stage_01_data_ingestion import DataIngestionTrainingPipeline

STAGE_NAME = "Data Ingestion Stage"

try:
    logger.info(">>> Stage {} started <<<".format(STAGE_NAME))
    obj = DataIngestionTrainingPipeline()
    obj.main()
    logger.info(">>> Stage {} completed <<<\n\nX===========X".format(STAGE_NAME))
except Exception as e:
    logger.exception(e)
    raise e
✅ Data Ingestion Completed
First we will perform our experiments in research/02_prepare_base_model.ipynb.
For model preparation we first have to update the config.yaml file; it includes all the necessary file paths where the base model and the updated model will be saved:
prepare_base_model:
  root_dir: artifacts/prepare_base_model
  base_model_path: artifacts/prepare_base_model/base_model.h5
  updated_base_model_path: artifacts/prepare_base_model/base_model_updated.h5
📝 Note: The base model here is VGG16 from tf.keras.applications, and the updated model is the same model with a changed top and output layer.
In params.yaml I will define all the model-related parameters. These work as global parameters that I can use in any file and update in one place:
AUGMENTATION: True
IMAGE_SIZE: [224,224,3] # as per the VGG 16 model
BATCH_SIZE: 16
INCLUDE_TOP: False
EPOCHS: 1
CLASSES: 2
WEIGHTS: imagenet
LEARNING_RATE: 0.01
I have created a new entity inside config_entity.py for the prepare base model stage; it is a mix of values from params.yaml and config.yaml:
@dataclass(frozen=True)
class PrepareBaseModelConfig:
    root_dir: Path
    base_model_path: Path
    updated_base_model_path: Path
    params_image_size: list
    params_learning_rate: float
    params_include_top: bool
    params_weights: str
    params_classes: int
In the configuration manager I have added a method that reads the yaml files (grabs the needed values from them) and returns them to the components as a PrepareBaseModelConfig object:
def get_prepare_base_model_config(self) -> PrepareBaseModelConfig:
    config = self.config.prepare_base_model
    create_directories([config.root_dir])

    prepare_base_model_config = PrepareBaseModelConfig(
        root_dir=Path(config.root_dir),
        base_model_path=Path(config.base_model_path),
        updated_base_model_path=Path(config.updated_base_model_path),
        params_image_size=self.params.IMAGE_SIZE,
        params_learning_rate=self.params.LEARNING_RATE,
        params_include_top=self.params.INCLUDE_TOP,
        params_weights=self.params.WEIGHTS,
        params_classes=self.params.CLASSES,
    )
    return prepare_base_model_config
As you saw earlier, the constructor of this ConfigurationManager reads the data from the yaml files and creates the root directories.
Here we will create a new class, PrepareBaseModel, which has three main methods:
- get_base_model: loads the VGG16 model from keras.applications with imagenet weights and saves it in H5 format
- _prepare_full_model: helps update the model by adding an extra Flatten and Dense layer at the end of the VGG16 model
- update_base_model: finally updates the base model with the extra layers and saves it in H5 format inside the artifacts folder
import tensorflow as tf
from pathlib import Path
from cnnClassifier.entity.config_entity import PrepareBaseModelConfig
from cnnClassifier import logger

class PrepareBaseModel:
    def __init__(self, config: PrepareBaseModelConfig):
        self.config = config

    def get_base_model(self):
        self.model = tf.keras.applications.vgg16.VGG16(
            input_shape=self.config.params_image_size,
            weights=self.config.params_weights,
            include_top=self.config.params_include_top
        )
        logger.info('Base model loaded successfully')
        self.save_model(path=self.config.base_model_path, model=self.model)

    @staticmethod
    def _prepare_full_model(model, classes, freeze_all, freeze_till, learning_rate):
        if freeze_all:
            for layer in model.layers:
                layer.trainable = False  # freeze each layer individually
        elif (freeze_till is not None) and (freeze_till > 0):
            for layer in model.layers[:-freeze_till]:
                layer.trainable = False

        flatten_layer = tf.keras.layers.Flatten()(model.output)
        prediction = tf.keras.layers.Dense(units=classes, activation='softmax')(flatten_layer)

        full_model = tf.keras.models.Model(inputs=model.input, outputs=prediction)
        full_model.compile(
            optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
            loss=tf.keras.losses.CategoricalCrossentropy(),
            metrics=['accuracy']
        )
        logger.info("Full model compiled successfully")
        full_model.summary()
        return full_model

    def update_base_model(self):
        self.full_model = self._prepare_full_model(
            model=self.model,
            classes=self.config.params_classes,
            freeze_all=True,
            freeze_till=None,
            learning_rate=self.config.params_learning_rate
        )
        self.save_model(path=self.config.updated_base_model_path, model=self.full_model)
        logger.info("Updated base model saved successfully")

    @staticmethod
    def save_model(path: Path, model: tf.keras.Model):
        model.save(path)
Similar to the data ingestion pipeline, this one calls the methods from the component and the configuration manager through a new class, PrepareBaseModelTrainingPipeline:
from cnnClassifier.config.configuration import ConfigurationManager
from cnnClassifier.components.prepare_base_model import PrepareBaseModel
from cnnClassifier import logger

STAGE_NAME = 'prepare base model'

class PrepareBaseModelTrainingPipeline:
    def __init__(self):
        pass

    def main(self):
        config = ConfigurationManager()
        prepare_base_model_config = config.get_prepare_base_model_config()
        prepare_base_model = PrepareBaseModel(config=prepare_base_model_config)
        prepare_base_model.get_base_model()
        prepare_base_model.update_base_model()
Now append new lines of code to main.py to check whether everything is working properly. After this, a new folder will be created in artifacts, and both the base model and the updated model will be present there:
STAGE_NAME = "Prepare Base Model Stage"
try:
    logger.info("************************")
    logger.info(f">>>>>>>>> Running stage: {STAGE_NAME} started <<<<<<<<<<")
    obj = PrepareBaseModelTrainingPipeline()
    obj.main()
    logger.info(f">>>>>>> stage: {STAGE_NAME} completed <<<<<<<<<\n\nX================X")
except Exception as e:
    logger.error(f"Error while running stage: {STAGE_NAME} - Error message: {e}")
    raise e
✅ Preparing Base Model Completed
The procedure for this stage is the same, but here we only go as far as the component: callbacks can't run as a standalone pipeline stage, so they will assist us while training the model.
Same workflow. Insert the new paths for the checkpoint directory and the TensorBoard log directory:
prepare_callbacks:
  root_dir: artifacts/prepare_callbacks
  tensorboard_root_log_dir: artifacts/prepare_callbacks/tensorboard_log_dir
  checkpoint_model_filepath: artifacts/prepare_callbacks/checkpoint_dir/model.h5
There is no need to update params, since those are only used while building and training the model.
Create a new entity that follows the same format as the prepare_callbacks block of config.yaml:
@dataclass(frozen=True)
class PrepareCallbacksConfig:
    root_dir: Path
    tensorboard_root_log_dir: Path
    checkpoint_model_filepath: Path
Now you again have to update the configuration manager so it reads the data from config.yaml and returns it as an entity we can use in the component:
def get_prepare_callback_config(self) -> PrepareCallbacksConfig:
    config = self.config.prepare_callbacks
    model_ckpt_dir = os.path.dirname(config.checkpoint_model_filepath)
    create_directories([
        Path(model_ckpt_dir),
        Path(config.tensorboard_root_log_dir)
    ])

    prepare_callback_config = PrepareCallbacksConfig(
        root_dir=Path(config.root_dir),
        tensorboard_root_log_dir=Path(config.tensorboard_root_log_dir),
        checkpoint_model_filepath=Path(config.checkpoint_model_filepath)
    )
    return prepare_callback_config
Here we create a new component, PrepareCallback, where we build the checkpoint and TensorBoard callbacks and return them as a list that we can use while fitting the model:
import os
import time
import tensorflow as tf
from cnnClassifier.entity.config_entity import PrepareCallbacksConfig

class PrepareCallback:
    def __init__(self, config: PrepareCallbacksConfig):
        self.config = config

    @property
    def _create_tb_callbacks(self):
        timestamp = time.strftime("%Y-%m-%d-%H-%M-%S")
        tb_running_log_dir = os.path.join(
            self.config.tensorboard_root_log_dir,
            f"tb_logs_at_{timestamp}",
        )
        return tf.keras.callbacks.TensorBoard(log_dir=tb_running_log_dir)

    @property
    def _create_ckpt_callbacks(self):
        return tf.keras.callbacks.ModelCheckpoint(
            filepath=str(self.config.checkpoint_model_filepath),
            save_best_only=True
        )

    def get_tb_ckpt_callbacks(self):
        return [
            self._create_tb_callbacks,
            self._create_ckpt_callbacks
        ]
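Once training has written some logs, you can inspect them with TensorBoard; this is standard TensorBoard usage, not something specific to this repo:
tensorboard --logdir artifacts/prepare_callbacks/tensorboard_log_dir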
✅ Prepare Callbacks is completed up to this point. You can check all of this in research/03_prepare_callbacks.ipynb to see how it creates the new directories in the artifacts folder.
Wow! Excited? We have reached the model training stage, where you can see everything you have done so far working together.
Here also we will follow the same workflow, meaning we first update config.yaml; we don't have to worry about params.yaml and secrets.yaml, since they are already updated.
This time we don't have to add much, just the training root and the trained model path:
training:
  root_dir: artifacts/training
  trained_model_path: artifacts/training/model.h5
Here again we'll create a new entity that holds content from both config.yaml and params.yaml, according to the requirements of the training configuration:
@dataclass(frozen=True)
class TrainingConfig:
    root_dir: Path
    trained_model_path: Path
    updated_base_model_path: Path
    training_data: Path
    params_epochs: int
    params_batch_size: int
    params_is_augmentation: bool
    params_image_size: list
I have added a new method to the configuration manager that extracts information from config.yaml and params.yaml and returns it in TrainingConfig format:
def get_training_config(self) -> TrainingConfig:
    training = self.config.training
    prepare_base_model = self.config.prepare_base_model
    params = self.params
    training_data = os.path.join(self.config.data_ingestion.unzip_dir, "Chicken-fecal-images")
    create_directories([Path(training.root_dir)])

    training_config = TrainingConfig(
        root_dir=Path(training.root_dir),
        trained_model_path=Path(training.trained_model_path),
        updated_base_model_path=Path(prepare_base_model.updated_base_model_path),
        training_data=Path(training_data),
        params_epochs=params.EPOCHS,
        params_batch_size=params.BATCH_SIZE,
        params_is_augmentation=params.AUGMENTATION,
        params_image_size=params.IMAGE_SIZE,
    )
    return training_config
Create a new Training component with functions to load the updated base model, train the model, and save the trained model; these will be used in the pipeline.
train_valid_generator is an interesting function: it performs data augmentation and rescaling while simultaneously loading the data from the training directories in artifacts:
import tensorflow as tf
from pathlib import Path
from cnnClassifier.entity.config_entity import TrainingConfig

class Training:
    def __init__(self, config: TrainingConfig):
        self.config = config

    def get_base_model(self):
        self.model = tf.keras.models.load_model(
            self.config.updated_base_model_path
        )

    def train_valid_generator(self):
        datagenerator_kwargs = dict(
            rescale=1./255,
            validation_split=0.20
        )
        dataflow_kwargs = dict(
            target_size=self.config.params_image_size[:-1],
            batch_size=self.config.params_batch_size,
            interpolation="bilinear"
        )

        valid_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
            **datagenerator_kwargs
        )
        self.valid_generator = valid_datagen.flow_from_directory(
            directory=self.config.training_data,
            subset="validation",
            shuffle=False,
            **dataflow_kwargs
        )

        if self.config.params_is_augmentation:
            train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
                rotation_range=40,
                horizontal_flip=True,
                width_shift_range=0.2,
                height_shift_range=0.2,
                shear_range=0.2,
                zoom_range=0.2,
                **datagenerator_kwargs
            )
        else:
            train_datagen = valid_datagen

        self.train_generator = train_datagen.flow_from_directory(
            directory=self.config.training_data,
            subset="training",
            shuffle=True,
            **dataflow_kwargs
        )

    @staticmethod
    def save_model(path: Path, model: tf.keras.Model):
        model.save(path)

    def train(self, callback_list: list):
        self.steps_per_epoch = self.train_generator.samples // self.train_generator.batch_size
        self.validation_steps = self.valid_generator.samples // self.valid_generator.batch_size

        self.history = self.model.fit(
            self.train_generator,
            epochs=self.config.params_epochs,
            steps_per_epoch=self.steps_per_epoch,
            validation_data=self.valid_generator,
            validation_steps=self.validation_steps,
            callbacks=callback_list
        )
        self.save_model(
            path=self.config.trained_model_path,
            model=self.model
        )
Now I have created a new pipeline that wires the training and callbacks components together: it trains the model and saves it:
from cnnClassifier.config.configuration import ConfigurationManager
from cnnClassifier.components.prepare_callbacks import PrepareCallback
from cnnClassifier.components.training import Training  # adjust if your Training component file has a different name
from cnnClassifier import logger

STAGE_NAME = "Training Stage"

class ModelTrainingPipeline:
    def __init__(self):
        pass

    def main(self):
        config = ConfigurationManager()
        prepare_callbacks_config = config.get_prepare_callback_config()
        prepare_callbacks = PrepareCallback(config=prepare_callbacks_config)
        callback_list = prepare_callbacks.get_tb_ckpt_callbacks()

        training_config = config.get_training_config()
        training = Training(config=training_config)
        training.get_base_model()
        training.train_valid_generator()
        training.train(callback_list=callback_list)

if __name__ == "__main__":
    try:
        logger.info("***********************")
        logger.info(f">>>>>>>>>>>> {STAGE_NAME} started <<<<<<<<<<<")
        obj = ModelTrainingPipeline()
        obj.main()
        logger.info(f">>>>>>>>>>>> Stage {STAGE_NAME} completed <<<<<<<<<<<")
    except Exception as e:
        logger.exception(f"Error in {STAGE_NAME} pipeline: {e}")
        raise e
Now you can see all of this in action by updating main.py and running the training pipeline:
STAGE_NAME = "Training Stage"
try:
    logger.info("***********************")
    logger.info(f">>>>>>>>>>>> {STAGE_NAME} started <<<<<<<<<<<")
    obj = ModelTrainingPipeline()
    obj.main()
    logger.info(f">>>>>>>>>>>> {STAGE_NAME} completed <<<<<<<<<<<")
except Exception as e:
    logger.exception(f"Error in {STAGE_NAME} pipeline: {e}")
    raise e
📝 Note: Here are some things to remember:
- You can check the logs at any time if you run into trouble during execution
- You can also review all of this in research/04_training.ipynb to see everything in one place
✅ Model Training Stage Completed. Now that training is done, we'll move on to the model evaluation part.
Model evaluation is also a very important part of any machine learning project. Here we'll see how our model is performing and whether everything works as expected.
Here also we'll follow the same workflow.
For model evaluation we don't have to update any yaml file; we can start by updating the entity directly.
Similar to the ones above, create a new entity that satisfies the requirements of the evaluation class:
@dataclass(frozen=True)
class EvaluationConfig:
    path_of_model: Path
    training_data: Path
    all_params: dict
    params_image_size: list
    params_batch_size: int
Update the configuration manager to load information from the yaml files into the entity:
def get_validation_config(self) -> EvaluationConfig:
    eval_config = EvaluationConfig(
        path_of_model=Path("artifacts/training/model.h5"),
        training_data=Path("artifacts/data_ingestion/Chicken-fecal-images"),
        all_params=self.params,
        params_image_size=self.params.IMAGE_SIZE,
        params_batch_size=self.params.BATCH_SIZE
    )
    return eval_config
In the component we'll write methods to evaluate our model and save the results as a JSON file.
📝 Note: This class also uses ImageDataGenerator to rescale and load data from directories, as above.
import tensorflow as tf
from pathlib import Path
from cnnClassifier.entity.config_entity import EvaluationConfig
from cnnClassifier.utils.common import save_json

class Evaluation:
    def __init__(self, config: EvaluationConfig):
        self.config = config

    def _valid_generator(self):
        datagenerator_kwargs = dict(
            rescale=1./255,
            validation_split=0.30
        )
        dataflow_kwargs = dict(
            target_size=self.config.params_image_size[:-1],
            batch_size=self.config.params_batch_size,
            interpolation="bilinear"
        )
        valid_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
            **datagenerator_kwargs
        )
        self.valid_generator = valid_datagen.flow_from_directory(
            directory=self.config.training_data,
            subset="validation",
            shuffle=False,
            **dataflow_kwargs
        )

    @staticmethod
    def load_model(path: Path) -> tf.keras.Model:
        return tf.keras.models.load_model(path)

    def evaluation(self):
        model = self.load_model(self.config.path_of_model)
        self._valid_generator()
        self.score = model.evaluate(self.valid_generator)

    def save_score(self):
        scores = {"loss": self.score[0], "accuracy": self.score[1]}
        save_json(path=Path("scores.json"), data=scores)
Now we create a new pipeline to run the components above:
STAGE_NAME = "Evaluation Stage"

class EvaluationPipeline:
    def __init__(self):
        pass

    def main(self):
        config = ConfigurationManager()
        val_config = config.get_validation_config()
        evaluation = Evaluation(val_config)
        evaluation.evaluation()
        evaluation.save_score()

if __name__ == "__main__":
    try:
        logger.info("******************")
        logger.info(f">>>>> {STAGE_NAME} started <<<<<")
        obj = EvaluationPipeline()
        obj.main()
        logger.info(f">>>>>>>>>> {STAGE_NAME} completed <<<<<<<<<<\n\nX=====================X")
    except Exception as e:
        logger.exception(f"Exception occurred in {STAGE_NAME} : {e}")
        raise e
Finally, update main.py to see the results on the validation set, using a try/except block:
STAGE_NAME = "Evaluation Stage"
try:
    logger.info("******************")
    logger.info(f">>>>> {STAGE_NAME} started <<<<<")
    obj = EvaluationPipeline()
    obj.main()
    logger.info(f">>>>>>>>>> {STAGE_NAME} completed <<<<<<<<<<\n\nX=====================X")
except Exception as e:
    logger.exception(f"Exception occurred in {STAGE_NAME} : {e}")
    raise e
✅ Boom!! Model Evaluation is Completed. Do check out research/05_model_evaluation.ipynb
So you might be thinking: what is DVC, and why do we need it?
DVC (Data Version Control) is a free, open-source tool for data management, ML pipeline automation, and experiment management.
This means DVC can be used in many ways, but here we are going to use it as a pipeline management and pipeline tracking tool.
So what exactly does pipeline tracking mean? Normally, to test our code we update main.py and run it, but running main.py executes all of the pipelines: even if model preparation is already done, it will redo it, which wastes a lot of time. To save time, we want something that checks which pipeline stages actually need to run and runs exactly those, for better performance and efficiency.
Writing DVC code is very similar to writing a yaml file ("well, what more can you expect from a file named dvc.yaml?"), so let's start writing the DVC code.
If you want to learn more about DVC, you can go to its website.
📝 One important thing to keep in mind: DVC runs on top of Git, so always use Git together with DVC.
Let's break it down into steps:
- Update dvc.yaml. Here I'll describe all the stages in order, so DVC can run them in order. Here is the code:
stages:
  data_ingestion:
    cmd: python src/cnnClassifier/pipeline/stage_01_data_ingestion.py
    deps:
      - src/cnnClassifier/pipeline/stage_01_data_ingestion.py
      - config/config.yaml
    outs:
      - artifacts/data_ingestion/Chicken-fecal-images

  prepare_base_model:
    cmd: python src/cnnClassifier/pipeline/stage_02_prepare_base_model.py
    deps:
      - src/cnnClassifier/pipeline/stage_02_prepare_base_model.py
      - config/config.yaml
    params:
      - IMAGE_SIZE
      - INCLUDE_TOP
      - CLASSES
      - WEIGHTS
      - LEARNING_RATE
    outs:
      - artifacts/prepare_base_model

  training:
    cmd: python src/cnnClassifier/pipeline/stage_03_training.py
    deps:
      - src/cnnClassifier/pipeline/stage_03_training.py
      - config/config.yaml
      - src/cnnClassifier/components/prepare_callbacks.py
      - artifacts/data_ingestion/Chicken-fecal-images
      - artifacts/prepare_base_model
    params:
      - IMAGE_SIZE
      - EPOCHS
      - BATCH_SIZE
      - AUGMENTATION
    outs:
      - artifacts/training/model.h5

  evaluation:
    cmd: python src/cnnClassifier/pipeline/stage_04_evaluation.py
    deps:
      - src/cnnClassifier/pipeline/stage_04_evaluation.py
      - config/config.yaml
      - artifacts/data_ingestion/Chicken-fecal-images
      - artifacts/training/model.h5
    params:
      - IMAGE_SIZE
      - BATCH_SIZE
    metrics:
      - scores.json:
          cache: false
If you look at this code you'll easily understand what is happening, but here are some tips to help:
- cmd - the command to be executed
- deps - dependencies of the command
- params - the parameters used from params.yaml
- metrics - metrics produced by the command
- outs - expected outputs of the command
After updating the file, open a command prompt in the same environment and initialize DVC:
dvc init
This command will initialize DVC for your project and also create a .dvc folder and a .dvcignore file in your directory.
After this you can write:
dvc repro
to execute all the stages in the order you specified. The interesting part: if all the outputs are already present and there are no changes in the files, DVC skips those stages to save time:
Stage 'data_ingestion' didn't change, skipping
Stage 'prepare_base_model' didn't change, skipping
Stage 'training' didn't change, skipping
Stage 'evaluation' didn't change, skipping
Data and pipelines are up to date.
You can also visualize the dependencies between stages with dvc dag, which is a very interesting way to see the pipeline graph:
+----------------+ +--------------------+
| data_ingestion | | prepare_base_model |
+----------------+***** +--------------------+
* ***** *
* ****** *
* *** *
** +----------+
** | training |
*** +----------+
*** ***
** **
** **
+------------+
| evaluation |
+------------+
✅ DVC pipeline tracking completed. Now you can push your code.
The prediction pipeline is a very important part of any project; here we will produce predictions on custom image data.
You can break this down into steps as well.
Create a new pipeline to predict on custom data; this pipeline does most of the prediction work (model loading, model prediction):
import os
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

class PredictPipeline:
    def __init__(self, filename):
        self.filename = filename

    def predict(self):
        # load model
        model = load_model(os.path.join("artifacts", "training", "model.h5"))

        imagename = self.filename
        test_image = image.load_img(imagename, target_size=(224, 224))
        test_image = image.img_to_array(test_image)
        test_image = np.expand_dims(test_image, axis=0)
        result = np.argmax(model.predict(test_image), axis=1)
        print(result)

        if result[0] == 1:
            prediction = "Healthy"
            return [prediction]
        else:
            prediction = "Coccidiosis"
            return [prediction]
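A hedged usage sketch (it assumes a trained model already exists at artifacts/training/model.h5 and that some test image is on disk):

# hypothetical quick check of the prediction pipeline
pipeline = PredictPipeline("inputImage.jpg")  # path to any chicken fecal image
print(pipeline.predict())                     # e.g. ['Healthy'] or ['Coccidiosis']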
This is the last part of the project. Here we create our final Flask application in app.py and connect it with the prediction pipeline and index.html.
I am assuming you are familiar with Flask, so I am not focusing much on that part. If you want, you can check out the code:
import os
from flask import Flask, render_template, request, jsonify
from flask_cors import CORS, cross_origin
from cnnClassifier.utils.common import decodeImage
from cnnClassifier.pipeline.predict import PredictPipeline

os.putenv("LANG", "en_US.UTF-8")
os.putenv("LC_ALL", "en_US.UTF-8")

app = Flask(__name__)
CORS(app)

class ClientApp:
    def __init__(self):
        self.filename = "inputImage.jpg"
        self.classifier = PredictPipeline(self.filename)

@app.route("/", methods=["GET"])
@cross_origin()
def home():
    return render_template("index.html")

@app.route("/train", methods=["POST", "GET"])
@cross_origin()
def trainRoute():
    os.system("dvc repro")
    return "Training done Successfully"

@app.route("/predict", methods=["POST", "GET"])
@cross_origin()
def predictRoute():
    image = request.json["image"]
    decodeImage(image, App.filename)
    result = App.classifier.predict()
    return jsonify(result)

if __name__ == "__main__":
    App = ClientApp()
    app.run(host="0.0.0.0", port=5000, debug=True)
Do check out the template code in index.html (it is a general template taken from Bootstrap with some custom code integrated).
The final app will run on localhost at 127.0.0.1:5000 on your local PC.
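To sanity-check the /predict endpoint without the UI, here is a hedged Python snippet using the requests library (requests is not in requirements.txt, so install it separately if you try this):

import base64
import requests

# encode a local test image as base64, the payload shape predictRoute expects
with open("inputImage.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post("http://127.0.0.1:5000/predict", json={"image": img_b64})
print(response.json())  # e.g. ['Healthy'] or ['Coccidiosis']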
🎉 Woow!! We have come a long way.
✅ Project Completed