
Commit 3871e91

Merge pull request #56 from srbhr/FDEV-004-Code-cleanup-readme-updates-and-others
srbhr authored Jul 18, 2023
2 parents 9190e75 + 0c27815 commit 3871e91
Showing 10 changed files with 413 additions and 77 deletions.
66 changes: 45 additions & 21 deletions README.md
@@ -4,7 +4,7 @@

# Resume Matcher

## AI Based Resume Matcher to tailor your resume to a job description. Find the bestkewords, and gain deep insights into your resume.
## AI-based, free and open-source ATS Resume Matcher to tailor your resume to a job description. Find the best keywords and gain deep insights into your resume.

</div>

@@ -19,37 +19,61 @@

[![Resume Matcher](https://custom-icon-badges.demolab.com/badge/www.resumematcher.fyi-gold?style=for-the-badge&logo=globe&logoColor=black)](https://www.resumematcher.fyi)

[![Live Demo](https://custom-icon-badges.demolab.com/badge/live-demo-red?style=for-the-badge&logo=globe&logoColor=black)](https://resume-matcher.streamlit.app/)

</div>

A Machine Learning based Resume Matcher to compare resumes with job descriptions. It creates a score based on how similar a resume is to a particular job description. Documents are sorted based on their TF-IDF scores (Term Frequency-Inverse Document Frequency).
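To make the scoring concrete, here is a minimal sketch of TF-IDF similarity using scikit-learn; the library choice and the toy documents are assumptions for illustration, not necessarily how this project computes its score.

```python
# A hedged sketch of TF-IDF scoring, assuming scikit-learn is available.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

resume = "Python developer with Django and REST API experience."
job_desc = "Hiring a Python engineer experienced with Django and APIs."

# Fit a single vocabulary over both documents so their vectors are comparable.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([resume, job_desc])

# Cosine similarity between the two TF-IDF vectors serves as the match score.
score = cosine_similarity(tfidf[0], tfidf[1])[0][0]
print(f"Match score: {score:.2f}")
```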
### How does it work?

The Resume Matcher takes your resume and job descriptions as input, parses them using Python, and mimics the functionalities of an ATS, providing you with insights and suggestions to make your resume ATS-friendly.
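As a concrete starting point, here is a minimal sketch of the PDF text extraction that parsing implies, assuming the pypdf package and a hypothetical file path; the project's actual parser may differ.

```python
# A hedged sketch of PDF parsing, assuming the pypdf package.
from pypdf import PdfReader

# Hypothetical path for illustration; place real files under Data/Resumes.
reader = PdfReader("Data/Resumes/sample_resume.pdf")

# Concatenate the text of every page; extract_text() can return None.
text = "\n".join(page.extract_text() or "" for page in reader.pages)
print(text[:200])
```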

The process is as follows:

1. **Parsing**: The system uses Python to parse both your resume and the provided job description, just like an ATS would. Parsing is critical as it transforms your documents into a format the system can readily analyze.

2. **Keyword Extraction**: The tool uses advanced machine learning algorithms to extract the most relevant keywords from the job description. These keywords represent the skills, qualifications, and experiences the employer seeks.

3. **Key Terms Extraction**: Beyond keyword extraction, the tool uses textacy to identify the main key terms or themes in the job description (a short sketch follows this list). This step helps in understanding the broader context of what the job description is about.

4. **Vector Similarity Using Qdrant**: The tool uses Qdrant, a highly efficient vector similarity search tool, to measure how closely your resume matches the job description. This is done by representing your resume and the job description as vectors in a high-dimensional space and calculating their cosine similarity (see the second sketch below). The more similar they are, the higher the likelihood that your resume will pass the ATS screening.
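For step 3, a minimal sketch of key-term extraction, assuming textacy's TextRank implementation and the spaCy `en_core_web_sm` model (both assumptions; the project may use a different extractor or settings):

```python
# A hedged sketch of key-term extraction with textacy's TextRank.
import spacy
from textacy.extract.keyterms import textrank

nlp = spacy.load("en_core_web_sm")  # assumes this spaCy model is installed
job_desc = "We are hiring a Python engineer with Django and REST API skills."

doc = nlp(job_desc)
# TextRank scores candidate phrases by graph centrality; topn caps the output.
for term, weight in textrank(doc, normalize="lemma", topn=5):
    print(f"{term}: {weight:.3f}")
```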

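And for step 4, a minimal sketch of cosine-similarity search, assuming the qdrant-client package's in-memory mode; the tiny 4-dimensional vectors stand in for real sentence-encoder embeddings, and the collection name is made up for illustration. Cosine similarity here is a·b / (||a|| ||b||).

```python
# A hedged sketch of vector similarity with Qdrant, assuming qdrant-client.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # no server needed for a local experiment
client.recreate_collection(
    collection_name="resumes",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# Index one toy "resume" vector with a payload identifying the file.
client.upsert(
    collection_name="resumes",
    points=[PointStruct(id=1, vector=[0.9, 0.1, 0.3, 0.2],
                        payload={"file": "resume_1"})],
)

# Query with a toy "job description" vector; hit.score is cosine similarity.
hits = client.search(collection_name="resumes",
                     query_vector=[0.8, 0.2, 0.3, 0.1], limit=1)
for hit in hits:
    print(hit.payload["file"], hit.score)
```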
On top of that, there are various data visualizations that I've added to help you get started.

Matching algorithms used are:

- **String Matching**
  - Monge Elkan
- **Token Based**
  - Jaccard
  - Cosine
  - Sorensen-Dice
  - Overlap Coefficient

#### PRs Welcomed 🤗

<br/>

---

<div align="center">

## How to install

</div>

1. Clone the project.
2. Create a python virtual environment.
3. Activate the virtual environment.
4. Run `pip install -r requirements.txt` to install all dependencies.
5. Put your resumes in PDF format in the `Data/Resumes` folder. (Delete the existing contents.)
6. Put your job descriptions in PDF format in the `Data/JobDescription` folder. (Delete the existing contents.)
7. Run `python run_first.py`; this will parse all the resumes to JSON.
8. Run `streamlit run streamlit_app.py`.

Topic modelling of resumes is done to provide additional information about the resumes and what clusters/topics they belong to. For this:

1. [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) of the resumes is done to improve the sentence similarities, as it helps reduce redundant terms and brings out the important ones.
2. id2word and doc2bow (from the Gensim library) are used on the documents.
3. [LDA](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) (Latent Dirichlet Allocation) is done to extract the topics from the document set (in this case, resumes).
4. Additional plots are done to gain more insights about the documents.
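A minimal sketch of this TF-IDF + LDA pipeline, assuming the Gensim API; the tiny tokenized corpus and the `num_topics` value are placeholders for illustration:

```python
# A hedged sketch of the TF-IDF + LDA topic-modelling steps with Gensim.
from gensim import corpora, models

docs = [
    ["python", "django", "rest", "api", "developer"],
    ["data", "scientist", "python", "pandas", "statistics"],
]

# id2word: dictionary mapping token ids to words; doc2bow: bag-of-words vectors.
id2word = corpora.Dictionary(docs)
corpus = [id2word.doc2bow(doc) for doc in docs]

# Re-weight the bag-of-words counts with TF-IDF before topic modelling.
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]

# LDA extracts latent topics from the document set (here, toy "resumes").
lda = models.LdaModel(corpus_tfidf, id2word=id2word, num_topics=2,
                      random_state=42)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```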
**Note**: For local versions, don't run streamlit_second.app; it's for deploying to Streamlit.

Note: The vector similarity part is precomputed here, as sentence encoders require a heavy GPU and lots of memory (RAM). I am working on a blog post that will show how you can leverage that in a Google Colab environment for free.

<br/>

---

### Older Version

Check the older version of the project [**here**](https://github.com/srbhr/Naive-Resume-Matching/blob/master/README.md).

### Note 📝

Thanks for the support 💙. This is an ongoing project that I want to build with the open-source community. There are many ways in which this tool can be upgraded, including (but not limited to):

- Create a better dashboard instead of Streamlit.
- Add more features like uploading of resumes and parsing.
- Add a Docker image for easy usage.
- Contribute a better parsing algorithm.
- Contribute to a blog on how to make this work.
16 changes: 15 additions & 1 deletion Data.py → archive/Data.py
@@ -21,7 +21,21 @@ def build_resume_list(resume_names, path):
    return resumes


def build_jobdesc_list(jobdesc_names, path):
    # Read each job-description JSON and keep its cleaned text.
    jobdescs = []
    for jobdesc in jobdesc_names:
        selected_file = read_json(path + '/' + jobdesc)
        jobdescs.append({
            "jobdesc": selected_file["clean_data"]
        })
    return jobdescs


resume_names = get_filenames_from_dir(resume_path)
resumes = build_resume_list(resume_names, resume_path)

jobdesc_names = get_filenames_from_dir(job_path)
jobdescs = build_jobdesc_list(jobdesc_names, job_path)

print(resumes)
print(jobdescs)
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
101 changes: 101 additions & 0 deletions archive/streamlit_app.py
@@ -0,0 +1,101 @@
import string
import spacy
import pywaffle
import streamlit as st
import pandas as pd
import json
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import squarify

st.title('Resume :blue[Matcher]')
st.image('Assets/img/header_image.jpg')
st.subheader('_AI Based Resume Analyzer & Ranker_')


def read_json(filename):
    with open(filename) as f:
        data = json.load(f)
    return data


# Read the processed resume and job description JSON files.
resume = read_json(
    'Data/Processed/Resume-d531571e-e4fa-45eb-ab6a-267cdeb6647e.json')
job_desc = read_json(
    'Data/Processed/Job-Desc-a4f06ccb-8d5a-4d0b-9f02-3ba6d686472e.json')

st.write("### Reading Resume's POS")
df = pd.DataFrame(resume['pos_frequencies'], index=[0])
fig = go.Figure(data=go.Bar(y=list(resume['pos_frequencies'].values()),
                            x=list(resume['pos_frequencies'].keys())),
                layout_title_text="Resume's POS")
st.write(fig)

df2 = pd.DataFrame(resume['keyterms'], columns=["keyword", "value"])
st.dataframe(df2)

# Create the dictionary
keyword_dict = {}
for keyword, value in resume['keyterms']:
    keyword_dict[keyword] = value

fig = go.Figure(data=[go.Table(header=dict(values=["Keyword", "Value"],
                                           font=dict(size=12),
                                           fill_color='#070A52'),
                               cells=dict(values=[list(keyword_dict.keys()),
                                                  list(keyword_dict.values())],
                                          line_color='darkslategray',
                                          fill_color='#6DA9E4'))
                      ])
st.plotly_chart(fig)

st.divider()



# display the waffle chart
figure = plt.figure(
    FigureClass=pywaffle.Waffle,
    rows=20,
    columns=20,
    values=keyword_dict,
    legend={'loc': 'upper left', 'bbox_to_anchor': (1, 1)})


# Render the waffle chart built above.
st.pyplot(fig=figure)

# Treemap of the resume's key terms, sized by their scores.
fig = px.treemap(df2, path=['keyword'], values='value',
                 color_continuous_scale='RdBu',
                 title='Resume Key Terms')
st.plotly_chart(figure_or_data=fig)

fig = go.Figure(data=[go.Table(
    header=dict(values=["Tri Grams"],
                fill_color='#1D267D',
                align='center', font=dict(color='white', size=16)),
    cells=dict(values=[resume['tri_grams']],
               fill_color='#19A7CE',
               align='left'))])

st.plotly_chart(figure_or_data=fig)

fig = go.Figure(data=[go.Table(
    header=dict(values=["Bi Grams"],
                fill_color='#1D267D',
                align='center', font=dict(color='white', size=16)),
    cells=dict(values=[resume['bi_grams']],
               fill_color='#19A7CE',
               align='left'))])

st.plotly_chart(figure_or_data=fig)


File renamed without changes.
