Similar Courses Algorithm #470

Merged
merged 6 commits into from
Nov 20, 2024
Changes from 5 commits
33 changes: 32 additions & 1 deletion server/src/course/course.controller.ts
@@ -1,5 +1,6 @@
import { findCourseById, findCourseByInfo } from './course.data-access';
import { CourseIdRequestType, CourseInfoRequestType } from './course.type';
import { CourseIdRequestType, CourseInfoRequestType, CourseDescriptionRequestType } from './course.type';
import { preprocess, tfidf, cosineSimilarity, idf } from './course.recalgo';

import { findReviewCrossListOR } from '../utils';

@@ -78,3 +79,33 @@ export const getReviewsCrossListOR = async ({

return null;
};

export const getProcessedDescription = (text) => {
const processed = preprocess(text);
return processed;
}

export const getSimilarity = () => {
const descriptions = ["This course provides a detailed study on multiple financial markets including bonds, forwards, futures, swaps, and options and their role in addressing major issues facing humanity. In particular, we plan to study specific topics on the role of financial markets in addressing important issues like funding cancer cure, tackling climate change, and financing educational needs for the underserved. Relative to a traditional finance class, we take a broad approach and think of finance as a way to get things done and financial instruments as a way to solve problems. We explore topics related to diversification and purpose investing, including a highly innovative idea of a mega-fund developing cancer treatment. We examine how financial instruments can help solve or hedge some societal issues, particularly on climate change. As an example, we will be studying a financial solution to deal with California forest fire. We also examine the potential for social impact bonds for educating pre-school children and reducing prisoners' recidivism.",
"This course introduces and develops the leading modern theories of economies open to trade in financial assets and real goods. The goal is to understand how cross-country linkages in influence macroeconomic developments within individual countries; how financial markets distribute risk and wealth around the world; and how trade changes the effectiveness of national monetary and fiscal policies. In exploring these questions, we emphasize the role that exchange rates and exchange rate policy take in shaping the consequences of international linkages. We apply our theories to current and recent events, including growing geoeconomic conflict between Eastern and Western countries, hyperinflation in Argentina, Brexit, and recent Euro-area debt crises.",
"The Corporate Finance Immersion (CFI) Practicum is designed to provide students with a real world and practical perspective on the activities, processes and critical questions faced by corporate finance executives. It is oriented around the key principles of shareholder value creation and the skills and processes corporations use to drive value. The CFI Practicum will help develop skills and executive judgement for students seeking roles in corporate finance, corporate strategy, business development, financial planning, treasury, and financial management training programs. The course can also help students pursuing consulting to sharpen their financial skills and get an excellent view of a corporation's strategic and financial objectives. The practicum will be comprised of a mix of lectures, cases, guest speakers, and team projects. Additionally, there will be training workshops to build your financial modelling skills.",
"Environmental Finance & Impact Investing Practicum",
Contributor

Are these test cases (the short ones that are just the course name) meant to be tested for similarity against the longer descriptions?

Contributor Author

These are just courses without descriptions on the course roster API, so I used the course title as a filler for now.

"Corporate Finance II"
]

const processedDescriptions = descriptions.map(desc => preprocess(desc).split(' '));
const allTerms = [...new Set(processedDescriptions.flat())];
const idfValues = idf(allTerms, processedDescriptions);
const tfidfVectors = processedDescriptions.map(terms => tfidf(terms, idfValues));

let similarity = [];

for (let i = 0; i < descriptions.length; i++) {
for (let j = i + 1; j < descriptions.length; j++) {
const cos = cosineSimilarity(tfidfVectors[i], tfidfVectors[j]);
similarity.push({ courseA: i, courseB: j, similarity: cos });
}
}
similarity.sort((a, b) => b.similarity - a.similarity);
return similarity.slice(0, 5);
}
106 changes: 106 additions & 0 deletions server/src/course/course.recalgo.ts
@@ -0,0 +1,106 @@
/**
* Applies stemming rules to reduce a word to its base form
*/
const stemWord = (word) => {
if (word.endsWith("sses")) {
return word.replace(/sses$/, 'ss');
} if (word.endsWith("ies")) {
return word.replace(/ies$/, 'y');
} if (word.endsWith("es") && !/[aeiou]es$/.test(word)) {
return word.replace(/es$/, '');
} if (word.endsWith("s") && word.length > 1 && !/[sxz]$/.test(word)) {
return word.replace(/s$/, '');
}
return word;
}

/**
* Preprocesses the description to remove pluralities and unnecessary punctuation
* @param description A course description that needs to be preprocessed
* @returns The processed description for a course
*/
export const preprocess = (description: string) => {
Contributor

I noticed in the testing picture you uploaded that some strange text breaks or punctuation now occur between words. Not sure if that's still happening.

let sentences = description.match(/[^.!?]*[.!?]\s+[A-Z]/g) || [description];
let processedText = sentences.map(sentence => {
let words = sentence.match(/\b\w+\b/g) || [];
Contributor

Another thought I had about preprocessing was getting rid of "filler words," (i.e. and, the, to, for, with...)

Collaborator

Nice idea! I also saw "this", so maybe filter out pronouns as well?

let cleanedWords = words.map(word => {
const singularWord = stemWord(word.toLowerCase());
return singularWord.replace(/[^\w\s]/g, '');
});
return cleanedWords.join(' ');
});
return processedText.join('. ');
}
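
The filler-word suggestion in the review thread above could be sketched roughly as follows. This is only an illustration: `STOP_WORDS` and `tokenize` are hypothetical names that are not part of the PR, the word list is a small sample rather than a vetted one, and stemming is left out. The sketch also tokenizes the whole description at once instead of splitting into sentences first. As written, the sentence regex in `preprocess` appears to consume the first capital letter of the next sentence and to skip any text after the last match, which may be related to the odd text breaks mentioned in the review.

```typescript
// Hypothetical stop-word list (assumption): a real list would be longer
// and tuned to the vocabulary of course descriptions.
const STOP_WORDS: ReadonlySet<string> = new Set([
  'a', 'an', 'and', 'the', 'to', 'for', 'with', 'of', 'in', 'on',
  'this', 'that', 'these', 'those', 'we', 'our', 'you', 'your', 'it', 'its',
]);

// Lowercases the text, extracts word tokens, and drops stop words.
// Stemming (stemWord in the PR) is intentionally not applied here.
const tokenize = (description: string): string[] => {
  const words: string[] = description.toLowerCase().match(/\b\w+\b/g) ?? [];
  return words.filter((word) => !STOP_WORDS.has(word));
};
```

Returning an array of tokens also avoids the later `split(' ')` step in the controller, although whether that fits the rest of the pipeline is a design decision for the authors.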

/**
* Calculates the inverse document frequency for the given terms
* @param terms list of terms in the course description
* @param words list of all course descriptions as word arrays
* @returns a dictionary with terms as keys and their IDF scores as values
*/
export const idf = (terms, words) => {
let df = {};
let idf = {};
for (const term of terms) {
df[term] = words.reduce((count, wordsSet) => (count + (wordsSet.includes(term) ? 1 : 0)), 0);
idf[term] = 1 / (df[term] + 1);
}
return idf;
}
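
For comparison, the `idf` function above weights each term by the reciprocal `1 / (df + 1)`, whereas IDF is more commonly defined on a log scale. A minimal sketch of one common smoothed variant, `ln((N + 1) / (df + 1)) + 1`, is shown below. The name `smoothedIdf` and the type alias are hypothetical, and nothing in the PR indicates that the authors intend to switch formulas.

```typescript
type TermWeights = Record<string, number>;

// Computes a smoothed, log-scaled IDF for each term.
// `documents` is a list of token arrays, one per course description.
const smoothedIdf = (terms: string[], documents: string[][]): TermWeights => {
  const n = documents.length;
  const weights: TermWeights = {};
  for (const term of terms) {
    // Document frequency: number of documents containing the term.
    const df = documents.filter((doc) => doc.includes(term)).length;
    // The +1 terms avoid division by zero; the trailing +1 keeps terms
    // that occur in every document from being zeroed out.
    weights[term] = Math.log((n + 1) / (df + 1)) + 1;
  }
  return weights;
};
```

With this variant a term that occurs in every document gets weight 1, and rarer terms get progressively larger weights.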

/**
* Calculates the TF-IDF vector for the given terms
* @param terms list of terms in the course description
* @param idf inverse document frequency (IDF) for the terms
* @returns a dictionary with terms as keys and their TF-IDF scores as values
*/
export const tfidf = (terms, idf) => {
let d = {};
for (const term of terms) {
if (!d[term]) {
d[term] = 0;
}
d[term]++;
}
for (const term in d) {
if (idf && idf[term] === undefined) {
idf[term] = 1;
}
d[term] *= idf[term];
Contributor

Maybe you could also normalize by dividing by term frequency here, so that the TF-IDF score accounts for different document lengths and reflects each term's importance accurately regardless of document length.

}
return d;
}
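
One reading of the normalization suggestion above is to divide each raw count by the total number of terms in the document. The sketch below assumes that reading. `normalizedTfidf` is a hypothetical name, and unlike the PR's `tfidf` it does not write the fallback value back into the `idf` argument.

```typescript
// Computes TF-IDF with term frequency normalized by document length.
// Terms missing from idfValues fall back to a weight of 1, as in the PR.
const normalizedTfidf = (
  terms: string[],
  idfValues: Record<string, number>,
): Record<string, number> => {
  // Raw term counts.
  const counts: Record<string, number> = {};
  for (const term of terms) {
    counts[term] = (counts[term] ?? 0) + 1;
  }
  // Relative frequency times IDF. The loop body only runs when
  // terms.length > 0, so the division is safe.
  const vector: Record<string, number> = {};
  for (const term of Object.keys(counts)) {
    const tf = (counts[term] ?? 0) / terms.length;
    vector[term] = tf * (idfValues[term] ?? 1);
  }
  return vector;
};
```

Note that cosine similarity is unchanged when a vector is multiplied by a positive constant, so this normalization would not alter the scores produced by `cosineSimilarity`. It matters mainly if the raw TF-IDF weights are compared or displayed directly.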

/**
* Computes the dot product between two vectors
*/
const dot = (a, b) => {
let sum = 0;
for (let key in a) {
if (b[key]) {
sum += a[key] * b[key];
}
}
return sum;
}

/**
* Computes the magnitude of a vector
*/
const norm = (vec) => {
const sum = dot(vec, vec);
return Math.sqrt(sum);
}

/**
* Calculates the cosine similarity of two frequency word vectors
* @param vecA frequency word vector corresponding to the first course description
* @param vecB frequency word vector corresponding to the second course description
* @returns a number representing the similarity between the two descriptions
*/
export const cosineSimilarity = (vecA, vecB) => {
const dotProduct = dot(vecA, vecB);
const magA = norm(vecA);
const magB = norm(vecB);
return dotProduct / (magA * magB);
Contributor

Maybe you could also add a check here in case magA or magB is 0 to avoid dividing by 0.

}
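
The zero-magnitude check suggested above might look like the sketch below. In JavaScript `0 / 0` evaluates to `NaN`, which would then flow into the sort in `getSimilarity`. The names `SparseVector`, `dotProduct`, and `safeCosineSimilarity` are hypothetical, and returning 0 for an empty vector is an assumption about the desired behavior, not something the PR specifies.

```typescript
type SparseVector = Record<string, number>;

// Dot product over the keys of `a`; keys missing from `b` contribute 0.
const dotProduct = (a: SparseVector, b: SparseVector): number => {
  let sum = 0;
  for (const key of Object.keys(a)) {
    sum += (a[key] ?? 0) * (b[key] ?? 0);
  }
  return sum;
};

// Cosine similarity that returns 0 instead of NaN when either vector
// has zero magnitude (for example, an empty vector).
const safeCosineSimilarity = (vecA: SparseVector, vecB: SparseVector): number => {
  const magA = Math.sqrt(dotProduct(vecA, vecA));
  const magB = Math.sqrt(dotProduct(vecB, vecB));
  if (magA === 0 || magB === 0) {
    return 0;
  }
  return dotProduct(vecA, vecB) / (magA * magB);
};
```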
25 changes: 23 additions & 2 deletions server/src/course/course.router.ts
@@ -1,7 +1,7 @@
import express from 'express';

import { CourseIdRequestType, CourseInfoRequestType } from './course.type';
import { getCourseByInfo, getReviewsCrossListOR } from './course.controller';
import { CourseIdRequestType, CourseInfoRequestType, CourseDescriptionRequestType } from './course.type';
import { getCourseByInfo, getReviewsCrossListOR, getProcessedDescription, getSimilarity } from './course.controller';

import { getCourseById } from '../utils';

@@ -69,3 +69,24 @@ courseRouter.post('/get-reviews', async (req, res) => {
.json({ error: `Internal Server Error: ${err.message}` });
}
});

/** Reachable at POST /api/courses/getPreDesc
* @body description: a course description
* Gets the processed description to use for the similarity algorithm
* Currently used for testing
*/
courseRouter.post('/getPreDesc', async (req, res) => {
Contributor

Would there be any errors to catch here? Also I think that a route name more like /preprocess or /preprocess-desc might fit more with our naming theme

Collaborator

Yep, we should handle errors at the router level.

const { description }: CourseDescriptionRequestType = req.body;
const processed = getProcessedDescription(description);
return res.status(200).json({ result: processed });
});
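
A rough sketch of the router-level error handling discussed in the thread above follows. To keep it runnable without Express installed, it uses minimal stand-in interfaces (`MinimalRequest`, `MinimalResponse`) and a placeholder `preprocessStub`. All of these names are hypothetical, and the 400 response for a missing description is an assumption. The 500 message mirrors the `Internal Server Error` format already used by the `/get-reviews` route.

```typescript
// Stand-ins for the Express request/response objects (assumption: only
// the members used below are modeled).
interface MinimalRequest {
  body: unknown;
}
interface MinimalResponse {
  status(code: number): MinimalResponse;
  json(payload: unknown): MinimalResponse;
}

// Placeholder for the PR's preprocess function.
const preprocessStub = (description: string): string => description.toLowerCase();

// Validates the request body, then wraps the processing in try/catch so
// unexpected failures become a 500 response instead of an unhandled error.
const handlePreprocess = (req: MinimalRequest, res: MinimalResponse): MinimalResponse => {
  try {
    const body = req.body as { description?: unknown } | null | undefined;
    const description = body?.description;
    if (typeof description !== 'string' || description.trim() === '') {
      return res.status(400).json({ error: 'description must be a non-empty string' });
    }
    return res.status(200).json({ result: preprocessStub(description) });
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return res.status(500).json({ error: `Internal Server Error: ${message}` });
  }
};
```

In the actual router the same body would sit inside the `courseRouter.post(...)` callback, with the real `req` and `res` objects in place of the stand-ins.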

/** Reachable at POST /api/courses/getSimilarity
* @body courseId: a course's id field
* Gets the array of the top 5 similar courses for the course with id = courseId
*/
courseRouter.post('/getSimilarity', async (req, res) => {
Contributor

Here, /api/courses/get/similarity or something similar

// const { courseId }: CourseIdRequestType = req.body;
const similarity = getSimilarity();
return res.status(200).json({ result: similarity });
});
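
The doc comment on this route describes returning the top 5 courses similar to one given course, while `getSimilarity` currently ranks all pairs in a fixed sample. A minimal sketch of the per-course ranking is given below, assuming TF-IDF vectors have already been computed. `topSimilarTo`, `WeightVector`, and `cosine` are hypothetical names, and courses are identified by array index here rather than by `courseId`.

```typescript
type WeightVector = Record<string, number>;

// Cosine similarity with a guard against zero-magnitude vectors.
const cosine = (a: WeightVector, b: WeightVector): number => {
  const dot = (x: WeightVector, y: WeightVector): number =>
    Object.keys(x).reduce((sum, key) => sum + (x[key] ?? 0) * (y[key] ?? 0), 0);
  const magnitude = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return magnitude === 0 ? 0 : dot(a, b) / magnitude;
};

// Returns the k vectors most similar to vectors[target], excluding the
// target itself, sorted by descending similarity.
const topSimilarTo = (
  target: number,
  vectors: WeightVector[],
  k = 5,
): { course: number; similarity: number }[] => {
  const targetVector = vectors[target];
  if (targetVector === undefined) {
    return [];
  }
  return vectors
    .map((vector, index) => ({ course: index, similarity: cosine(targetVector, vector) }))
    .filter((entry) => entry.course !== target)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, k);
};
```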
4 changes: 4 additions & 0 deletions server/src/course/course.type.ts
@@ -6,3 +6,7 @@ export interface CourseInfoRequestType {
export interface CourseIdRequestType {
courseId: string;
}

export interface CourseDescriptionRequestType {
description: string;
}

Unchanged files with check annotations

import ReviewModal from './ReviewModal'
enum PageStatus {

Check warning on line 29 in client/src/modules/Course/Components/Course.tsx

GitHub Actions / build

'PageStatus' is already declared in the upper scope on line 29 column 6

Check warning on line 29 in client/src/modules/Course/Components/Course.tsx

GitHub Actions / build

'PageStatus' is defined but never used

Loading,

Check warning on line 30 in client/src/modules/Course/Components/Course.tsx

GitHub Actions / build

'Loading' is already declared in the upper scope on line 12 column 8

Check warning on line 30 in client/src/modules/Course/Components/Course.tsx

GitHub Actions / build

'Loading' is defined but never used
Success,

Check warning on line 31 in client/src/modules/Course/Components/Course.tsx

GitHub Actions / build

'Success' is defined but never used
Error,

Check warning on line 32 in client/src/modules/Course/Components/Course.tsx

GitHub Actions / build

'Error' is defined but never used
}
export const Course = () => {
* Fetches current course info and reviews and updates UI state
*/
useEffect(() => {
async function updateCurrentClass(number: number, subject: string) {

Check warning on line 68 in client/src/modules/Course/Components/Course.tsx

GitHub Actions / build

'number' is already declared in the upper scope on line 36 column 11

Check warning on line 68 in client/src/modules/Course/Components/Course.tsx

GitHub Actions / build

'subject' is already declared in the upper scope on line 36 column 19
try {
const response = await axios.post(`/api/courses/get-by-info`, {
number,
courseId: courseId,
})
clearSessionReview()

Check warning on line 116 in client/src/modules/Course/Components/Course.tsx

GitHub Actions / build

'clearSessionReview' is not defined
if (response.status === 200) {
toast.success(
'Thanks for reviewing! New reviews are updated every 24 hours.'
toast.error('An error occurred, please try again.')
}
} catch (e) {
clearSessionReview()

Check warning on line 125 in client/src/modules/Course/Components/Course.tsx

GitHub Actions / build

'clearSessionReview' is not defined
toast.error('An error occurred, please try again.')
}
}
}
useEffect(() => {
const signIn = (redirectFrom: string) => {

Check warning on line 68 in client/src/auth/auth_utils.ts

GitHub Actions / build

'redirectFrom' is already declared in the upper scope on line 48 column 39
Session.setPersistent({ redirectFrom: redirectFrom })
history.push('/login')
}
import React, { useEffect, useState } from 'react'
import { Redirect, useParams } from 'react-router-dom'

Check warning on line 2 in client/src/modules/Admin/Components/Admin.tsx

GitHub Actions / build

'useParams' is defined but never used

import axios from 'axios'
};
const [updatingField, setUpdatingField] = useState<string>("");
const [addSemester, setAddSemester] = useState('')

Check warning on line 39 in client/src/modules/Admin/Components/Admin.tsx

GitHub Actions / build

'setAddSemester' is assigned a value but never used

const [isAdminModalOpen, setIsAdminModalOpen] = useState<boolean>(false)
const { isLoggedIn, token, isAuthenticating } = useAuthMandatoryLogin('admin')
* If this is the user's second click, call addAllCourses above to initialize
* the local database
*/
function renderInitButton(doubleClick: boolean) {

Check warning on line 304 in client/src/modules/Admin/Components/Admin.tsx

GitHub Actions / build

'doubleClick' is already declared in the upper scope on line 23 column 10
// Offer button to edit database
if (doubleClick) {
return (
}
function renderAdmin(token: string) {

Check warning on line 336 in client/src/modules/Admin/Components/Admin.tsx

GitHub Actions / build

'token' is already declared in the upper scope on line 42 column 23
return (
<div className={styles.adminWrapper}>
<div className="headInfo">
getCourse()
function renderButtons(review: any) {

Check warning on line 39 in client/src/modules/Admin/Components/AdminReview.tsx

GitHub Actions / build

'review' is already declared in the upper scope on line 21 column 24
const reported = review.reported
if (reported === 1) {
return (
useEffect(() => {
async function getAdmins() {
const response = await axios.post('/api/admin/users/get', {token: token})
const admins = response.data.result

Check warning on line 29 in client/src/modules/Admin/Components/ManageAdminModal.tsx

GitHub Actions / build

'admins' is already declared in the upper scope on line 16 column 12
if (response.status === 200) {
setAdmins(admins)
}