Video Demonstration: https://youtu.be/rCXia55nKrc
Our team’s mission is to build a tool that relieves the stress Canadians face during tax season.
Canadians spend over 7 hours completing their tax returns and over $5 billion on personal income tax compliance costs, so we set out to build a solution that helps them save time and money. We created TaxEasy, a web application that uses machine learning to generate a tax return file from your tax slips! With TaxEasy, Canadians don’t need to understand the complexities of the tax system to file their returns. All they need to do is upload their tax slips, and TaxEasy will do the rest.
While filing taxes only happens once a year, it is a gruelling task that costs time and money. We built TaxEasy in the hope of making Canadians’ lives easier, so that they can use the time they save to explore their interests and be with their loved ones.
TaxEasy is a web application that simplifies completing a tax return: it generates a tax return file for Canadians from the information on their tax slips. Using optical character recognition (OCR), TaxEasy recognizes specific categories in the uploaded tax slips and fills out the tax return form accordingly. For instance, when scanning a T4 slip, TaxEasy looks for the “Employment Income” box and inserts its value into the Employment Income section of the tax return form. Users only need to upload their tax slips and click a button for this to happen.
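The step from recognized slip boxes to tax return fields can be pictured with a small lookup table. The sketch below is only an illustration under our own assumptions: the mapping, the field names, and the `fill_return` helper are hypothetical, and the input dictionaries stand in for whatever the OCR step actually returns. It also assumes that amounts from several slips of the same type are added together.

```python
# Hypothetical mapping from (slip type, label recognized on the slip) to a
# field name on the tax return. These names are illustrative assumptions,
# not the ones used inside TaxEasy.
SLIP_TO_RETURN_FIELD = {
    ("T4", "Employment Income"): "employment_income",
    ("T4", "Income Tax Deducted"): "total_income_tax_deducted",
    ("T4E", "Total Benefits Paid"): "employment_insurance_benefits",
}


def fill_return(extracted_slips):
    """Sum recognized slip values into the matching tax return fields.

    extracted_slips is a list of (slip_type, {label: value}) pairs that
    stands in for the OCR output of each uploaded slip.
    """
    tax_return = {}
    for slip_type, fields in extracted_slips:
        for label, value in fields.items():
            target = SLIP_TO_RETURN_FIELD.get((slip_type, label))
            if target is None:
                continue  # this label is not needed on the tax return
            # Assumption: amounts from multiple slips are added together.
            tax_return[target] = tax_return.get(target, 0.0) + float(value)
    return tax_return
```

For example, two T4 slips with Employment Income values of 52000.00 and 8000.50 would produce a single `employment_income` entry holding their sum, while labels with no mapping (such as an employer name) are ignored.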
We used Microsoft Azure’s Optical Character Recognition (OCR) API for our machine learning implementation. We used this API to train 6 models to recognize the distinct categories present in the following tax slips: T4, T4A, T4A(OAS), T4AP, T1032, and T4E. For training, we used supervised learning with a labelled training set that we created ourselves. We assigned labels based on the information needed on a tax return form. For instance, a tax return form requires the Employment Income shown on an individual’s T4, so we trained our model to locate that value on a T4 based on our labels. Moreover, we used Pandas, a Python library, to store the tax return data in a CSV file, which was then used to fill in a blank tax return form. For our front-end, we used HTML, CSS, and Bootstrap to ensure responsiveness, and Python Flask to integrate the front-end smoothly with the back-end.
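As a rough sketch of the CSV step, the snippet below writes tax return values to CSV with Pandas and reads them back before a form would be filled. The two-column `field`/`value` layout and the helper names `return_to_csv` and `csv_to_return` are assumptions made for this illustration, not the exact layout TaxEasy uses, and the PDF-filling step itself is left out.

```python
import io

import pandas as pd


def return_to_csv(tax_return):
    """Serialize a {field: value} dict into a two-column CSV string."""
    frame = pd.DataFrame(
        {"field": list(tax_return.keys()), "value": list(tax_return.values())}
    )
    # With no path argument, to_csv returns the CSV text instead of writing a file.
    return frame.to_csv(index=False)


def csv_to_return(csv_text):
    """Read the CSV text back into a {field: value} dict."""
    frame = pd.read_csv(io.StringIO(csv_text))
    return dict(zip(frame["field"], frame["value"]))
```

Keeping the intermediate data in a flat CSV makes it easy to inspect what was recognized before it is written onto the form.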
Our biggest challenge was the learning curve. Having never used Python Flask or Microsoft Azure’s APIs, we spent most of our first day learning the basics of each technology. This meant diving deep into YouTube videos and reading documentation. Once we understood the technologies, we were ready to start our project! However, we then faced the challenge of obtaining a dataset of tax slips. To overcome this, we created our own tax slips using the files provided by the CRA. To keep them consistent with realistic tax slips, we used our own tax slips as a reference. Overall, we overcame these challenges with persistence and creativity, powered by our desire to learn.
Starting the project, we were not confident that we could complete it within the timeframe, since we were both stepping out of our comfort zones to learn new concepts. Completing the project is therefore an accomplishment in itself, because it demonstrates our passion for learning new things. Moreover, we are proud to have created an application that can have an impact on Canadians. With time being more precious than ever, we’ve enabled Canadians to spend more of it on their own wellbeing. Overall, we’re extremely proud that we were able to learn new skills and make an impact.
Starting with no experience with APIs, we learned how to use Microsoft Azure’s OCR and Storage APIs to create a machine learning implementation that recognizes the different structures of tax slips. During this process, we got first-hand experience with supervised learning by labelling our data to increase our model accuracy. Moreover, we learned how to use Python to convert data into a CSV file in order to fill out a blank PDF file. On the front-end, we learned how to use Flask, leveraging its HTTP methods for a smooth integration with our back-end.
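To illustrate what handling an upload through Flask’s HTTP methods can look like, here is a minimal sketch. The `/upload` route, the `slip` form field, and the list of allowed extensions are assumptions for this example rather than TaxEasy’s actual endpoints, and the OCR call is replaced by a placeholder comment.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumed set of accepted file types for scanned tax slips.
ALLOWED_EXTENSIONS = {"pdf", "png", "jpg", "jpeg"}


def is_allowed(filename):
    """Return True if the filename has one of the accepted extensions."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


@app.route("/upload", methods=["POST"])
def upload_slip():
    # "slip" is a hypothetical name for the file input in the HTML form.
    uploaded = request.files.get("slip")
    if uploaded is None or uploaded.filename == "":
        return jsonify(error="no file uploaded"), 400
    if not is_allowed(uploaded.filename):
        return jsonify(error="unsupported file type"), 400
    data = uploaded.read()
    # Placeholder: a real application would pass `data` to the OCR step here.
    return jsonify(filename=uploaded.filename, size_bytes=len(data)), 200
```

Restricting the route to POST keeps uploads separate from ordinary page loads, and returning JSON lets the front-end decide how to present success or failure.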
In the future, we plan to implement a questionnaire feature that will allow users to enter information that cannot be gathered from tax slips, such as their email and birthdate. Moreover, we want to enhance our machine learning models by training them on a larger set of tax slips. We trained our models on only 6 types of tax slips because of the limited timeframe and the need to deliver a working product.