Team Members: Chirag Belani, Chinmay Inamdar, Tanishka Singh, Anushka Waghmare
ART Finder is an automated system designed to streamline the research and marketing content generation process. It combines data scraping, machine learning, and actionable insights to identify user pain points and provide strategic marketing suggestions. The integration of Langflow's Retrieval-Augmented Generation (RAG) agents allows the system to leverage vectorized datasets for advanced content analysis and recommendation generation.
The process begins by collecting data from the following sources:
- YouTube: Scrapes video descriptions, comments, and metadata for user opinions and trends.
- Reddit: Gathers data from posts, threads, and comments to identify common pain points and recurring topics.
- Company’s Data: Includes internal datasets, feedback forms, and existing customer insights.
- Purpose: Automates the extraction of structured and unstructured data from the provided sources.
- Techniques: Uses tools such as Beautiful Soup, Scrapy, or Selenium for scraping. For social platforms, APIs may also be used (e.g., YouTube Data API and Reddit API).
- Output: Generates raw datasets containing user-generated content, competitor data, and industry-specific information.
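
A minimal sketch of the extraction step, assuming comments live in elements carrying a `comment` CSS class. The class name, the sample markup, and the `extract_comments` helper are illustrative assumptions, and Python's standard-library `HTMLParser` stands in for Beautiful Soup or Scrapy so that the example runs without extra dependencies:

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag


class CommentExtractor(HTMLParser):
    """Collect the text of every element whose class list contains 'comment'."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while the parser is inside a comment element
        self._buffer = []    # text fragments of the comment being read
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth or "comment" in classes:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse whitespace and store the finished comment.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.comments.append(text)
            self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_comments(html, source):
    """Return raw records tagged with the platform they came from."""
    parser = CommentExtractor()
    parser.feed(html)
    parser.close()
    return [{"source": source, "text": text} for text in parser.comments]


# Hypothetical markup; real pages will differ.
SAMPLE_HTML = """
<div class="thread">
  <p class="comment">Assembly took <b>three hours</b> and the screws stripped.</p>
  <p class="meta">posted 2 days ago</p>
  <p class="comment">Love the bamboo finish.</p>
</div>
"""

records = extract_comments(SAMPLE_HTML, source="reddit")
for record in records:
    print(record)
```

Tagging each record with its source keeps YouTube, Reddit, and company data distinguishable in the raw dataset.
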
- Purpose: Transforms raw textual data into vectorized representations to enable machine learning models to process them efficiently.
- Tools: Uses pre-trained embedding models such as Sentence Transformers or OpenAI’s embeddings, or a sparse representation such as TF-IDF.
- Storage: Saves vectorized data in a database or a vector store such as Pinecone, Weaviate, or FAISS for efficient similarity searches.
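
To make the vectorization idea concrete, here is a minimal sketch of TF-IDF (one of the options listed above) with cosine similarity, written with the standard library only. The smoothed IDF variant, the function names, and the sample sentences are assumptions for illustration; a real pipeline would more likely call an embedding model and a vector store:

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "The desk wobbles and the screws keep coming loose",
    "Screws came loose after a week and the desk wobbles",
    "Shipping was fast and the packaging was recyclable",
]
vectors = tfidf_vectors(docs)
sim_same_topic = cosine(vectors[0], vectors[1])
sim_other_topic = cosine(vectors[0], vectors[2])
print(f"same topic: {sim_same_topic:.2f}, other topic: {sim_other_topic:.2f}")
```

The two complaints about loose screws score higher against each other than against the shipping comment, which is the property a similarity search relies on.
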
- What It Does: The Langflow Retrieval-Augmented Generation (RAG) agent queries the vectorized dataset to generate insights and content recommendations based on the input context.
- Functions:
- Identifies user pain points.
- Analyzes market trends in specific regions.
- Process: The RAG agent retrieves relevant vectors and uses GPT or other LLMs to generate outputs tailored to marketing or user insight requirements.
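
The retrieve-then-generate flow can be sketched as follows. This is not Langflow's API: the bag-of-words `embed` function is a stand-in for a real embedding model, the in-memory `store` is a stand-in for the vector database, and the final LLM call is left out, so the sketch stops at the assembled prompt. All names here are hypothetical:

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, store, k=2):
    """Return the k stored passages most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        store, key=lambda item: cosine(query_vec, item["vector"]), reverse=True
    )
    return [item["text"] for item in ranked[:k]]


def build_prompt(query, passages):
    """Ground the question in retrieved feedback before it is sent to an LLM."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Use only the user feedback below to answer.\n"
        f"Feedback:\n{context}\n"
        f"Question: {query}\n"
    )


corpus = [
    "Assembly instructions are confusing and parts are missing",
    "Delivery arrived two weeks late",
    "The instructions skip steps so assembly takes hours",
]
store = [{"text": text, "vector": embed(text)} for text in corpus]

question = "What do users say about assembly instructions?"
passages = retrieve(question, store, k=2)
prompt = build_prompt(question, passages)
print(prompt)
```

Only the passages that match the question reach the prompt, which keeps the generated insight tied to the scraped data rather than to the model's general knowledge.
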
- Objective: Extract recurring problems, challenges, or feedback from users in the dataset.
- Implementation:
- Sentiment Analysis: Understands the sentiment (positive, negative, or neutral) in user discussions.
- Clustering: Groups similar pain points to identify common themes.
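
A minimal sketch of both steps, assuming a tiny hand-written sentiment lexicon and a greedy word-overlap grouping in place of a trained sentiment model and a real clustering algorithm. The word lists, the threshold, and the function names are illustrative assumptions:

```python
import re

# Tiny illustrative lexicons (assumption); a real system would use a trained model.
NEGATIVE = {"broken", "late", "confusing", "missing", "wobbly", "expensive"}
POSITIVE = {"love", "great", "sturdy", "fast", "easy"}
STOPWORDS = {"the", "a", "an", "and", "is", "are", "was", "were", "it", "my", "to", "of", "in"}


def tokens(text):
    return set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS


def sentiment(text):
    """Label text by counting distinct positive and negative lexicon hits."""
    words = tokens(text)
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def group_similar(texts, threshold=0.25):
    """Greedy single pass: join the first group whose seed text overlaps enough."""
    groups = []  # each group is a list of texts; the first item is its seed
    for text in texts:
        words = tokens(text)
        for group in groups:
            seed = tokens(group[0])
            union = words | seed
            # Jaccard similarity between this text and the group's seed.
            if union and len(words & seed) / len(union) >= threshold:
                group.append(text)
                break
        else:
            groups.append([text])
    return groups


feedback = [
    "The instructions are confusing",
    "Confusing instructions and missing screws",
    "Delivery was late",
    "Love the sturdy frame",
]
pain_points = [text for text in feedback if sentiment(text) == "negative"]
themes = group_similar(pain_points)
for theme in themes:
    print(theme)
```

Filtering by sentiment first means only complaints are grouped, so each resulting group reads as one candidate pain point.
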
- Objective: Analyze the data to identify marketing trends and strategies in a given geographical area.
- Implementation:
- Regional Categorization: Filters data based on location-specific keywords or tags.
- Competitor Benchmarking: Compares trends with competitor strategies to uncover market opportunities.
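
Keyword-based regional filtering and a simple competitor mention count could look like the sketch below. The region keyword lists are hypothetical, the competitor names come from the example input in this document, and counting mentions is only a rough proxy for benchmarking:

```python
import re
from collections import Counter, defaultdict

# Hypothetical location keywords; a real list would be far larger.
REGION_KEYWORDS = {
    "india": {"mumbai", "pune", "delhi", "india"},
    "us": {"nyc", "texas", "california", "usa"},
}
COMPETITORS = {"ikea", "wayfair"}


def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize_by_region(posts):
    """Bucket posts under every region whose keywords they mention."""
    by_region = defaultdict(list)
    for post in posts:
        post_words = words(post)
        matched = [
            region
            for region, keywords in REGION_KEYWORDS.items()
            if post_words & keywords
        ]
        for region in matched or ["unknown"]:
            by_region[region].append(post)
    return dict(by_region)


def competitor_mentions(posts):
    """Count how many posts mention each competitor."""
    counts = Counter()
    for post in posts:
        counts.update(words(post) & COMPETITORS)
    return counts


posts = [
    "IKEA delivery in Pune took a month",
    "Wayfair shipped to Texas in two days",
    "IKEA prices went up again",
]
by_region = categorize_by_region(posts)
mentions = competitor_mentions(posts)
print(by_region)
print(mentions)
```

Posts without a location keyword land in an `unknown` bucket instead of being dropped, so they remain available for non-regional analysis.
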
- Purpose: Summarizes findings into a comprehensive report.
- Content:
- Key user pain points.
- High-performing strategies by competitors.
- Suggested hooks, CTAs, and content formats.
- Visualization: Uses graphs, word clouds, and sentiment charts to make data comprehensible.
- Purpose: Links the insights with the company’s social media or product strategy.
- Applications:
- Social Media Campaigns: Develops targeted campaigns based on user feedback and pain points.
- Product Improvement: Suggests features or changes based on analyzed user data.
- Objective: Generates ready-to-use marketing content.
- Steps:
- Content Suggestions: Provides hooks, CTAs, and solutions for marketing campaigns.
- Copywriting: Uses Google’s Gemini integration to generate persuasive, user-centric ad copy.
- Creative Recommendations: Suggests banner themes, visuals, and layouts.
- Tools: APIs, web scraping libraries (Beautiful Soup, Selenium, etc.).
- Input: Data source URLs or API configurations.
- Tools:
- Text Cleaning: NLTK or SpaCy for cleaning and preprocessing text.
- Vectorization: Sentence Transformers or OpenAI embeddings for creating vectors.
- Output: Vectorized dataset stored in a vector database.
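
The cleaning step that precedes vectorization can be sketched with regular expressions alone. NLTK or SpaCy would add proper tokenization and lemmatization; the short stopword list and the `clean_text` name below are assumptions for illustration:

```python
import html
import re

# Short illustrative stopword list (assumption); libraries ship much fuller ones.
STOPWORDS = {"the", "a", "an", "and", "is", "are", "it", "this", "to", "of"}


def clean_text(raw):
    """Turn scraped text into a list of lowercase content words."""
    text = html.unescape(raw)                          # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)               # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)          # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())   # keep letters, digits, whitespace
    return [word for word in text.split() if word not in STOPWORDS]


sample = "Check <b>this</b> out: https://example.com/x?y=1 &amp; the desk is GREAT!!"
cleaned = clean_text(sample)
print(cleaned)
```

The cleaned token list is what would then be passed to the embedding model.
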
- Description: A retrieval-based language model integrated with the vector database for generating insights.
- Usage: Employs LangChain workflows for querying and generating responses.
- Tools: Matplotlib, Seaborn, Plotly, or D3.js for creating graphs and charts.
- Outputs:
- Graphs showing user trends and pain points.
- Word clouds representing recurring themes.
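
Before any chart is drawn, the labelled feedback has to be aggregated into counts. The sketch below does that and renders a plain-text bar chart as a stand-in for the plotting libraries listed above; the theme labels and sample data are made up for illustration:

```python
from collections import Counter


def theme_counts(labelled_feedback):
    """Count how many feedback items fall under each theme label."""
    return Counter(theme for _, theme in labelled_feedback)


def text_bar_chart(counts, width=20):
    """Render counts as text bars scaled so the largest bar is `width` characters."""
    if not counts:
        return ""
    top = max(counts.values())
    lines = []
    for label, count in counts.most_common():
        bar = "#" * max(1, round(width * count / top))
        lines.append(f"{label:<12} {bar} {count}")
    return "\n".join(lines)


# Hypothetical (text, theme) pairs produced by an earlier analysis step.
labelled = [
    ("Instructions are confusing", "assembly"),
    ("Missing screws", "assembly"),
    ("Took three hours to build", "assembly"),
    ("Arrived two weeks late", "delivery"),
    ("Courier left it outside", "delivery"),
    ("Too expensive for particle board", "price"),
]
counts = theme_counts(labelled)
chart = text_bar_chart(counts)
print(chart)
```

The same `counts` mapping could be handed to a plotting library for a bar chart, or term frequencies computed the same way could drive a word cloud.
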
- Tools: GPT-3.5/GPT-4 for generating ad copies, hooks, and campaign recommendations.
The user provides:
- Topic: E.g., "Eco-friendly furniture."
- Target Audience: E.g., "Urban millennials aged 25-35."
- Competitors: E.g., "IKEA, Wayfair."
- Preferred Platforms: E.g., "YouTube, Reddit, App Reviews."
The system begins data scraping and analysis using the provided input.
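
The user input described above could be captured in a small validated structure before scraping starts. The field names, the `ResearchRequest` class, and the set of supported platforms are assumptions drawn from the examples in this document, not the project's actual schema:

```python
from dataclasses import dataclass, field

# Assumed from the example platforms listed above.
SUPPORTED_PLATFORMS = {"youtube", "reddit", "app reviews"}


@dataclass
class ResearchRequest:
    topic: str
    target_audience: str
    competitors: list[str] = field(default_factory=list)
    platforms: list[str] = field(default_factory=list)

    def validate(self):
        """Return a list of problems; an empty list means the request is usable."""
        errors = []
        if not self.topic.strip():
            errors.append("topic is required")
        unknown = [p for p in self.platforms if p.lower() not in SUPPORTED_PLATFORMS]
        if unknown:
            errors.append(f"unsupported platforms: {', '.join(unknown)}")
        return errors


request = ResearchRequest(
    topic="Eco-friendly furniture",
    target_audience="Urban millennials aged 25-35",
    competitors=["IKEA", "Wayfair"],
    platforms=["YouTube", "Reddit", "App Reviews"],
)
print(request.validate())
```

Validating up front avoids starting a scraping run for a platform the system cannot handle.
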
- Visualized insights and pain points.
- Competitor analysis results.
- Suggested strategies for marketing.
- The user selects "Generate Content" and specifies preferences like themes, CTAs, and visuals.
- The system generates ad copies, banners, or reports.
- Language: JavaScript
- Database: DataStax Astra DB for user inputs and processed data storage.
- HTML and CSS for a visually appealing dashboard.
- Models: Google’s Gemini.
- Libraries: LangChain for RAG integration.
- Multi-language Support: Expand analysis to non-English content.
- Real-time Updates: Continuously scrape and update insights.
- Customization: Allow users to set specific analysis goals or filters.
Our initial idea for a unique selling point was to integrate an automated ad banner creation feature. This would take Gemini's output, including hooks, CTAs, and marketing content, and use it to generate customized ad banners. The banners would include tailored visuals, layouts, and text to suit specific target audiences and platforms, offering a seamless transition from insights to actionable marketing material.
However, due to time constraints, we won't be able to implement this feature in the current version. Despite this, it remains a promising enhancement for future iterations to make our platform truly stand out in the market.