Add spam classifier with google gemini flash 2.0 experimental and check for spam earlier #58

Open
wants to merge 4 commits into
base: main

Conversation

nonprofittechy
Member

Fix #54

This adds a basically free and optional spam filter to the feedback form, driven by Google Gemini.

If the user's message passes the keyword filter, it will be sent to Google Gemini flash 2.0 experimental for additional filtering. As of 1/3/2025, the free tier has a limit of 1,500 queries/day, plenty to handle the small volume of feedback form spam we've been dealing with (a dozen a month in some cases).

To use it, a Google Gemini API key must be added to the global configuration.
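The two-stage flow described above (keyword filter first, Gemini only for messages that pass it) could look roughly like the sketch below. This is not the code from this PR: `looks_like_spam`, `keywords`, and `classify` are hypothetical names, and `classify` stands in for whatever function wraps the Gemini call. Treating a classifier failure as "not spam" is also an assumption.

```python
from typing import Callable, Iterable, Optional


def looks_like_spam(
    message: str,
    keywords: Iterable[str],
    classify: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Return True if the message should be treated as spam (hypothetical helper)."""
    lowered = message.lower()
    # Stage 1: cheap keyword check, so obvious spam never costs an API query.
    if any(word.lower() in lowered for word in keywords):
        return True
    # Stage 2: optional classifier, skipped when no API key (and so no classifier) is configured.
    if classify is None:
        return False
    try:
        return bool(classify(message))
    except Exception:
        # Assumption: fail open, so a classifier outage does not block real feedback.
        return False
```

Because the classifier is optional, sites without an API key keep the existing keyword-only behavior.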

I also noticed that the existing spam filtering was only being used when the form fell back to delivering an email. I'm not sure why that was the case, but this change should also fix that.

log(f"~~~USER FEEDBACK~~~ {github_repo} -{issue_template.subject_as_html(trim=True)} - {issue_template.content_as_html(trim=True)}")
mark_task_as_performed('issue noted', persistent=True)
else:
log("Already sent feedback to github from a feedback interview, not going to send again")

I think we can likely simplify the nested ifs much more. Current structure, as I understand it:

if feedback looks like spam:
    log
    mark task as done
    set some values
else:
    if task not yet performed:
        prepare saved_uuid
        if user should be added:
            add user
        if feedback should be sent to github:
            prepare
            if url:
                if saved_uuid:
                    link
            else:
                log
                if error email AND not spam:
                    log
                    send
                else:
                    log
    else:
        log
set note_issue to true

To reduce code duplication and the number of logic branches, we could restructure it like this:

if feedback looks like spam:
    log
    mark as done
    save values
    return
if task already performed:
    log
    return

if should add user to panel:
    add user

if send feedback to github:
    create issue
    if issue_url and saved_uuid:
        link to issue
    else:
        log error
        if error email configured:
            send
        else:
            log

mark as done
set note_issue to true
    

Member Author

We can't use a return statement here in a Docassemble code block, unfortunately! I can take another look at simplifying this--I just wanted to be careful to scope my change to be as small as possible.
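One possible way to keep the early-return shape despite that limitation is to move the branching into an ordinary Python function in a module file, where `return` is allowed, and have the code block call it. The sketch below only illustrates that idea. `process_feedback`, its parameters, and the status strings are all hypothetical, the "add user to panel" step is left out for brevity, and marking the task as performed would stay in the calling code block.

```python
from typing import Callable, Optional


def process_feedback(
    is_spam: bool,
    already_sent: bool,
    send_to_github: bool,
    create_issue: Callable[[], Optional[str]],
    saved_uuid: Optional[str],
    error_email: Optional[str],
    log: Callable[[str], None] = print,
) -> str:
    """Return a short status string; every name here is a hypothetical stand-in."""
    # Guard clauses: handle the "do nothing" cases first and leave early.
    if is_spam:
        log("Feedback looks like spam, not sending")
        return "spam"
    if already_sent:
        log("Already sent feedback, not going to send again")
        return "duplicate"
    if not send_to_github:
        return "skipped"

    issue_url = create_issue()
    if issue_url and saved_uuid:
        log(f"Linking {saved_uuid} to {issue_url}")
        return "linked"

    log("Could not create or link the GitHub issue")
    if error_email:
        log(f"Sending feedback to {error_email} instead")
        return "emailed"
    log("No error email configured, feedback only logged")
    return "logged"
```

The calling block could then branch once on the returned status instead of nesting the checks.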

Comment on lines +196 to +198
if not context:
    context = "a guided interview in the legal context"

To improve readability when setting defaults, we can use or. For example:

context = context or "a guided interview in the legal context"
gemini_api_key = gemini_api_key or get_config("google gemini api key")
... etc ...
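A self-contained sketch of the `or` pattern follows. `resolve_settings` and the `config` mapping are hypothetical (the mapping stands in for `get_config` so the example runs outside Docassemble). Note that `or` replaces any falsy value, including an empty string, which matches what `if not context:` already does.

```python
from typing import Mapping, Optional, Tuple

DEFAULT_CONTEXT = "a guided interview in the legal context"


def resolve_settings(
    context: Optional[str] = None,
    gemini_api_key: Optional[str] = None,
    config: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Fill in defaults with `or`; `config` is a stand-in for get_config()."""
    config = config or {}
    # `x or default` keeps x when it is truthy and falls back otherwise.
    context = context or DEFAULT_CONTEXT
    gemini_api_key = gemini_api_key or config.get("google gemini api key")
    return context, gemini_api_key
```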

Comment on lines +228 to +231
try:
    response = model.generate_content(body)
    if response.text.strip() == "spam":
        return True

I get the sense that this would be more readable if we folded it into the other try. I'm not sure there's a need to keep them distinct. We can use specific exception types to do this. The structure would change to something like this:

try:
    attempt configuration
    generate the response
except UseANameException as e:
    log error configuring 
    return False
except Exception as e:
    log generic error 
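A runnable version of that shape might look like the sketch below. `SpamFilterConfigError`, `classify_with_model`, and `make_model` are hypothetical names. `make_model` stands in for the configuration step and is expected to return an object with a `generate_content` method, as in the diff above. Returning False from the generic branch is an assumption, since the outline leaves that case open.

```python
from typing import Any, Callable


class SpamFilterConfigError(Exception):
    """Hypothetical error for a missing or invalid API key."""


def classify_with_model(
    body: str,
    make_model: Callable[[], Any],
    log: Callable[[str], None] = print,
) -> bool:
    """Return True only when the model answers exactly "spam"."""
    try:
        # Configuration and the request share one try block;
        # the except clauses tell the two failure modes apart.
        model = make_model()
        response = model.generate_content(body)
        return response.text.strip() == "spam"
    except SpamFilterConfigError as err:
        log(f"Error configuring the spam classifier: {err}")
        return False
    except Exception as err:
        # Assumption: any other failure is treated as "not spam".
        log(f"Error classifying the message: {err}")
        return False
```

The more specific `except` clause has to come first, because Python uses the first clause that matches the raised exception.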

Development

Successfully merging this pull request may close these issues.

Add a better spam filter with Bayesian classification
2 participants