Fixes .get_vocabulary() JSONDecodeError as API has changed #141

Open · 1 commit to be merged into base: master

gigajuwels

Fixes #139 by indexing each section, unit, and level to get the skill_id needed to properly build the .get_vocabulary() API query. It also handles pagination and queries every page.
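
The two steps described above can be sketched roughly as follows. This is a simplified illustration, not the code in the commit: the key names (`completedUnits`, `pathLevelClientData`, `learnedLexemes`, `nextStartIndex`, and so on) are taken from the responses as used in this PR, while `collect_progressed_skills`, `fetch_all_lexemes`, and the `fetch_page` callable are hypothetical names introduced here so the logic can run without network access.

```python
def collect_progressed_skills(path_sectioned):
    """Walk sections -> completed units -> levels and gather skill IDs."""
    progressed = []
    for section in path_sectioned:
        # Only units the user has completed are considered.
        completed_units = section["units"][: section["completedUnits"]]
        for unit in completed_units:
            for level in unit["levels"]:
                # Chests and unit reviews introduce no new words.
                if level["type"] in ("chest", "unit_review"):
                    continue
                client_data = level["pathLevelClientData"]
                # A level carries either one "skillId" or a list of "skillIds".
                if "skillId" in client_data:
                    skill_ids = [client_data["skillId"]]
                else:
                    skill_ids = client_data.get("skillIds", [])
                for skill_id in skill_ids:
                    progressed.append({
                        "finishedLevels": 1,
                        "finishedSessions": level["finishedSessions"],
                        "skillId": {"id": skill_id},
                    })
    return progressed


def fetch_all_lexemes(fetch_page):
    """Call fetch_page(start_index) until every lexeme has been collected.

    fetch_page is a hypothetical callable standing in for the HTTP request;
    it must return a dict with "learnedLexemes" and "pagination".
    """
    lexemes = []
    start_index = 0
    while True:
        page = fetch_page(start_index)
        lexemes.extend(page["learnedLexemes"])
        pagination = page["pagination"]
        next_index = pagination.get("nextStartIndex")
        if len(lexemes) >= pagination["totalLexemes"]:
            break
        # Stop if the server gives no usable next index, to avoid looping forever.
        if next_index is None or next_index <= start_index:
            break
        start_index = next_index
    return lexemes
```

Passing the request as a callable keeps the paging loop testable with canned pages; in the library it would presumably wrap the existing request helper.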

@HumanBot000

This still gives me:

requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

```python
    def get_vocabulary(self, language_abbr=None):
        """Get overview of user's vocabulary in a language."""
        if self.username != self._original_username:
            raise OtherUserException("Vocab cannot be listed when the user has been switched.")

        if language_abbr and not self._is_current_language(language_abbr):
            self._switch_language(language_abbr)

        current_courses = self.get_data_by_user_id()["currentCourse"]["pathSectioned"]
        progressed_skills = []
        for section in current_courses:
            completedUnits = section["completedUnits"]
            units = section["units"]
            for i in range(completedUnits):
                unit = units[i]
                levels = unit["levels"]
                for l in levels:
                    level_type = l["type"]
                    # Chests and unit reviews don't contain new words.
                    if level_type in ["chest", "unit_review"]:
                        continue
                    pathLevelClientData = l["pathLevelClientData"]
                    finishedSessions = l["finishedSessions"]
                    if "skillId" in pathLevelClientData:
                        skillId = pathLevelClientData["skillId"]
                        new_obj = {
                            "finishedLevels": 1,
                            "finishedSessions": finishedSessions,
                            "skillId": {
                                "id": skillId
                            }
                        }
                        progressed_skills.append(new_obj)
                    elif "skillIds" in pathLevelClientData:
                        skillIds = pathLevelClientData["skillIds"]
                        for skillId in skillIds:
                            new_obj = {
                                "finishedLevels": 1,
                                "finishedSessions": finishedSessions,
                                "skillId": {
                                    "id": skillId
                                }
                            }
                            progressed_skills.append(new_obj)

        # Updated URL; the default language is set to English ("en").
        current_index = 0
        data = []
        while True:
            overview_url = f"https://www.duolingo.com/2017-06-30/users/{self.user_data.id}/courses/{language_abbr}/en/learned-lexemes?sortBy=ALPHABETICAL&startIndex={current_index}"
            overview_request = self._make_req(overview_url, data={
                "lastTotalLexemeCount": 0,
                "progressedSkills": progressed_skills
            })
            overview = overview_request.json()
            learnedLexemes = overview['learnedLexemes']
            data.extend(learnedLexemes)
            pagination = overview['pagination']
            totalLexemes = pagination['totalLexemes']
            if len(data) >= totalLexemes:
                break
            # It's not my database, so I am being wasteful :)
            nextStartIndex = pagination['nextStartIndex']
            current_index = nextStartIndex
        return data
```
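
"Expecting value: line 1 column 1 (char 0)" means the response body was empty or did not start with a JSON value, so whatever the server sent back was not JSON at all. A minimal sketch for making that visible before decoding, assuming only the status code and body text are at hand; `decode_json_body` is a hypothetical helper, not something the library provides:

```python
import json


def decode_json_body(status_code, body_text):
    """Return the parsed JSON, or raise ValueError describing what came back."""
    preview = body_text[:80]
    if not 200 <= status_code < 300:
        raise ValueError(f"HTTP {status_code}; body starts with {preview!r}")
    try:
        return json.loads(body_text)
    except json.JSONDecodeError as exc:
        # Include the start of the body so the real cause is visible.
        raise ValueError(
            f"HTTP {status_code} but body is not JSON ({exc.msg}); "
            f"body starts with {preview!r}"
        ) from exc
```

With a `requests` response, the two arguments would be `response.status_code` and `response.text`.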

@JASchilz
Contributor

JASchilz commented Apr 7, 2024

It seems like development in this project is pretty challenging because, among other things, login is broken on the master branch. I'm going to create a fork today that has login working. I'll be happy to maintain that fork, with the understanding that I'll be treating the project as being in maintenance mode and that it is likely to break intermittently due to API changes from Duolingo.

I myself don't use this library frequently at all, so if there's anyone who thinks they'd be a more natural owner for that fork I'd be happy to contribute to your branch rather than the other way around.

@HumanBot000

I would love to see a working API again. I never looked any deeper into how the actual Duolingo backend works, but maybe I will do that in the future. Because of other projects, I don't have enough time to constantly fix things, but maybe I will contribute to your fork in the future.

  • Thanks

@JASchilz
Contributor

JASchilz commented Apr 7, 2024

OK! I've created a fork at https://gitlab.com/JASchilz/duolingo . I've integrated login fixes from #137 .

I was able to try this change out and also retrieve vocabulary! I noticed a few challenges:

  • I think I encountered an error when I didn't provide a language abbreviation, when I provided a language abbreviation other than my current language, and also when I provided a language abbreviation for my current language when I have zero learned vocabulary in that language. I'm not totally sure I got an error in all of these cases, but in any case they seem really tractable.
  • A more complicated issue is that this seems to return a format that's different from what the original returned. I think that the old one returned pretty much the raw JSON of the response from the old endpoint, and this one does too, but the format is different of course.

It doesn't seem important to replicate the old format, but it might suggest two different methods: one that just returns a list of vocab, and another that returns a more raw response, which we tell users is liable to change if Duolingo changes their API.
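
A rough sketch of what that split could look like. The function names are placeholders, `fetch_lexemes` is a hypothetical callable standing in for the paginated request, and the `"text"` key is an assumption about the lexeme format that would need checking against a real response:

```python
def get_vocabulary_raw(fetch_lexemes):
    """Return the lexeme dicts as the endpoint sent them.

    The shape of these dicts is whatever Duolingo returns and may change.
    """
    return list(fetch_lexemes())


def get_vocabulary_words(fetch_lexemes, text_key="text"):
    """Return just the words, skipping entries that lack the assumed key."""
    return [
        lexeme[text_key]
        for lexeme in get_vocabulary_raw(fetch_lexemes)
        if text_key in lexeme
    ]
```

Keeping the list method as a thin layer over the raw one means only one place has to change if the response format shifts.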

I've started up a merge request in my own repo from your change @gigajuwels . I can address these points if you don't get around to them.
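
For the first point, note that the method interpolates `language_abbr` straight into the URL, so leaving it as `None` produces a path containing `/courses/None/en/`. A minimal sketch of the kind of guards that might help; `resolve_language` and `lexemes_from_overview` are hypothetical helpers, and whether the zero-vocabulary response really omits `learnedLexemes` is an assumption I have not confirmed:

```python
def resolve_language(language_abbr, current_language):
    """Fall back to the course currently being studied when none is given."""
    if language_abbr:
        return language_abbr
    if not current_language:
        raise ValueError("No language given and no current course found.")
    return current_language


def lexemes_from_overview(overview):
    """Treat a missing or null "learnedLexemes" entry as an empty vocabulary."""
    return overview.get("learnedLexemes") or []
```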

@HumanBot000

<3

@HumanBot000

Could you please review my merge request?

Successfully merging this pull request may close these issues.

JSONDecodeError when trying to call .get_vocabulary()
3 participants