Fixes .get_vocabulary() JSONDecodeError as API has changed #141

gigajuwels · 2024-03-27T01:16:30Z

Fixes #139 by indexing each section, unit, and level to get the skill_id needed to properly build the .get_vocabulary() API query. It also handles pagination and will query every page.

HumanBot000 · 2024-04-06T10:42:01Z

This still gives me:

requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

    def get_vocabulary(self, language_abbr=None):
        """Get overview of user's vocabulary in a language."""
        if self.username != self._original_username:
            raise OtherUserException("Vocab cannot be listed when the user has been switched.")

        if language_abbr and not self._is_current_language(language_abbr):
            self._switch_language(language_abbr)

        current_courses = self.get_data_by_user_id()["currentCourse"]["pathSectioned"]
        progressed_skills = []
        for section in current_courses:
            completedUnits = section["completedUnits"]
            units = section["units"]
            for i in range(completedUnits):
                unit = units[i]
                levels = unit["levels"]
                for l in levels:
                    level_type = l["type"]
                    # chests and unit reviews don't contain new words
                    if level_type in ["chest", "unit_review"]:
                        continue
                    pathLevelClientData = l["pathLevelClientData"]
                    finishedSessions = l["finishedSessions"]
                    if "skillId" in pathLevelClientData:
                        skillId = pathLevelClientData["skillId"]
                        new_obj = {
                            "finishedLevels": 1,
                            "finishedSessions": finishedSessions,
                            "skillId": {
                                "id": skillId
                            }
                        }
                        progressed_skills.append(new_obj)
                    elif "skillIds" in pathLevelClientData:
                        skillIds = pathLevelClientData["skillIds"]
                        for skillId in skillIds:
                            new_obj = {
                                "finishedLevels": 1,
                                "finishedSessions": finishedSessions,
                                "skillId": {
                                    "id": skillId
                                }
                            }
                            progressed_skills.append(new_obj)

        # updated URL; the source language defaults to English
        current_index = 0
        data = []
        while True:
            overview_url = f"https://www.duolingo.com/2017-06-30/users/{self.user_data.id}/courses/{language_abbr}/en/learned-lexemes?sortBy=ALPHABETICAL&startIndex={current_index}"
            overview_request = self._make_req(overview_url, data={
                "lastTotalLexemeCount": 0,
                "progressedSkills": progressed_skills
            })
            overview = overview_request.json()
            learnedLexemes = overview['learnedLexemes']
            data.extend(learnedLexemes)
            pagination = overview['pagination']
            totalLexemes = pagination['totalLexemes']
            if len(data) >= totalLexemes:
                break
            # it's not my database, so I am being wasteful :)
            nextStartIndex = pagination['nextStartIndex']
            current_index = nextStartIndex
        return data
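
For anyone hitting the same traceback: `Expecting value: line 1 column 1 (char 0)` is the message the JSON decoder produces when the body is empty or does not start with a JSON value (an HTML error or login page, for example), so it may help to inspect the response before calling `.json()`. A minimal sketch, assuming the status code and body text are read from the `requests` response; `parse_json_body` is a hypothetical helper, not part of this library:

```python
import json


def parse_json_body(status_code, text):
    """Decode a response body, raising a descriptive error if it is not JSON.

    Hypothetical helper for debugging; not part of the duolingo library.
    """
    if status_code != 200:
        raise RuntimeError(f"Unexpected HTTP status {status_code}: {text[:200]!r}")
    if not text.strip():
        raise RuntimeError("Empty response body (endpoint changed or not logged in?)")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Show the start of the body so an HTML page is easy to recognise.
        raise RuntimeError(f"Response was not JSON: {text[:200]!r}") from exc
```

With a `requests` response object this would be called as `parse_json_body(resp.status_code, resp.text)`, which surfaces what the server actually returned instead of the bare decode error.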

JASchilz · 2024-04-07T17:11:24Z

It seems like development in this project is pretty challenging because, among other things, login is broken on the master branch. I'm going to create a fork today that has login working. I'll be happy to maintain that fork, with the understanding that I'll be treating the project as being in maintenance mode and that it is likely to break intermittently due to API changes from Duolingo.

I myself don't use this library frequently at all, so if there's anyone who thinks they'd be a more natural owner for that fork I'd be happy to contribute to your branch rather than the other way around.

HumanBot000 · 2024-04-07T17:30:27Z

I would love to see a working API again. I have never looked deeper into how the actual Duolingo backend works, but maybe I will do that in the future. Because of other projects I don't have enough time to constantly fix things, but maybe I will contribute to your fork in the future.

Thanks

JASchilz · 2024-04-07T23:41:18Z

OK! I've created a fork at https://gitlab.com/JASchilz/duolingo . I've integrated login fixes from #137 .

I was able to try this change out and also retrieve vocabulary! I noticed a few challenges:

I think I encountered an error when I didn't provide a language abbreviation, when I provided a language abbreviation other than my current language, and also when I provided a language abbreviation for my current language when I have zero learned vocabulary in that language. I'm not totally sure I got an error in all of these cases, but in any case they seem really tractable.

A more complicated issue is that this seems to return a format that's different from what the original returned. I think that the old one returned pretty much the raw json of the response from the old endpoint, and this one does too, but the format is different of course.

It doesn't seem important to replicate the old format, but it might suggest two different methods: one that just returns a list of vocab and another that returns a more raw response and we tell users is liable to change if Duolingo changes their API.
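
A rough sketch of that split, assuming each raw page is a dict with a `learnedLexemes` list as in the code posted above. The function names are hypothetical, and the `text` key for the word itself is an assumption about the response that would need checking against real data:

```python
def flatten_lexemes(raw_pages):
    """Raw-ish view: concatenate the 'learnedLexemes' lists of every page.

    The entries are passed through untouched, so their shape is liable to
    change whenever Duolingo changes the endpoint.
    """
    lexemes = []
    for page in raw_pages:
        lexemes.extend(page.get("learnedLexemes", []))
    return lexemes


def vocabulary_words(raw_pages, text_key="text"):
    """Stable view: only the word strings.

    text_key is an ASSUMED field name; entries without it are skipped.
    """
    return [lex[text_key] for lex in flatten_lexemes(raw_pages) if text_key in lex]
```

Keeping the word list separate means callers who only need vocabulary are insulated from format changes, while the raw variant stays available with the caveat that it may change.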

I've started up a merge request in my own repo from your change @gigajuwels . I can address these points if you don't get around to them.
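
On the first point, one likely contributor to the missing-abbreviation failure is that the posted code interpolates `language_abbr` straight into the URL, so `None` ends up in the path as the literal text `None`. A minimal sketch of a guard, using the URL pattern from the code above; `build_lexemes_url` and its parameter names are hypothetical:

```python
from urllib.parse import urlencode


def build_lexemes_url(user_id, learning_language, from_language="en", start_index=0):
    """Build the learned-lexemes URL, refusing a missing language code.

    Hypothetical helper; the caller is expected to resolve the user's
    current language before calling this when no abbreviation was given.
    """
    if not learning_language:
        raise ValueError("learning_language is required")
    query = urlencode({"sortBy": "ALPHABETICAL", "startIndex": start_index})
    return (
        "https://www.duolingo.com/2017-06-30/users/"
        f"{user_id}/courses/{learning_language}/{from_language}/learned-lexemes?{query}"
    )
```

Failing early here turns a confusing decode error into a clear message; falling back to the user's current language is left to the caller.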

HumanBot000 · 2024-04-08T05:07:54Z

<3

HumanBot000 · 2024-04-11T14:05:34Z

Could you please review my merge request?

fix: get vocab func

e80c650

gigajuwels mentioned this pull request Mar 27, 2024

Nothing being pulled JASchilz/AnkiSyncDuolingo#76

Open
