Journal #1
Issues history in the Tribler repo |
An animated history of Tribler from the Beginning of Time to the Present Day. |
An animated history of Superapp: |
TIL: how to build libtorrent 1.2.18 on Apple Silicon:

```shell
git clone https://github.com/arvidn/libtorrent
cd libtorrent
git checkout v1.2.18
brew install python@3.11 openssl boost boost-build boost-python3
python3.11 setup.py build
python3.11 setup.py install
```
|
Trying to untangle the tangle. In fact, it's even fascinating: I can see how different people have added code over time. Sometimes mistakes crept in; then other people went in and corrected the errors without understanding the bigger picture; then there were refactorings... |
One week with the codebase, and it is mysterious. It is written in a way that shouldn't work. But it works. Thanks to @kozlovsky for his help in unraveling how this works. I leave here one example of how it is written. All irrelevant parts of the functions are replaced by `...`.

Here is the `check_local_torrents` function:

```python
@db_session
def check_local_torrents(self):
    ...
    for random_torrent in selected_torrents:
        self.check_torrent_health(bytes(random_torrent.infohash))
    ...
```

The listing of the `check_torrent_health` function:

```python
@task
async def check_torrent_health(self, infohash, timeout=20, scrape_now=False):
    ...
```

In the first listing we see a sync call to an async function, which shouldn't lead to the execution of the async function. The secret here is the `task` decorator:

```python
def task(func):
    """
    Register a TaskManager function as an anonymous task and return the Task
    object so that it can be awaited if needed. Any exceptions will be logged.
    Note that if awaited, exceptions will still need to be handled.
    """
    if not iscoroutinefunction(func):
        raise TypeError('Task decorator should be used with coroutine functions only!')

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        return self.register_anonymous_task(func.__name__,
                                            ensure_future(func(self, *args, **kwargs)),
                                            ignore=(Exception,))
    return wrapper
```

This trick makes the code much harder to read and breaks the sync/async separation in Python. |
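To see why a plain sync call still executes the coroutine, here is a minimal, self-contained re-creation of the pattern. This is my simplification, not Tribler's actual code: the real decorator also registers the future with the `TaskManager`; here I only wrap the coroutine call in `ensure_future`.

```python
import asyncio
from functools import wraps
from inspect import iscoroutinefunction

def task(func):
    # Simplified sketch of the pattern: ensure_future schedules the coroutine
    # on the running event loop, so a plain sync-looking call still runs it.
    if not iscoroutinefunction(func):
        raise TypeError('Task decorator should be used with coroutine functions only!')

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        return asyncio.ensure_future(func(self, *args, **kwargs))

    return wrapper

class Checker:
    # Hypothetical stand-in for the real TaskManager-based class.
    def __init__(self):
        self.checked = []

    @task
    async def check_torrent_health(self, infohash):
        self.checked.append(infohash)

async def main():
    checker = Checker()
    checker.check_torrent_health(b'abc')  # looks like a plain sync call...
    await asyncio.sleep(0)  # ...but the scheduled task runs on the next loop step
    return checker.checked

print(asyncio.run(main()))  # → [b'abc']
```

The call site gives no hint that anything asynchronous happens, which is exactly why the real code reads so mysteriously.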
I've found a nice and super simple tool for creating GIFs. |
@kozlovsky handed me a shocking secret:
https://textual.textualize.io/blog/2023/02/11/the-heisenbug-lurking-in-your-async-code/ |
Using Environment Protection Rules to Secure Secrets When Building External Forks with |
AsyncGroup, developed in collaboration with @kozlovsky, is getting better and better 🚀. It can be used as a lightweight replacement for TaskManager, and in the future it could itself be replaced by native Task Groups (available since Python 3.11). |
TIL: Web3 Sybil avoidance using network latency.
- Vector Clock:
- Bloom Clock:
|
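For context, here is a minimal vector clock sketch (my own illustration, not taken from the linked material): each node keeps a per-node counter, and merging two clocks takes the element-wise maximum.

```python
# Hypothetical helper names; clocks are plain dicts mapping node -> counter.
def vc_increment(clock, node):
    # A node ticks its own counter on each local event.
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def vc_merge(a, b):
    # On message receipt, take the element-wise maximum of both clocks.
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

a = vc_increment({}, 'A')   # → {'A': 1}
b = vc_increment({}, 'B')   # → {'B': 1}
merged = vc_merge(a, b)     # → {'A': 1, 'B': 1}
```

A Bloom clock replaces the per-node dict with a fixed-size counting Bloom filter, trading exact causality for constant space.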
The Google standard for code review: https://google.github.io/eng-practices/review/reviewer/standard.html

> Thus, we get the following rule as the standard we expect in code reviews: In general, reviewers should favor approving a CL once it is in a state where it definitely improves the overall code health of the system being worked on, even if the CL isn't perfect.
>
> That is the senior principle among all of the code review guidelines. A key point here is that there is no such thing as "perfect" code—there is only better code. Reviewers should not require the author to polish every tiny piece of a CL before granting approval. Rather, the reviewer should balance out the need to make forward progress compared to the importance of the changes they are suggesting. Instead of seeking perfection, what a reviewer should seek is continuous improvement. A CL that, as a whole, improves the maintainability, readability, and understandability of the system shouldn't be delayed for days or weeks because it isn't "perfect."
|
The annual (seemingly traditional) analysis of the Tribler repo.

**Tribler**

Prerequisites:

```shell
pip install git-of-theseus
git clone https://github.com/Tribler/tribler
git-of-theseus-analyze tribler
```

Also, I assume that the correct .mailmap is present in the repo.

Results:

```shell
git-of-theseus-stack-plot cohorts.json
git-of-theseus-stack-plot authors.json --normalize
git-of-theseus-stack-plot authors.json
git-of-theseus-stack-plot exts.json
git-of-theseus-survival-plot survival.json
```

**IPv8**

Prerequisites:

```shell
pip install git-of-theseus
git clone https://github.com/Tribler/py-ipv8
git-of-theseus-analyze py-ipv8
```

Results:

```shell
git-of-theseus-stack-plot cohorts.json
git-of-theseus-stack-plot authors.json --normalize
git-of-theseus-stack-plot authors.json
git-of-theseus-stack-plot exts.json
git-of-theseus-survival-plot survival.json
```

**Acknowledgements**

Powered by https://github.com/erikbern/git-of-theseus |
stunning and impressive! @drew2a Could you possibly update the above figures by also adding the cardinal py-IPv8 dependency, together with the Tribler code count and code authorship records (Tribler ➕ IPv8)? Something magical was born in 2017 😁 |
**Merged Tribler and IPv8**

To create plots with merged data, I utilized this branch: erikbern/git-of-theseus#70

```shell
git-of-theseus-stack-plot tribler/authors.json ipv8/authors.json
git-of-theseus-stack-plot tribler/authors.json ipv8/authors.json --normalize
git-of-theseus-stack-plot tribler/cohorts.json ipv8/cohorts.json
git-of-theseus-stack-plot tribler/survival.json ipv8/survival.json --normalize
```
|
Another question that piqued my curiosity was how the count of Tribler's open bugs changes over time. The scripts: https://gist.github.com/drew2a/3eec7389359a57737b06c1991bf2c2a3 |
Visualization of the open issues (last 60 days). How to use: an example for the Tribler repo: https://github.com/Tribler/tribler |
Driven by curiosity about how the number of continuous contributors changes over time for Tribler, I started with a visualization of Tribler contributors over the last year, using a 7-day window and 1-day granularity.

The "window" refers to the maximum allowed gap between consecutive commits for them to be considered part of the same activity period. In this case, a 7-day window means that if the gap between two commits is less than or equal to 7 days, they are considered part of a continuous contribution period. "Granularity" refers to the minimum length a contribution period must have. Here, a 1-day granularity means that any period shorter than 1 day is extended to 1 day.

Then I got a visualization of Tribler contributors over the last 5 years, using a 30-day window and 14-day granularity. The same plot, but filtered by contributors who contributed at least two days in total. Here, the "at least two days in total" filter means that only contributors who have made commits on two or more separate days throughout the entire period are included.

More plots:
- Last 10 years, all contributors that contributed at least 2 days, plotted using a 60-day window and 14-day granularity.
- Contributors from all Tribler history that contributed at least 2 days, plotted using a 90-day window and 30-day granularity.
- Contributors from all Tribler history that contributed at least 90 days, plotted using a 90-day window and 30-day granularity.

In the last plot, the filter includes only those contributors who have a total of at least 90 days of contributions throughout the entire history of Tribler. This filter, combined with a 90-day window and 30-day granularity, provides a long-term perspective on contributor engagement. The 90-day window means that consecutive commits within this period are considered continuous contributions, while the 30-day granularity extends shorter contribution periods to 30 days, ensuring that each period reflects a significant amount of activity.

These visualizations provide valuable insights into the dynamics of the Tribler project's contributor base, highlighting both short-term and long-term contribution patterns. By adjusting the window and granularity parameters, as well as the contribution-duration filter, we can observe different aspects of contributor engagement and project activity over time.

Using the obtained data, it's straightforward to plot the number of contributors over time. This graph shows the fluctuation in the number of active contributors at any given time, calculated from their continuous activity periods with the window and granularity settings used in the analysis.

To smooth out the variations and make the plot less jumpy, we can increase the window to a longer duration, such as half a year. By extending the window, we treat a longer period of inactivity as continuous contribution, which smooths the curve. It gives a more averaged view of contributor engagement over time, reducing the impact of short-term fluctuations. This approach is particularly useful for identifying long-term trends in contributor engagement.

The script: https://gist.github.com/drew2a/b05141a13c8d0c85c041714bba44b2d3#file-plot_number_of_contributors-py |
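The window/granularity grouping described above can be sketched as follows (my own reconstruction from the description, not the code in the linked gist; function and parameter names are hypothetical):

```python
from datetime import date, timedelta

def activity_periods(commit_dates, window=7, granularity=1):
    # `window`: maximum gap (days) between consecutive commits that still
    # counts as one activity period.
    # `granularity`: minimum period length (days); shorter periods are extended.
    dates = sorted(set(commit_dates))
    if not dates:
        return []
    periods = []
    start = prev = dates[0]
    for d in dates[1:]:
        if (d - prev).days <= window:
            prev = d  # gap is small enough: same activity period
        else:
            periods.append((start, prev))  # gap too large: close the period
            start = prev = d
    periods.append((start, prev))
    # Extend each period to at least `granularity` days.
    min_len = timedelta(days=granularity)
    return [(s, max(e, s + min_len)) for s, e in periods]

periods = activity_periods(
    [date(2024, 1, 1), date(2024, 1, 5), date(2024, 2, 1)],
    window=7, granularity=1,
)
# Two periods: Jan 1–5 (the 4-day gap fits the window) and Feb 1, extended
# by the 1-day granularity.
```

Counting, for each day on the timeline, how many contributors have a period covering that day then gives the "number of contributors over time" plot.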
These are my last days working with Tribler; then I'm moving to another project outside of academia. I have been job hunting in the Netherlands for about two months, sending around 50 CVs and cover letters. Three of them led to interviews; the others were either ghosted or resulted in rejections. It was quite a challenging time, but thankfully I had a lot of experience changing jobs, so I was prepared for this period. Starting in September, I will be working with Airborn. It's a private company with closed-source projects, so unfortunately I won't be able to share much (if anything) about my new job. I appreciate the freedom that @synctext gave me during my time with Tribler, and its open-source DNA. It is a truly unique project with unique organizational principles that go above and beyond the beaten tracks of both product companies and academia. Despite the freedom, working on this project is not easy, as it requires a developer to learn a new environment and adapt to it without the luxury of having information from the field's pioneers. From the very beginning, I focused on the engineering part of the project rather than the scientific part, as I thought engineers were the scarcer resource for Tribler, one it had lacked in the past. Despite that, I contributed to some scientific work as well, working with @devos50 on a distributed knowledge graph. Another scientific project I worked on involved tackling a long-standing problem with content grouping. Working on content grouping was particularly interesting to me for two reasons. First, I did it solo, with @synctext supervising. Second, at the very beginning I didn't believe it was possible to solve the problem, but after the initial experiments I saw a path forward and followed it until the task was completed. Kudos to @synctext for his intuition.
I'm going to post a more detailed wrap-up dedicated to the project in the issue: In this issue, I'm going to publish the accumulated visualized data regarding Tribler's history that I have been posting here for the last two years. This historical research started as a necessity for me to understand Tribler and its codebase, and then it became driven by my curiosity, which I see as the purest possible scientific motivation. Time will tell how the knowledge we've gained will help the next generation. For now, I'm satisfied with my short scientific journey, even though it wasn't canonical. |
**Migrating My Git Scripts to a New Repository: A Challenge with AI**

I've started gradually moving all the scripts I've shared in this issue into a separate repository: Git Insights. The goal is to make these scripts publicly accessible and easy to use.

**The Challenge:** I want the entire repository, including the README, documentation, and all related files, to be generated by AI. The original scripts were already generated by ChatGPT; I only came up with the ideas. Now I'm using Aider to take this approach even further.

**On the Job Hunt**

I'm currently looking for new opportunities and would greatly appreciate any support. Feel free to connect with me on LinkedIn: https://www.linkedin.com/in/drew2a/. |
Today, I added the second script, for identifying and visualizing branches on GitHub, to my repository. Although the task sounds simple, it doesn't have readily available solutions online and is inherently complex due to the specifics of Git. I spent about an hour refining the script, writing documentation, and making it configurable. This took 33 commits and cost me $0.72 in OpenAI API calls with GPT-4o. If I had done all of this manually, it would have taken me at least a full 8-hour workday, if not more.

Here's the same data, translated into Product Manager speak (using the average base hourly rate for developers in the Netherlands): the savings per day for one developer amount to €136 - €18.72 = €117.28. According to my calculations, with the money saved, we could buy one apartment per year for each developer. If I were running for government, this would be a great program to help solve the housing crisis.

As an image for the post, I will attach a visualization of branches from a popular repository, https://github.com/arvidn/libtorrent:

```shell
python3 calculate_branch_age.py --repo_path ../../arvidn/libtorrent --main_branch master --min_age 100
```

If you want me to make your hidden data visible, hire me. |
**Visualizing the Life Cycle of Issues: A New Script Added to My Experimental Repo**

Today, I uploaded the third script to my experimental repository: a tool that visualizes the number of open issues throughout the entire lifespan of a repository: https://github.com/drew2a/git-insights/tree/main?tab=readme-ov-file#plot_open_issuespy

This approach to coding feels like a modern twist on good old pair programming, where I take on the role of the observer and the AI serves as the driver. It's incredibly convenient! For this post, I've attached a visualization of bugs from my previous project, Tribler. Interestingly, the last major drop in open issues coincides with the end of my generation of developers in the lab. Coincidence?

**One week into my new job search**

Three rejections and seven applications sent. This time, I'm being much more selective about my next employer. Stay connected! |
Short updates. Most of them work-related.