Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create 2023-12-13-GLUE-lab.md #81

Merged
merged 1 commit into from
Dec 13, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
114 changes: 114 additions & 0 deletions 2023-12-13-GLUE-lab.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
---
title: How the GLUE Lab is bringing the potential of HTC to track the movement of cattle and land use change

author: Sarah Matysiak

publish_on:
- <htcondor
- path
- osg
- chtc

type: user
canonical_url: https://chtc.cs.wisc.edu/GLUE-lab.html


image:
path: https://raw.githubusercontent.com/CHTC/Articles/main/images/mattchristie.png
alt: Text Description of image

excerpt: Researching land use change in the cattle sector is just one of several large projects where the GLUE Lab is working to apply HTC.

---


<figure class="figure float-end" style="margin-center: 1em">
<img src='https://raw.githubusercontent.com/CHTC/Articles/main/images/cattle.png' height="300" width="2400" class="figure-img img-fluid rounded" alt="Cattle grazing grass on the Cerrado in rural Mato Grosso, Brazil.">
<figcaption class="figure-caption">Cattle grazing grass on the Cerrado in rural Mato Grosso, Brazil.
<br/></figcaption>
</figure>




It was during a [Data Science Research Bazaar](https://datascience.wisc.edu/2023/10/12/share-your-work-at-the-2024-research-bazaar/) presentation led by OSG Research Facilitation
Lead Christina Koch in early 2023 when [Matthew Christie](https://gibbs-lab.wisc.edu/matt-christie.html), the Technical Lead of the [Global Land Use and Environment Lab](https://gibbs-lab.wisc.edu/)
based in Madison, Wisconsin, says the GLUE Lab became more familiar with the [Center for High Throughput Computing (CHTC)](https://chtc.github.io/). “That planted the seed for what
the center [CHTC] could offer us,” Christie says.

<figure class="figure float-end" style="margin-center: 1em">
<img src='https://raw.githubusercontent.com/CHTC/Articles/main/images/mattchristie.png' height="200" width="200" class="figure-img img-fluid rounded" alt="GLUE Lab technical lead Matthew Christie.">
<figcaption class="figure-caption">GLUE Lab technical lead Matthew Christie.
<br/></figcaption>
</figure>





The GLUE Lab studies how land across the world is being used for agriculture and the systems responsible for land use change. Christie — who researches land use in Brazil with a
focus on how the Amazon and Cerrado biomes are changing as natural vegetation recedes — takes data describing the cattle supply chain in Brazil and integrates it into a single
database the GLUE Lab can use for research. With this data, the lab also aims to inform policy decisions by the Brazilian government and international companies.



In the Amazon, Christie says, one of the main systems causing land use change is in the cattle sector, or the production of cattle. “One of the motivating facts of our research
is that 80% of forest cleared in the Amazon is cleared in order to raise cattle. And so we're interested in understanding the cattle supply chain, how it operates, and what it
looks like.” The lab gets its data from the Rural Environmental Registry (CAR), which is a public property boundary registry data from Brazil, and the Guide to Animal Transport
(GTA), which records animal movement and sales in Brazil.



The possibilities of utilizing HTC for their research intrigued Christie, who had some awareness of it from the research bazaar and had even started refactoring some of the lab’s
data pipeline before attending, but he wanted to learn more besides what he gained from watching introductory tutorials. Christie was accepted and attended the [OSG School](https://osg-htc.org/user-school-2023/)
in the summer of 2023. He and other lab members believed their work could benefit from the school training with [HTCondor, the workload management application developed by the CHTC for HTC, and the associated big data sets](https://htcondor.org/)
with a large number of jobs.



Upon realizing the lab’s work could greatly benefit from the OSG School, Christie used a “test case” project that resembled a standard research project to model a task with many
independent trials, finding how — for the first time — high throughput computing (HTC) could prove itself resourceful for GLUE Lab research. The specific project Christie worked
on during the school using HTC was to compute simulated journeys of cows through properties in Brazil's cattle supply chain. By the end of the week-long school, Christie says
using HTC scaled up the modeling project by a factor of 10. In this sense, HTC is the “grease that makes our research run more smoothly.”



Since attending the school, witnessing the test case’s success with HTC, and discovering ways its other research projects could benefit, the GLUE Lab has begun shifting to applying
HTC. However, this process requires pipeline changes lab members are currently working through. “We have been in the process of working through some of our big projects that we
think really could benefit from these resources, but that in itself has a cost. Currently, we’re still in the process of writing or refactoring our pipelines to use HTC,” Christie
elaborates.



For a current project, Christie mentions he and other GLUE Lab members are looking at how to adapt their code to HTC without having to rewrite all of it. With the parallelism that
HTC offers compared to the single computing environment the lab used before to run its data pipeline, each job now has its own environment. But it’s complex “leveraging the
parallelism in our database build pipeline. Working on that is an exercise, but with handling data, there are many dependencies, and you have to figure out how to model them.”
Christie says lab members are working on adjusting the workflow to ensure each job has the data it needs before it can run. While this can sometimes be straightforward,
“sometimes a step in the pipeline has special inputs that are unique to it. With many steps in the pipeline, properly tracking and preparing all this data has been the main source
of work to get the pipeline to run fully using HTC.”



For now, Christie says cutting down the two-day run time of their database build pipeline to just a matter of hours with HTC “would be a wonderful improvement that would accelerate
deployment and testing of this database. It would let us introduce new features and catch bugs faster.”


<figure class="figure float-end" style="margin-center: 1em">
<img src='https://raw.githubusercontent.com/CHTC/Articles/main/images/deforestation.png' height="300" width="2400" class="figure-img img-fluid rounded" alt="Smoke rising over recently burned pastures in Alto Boa Vista, Brazil.">
<figcaption class="figure-caption">Smoke rising over recently burned pastures in Alto Boa Vista, Brazil.
<br/></figcaption>
</figure>




Christie recognizes the strength of the CHTC comes from not only its limitless computation power but also the humans who are running it behind the screen and that it’s free for
researchers at UW–Madison, distinguishing it from other platforms and drastically lowering the entry barrier for researchers who want to scale up their research projects —
“Instead of waiting months or years to receive funding for cloud resources, they can [request an account](https://uwmadison.co1.qualtrics.com/jfe/form/SV_8f6nTgaaVhefdmS) and get
started in a matter of weeks,” Christie says.



Christie values the unique opportunity to attend office hours and meet with facilitators, which makes his experience special. “I would definitely recommend that people look at this
invaluable resource that we have on campus. Whether your work is with high throughput or high performance computing, there are offerings for both that researchers should consider,"
Christie says.