Add huggingface extension #261
@Xuanwo this is as far as I got today. I'm at the point where I now need to figure out how to map huggingface to object store semantics. I tried a quick create table statement and ended up with this error. The path looks okay to me, but I'm not that familiar with huggingface and didn't get to look into this much. I'll pick this back up tomorrow, but if you have any insight it would be very helpful.
src/extensions/huggingface.rs
Outdated
    hf_builder = hf_builder.root(root);
};
if let Some(token) = &huggingface_config.token {
    hf_builder = hf_builder.repo_id(token);
I'm guessing this is wrong.
Hi, given the error output in your posted image, I assume we are trying to access an incorrect path. OpenDAL manages all paths internally, so we only need to provide the path relative to the repository root instead of trying to build the URL. I built a real example with the repo you are using:

    use opendal::services::Huggingface;
    use opendal::Operator;
    use opendal::Result;

    #[tokio::main]
    async fn main() -> Result<()> {
        // Create Huggingface backend builder
        let builder = Huggingface::default()
            // set the type of Huggingface repository
            .repo_type("dataset")
            // set the id of Huggingface repository
            .repo_id("HuggingFaceTB/finemath")
            // set the revision of Huggingface repository
            .revision("main")
            // set the root for Huggingface, all operations will happen under this root
            .root("/");

        let op: Operator = Operator::new(builder)?.finish();

        // List the entries at the repository root
        let entries = op.list("/").await?;
        println!("{:?}", entries.iter().map(|v| v.path()).collect::<Vec<_>>());

        // Stat a file using its path relative to the repository root
        let meta = op
            .stat("finemath-3plus/train-00000-of-00128.parquet")
            .await?;
        println!("{:?}", meta);

        Ok(())
    }

The output will be:

    ["assets/", "finemath-3plus/", "finemath-4plus/", "infiwebmath-3plus/", "infiwebmath-4plus/", ".gitattributes", "README.md"]
    Metadata { mode: FILE, is_current: None, is_deleted: false, cache_control: None, content_disposition: None, content_length: Some(507607173), content_md5: None, content_range: None, content_type: Some("application/json; charset=utf-8"), content_encoding: None, etag: Some("W/\"23e-Dio8lpah4iHFyrzOC8sgQMZGg8E\""), last_modified: Some(2024-12-19T09:49:58Z), version: None, user_metadata: None }

I hope this example effectively demonstrates how to properly configure the huggingface service here.
src/extensions/huggingface.rs
Outdated
// I'm not that familiar with Huggingface so I'm not sure what permutations of config
// values are supposed to work.

let mut base_url = String::from("https://huggingface.co/");
I'm guessing we can register the URL as hf://datasets/<repo_id>/ and then access a file as hf://datasets/HuggingFaceTB/finemath/finemath-3plus/train-00000-of-00128.parquet.
There may be some tricky parts in the URL handling inside datafusion and object_store.
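To make the suggested mapping concrete, here is a minimal std-only sketch of how an hf:// URL could be split into the values the OpenDAL builder takes (repo type, repo id) plus the path relative to the repository root. The names `HfLocation` and `parse_hf_url` are hypothetical and not part of dft, OpenDAL, or datafusion. The sketch also assumes a two-segment repo id (`<org>/<name>`) and only handles the `datasets` prefix shown above.

```rust
/// Pieces of an `hf://` URL, split the way the builder example expects them.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
struct HfLocation {
    repo_type: String, // e.g. "dataset", as passed to `repo_type(...)`
    repo_id: String,   // e.g. "HuggingFaceTB/finemath"
    path: String,      // path relative to the repository root
}

/// Split `hf://datasets/<org>/<name>/<path>` into its parts.
/// Assumption: the repo id always has exactly two segments, and only the
/// `datasets` prefix is supported. Anything else returns `None`.
fn parse_hf_url(url: &str) -> Option<HfLocation> {
    let rest = url.strip_prefix("hf://")?;
    // At most four pieces: prefix, org, name, and the remaining path.
    let mut parts = rest.splitn(4, '/');
    let repo_type = match parts.next()? {
        "datasets" => "dataset",
        _ => return None,
    };
    let org = parts.next().filter(|s| !s.is_empty())?;
    let name = parts.next().filter(|s| !s.is_empty())?;
    // A URL that stops at the repo id refers to the repository root.
    let path = parts.next().unwrap_or("");
    Some(HfLocation {
        repo_type: repo_type.to_string(),
        repo_id: format!("{org}/{name}"),
        path: path.to_string(),
    })
}
```

Under these assumptions, the parquet URL above would yield repo type `dataset`, repo id `HuggingFaceTB/finemath`, and the relative path `finemath-3plus/train-00000-of-00128.parquet`, which is the form passed to `op.stat(...)` in the earlier example.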
@Xuanwo thanks very much for the feedback, and apologies for the delay getting back to you. I'm currently on vacation and not online as much. For your information, I ended up creating a separate repo to test out opendal with datafusion, to get a minimal working example independent of the context of dft. Once I have it working there I'll finish this branch.
Hi, we have a great example for this: https://github.com/apache/opendal/blob/main/integrations/object_store/examples/datafusion.rs
@Xuanwo got it working :) thanks for your help
Nice!