Migrate dataset reader code from (scala) DeepQA Experiments to DeepQA #328
Comments
Yes, as you can see in the README, the data processing code is currently in the scala library. That is for historical reasons. When we write new data processing code, it will almost certainly be in the python library. However, it's not very high priority for us to migrate the data processing code, because we already have all of the data processed, we know how to use the scala library easily enough, and we have a lot of other things on our plate.

This is a great place where contributions would be much appreciated. For anyone who wants to contribute to this, it's as simple as taking a (scala)

If you just want to use the DeepQA Experiments library to get the data for you, the easiest way to do so is probably like this (steps shown for SQuAD, but are similar for other datasets):
And you can repeat that last step for the dev set, or for any other dataset you want to process.
@matt-gardner I'll try these steps. Thanks.
@matt-gardner when I run sbt console, it complains: [warn] :::::::::::::::::::::::::::::::::::::::::::::: So is something missing? Thanks.
Oh, yeah, sorry about that. I forgot about that dependency. I just removed it, so it should work now. Can you update your repo and try again?
@matt-gardner after updating, the dependency problem is solved, but there is another issue:
I'm not familiar with these errors. Does this need Linux or OS X? Is Windows not supported? Thanks.
Yeah, I have no idea what's going on there. I think the only thing I had to install to get the protobuf stuff to work was this:

This doesn't look like it's a windows issue to me, but even if we figure this out, I think the rest of the code has various places where

Another thing to consider is that at this point, it probably is less work to translate the ~50 lines of scala code in the dataset reader into python than it is to figure out what's going on here.
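For anyone who wants to try the python route, here is a rough sketch of what such a reader could look like for SQuAD. This is not a translation of the actual scala reader: the function names and the output tuple layout are made up for illustration, and the only thing taken as given is the public SQuAD v1.1 JSON layout (data, paragraphs, context, qas, answers, answer_start).

```python
import json
from typing import Dict, Iterator, List, Tuple

# Hypothetical output layout, chosen for illustration only:
# (question_id, question, passage, answer_start, answer_end)
Instance = Tuple[str, str, str, int, int]


def read_squad_instances(squad: Dict) -> Iterator[Instance]:
    """Flatten a parsed SQuAD v1.1 JSON object into one tuple per answer."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            passage = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    # answer_start is a character offset into the passage;
                    # the end offset is exclusive.
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    yield qa["id"], qa["question"], passage, start, end


def load_squad_file(path: str) -> List[Instance]:
    """Read a SQuAD JSON file from disk and return all instances."""
    with open(path, encoding="utf-8") as squad_file:
        return list(read_squad_instances(json.load(squad_file)))
```

Whatever format the rest of the pipeline expects would still have to be written out from these tuples, so treat this as a starting point only.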
@matt-gardner That's very kind of you. I use Python 3.5 in my environment, which results in this error.
The util library is a dependency in the DeepQA Experiments library, and it's grabbed automatically when you run
@matt-gardner I can run it now. Thanks for all the help.
First of all, many thanks for this great project, which is just what I would like to work on; I'll keep watching, using, and even contributing to it.
However, when I tried to run some pipelines from scratch, I found that the data preprocessing steps are in another project, https://github.com/allenai/deep_qa_experiments, whose code is written in Scala.
I think having the preprocessing steps in a separate project makes things complicated for someone who wants to get started quickly.