The configuration of data searching and downloading directories is not linked #142

bhaddow · 2024-01-08T22:00:55Z

If you set DATA_PATH to something other than the default, you can still download data successfully, but the downloaded data does not show up on the data listing page. This is because the download directory is hard-coded, and it is also relative to wherever you run the server from.

Also, DATA_PATH is actually a glob. Setting it to a directory will result in no data being found, but this is hard to debug.
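
To illustrate why a plain directory finds nothing, here is a minimal sketch. It assumes DATA_PATH ends up being passed to `glob.glob` (I have not checked the exact call), and the `corpora` directory and `matches` helper are made up for the example. A path without wildcards matches only itself, so the files inside it are never listed:

```python
import glob
import os
import tempfile


def matches(data_path):
    """Return what glob.glob finds for a DATA_PATH-style value (hypothetical helper)."""
    return sorted(glob.glob(data_path, recursive=True))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical data directory holding one dataset in two languages.
    data_dir = os.path.join(tmp, "corpora")
    os.makedirs(data_dir)
    for name in ("corpus.en.gz", "corpus.de.gz"):
        open(os.path.join(data_dir, name), "wb").close()

    # A bare directory matches only the directory itself, not its contents.
    as_directory = matches(data_dir)
    # A pattern with wildcards matches the files inside it.
    as_pattern = matches(os.path.join(data_dir, "*.*.gz"))

print([os.path.basename(p) for p in as_directory])
print([os.path.basename(p) for p in as_pattern])
```

No error is raised in the directory case, which is probably why this is hard to debug.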

jelmervdl · 2024-01-12T19:59:17Z

It probably shouldn't be a glob. I thought that flexibility would come in useful, but it just makes things complicated.

The pattern of the glob is not even free to choose. datasets.py specifically looks for files matching $NAME.$LANG.gz, so there have to be at least two dots in the filename for it to not cause issues:

OpusCleaner/opuscleaner/datasets.py

Lines 26 to 31 in 8d5c4a2

    datasets = [
        (name, list(files))
        for name, files in groupby(
            sorted(files, key=lambda entry: str(entry)),
            key=lambda entry: str(entry.relative_to(root)).rsplit('.', 2)[0])
    ]
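
A small standalone sketch of that grouping, to show what goes wrong with fewer than two dots. The `group_datasets` wrapper, the `root` value and the file names are invented for the example; only the `groupby`/`rsplit` expression is taken from the snippet above:

```python
from itertools import groupby
from pathlib import Path


def group_datasets(files, root):
    """Group files by name with the last two dot-separated parts stripped."""
    return [
        (name, [str(f.relative_to(root)) for f in group])
        for name, group in groupby(
            sorted(files, key=lambda entry: str(entry)),
            key=lambda entry: str(entry.relative_to(root)).rsplit('.', 2)[0])
    ]


root = Path("data/train-parts")  # hypothetical root
files = [
    root / "corpus.en.gz",
    root / "corpus.de.gz",
    root / "corpus.gz",      # only one dot: no language part
    root / "other.fr.gz",
]

groups = group_datasets(files, root)
for name, members in groups:
    print(name, members)
```

Here `corpus.gz` ends up in the same group as `corpus.en.gz` and `corpus.de.gz`, because `rsplit('.', 2)[0]` gives `corpus` for all three. Nothing downstream can tell that it has no language suffix.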

Lol this was a todo all along:

OpusCleaner/opuscleaner/config.py

Lines 17 to 19 in 8d5c4a2

    # TODO: Derive this from DATA_PATH. The `train-parts` is a mtdata compatibility
    # thing. I'm now used to also have a data/clean directory there, so keeping it.
    DOWNLOAD_PATH = 'data/train-parts'
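
One possible way to do what the TODO asks, as a rough sketch and not a patch: take the leading part of the DATA_PATH pattern that contains no wildcards and download there. The function name `download_dir_from_pattern` and the example patterns are hypothetical, and this does nothing about the path being relative to the working directory:

```python
from pathlib import Path

# Characters that make a path component a glob pattern rather than a literal.
GLOB_CHARS = set("*?[")


def download_dir_from_pattern(pattern: str) -> Path:
    """Return the leading wildcard-free directory of a glob pattern."""
    literal_parts = []
    for part in Path(pattern).parts:
        if GLOB_CHARS & set(part):
            break  # first component with a wildcard ends the literal prefix
        literal_parts.append(part)
    # With no literal prefix at all, Path() falls back to the current directory.
    return Path(*literal_parts)


print(download_dir_from_pattern("data/train-parts/*.*.gz"))
print(download_dir_from_pattern("/srv/corpora/**/*.*.gz"))
print(download_dir_from_pattern("*.gz"))
```

With something like this, files downloaded into the derived directory would at least sit under the same tree the pattern searches. Whether they actually match still depends on the wildcard part, which is another argument for not making DATA_PATH a glob at all.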

jelmervdl added the "bug" (Something isn't working) label on Jan 12, 2024
