Ballista integration tests are failing #623

Closed
andygrove opened this issue Jun 26, 2021 · 6 comments · Fixed by #629
Labels
bug Something isn't working

Comments

@andygrove
Member

Describe the bug
When running ./dev/integration-tests.sh I see:

Running benchmarks with the following options: BallistaBenchmarkOpt { query: 1, debug: true, iterations: 1, batch_size: 8192, path: "/data", file_format: "tbl", mem_table: false, partitions: 8, host: Some("ballista-scheduler"), port: Some(50050) }
Running benchmark with query 1:
 select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
        l_shipdate <= date '1998-09-02'
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;
[2021-06-26T13:53:07Z INFO  ballista::context] Connecting to Ballista scheduler at http://ballista-scheduler:50050
Error: Plan("Execution(\"Status { code: Internal, message: \\\"Could not parse logical plan protobuf: DataFusion error: Plan(\\\\\\\"No field named 'lineitem.l_returnflag'\\\\\\\")\\\", metadata: MetadataMap { headers: {\\\"content-type\\\": \\\"application/grpc\\\", \\\"date\\\": \\\"Sat, 26 Jun 2021 13:53:07 GMT\\\"} } }\")")

To Reproduce

./dev/integration-tests.sh

Expected behavior
Queries should run without error.

Additional context
None

@andygrove added the bug and ballista labels Jun 26, 2021
@andygrove
Member Author

After improving the error message, this seems to be related to the recent support for qualified field names.

Error: Plan("Execution(\"Status { code: Internal, message: \\\"Could not parse logical plan protobuf: DataFusion error: Plan(\\\\\\\"No field named 'lineitem.l_returnflag'. Valid fields are l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate.\\\\\\\")\\\", metadata: MetadataMap { headers: {\\\"content-type\\\": \\\"application/grpc\\\", \\\"date\\\": \\\"Sat, 26 Jun 2021 14:40:30 GMT\\\"} } }\")")
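The improved message suggests the plan refers to the column by its qualified name while the schema it is resolved against only holds unqualified names. A minimal sketch of that mismatch follows; `Field`, `index_of`, and `scan_schema` are hypothetical names made up for illustration and are not DataFusion's actual types or API.

```rust
/// Hypothetical schema field that may carry a table qualifier.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub qualifier: Option<String>,
    pub name: String,
}

impl Field {
    /// Returns "table.column" when a qualifier is present, else "column".
    pub fn qualified_name(&self) -> String {
        match &self.qualifier {
            Some(q) => format!("{}.{}", q, self.name),
            None => self.name.clone(),
        }
    }
}

/// Looks up a field by its full name and lists the valid names on failure.
pub fn index_of(fields: &[Field], wanted: &str) -> Result<usize, String> {
    fields
        .iter()
        .position(|f| f.qualified_name() == wanted)
        .ok_or_else(|| {
            let valid: Vec<String> = fields.iter().map(|f| f.qualified_name()).collect();
            format!(
                "No field named '{}'. Valid fields are {}.",
                wanted,
                valid.join(", ")
            )
        })
}

/// Builds a two-column schema (assumed subset of lineitem), optionally qualified.
pub fn scan_schema(qualifier: Option<&str>) -> Vec<Field> {
    ["l_returnflag", "l_linestatus"]
        .iter()
        .map(|n| Field {
            qualifier: qualifier.map(str::to_string),
            name: n.to_string(),
        })
        .collect()
}
```

With `scan_schema(None)`, looking up `lineitem.l_returnflag` fails with a message of the same shape as the one above; with `scan_schema(Some("lineitem"))` the same lookup succeeds.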

@houqp
Member

houqp commented Jun 26, 2021

I am taking a look at this too. Sorry that I forgot to run the integration tests on my local machine. Should we add an integration test run to GitHub Actions as well?

@andygrove
Member Author

It's my bad for not getting these tests running in CI. We do have an issue open for that: apache/datafusion-ballista#24

@houqp
Member

houqp commented Jun 26, 2021

I can take a stab at adding the integration tests to CI after this is fixed if no one gets to it :)

After some initial investigation, I think this bug might have something to do with us not loading the table_name attribute in the LogicalPlanType::CsvScan(scan) => { match branch of logical_plan/from_proto.rs.
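To illustrate the suspicion, here is a rough sketch of a deserializer that drops the table name versus one that carries it through. `CsvScanProto`, `TableScan`, and both functions are hypothetical stand-ins invented for this example; they are not the actual Ballista protobuf messages or the code in from_proto.rs.

```rust
/// Hypothetical stand-in for the serialized CSV scan node.
pub struct CsvScanProto {
    pub table_name: String,
    pub path: String,
    pub columns: Vec<String>,
}

/// Hypothetical stand-in for the scan rebuilt on the scheduler side.
#[derive(Debug, PartialEq)]
pub struct TableScan {
    pub table_name: Option<String>,
    pub path: String,
    pub field_names: Vec<String>,
}

/// Prefixes column names with the table name when one is known.
fn field_names(table_name: Option<&str>, columns: &[String]) -> Vec<String> {
    columns
        .iter()
        .map(|c| match table_name {
            Some(t) => format!("{}.{}", t, c),
            None => c.clone(),
        })
        .collect()
}

/// Suspected behaviour: table_name is never read, so fields stay unqualified.
pub fn scan_from_proto_lossy(scan: &CsvScanProto) -> TableScan {
    TableScan {
        table_name: None,
        path: scan.path.clone(),
        field_names: field_names(None, &scan.columns),
    }
}

/// Possible fix: carry table_name through so fields are qualified again.
pub fn scan_from_proto(scan: &CsvScanProto) -> TableScan {
    TableScan {
        table_name: Some(scan.table_name.clone()),
        path: scan.path.clone(),
        field_names: field_names(Some(scan.table_name.as_str()), &scan.columns),
    }
}
```

Under these assumptions, a later plan node that refers to `lineitem.l_returnflag` can only be resolved against the output of the second function.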

@houqp
Member

houqp commented Jun 26, 2021

With #629, I am able to get the integration tests to pass for queries 1, 3, 5, and 6. I am now looking at a different error with query 7:

Running benchmark with query 7:
 select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
from
    (
        select
            n1.n_name as supp_nation,
            n2.n_name as cust_nation,
            extract(year from l_shipdate) as l_year,
            l_extendedprice * (1 - l_discount) as volume
        from
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2
        where
                s_suppkey = l_suppkey
          and o_orderkey = l_orderkey
          and c_custkey = o_custkey
          and s_nationkey = n1.n_nationkey
          and c_nationkey = n2.n_nationkey
          and (
                (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
                or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
            )
          and l_shipdate between date '1995-01-01' and date '1996-12-31'
    ) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year;
[2021-06-26T20:16:57Z INFO  ballista::context] Connecting to Ballista scheduler at http://ballista-scheduler:50050
Error: Plan("Execution(\"General(\\\"logical_plan::to_proto() unsupported scalar function DatePart\\\")\")")
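This error appears to come from the `extract(year from l_shipdate)` expression in query 7 being planned as a DatePart function that the serializer has no mapping for. The sketch below shows the general shape of such a gap; both enums and the `to_proto` function are hypothetical illustrations, not the real Ballista definitions.

```rust
/// Hypothetical subset of the engine's scalar functions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScalarFunction {
    Sqrt,
    Lower,
    DatePart,
}

/// Hypothetical protobuf-side enum that has not caught up yet.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProtoScalarFunction {
    Sqrt,
    Lower,
}

/// Maps an engine function to its wire form, failing on unmapped variants.
pub fn to_proto(f: ScalarFunction) -> Result<ProtoScalarFunction, String> {
    match f {
        ScalarFunction::Sqrt => Ok(ProtoScalarFunction::Sqrt),
        ScalarFunction::Lower => Ok(ProtoScalarFunction::Lower),
        // Any variant without a wire mapping falls through to an error.
        other => Err(format!(
            "logical_plan::to_proto() unsupported scalar function {:?}",
            other
        )),
    }
}
```

In this sketch the fix would be to add a `DatePart` variant on the protobuf side and a matching arm in both directions of the conversion.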

@houqp
Member

houqp commented Jun 26, 2021

OK, I also disabled queries 7, 8, and 9 in the Ballista integration tests in #629. They were enabled in #55, but it turns out that, beyond adding qualified column support, there is more ser/de work needed (e.g. enabling cross join serde). Most of those errors look straightforward to fix. I will send the fixes as separate PRs.


2 participants