Implement SparkSession.Catalog #231
Conversation
using System.Collections.Generic;
using System.IO;
Are these `using`s still needed?
The new test uses Path.Combine to build a path. I know in other places the code just builds the path manually, but I think Path.Combine is better (I was going to do another PR converting the manually built strings to Path.Combine).
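As a small illustration of the point (a sketch only; the directory and file name here are just examples, not taken from the PR):

using System;
using System.IO;

class PathCombineExample
{
    static void Main()
    {
        string resourceDir = "Resources";

        // Manual concatenation hard-codes the separator and breaks if the
        // directory already ends with one:
        string manual = resourceDir + "/" + "users.parquet";

        // Path.Combine picks the right separator for the platform and
        // handles trailing separators, so it is the safer default.
        string combined = Path.Combine(resourceDir, "users.parquet");

        Console.WriteLine(manual);
        Console.WriteLine(combined);
    }
}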
Co-Authored-By: Steve Suh <[email protected]>
Co-Authored-By: Steve Suh <[email protected]>
Assert.IsType<DataFrame>(catalog.ListFunctions());
Assert.IsType<DataFrame>(catalog.ListFunctions("default"));

var table = catalog.CreateTable("users", Path.Combine(TestEnvironment.ResourceDirectory, "users.parquet"));
Surround your test and put something like the following at the beginning of CatalogFunctions()
using (var tempDir = new TemporaryDirectory())
{
...
}
Then you should be able to do something like
DataFrame table = catalog.CreateTable("users", Path.Combine(tempDir.Path, "users.parquet"));
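For context, a rough sketch of the shape of that suggestion, assuming the repo's TemporaryDirectory test helper is disposable and exposes a Path property as in the snippet above (GoEddie notes further down that the test ends up reading users.parquet from the resource directory instead, so this is only the suggested pattern, not the final code):

[Fact]
public void TestCatalogFunctions()
{
    Catalog catalog = _spark.Catalog;

    // Keep anything the test writes under a throwaway directory so it is
    // cleaned up automatically when the using block ends.
    using (var tempDir = new TemporaryDirectory())
    {
        DataFrame table = catalog.CreateTable(
            "users",
            Path.Combine(tempDir.Path, "users.parquet"));
        Assert.IsType<DataFrame>(table);
    }
}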
Co-Authored-By: Steve Suh <[email protected]>
Just to clarify, is the guidance to copy the comments from the Spark docs or to describe them in more detail? In some of the feedback here I have been asked to copy from the Spark codebase and in some to add more detail. I don't mind which, but I just need some guidance! "copy from spark": #231 (comment) Example "tableExists":
I have added a return type and can add something like "bool - true if the table exists, false if it doesn't" - the meaning of the return should be obvious? Looking through the codebase, some places are more descriptive and some less:
Let me know and I'll fix them all up :)
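For concreteness, the more descriptive style being discussed could look roughly like this for tableExists (the wording and the method stub are illustrative, not taken from the PR or from Spark):

/// <summary>
/// Checks if the table or view with the specified name exists.
/// </summary>
/// <param name="tableName">Name of the table or view to look up.</param>
/// <returns>bool, true if the table exists and false if it doesn't.</returns>
public bool TableExists(string tableName)
{
    // Implementation elided; only the doc-comment style is in question here.
    throw new System.NotImplementedException();
}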
Thanks for bringing this up @GoEddie. Looks like there was an inconsistency in the review process. What I typically do about comments when new APIs are introduced is to start from the Spark comments and update them as needed (not all of their comments are great either :)). I understand that there are already inconsistencies in comments in our codebase. Since it's hard to go back and fix them all the time, we try to set the bar a little higher on new PRs. If you think review comments are not reasonable, please include me in the discussion so I can resolve them. BTW, we also publish API references based on the comments: https://docs.microsoft.com/en-us/dotnet/api/microsoft.spark.sql.dataframe?view=spark-dotnet. Thanks for your contribution @GoEddie!
Thanks @imback82, I'll make sure to start with the Spark comments and add to them if they need it.
I think that is all the comments dealt with - if you're not happy or want changes, let me know :)
Thanks @GoEddie, I will review this soon.
Great, let me know if you want any changes :)
I have a few minor comments. Great work!
Assert.IsType<DataFrame>(catalog.ListFunctions());
Assert.IsType<DataFrame>(catalog.ListFunctions("default"));

var table = catalog.CreateTable("users", Path.Combine(TestEnvironment.ResourceDirectory, "users.parquet"));
Thanks @imback82 for reviewing - I think that is everything - let me know if there are any other changes :) For this one #231 (comment), the suggestion was from when I was having trouble setting the path to spark-warehouse - this test reads the users.parquet file from the resources directory, so there aren't any resources to clean up here.
@@ -568,7 +571,7 @@ public void TestSignaturesV2_3_X()
     Assert.IsType<Column>(JsonTuple(col, "a"));
     Assert.IsType<Column>(JsonTuple(col, "a", "b"));

-    var options = new Dictionary<string, string>() { { "hello", "world" } };
+    var options = new Dictionary<string, string>() {{"hello", "world"}};
Do you know why this formatting changed? I think the existing formatting is fine (Ctrl + K, D gives me the existing formatting). Could you double check the latest commit?
I think ReSharper got a bit excited :)
LGTM
This is for #226
Implementing SparkSession.Catalog; this includes most of the catalog functions except any deprecated ones and some of the experimental functions.
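As a rough usage sketch of the new surface (assuming the .NET methods mirror Spark's Catalog API, as the test diff above suggests; exact signatures should be checked against the merged code):

using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Catalog;

class CatalogUsageSketch
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().GetOrCreate();
        Catalog catalog = spark.Catalog;

        // Inspect the metastore.
        catalog.ListDatabases().Show();
        catalog.ListTables().Show();
        catalog.ListFunctions("default").Show();

        // Register a parquet file as a table and check that it is visible.
        catalog.CreateTable("users", "path/to/users.parquet");
        System.Console.WriteLine(catalog.TableExists("users"));

        spark.Stop();
    }
}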