Exposed Column.Apply() API #323

Merged: 8 commits into dotnet:master on Nov 3, 2019

Conversation

Niharikadutta
Collaborator

We are excited to review your PR.

So we can do the best job, please check:

  • There's a descriptive title that will make sense to other developers some time from now.
  • There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • You have included any necessary tests in the same PR.

This PR addresses feature request #314 and fixes issue #201.

/// </summary>
/// <param name="colObject">Column object to apply</param>
/// <returns>Column object</returns>
public Column Apply(object colObject)
Member

Is this only going to be exposed for the Column type, or for any type, like how Scala has it defined?

  /**
   * Extracts a value or values from a complex type.
   * The following types of extraction are supported:
   * <ul>
   * <li>Given an Array, an integer ordinal can be used to retrieve a single value.</li>
   * <li>Given a Map, a key of the correct type can be used to retrieve an individual value.</li>
   * <li>Given a Struct, a string fieldName can be used to extract that field.</li>
   * <li>Given an Array of Structs, a string fieldName can be used to extract field
   *    of every struct in that array, and return an Array of fields.</li>
   * </ul>
   * @group expr_ops
   * @since 1.4.0
   */
  def apply(extraction: Any): Column = withExpr {
    UnresolvedExtractValue(expr, lit(extraction).expr)
  }

Collaborator Author

@Niharikadutta Niharikadutta Oct 30, 2019

Any, which is why it is an object. I will change the parameter name to avoid confusion.
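
For readers following this thread, here is a minimal sketch of how the four extraction cases from the Scala doc comment might look through the C# `Column.Apply(object)` method this PR exposes. It assumes the Microsoft.Spark package and a working Spark runtime; the application name, column names, and sample data are made up for illustration.

```csharp
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

class ColumnApplySketch
{
    static void Main()
    {
        // Assumption: launched through spark-submit with the Microsoft.Spark jar available.
        SparkSession spark = SparkSession
            .Builder()
            .AppName("column-apply-sketch") // hypothetical application name
            .GetOrCreate();

        // One row holding an array, a map, a struct, and an array of structs.
        DataFrame df = spark.Sql(
            "SELECT array(10, 20, 30) AS arr, " +
            "map('a', 1, 'b', 2) AS m, " +
            "named_struct('x', 1, 'y', 'two') AS s, " +
            "array(named_struct('x', 1), named_struct('x', 2)) AS structs");

        df.Select(
            Col("arr").Apply(0),       // Array + integer ordinal: a single element
            Col("m").Apply("a"),       // Map + key: the value stored under that key
            Col("s").Apply("x"),       // Struct + field name: that field
            Col("structs").Apply("x")  // Array of Structs + field name: an array of that field
        ).Show();

        spark.Stop();
    }
}
```

Because the parameter is typed `object`, the same method accepts an `int` ordinal, a map key, or a `string` field name, mirroring `Any` on the Scala side.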

Member

The summary description should be updated to be similar to Scala's.

/// <returns>Column object</returns>
public Column Apply(object colObject)
{
    return Apply("apply", colObject);
Member

Can you add a unit test?

/// </summary>
/// <param name="colObject">Column object to apply</param>
/// <returns>Column object</returns>
public Column Apply(object colObject)
Contributor

If we're exposing this public Apply() method, then I think we should rename the private Apply() method, which has totally different semantics.

Collaborator Author

Any suggestions on what to call the private Apply function?

Member

How about ApplyMethod?
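
To make the naming concern concrete, here is a toy model of the two roles. It is not the Microsoft.Spark implementation: `FakeJvmColumn` is a hypothetical stand-in that records method names instead of calling into the JVM, and `Plus` is included only as a second caller of the private helper. The sketch assumes the private helper is renamed to `ApplyMethod`, as suggested above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the JVM-side Column: records each call instead of invoking Spark.
class FakeJvmColumn
{
    public List<string> Calls { get; } = new List<string>();

    public FakeJvmColumn Invoke(string methodName, object arg)
    {
        Calls.Add($"{methodName}({arg})");
        return this;
    }
}

class Column
{
    private readonly FakeJvmColumn _jvm;

    public Column(FakeJvmColumn jvm) => _jvm = jvm;

    // Public API: extracts a value from a complex type (the Scala-side "apply").
    public Column Apply(object extraction) => ApplyMethod("apply", extraction);

    // Another public operator that goes through the same private helper.
    public Column Plus(object rhs) => ApplyMethod("plus", rhs);

    // Private helper: forwards a named method call to the JVM-side column.
    // Naming it ApplyMethod keeps it distinct from the public Apply above.
    private Column ApplyMethod(string name, object other) =>
        new Column(_jvm.Invoke(name, other));
}

class Program
{
    static void Main()
    {
        var jvm = new FakeJvmColumn();
        new Column(jvm).Apply(0).Plus(1);
        Console.WriteLine(string.Join(", ", jvm.Calls)); // prints "apply(0), plus(1)"
    }
}
```

With distinct names, the public method describes what the caller gets (an extraction), while the private one describes the mechanism (invoking a JVM method by name).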

@suhsteve
Member

@imback82 it looks like Python's column.getItem() eventually calls the "apply" method. I'm wondering if we should mimic that behavior with C#'s Column.GetItem().

@imback82
Contributor

Actually, this is a bug in Spark. I am fixing it: https://issues.apache.org/jira/browse/SPARK-29664


/// <summary>
/// Extracts a value or values from a complex type.
///The following types of extraction are supported:
Member

@suhsteve suhsteve Nov 1, 2019

Suggested change
- ///The following types of extraction are supported:
+ /// The following types of extraction are supported:

Same for the next 5 lines.


col = col1.Name("alias");
//col.Explain(true); -> Do I want to do this on col1 or col2? Or remove this line.
Member

Suggested change
- //col.Explain(true); -> Do I want to do this on col1 or col2? Or remove this line.
+ col1.Explain(true);

/// </summary>
/// <param name="Obj">object to apply</param>
/// <returns>Column object</returns>
public Column Apply(object Obj)
Member

Is there anything more meaningful than Obj? The same comment applies to the param description.

Collaborator Author

I'll change the parameter name and description to something more meaningful.

/// </summary>
/// <param name="mappingObject">Object to use to map the values on returning Column object </param>
/// <returns>Column object</returns>
public Column Apply(object mappingObject)
Member

Let's use extraction. Update the description as well.

/// of every struct in that array, and return an Array of fields.
///
/// </summary>
/// <param name="mappingObject">Object to use to map the values on returning Column object </param>
Member

Suggested change
- /// <param name="mappingObject">Object to use to map the values on returning Column object </param>
+ /// <param name="mappingObject">Object to use to map the values on returning Column object</param>

Member

@suhsteve suhsteve left a comment

LGTM

Contributor

@imback82 imback82 left a comment

LGTM, merging to master.

@imback82 imback82 merged commit 30d6c37 into dotnet:master Nov 3, 2019
@imback82
Contributor

imback82 commented Nov 3, 2019

Fixes #201

@imback82
Contributor

imback82 commented Nov 3, 2019

Spark side change is also merged: apache/spark#26351

@Niharikadutta Niharikadutta deleted the nidutta/ExposeColumnApplyAPI branch November 4, 2019 22:41