update synthDataGen, provide fix for MSFT codeGen #451

Merged
merged 3 commits into from
Nov 27, 2024
Binary file added docs/Spark/gems/img/synth_7_seq_or_foreign.png
4 changes: 2 additions & 2 deletions docs/Spark/gems/machine-learning/ml-openai.md
@@ -26,8 +26,8 @@ tags:

The OpenAI Gem allows the Prophecy user to interact with the OpenAI API using two different requests:

-1. Compute text embeddings ([link](/docs/Spark/gems/machine-learning/ml-openai.md#compute-text-embeddings)).
-2. Answer a question, where the user has the option to provide context ([link](/docs/Spark/gems/machine-learning/ml-openai.md#answer-a-question-with-a-given-context)).
+1. Compute text embeddings
+2. Answer a question, where the user has the option to provide context

Follow along to learn how to interact with the OpenAI API using Prophecy's easy-to-use interface. For an example set of Pipelines that use these Gems to create a Generative AI Chatbot, see this [guide.](/docs/getting-started/genaichatbot.md)

@@ -15,7 +15,7 @@ Generate synthetic data with this special kind of Source Gem.

Generating mock data is crucial when building data Pipelines to simulate real-world scenarios for testing, validating, and optimizing Pipeline performance before using actual production data. It helps ensure the Pipeline handles various data formats, structures, and edge cases effectively, minimizing potential issues in a live environment.

-A wide range of synthetic data can be created using any column name and an array of data types. For example, generate browser history to track fictitious devices and the details on when that device visits a particular website with a particular click count and frequency.
+A wide range of synthetic data can be created using any column name and an array of data types. For example, generate browser history data as shown below.

![img](../../../img/synth_0_datasample.png)

@@ -56,6 +56,9 @@ What type of data do you need to generate? Specify the data structure using Rand

![img](../../../img/synth_3_properties.png)

+Generate a column using a sequence of integers (left). Generate another column by referencing an existing catalog table (right). Elements of the foreign key are randomly selected from that table.
+![img](../../../img/synth_7_seq_or_foreign.png)
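
For illustration only (this is not part of the change and not the Gem's generated code), the two column types described above can be sketched in plain Python. The helper names `sequence_column` and `foreign_key_column`, the `device_ids` list standing in for a catalog table's key column, and the choice to sample with replacement are all assumptions made for this sketch.

```python
import random


def sequence_column(start, step, count):
    """Return `count` integers beginning at `start`, spaced by `step`."""
    return [start + i * step for i in range(count)]


def foreign_key_column(reference_keys, count, seed=None):
    """Pick `count` values at random (with replacement) from existing keys."""
    rng = random.Random(seed)
    return [rng.choice(reference_keys) for _ in range(count)]


# Hypothetical stand-in for the key column of an existing catalog table.
device_ids = [101, 102, 103]

# Sequence column: 1, 2, 3, ... one value per generated row.
visit_ids = sequence_column(start=1, step=1, count=5)

# Foreign key column: every value is drawn from the referenced keys.
visit_device_ids = foreign_key_column(device_ids, count=5, seed=42)

rows = list(zip(visit_ids, visit_device_ids))
```

Because every foreign key value is drawn from the referenced keys, each generated row can be joined back to the reference table.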

### Infer the Schema

Changes to the columns in the Properties tab are incorporated by inferring the schema in the Schema tab.
Binary file added docs/concepts/img/team_metadata.png
11 changes: 11 additions & 0 deletions docs/concepts/teamuser.md
@@ -23,3 +23,14 @@ Teams represent a group of users who work together.
Teams, User, and Git [Settings](https://app.prophecy.io/metadata/settings) are accessed by clicking the `Settings` icon at the bottom left of the menu bar. The following image shows the page and the available functionality.

![Team Page](./img/team_page.png)
+
+## Team Metadata
+
+Manage the entities within a team from the team's metadata page. Click **(1) Metadata**, then **(2) Teams**, and select the **(3) team of interest**. The page shows all the metadata for that team, including its Info and the Projects, Pipelines, Jobs, etc. that the team owns. Team admins can also manage **(4) Settings** for the team.
+![Team metadata](./img/team_metadata.png)
+
+**[Execution Metrics](/docs/Spark/execution/execution-metrics.md)** - collect metrics and data samples for each execution.
+
+**Code Generation** - enable multi-file code generation to work around code payload size limitations.
+
+**Advanced** - update the artifactid, generative AI settings, etc. for the team's projects.