Commit 45d97c1

Added all files for profiling and loadgen.
Signed-off-by: L Lakshmanan <[email protected]>

Added yamls to loadgen.

Signed-off-by: L Lakshmanan <[email protected]>

Edited invocations.csv.

Signed-off-by: Lakshman <Lakshman@localhost>
Lakshman authored and Lakshman committed Aug 17, 2024
1 parent 135a667 commit 45d97c1
Showing 486 changed files with 104,430 additions and 6 deletions.
30 changes: 28 additions & 2 deletions tools/invoker/client.go
@@ -133,6 +133,30 @@ func runExperiment(endpoints []*endpoint.Endpoint, runDuration int, targetRPS fl

    Start(TimeseriesDBAddr, endpoints, workflowIDs)

    // Warm-up pass to find the right RPS: invoke every endpoint once,
    // record the maximum observed latency, then reset the measurement
    // slices before the actual experiment.
    for _, ep := range endpoints {
        if ep.Eventing {
            invokeEventingFunction(ep)
        } else {
            invokeServingFunction(ep)
        }
    }
    latSlice.Lock()
    var maxLatency int64 // zero value; holds the largest warm-up latency
    for _, value := range latSlice.slice {
        if value > maxLatency {
            maxLatency = value
        }
    }
    latSlice.slice = []int64{}
    latSlice.Unlock()
    profSlice.Lock()
    profSlice.slice = []int64{}
    profSlice.Unlock()

    // ACTUAL EXPERIMENT

    timeout := time.After(time.Duration(runDuration) * time.Second)
    d := time.Duration(1000000/targetRPS) * time.Microsecond
    if d <= 0 {
@@ -196,11 +220,14 @@ func SayHello(address, workflowID string) {
log.Warnf("Failed to invoke %v, err=%v", address, err)
} else {
if *funcDurEnableFlag {
log.Debugf("Inside if\n")
words := strings.Fields(response.Message)
lastWord := words[len(words)-1]
duration, err := strconv.ParseInt(lastWord, 10, 64)
if err != nil {
log.Warnf("Failed to parse the duration from the response: %v", err)
}
if err == nil {
log.Debugf("Invoked %v. Response: %v\n", address, response.Message)
profSlice.Lock()
profSlice.slice = append(profSlice.slice, duration)
profSlice.Unlock()
@@ -290,7 +317,6 @@ func writeFunctionDurations(funcDurationOutputFile string) {
    }

    datawriter := bufio.NewWriter(file)

    for _, dur := range profSlice.slice {
        _, err := datawriter.WriteString(strconv.FormatInt(dur, 10) + "\n")
        if err != nil {
251 changes: 251 additions & 0 deletions tools/load-generator/README.md
@@ -0,0 +1,251 @@
# Load Generator

The Load Generator tool can be used to analyze the performance of serverless cluster deployments. It reconstructs the invocation traffic (load) of a given trace, using functions from the vSwarm benchmark suite as proxies. It reads the trace (containing memory utilization, invocation durations, and invocation timestamps of the trace functions) and generates a load of timestamped invocations of the proxy functions that closely mimics the trace.

Each function in the trace is mapped to its closest proxy in the benchmark suite (based on memory and duration), and the proxy functions are invoked in its place. The tool uses the `profile.json` output file generated by the [`profiler` tool](https://github.com/vhive-serverless/vSwarm/tree/load-generator/tools/profiler#profiler) to obtain the profiles of the benchmark suite functions. Once the load is generated (a `load.json` file by default), the `invoker` can be used to run it.

## Setup

The tool assumes a stock-only cluster set up as per the [vHive quickstart guide](https://github.com/vhive-serverless/vhive/blob/main/docs/quickstart_guide.md) and [vHive developer guide](https://github.com/vhive-serverless/vHive/blob/main/docs/developers_guide.md).

### Python packages

```bash
pip3 install -r requirements.txt
```

## Generating load

```
python3 main.py loadgen -h
usage: main.py loadgen [-h] [-o path] [-t path] [-p path] [-c path] [-b path] [-m bool] [-u bool] [-i string]
                       [-d int] [-w int] [-dbg bool]
optional arguments:
  -h, --help            show this help message and exit
  -o path, --output path
                        Output JSON file containing timestamps and endpoints of load. Default: load.json
  -t path, --trace path
                        Directory in which durations, invocations, memory CSV files of trace are located. Default: trace
  -p path, --profile path
                        JSON file containing profile details of the proxy functions. Default: profile.json
  -c path, --config_proxy path
                        Contains details about proxy functions used for deploying. Default: config.json
  -b path, --build path
                        Directory in which temporary build files are located. Default: build
  -m bool, --minute bool
                        Trace contains information at minute/second granularity. True if minute granularity. Default: True
  -u bool, --unique bool
                        Proxy-Trace functions mapping. Should it be unique? Default: False
  -i string, --iat_distribution string
                        IAT Distribution: equidistant, uniform or exponential. Default: uniform
  -d int, --duration int
                        Experiment Duration. Default: 20
  -dbg bool, --dbg bool
                        Show debug messages
```
### Command:

```bash
python3 main.py loadgen
```

The tool reads the trace information (memory, duration, and invocation details) from the `trace/` directory (configurable via the `-t` or `--trace` flag). The `trace/` directory must contain `memory.csv`, `durations.csv`, and `invocations.csv` files with the respective trace information, in the format described in [*Azure Functions Dataset 2019*](https://github.com/Azure/AzurePublicDataset/blob/master/AzureFunctionsDataset2019.md).

#### Function Invocation Counts `invocations.csv` Schema

| Field | Description |
|--|--|
| HashOwner | unique id of the application owner |
| HashApp | unique id for application name |
| HashFunction | unique id for the function name within the app |
| Trigger | trigger for the function |
| x .. y | fields describing the number of invocations of the function per minute/second |

If the `invocations.csv` file contains the number of invocations of functions per minute, the `-m` (or `--minute`) flag must be set to `true`. If it is per second, the flag must be set to `false`.
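
To make the schema concrete, here is a minimal sketch (not part of the tool) that reads `invocations.csv` and sums the per-function counts; it assumes the count columns are every column other than the metadata ones, named `1` through `1440` (one per minute of the day) in the Azure dataset:

```python
# Illustrative only: read invocations.csv and report each function's total
# invocation count. Assumes the count columns are everything except the
# metadata columns.
import pandas as pd

df = pd.read_csv("trace/invocations.csv")
meta = {"HashOwner", "HashApp", "HashFunction", "Trigger"}
count_cols = [c for c in df.columns if c not in meta]

for _, row in df.iterrows():
    total = int(row[count_cols].sum())
    print(f"{row['HashFunction']}: {total} invocations")
```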

#### Function Execution Duration `durations.csv` Schema

| Field | Description |
|--|--|
| HashOwner | unique id of the application owner |
| HashApp | unique id for application name |
| HashFunction | unique id for the function name within the app |
| Average | Average execution time (ms) across all invocations of the 24-hour period |
| Count | Number of executions used in computing the average |
| Minimum | Minimum execution time |
| Maximum | Maximum execution time |
| percentile_Average_0 | Weighted 0th percentile of the execution time *average* |
| percentile_Average_1 | Weighted 1st percentile of the execution time *average* |
| percentile_Average_25 | Weighted 25th percentile of the execution time *average* |
| percentile_Average_50 | Weighted 50th percentile of the execution time *average* |
| percentile_Average_75 | Weighted 75th percentile of the execution time *average* |
| percentile_Average_99 | Weighted 99th percentile of the execution time *average* |
| percentile_Average_100 | Weighted 100th percentile of the execution time *average* |

Execution times are in milliseconds.

#### Function Memory Usage `memory.csv` Schema

| Field | Description |
|--|--|
| HashOwner | unique id of the application owner |
| HashApp | unique id for application name |
| SampleCount | Number of samples used for computing the average |
| AverageAllocatedMb | Average allocated memory across all SampleCount measurements |
| AverageAllocatedMb_pct1 | 1st percentile of the average allocated memory |
| AverageAllocatedMb_pct5 | 5th percentile of the average allocated memory |
| AverageAllocatedMb_pct25 | 25th percentile of the average allocated memory |
| AverageAllocatedMb_pct50 | 50th percentile of the average allocated memory |
| AverageAllocatedMb_pct75 | 75th percentile of the average allocated memory |
| AverageAllocatedMb_pct95 | 95th percentile of the average allocated memory |
| AverageAllocatedMb_pct99 | 99th percentile of the average allocated memory |
| AverageAllocatedMb_pct100 | 100th percentile of the average allocated memory |

*One can utilize the [`sampler`](https://github.com/vhive-serverless/invitro/tree/main/sampler) from [`invitro`](https://github.com/vhive-serverless/invitro/tree/main) to generate a sampled trace from the original Azure dataset trace. The sampled trace can be used as the input trace to this tool.*

For every function in the trace, the closest function in the benchmark suite is chosen as its proxy, using the 75th-percentile memory and 75th-percentile duration to find the closest match. If the `-u` (or `--unique`) flag is set to `true`, the tool tries to find a one-to-one (injective) mapping between trace functions and proxy functions by modelling it as a *linear sum assignment* problem, as sketched below. If the number of trace functions is greater than the number of proxy functions, or if no injective mapping is found, the constraint is relaxed and the closest proxy function is used instead. Currently, the tool uses only _Serving Functions_ that are _NOT Pipelined_ as proxy functions.
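
As a rough illustration of the unique-mapping mode, here is a sketch using SciPy's `linear_sum_assignment`; the profile field names (`duration75`, `memory75`) are hypothetical placeholders, not the tool's actual schema:

```python
# Sketch of the injective trace-to-proxy mapping (the -u / --unique mode).
# Cost of assigning trace function i to proxy j is the distance between
# their 75th-percentile (duration, memory) points.
import numpy as np
from scipy.optimize import linear_sum_assignment

def map_trace_to_proxies(trace_funcs, proxy_funcs):
    cost = np.array([[abs(t["duration75"] - p["duration75"]) +
                      abs(t["memory75"] - p["memory75"])
                      for p in proxy_funcs] for t in trace_funcs])
    if len(trace_funcs) > len(proxy_funcs):
        # No injective mapping exists: fall back to the nearest proxy,
        # allowing proxies to be reused (mirrors the tool's fallback).
        return cost.argmin(axis=1).tolist()
    _, cols = linear_sum_assignment(cost)  # one-to-one, minimum total cost
    return cols.tolist()
```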

This mapping requires the profiles of the benchmark functions. The tool obtains them from the `profile.json` output file generated by the [`profiler` tool](https://github.com/vhive-serverless/vSwarm/tree/load-generator/tools/profiler#profiler); the user can configure the path of this file through the `-p` (or `--profile`) flag (`profile.json` by default).

The tool computes the trace-proxy mapping and automatically deploys the proxies. Deployment requires details about the `predeployment-commands`, `postdeployment-commands`, and `yaml-location` of the proxy functions in JSON format. By default, the tool reads this information from `config.json` (use the `-c` or `--config_proxy` flag to change the path). More details about the format can be found in the [profiler tool README.md](https://github.com/vhive-serverless/vSwarm/tree/load-generator/tools/profiler#command); one can directly copy `config.json` from `vSwarm/tools/profiler/` and use it here.
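
For orientation only, one *possible* shape of such a file, using the key names above and a function name from the example load below; this is a hypothetical illustration, not the authoritative schema (see the profiler README for that):

```json
{
    "aes-nodejs": {
        "yaml-location": "./yamls/aes-nodejs.yaml",
        "predeployment-commands": [],
        "postdeployment-commands": []
    }
}
```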

An example generated load file looks like this:
```json
[
    {
        "timestamp": [
            1365552,
            2418429,
            5343091,
            5548417,
            9047101
        ],
        "endpoint": [
            "aes-nodejs-8vicg27vtbz07fi6iu0uu3d42ha7c0vl.default.192.168.1.240.sslip.io",
            "aes-nodejs-8vicg27vtbz07fi6iu0uu3d42ha7c0vl.default.192.168.1.240.sslip.io",
            "aes-nodejs-8vicg27vtbz07fi6iu0uu3d42ha7c0vl.default.192.168.1.240.sslip.io",
            "aes-nodejs-8vicg27vtbz07fi6iu0uu3d42ha7c0vl.default.192.168.1.240.sslip.io",
            "aes-nodejs-8vicg27vtbz07fi6iu0uu3d42ha7c0vl.default.192.168.1.240.sslip.io"
        ]
    },
    {
        "timestamp": [
            980005,
            1836228,
            2068303,
            2145022,
            2162754,
            3101559,
            3425828,
            4719316,
            4915234,
            5536220,
            5743669,
            5819248,
            5893692,
            6423868,
            6519094,
            6713161,
            7456063,
            7492532,
            8710957,
            8736089
        ],
        "endpoint": [
            "currencyservice-i2lki1cie2nv762ki2o6ugaxeebr03qd.default.192.168.1.240.sslip.io",
            "emailservice-2hab4uja57g7852i10xk8zx6joltcklu.default.192.168.1.240.sslip.io",
            "currencyservice-i2lki1cie2nv762ki2o6ugaxeebr03qd.default.192.168.1.240.sslip.io",
            "emailservice-2hab4uja57g7852i10xk8zx6joltcklu.default.192.168.1.240.sslip.io",
            "emailservice-2hab4uja57g7852i10xk8zx6joltcklu.default.192.168.1.240.sslip.io",
            "currencyservice-i2lki1cie2nv762ki2o6ugaxeebr03qd.default.192.168.1.240.sslip.io",
            "emailservice-2hab4uja57g7852i10xk8zx6joltcklu.default.192.168.1.240.sslip.io",
            "currencyservice-i2lki1cie2nv762ki2o6ugaxeebr03qd.default.192.168.1.240.sslip.io",
            "gptj-python-q0j5cwv0iri9h1v0enba17hl46636ksx.default.192.168.1.240.sslip.io",
            "gptj-python-q0j5cwv0iri9h1v0enba17hl46636ksx.default.192.168.1.240.sslip.io",
            "aes-nodejs-8vicg27vtbz07fi6iu0uu3d42ha7c0vl.default.192.168.1.240.sslip.io",
            "currencyservice-i2lki1cie2nv762ki2o6ugaxeebr03qd.default.192.168.1.240.sslip.io",
            "emailservice-2hab4uja57g7852i10xk8zx6joltcklu.default.192.168.1.240.sslip.io",
            "gptj-python-q0j5cwv0iri9h1v0enba17hl46636ksx.default.192.168.1.240.sslip.io",
            "aes-nodejs-8vicg27vtbz07fi6iu0uu3d42ha7c0vl.default.192.168.1.240.sslip.io",
            "aes-nodejs-8vicg27vtbz07fi6iu0uu3d42ha7c0vl.default.192.168.1.240.sslip.io",
            "emailservice-2hab4uja57g7852i10xk8zx6joltcklu.default.192.168.1.240.sslip.io",
            "aes-nodejs-8vicg27vtbz07fi6iu0uu3d42ha7c0vl.default.192.168.1.240.sslip.io",
            "aes-nodejs-8vicg27vtbz07fi6iu0uu3d42ha7c0vl.default.192.168.1.240.sslip.io",
            "gptj-python-q0j5cwv0iri9h1v0enba17hl46636ksx.default.192.168.1.240.sslip.io"
        ]
    }
]
```
The generated load consists of a list of timestamp-endpoint objects, each object covering one minute (or second) of the trace. `timestamp` lists the offsets, in microseconds within that minute/second, at which the endpoint at the corresponding index of `endpoint` is invoked. A minimal sketch of how such a file is consumed follows.
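
The sketch below replays `load.json` in Python; the shipped Go `invoker` is the real implementation, and vSwarm functions are served over gRPC, so the plain HTTP call here is only to keep the sketch self-contained:

```python
# Sketch: replay load.json by sleeping until each microsecond offset within
# the current minute/second window. Purely illustrative -- the real invoker
# uses gRPC and handles warm-up, latency recording, etc. A faithful replayer
# would also wait out the remainder of each window.
import json
import time
import urllib.request

with open("load.json") as f:
    load = json.load(f)

for window in load:
    start = time.monotonic()
    for ts_us, endpoint in sorted(zip(window["timestamp"], window["endpoint"])):
        delay = ts_us / 1e6 - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        urllib.request.urlopen(f"http://{endpoint}")  # illustrative invocation
```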

## Invoking load

The invoker reads the target addresses from the previously generated input file (`load.json` by default) and invokes each endpoint at its specified time. `make invoker` builds the `invoker` binary. Once the binary and the input file are ready, running `./invoker` starts the process.

### Command

```bash
cd ./invoker
make invoker
./invoker -duration 15 -min=true -traceFile load.json
```

#### More details:

```
./invoker -h
Usage of ./invoker:
  -dbg
        Enable debug logging
  -duration int
        Experiment duration (default 8)
  -grpcTimeout int
        Timeout in seconds for gRPC requests (default 30)
  -latf string
        CSV file for the latency measurements in microseconds (default "lat.csv")
  -min
        Is it minute granularity (default true)
  -port int
        The port that functions listen to (default 80)
  -trace
        Enable tracing in the client
  -traceFile string
        File with trace endpoints' metadata (default "load.json")
  -warmup int
        Warm up duration (default 2)
  -zipkin string
        zipkin url (default "http://localhost:9411/api/v2/spans")
```

### Multi-node setup

In the case of a multi-node setup, `python3 main.py loadgen` must be run on the Master node to generate the load and deploy the proxies. Copy the generated `load.json` to the worker nodes and run the `invoker` on each of them with `load.json` as input. Note, however, that the generated `load.json` is independent of the number of worker nodes: with more than one worker, it is the user's responsibility to distribute the load among them, for example as in the sketch below.
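
A minimal sketch of one way to perform that split (round-robin over the invocations of each window; any partitioning that preserves the timestamps works equally well, and the output file names are hypothetical):

```python
# Sketch: shard load.json across N worker nodes, round-robin per invocation.
import json

N = 2  # assumed number of worker nodes
with open("load.json") as f:
    load = json.load(f)

shards = [[] for _ in range(N)]
for window in load:
    parts = [{"timestamp": [], "endpoint": []} for _ in range(N)]
    for k, (ts, ep) in enumerate(zip(window["timestamp"], window["endpoint"])):
        parts[k % N]["timestamp"].append(ts)
        parts[k % N]["endpoint"].append(ep)
    for w in range(N):
        shards[w].append(parts[w])

for w, shard in enumerate(shards):
    with open(f"load_worker{w}.json", "w") as f:  # hypothetical file names
        json.dump(shard, f, indent=4)
```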


## Plotting

```bash
python3 main.py plot -h
usage: main.py plot [-h] [-t path] [-p path] [-o path] [-dbg bool]

optional arguments:
-h, --help show this help message and exit
-t path, --trace path
Directory in which durations, invocations, memory CSV files of trace are located
-p path, --profile path
JSON file containing profile details of the proxy functions
-o path, --png_folder path
Output folder where plots are stored
-dbg bool, --dbg bool
Show debug messages
```
The tool can also plot the profiles of the trace and proxy functions as histograms. The command takes the `profile.json` file (`-p` or `--profile`) and the trace location (`-t` or `--trace`) as input and saves the histogram PNG files in the `png/` folder (`-o` or `--png_folder` flag).

This functionality can be used to understand how the distribution of the proxy functions differs from that of the trace functions; a good set of proxy functions should cover the entire distribution of the trace functions. A minimal sketch of such a comparison is shown below.
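
The sketch assumes `profile.json` maps function names to objects with a `duration75` field; that layout is an assumption for illustration, not the tool's actual schema:

```python
# Sketch: overlay the 75th-percentile duration distributions of trace and
# proxy functions. profile.json field names are assumed, not authoritative.
import json

import matplotlib.pyplot as plt
import pandas as pd

trace_durs = pd.read_csv("trace/durations.csv")["percentile_Average_75"]
with open("profile.json") as f:
    profile = json.load(f)
proxy_durs = [entry["duration75"] for entry in profile.values()]  # assumed field

plt.hist(trace_durs, bins=50, alpha=0.5, label="trace functions")
plt.hist(proxy_durs, bins=50, alpha=0.5, label="proxy functions")
plt.xlabel("75th-percentile duration (ms)")
plt.ylabel("number of functions")
plt.legend()
plt.savefig("png/duration_histogram.png")
```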

### Example:

![image](https://github.com/vhive-serverless/vSwarm/assets/70060966/017ae173-343c-4b6d-99b2-478331d4d04d)
![image](https://github.com/vhive-serverless/vSwarm/assets/70060966/3f9482aa-3544-40b1-959c-80f69cc143bf)
![image](https://github.com/vhive-serverless/vSwarm/assets/70060966/e1ef35ce-2e6a-4d2c-bf6c-b16e125ee1de)
![image](https://github.com/vhive-serverless/vSwarm/assets/70060966/7ccb4548-83e4-4449-95c2-7fea27360ea1)
![image](https://github.com/vhive-serverless/vSwarm/assets/70060966/d18f1da6-7373-4769-a84d-a563ab7b6a37)