forked from pivotal-cf/docs-pks
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbbr-restore.html.md.erb
215 lines (178 loc) · 9.19 KB
/
bbr-restore.html.md.erb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
---
title: Restoring the PKS Control Plane
owner: PKS
---
<strong><%= modified_date %></strong>
This topic describes how to use BOSH Backup and Restore (BBR) to restore the PKS control plane.
To back up the PKS control plane with BBR, see [Backing up the PKS Control Plane](bbr-backup.html).
## <a id="compatibility"></a> Compatibility of Restore
This section describes the restrictions for a backup artifact to be restorable to another
environment.
This section is for guidance only, and Pivotal highly recommends that operators validate their
backups by using the backup artifacts in a restore.
The restrictions for a backup artifact to be restorable are the following:
+ **Topology**: BBR requires the BOSH topology of a deployment to be the same in the restore
environment as it was in the backup environment.
+ **Naming of instance groups and jobs**: For any deployment that implements the backup and restore
scripts, the instance groups and jobs must have the same names.
+ **Number of instance groups and jobs**: For instance groups and jobs that have backup and restore scripts, the same number of instances must exist..
+ **Limited validation**: BBR puts the backed up data into the corresponding instance groups and
jobs in the restored environment, but cannot validate the restore beyond that. For example, if the
MySQL encryption key is different in the restore environment, the BBR restore might succeed
although the restored MySQL database is unusable.
<p class="note"><strong>Note</strong>: A change in VM size or underlying hardware should not affect the ability for BBR restore data, as long as adequate storage space to restore the data exists.</p>
## <a id="recreate-vms"></a> Step 1: Recreate VMs
Before restoring the PKS control plane, you must create the VMs that constitute the deployment.
In a disaster recovery scenario, you can re-create the control plane with your PKS deployment
manifest.
If you used the `--with-manifest` flag when you ran the BBR backup command, your backup artifact
includes a copy of your manifest.
## <a id="artifacts-jumpbox"></a> Step 2: Transfer Artifacts to Jumpbox
Transfer your BBR backup artifact from your safe storage location to the jumpbox.
For example, you could run the following command to SCP the backup artifact to your jumpbox:
```
scp LOCAL-PATH-TO-BACKUP-ARTIFACT JUMPBOX-USER/JUMPBOX-ADDRESS
```
If this artifact is encrypted, you must decrypt it.
## <a id='restore'></a> Step 3: Restore
<p class="note"><strong>Note</strong>: The BBR restore command can take a long time to complete.
You can run it independently of the SSH session so that the process can continue running even if
your connection to the jumpbox fails. The command above uses <code>nohup</code>, but you run the
command in a <code>screen</code> or <code>tmux</code> session instead.</p>
Perform the following steps to restore the PKS control plane. You can use the optional `--debug` flag to enable debug logs. See the [BBR Logging](bbr-logging.html) topic for more information.
1. Ensure the PKS deployment backup artifact is in the folder from which you run BBR.
1. Download the root CA certificate for your PKS deployment as follows:
1. On the Ops Manager Installation Dashboard, in the top right corner, click your username.
1. Navigate to **Settings** > **Advanced**.
1. Click **Download Root CA Cert**.
1. Locate and record your PKS BOSH deployment name as follows:
1. On the Ops Manager Installation Dashboard, click the Director tile.
1. In the Director tile, click the **Credentials** tab.
1. Navigate to **Bosh Commandline Credentials** and click **Link to Credential**.
1. Copy the credential value.
1. On the command line, run the following command to retrieve your PKS BOSH deployment name.
Replace `BOSH-CLI-CREDENTIALS` with the credential value you copied in the previous step:
<pre>BOSH-CLI-CREDENTIALS deployments | grep pivotal-container-service</pre>
<p class="note"><strong>Note</strong>: Your PKS BOSH deployment name begins with "pivotal-container-service" and includes a unique identifier.</p>
1. Run the BBR restore command to restore the PKS control plane:
```
BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
nohup bbr deployment \
--target BOSH-TARGET \
--username BOSH-CLIENT \
--deployment DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-SERVER-CERT \
restore \
--artifact-path PATH-TO-DEPLOYMENT-BACKUP
```
Replace the placeholder values as follows:
<table>
<tr>
<th width="32%">Credential</th>
<th>Location</th>
</tr>
<tr>
<td><code>BOSH-CLIENT-SECRET</code></td>
<td>In the BOSH Director tile, navigate to <strong>Credentials > Bosh Commandline Credentials</strong>. Record the value for <code>BOSH_CLIENT_SECRET</code>.</td>
</tr>
<tr>
<td><code>BOSH-TARGET</code></td>
<td>In the BOSH Director tile, navigate to <strong>Credentials > Bosh Commandline Credentials</strong>. Record the value for <code>BOSH_ENVIRONMENT</code>.
You must be able to reach the target address from the workstation where you run
<code>bbr</code> commands.</td>
</tr>
<tr>
<td><code>BOSH-CLIENT</code></td>
<td>In the BOSH Director tile, navigate to <strong>Credentials > Bosh Commandline Credentials</strong>. Record the value for <code>BOSH_CLIENT</code>.</td>
</tr>
<tr>
<td><code>DEPLOYMENT-NAME</code></td>
<td>Use the PKS BOSH deployment name that you recorded in a previous step.</td>
</tr>
<tr>
<td><code>PATH-TO-BOSH-CA-CERT</code></td>
<td>Use the path to the root CA certificate that you downloaded in a previous step.</td>
</tr>
<tr>
<td><code>PATH-TO-DEPLOYMENT-BACKUP</code></td>
<td>Use the path to the PKS control plane backup that you want to restore.</td>
</tr>
</table>
<br>
For example:
<pre class="terminal">
$ BOSH_CLIENT_SECRET=p455w0rd \
nohup bbr deployment \
--target bosh.example.com \
--username admin \
--deployment cf-acceptance-0 \
--ca-cert bosh.ca.crt \
restore \
--artifact-path /home/cf-abcd1234abcd1234abcd-abcd1234abcd1234
</pre>
If the command fails, follow the steps in [Recover from a Failing Command](#recover-from-failing-command).
## <a id="recover-from-failing-command"></a>Recover from a Failing Command
1. Ensure that you set all the parameters in the command.
1. Ensure that the BOSH Director credentials are valid.
1. Ensure that the specified BOSH deployment exists.
1. Ensure that the jumpbox can reach the BOSH Director.
1. Ensure the source BOSH deployment is compatible with the target BOSH deployment.
1. If you see the error message `Directory /var/vcap/store/bbr-backup already exists on instance`,
run the relevant commands from the [Clean up After Failed Restore](#manual-clean) section of this
topic.
1. See the [BBR Logging](bbr-logging.html) topic.
## <a id='cancel-restore'></a>Cancel a Restore
If you must cancel a restore, perform the following steps:
1. Terminate the BBR process by pressing Ctrl-C and typing `yes` to confirm.
1. Perform the procedures in the [Clean up After Failed Restore](#manual-clean) section to enable future restores. Stopping a restore can leave the system in an unusable state and prevent future restores.
## <a id="manual-clean"></a>Clean up After Failed Restore
If your restore process fails, then the process may leave the BBR restore folder on the instance. As a result, any subsequent restore attempts may also fail. In addition, BBR may not have run the post-restore scripts, which can leave the instance in a locked state.
To resolve these issues, run the BBR cleanup script with the following command:
```
BOSH-CLIENT-SECRET=BOSH-CLIENT-SECRET \
bbr deployment \
--target BOSH-TARGET \
--username BOSH-CLIENT \
--deployment DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-CA-CERT \
restore-cleanup
```
Replace the placeholder values as follows:
<table>
<tr>
<th width="32%">Credential</th>
<th>Location</th>
</tr>
<tr>
<td><code>BOSH-CLIENT-SECRET</code></td>
<td>In the BOSH Director tile, navigate to <strong>Credentials > Bosh Commandline Credentials</strong>. Record the value for <code>BOSH_CLIENT_SECRET</code>.</td>
</tr>
<tr>
<td><code>BOSH-TARGET</code></td>
<td>In the BOSH Director tile, navigate to <strong>Credentials > Bosh Commandline Credentials</strong>. Record the value for <code>BOSH_ENVIRONMENT</code>.
You must be able to reach the target address from the workstation where you run
<code>bbr</code> commands.</td>
</tr>
<tr>
<td><code>BOSH-CLIENT</code></td>
<td>In the BOSH Director tile, navigate to <strong>Credentials > Bosh Commandline Credentials</strong>. Record the value for <code>BOSH_CLIENT</code>.</td>
</tr>
<tr>
<td><code>DEPLOYMENT-NAME</code></td>
<td>Use the PKS BOSH deployment name that you recorded in a previous step.</td>
</tr>
<tr>
<td><code>PATH-TO-BOSH-CA-CERT</code></td>
<td>Use the path to the root CA certificate that you downloaded in a previous step.</td>
</tr>
</table>
For example:
<pre class="terminal">
$ BOSH_CLIENT_SECRET=p455w0rd \
bbr deployment \
--target bosh.example.com \
--username admin \
--deployment cf-acceptance-0 \
--ca-cert bosh.ca.crt \
restore-cleanup
</pre>