
Processing ETBR bosentan data (5XPR)

keitaroyam edited this page Jan 27, 2018 · 4 revisions

The following describes how the endothelin ETB receptor/bosentan datasets can be processed using KAMO (documentation in Japanese / English).

References

  • Original paper
    • Shihoya et al. (2017) "X-ray structures of endothelin ETB receptor bound to clinical antagonist bosentan and its analog." Nature Structural & Molecular Biology. doi:10.1038/nsmb.3450. PDB: 5XPR

Raw data

  • Available on Zenodo (DOI)
  • Collected on BL32XU, SPring-8
  • MX225HS CCD detector (2×2 binning), 18×10 μm² beam, 1 Å wavelength, 250.0 mm camera length
  • 10°/dataset, 0.2°/frame (shutterless)
  • 16 datasets collected automatically (ZOO system) from 2 cryoloops
  • Space group P3221; a = b = 74.7 Å, c = 218.9 Å
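The collection parameters above imply how many frames went into processing. A quick arithmetic check in shell (awk does the floating-point division, since POSIX shell arithmetic is integer-only):

```sh
#!/bin/sh
# Frame counts implied by the collection parameters listed above.
osc_range=10     # degrees per dataset
osc_width=0.2    # degrees per frame
n_datasets=16

# awk handles the floating-point division; +0.5 rounds to the nearest integer.
frames_per_dataset=$(awk -v r="$osc_range" -v w="$osc_width" 'BEGIN { printf "%d", r / w + 0.5 }')
total_frames=$((frames_per_dataset * n_datasets))

echo "frames per dataset: $frames_per_dataset"            # 50
echo "total frames:       $total_frames"                  # 800
echo "total rotation:     $((osc_range * n_datasets)) deg" # 160
```

So each 10° wedge is only 50 frames, and the whole merge draws on 160° of data spread over 16 small wedges.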

How data were processed in the original paper

The GUI command 'kamo' was run with default parameters; that is, XDS (version May 1, 2016, BUILT=20160617) was used for integration, and no prior crystal information was used. All 16 datasets were indexed and integrated with consistent unit cells:

[ 1] 16 members:
 Averaged P1 Cell= 74.52 74.72 218.74 90.08 90.29 119.75
 Members= [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
 Possible symmetries:
   freq symmetry     a      b      c     alpha  beta   gamma reindex
      0 P 1         74.52  74.72 218.74  90.08  90.29 119.75 a,b,c
      0 P 1 2 1     74.52 218.74  74.72  89.92 119.75  89.71 a,-c,b
      0 C 1 2 1     74.72 129.40 218.74  89.62  90.08  90.34 b,-2*a-b,c
      0 C 1 2 1    129.09  74.90 218.74  90.37  90.12  90.18 a-b,a+b,c
      0 C 1 2 1     74.52 129.75 218.74  90.26  90.29  89.84 a,a+2*b,c
      0 C 1 2 1    129.75  74.52 218.74  89.71  90.26  90.16 a+2*b,-a,c
      0 C 1 2 1    129.40  74.72 218.74  90.08  90.38  89.66 2*a+b,b,c
      0 C 1 2 1     74.90 129.09 218.74  89.88  90.37  89.82 a+b,-a+b,c
      0 C 2 2 2     74.52 129.75 218.74  90.26  90.29  89.84 a,a+2*b,c
      0 C 2 2 2     74.72 129.40 218.74  89.62  89.92  89.66 b,2*a+b,-c+1/4
      0 C 2 2 2     74.90 129.09 218.74  89.88  90.37  89.82 a+b,-a+b,c
      0 P 3         74.52  74.72 218.74  90.08  90.29 119.75 a,b,c
      0 P 3 1 2     74.72  74.52 218.74  89.71  89.92 119.75 b,a,-c
     14 P 3 2 1     74.52  74.72 218.74  90.08  90.29 119.75 a,b,c
      0 P 6         74.52  74.72 218.74  90.08  90.29 119.75 a,b,c
      2 P 6 2 2     74.52  74.72 218.74  90.08  90.29 119.75 a,b,c

As P321 was the most frequent symmetry (14 of 16 datasets), P321 was assumed and the XDS_ASCII files were reindexed to P321 symmetry.
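The choice of P321 simply follows the freq column of the table above. If that table is saved as plain text (the file name cells.txt in the usage line is hypothetical), the most frequent symmetry can be picked out with a small awk function. This is a convenience sketch assuming the column layout shown above, not part of KAMO:

```sh
#!/bin/sh
# Hypothetical helper: report the most frequent symmetry in a KAMO-style
# "Possible symmetries" table read from stdin.
# Assumed row layout: freq, space-group symbol (one or more tokens),
# then a b c alpha beta gamma, then the reindex operator (7 trailing fields).
most_frequent_symmetry() {
  awk '
    $1 ~ /^[0-9]+$/ && NF >= 9 {
      sym = $2
      for (i = 3; i <= NF - 7; i++) sym = sym " " $i   # join symbol tokens
      if ($1 + 0 > best) { best = $1 + 0; best_sym = sym }
    }
    END { print best_sym " (" best " datasets)" }
  '
}
```

Usage: `most_frequent_symmetry < cells.txt`, which for the table above prints `P 3 2 1 (14 datasets)`.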

Because P321 is lower than the highest possible symmetry (P622; the two share exactly the same unit cell), the indexing ambiguity had to be resolved: for each dataset, the (h,k,l) and (-h,-k,l) indexing modes must be tested so that all datasets are indexed consistently. To do this, just type

kamo.resolve_indexing_ambiguity formerge.lst

The selective-breeding algorithm developed by Kabsch (2014) converged in 2 cycles, and 7 datasets were reindexed.
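To make the alternative indexing concrete: the operator (h,k,l) → (-h,-k,l) flips the signs of h and k and leaves l unchanged. The awk function below applies it to XDS_ASCII-style records purely as an illustration (it assumes header lines start with '!' and data lines begin with h k l, and it neither updates the header nor preserves the fixed-width layout). The actual reindexing should be left to kamo.resolve_indexing_ambiguity, which produces the reindexed files; reindex_minus_h_minus_k is a hypothetical name.

```sh
#!/bin/sh
# Illustration only: apply the alternative indexing operator (h,k,l) -> (-h,-k,l).
# Assumes XDS_ASCII-style input: header lines start with '!', data lines begin
# with integer h k l. Reassigning fields makes awk rebuild each data line with
# single spaces, so the original column layout is lost.
reindex_minus_h_minus_k() {
  awk '
    /^!/    { print; next }                            # header lines unchanged
    NF >= 3 { $1 = 0 - $1; $2 = 0 - $2; print; next }  # 0 - x keeps h=0 from becoming -0
            { print }                                  # anything else passes through
  '
}
```

For example, the record `1 2 3 100.0 5.0` becomes `-1 -2 3 100.0 5.0`.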

Next, the template script merged_blend.sh was edited to use the updated list file (formerge_reindexed.lst), which points to the appropriately reindexed files:

#!/bin/sh
# settings
dmin=3.5
anomalous=false # true or false
lstin=formerge_reindexed.lst
use_ramdisk=true # set to false if memory or free space in /tmp is limited
# _______/setting

kamo.multi_merge \
        workdir=blend_${dmin}A_framecc_b \
        lstin=${lstin} d_min=${dmin} anomalous=${anomalous} \
        space_group=None reference.data=None \
        program=xscale xscale.reference=bmin \
        reject_method=framecc+lpstats rejection.lpstats.stats=em.b \
        clustering=blend blend.min_cmpl=90 blend.min_redun=2 blend.max_LCV=None blend.max_aLCV=None \
        xscale.use_tmpdir_if_available=${use_ramdisk} \
        batch.engine=sge batch.par_run=merging batch.nproc_each=8 nproc=8 batch.sge_pe_name=par
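Before submitting the edited script, it can be worth confirming that every file named in formerge_reindexed.lst actually exists. The check_lst function below is a hypothetical helper, not a KAMO command, and assumes the list has one path per line (blank lines and lines starting with '#' are skipped):

```sh
#!/bin/sh
# Hypothetical pre-flight check (not part of KAMO): report files named in a
# merge list that do not exist. Prints the number of missing files and
# returns non-zero if any are missing.
check_lst() {
  missing=0
  # "|| [ -n "$path" ]" also handles a last line without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    case $path in
      '' | '#'*) continue ;;   # skip blank lines and comments
    esac
    if [ ! -f "$path" ]; then
      echo "missing: $path" >&2
      missing=$((missing + 1))
    fi
  done < "$1"
  echo "$missing"
  [ "$missing" -eq 0 ]
}
```

Usage: `check_lst formerge_reindexed.lst && sh merged_blend.sh` runs the merge only if nothing is missing.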

After running this script, the largest cluster (blend_3.5A_framecc_b/cluster_0015/run_03; 14 datasets) gave the best statistics. However, its inner-shell R-meas was somewhat high (12.0% in the 10.45 Å shell):

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

    10.45        2737     418       432       96.8%      11.2%     11.3%     2712   17.50     12.0%    98.3*     9    0.966     172
     7.41        4900     685       687       99.7%      13.9%     11.9%     4879   15.87     14.8%    99.0*    -4    0.966     383
     6.05        6359     861       874       98.5%      22.9%     22.4%     6339    8.93     24.6%    96.8*    -4    0.769     514
     5.25        7154     978       982       99.6%      34.8%     36.7%     7123    5.93     37.5%    95.5*     5    0.789     603
     4.69        8372    1125      1130       99.6%      34.3%     35.1%     8338    6.19     37.0%    95.2*     0    0.756     735
     4.29        9068    1182      1190       99.3%      43.3%     46.3%     9043    5.30     46.4%    93.9*    -3    0.770     785
     3.97       10100    1324      1337       99.0%      75.1%     86.2%    10066    3.09     80.4%    85.5*    -6    0.693     883
     3.71       10765    1428      1440       99.2%     115.8%    141.0%    10734    1.94    123.9%    69.1*     2    0.671     946
     3.50       11403    1466      1479       99.1%     227.2%    285.4%    11352    0.98    242.3%    35.4*     3    0.612    1006
    total       70858    9467      9551       99.1%      33.4%     36.8%    70586    5.62     35.8%    98.2*     0    0.735    6027
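When comparing runs, it helps to pull a few columns out of this table. The sketch below assumes the 14-column layout shown above (resolution limit, observed, unique, possible, completeness, R-factor observed/expected, compared, I/sigma, R-meas, CC(1/2), anomalous correlation, SigAno, Nano) and that the table has been copied into a text file (stats.txt in the usage line is a hypothetical name); shell_stats is likewise a hypothetical helper:

```sh
#!/bin/sh
# Print resolution limit, I/sigma, R-meas and CC(1/2) for each shell of an
# XSCALE-style statistics table with the 14-column layout shown above.
# The '*' significance marker after CC(1/2) is stripped.
shell_stats() {
  awk '
    NF == 14 && ($1 ~ /^[0-9.]+$/ || $1 == "total") {
      cc = $11
      sub(/\*$/, "", cc)
      print $1, $9, $10, cc
    }
  '
}
```

Usage: `shell_stats < stats.txt`; for the table above, the first output line is `10.45 17.50 12.0% 98.3`.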

We found that increasing the NBATCH= value to 50 and setting the low-resolution limit to 30 Å improved the statistics. Finally, the high-resolution limit was set to 3.6 Å based on paired refinement. Here is the final result:

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

    10.23        2849     431       463       93.1%       4.6%      4.7%     2825   32.60      4.9%    99.8*     3    0.990     184
     7.45        4484     636       637       99.8%       6.0%      5.8%     4465   25.89      6.4%    99.8*     0    0.908     354
     6.15        5695     783       795       98.5%      16.2%     16.1%     5677   12.32     17.4%    98.9*     3    0.816     461
     5.35        6628     907       911       99.6%      29.6%     30.6%     6598    7.69     31.8%    98.1*     4    0.805     563
     4.80        7794    1027      1032       99.5%      30.9%     31.0%     7767    8.08     33.1%    98.5*     2    0.819     670
     4.39        8193    1088      1095       99.4%      35.8%     35.9%     8167    7.20     38.4%    96.3*     0    0.805     719
     4.07        9345    1213      1221       99.3%      65.9%     70.2%     9315    4.25     70.4%    89.8*     1    0.773     813
     3.82        9750    1274      1286       99.1%     100.3%    112.5%     9721    2.62    107.1%    76.0*     0    0.702     853
     3.60       10231    1332      1347       98.9%     181.9%    205.1%    10194    1.50    194.1%    54.5*     4    0.678     900
    total       64969    8691      8787       98.9%      28.0%     29.8%    64729    8.49     29.9%    99.3*     2    0.779    5517
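The tables do not list multiplicity directly, but it follows from OBSERVED / UNIQUE in each shell. A minimal sketch, assuming the same 14-column layout (multiplicity is a hypothetical helper name):

```sh
#!/bin/sh
# Multiplicity per shell = OBSERVED / UNIQUE reflections, computed from an
# XSCALE-style statistics table with the 14-column layout shown above.
multiplicity() {
  awk '
    NF == 14 && ($1 ~ /^[0-9.]+$/ || $1 == "total") && $3 > 0 {
      printf "%s %.1f\n", $1, $2 / $3
    }
  '
}
```

For the final table this gives 6.6 in the lowest-resolution shell and 7.5 overall (64969 / 8691), about the same as the earlier run (70858 / 9467 ≈ 7.5), while the inner-shell R-meas dropped from 12.0% to 4.9%.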