FastTrack for Azure Architectural Discussion Framework - Lift and Shift

Azure Design Principles with Lift and Shift

App & Data Migration

Linux Lift and Shift into Azure

Distributed Architecture

High Availability and Business Continuity / Disaster Recovery

  • Have you defined availability requirements for the workload? How much downtime is acceptable? Have you defined a Recovery Time Objective (RTO) and a Recovery Point Objective (RPO)?

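    It is critical to define resiliency (high availability and disaster recovery) metrics such as RTO, RPO, and SLA. RTO is the maximum acceptable time an application can be unavailable after an incident. RPO is the maximum duration of data loss that is acceptable during a disaster. Without these metrics, it is hard to design an application that meets business needs.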

    Designing resilient applications for Azure

  • Is there a defined SLA for the workload?

    Understand how the SLAs of the services your workload depends on affect the SLA that you offer your end users. Define a target SLA for each workload, because a target SLA makes it possible to evaluate whether the architecture meets the business requirements. Calculate the composite SLA (the combined SLA of the Azure services and other dependencies that make up the workload) and check whether it meets the business SLA requirement. When every service is required, the composite SLA is the product of the individual SLAs, so it is lower than the SLA of any single service. Then compare your target SLAs with the published Azure SLAs to determine what the design requires.

    Defining Resiliency Requirements

    Azure SLAs

  • Have you performed Failure Mode Analysis (FMA)?

    FMA is a process for identifying possible points of failure in a system and defining how the application will respond to each of them.

    Failure Mode Analysis

  • What level of high availability do your open-source application systems require once they are migrated to Azure? Does your Linux system need to run as a failover cluster?

    Determine whether the open-source system requires high availability as part of the application design.

  • Does your MySQL database have any high availability requirements?

    Review the guidance for running MySQL with high availability on Azure.

    MySQL High Availability Architecture in Microsoft Azure
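The SLA, downtime, and composite SLA questions above can be worked through with simple arithmetic. The following is a minimal sketch under stated assumptions. Every component is required (serial dependencies), and downtime is measured over a 30-day month. The 99.95% and 99.99% figures in the usage section are hypothetical placeholders, not quoted Azure SLAs. The sketch does not model redundant or multi-region paths, and it does not call any Azure API.

```python
from math import prod

# Assumption: a 30-day month, a common basis for monthly downtime budgets.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes


def composite_sla(component_slas):
    """Composite SLA when every component is required (serial dependencies).

    The workload is up only if all components are up, so the individual
    availabilities multiply and the result is lower than any single SLA.
    """
    return prod(component_slas)


def allowed_downtime_minutes(sla, period_minutes=MINUTES_PER_MONTH):
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - sla) * period_minutes


def meets_target(component_slas, target_sla):
    """True if the composite SLA is at least the business target."""
    return composite_sla(component_slas) >= target_sla


if __name__ == "__main__":
    # Hypothetical values for a web tier and a database; check the current
    # Azure SLA pages for the real figures of the services you use.
    slas = [0.9995, 0.9999]
    combined = composite_sla(slas)
    print(f"composite SLA: {combined:.4%}")
    print(f"downtime budget: {allowed_downtime_minutes(combined):.1f} min/month")
    print("meets 99.9% target:", meets_target(slas, 0.999))
```

Comparing the composite figure, rather than each service's SLA in isolation, against the business target makes a gap visible early. In this example, two individually high SLAs combine to a lower one.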

Monitoring & Management

  • How are the health and performance of the application measured and monitored? Which KPIs does the operations team review?

    When developing a monitoring strategy and implementation for a system, the first step is to determine the Key Performance Indicators (KPIs) used to measure the health, behavior, and performance of the system. Without clear, quantified targets, informed engineering decisions are effectively impossible, and discussions become circular and anecdotal ("the system feels slow").

    Reference - Monitoring and diagnostics guidance

    Configure Azure Backup Reports

    Introducing OMS Network Performance Monitor

  • If you are offering an SLA, do you have any methods of monitoring the SLA that you are providing?

    Many commercial systems that support paying customers make guarantees about the performance of the system in the form of SLAs. Essentially, SLAs state that the system can handle a defined volume of work within an agreed time frame and without losing critical information. SLA monitoring is concerned with ensuring that the system can meet measurable SLAs.  

    SLA Monitoring

  • What is the resource utilization across deployed resources? How much headroom (capacity) is there across the deployed resources?  

    Usage monitoring can help determine which features may benefit from functional partitioning or re-architecture. It can also help identify which components should be retired from the solution. It can also help detect, possibly indirectly, user satisfaction with the performance or functionality of the system. For example, if a large number of customers in an e-commerce system regularly abandon their shopping carts, this might point to a problem with the checkout functionality.

    Usage Monitoring

  • How many operations are executed across various tiers? Per second/hour/day? Peak load? Are you monitoring the performance of your application as it scales and is placed under more and more stress/load?  

    As the system is placed under increasing stress (for example, a growing volume of users), the datasets those users access grow, and failure of one or more components becomes more likely. Component failure is frequently preceded by a decrease in performance. If you can detect such a decrease, you can take proactive steps to remedy the situation.

    Performance Monitoring

  • How many user-visible errors does the system produce per second, hour, or day? If needed, can the root cause be determined? Do you actively check that the system is functioning as expected? Are you currently performing health monitoring?

    An operator should be alerted quickly (within a matter of seconds) if any part of the system is deemed to be unhealthy. The operator should be able to ascertain which parts of the system are functioning normally, and which parts are experiencing problems.  

    Health Monitoring

    Overview of Azure Monitor

  • How many active, unique users does your application support (mobile, web, etc.)?  

    Understand which personas use the application and when peak usage occurs.

  • Are you aware of managed disks, and of how they can provide higher availability than unmanaged disks (disks stored in storage accounts that you manage)?

    Explain the benefits of managed disks and how Azure Site Recovery (ASR) supports them during a lift and shift.

    Azure Site Recovery now supports managed disks for on-premises to Azure
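The SLA, health, and performance monitoring questions above all reduce to computations over collected telemetry. The following is a minimal sketch of those computations. It assumes a hypothetical list of health-probe samples (the `ProbeResult` record is invented for illustration, not an Azure Monitor schema), probes running at a fixed interval, and an arbitrary 1.5x tolerance on 95th-percentile latency. In practice the data would come from your monitoring platform, and this code does not call any Azure API.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class ProbeResult:
    """One health-probe sample (hypothetical record format)."""
    component: str
    healthy: bool
    latency_ms: float


def measured_availability(results):
    """Fraction of probe samples that succeeded.

    Approximates time-based availability only if probes run at a fixed interval.
    """
    if not results:
        raise ValueError("no probe results")
    return sum(r.healthy for r in results) / len(results)


def sla_met(results, target_sla):
    """SLA monitoring: compare measured availability with the SLA offered."""
    return measured_availability(results) >= target_sla


def unhealthy_components(results):
    """Health monitoring: components whose most recent probe failed."""
    latest = {}
    for r in results:  # results are assumed to be in chronological order
        latest[r.component] = r.healthy
    return sorted(name for name, ok in latest.items() if not ok)


def p95(values):
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(values, n=20)[-1]


def latency_degraded(baseline_ms, recent_ms, tolerance=1.5):
    """Performance monitoring: True if recent p95 latency exceeds the
    baseline p95 by more than `tolerance` times (an early warning sign)."""
    return p95(recent_ms) > tolerance * p95(baseline_ms)
```

Comparing against a baseline percentile, rather than a fixed threshold, reflects the point that a decrease in performance often precedes failure. An alert can fire before users see errors.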

Performance & Scalability

  • How many users does your application need to support, both in total and concurrently, and how are they geographically distributed? Have you designed the solution to scale (for example, stateless components where appropriate and automatic detection of new instances)?

    Determine whether the solution has been tested beyond the predicted peak load. This also shows whether the solution has been configured to scale and whether it is architected to avoid platform limits.

    Scalability checklist

  • What are the per-operation latency targets, and acceptable latency range? How many operations/messages per second?

    Reduce chatty interactions between components and services. Avoid designing interactions in which an application is required to make multiple calls to a service (each of which returns a small amount of data), rather than a single call that can return all of the data. Where possible, combine several related operations into a single request when the call is to a service or component that has noticeable latency. This makes it easier to monitor performance and optimize complex operations. For example, use stored procedures in databases to encapsulate complex logic, and reduce the number of round trips and resource locking.

    Scalability checklist - Reduce chatty interactions

  • What, if any, seasonality is there to the solution?  

    Azure Advisor is a personalized cloud consultant that helps you follow best practices to optimize your Azure deployments. It analyzes your resource configuration and usage telemetry and recommends solutions that can help you improve the cost effectiveness, performance, high availability and security of your Azure resources.

    Introduction to Azure Advisor

  • What are the operational constraints and requirements for the system? Are there compliance, location (required geography), regulatory (HIPAA, FedRAMP, etc.), or sovereignty requirements?

    The Microsoft Trust Center provides detailed information about how Microsoft cloud services are secured, how Microsoft protects the privacy of your data, and how the services address compliance requirements.

    Microsoft Trust Center
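The "reduce chatty interactions" guidance above can be illustrated by counting round trips. The following is a minimal sketch built on a hypothetical `InventoryService` that counts calls instead of doing real network I/O. It assumes a fixed, illustrative cost of 40 ms per round trip. A chatty client makes one call per item, while a batched client combines the related lookups into a single request. In a database, a stored procedure plays a similar role by bundling several statements into one round trip.

```python
from dataclasses import dataclass

# Assumed network round-trip cost per call, used only to compare the two designs.
ROUND_TRIP_MS = 40.0


@dataclass
class InventoryService:
    """Hypothetical remote service that counts round trips instead of doing real I/O."""
    stock: dict
    round_trips: int = 0

    def get_stock(self, sku):
        """Fine-grained call: one round trip per item."""
        self.round_trips += 1
        return self.stock.get(sku, 0)

    def get_stock_batch(self, skus):
        """Coarse-grained call: one round trip for all items."""
        self.round_trips += 1
        return {sku: self.stock.get(sku, 0) for sku in skus}


def chatty_lookup(svc, skus):
    """One call per SKU: latency grows linearly with the number of items."""
    return {sku: svc.get_stock(sku) for sku in skus}


def batched_lookup(svc, skus):
    """A single call that returns all the data at once."""
    return svc.get_stock_batch(skus)


if __name__ == "__main__":
    skus = ["a", "b", "c", "d"]
    chatty = InventoryService({"a": 3, "c": 7})
    batched = InventoryService({"a": 3, "c": 7})
    # Both designs return the same data; only the number of round trips differs.
    assert chatty_lookup(chatty, skus) == batched_lookup(batched, skus)
    print("chatty:", chatty.round_trips * ROUND_TRIP_MS, "ms")
    print("batched:", batched.round_trips * ROUND_TRIP_MS, "ms")
```

The trade-off is that a batch call returns more data per request. It pays off when each call carries noticeable latency relative to the size of the payload.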

Security