AWS Troubleshooting – Instance failure.

Brief: 

A few days ago, one of our AWS instances stopped working: it was not reachable over Remote Desktop, and the console showed its status checks failing. This blog post explains how to recover such an instance and get it back online.

How It Works:

Method 1:

Start by installing the latest version of the EC2Config service on the instance, to rule it out as the cause. When the console shows no output, the problem is sometimes with EC2Config itself (for example, a misconfigured configuration file), or Windows failed to boot properly. Use the following procedure to update EC2Config on an Amazon EBS-backed Windows Server instance that you can't connect to using Remote Desktop:

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

2. In the navigation pane, choose Instances.

3. Locate the affected instance. Open the context (right-click) menu for the instance, choose Instance State, and then choose Stop.

Warning:
When you stop an instance, the data on any instance store volumes is erased. Therefore, if you have any data on instance store volumes that you want to keep, be sure to back it up to persistent storage.


4. Choose Launch Instance and create a temporary t2.micro instance in the same Availability Zone as the affected instance. Use a Windows Server 2003 Amazon Machine Image (AMI). If you use a later version of Windows Server, you won't be able to boot the original instance when you restore its root volume. To find an AMI for Windows Server 2003, search for public Windows AMIs with the name Windows_Server-2003-R2_SP2.

Important:
If you do not create the instance in the same Availability Zone as the affected instance, you will not be able to attach the root volume of the affected instance to the new instance.


5. In the EC2 console, choose Volumes.

6. Locate the root volume of the affected instance. Detach the volume and attach it to the temporary instance you created earlier. Attach it with the default device name (xvdf).

7. Use Remote Desktop to connect to the temporary instance, and then use the Disk Management utility to make the volume available for use.

8. Download the latest EC2Config from Amazon Windows EC2Config Service. Extract the files from the .zip file to the Temp directory on the drive you attached.

9. On the temporary instance, open the Run dialog box, type regedit, and press Enter.

10. Choose HKEY_LOCAL_MACHINE. From the File menu, choose Load Hive. Choose the drive and then navigate to and open the following file: Windows\System32\config\SOFTWARE. When prompted, specify a key name.

11. Select the key you just loaded and navigate to Microsoft\Windows\CurrentVersion. Choose the RunOnce key. If this key doesn't exist, choose CurrentVersion from the context (right-click) menu, choose New and then choose Key. Name the key RunOnce.

12. Open the context (right-click) menu for the RunOnce key, choose New, and then choose String Value. Enter Ec2Install as the name and C:\Temp\Ec2Install.exe /quiet as the data.

13. Under the key you loaded in step 10, choose Microsoft\Windows NT\CurrentVersion\Winlogon (that is, HKEY_LOCAL_MACHINE\<your key name>\Microsoft\Windows NT\CurrentVersion\Winlogon, so that the change lands in the affected instance's registry rather than the temporary instance's own). Open the context (right-click) menu, choose New, and then choose String Value. Enter AutoAdminLogon as the name and 1 as the value data.

14. On the same Winlogon key, open the context (right-click) menu, choose New, and then choose String Value. Enter DefaultUserName as the name and Administrator as the value data.

15. On the same Winlogon key, open the context (right-click) menu, choose New, and then choose String Value. Enter DefaultPassword as the name and enter a password as the value data.

16. In the Registry Editor navigation pane, choose the temporary key that you created when you first opened the Registry Editor.

17. From the File menu, choose Unload Hive.

18. In the Disk Management utility, choose the drive you attached earlier, open the context (right-click) menu, and choose Offline.

19. In the Amazon EC2 console, detach the affected volume from the temporary instance and reattach it to your instance with the device name /dev/sda1. You must specify this device name to designate the volume as a root volume.

20. Start the instance.

21. After the instance starts, check the system log and verify that you see the message Windows is ready to use.

22. Open Registry Editor and choose HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon. Delete the string values you created earlier: AutoAdminLogon, DefaultUserName, and DefaultPassword.

23. Delete or stop the temporary instance you created in this procedure.
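
The registry edits in steps 10 to 17 can also be expressed as reg.exe commands instead of clicking through Registry Editor. The sketch below only builds the command lines as strings; it does not run anything. It assumes the affected volume is mounted as D: and that the hive is loaded under the name TempHive. Both names, as well as the function name, are illustrative and not part of the original procedure. It also writes the Winlogon values under the loaded hive, on the assumption that they need to end up in the affected instance's registry.

```python
def build_reg_commands(password, drive="D:", hive="TempHive"):
    """Return reg.exe command lines mirroring steps 10-17.

    Assumptions (not from the original procedure): the affected volume is
    mounted as `drive`, and the SOFTWARE hive is loaded as HKLM\\<hive>.
    """
    root = "HKLM\\" + hive
    run_once = root + r"\Microsoft\Windows\CurrentVersion\RunOnce"
    winlogon = root + r"\Microsoft\Windows NT\CurrentVersion\Winlogon"
    hive_file = drive + r"\Windows\System32\config\SOFTWARE"

    def add(key, name, data):
        # /t REG_SZ creates a string value; /f overwrites without prompting.
        # "reg add" also creates the key itself if it does not exist yet.
        return f'reg add "{key}" /v {name} /t REG_SZ /d "{data}" /f'

    return [
        f'reg load {root} "{hive_file}"',                                # step 10
        add(run_once, "Ec2Install", r"C:\Temp\Ec2Install.exe /quiet"),  # steps 11-12
        add(winlogon, "AutoAdminLogon", "1"),                           # step 13
        add(winlogon, "DefaultUserName", "Administrator"),              # step 14
        add(winlogon, "DefaultPassword", password),                     # step 15
        f"reg unload {root}",                                           # steps 16-17
    ]


if __name__ == "__main__":
    # The password here is a placeholder; substitute your own.
    for command in build_reg_commands("example-password"):
        print(command)
```

The printed lines could then be reviewed and pasted into an elevated Command Prompt on the temporary instance. Remember that step 22 still applies afterwards: the three Winlogon values should be deleted once the instance is back.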

Method 2:

If the above doesn't work, I would suggest using the AWS EC2 Savior tool to check whether the settings within the instance are at their defaults. Follow the steps below:

1. Launch a new Windows 2008 R2 instance just for recovery purposes, in the same Availability Zone as the problematic instance so that its volume can be attached later.

2. Stop the problematic instance and create an AMI (Amazon Machine Image) of it as a backup.

3. Install the .NET Framework 3.5 features on the Windows 2008 recovery instance using the following command: DISM /online /enable-feature /FeatureName:NetFx3

4. Download and unzip the AWS EC2 Savior tool on the recovery instance: http://ec2-downloads-windows.s3.amazonaws.com/AWSDiagnostics/EC2Savior.zip

5. In the AWS Management Console, find the root volume of your problematic instance in the EC2 Volumes menu and detach it from the instance.

6. Attach the volume to the recovery instance with the default device name (usually xvdf).

7. RDP to the recovery instance. You should now have the root volume of your problematic instance as the D:\ drive. If you don't see the new volume, open Windows Disk Management (diskmgmt.msc), select the disk, open the context (right-click) menu, and choose Online.

8. Open the AWS EC2 Savior on the recovery instance and click on the following options: Turn Firewall OFF, Enable RDP and Set Auto Start and Enable DHCP.

9. Close AWS EC2 Savior.

10. Detach the volume from the recovery instance, attach it to the original instance as /dev/sda1, and start that instance.
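
The volume shuffle used in both methods (detach the root volume, attach it to a helper instance as xvdf, then reattach it to the original instance as /dev/sda1) can also be scripted. Below is a minimal sketch, assuming a boto3-style EC2 client is passed in as the ec2 argument. The function names are hypothetical, and the sketch does not cover bringing the disk online or offline inside Windows.

```python
def stop_and_wait(ec2, instance_id):
    """Stop an instance and block until it reports the 'stopped' state."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])


def move_volume(ec2, volume_id, source_instance_id, target_instance_id, device):
    """Detach an EBS volume from one instance and attach it to another.

    Both instances must be in the same Availability Zone, and the source
    instance should already be stopped (or the disk taken offline) so the
    volume is not in active use.
    """
    ec2.detach_volume(VolumeId=volume_id, InstanceId=source_instance_id)
    # Wait until the volume is fully detached before attaching it elsewhere.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    ec2.attach_volume(
        VolumeId=volume_id,
        InstanceId=target_instance_id,
        Device=device,  # "xvdf" on the helper, "/dev/sda1" back on the original
    )
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
```

With boto3 installed and credentials configured, one might create the client with boto3.client("ec2"), call stop_and_wait for the problematic instance, then call move_volume with "xvdf" to hand the volume to the recovery instance, and later call it again with "/dev/sda1" to hand it back. All instance and volume IDs are your own; none are given in this post.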
