Top 20 AWS Interview Questions for L1/L2 Cloud Engineer Positions

If you are preparing for an interview as an AWS Cloud Engineer (L1/L2), focus on the core services you will be expected to operate day to day. AWS holds a significant share of the cloud market (approximately 31% as of Q1 2024), so a solid grasp of its fundamental services and concepts is vital. The following questions cover compute, storage, networking, security, monitoring, databases, load balancing, and DNS.

1. How would you troubleshoot an EC2 instance that is unreachable (e.g., cannot SSH in)? What AWS features and logs would you check?

When you cannot SSH into an EC2 instance, follow these steps:

Check Security Group Rules:
- Go to the EC2 console, select your instance, and check the associated security group.
- Ensure that there is an inbound rule allowing SSH (port 22) from your IP address. For example, a rule might look like:
  - Type: SSH
  - Protocol: TCP
  - Port Range: 22
  - Source: Your IP address (e.g., 203.0.113.0/32)
Network ACLs:
- Check the Network ACLs associated with the subnet. Because Network ACLs are stateless, both directions must be allowed explicitly: an inbound rule for TCP port 22 and an outbound rule for the return traffic on ephemeral ports (typically 1024-65535).
Instance Status Checks:
- In the EC2 console, check the instance’s status checks. There are two types:
  - System Status Check: Indicates if the underlying hardware is functioning properly.
  - Instance Status Check: Indicates if the operating system is running correctly.
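- If the system status check fails, stop and start the instance (for EBS-backed instances) so that it moves to healthy hardware; a reboot keeps it on the same host. If the instance status check fails, reboot the instance or investigate operating system problems such as a full disk, a failed boot, or a misconfigured network.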
Elastic IP:
- If your instance uses an Elastic IP, ensure that it is still associated with the instance. If the Elastic IP has been disassociated, connections to that address will not reach the instance.
VPC Configuration:
- Verify the VPC settings, including route tables and internet gateway configurations. Ensure that the route table for the subnet has a route to the internet (0.0.0.0/0) via the internet gateway.
Logs:
- Access the instance logs via the EC2 console. You can view the system log or instance console output to look for boot errors or other issues.
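
The security group check above can also be scripted. The following is a minimal sketch that tests whether a set of inbound rules would admit SSH from a given client address. The rule dictionaries are hypothetical sample data shaped like the IpPermissions list that the EC2 API returns for a security group; the function name allows_ssh is an assumption made for this example, and the sketch ignores IPv6 ranges and rules that reference other security groups.

```python
import ipaddress

def allows_ssh(ip_permissions, client_ip, port=22):
    """Return True if any inbound rule permits TCP `port` from `client_ip`."""
    client = ipaddress.ip_address(client_ip)
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        # "-1" means all protocols, which also implies all ports.
        if protocol not in ("tcp", "-1"):
            continue
        if protocol == "tcp" and not (rule["FromPort"] <= port <= rule["ToPort"]):
            continue
        for ip_range in rule.get("IpRanges", []):
            if client in ipaddress.ip_network(ip_range["CidrIp"]):
                return True
    return False

# Hypothetical rules, shaped like the EC2 API's IpPermissions structure.
rules = [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "203.0.113.0/32"}]},
]

print(allows_ssh(rules, "203.0.113.0"))   # True: matches the /32 SSH rule
print(allows_ssh(rules, "198.51.100.7"))  # False: only HTTPS is open to this address
```

If this check passes and the instance is still unreachable, the problem is more likely in the Network ACL, the route table, or the instance itself.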

2. How do you secure access to EC2 instances? Consider aspects like SSH key pairs, Security Groups, IAM roles, and bastion hosts.

To secure access to your EC2 instances, consider the following practices:

SSH Key Pairs:
- Generate SSH key pairs using the AWS Management Console or command line. Store the private key securely and use it to log in to your instance. Avoid using passwords for SSH access.
Security Groups:
- Create security groups that restrict access to only necessary IP addresses. For example, allow SSH access only from your office IP or VPN.
IAM Roles:
- Assign IAM roles to your EC2 instances to grant permissions to access other AWS services without hardcoding credentials. This enhances security by following the principle of least privilege.
Bastion Hosts:
- Use a bastion host (also known as a jump box) in a public subnet to provide SSH access to private instances. This adds an additional layer of security as direct access to private instances is restricted.

3. What is the difference between an EC2 instance store volume and an EBS volume? When might you choose one over the other?

Instance Store:
- Characteristics:
  - Temporary storage physically attached to the host machine.
  - Data is lost when the instance is stopped, hibernated, or terminated, or when the underlying disk fails. Data survives a reboot.
  - Provides high I/O performance, making it suitable for workloads requiring fast access to data (e.g., caches, temporary data).
- Use Cases:
  - High-performance computing, temporary data processing, and caching.
EBS Volume:
- Characteristics:
  - Persistent block storage that remains available even if the instance is stopped.
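  - Data can outlive the instance: a volume is retained after termination when its DeleteOnTermination attribute is false. Root volumes are deleted on termination by default, so change this attribute if the root volume must be kept.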
  - Offers various volume types (e.g., SSD for performance, HDD for throughput).
- Use Cases:
  - Databases, application data, and any data that needs to persist beyond the instance lifecycle.

4. How can you connect to an EC2 instance in a private subnet with no public IP address?

To access an EC2 instance located in a private subnet without a public IP address, you can use several methods:

VPN Connection:
- Set up a VPN connection between your on-premises network and your AWS VPC. This allows secure access to resources in the private subnet.
AWS Direct Connect:
- Use AWS Direct Connect to establish a dedicated network connection from your premises to AWS, providing secure access to your VPC.
Bastion Host:
- Deploy a bastion host in a public subnet. SSH into the bastion host first, then SSH into the private instance from there.
Session Manager:
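- Use AWS Systems Manager Session Manager, which lets you open a shell on the instance without opening inbound SSH ports or managing SSH keys. The instance needs the SSM Agent, an IAM instance profile that permits Systems Manager (for example, the AmazonSSMManagedInstanceCore managed policy), and outbound HTTPS connectivity to the Systems Manager endpoints through a NAT gateway or VPC interface endpoints.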

5. What is an AWS VPC and why is it important for AWS deployments?

Definition:
- A VPC is a logically isolated section of the AWS cloud where you can define and control a virtualized network environment.
Importance:
- Isolation: Provides a secure and isolated environment for resources.
- Customization: Allows you to configure IP address ranges, subnets, route tables, and network gateways.
- Control: Enables you to control inbound and outbound traffic using security groups and network ACLs.
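
To make the customization point concrete, here is a small sketch of subnet planning using Python's standard ipaddress module. The VPC range 10.0.0.0/16 and the subnet names are assumptions chosen for illustration. The constant reflects the fact that AWS reserves five addresses in every subnet (the first four and the last one).

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")  # hypothetical VPC CIDR

# Split the VPC range into /24 subnets and take the first four.
subnets = list(vpc.subnets(new_prefix=24))[:4]

# AWS reserves the first four addresses and the last address in each subnet.
AWS_RESERVED_PER_SUBNET = 5

# Hypothetical subnet names: two public and two private, across two AZs.
names = ["public-a", "public-b", "private-a", "private-b"]

for name, net in zip(names, subnets):
    usable = net.num_addresses - AWS_RESERVED_PER_SUBNET
    print(f"{name}: {net} ({usable} usable addresses)")
```

Each /24 subnet therefore offers 251 usable addresses rather than 256, which matters when sizing subnets for Auto Scaling groups or container workloads.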

6. How do Security Groups differ from Network ACLs in a VPC? When would you use each?

Security Groups:
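- Stateful: If you allow an incoming request, the response is automatically allowed, regardless of outbound rules.
- Allow Rules Only: You can only specify what is permitted; anything not explicitly allowed is denied.
- Instance Level: Attached to an instance’s network interfaces, so they apply to individual instances.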
- Use Cases: Ideal for defining permissions for instances (e.g., allowing HTTP/HTTPS traffic to a web server).
Network ACLs:
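- Stateless: Inbound and outbound traffic are evaluated separately, so return traffic must be allowed explicitly.
- Allow and Deny Rules: Rules are evaluated in order of rule number, lowest first, and the first matching rule applies.
- Subnet Level: Applied to entire subnets.
- Use Cases: Useful for broader security controls, such as blocking specific IP ranges or protocols across every instance in a subnet.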
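
The stateless behavior of Network ACLs is easier to see in a small model. The sketch below is a simplified simulation, not an AWS API: the rule format and the function name nacl_allows are assumptions for this example. It evaluates rules in ascending rule-number order, applies the first match, and falls back to the implicit deny.

```python
import ipaddress

def nacl_allows(rules, protocol, port, remote_ip):
    """Evaluate NACL-style rules: lowest rule number first, first match wins."""
    ip = ipaddress.ip_address(remote_ip)
    for rule in sorted(rules, key=lambda r: r["number"]):
        low, high = rule["ports"]
        if (rule["protocol"] == protocol
                and low <= port <= high
                and ip in ipaddress.ip_network(rule["cidr"])):
            return rule["action"] == "allow"
    return False  # implicit final rule: deny anything that matched nothing

# Hypothetical inbound rules: block one range, then allow SSH from anywhere.
inbound = [
    {"number": 90, "protocol": "tcp", "ports": (22, 22),
     "cidr": "198.51.100.0/24", "action": "deny"},
    {"number": 100, "protocol": "tcp", "ports": (22, 22),
     "cidr": "0.0.0.0/0", "action": "allow"},
]

# Outbound rules without and with the ephemeral port range for return traffic.
outbound_without_ephemeral = [
    {"number": 100, "protocol": "tcp", "ports": (443, 443),
     "cidr": "0.0.0.0/0", "action": "allow"},
]
outbound_with_ephemeral = outbound_without_ephemeral + [
    {"number": 110, "protocol": "tcp", "ports": (1024, 65535),
     "cidr": "0.0.0.0/0", "action": "allow"},
]

print(nacl_allows(inbound, "tcp", 22, "203.0.113.10"))   # True
print(nacl_allows(inbound, "tcp", 22, "198.51.100.7"))   # False: rule 90 denies first
# The SSH reply goes to the client's ephemeral port (50000 here):
print(nacl_allows(outbound_without_ephemeral, "tcp", 50000, "203.0.113.10"))  # False
print(nacl_allows(outbound_with_ephemeral, "tcp", 50000, "203.0.113.10"))     # True
```

A security group would not need the outbound rule at all, because it tracks the connection and lets the reply through automatically.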

7. Your application runs in a private subnet but needs to reach the internet (e.g., to download updates). What AWS service do you configure to enable outbound internet access?

To enable instances in a private subnet to access the internet:

NAT Gateway:
- Create a NAT Gateway in a public subnet and give it an Elastic IP. Configure the route table for the private subnet to direct outbound traffic (0.0.0.0/0) to the NAT Gateway. This allows instances to initiate outbound connections while blocking connections initiated from the internet.
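
The routing decision can be sketched as follows. VPC route tables choose the most specific matching route (longest prefix match), so traffic inside the VPC stays local while everything else goes to the NAT Gateway. The CIDR ranges, the NAT gateway ID, and the function name pick_route are hypothetical values for illustration.

```python
import ipaddress

def pick_route(route_table, destination_ip):
    """Return the target of the most specific route that matches the destination."""
    destination = ipaddress.ip_address(destination_ip)
    matches = []
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if destination in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # no route: the packet is dropped
    # The longest prefix is the most specific route.
    return max(matches)[1]

# Hypothetical route table for a private subnet.
private_route_table = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "nat-0123456789abcdef0",  # hypothetical NAT gateway ID
}

print(pick_route(private_route_table, "10.0.2.15"))      # local
print(pick_route(private_route_table, "93.184.216.34"))  # nat-0123456789abcdef0
```

The public subnet that hosts the NAT Gateway needs its own default route to an internet gateway; without it, the NAT Gateway cannot forward traffic.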

8. How can you connect resources in two different VPCs or accounts so they can communicate securely?

To facilitate communication between resources in different VPCs or AWS accounts:

VPC Peering:
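- Establish a VPC peering connection, allowing resources in different VPCs (including VPCs in other accounts or Regions) to communicate over private IP addresses. The VPC CIDR ranges must not overlap, each side needs routes pointing at the peering connection, and security groups must allow the traffic. Peering is not transitive: if A peers with B and B peers with C, A cannot reach C through B.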
AWS Transit Gateway:
- Use a Transit Gateway to connect multiple VPCs and on-premises networks. It simplifies management and scales easily as you add more connections.
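
Two peering constraints are worth checking before you design the network: the VPC CIDR ranges must not overlap, and peering is not transitive. The sketch below illustrates both with the standard ipaddress module. The VPC names, CIDR ranges, and function names are assumptions made for this example.

```python
import ipaddress

def can_peer(cidr_a, cidr_b):
    """VPC peering requires non-overlapping CIDR ranges."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))

print(can_peer("10.0.0.0/16", "10.1.0.0/16"))    # True
print(can_peer("10.0.0.0/16", "10.0.128.0/17"))  # False: second range is inside the first

# Hypothetical peering connections: A <-> B and B <-> C, but not A <-> C.
peerings = {frozenset({"vpc-a", "vpc-b"}), frozenset({"vpc-b", "vpc-c"})}

def can_communicate(vpc_x, vpc_y):
    """Peering is not transitive: only directly peered VPCs can talk."""
    return frozenset({vpc_x, vpc_y}) in peerings

print(can_communicate("vpc-a", "vpc-b"))  # True
print(can_communicate("vpc-a", "vpc-c"))  # False: no direct peering
```

When the number of VPCs grows and a full mesh of peering connections becomes hard to manage, a Transit Gateway acts as a central hub instead.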

9. What is AWS IAM and why is it crucial for AWS operations?

Definition:
- IAM is a service that enables you to manage access to AWS services and resources securely.
Importance:
- Access Management: Control who can access what resources.
- Security: Helps enforce security best practices by implementing the principle of least privilege.
- Auditability: Works with AWS CloudTrail, which records the API calls made by IAM users, roles, and services, aiding compliance and security audits.

10. What’s the difference between an IAM user, group, and role? When would you use an IAM role?

IAM User:
- Represents an individual who needs access to AWS resources. Users have their own credentials (username and password or access keys).
IAM Group:
- A collection of IAM users that share the same permissions. Groups simplify permission management by allowing you to apply policies to multiple users at once.
IAM Role:
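- A set of permissions that can be assumed by AWS services, users, or applications. A role has no long-term credentials of its own; whoever assumes it receives temporary security credentials from AWS STS. Use a role to let an AWS service (like EC2) access other AWS resources, to grant cross-account access, or to give federated users access without creating IAM users.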
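
An IAM role is defined by two policy documents: a trust policy that says who may assume the role, and a permissions policy that says what the role may do. The sketch below builds both as Python dictionaries for an EC2 instance that needs read access to one bucket. The bucket name example-app-bucket is hypothetical, and the variable names are assumptions for this example.

```python
import json

# Trust policy: allows the EC2 service to assume the role on behalf of an instance.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions policy: least privilege, read-only access to objects in one bucket.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-app-bucket/*",  # hypothetical bucket
    }],
}

print(json.dumps(trust_policy, indent=2))
print(json.dumps(permissions_policy, indent=2))
```

With a role like this attached through an instance profile, the application on the instance obtains temporary credentials automatically, so no access keys need to be stored on disk.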

11. What is Amazon CloudWatch, and what types of metrics or logs can it collect?

Definition:
- CloudWatch is a monitoring and observability service that provides data and insights to monitor AWS resources and applications.
Metrics and Logs:
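- Metrics: Collects metrics such as CPU utilization, disk I/O, network traffic, and custom application metrics. For EC2, memory and disk space utilization are not collected by default and require the CloudWatch agent.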
- Logs: Captures log files from AWS services, allowing you to monitor and troubleshoot applications. You can set up log groups and streams to organize logs.

12. How do you set up a CloudWatch alarm to notify you if an EC2 instance’s CPU usage remains above a threshold?

To create a CloudWatch alarm for CPU usage:

- Access CloudWatch Console: Go to the CloudWatch service in the AWS Management Console.
- Create Alarm: Click on “Alarms” and then “Create Alarm.”
- Select Metric: Choose the EC2 metric (e.g., CPUUtilization) for the instance from the list of available metrics.
- Define Threshold: Set the statistic, period, and condition for the alarm (e.g., average “greater than 80%”), and set how many consecutive periods must breach the threshold so that the alarm fires only when usage remains high.
- Configure Actions: Specify what happens when the alarm state changes (e.g., send a notification to an SNS topic).
- Review and Create: Review your settings and create the alarm.
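
The same alarm can be expressed as a single API request. The sketch below builds the parameters for the CloudWatch PutMetricAlarm operation; the instance ID, SNS topic ARN, alarm name, and the helper name cpu_alarm_params are hypothetical. The actual boto3 call is shown only as a comment because it needs AWS credentials.

```python
def cpu_alarm_params(instance_id, sns_topic_arn, threshold=80.0,
                     period=300, evaluation_periods=3):
    """Build PutMetricAlarm parameters for a sustained high-CPU alarm."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period,                          # seconds per datapoint
        "EvaluationPeriods": evaluation_periods,   # consecutive breaching periods
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],           # notified when the alarm fires
    }

params = cpu_alarm_params(
    "i-0123456789abcdef0",                             # hypothetical instance ID
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",   # hypothetical SNS topic
)

# With AWS credentials configured, the request would be sent like this:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)

sustained_seconds = params["Period"] * params["EvaluationPeriods"]
print(f"Alarm fires after {sustained_seconds // 60} minutes above {params['Threshold']}%")
```

With a 300-second period and three evaluation periods, CPU must stay above the threshold for 15 minutes before the alarm changes state, which filters out short spikes.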

13. What is Amazon S3, and what are common use cases for it in cloud operations?

Definition:
- S3 is an object storage service that provides highly scalable, durable, and secure storage for data.
Common Use Cases:
- Data Backup: Storing backups of applications and databases.
- Content Distribution: Hosting static websites and serving media files.
- Big Data Analytics: Storing large datasets for analytics and processing.
- Data Archiving: Long-term storage of infrequently accessed data.

14. How do you secure data in an S3 bucket? Explain bucket policies, ACLs, and IAM permissions for S3.

To secure data in S3:

Bucket Policies:
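- Resource-based JSON policies attached to the bucket. They specify which principals (including other accounts) can perform which actions (e.g., read, write) on the bucket and its objects, optionally under conditions such as source IP or encrypted transport.
Access Control Lists (ACLs):
- A legacy mechanism that grants basic read/write permissions on buckets and objects to specific AWS accounts or predefined groups. AWS recommends keeping ACLs disabled, which is the default for new buckets, and using policies instead.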
IAM Permissions:
- Use IAM policies to grant users or roles permissions to access S3 buckets and objects. This provides fine-grained control over who can access what.
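
As an illustration of a bucket policy, the sketch below builds a commonly used statement that denies any request not sent over TLS, using the aws:SecureTransport condition key. The bucket name and the helper name deny_insecure_transport_policy are hypothetical; the boto3 call appears only as a comment because it needs AWS credentials.

```python
import json

def deny_insecure_transport_policy(bucket_name):
    """Build a bucket policy that rejects requests made without TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            # Cover both the bucket itself and every object in it.
            "Resource": [bucket_arn, f"{bucket_arn}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }

policy = deny_insecure_transport_policy("example-app-bucket")  # hypothetical bucket
print(json.dumps(policy, indent=2))

# With AWS credentials configured, the policy could be applied like this:
#   import boto3
#   boto3.client("s3").put_bucket_policy(
#       Bucket="example-app-bucket", Policy=json.dumps(policy))
```

Because an explicit Deny overrides any Allow, this statement applies even to principals whose IAM policies grant them full access to the bucket.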

15. What is Amazon RDS, and how does it differ from running a database on EC2?

Definition:
- RDS is a managed database service that simplifies the setup, operation, and scaling of relational databases.
Differences from EC2:
- Management: RDS automates tasks such as backups, patching, and scaling, reducing operational overhead.
- Performance: RDS provides optimized configurations for various database engines (e.g., MySQL, PostgreSQL).
- High Availability: RDS supports Multi-AZ deployments for automatic failover and read replicas for scaling reads.

16. How do you ensure high availability and backups for an RDS database instance? (Hint: Multi-AZ, read replicas, snapshots)

To ensure high availability and backups for RDS:

Multi-AZ Deployments:
- In a Multi-AZ deployment, RDS keeps a synchronously replicated standby in a different Availability Zone (AZ). If the primary instance or its AZ fails, RDS automatically fails over to the standby and the database endpoint stays the same.
Read Replicas:
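- Create read replicas to offload read traffic from the primary instance, improving performance for read-heavy applications. Replication is asynchronous, so replicas can lag slightly behind the primary; a replica can be promoted to a standalone instance if needed.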
Snapshots:
- RDS allows you to take automated backups and manual snapshots. Automated backups enable point-in-time recovery within the backup retention period (up to 35 days), while manual snapshots are retained until you delete them.

17. How does an AWS Elastic Load Balancer work? What are the differences between an Application Load Balancer and a Network Load Balancer?

Definition:
- ELB automatically distributes incoming application traffic across multiple targets, such as EC2 instances, containers, and IP addresses.
Types of Load Balancers:
- Application Load Balancer (ALB): Operates at the application layer (HTTP/HTTPS). It supports advanced routing features such as host-based and path-based routing, making it ideal for microservices architectures.
- Network Load Balancer (NLB): Operates at the transport layer (TCP, UDP, and TLS). It is designed for high-performance applications that require ultra-low latency, can handle millions of requests per second, and supports static IP addresses.
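
Path-based routing on an ALB can be modeled in a few lines. Listener rules are evaluated in priority order, lowest value first, and a default rule handles anything that matches nothing. The sketch below is a simplified simulation, not the ELB API: the target group names and the function name route_request are assumptions, and fnmatch is used only as an approximation of ALB wildcard matching.

```python
from fnmatch import fnmatchcase

def route_request(rules, path, default_target):
    """Return the target group of the first matching rule, by ascending priority."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if fnmatchcase(path, rule["path_pattern"]):
            return rule["target_group"]
    return default_target  # the listener's default rule

# Hypothetical listener rules for a microservices application.
rules = [
    {"priority": 10, "path_pattern": "/api/*", "target_group": "api-service"},
    {"priority": 5, "path_pattern": "/api/admin/*", "target_group": "admin-service"},
    {"priority": 20, "path_pattern": "/images/*", "target_group": "static-assets"},
]

print(route_request(rules, "/api/v1/users", "web-frontend"))     # api-service
print(route_request(rules, "/api/admin/users", "web-frontend"))  # admin-service
print(route_request(rules, "/index.html", "web-frontend"))       # web-frontend
```

The more specific /api/admin/* rule needs the lower priority value; otherwise the broader /api/* rule would match first and the admin service would never receive traffic.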

18. Describe how an Auto Scaling group works. How do you configure it to handle changes in traffic load?

Definition:
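- An Auto Scaling group manages a fleet of EC2 instances launched from a launch template. It keeps the number of instances between a minimum and a maximum size and adjusts the desired capacity based on demand.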
Configuration:
- Scaling Policies: Define rules for scaling in and out based on metrics (e.g., CPU utilization, request count).
- Health Checks: Automatically replace unhealthy instances to ensure availability.
- Scheduled Scaling: Scale based on predictable traffic patterns (e.g., scaling up during business hours).
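
To show how a metric-based policy translates into capacity, here is a simplified proportional model of target tracking. It is an approximation for intuition only, not the exact algorithm AWS uses: real policies also account for cooldowns, instance warm-up, and more conservative scale-in. The function name desired_capacity and the numbers are assumptions.

```python
import math

def desired_capacity(current_capacity, metric_value, target_value, min_size, max_size):
    """Scale capacity in proportion to how far the metric is from its target."""
    raw = math.ceil(current_capacity * metric_value / target_value)
    # The group never goes below its minimum or above its maximum size.
    return max(min_size, min(max_size, raw))

# Hypothetical group: 4 instances, target of 60% average CPU, size limits 2 to 10.
print(desired_capacity(4, 90, 60, min_size=2, max_size=10))  # 6: scale out
print(desired_capacity(4, 30, 60, min_size=2, max_size=10))  # 2: scale in
print(desired_capacity(8, 95, 50, min_size=2, max_size=10))  # 10: capped at the maximum
```

The third call shows why the maximum size matters: the formula asks for 16 instances, but the group stops at 10, which protects against runaway cost while also limiting how much load the group can absorb.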

19. What is Amazon Route 53 and how can it improve application availability (e.g., using health checks and routing policies)?

Definition:
- Route 53 is a scalable domain name system (DNS) web service designed to route users to applications by translating domain names into IP addresses.
Improving Application Availability:
- Health Checks: Monitor the health of resources (e.g., web servers) and route traffic only to healthy endpoints.
- Routing Policies: Use different routing policies (e.g., weighted, latency-based, geolocation) to improve performance and availability.
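
The interaction between health checks and weighted routing can be sketched as a small calculation: unhealthy records are removed, and traffic is split among the remaining records in proportion to their weights. This is a simplified model with hypothetical record names and an assumed function name traffic_shares. It does not model what Route 53 does when every record is unhealthy.

```python
def traffic_shares(records):
    """Return the fraction of DNS responses each healthy weighted record receives."""
    healthy = [r for r in records if r["healthy"] and r["weight"] > 0]
    total_weight = sum(r["weight"] for r in healthy)
    if total_weight == 0:
        return {}  # edge case not modeled in this sketch
    return {r["name"]: r["weight"] / total_weight for r in healthy}

# Hypothetical weighted records for a blue/green deployment.
records = [
    {"name": "blue", "weight": 3, "healthy": True},
    {"name": "green", "weight": 1, "healthy": True},
]
print(traffic_shares(records))  # {'blue': 0.75, 'green': 0.25}

# If the blue endpoint fails its health check, green receives all traffic.
records[0]["healthy"] = False
print(traffic_shares(records))  # {'green': 1.0}
```

The same idea underlies failover routing, where a secondary record receives traffic only while the primary record's health check is failing.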

20. When would you use a Route 53 Alias record instead of a CNAME record? What advantages does it offer?

Alias Record:
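- A Route 53 extension to DNS that points a name at AWS resources such as CloudFront distributions, S3 website endpoints, or load balancers. It can be used at the root domain level, also called the zone apex (e.g., example.com), follows changes in the target’s IP addresses automatically, and queries for alias records that point to AWS resources are not charged.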
CNAME Record:
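- Maps one domain name to another (e.g., www.example.com to a load balancer’s DNS name). It cannot be used at the root domain level, because a CNAME cannot coexist with the other records that must exist there (such as NS and SOA), and it requires an additional DNS lookup, which can introduce latency.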
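
An alias record at the zone apex can be created through the Route 53 ChangeResourceRecordSets operation. The sketch below builds the change batch for such a request. The domain name, the load balancer DNS name, the hosted zone IDs, and the helper name alias_upsert are all hypothetical placeholders; the boto3 call appears only as a comment because it needs AWS credentials.

```python
def alias_upsert(record_name, target_dns_name, target_hosted_zone_id):
    """Build a change batch that creates or updates an alias A record."""
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                # Alias records carry no TTL; Route 53 answers with the target's addresses.
                "AliasTarget": {
                    # Hosted zone ID of the target resource, not of your own zone.
                    "HostedZoneId": target_hosted_zone_id,
                    "DNSName": target_dns_name,
                    "EvaluateTargetHealth": True,
                },
            },
        }]
    }

batch = alias_upsert(
    "example.com",                                   # zone apex
    "my-alb-1234567890.us-east-1.elb.amazonaws.com",  # hypothetical load balancer name
    "Z0000000EXAMPLE",                               # hypothetical target zone ID
)
print(batch["Changes"][0]["ResourceRecordSet"]["Type"])  # A

# With AWS credentials configured, the request would be sent like this:
#   import boto3
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="<your hosted zone ID>", ChangeBatch=batch)
```

The record type is A rather than CNAME, which is what allows it to sit at example.com alongside the zone’s NS and SOA records.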

BIBEK ARYAL