Deploying DeepSeek AI on Alibaba Cloud Singapore with Autoscaling and Load Balancing
In this comprehensive guide, we'll walk through deploying DeepSeek AI on Alibaba Cloud's Singapore region (ap-southeast-1) with a robust, scalable architecture. We'll create a secure VPC environment, set up DeepSeek with a web UI, implement autoscaling with spot instances, and secure the deployment behind a load balancer.
Why Singapore Region?
The Singapore region offers several advantages for AI workloads:
- Low latency for Southeast Asian users
- Strong data privacy regulations compliance
- Availability of GPU instances (ecs.gn6v, ecs.gn7i)
- High availability with multiple zones
- Excellent connectivity to global internet infrastructure
Architecture Overview
Our final architecture will include:
- Virtual Private Cloud (VPC) for network isolation
- ECS instances running DeepSeek with web UI
- Custom image for rapid scaling
- Auto Scaling Group with spot instances
- Server Load Balancer (SLB) for high availability
- Security groups for access control
Step 1: Creating a VPC on Alibaba Cloud
First, we'll establish our network foundation with a Virtual Private Cloud.
Using the Console
- Log into the Alibaba Cloud Console and open the Virtual Private Cloud service
Create VPC
VPC Name: deepseek-vpc
IPv4 CIDR Block: 10.0.0.0/16
Resource Group: Default
Create VSwitch (Subnet)
VSwitch Name: deepseek-public-subnet
Zone: ap-southeast-1a (Singapore Zone A)
IPv4 CIDR Block: 10.0.1.0/24
Using Terraform (Alternative)
resource "alicloud_vpc" "deepseek_vpc" {
vpc_name = "deepseek-vpc"
cidr_block = "10.0.0.0/16"
}
resource "alicloud_vswitch" "public_subnet" {
vpc_id = alicloud_vpc.deepseek_vpc.id
cidr_block = "10.0.1.0/24"
zone_id = "ap-southeast-1a"
}
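To apply this configuration, run the standard Terraform workflow. This assumes the alicloud provider is configured for ap-southeast-1, with credentials supplied through the provider's documented environment variables (ALICLOUD_ACCESS_KEY, ALICLOUD_SECRET_KEY, ALICLOUD_REGION):
# Export ALICLOUD_ACCESS_KEY and ALICLOUD_SECRET_KEY before running
export ALICLOUD_REGION=ap-southeast-1
terraform init    # download the alicloud provider
terraform plan    # review the VPC and VSwitch to be created
terraform apply   # create the resources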
Step 2: Building an ECS Instance with DeepSeek and UI
Now we'll create our base ECS instance with DeepSeek installed and configured with a web interface.
Create Security Group
First, create a security group for our instances:
# Using Alibaba Cloud CLI
aliyun ecs CreateSecurityGroup \
--RegionId ap-southeast-1 \
--GroupName deepseek-sg \
--Description "Security group for DeepSeek instances" \
--VpcId vpc-xxxxxxxxx
Add necessary rules:
# SSH access (restrict to your IP)
aliyun ecs AuthorizeSecurityGroup \
--SecurityGroupId sg-xxxxxxxxx \
--IpProtocol tcp \
--PortRange 22/22 \
--SourceCidrIp YOUR_IP/32
# HTTP access for web UI
aliyun ecs AuthorizeSecurityGroup \
--SecurityGroupId sg-xxxxxxxxx \
--IpProtocol tcp \
--PortRange 8000/8000 \
--SourceCidrIp 0.0.0.0/0
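Before launching instances, you can confirm the rules were applied; a quick check with the CLI, using the same placeholder IDs:
# List the rules attached to the security group
aliyun ecs DescribeSecurityGroupAttribute \
--RegionId ap-southeast-1 \
--SecurityGroupId sg-xxxxxxxxx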
Launch ECS Instance
Create an ECS instance with the following specifications, optimized for the Singapore region (a CLI sketch follows the list):
Instance Type: ecs.g6.2xlarge (8 vCPU, 32GB RAM) or ecs.gn6v-c8g1.2xlarge (GPU enabled)
Image: Ubuntu 20.04 LTS
System Disk: 100GB SSD
VPC: deepseek-vpc
VSwitch: deepseek-public-subnet
Security Group: deepseek-sg
Zone: ap-southeast-1a
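If you prefer the CLI to the console, the sketch below launches an instance matching the specifications above; the instance name is just a suggestion, and the exact Ubuntu 20.04 image ID should be looked up first (confirm parameter names with aliyun ecs RunInstances --help):
# Find the current Ubuntu 20.04 public image ID
aliyun ecs DescribeImages \
--RegionId ap-southeast-1 \
--ImageOwnerAlias system \
--OSType linux
# Launch the base instance (substitute the image ID returned above)
aliyun ecs RunInstances \
--RegionId ap-southeast-1 \
--ImageId <ubuntu-20.04-image-id> \
--InstanceType ecs.g6.2xlarge \
--SecurityGroupId sg-xxxxxxxxx \
--VSwitchId vsw-xxxxxxxxx \
--InstanceName deepseek-base \
--SystemDisk.Category cloud_ssd \
--SystemDisk.Size 100 \
--InternetMaxBandwidthOut 10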
Install DeepSeek with Web UI
SSH into your instance and run the following setup script:
#!/bin/bash
# Update system
sudo apt update && sudo apt upgrade -y
# Install Python and pip
sudo apt install python3 python3-pip python3-venv git -y
# Install NVIDIA drivers (only needed on GPU instance types such as ecs.gn6v / ecs.gn7i)
sudo apt install ubuntu-drivers-common -y
sudo ubuntu-drivers autoinstall
# Create virtual environment
python3 -m venv deepseek-env
source deepseek-env/bin/activate
# Install DeepSeek dependencies
pip install torch torchvision torchaudio
pip install transformers accelerate
pip install gradio fastapi uvicorn
# Clone or install DeepSeek
git clone https://github.com/deepseek-ai/DeepSeek-Coder.git
cd DeepSeek-Coder
# Install requirements
pip install -r requirements.txt
# Create web UI script
cat > deepseek_ui.py << 'EOF'
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load DeepSeek model (trust_remote_code follows the model card's loading example)
model_name = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

def generate_code(prompt, max_length=512, temperature=0.7):
    inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=max_length,
            temperature=temperature,
            pad_token_id=tokenizer.eos_token_id,
            do_sample=True
        )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response[len(prompt):]

# Create Gradio interface
iface = gr.Interface(
    fn=generate_code,
    inputs=[
        gr.Textbox(label="Code Prompt", placeholder="Enter your coding request..."),
        gr.Slider(minimum=100, maximum=1024, value=512, label="Max Length"),
        gr.Slider(minimum=0.1, maximum=1.0, value=0.7, label="Temperature")
    ],
    outputs=gr.Textbox(label="Generated Code"),
    title="DeepSeek Code Generator",
    description="Generate code using DeepSeek AI"
)

if __name__ == "__main__":
    iface.launch(server_name="0.0.0.0", server_port=8000)
EOF
# Create systemd service
sudo tee /etc/systemd/system/deepseek.service > /dev/null << 'EOF'
[Unit]
Description=DeepSeek Web UI
After=network.target
[Service]
Type=simple
User=ubuntu
WorkingDirectory=/home/ubuntu/DeepSeek-Coder
Environment=PATH=/home/ubuntu/deepseek-env/bin
ExecStart=/home/ubuntu/deepseek-env/bin/python deepseek_ui.py
Restart=always
[Install]
WantedBy=multi-user.target
EOF
# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable deepseek
sudo systemctl start deepseek
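Once the script finishes, verify the service before moving on. Note that the first run downloads the model weights from Hugging Face, so the UI may take several minutes to become responsive:
# Check the service and hit the UI locally
sudo systemctl status deepseek --no-pager
curl -s http://localhost:8000 | head -n 5   # should return the Gradio page HTML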
Step 3: Creating a Custom Image
Once your DeepSeek instance is configured and running properly, create a custom image for autoscaling.
Using Console
- Go to ECS Console → Instances
- Select your DeepSeek instance
- Click More → Disk and Image → Create Custom Image
Configure:
Image Name: deepseek-base-image
Description: DeepSeek AI with web UI pre-configured
Using CLI
aliyun ecs CreateImage \
--RegionId ap-southeast-1 \
--InstanceId i-xxxxxxxxx \
--ImageName deepseek-base-image \
--Description "DeepSeek AI with web UI pre-configured"
Step 4: Creating Auto Scaling Group with Spot Instances
Now we'll set up autoscaling to handle varying loads efficiently and cost-effectively.
Create Scaling Group
- Navigate to Auto Scaling Console
Configure the scaling group with the following settings (a CLI equivalent follows the list):
Scaling Group Name: deepseek-scaling-group
VPC: deepseek-vpc
VSwitch: deepseek-public-subnet
Min Size: 1
Max Size: 10
Default Cool-down: 300 seconds
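The same group can be created from the CLI; a sketch using the settings above (this uses the single-VSwitch parameter; confirm names with aliyun ess CreateScalingGroup --help):
aliyun ess CreateScalingGroup \
--RegionId ap-southeast-1 \
--ScalingGroupName deepseek-scaling-group \
--MinSize 1 \
--MaxSize 10 \
--DefaultCooldown 300 \
--VSwitchId vsw-xxxxxxxxx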
Create Scaling Configuration
Create a scaling configuration that uses spot instances:
# Create scaling configuration
# (spot price limits are set per instance type; confirm parameter names with
# aliyun ess CreateScalingConfiguration --help)
aliyun ess CreateScalingConfiguration \
--ScalingGroupId asg-xxxxxxxxx \
--ImageId m-xxxxxxxxx \
--InstanceType ecs.g6.2xlarge \
--SecurityGroupId sg-xxxxxxxxx \
--ScalingConfigurationName deepseek-spot-config \
--SpotStrategy SpotWithPriceLimit \
--SpotPriceLimit.1.InstanceType ecs.g6.2xlarge \
--SpotPriceLimit.1.PriceLimit 0.5 \
--InternetMaxBandwidthOut 100 \
--SystemDisk.Category cloud_ssd \
--SystemDisk.Size 100
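A new scaling group stays inactive until it is enabled with an active scaling configuration, so enable it once the configuration above has been created:
# Activate the scaling group with the new configuration
aliyun ess EnableScalingGroup \
--ScalingGroupId asg-xxxxxxxxx \
--ActiveScalingConfigurationId asc-xxxxxxxxx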
Create Scaling Rules
Set up scaling rules based on CPU utilization:
# Scale out rule
aliyun ess CreateScalingRule \
--ScalingGroupId asg-xxxxxxxxx \
--ScalingRuleName scale-out-rule \
--AdjustmentType QuantityChangeInCapacity \
--AdjustmentValue 2 \
--Cooldown 300
# Scale in rule
aliyun ess CreateScalingRule \
--ScalingGroupId asg-xxxxxxxxx \
--ScalingRuleName scale-in-rule \
--AdjustmentType QuantityChangeInCapacity \
--AdjustmentValue -1 \
--Cooldown 300
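Each CreateScalingRule call returns a ScalingRuleAri. Keep these values, since the alarms below reference them, and you can also fire a rule manually to confirm the group scales:
# Manually trigger the scale-out rule once as a test
aliyun ess ExecuteScalingRule \
--ScalingRuleAri <ScalingRuleAri returned by CreateScalingRule>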
Create CloudMonitor Alarm Tasks
Set up alarm tasks that watch CPU utilization and trigger the scaling rules created above. Each alarm references a scaling rule by the ARI returned from CreateScalingRule (confirm parameter names with aliyun ess CreateAlarm --help):
# High CPU alarm (scale out)
aliyun ess CreateAlarm \
--ScalingGroupId asg-xxxxxxxxx \
--Name deepseek-cpu-high \
--MetricName CpuUtilization \
--ComparisonOperator ">=" \
--Threshold 70 \
--EvaluationCount 2 \
--AlarmAction.1 <scale-out rule ARI>
# Low CPU alarm (scale in)
aliyun ess CreateAlarm \
--ScalingGroupId asg-xxxxxxxxx \
--Name deepseek-cpu-low \
--MetricName CpuUtilization \
--ComparisonOperator "<=" \
--Threshold 30 \
--EvaluationCount 3 \
--AlarmAction.1 <scale-in rule ARI>
Step 5: Deploying Behind Server Load Balancer (SLB)
Create a load balancer to distribute traffic across your scaled instances.
Create Server Load Balancer Instance
# Create an internet-facing CLB (classic SLB) instance
aliyun slb CreateLoadBalancer \
--RegionId ap-southeast-1 \
--LoadBalancerName deepseek-slb \
--VpcId vpc-xxxxxxxxx \
--VSwitchId vsw-xxxxxxxxx \
--LoadBalancerSpec slb.s3.medium \
--PayType PayOnDemand
Configure Backend Server Group
# Create VServer Group
aliyun slb CreateVServerGroup \
--LoadBalancerId lb-xxxxxxxxx \
--VServerGroupName deepseek-backend-group
# Add backend servers (this will be automated by Auto Scaling)
aliyun slb SetVServerGroupAttribute \
--VServerGroupId rsp-xxxxxxxxx \
--BackendServers '[{"ServerId":"i-xxxxxxxxx","Port":8000,"Weight":100}]'
Create Listener
# Create HTTP listener
# (health check path "/" returns 200 from the Gradio UI; point it at a
# dedicated /health route instead if you add one to the app)
aliyun slb CreateLoadBalancerHTTPListener \
--LoadBalancerId lb-xxxxxxxxx \
--ListenerPort 80 \
--BackendServerPort 8000 \
--VServerGroupId rsp-xxxxxxxxx \
--HealthCheck on \
--HealthCheckURI / \
--HealthCheckConnectPort 8000
Update Auto Scaling Group
Because the listener above forwards to a VServer group, attach the scaling group to that group (rather than to the load balancer's default backend pool) so instances launched by Auto Scaling are registered on port 8000 automatically. Confirm parameter names with aliyun ess AttachVServerGroups --help:
aliyun ess AttachVServerGroups \
--RegionId ap-southeast-1 \
--ScalingGroupId asg-xxxxxxxxx \
--VServerGroup.1.LoadBalancerId lb-xxxxxxxxx \
--VServerGroup.1.VServerGroupAttribute.1.VServerGroupId rsp-xxxxxxxxx \
--VServerGroup.1.VServerGroupAttribute.1.Port 8000
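Once instances register, check their health as seen by the load balancer:
# Backend health status
aliyun slb DescribeHealthStatus \
--RegionId ap-southeast-1 \
--LoadBalancerId lb-xxxxxxxxx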
Step 6: Securing Private Instance Access
Implement security best practices to protect your DeepSeek deployment.
Network Security
Update Security Groups:
# Create restrictive security group for private instances
aliyun ecs CreateSecurityGroup \
--RegionId ap-southeast-1 \
--GroupName deepseek-private-sg \
--Description "Private security group for DeepSeek" \
--VpcId vpc-xxxxxxxxx
# Allow the app port only from the SLB backend address range
# (CLB does not sit in a security group; it reaches back ends from 100.64.0.0/10)
aliyun ecs AuthorizeSecurityGroup \
--RegionId ap-southeast-1 \
--SecurityGroupId sg-private-xxxxxxxxx \
--IpProtocol tcp \
--PortRange 8000/8000 \
--SourceCidrIp 100.64.0.0/10
Create Private Subnet:
# Create private subnet for backend instances
aliyun vpc CreateVSwitch \
--VpcId vpc-xxxxxxxxx \
--CidrBlock 10.0.2.0/24 \
--ZoneId ap-southeast-1a \
--VSwitchName deepseek-private-subnet
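To actually place autoscaled back ends in this subnet, point the scaling group at the private VSwitch while the load balancer stays in the public subnet. This is a sketch; confirm that your CLI version exposes VSwitchIds.N on ModifyScalingGroup with --help:
# Launch future instances into the private subnet
aliyun ess ModifyScalingGroup \
--ScalingGroupId asg-xxxxxxxxx \
--VSwitchIds.1 vsw-private-xxxxxxxxx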
Access Control
Implement API Authentication. The Gradio UI itself can be protected with iface.launch(auth=...). If you also expose the model through an API endpoint, require a signed JWT on every request. Below is a minimal sketch using FastAPI (installed earlier) and PyJWT (pip install pyjwt); the file name, route path, and claims are illustrative:
# auth_api.py - JWT-protected API wrapper around the model
import os

import jwt
from fastapi import FastAPI, Header, HTTPException

from deepseek_ui import generate_code  # reuses the function defined earlier

app = FastAPI()

def require_auth(token):
    """Reject requests that do not carry a valid HS256-signed JWT."""
    if not token:
        raise HTTPException(status_code=401, detail="No token provided")
    try:
        jwt.decode(token, os.environ["JWT_SECRET"], algorithms=["HS256"])
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")

@app.post("/api/generate")
def generate(prompt: str, authorization: str = Header(None)):
    require_auth(authorization)
    return {"completion": generate_code(prompt)}
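To exercise the protected endpoint, mint a token with the same secret and pass it in the Authorization header. The port 8001 and the /api/generate path below simply match the sketch above:
# Run the API on a separate port from the Gradio UI
export JWT_SECRET=change-me        # use a strong secret in production
uvicorn auth_api:app --host 0.0.0.0 --port 8001 &
# Mint a test token and call the endpoint
TOKEN=$(python3 -c "import jwt, os; print(jwt.encode({'sub': 'ops'}, os.environ['JWT_SECRET'], algorithm='HS256'))")
curl -X POST "http://localhost:8001/api/generate?prompt=write%20a%20quicksort%20in%20python" \
-H "Authorization: $TOKEN"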
Create RAM Roles:
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": ["ecs.aliyuncs.com"]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
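With the trust policy saved locally (for example as trust-policy.json), create the role and attach it to your instances. The role name deepseek-ecs-role is only a suggestion, and you still need to attach permission policies to it separately:
# Create the role with the trust policy above
aliyun ram CreateRole \
--RoleName deepseek-ecs-role \
--AssumeRolePolicyDocument "$(cat trust-policy.json)"
# Attach the role to an ECS instance
aliyun ecs AttachInstanceRamRole \
--RegionId ap-southeast-1 \
--RamRoleName deepseek-ecs-role \
--InstanceIds '["i-xxxxxxxxx"]'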
Monitoring and Logging
Configure Log Service:
# Create log project
aliyun log CreateProject \
--ProjectName deepseek-logs \
--Description "DeepSeek application logs"
# Create log store
aliyun log CreateLogstore \
--ProjectName deepseek-logs \
--LogstoreName app-logs \
--TTL 30 \
--ShardCount 2
Enable CloudMonitor:
# Install CloudMonitor agent on instances
wget http://cms-agent-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com/release/1.3.7/install.sh
sudo bash install.sh
Best Practices and Optimization
Cost Optimization
- Spot Instance Strategy: Use spot instances with price limits to reduce costs by up to 90% (see the price-history check after this list)
- Right-sizing: Monitor resource usage and adjust instance types accordingly
- Reserved Instances: For baseline capacity, consider reserved instances for predictable workloads
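To choose a sensible SpotPriceLimit for the scaling configuration, check recent spot prices for the instance type in your zone:
# Recent spot price history for ecs.g6.2xlarge in Singapore Zone A
aliyun ecs DescribeSpotPriceHistory \
--RegionId ap-southeast-1 \
--ZoneId ap-southeast-1a \
--NetworkType vpc \
--InstanceType ecs.g6.2xlarge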
Performance Optimization
- GPU Instances: For better AI model performance, consider GPU-enabled instances like ecs.gn6v
- Model Caching: Implement model caching to reduce cold start times
- Load Balancer Optimization: Use session affinity if needed for stateful operations
Security Hardening
- Regular Updates: Implement automated patching for OS and dependencies
- Network Segmentation: Use multiple subnets to isolate different tiers
- Encryption: Enable encryption at rest and in transit
- Audit Logging: Enable comprehensive audit logging for compliance
Monitoring and Maintenance
Key Metrics to Monitor
- CPU and memory utilization
- Request latency and throughput
- Error rates and response codes
- Auto scaling events
- Spot instance interruption rates
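Most of these metrics live in the acs_ecs_dashboard CloudMonitor namespace already used for scaling. A quick CLI spot check for CPU on a single instance (the instanceId dimension is a placeholder):
# Latest CPU utilization datapoint for one instance
aliyun cms DescribeMetricLast \
--Namespace acs_ecs_dashboard \
--MetricName CPUUtilization \
--Dimensions '[{"instanceId":"i-xxxxxxxxx"}]'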
Automated Maintenance
# Create maintenance script
cat > /etc/cron.daily/deepseek-maintenance << 'EOF'
#!/bin/bash
# Update system packages non-interactively
apt-get update -qq && DEBIAN_FRONTEND=noninteractive apt-get upgrade -y
# Clean up logs older than 7 days
find /var/log -name "*.log" -mtime +7 -delete
# Restart the DeepSeek service only if it has stopped
# (the unit defines no ExecReload, and a full restart reloads the model)
systemctl is-active --quiet deepseek || systemctl restart deepseek
EOF
chmod +x /etc/cron.daily/deepseek-maintenance
This architecture provides a robust, scalable, and cost-effective way to deploy DeepSeek AI on Alibaba Cloud. The solution offers:
- High Availability: Load balancer distributes traffic across multiple instances
- Cost Efficiency: Spot instances reduce compute costs significantly
- Security: Private networking and proper access controls protect the deployment
- Scalability: Auto scaling handles varying loads automatically
- Maintainability: Standardized images and automation reduce operational overhead
By following this guide, you'll have a production-ready DeepSeek AI deployment that can handle enterprise workloads while maintaining security and cost efficiency.
Remember to regularly review and update your security configurations, monitor costs, and optimize performance based on your specific use case requirements.