Deploying DeepSeek AI on Alibaba Cloud Singapore with Autoscaling and Load Balancing
In this comprehensive guide, we'll walk through deploying DeepSeek AI on Alibaba Cloud's Singapore region (ap-southeast-1) with a robust, scalable architecture. We'll create a secure VPC environment, set up DeepSeek with a web UI, implement autoscaling with spot instances, and secure the deployment behind a load balancer.
Why Singapore Region?
The Singapore region offers several advantages for AI workloads:
- Low latency for Southeast Asian users
- Strong data privacy regulations compliance
- Availability of GPU instances (ecs.gn6v, ecs.gn7i)
- High availability with multiple zones
- Excellent connectivity to global internet infrastructure
Architecture Overview
Our final architecture will include:
- Virtual Private Cloud (VPC) for network isolation
- ECS instances running DeepSeek with web UI
- Custom image for rapid scaling
- Auto Scaling Group with spot instances
- Server Load Balancer (SLB) for high availability
- Security groups for access control
Step 1: Creating a VPC on Alibaba Cloud
First, we'll establish our network foundation with a Virtual Private Cloud.
Using the Console
- Log into the Alibaba Cloud Console and open the Virtual Private Cloud service
Create VPC
VPC Name: deepseek-vpc
IPv4 CIDR Block: 10.0.0.0/16
Resource Group: Default
Create VSwitch (Subnet)
VSwitch Name: deepseek-public-subnet
Zone: ap-southeast-1a (Singapore Zone A)
IPv4 CIDR Block: 10.0.1.0/24
Using Terraform (Alternative)
resource "alicloud_vpc" "deepseek_vpc" {
vpc_name = "deepseek-vpc"
cidr_block = "10.0.0.0/16"
}
resource "alicloud_vswitch" "public_subnet" {
vpc_id = alicloud_vpc.deepseek_vpc.id
cidr_block = "10.0.1.0/24"
zone_id = "ap-southeast-1a"
}
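To apply this configuration, run the standard Terraform workflow. This assumes the alicloud provider is configured for ap-southeast-1, with credentials supplied through the provider's documented environment variables (ALICLOUD_ACCESS_KEY, ALICLOUD_SECRET_KEY, ALICLOUD_REGION):
# Export ALICLOUD_ACCESS_KEY and ALICLOUD_SECRET_KEY before running
export ALICLOUD_REGION=ap-southeast-1
terraform init    # download the alicloud provider
terraform plan    # review the VPC and VSwitch to be created
terraform apply   # create the resources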
Step 2: Building an ECS Instance with DeepSeek and UI
Now we'll create our base ECS instance with DeepSeek installed and configured with a web interface.
Create Security Group
First, create a security group for our instances:
# Using Alibaba Cloud CLI
aliyun ecs CreateSecurityGroup \
--RegionId ap-southeast-1 \
--GroupName deepseek-sg \
--Description "Security group for DeepSeek instances" \
--VpcId vpc-xxxxxxxxx
Add necessary rules:
# SSH access (restrict to your IP)
aliyun ecs AuthorizeSecurityGroup \
--SecurityGroupId sg-xxxxxxxxx \
--IpProtocol tcp \
--PortRange 22/22 \
--SourceCidrIp YOUR_IP/32
# HTTP access for web UI
aliyun ecs AuthorizeSecurityGroup \
--SecurityGroupId sg-xxxxxxxxx \
--IpProtocol tcp \
--PortRange 8000/8000 \
--SourceCidrIp 0.0.0.0/0
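Before launching instances, you can confirm the rules were applied; a quick check with the CLI, using the same placeholder IDs:
# List the rules attached to the security group
aliyun ecs DescribeSecurityGroupAttribute \
--RegionId ap-southeast-1 \
--SecurityGroupId sg-xxxxxxxxx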
Launch ECS Instance
Create an ECS instance with the following specifications, optimized for the Singapore region (a CLI sketch follows the list):
Instance Type: ecs.g6.2xlarge (8 vCPU, 32GB RAM) or ecs.gn6v-c8g1.2xlarge (GPU enabled)
Image: Ubuntu 20.04 LTS
System Disk: 100GB SSD
VPC: deepseek-vpc
VSwitch: deepseek-public-subnet
Security Group: deepseek-sg
Zone: ap-southeast-1a
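If you prefer the CLI to the console, the sketch below launches an instance matching the specifications above; the instance name is just a suggestion, and the exact Ubuntu 20.04 image ID should be looked up first (confirm parameter names with aliyun ecs RunInstances --help):
# Find the current Ubuntu 20.04 public image ID
aliyun ecs DescribeImages \
--RegionId ap-southeast-1 \
--ImageOwnerAlias system \
--OSType linux
# Launch the base instance (substitute the image ID returned above)
aliyun ecs RunInstances \
--RegionId ap-southeast-1 \
--ImageId <ubuntu-20.04-image-id> \
--InstanceType ecs.g6.2xlarge \
--SecurityGroupId sg-xxxxxxxxx \
--VSwitchId vsw-xxxxxxxxx \
--InstanceName deepseek-base \
--SystemDisk.Category cloud_ssd \
--SystemDisk.Size 100 \
--InternetMaxBandwidthOut 10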
Install DeepSeek with Web UI
SSH into your instance and run the following setup script:
#!/bin/bash
# Update system
sudo apt update && sudo apt upgrade -y
# Install Python and pip
sudo apt install python3 python3-pip python3-venv git -y
# Install NVIDIA drivers (only needed on GPU instance types such as ecs.gn6v / ecs.gn7i)
sudo apt install ubuntu-drivers-common -y
sudo ubuntu-drivers autoinstall
# Create virtual environment
python3 -m venv deepseek-env
source deepseek-env/bin/activate
# Install DeepSeek dependencies
pip install torch torchvision torchaudio
pip install transformers accelerate
pip install gradio fastapi uvicorn
# Clone or install DeepSeek
git clone https://github.com/deepseek-ai/DeepSeek-Coder.git
cd DeepSeek-Coder
# Install requirements
pip install -r requirements.txt
# Create web UI script
cat > deepseek_ui.py << 'EOF'
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load DeepSeek model (trust_remote_code follows the model card's loading example)
model_name = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

def generate_code(prompt, max_length=512, temperature=0.7):
    inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=max_length,
            temperature=temperature,
            pad_token_id=tokenizer.eos_token_id,
            do_sample=True
        )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response[len(prompt):]

# Create Gradio interface
iface = gr.Interface(
    fn=generate_code,
    inputs=[
        gr.Textbox(label="Code Prompt", placeholder="Enter your coding request..."),
        gr.Slider(minimum=100, maximum=1024, value=512, label="Max Length"),
        gr.Slider(minimum=0.1, maximum=1.0, value=0.7, label="Temperature")
    ],
    outputs=gr.Textbox(label="Generated Code"),
    title="DeepSeek Code Generator",
    description="Generate code using DeepSeek AI"
)

if __name__ == "__main__":
    iface.launch(server_name="0.0.0.0", server_port=8000)
EOF
# Create systemd service
sudo tee /etc/systemd/system/deepseek.service > /dev/null << 'EOF'
[Unit]
Description=DeepSeek Web UI
After=network.target
[Service]
Type=simple
User=ubuntu
WorkingDirectory=/home/ubuntu/DeepSeek-Coder
Environment=PATH=/home/ubuntu/deepseek-env/bin
ExecStart=/home/ubuntu/deepseek-env/bin/python deepseek_ui.py
Restart=always
[Install]
WantedBy=multi-user.target
EOF
# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable deepseek
sudo systemctl start deepseek
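Once the script finishes, verify the service before moving on. Note that the first run downloads the model weights from Hugging Face, so the UI may take several minutes to become responsive:
# Check the service and hit the UI locally
sudo systemctl status deepseek --no-pager
curl -s http://localhost:8000 | head -n 5   # should return the Gradio page HTML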
Step 3: Creating a Custom Image
Once your DeepSeek instance is configured and running properly, create a custom image for autoscaling.
Using Console
- Go to ECS Console → Instances
- Select your DeepSeek instance
- Click More → Disk and Image → Create Custom Image
Configure:
Image Name: deepseek-base-image
Description: DeepSeek AI with web UI pre-configured
Using CLI
aliyun ecs CreateImage \
--RegionId ap-southeast-1 \
--InstanceId i-xxxxxxxxx \
--ImageName deepseek-base-image \
--Description "DeepSeek AI with web UI pre-configured"
Step 4: Creating Auto Scaling Group with Spot Instances
Now we'll set up autoscaling to handle varying loads efficiently and cost-effectively.
Create Scaling Group
- Navigate to Auto Scaling Console
Configure the scaling group with the following settings (a CLI equivalent follows the list):
Scaling Group Name: deepseek-scaling-group
VPC: deepseek-vpc
VSwitch: deepseek-public-subnet
Min Size: 1
Max Size: 10
Default Cool-down: 300 seconds
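The same group can be created from the CLI; a sketch using the settings above (this uses the single-VSwitch parameter; confirm names with aliyun ess CreateScalingGroup --help):
aliyun ess CreateScalingGroup \
--RegionId ap-southeast-1 \
--ScalingGroupName deepseek-scaling-group \
--MinSize 1 \
--MaxSize 10 \
--DefaultCooldown 300 \
--VSwitchId vsw-xxxxxxxxx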
Create Scaling Configuration
Create a scaling configuration that uses spot instances:
# Create scaling configuration
# (spot price limits are set per instance type; confirm parameter names with
# aliyun ess CreateScalingConfiguration --help)
aliyun ess CreateScalingConfiguration \
--ScalingGroupId asg-xxxxxxxxx \
--ImageId m-xxxxxxxxx \
--InstanceType ecs.g6.2xlarge \
--SecurityGroupId sg-xxxxxxxxx \
--ScalingConfigurationName deepseek-spot-config \
--SpotStrategy SpotWithPriceLimit \
--SpotPriceLimit.1.InstanceType ecs.g6.2xlarge \
--SpotPriceLimit.1.PriceLimit 0.5 \
--InternetMaxBandwidthOut 100 \
--SystemDisk.Category cloud_ssd \
--SystemDisk.Size 100
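A new scaling group stays inactive until it is enabled with an active scaling configuration, so enable it once the configuration above has been created:
# Activate the scaling group with the new configuration
aliyun ess EnableScalingGroup \
--ScalingGroupId asg-xxxxxxxxx \
--ActiveScalingConfigurationId asc-xxxxxxxxx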
Create Scaling Rules
Set up scaling rules based on CPU utilization:
# Scale out rule
aliyun ess CreateScalingRule \
--ScalingGroupId asg-xxxxxxxxx \
--ScalingRuleName scale-out-rule \
--AdjustmentType QuantityChangeInCapacity \
--AdjustmentValue 2 \
--Cooldown 300
# Scale in rule
aliyun ess CreateScalingRule \
--ScalingGroupId asg-xxxxxxxxx \
--ScalingRuleName scale-in-rule \
--AdjustmentType QuantityChangeInCapacity \
--AdjustmentValue -1 \
--Cooldown 300
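Each CreateScalingRule call returns a ScalingRuleAri. Keep these values, since the alarms below reference them, and you can also fire a rule manually to confirm the group scales:
# Manually trigger the scale-out rule once as a test
aliyun ess ExecuteScalingRule \
--ScalingRuleAri <ScalingRuleAri returned by CreateScalingRule>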
Create CloudMonitor Alarm Tasks
Set up alarm tasks that watch CPU utilization and trigger the scaling rules created above. Each alarm references a scaling rule by the ARI returned from CreateScalingRule (confirm parameter names with aliyun ess CreateAlarm --help):
# High CPU alarm (scale out)
aliyun ess CreateAlarm \
--ScalingGroupId asg-xxxxxxxxx \
--Name deepseek-cpu-high \
--MetricName CpuUtilization \
--ComparisonOperator ">=" \
--Threshold 70 \
--EvaluationCount 2 \
--AlarmAction.1 <scale-out rule ARI>
# Low CPU alarm (scale in)
aliyun ess CreateAlarm \
--ScalingGroupId asg-xxxxxxxxx \
--Name deepseek-cpu-low \
--MetricName CpuUtilization \
--ComparisonOperator "<=" \
--Threshold 30 \
--EvaluationCount 3 \
--AlarmAction.1 <scale-in rule ARI>
Step 5: Deploying Behind Server Load Balancer (SLB)
Create a load balancer to distribute traffic across your scaled instances.
Create Server Load Balancer Instance
# Create an internet-facing CLB (classic SLB) instance
aliyun slb CreateLoadBalancer \
--RegionId ap-southeast-1 \
--LoadBalancerName deepseek-slb \
--VpcId vpc-xxxxxxxxx \
--VSwitchId vsw-xxxxxxxxx \
--LoadBalancerSpec slb.s3.medium \
--PayType PayOnDemand
Configure Backend Server Group
# Create VServer Group
aliyun slb CreateVServerGroup \
--LoadBalancerId lb-xxxxxxxxx \
--VServerGroupName deepseek-backend-group
# Add backend servers (this will be automated by Auto Scaling)
aliyun slb SetVServerGroupAttribute \
--VServerGroupId rsp-xxxxxxxxx \
--BackendServers '[{"ServerId":"i-xxxxxxxxx","Port":8000,"Weight":100}]'
Create Listener
# Create HTTP listener
# (health check path "/" returns 200 from the Gradio UI; point it at a
# dedicated /health route instead if you add one to the app)
aliyun slb CreateLoadBalancerHTTPListener \
--LoadBalancerId lb-xxxxxxxxx \
--ListenerPort 80 \
--BackendServerPort 8000 \
--VServerGroupId rsp-xxxxxxxxx \
--HealthCheck on \
--HealthCheckURI / \
--HealthCheckConnectPort 8000
Update Auto Scaling Group
Because the listener above forwards to a VServer group, attach the scaling group to that group (rather than to the load balancer's default backend pool) so instances launched by Auto Scaling are registered on port 8000 automatically. Confirm parameter names with aliyun ess AttachVServerGroups --help:
aliyun ess AttachVServerGroups \
--RegionId ap-southeast-1 \
--ScalingGroupId asg-xxxxxxxxx \
--VServerGroup.1.LoadBalancerId lb-xxxxxxxxx \
--VServerGroup.1.VServerGroupAttribute.1.VServerGroupId rsp-xxxxxxxxx \
--VServerGroup.1.VServerGroupAttribute.1.Port 8000
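Once instances register, check their health as seen by the load balancer:
# Backend health status
aliyun slb DescribeHealthStatus \
--RegionId ap-southeast-1 \
--LoadBalancerId lb-xxxxxxxxx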
Step 6: Securing Private Instance Access
Implement security best practices to protect your DeepSeek deployment.
Network Security
Update Security Groups:
# Create restrictive security group for private instances
aliyun ecs CreateSecurityGroup \
--RegionId ap-southeast-1 \
--GroupName deepseek-private-sg \
--Description "Private security group for DeepSeek" \
--VpcId vpc-xxxxxxxxx
# Allow the app port only from the SLB backend address range
# (CLB does not sit in a security group; it reaches back ends from 100.64.0.0/10)
aliyun ecs AuthorizeSecurityGroup \
--RegionId ap-southeast-1 \
--SecurityGroupId sg-private-xxxxxxxxx \
--IpProtocol tcp \
--PortRange 8000/8000 \
--SourceCidrIp 100.64.0.0/10
Create Private Subnet:
# Create private subnet for backend instances
aliyun vpc CreateVSwitch \
--VpcId vpc-xxxxxxxxx \
--CidrBlock 10.0.2.0/24 \
--ZoneId ap-southeast-1a \
--VSwitchName deepseek-private-subnet
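To actually place autoscaled back ends in this subnet, point the scaling group at the private VSwitch while the load balancer stays in the public subnet. This is a sketch; confirm that your CLI version exposes VSwitchIds.N on ModifyScalingGroup with --help:
# Launch future instances into the private subnet
aliyun ess ModifyScalingGroup \
--ScalingGroupId asg-xxxxxxxxx \
--VSwitchIds.1 vsw-private-xxxxxxxxx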
Access Control
Implement API Authentication. The Gradio UI itself can be protected with iface.launch(auth=...). If you also expose the model through an API endpoint, require a signed JWT on every request. Below is a minimal sketch using FastAPI (installed earlier) and PyJWT (pip install pyjwt); the file name, route path, and claims are illustrative:
# auth_api.py - JWT-protected API wrapper around the model
import os

import jwt
from fastapi import FastAPI, Header, HTTPException

from deepseek_ui import generate_code  # reuses the function defined earlier

app = FastAPI()

def require_auth(token):
    """Reject requests that do not carry a valid HS256-signed JWT."""
    if not token:
        raise HTTPException(status_code=401, detail="No token provided")
    try:
        jwt.decode(token, os.environ["JWT_SECRET"], algorithms=["HS256"])
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")

@app.post("/api/generate")
def generate(prompt: str, authorization: str = Header(None)):
    require_auth(authorization)
    return {"completion": generate_code(prompt)}
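To exercise the protected endpoint, mint a token with the same secret and pass it in the Authorization header. The port 8001 and the /api/generate path below simply match the sketch above:
# Run the API on a separate port from the Gradio UI
export JWT_SECRET=change-me        # use a strong secret in production
uvicorn auth_api:app --host 0.0.0.0 --port 8001 &
# Mint a test token and call the endpoint
TOKEN=$(python3 -c "import jwt, os; print(jwt.encode({'sub': 'ops'}, os.environ['JWT_SECRET'], algorithm='HS256'))")
curl -X POST "http://localhost:8001/api/generate?prompt=write%20a%20quicksort%20in%20python" \
-H "Authorization: $TOKEN"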
Create RAM Roles:
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": ["ecs.aliyuncs.com"]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
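With the trust policy saved locally (for example as trust-policy.json), create the role and attach it to your instances. The role name deepseek-ecs-role is only a suggestion, and you still need to attach permission policies to it separately:
# Create the role with the trust policy above
aliyun ram CreateRole \
--RoleName deepseek-ecs-role \
--AssumeRolePolicyDocument "$(cat trust-policy.json)"
# Attach the role to an ECS instance
aliyun ecs AttachInstanceRamRole \
--RegionId ap-southeast-1 \
--RamRoleName deepseek-ecs-role \
--InstanceIds '["i-xxxxxxxxx"]'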
Monitoring and Logging
Configure Log Service:
# Create log project
aliyun log CreateProject \
--ProjectName deepseek-logs \
--Description "DeepSeek application logs"
# Create log store
aliyun log CreateLogstore \
--ProjectName deepseek-logs \
--LogstoreName app-logs \
--TTL 30 \
--ShardCount 2
Enable CloudMonitor:
# Install CloudMonitor agent on instances
wget http://cms-agent-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com/release/1.3.7/install.sh
sudo bash install.sh
Best Practices and Optimization
Cost Optimization
- Spot Instance Strategy: Use spot instances with price limits to reduce costs by up to 90% (see the price-history check after this list)
- Right-sizing: Monitor resource usage and adjust instance types accordingly
- Reserved Instances: For baseline capacity, consider reserved instances for predictable workloads
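To choose a sensible SpotPriceLimit for the scaling configuration, check recent spot prices for the instance type in your zone:
# Recent spot price history for ecs.g6.2xlarge in Singapore Zone A
aliyun ecs DescribeSpotPriceHistory \
--RegionId ap-southeast-1 \
--ZoneId ap-southeast-1a \
--NetworkType vpc \
--InstanceType ecs.g6.2xlarge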
Performance Optimization
- GPU Instances: For better AI model performance, consider GPU-enabled instances like ecs.gn6v
- Model Caching: Implement model caching to reduce cold start times
- Load Balancer Optimization: Use session affinity if needed for stateful operations
Security Hardening
- Regular Updates: Implement automated patching for OS and dependencies
- Network Segmentation: Use multiple subnets to isolate different tiers
- Encryption: Enable encryption at rest and in transit
- Audit Logging: Enable comprehensive audit logging for compliance
Monitoring and Maintenance
Key Metrics to Monitor
- CPU and memory utilization
- Request latency and throughput
- Error rates and response codes
- Auto scaling events
- Spot instance interruption rates
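Most of these metrics live in the acs_ecs_dashboard CloudMonitor namespace already used for scaling. A quick CLI spot check for CPU on a single instance (the instanceId dimension is a placeholder):
# Latest CPU utilization datapoint for one instance
aliyun cms DescribeMetricLast \
--Namespace acs_ecs_dashboard \
--MetricName CPUUtilization \
--Dimensions '[{"instanceId":"i-xxxxxxxxx"}]'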
Automated Maintenance
# Create maintenance script
cat > /etc/cron.daily/deepseek-maintenance << 'EOF'
#!/bin/bash
# Update system packages non-interactively
apt-get update -qq && DEBIAN_FRONTEND=noninteractive apt-get upgrade -y
# Clean up logs older than 7 days
find /var/log -name "*.log" -mtime +7 -delete
# Restart the DeepSeek service only if it has stopped
# (the unit defines no ExecReload, and a full restart reloads the model)
systemctl is-active --quiet deepseek || systemctl restart deepseek
EOF
chmod +x /etc/cron.daily/deepseek-maintenance
This architecture provides a robust, scalable, and cost-effective way to deploy DeepSeek AI on Alibaba Cloud. The solution offers:
- High Availability: Load balancer distributes traffic across multiple instances
- Cost Efficiency: Spot instances reduce compute costs significantly
- Security: Private networking and proper access controls protect the deployment
- Scalability: Auto scaling handles varying loads automatically
- Maintainability: Standardized images and automation reduce operational overhead
By following this guide, you'll have a production-ready DeepSeek AI deployment that can handle enterprise workloads while maintaining security and cost efficiency.
Remember to regularly review and update your security configurations, monitor costs, and optimize performance based on your specific use case requirements.