Machine Learning Model Deployment
You are a senior ML engineer specializing in model deployment. Help me deploy this machine learning model to production:
**Model Context**:
- Model Type: [CLASSIFICATION/REGRESSION/NLP/COMPUTER_VISION]
- Framework: [TENSORFLOW/PYTORCH/SCIKIT-LEARN/etc.]
- Model Size: [SMALL/MEDIUM/LARGE]
- Inference Requirements: [LATENCY/THROUGHPUT/ACCURACY]
- Deployment Target: [CLOUD/EDGE/ON-PREMISE]
- Expected Load: [REQUESTS PER SECOND]
Please provide:
1. **Deployment Architecture**: Overall system design
2. **Model Serving**: REST API or batch processing setup
3. **Container Strategy**: Docker and orchestration
4. **Scaling Strategy**: Auto-scaling and load balancing
5. **Model Versioning**: A/B testing and rollback capabilities
6. **Monitoring**: Model performance and drift detection
7. **Data Pipeline**: Input preprocessing and validation
8. **Security**: Model protection and access control
9. **Cost Optimization**: Resource management strategies
10. **Testing Strategy**: Load testing and validation
11. **Documentation**: API docs and operational runbooks
12. **Compliance**: Data privacy and regulatory requirements
Deliver a complete ML model deployment strategy covering serving, monitoring, scaling, and production best practices.
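As an illustration of item 7 (input preprocessing and validation), a minimal schema check for a prediction payload might look like the sketch below. This is a hedged example: `N_FEATURES` is a hypothetical placeholder for your model's actual input width, and real services often use a schema library (e.g. pydantic) instead of hand-rolled checks.

```python
# Minimal payload validation sketch, assuming the model expects a
# fixed-length list of numeric features. N_FEATURES is a placeholder.
N_FEATURES = 4

def validate_payload(data):
    """Return (features, error); error is None when the payload is valid."""
    if not isinstance(data, dict) or 'features' not in data:
        return None, "missing 'features' key"
    features = data['features']
    if not isinstance(features, list) or len(features) != N_FEATURES:
        return None, f"'features' must be a list of {N_FEATURES} numbers"
    # bool is a subclass of int in Python, so exclude it explicitly
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool)
               for x in features):
        return None, "'features' must contain only numbers"
    return features, None
```

Rejecting malformed input before it reaches the model keeps serving errors distinguishable from model errors and protects downstream metrics from garbage predictions.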
Sample model serving endpoint (Flask with Prometheus metrics):
```python
from flask import Flask, request, jsonify
import joblib
import numpy as np
from prometheus_client import Counter, Histogram, generate_latest

app = Flask(__name__)

# Load the serialized model once at startup
model = joblib.load('model.pkl')

# Prometheus metrics
REQUEST_COUNT = Counter('requests_total', 'Total requests')
REQUEST_LATENCY = Histogram('request_duration_seconds', 'Request latency')

@app.route('/predict', methods=['POST'])
@REQUEST_LATENCY.time()
def predict():
    REQUEST_COUNT.inc()
    try:
        data = request.get_json(force=True)
        features = np.array(data['features']).reshape(1, -1)
        prediction = model.predict(features)[0]
        # Convert NumPy scalars to native Python types so jsonify can serialize them
        response = {
            'prediction': prediction.item() if hasattr(prediction, 'item') else prediction
        }
        # predict_proba only exists on probabilistic classifiers, not regressors
        if hasattr(model, 'predict_proba'):
            response['confidence'] = float(model.predict_proba(features)[0].max())
        return jsonify(response)
    except Exception as e:
        return jsonify({'error': str(e)}), 400

@app.route('/metrics')
def metrics():
    return generate_latest()

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```
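For item 6 (drift detection), one common approach is the Population Stability Index (PSI), which compares the distribution of a live feature against its training baseline. The sketch below is a minimal NumPy implementation; the 0.1/0.25 thresholds are conventional rules of thumb, not hard limits, and production systems typically compute this per feature on a rolling window.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training (expected) and live (actual) feature sample.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major drift.
    """
    # Bin edges are derived from the training distribution only
    edges = np.histogram_bin_edges(expected, bins=bins)
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid division by zero and log(0)
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))
```

A monitoring job can compute this nightly per input feature and fire an alert (or trigger retraining) when the index crosses the chosen threshold.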