Skip to main content

One post tagged with "Reliability"

System reliability and uptime

View All Tags

Financial Data Quality: Why 99.9% Uptime Isn't Good Enough for Trading

· 5 min read
StockAPI Team
Financial Data Infrastructure Engineers

When choosing a financial data provider, pricing is easy to compare. Data quality is harder. This guide breaks down the 5 critical metrics that separate professional-grade data from unreliable sources.

The Hidden Cost of Bad Data

A Real Trading Disaster

March 15, 2024 - A mid-sized crypto trading firm lost $127,000 in a single day:

  • Their scraping infrastructure had 98.5% uptime (sounds good, right?)
  • That's 0.36 hours of downtime per day (21.6 minutes)
  • During a 12-minute outage, BTC dropped 8%
  • Their stop-losses didn't trigger (no data = no action)
  • Positions stayed open, accumulating losses

98.5% uptime = 21.6 minutes of daily downtime = unacceptable for trading

Metric 1: Data Accuracy

What Gets Measured

# Accuracy = matching the exchange's official data
exchange_price = 43251.50 # Direct from Binance API
parser_price = 43251.50 # From your data source

accuracy = 100% if exchange_price == parser_price else 0%

Common Accuracy Problems

Problem 1: Stale Data

# ❌ BAD: Scraping HTML (30-60s delay)
import requests
from bs4 import BeautifulSoup

html = requests.get("https://exchange.com/markets/BTC-USD").text
soup = BeautifulSoup(html, 'html.parser')
price = float(soup.find("div", class_="price").text)

# Issues:
# - Price from 30-60 seconds ago
# - HTML may be cached by CDN
# - No timestamp information
# - Can't verify freshness

Problem 2: Parsing Errors

# ❌ BAD: Fragile HTML parsing
price_text = soup.find("span", class_="price-value").text
# "$ 43,251.50 USD"

# Naive parsing
price = float(price_text.replace("$", "").replace(",", ""))
# Works... until exchange changes format to "43.251,50" (EU format)
# Result: Crash or wrong data

Professional Accuracy

# ✅ GOOD: Direct API access with validation
from stockapi import BinanceParser

parser = BinanceParser()
ticker = parser.get_ticker("BTCUSDT")

print(f"Price: ${ticker['price']}")
print(f"Timestamp: {ticker['timestamp']}") # Server timestamp
print(f"Data age: {ticker['age_ms']}ms") # Calculated freshness

# Guarantees:
# - Direct from exchange API
# - Validated against schema
# - Timestamp included
# - Sub-second freshness
# - 99.99% accuracy rate

Measuring Accuracy

# Compare provider data against exchange API
import time
from stockapi import BinanceParser

parser = BinanceParser()
correct = 0
total = 0

for _ in range(1000):
# Get from both sources simultaneously
parser_data = parser.get_ticker("BTCUSDT")
exchange_data = requests.get(
"https://api.binance.com/api/v3/ticker/price?symbol=BTCUSDT"
).json()

if parser_data['price'] == float(exchange_data['price']):
correct += 1
total += 1

time.sleep(1)

accuracy = (correct / total) * 100
print(f"Accuracy: {accuracy}%")

# StockAPI Results:
# - Binance: 99.99% (5 mismatches in 50,000 samples)
# - Coinbase: 99.98%
# - NYSE: 99.95%
#
# Typical DIY Scraping:
# - 85-95% accuracy (frequent parsing errors)

Metric 2: Latency

What Gets Measured

# Latency = time from event to data availability
event_time = 1699564723.145 # Exchange timestamp
receive_time = 1699564723.198 # Local timestamp

latency_ms = (receive_time - event_time) * 1000
# Target: <100ms for real-time trading

Latency Breakdown

MethodAverage LatencyBest CaseWorst Case
Direct WebSocket20-50ms10ms100ms
REST API Polling500ms-2s200ms5s
HTML Scraping2-5s1s30s
Cached Data30-300s10s

Why Latency Matters

# Arbitrage opportunity window
binance_price = 43250.00 # Updated at T+0ms
coinbase_price = 43270.00 # Updated at T+50ms (50ms latency)

spread = 43270 - 43250 = $20 profit opportunity

# But...
# High-frequency traders with 10ms latency already took it
# You arrive at T+50ms: opportunity gone
# Result: Missed trade

Measuring Latency

# ✅ Real-world latency measurement
from stockapi import BinanceParser
import time

parser = BinanceParser()
latencies = []

for update in parser.stream_ticker("BTCUSDT"):
exchange_time = update['timestamp']
local_time = time.time() * 1000

latency = local_time - exchange_time
latencies.append(latency)

if len(latencies) == 1000:
break

# Calculate percentiles
p50 = sorted(latencies)[500] # Median
p95 = sorted(latencies)[950] # 95th percentile
p99 = sorted(latencies)[990] # 99th percentile

print(f"Median latency: {p50:.2f}ms")
print(f"P95 latency: {p95:.2f}ms")
print(f"P99 latency: {p99:.2f}ms")

# StockAPI Results (WebSocket):
# - P50: 35ms
# - P95: 85ms
# - P99: 150ms
#
# DIY Scraping (REST polling):
# - P50: 650ms
# - P95: 2400ms
# - P99: 5000ms+

Metric 3: Reliability (Uptime)

What Gets Measured

# Uptime = percentage of time data is available
uptime_percentage = (operational_time / total_time) * 100

The 99% Trap

Uptime %Downtime per DayDowntime per MonthAcceptable?
99.9%1.4 minutes43.2 minutes✅ Trading OK
99.5%7.2 minutes3.6 hours⚠️ Risky
99.0%14.4 minutes7.2 hours❌ Unacceptable
98.0%28.8 minutes14.4 hours❌ Disaster
95.0%72 minutes36 hours❌ Worthless

Reality check: DIY scraping typically achieves 85-95% uptime without dedicated DevOps.

Common Reliability Issues

Issue 1: No Automatic Recovery

# ❌ BAD: Crashes on first error
import requests

while True:
response = requests.get("https://api.binance.com/ticker")
data = response.json()
# Process data...

# What happens when:
# - Network hiccup: CRASH
# - API rate limit: CRASH
# - Server timeout: CRASH
# - Invalid JSON: CRASH
#
# Requires manual restart
# 95% uptime at best

Issue 2: Silent Failures

# ❌ BAD: Fails silently, returns stale data
cached_price = 43250.00

try:
response = requests.get("https://api.binance.com/ticker", timeout=1)
price = response.json()['price']
except:
price = cached_price # Return old data!

# Problems:
# - Trading on stale data
# - No error notification
# - Silent degradation
# - False confidence

Professional Reliability

# ✅ GOOD: Automatic recovery with monitoring
from stockapi import BinanceParser

parser = BinanceParser(
retry_attempts=5,
retry_delay=1.0,
circuit_breaker=True, # Stop on repeated failures
health_check_interval=60,
)

# Real-time health monitoring
if parser.is_healthy():
ticker = parser.get_ticker("BTCUSDT")
else:
# Parser detected issues and switched to backup
send_alert("Primary parser unhealthy, using backup")

# Handles automatically:
# - Network failures
# - API rate limits
# - Server timeouts
# - Invalid responses
# - 99.9% uptime guaranteed

Measuring Reliability

# 30-day uptime tracking
import time
from stockapi import BinanceParser

parser = BinanceParser()
successful_calls = 0
failed_calls = 0

# Check every minute for 30 days
for _ in range(43200): # 30 days * 24 hours * 60 minutes
try:
ticker = parser.get_ticker("BTCUSDT", timeout=5)
if ticker and ticker['price'] > 0:
successful_calls += 1
else:
failed_calls += 1
except:
failed_calls += 1

time.sleep(60)

uptime = (successful_calls / (successful_calls + failed_calls)) * 100
print(f"30-day uptime: {uptime}%")

# StockAPI Results:
# - 99.92% uptime (35 minutes downtime/month)
#
# DIY Scraping Results:
# - 85-95% uptime (36-108 hours downtime/month)

Metric 4: Data Completeness

What Gets Measured

# Completeness = percentage of expected data fields present
expected_fields = [
'symbol', 'price', 'volume', 'high', 'low',
'open', 'close', 'timestamp', 'change_24h'
]

received_fields = list(data.keys())
completeness = (
len(set(expected_fields) & set(received_fields)) /
len(expected_fields)
) * 100

Incomplete Data Examples

Problem: Missing Critical Fields

# ❌ BAD: Scraping misses fields
html_data = {
'price': 43250.00,
'symbol': 'BTCUSDT',
# Missing: volume, timestamp, high/low, change
}

# Can't calculate:
# - Price momentum (no change %)
# - Volume trend (no volume)
# - Data freshness (no timestamp)
# - Daily range (no high/low)

Professional Completeness

# ✅ GOOD: Complete data set
from stockapi import BinanceParser

parser = BinanceParser()
ticker = parser.get_ticker("BTCUSDT")

print(ticker)
# {
# 'symbol': 'BTCUSDT',
# 'price': 43250.00,
# 'volume_24h': 28450.5,
# 'high_24h': 44100.00,
# 'low_24h': 42800.00,
# 'open_24h': 43000.00,
# 'close_24h': 43250.00,
# 'change_24h': 0.58,
# 'change_percent_24h': '0.58%',
# 'timestamp': 1699564723145,
# 'bid': 43249.50,
# 'ask': 43250.50,
# 'spread': 1.00,
# }

# 100% completeness
# All fields guaranteed
# Validated schema

Metric 5: Historical Consistency

The Backfill Problem

# When your scraper was down, can you recover the data?

# ❌ DIY Scraping: Data is lost forever
downtime_start = "2024-03-15 14:30:00"
downtime_end = "2024-03-15 14:42:00"
# 12 minutes of missing data
# Can't recover: exchange APIs don't provide historical tick data
# Result: Gaps in your database

# ✅ StockAPI: Automatic backfill
parser = BinanceParser()
historical_data = parser.get_ticker_history(
symbol="BTCUSDT",
start_time="2024-03-15 14:30:00",
end_time="2024-03-15 14:42:00",
interval="1m"
)
# Complete data recovered
# No gaps in historical analysis

Real-World Comparison

DIY Scraping Infrastructure

6-Month Results (medium-sized trading firm):

  • Accuracy: 89% (frequent parsing errors)
  • Latency: 650ms median, 2.4s P95
  • Uptime: 94.2% (42 hours downtime)
  • Completeness: 65% (missing fields)
  • Cost: $35K (dev time + infrastructure)
  • Incidents: 37 critical outages

StockAPI Professional Infrastructure

6-Month Results (same period):

  • Accuracy: 99.98%
  • Latency: 35ms median, 85ms P95
  • Uptime: 99.95% (22 minutes downtime)
  • Completeness: 100%
  • Cost: $1,794 (Professional plan)
  • Incidents: 0 (automatic recovery)

Conclusion

Financial data quality isn't negotiable for serious trading:

  1. Accuracy: 99.98% vs 89% (DIY)
  2. Latency: 35ms vs 650ms
  3. Uptime: 99.95% vs 94%
  4. Completeness: 100% vs 65%
  5. Total Cost: $1,794 vs $35K

The real question: Can you afford 42 hours of downtime per year?

For professional trading, 99.9% uptime is the minimum. Anything less is gambling with your capital.


Ready for professional-grade data quality? Start with StockAPI → 99.95% uptime, <100ms latency, guaranteed accuracy.