A real-time cryptocurrency order book implementation that connects to Binance WebSocket streams. Built for the Raven Developer Challenge. YouTube link: https://youtu.be/oGTKVbSRoYY , it shows some basic commands and that my app is working properly from my machine.
# Clone and run
git clone <your-repo>
cd raven-dev-challenge
go mod tidy # Install github.com/gorilla/websocket
go run main.goThat's it! You'll see live Bitcoin prices updating every second.
- Connects to Binance WebSocket API for real-time market data
- Maintains a sorted order book with great update performance
- Handles snapshot loading and incremental updates automatically
- Shows live bid/ask prices with spread calculation
While the challenge suggests Rust for maximum performance, I chose Go for this implementation:
Isn't Go the future? Just kidding. Every language has its pros and cons. As a student/developer with advanced knowledge in Java, C, Python, a few months ago, I decided to try Go and this project is perfect for showing Go's advantages.
Go Advantages:
- Faster development iteration for prototyping and debugging
- Built-in goroutines perfect for WebSocket handling - very helpful
- Excellent benchmarking and profiling tools - saves time for developing my own, and also adds great visualization
- Memory management that's good enough for financial applications
- Still fast enough to achieve sub-microsecond latency
Performance Reality: I expected to have some kind of implementation issue or poor performance due to my language choice. As shared above, I chose Go because of my knowledge and brief research on whether or not it would manage to be good for what I needed in my demo application. And I was right! My language choice brought simplicity to my project and the bottleneck was algorithmic (O(n log n) sorts). The 67x improvement came from data structure optimization, not low-level language features. Binary search vs sorting was the deal breaker.
When Rust Would Be Better:
- Custom memory allocators if needed
- For ultra-low latency requirements that will be needed in production HFT
For this challenge, Go provided the right balance of performance and development speed. Rust would only overcomplicate my solution and bring some performance updates that are not that important in my case. Moreover, my knowledge in Rust is limited to what it is used for and the learning curve will be to much for me having in mind the short deadline.
I evaluated several approaches for maintaining sorted price levels:
type levelMap struct {
qtyAt map[float64]float64
}Pros: O(1) lookups, simple Cons: No ordering, need to sort for iteration = O(n log n) => Too slow for frequent best bid/ask access
Pros: O(log n) for all operations Cons: Complex implementation, higher memory overhead, it will overcomplicate the solution without bringing much benefit. => Overkill for this use case, Go doesn't have built-in trees
type levelMap struct {
levels []PriceLevel // sorted by price
}Pros: O(1) best price access, it is cache-friendly Cons: O(n) search for updates, O(n) insertion/deletion Verdict: Too slow for updates - not HFT satisfactory
type levelMap struct {
isBid bool // bid or ask side
prices []float64 // sorted prices (desc for bids, asc for asks)
qtyAt map[float64]float64 // price -> quantity mapping
}Why I chose this:
- O(1) best price access:
prices[0]gives me top of book instantly - O(1) quantity lookups:
qtyAt[price]for any price level - O(log n) insertions: Binary search to find position + O(n) array insertion
There are some disadvantages, but the benefits exceed the drawbacks:
- Memory trade-off: ~2x memory usage but acceptable for the performance gain
This hybrid approach optimizes for the most common operations in trading systems:
- Best bid/ask queries (happens constantly) = O(1)
- Price level updates (frequent) = O(log n)
- Full book traversal (occasional) = O(n)
There is perfect correlation between need and efficiency. The more it's done, the faster it is. This is a valid point that shows the data structure and overall approach are good!
My initial idea was simple and trivial - add it to the list and sort the list. But when tested - too slow! The key optimization was replacing O(n log n) sorts with O(log n) binary search insertions:
func (lm *levelMap) set(price, qty float64) {
lm.qtyAt[price] = qty
// Terrible - resort entire array every time!
lm.prices = append(lm.prices, price)
if lm.isBid {
sort.Sort(sort.Reverse(sort.Float64Slice(lm.prices)))
} else {
sort.Float64s(lm.prices)
}
}Issue: O(n log n) complexity for every single update
func (lm *levelMap) insertPrice(price float64) {
n := len(lm.prices)
var insertPos int
if lm.isBid {
// Find position for descending order (highest first) - isBid is True
insertPos = sort.Search(n, func(i int) bool {
return lm.prices[i] <= price
})
} else {
// Find position for ascending order (lowest first) - isBid is false (it is an Ask)
insertPos = sort.Search(n, func(i int) bool {
return lm.prices[i] >= price
})
}
// Insert at found position
lm.prices = append(lm.prices, 0)
copy(lm.prices[insertPos+1:], lm.prices[insertPos:])
lm.prices[insertPos] = price
}Result: O(log n) to find position + O(n) to insert = Much more satisfactory!
Growing slices dynamically during snapshot loading takes too much memory. Instead of that approach:
// Before: multiple reallocations
ob.bids = newLevelMap(true)
for _, level := range snapshot.Bids {
ob.bids.set(price, qty) // Grows slice each time
}
// After: pre-allocate with known capacity
bidCount := len(snapshot.Bids)
ob.bids = &levelMap{
isBid: true,
prices: make([]float64, 0, bidCount), // pre-allocate capacity
qtyAt: make(map[float64]float64, bidCount),
}// Collect all prices first
bidPrices := make([]float64, 0, bidCount)
for _, lvl := range snap.Bids {
price, _ := strconv.ParseFloat(lvl[0], 64)
qty, _ := strconv.ParseFloat(lvl[1], 64)
if qty > 0 {
ob.bids.qtyAt[price] = qty
bidPrices = append(bidPrices, price)
}
}
// Sort once at the end - much more efficient
sort.Sort(sort.Reverse(sort.Float64Slice(bidPrices)))
ob.bids.prices = bidPricesI chose a single RWMutex for the entire order book rather than per-side locking:
type OrderBook struct {
mu sync.RWMutex // One lock for entire book
lastID int64
bids *levelMap
asks *levelMap
}Why single mutex:
- Simplicity: Easier to reason about, no deadlock risk
- Read optimization: Multiple goroutines can read simultaneously
- Atomic updates: Ensures bid/ask consistency during updates
- Performance: Read-heavy workloads benefit from RWMutex
Alternative considered: Per-side mutexes
- Pros: Better write concurrency
- Cons: Complexity, potential deadlocks, doesn't really match my understanding of trading patterns
Overall, just made more sense to me to use 1 lock (mutex) for operations that are related to each other.
What I tried: Initially connected using uppercase symbol format
// This didn't work - no messages received
url := fmt.Sprintf("wss://stream.binance.com:9443/ws/%s@depth@100ms", "BTCUSDT")What I observed:
- WebSocket connection succeeded (no error) and initial snapshot loaded correctly
- But no real-time updates were received and order book prices remained static
Debugging process:
- Added raw message logging to verify connection
- Checked Binance API documentation more carefully
- Found out that there is a case sensitivity requirement
Solution:
// Binance requires lowercase symbols
url := fmt.Sprintf("wss://stream.binance.com:9443/ws/%s@depth@100ms", strings.ToLower(symbol))Easy solution but definitely made me stress about why all parts work separately, but don't want to work together.
Initial approach: Used select with default case
// This was wrong - caused busy spinning
for {
select {
case <-ctx.Done():
return ctx.Err()
default:
_, msg, err := c.ws.ReadMessage() // Non-blocking
}
}Problem identified: The default case made ReadMessage non-blocking, causing a tight loop that interfered with message reception.
Solution: Simplified to blocking read with context checking
for {
select {
case <-ctx.Done():
return ctx.Err()
default:
}
c.ws.SetReadDeadline(time.Now().Add(10 * time.Second))
_, msg, err := c.ws.ReadMessage() // Blocking read
if err != nil {
return err
}
// Continue with Process message
}The Problem: Binance WebSocket and REST API weren't perfectly synchronized
What I observed:
(example run)
Snapshot lastUpdateId: 72665906280 First WebSocket update: U=72665906317 u=72665906336 I was surprised by the big gap - 37 sequence numbers!
Binance's Official Sync Logic (Complex):
- Subscribe to WebSocket stream
- Buffer all messages
- Fetch snapshot
- Drop buffered events where u <= snapshot.lastUpdateId
- Apply buffered events where U <= lastUpdateId+1 <= u
My Practical Solution: For this demo, I implemented a simplified but robust sync:
if c.syncing {
if upd.FinalID <= c.lastUpdateId {
continue // Skip old updates
}
// Accept first valid update after snapshot
c.syncing = false
log.Printf("Sync complete, applying first update after snapshot")
}Trade-offs:
- Pro: Simple, works reliably for demo purposes
- Con: Might miss some updates during snapshot fetch
Great for the challenge and my demo program, but if made for production:
- Production note: it would need full buffer implementation, so there are 0% chances of missing an update
Initial Performance (so terrible):
BenchmarkApplyUpdate_10k-8 15,081 68,382 ns/op 203 B/op 9 allocs/op
Profiling revealed: Surprisingly or not, 99% of time spent was in sort.Float64s(). Dind't think about that initially, it just looked like a simple and quick solution.
Root cause: Every single price level update triggered a full array sort. Very inefficient. Poor initial algorithmic choice on my part.
// This was what was killing us - O(n log n) every single time!
func (lm *levelMap) set(price, qty float64) {
lm.qtyAt[price] = qty
lm.prices = append(lm.prices, price)
sort.Float64s(lm.prices) // SO EXPENSIVE!
}Optimization strategy:
Key steps:
- Maintain sorted invariant instead of re-sorting
- Use binary search to find insertion point
- Shift elements rather than sort entire array
Result: 67x performance improvement! This conclusion is made from the result comparisons, there is a results section towards the end of the README.md. Check PERFORMANCE.md for more precise performance data.
Snapshot loading was allocating excessively:
BenchmarkSnapshot_1k-8 544 2,162,831 ns/op 239,665 B/op 2,075 allocs/op
Investigation: Each set() call during snapshot loading was:
- Calling
append()on prices slice (potential reallocation) - Calling
sort.Float64s()(temporary allocations) - Called 1000+ times per snapshot
Solution: Bulk processing approach
// Pre-allocate everything upfront
bidPrices := make([]float64, 0, len(snap.Bids))
qtyMap := make(map[float64]float64, len(snap.Bids))
// Collect all data first
for _, level := range snap.Bids {
// Parse and collect
}
// Sort once at the end
sort.Sort(sort.Reverse(sort.Float64Slice(bidPrices)))Result: 98% reduction in allocations (2,075 → 25). This was such a big deal!
Information about each file, folder, why it exists and how it has been used.
The optimized data structure with three specialized files:
orderbook.go - Core data structures and thread safety
- levelMap hybrid design (map + sorted slice)
- RWMutex protection for concurrent access
- Best bid/ask accessors (30ns, zero allocations)
snapshot.go - Bulk loading optimization
- Pre-allocated data structures
- Bulk processing to minimize allocations
- Single sort operation per side
update.go - Incremental updates
- Binary search insertion for O(log n) performance
- Sequence gap detection
- Zero-quantity level removal
binance_websocket.go - Network layer isolation
- Snapshot fetching from REST API
- WebSocket stream management
- Sync logic between snapshot and stream
- Auto-recovery on connection issues
- Error handling with specific logging
Simple demonstration showing:
- Real-time price updates
- Spread calculation
- Graceful shutdown handling
The optimizations made a huge difference:
| Operation | Before | After | Improvement |
|---|---|---|---|
| Update (10k levels) | 68,382 ns | 1,024 ns | 67x faster |
| Snapshot (1k levels) | 2.16ms | 117μs | 18x faster |
| Memory allocations | 2,075/op | 25/op | 98% less |
| Best bid/ask access | ~100ns | 30ns | 3x faster |
Current performance:
- ~1μs per update (sub-microsecond latency)
- >1M updates/second throughput
- 7M+ concurrent operations/second
Unit Tests (orderbook/orderbook_test.go):
go test ./orderbook/ -vTests cover:
- Snapshot loading correctness
- Incremental update application
- Best bid/ask accuracy
- Sequence gap detection
- Edge cases (empty book, single levels)
Happy Path Testing: Following challenge requirements, tests focus on:
- Snapshot followed by updates without sequence gaps
- Normal trading scenarios with realistic price movements
- Concurrent access patterns typical in trading systems
Complete benchmark execution:
# All benchmarks with memory tracking
go test -bench=. -benchmem ./benchmarks/
# Extended runs for stable results
go test -bench=. -benchmem -benchtime=10s ./benchmarks/
# Individual categories
go test -bench=BenchmarkApplyUpdate ./benchmarks/ # Update throughput
go test -bench=BenchmarkBestBidAsk ./benchmarks/ # Read performance
go test -bench=BenchmarkSnapshot ./benchmarks/ # Snapshot loading
go test -bench=BenchmarkConcurrentAccess ./benchmarks/ # Concurrent performance
go test -bench=BenchmarkRealisticWorkload ./benchmarks/ # Mixed workloadgo test -bench=BenchmarkApplyUpdate -benchmem ./benchmarks/Measures:
- Nanoseconds per update operation
- Memory allocations per update
- Bytes allocated per update
- Scalability across book sizes (100, 1k, 10k levels)
Key Results:
BenchmarkApplyUpdate_100-8 1,132,707 987 ns/op 65 B/op 8 allocs/op
BenchmarkApplyUpdate_1k-8 1,208,479 995 ns/op 65 B/op 8 allocs/op
BenchmarkApplyUpdate_10k-8 1,033,407 1,024 ns/op 66 B/op 8 allocs/op
go test -bench=BenchmarkBestBidAsk -benchmem ./benchmarks/Measures: Critical read path performance
Result: 35,869,435 ops 29.92 ns/op 0 B/op 0 allocs/op
go test -bench=BenchmarkSnapshot -benchmem ./benchmarks/Measures: Bulk loading efficiency
Result: 9,766 ops 116,959 ns/op 107,129 B/op 25 allocs/op
go test -bench=BenchmarkConcurrentAccess -benchmem ./benchmarks/Measures: Multi-threaded read/write performance
Result: 7,014,172 ops 172.9 ns/op 51 B/op 1 allocs/op
go test -bench=BenchmarkRealisticWorkload -benchmem ./benchmarks/Simulates: Mixed operations (updates + reads + level queries)
Result: 497,822 ops 2,416 ns/op 32 B/op 0 allocs/op
Pre-optimization snapshot loading:
- 239,665 bytes/operation - Excessive allocations
- 2,075 allocations/operation - GC pressure nightmare
Post-optimization results:
- 107,129 bytes/operation - 55% reduction
- 25 allocations/operation - 98% reduction
CPU Profiling:
go test -bench=BenchmarkRealisticWorkload -cpuprofile=cpu.prof ./benchmarks/
go tool pprof cpu.prof
# In pprof choose what you want to see: top10, list function_name, webMemory Profiling:
go test -bench=BenchmarkRealisticWorkload -memprofile=mem.prof ./benchmarks/
go tool pprof mem.prof
# In pprof: top10, list function_name, webAllocation Tracking:
go test -bench=BenchmarkApplyUpdate -benchmem -memprofilerate=1 ./benchmarks/Prerequisites:
- Go 1.19+ (for generics support in benchmarks)
Build process:
# Download dependencies
go mod tidy
# Build binary
go build -o orderbook-demo .
# Run with optimizations
go build -ldflags="-s -w" -o orderbook-demo .
# Cross-compilation example
GOOS=linux GOARCH=amd64 go build -o orderbook-linux-amd64 .Running benchmarks:
# Basic benchmark run
go test -bench=. ./benchmarks/
# With memory allocation tracking
go test -bench=. -benchmem ./benchmarks/
# Longer runs for statistical significance
go test -bench=. -benchmem -benchtime=30s -count=5 ./benchmarks/
# Save results for comparison
go test -bench=. -benchmem ./benchmarks/ > bench_results.txt- Data structure choice matters - Using both a map and sorted slice gave me O(1) reads and O(log n) updates
- Binary search beats sorting - Don't re-sort entire arrays when you can insert at the right position
- Pre-allocation helps - Knowing capacity upfront reduces memory allocations significantly
- Binance sync is tricky - The gap detection between snapshot and stream requires careful handling
raven-dev-challenge/
├── main.go # Demo program
├── orderbook/
│ ├── orderbook.go # Core data structures and thread safety
│ ├── snapshot.go # Bulk loading from REST API
│ └── update.go # Incremental updates from WebSocket
├── wsclient/
│ └── binance_websocket.go # Network layer for Binance API
└── benchmarks/
└── benchmark_test.go # Performance tests
Built with Go for simplicity and performance. No external frameworks needed beyond WebSocket support.