Engineering at Kweo.com
M-Square is the developer and operator of Kweo.com, a highly scalable user engagement platform that provides embeddable widgets to website operators and online media companies. Kweo is designed to handle millions of concurrent users and offers its subscribers features such as real-time comments with Reddit-like ranking and polls/voting at massive scale.
Because real-time behavior is crucial, Kweo's architecture is fundamental to our promise that subscribers and their end users get the best possible user experience.
This is the first of six articles in which we discuss our architecture and the design decisions behind it.
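To make "Reddit-like ranking" of comments more concrete: this article does not say which formula Kweo uses, so the following is only a minimal sketch of one widely used approach, the lower bound of the Wilson score interval, which is the idea behind Reddit's "best" comment sort. The class name, method name, and the 95% confidence level are assumptions made for illustration.

```java
// Illustrative sketch only: names and the confidence level are hypothetical,
// not taken from the Kweo code base.
final class WilsonScore {
    // z = 1.96 corresponds to a 95% confidence level (an assumed choice).
    private static final double Z = 1.96;

    /** Lower bound of the Wilson score interval for the share of upvotes. */
    static double lowerBound(long upvotes, long downvotes) {
        long n = upvotes + downvotes;
        if (n == 0) {
            return 0.0; // no votes yet: rank at the bottom
        }
        double p = (double) upvotes / n;  // observed upvote fraction
        double z2 = Z * Z;
        double centre = p + z2 / (2 * n);
        double margin = Z * Math.sqrt((p * (1 - p) + z2 / (4 * n)) / n);
        return (centre - margin) / (1 + z2 / n);
    }

    public static void main(String[] args) {
        // Comments would be sorted by this score in descending order.
        System.out.printf("1 up, 0 down:   %.3f%n", lowerBound(1, 0));
        System.out.printf("90 up, 10 down: %.3f%n", lowerBound(90, 10));
    }
}
```

With this score, a comment with 90 upvotes and 10 downvotes ranks above one with a single upvote and no downvotes, because the larger sample gives more confidence in its upvote share.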
Core Requirements
- All Kweo widgets must run on subscribers' websites without blocking site content or degrading performance
- Users must not experience noticeable lag when they perform actions
- All actions should feel instantly executed
- Kweo must work on all major platforms, including mobile
- Kweo must be able to store and process millions of events per second (this matters for polling/voting during major global events)
- Kweo must be cost-effective to operate and maintain
- Kweo must be fault-tolerant and highly available (working across data centers and geographic regions)
- Abnormal deviations in the Kweo stack must be detected immediately, with auto-recovery and auto-scaling of infrastructure and applications
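As an illustration of the last requirement, detecting abnormal deviations can be sketched as a rolling z-score check over a recent window of a metric such as request latency. This is not a description of how Kweo or its monitoring tooling actually does it; the class name, window size, and threshold below are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only: names, window size, and threshold are hypothetical.
final class DeviationDetector {
    private final Deque<Double> window = new ArrayDeque<>();
    private final int capacity;      // number of recent samples kept
    private final double threshold;  // allowed distance from the mean, in standard deviations
    private double sum;
    private double sumSquares;

    DeviationDetector(int capacity, double threshold) {
        this.capacity = capacity;
        this.threshold = threshold;
    }

    /** Checks the sample against the recent window, then records it. */
    boolean isAnomalous(double sample) {
        boolean anomalous = false;
        int n = window.size();
        if (n >= 2) { // need at least two samples for a meaningful spread
            double mean = sum / n;
            double variance = Math.max(0.0, sumSquares / n - mean * mean);
            double stdDev = Math.sqrt(variance);
            anomalous = stdDev > 0 && Math.abs(sample - mean) > threshold * stdDev;
        }
        // Record the sample and evict the oldest one once the window is full.
        window.addLast(sample);
        sum += sample;
        sumSquares += sample * sample;
        if (window.size() > capacity) {
            double oldest = window.removeFirst();
            sum -= oldest;
            sumSquares -= oldest * oldest;
        }
        return anomalous;
    }

    public static void main(String[] args) {
        DeviationDetector detector = new DeviationDetector(20, 3.0);
        double[] latenciesMs = {100, 104, 98, 102, 99, 101, 500};
        for (double latency : latenciesMs) {
            System.out.println(latency + " ms anomalous? " + detector.isAnomalous(latency));
        }
    }
}
```

In a real deployment, a positive result would feed alerting, auto-recovery, or auto-scaling. A production detector would also need to handle seasonality and avoid letting outliers contaminate the baseline, which this sketch does not attempt.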
Design Decisions
The core stack
- Amazon Web Services for cost-effective infrastructure
- Netty as an asynchronous, event-driven network application framework
- WebSockets and/or a REST API for developer access to Kweo
- Apache Kafka as a fault-tolerant, high-throughput distributed messaging system
- Storm / Trident for distributed and fault-tolerant real-time computation
- Apache Cassandra as a fault-tolerant, distributed, column-oriented database
- Netflix Astyanax as a high-level Java client for Apache Cassandra
- Apache Solr for distributed, scalable and highly available search
DevOps
- Compuware dynaTrace for real-user monitoring and APM in dev, test and prod
- Puppet for IT automation
- Netflix Archaius as a dynamic configuration management API
- AWS CLI for automating AWS configuration
- Jenkins for continuous integration and delivery
- Apache JMeter for performance and stress testing
- Selenium WebDriver for browser automation
Future blog posts will cover in detail the libraries we use, the problems we have solved, and the challenges of our daily engineering work.
Read the following blog posts:
Application Performance Management for distributed real-time data processing