The DB jungle guide: "How to select the right database"
This list is compiled from two years of NoSQL consulting and has been presented at many conferences (video
here),
in articles (e.g. here) and in the world's first NoSQL books (in German).
Cluster 1: Know & Segment your data
Analyze & Categorize it:
- Domain-Data
- Log-Data
- Event-Data
- Message-Data
- critical Data
- Business-Data
- Meta-Data
- temp Data
- Session-Data
- Geo Data
- etc.
Data- / Storage Model:
- relational
- column-oriented
- document-like
- graphs
- objects
- multivalue
- objects=ORM
- JSON
- BLOBS
- etc. (beyond bit-bucket)
Data / Type constraints:
- Data-Navigation?
- Data Amount?
- Data Complexity (Deep XML?)
- Schema flexibility?
- Schema support needed?
Persistence design:
(Reference: (C) highscalability link to be inserted)
- Durability? On power failure?
- Memtable/SSTable; Append-only B-tree; B-tree; On-disk linked lists;
In-memory replicated; In-memory snapshots; In-memory only; Hash; Pluggable.
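To make the durability question concrete, here is a minimal sketch of an append-only log, one of the persistence designs listed above. The class name `AppendOnlyLog`, the JSON-lines record format and the `durable` switch are assumptions chosen for illustration; they do not describe how any particular database works.

```python
import json
import os
import tempfile


class AppendOnlyLog:
    """Hypothetical append-only key/value log, one JSON record per line."""

    def __init__(self, path, durable=True):
        self.path = path
        self.durable = durable  # True: force each write to disk before returning

    def append(self, key, value):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            if self.durable:
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # ask the OS to write it to the device

    def replay(self):
        """Rebuild the current state by reading the log from the start."""
        state = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                state[record["key"]] = record["value"]  # later records win
        return state


# Usage: old values are never overwritten in place, only superseded.
with tempfile.TemporaryDirectory() as tmp:
    log = AppendOnlyLog(os.path.join(tmp, "data.log"))
    log.append("user:1", "alice")
    log.append("user:1", "alice-updated")
    recovered = log.replay()
```

With `durable=False` the writes are faster but may be lost on power failure, which is exactly the trade-off the first bullet asks about.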
Cluster 2: Consistency Model
Global consistency model:
- ACID / BASE / WATER?
- Ability to (fine) tune the consistency model
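One common form of tunable consistency is the quorum setting in replicated stores. The sketch below assumes a store with N replicas where a read waits for R replicas and a write waits for W replicas; the function name is hypothetical. If R + W > N, every read quorum shares at least one replica with every write quorum.

```python
def quorums_overlap(n, r, w):
    """Return True if every read quorum intersects every write quorum.

    n: number of replicas, r: replicas consulted per read,
    w: replicas that must acknowledge a write.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# Usage: with three replicas, R=2/W=2 overlaps, R=1/W=1 does not.
strict = quorums_overlap(3, 2, 2)
relaxed = quorums_overlap(3, 1, 1)
```

Lower R and W values trade that overlap for lower latency, which is the kind of tuning the bullet above refers to.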
CAP tradeoff:
Cluster 3: Performance Dimensions
- Latency / Request behaviour / distribution [High = 10, Low = 0]
- Throughput [High = 10, Low = 0]
- High Concurrency?
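The 0 to 10 scale above can be used to compare candidates against requirements. The following is a minimal sketch under stated assumptions: the field names, the shortfall metric and the candidate names and scores are placeholders made up for illustration, not measurements of real products.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class PerformanceProfile:
    """Scores on the checklist's scale: 0 = low, 10 = high."""
    low_latency: int   # assumption: higher means lower latency is needed/offered
    throughput: int
    concurrency: int

    def __post_init__(self):
        for value in astuple(self):
            if not 0 <= value <= 10:
                raise ValueError("scores must be between 0 and 10")


def shortfall(required, offered):
    """Sum of unmet points per dimension; 0 means every requirement is met."""
    return sum(max(0, need - have)
               for need, have in zip(astuple(required), astuple(offered)))


def rank(required, candidates):
    """Candidate names ordered from smallest to largest shortfall."""
    return sorted(candidates, key=lambda name: shortfall(required, candidates[name]))


# Usage with placeholder scores.
required = PerformanceProfile(low_latency=8, throughput=5, concurrency=9)
candidates = {
    "candidate_a": PerformanceProfile(low_latency=9, throughput=4, concurrency=6),
    "candidate_b": PerformanceProfile(low_latency=7, throughput=8, concurrency=9),
}
ordering = rank(required, candidates)
```

Only unmet requirements count here; exceeding a requirement earns nothing, so a database that is very strong in one dimension cannot hide a gap in another.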
Cluster 4: Query Requirements
- Typical queries look like?
- SQL needed? LINQ needed?
- BI / Analytic-Tools needed? (M/R sufficient?)
- Ad-Hoc Queries needed?
- Map/Reduce needed? Background data analytics?
- Secondary Indices
- Range queries
- Weird aggregations
- ColumnDB needed for Analytics?
- Views
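To show what the secondary-index and range-query bullets ask of a database, here is a toy in-memory sketch. `TinyStore` and its methods are hypothetical names; a real database would implement this with on-disk structures, but the idea of keeping a second, sorted access path next to the primary key is the same.

```python
import bisect


class TinyStore:
    """Toy key/value store with one secondary index on a chosen field."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.rows = {}    # primary access path: key -> row
        self.index = []   # secondary index: sorted list of (field value, key)

    def put(self, key, row):
        old = self.rows.get(key)
        if old is not None:
            # Remove the stale index entry before inserting the new one.
            i = bisect.bisect_left(self.index, (old[self.indexed_field], key))
            del self.index[i]
        self.rows[key] = row
        bisect.insort(self.index, (row[self.indexed_field], key))

    def range_query(self, low, high):
        """Keys whose indexed field satisfies low <= value <= high."""
        start = bisect.bisect_left(self.index, (low,))
        result = []
        for value, key in self.index[start:]:
            if value > high:
                break
            result.append(key)
        return result


# Usage: find users by age without scanning every row.
store = TinyStore("age")
store.put("u1", {"age": 25})
store.put("u2", {"age": 34})
store.put("u3", {"age": 41})
store.put("u4", {"age": 30})
in_their_thirties = store.range_query(30, 39)
```

A pure key/value store without such an index can only answer this query by scanning all rows, which is why these bullets matter when comparing data models.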
Cluster 5: Architecture and Patterns
Architecture looks like:
- local, parallel, distributed / grid, service, p2p, …
- Target Platform: Hosted? Cloud? Local? Datacenter? Smartphone? Desktop?
Data Access Patterns
- read / write distribution?
- random / sequential access?
- Access Design Patterns
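The first two questions can be answered from an access trace if one is available. The sketch below assumes a trace given as a list of `(operation, position)` pairs, where the position is a block or row number; this format and the function name are assumptions for illustration.

```python
def profile_accesses(trace):
    """Summarize read/write mix and how sequential the accesses are.

    trace: list of (op, position) with op being "read" or "write".
    An access counts as sequential if it hits the position directly
    after the previous access.
    """
    if not trace:
        raise ValueError("trace must not be empty")
    reads = sum(1 for op, _ in trace if op == "read")
    steps = list(zip(trace, trace[1:]))
    sequential = sum(1 for (_, a), (_, b) in steps if b == a + 1)
    return {
        "read_fraction": reads / len(trace),
        "sequential_fraction": sequential / len(steps) if steps else 0.0,
    }


# Usage with a made-up trace.
trace = [("read", 0), ("read", 1), ("write", 2), ("read", 10)]
profile = profile_accesses(trace)
```

A read-heavy, mostly random profile and a write-heavy, mostly sequential one tend to favor quite different storage designs, so it is worth measuring rather than guessing.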
Cluster 6: Non-functional Requirements
- Replication needed? = Robustness
- Automatic load balancing, partitioning, and repartitioning?
- Auto-Scaling
- Text search integration? Lucene / Solr?
- Refactoring Frequency?
- 24/7 System? Live add and remove?
- Developer Qualification
- DB simplicity? (installation, configuration, development, deployment, upgrade)
- Company restrictions?
- DB diversity (allowed?)
- Security? (authentication, authorization, validation?)
- Licence Model?
- Vendor trustworthiness?
- Community support?
- Documentation?
- Future development of the company and the DB?
Costs:
- DB-Support? (responsiveness, SLA)
- Costs in general, Scaling Costs
- Sysadmin costs
- Operational Costs: (noOps)
- Safety / Backup & Restore
- Crash Resistance, Disaster Management
- Monitoring