Services/Sync/Server/Archived/HereComesEverybody/HBaseNotes
From MozillaWiki
- "HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware."
- "HBase is an open-source, distributed, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop."
- http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/
- "If you need highly available writes with only eventual consistency, then Cassandra is a viable candidate for now. However, many apps are not happy with eventual consistency, and it is still lacking many features. Furthermore, even if writes do not fail, there is still cluster downtime associated with even minor schema changes. HBase is more focused on reads, but can handle very high read and write throughput. It’s much more Data Warehouse ready, in addition to serving millions of requests per second. The HBase integration with MapReduce makes it valuable, and versatile."
- This is biased and incorrect information from an HBase partisan with an axe to grind. For instance, schema changes do not require cluster downtime (node downtime yes, cluster downtime no). Many more corrections are at http://n2.nabble.com/Fwd-HBase-vs-Cassandra-new-article-td3915432.html#a3915825. I also note that Cassandra 0.6 will match HBase's Hadoop (MapReduce) support.
- http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html
- "Follows the bigtable model, so it's more complicated than it needs to be. (300+kloc vs 50 for Cassandra; many more components). This means it's that much harder for me to troubleshoot. HBase is more bug-free than Cassandra but not so bug-free that troubleshooting would not be required. Does not have any non-java clients. I need CPython support. Sits on top of HDFS, which is optimized for streaming reads, not random accesses. So HBase is fine for batch processing but not so good for online apps."
- I [jbellis] wrote this over a year ago; my then-claim that HBase is more bug-free is definitely outdated now. But Cassandra is still faster; see the recent Yahoo benchmarks: http://bit.ly/dlBh2w (note that while Cassandra 0.4 already beats HBase almost across the board, 0.5 improves those numbers by 50% on write-heavy workloads and 20% on read-heavy ones).
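Both HBase quotes above refer to the Bigtable model, which can be pictured as a sparse, sorted map from (row key, column, timestamp) to a value, with columns named "family:qualifier" and rows kept in sorted key order so range scans are cheap. The following is only a toy, in-memory sketch of that data model. The class `ToyBigtable` and its `put`/`get`/`scan` methods are hypothetical and are not HBase's client API, and the sketch ignores distribution, HDFS persistence, and region splitting entirely.

```python
from collections import defaultdict


class ToyBigtable:
    """Hypothetical toy of the Bigtable/HBase data model (not HBase's API).

    Logical layout: row key -> "family:qualifier" column -> {timestamp: value}.
    """

    def __init__(self):
        # Sparse storage: only cells that were actually written exist.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        # Each write adds a new timestamped version of the cell.
        self._rows[row][column][ts] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield (row, {column: newest value}) for start_row <= row < stop_row,
        in sorted row-key order."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                cells = self._rows[row]
                yield row, {col: vs[max(vs)] for col, vs in cells.items()}


if __name__ == "__main__":
    t = ToyBigtable()
    t.put("row1", "cf:a", "old", ts=1)
    t.put("row1", "cf:a", "new", ts=2)
    t.put("row2", "cf:b", "x", ts=5)
    print(t.get("row1", "cf:a"))
    print(list(t.scan("row1", "row2")))
```

Reads return the newest timestamped version by default, and the sorted row order is what makes range scans over adjacent keys natural in this model.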