Interview :: Cassandra
cqlsh is the Cassandra Query Language shell, a command-line tool used to execute CQL (Cassandra Query Language) statements.
Node: A node is a single machine running Cassandra.
Cluster: A cluster is a collection of nodes that together store and serve the data, which is distributed and replicated across them.
Datacenter: A datacenter is a logical grouping of nodes within a cluster. It is useful when serving customers in different geographical areas, because the nodes of one cluster can be grouped into several datacenters.
A CQL collection stores multiple values in a single column, where every element of the collection has the same data type. CQL provides three collection types:
- Set: A collection of unique elements. Cassandra returns the elements in sorted order.
- List: A collection of elements kept in a defined order; it can contain duplicate values.
- Map: A collection of key-value pairs in which each key is unique.
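As a rough analogy only, the three collection types behave much like Python's built-in set, list and dict. The sketch below is plain Python, not CQL or driver code, and the variable names and sample values are made up for illustration.

```python
# Plain-Python analogy for the three CQL collection types (not driver code).

# Set: duplicates collapse; sorting mimics the order Cassandra returns.
emails = {"b@example.com", "a@example.com", "a@example.com"}
sorted_emails = sorted(emails)

# List: insertion order is kept and duplicates are allowed.
events = ["login", "view", "login"]

# Map: keys are unique; writing to an existing key overwrites its value.
phones = {"home": "555-0100"}
phones["home"] = "555-0199"
phones["work"] = "555-0142"
```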
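When data is requested, Cassandra consults the Bloom filter of each SSTable before doing any disk I/O to check whether that SSTable might contain the requested data. A Bloom filter can give a false positive but never a false negative, so a "no" answer lets Cassandra skip the SSTable entirely.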
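The idea can be sketched with a minimal Bloom filter. This is an illustration under stated assumptions, not Cassandra's implementation: the class, the helper sstables_to_read, the bit-array size and the salted SHA-256 hashing are all chosen here for simplicity.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def sstables_to_read(filters, key):
    """Return the names of SSTables whose filter cannot rule out the key."""
    return [name for name, bf in filters.items() if bf.might_contain(key)]


# Hypothetical SSTables, each with its own filter.
sstable_a = BloomFilter()
sstable_a.add("user:1")
sstable_b = BloomFilter()
sstable_b.add("user:2")
filters = {"sstable_a": sstable_a, "sstable_b": sstable_b}
candidates = sstables_to_read(filters, "user:1")
```

Only the SSTables listed in candidates would need to be read from disk; any filter that answers "no" rules its SSTable out without I/O.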
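In Cassandra, deleting a row does not remove the data immediately. Instead, Cassandra writes a special marker called a tombstone for the deleted data. The tombstone and the data it hides are removed later during compaction, once the table's gc_grace_seconds period has passed.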
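A toy in-memory model may make the tombstone mechanism easier to picture. Everything here (ToyTable, its methods, the explicit now argument) is hypothetical and heavily simplified; only the name gc_grace_seconds is borrowed from the real table option.

```python
TOMBSTONE = object()  # sentinel standing in for Cassandra's deletion marker


class ToyTable:
    """In-memory toy: deletes write a tombstone; compaction purges old ones."""

    def __init__(self, gc_grace_seconds):
        self.gc_grace_seconds = gc_grace_seconds
        self.cells = {}  # key -> (value or TOMBSTONE, write time in seconds)

    def write(self, key, value, now):
        self.cells[key] = (value, now)

    def delete(self, key, now):
        # A delete is itself a write: it records a marker, not an erasure.
        self.cells[key] = (TOMBSTONE, now)

    def read(self, key):
        value, _ = self.cells.get(key, (TOMBSTONE, 0))
        return None if value is TOMBSTONE else value

    def compact(self, now):
        # Drop tombstones only after the grace period has elapsed.
        self.cells = {
            k: (v, t)
            for k, (v, t) in self.cells.items()
            if not (v is TOMBSTONE and now - t >= self.gc_grace_seconds)
        }


table = ToyTable(gc_grace_seconds=10)
table.write("k", "v", now=0)
table.delete("k", now=5)   # the row now reads as absent
table.compact(now=8)       # too early: the tombstone is kept
table.compact(now=20)      # grace period over: the tombstone is purged
```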
In Cassandra's older data model, a SuperColumn is an element that groups a collection of related columns. It is a key-value pair whose value is a map of columns.
Difference between Column and SuperColumn:
- The value of a Column is a single value such as a string, while the value of a SuperColumn is a map of Columns, which can hold different data types.
- Unlike Columns, SuperColumns do not have a timestamp as a third component.
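The difference can be sketched as two small data structures. The dataclasses and sample values below are illustrative assumptions, not Cassandra's own types.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    name: str
    value: str
    timestamp: int  # third component of a column


@dataclass
class SuperColumn:
    name: str
    # The value is a map of columns; there is no timestamp field here.
    columns: Dict[str, Column] = field(default_factory=dict)


# Made-up sample data: an "address" SuperColumn holding two columns.
address = SuperColumn(name="address")
address.columns["city"] = Column("city", "Springfield", timestamp=1)
address.columns["zip"] = Column("zip", "00000", timestamp=1)
```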
Hadoop, HBase, Hive and Cassandra are all Apache products.
Apache Hadoop provides file storage and grid compute processing via MapReduce. Apache Hive is a SQL-like interface on top of Hadoop. Apache HBase uses column-family storage modeled on Bigtable. Apache Cassandra also uses column-family storage modeled on Bigtable, combined with Dynamo-style topology and consistency.
In the Cassandra Java driver, the void close() method of the Session class is used to close the current session instance.
The cqlsh command is used to start the cqlsh prompt.