Seva Software

 

What is Aruna DB?

Last Updated: 9/25/2001

A_Buffer.rb

Purpose

License

Description

Dependencies

Limitations

Performance Considerations

To Do

Usage

Class Methods

Instance Methods

Testing

A_Debug Usage

 

Purpose:

This class quickly stores and retrieves key/value pairs in a fixed size buffer. This has been optimized to used as little disk space as possible and store/retrieve key/value pairs as quickly as possible. This class is used very heavily by the A_BTree class. This class stores key/value pairs in a malloced block of memory. You can insert and delete items similar to an Array. The items are put into the buffer using apack/aunpack or Marshal.dump/load. Apack/aunpack are methods very similar to Array.pack and String.unpack but have been modified to support database type objects such as date, time, varchar, char, etc. This class was created as a replacement for the Array class to store data in pages in the A_BTree class. The A_BTree class runs twice as fast, on average, using the A_Buffer class verses using an Array.

 

Description:

This class puts key/value pairs into a fixed size buffer for reading and writing to and from disk. This used by the A_BTree class. I don't expect there will be much of a need for a class like this, so I kept the methods to a minimum, creating only what I needed to support the A_BTree class. I wrote this in C to maximize performance because some of the functionality was difficult in Ruby. Apack/aunpack are methods written in C that very similar to pack and unpack but have been modified to support database type objects such as date, time, varchar, char, etc. They can be found in aruna.c.

Additionally, this module contains several C methods added to the A_FileStore class for performance reasons. These are used internally by the A_FileStore class to read and write A_Buffer class directly to disk. They are:

This module also contains a new class call A_Pos. This is returned and used by A_FileStore and the C methods mentioned above. See the documentation for A_Pos for a description of this class.

The a_buffer.c module also contains the following additions to the Array class:

This module also contains a new class call A_DeleteNode. This class is used internally by A_Filestore to track deleted file space in a filestore. There is no documentation on the A_DeleteNode class at this time.

This defines new errors standard errors called BufferIsFull and FileStoreIsFull. BufferIsFull is raised when you insert or update and the buffer it too full to accommodate the change. FileStoreIsFull is raised when you try to write to a filestore that has no space left.

The internals of A_Buffer. The first and second bytes in the filestore is number of items in the buffer (assuming index_data_type is an unsigned short). The next series of bytes are two byte pointers (offsets from the start of buffer) for each key/value pair in the buffer. Basically, it is an array of unsigned shorts containing one unsigned short for each key/valu pair in the buffer. These are used to quickly pinpoint the start of each key/value pair in the buffer. The key/value pairs are filled from the end of the buffer toward the front. The first pair is at the very end of the buffer, the second pair is in the second to last bytes of the buffer. The pointers and key/values fill up until they meet in the middle of the buffer. This design is not simple by any means, but it stores variable length records efficiently and the pointers are automatically optimized out when the key and value are fixed in length (when both are fixed in length, the position of each key/value pair can be easily calculated). In Aruna (A_BTree), A_Buffer.asearch(key) is called to find the slot to put the key/value pair in. This insures the keys are sorted in the buffer. Here is a sample of a buffer with 9 key/value pairs (8 bytes needed for each key/value pair) where buffer_size = 128 bytes. A | separates each value:

|09|120|112|104|094|086|078|070|062|054| … |k9v9|k8v8|k7v7|k6v6|k5v5|k4v4|k3v3|k2v2|k1v1|

Technically, if the key/value pair was 8 bytes for all pairs, then buffer would actually look like this because the pointers would get optimized out: 

|09| … |k9v9|k8v8|k7v7|k6v6|k5v5|k4v4|k3v3|k2v2|k1v1|

 

Dependencies:

 

Limitations:

#define index_data_type unsigned short

to

#define index_data_type unsigned int

The draw back is that the 2 byte buffer pointer for each key/value pair will now be a 4 byte buffer pointer.

 

Performance Considerations:

 

To Do:

 

 Usage:

N/A

 

Class Methods:

A_Buffer.new (buffer_size, key_pack_string, key_is_array, data_pack_string, data_is_array)

Creates a new A_Buffer object. The key and data pack strings are used specify how to pack the data into the buffer. When provided, apack and aunpack are used to store and retrieve data to and from the buffer. When not provided or a '*' is used then Marshal.dump and Marshal.load are used to store and retrieve the data. Using pack strings typically use less memory for each key/value pair and are faster than using Marshal.dump/load. Marhsal.dump/load many be slower but it allows you put almost any Ruby object into a buffer (see the documentation on Marshal for details). Valid pack strings are provided for a small set of objects that are typically found in most databases. See apack for details on supported pack strings.

 

Instance Methods:

asearch(key, exact_hit)

This performs a modified binary search on A_Buffer for key. This is a binary search on the elements in the a_buffer where the index of the first element >= key is returned. This is used by A_BTree to determine where to insert new rows. There is a similar method in a_buffer.c called Array.asearch.

key - is the key you are looking for in the buffer.

exact_hit - if 0 or nil, then return the index of the first element >= key. If != 0, then return nitems if the object is not found.

buffer_size()

Returns the actual size of this buffer in bytes.

clear()

Empties the buffer.

clear_pending()

Clears, deletes, or remove a pending insert/update. When an insert or update fails because the buffer is too full, the insert/update is remembered as a pending insert or update and an error is raised. There are three ways to clear a pending update/insert. 1) call move_rows to move rows to another buffer and make room for the insert/update. The pending update/insert is performed as rows are moved. 2) call clear_pending to clear or delete the pending insert/update. You will loose the inserted or updated data. 3) call complete_pending(). This will attempt to reapply the insert/update. This will fail unless you have deleted one or more rows from the buffer.

complete_pending()

Attempts to post a pending insert/update. This will likely fail unless you have deleted one or more rows from the buffer. When an insert or update fails because the buffer is too full, the insert/update is remembered as a pending insert or update and an error is raised. There are three ways to clear a pending update/insert. 1) call move_rows to move rows to another buffer and make room for the insert/update. The pending update/insert is performed as rows are moved. 2) call clear_pending to clear or delete the pending insert/update. You will loose the inserted or updated data. 3) call complete_pending(). This will attempt to reapply the insert/update. This will likely fail unless you have deleted one or more rows from the buffer.

copy(buffer)

Copies the internal data stored in self to buffer. For a complete copy of self do the following: new_buffer = buf.dup; buf.copy(new_buffer).

data_at(idx)

Returns the value stored at idx.

data_is_array?()

Return true if the values stored in this array should be returned as an array. Returns false if the values are returned as an individual object. It allows you to put 64 in and get 64 back out; [64] in and [64] back out.

data_is_array=(value)

Sets the data_is_array value. See description of data_is_array?() for more details.

data_pack_string()

Returns the data_pack_string currently assigned to this buffer. You can't change this value.

data_size()

If you have provided a valid data pack string and it contains a fixed size, then the fixed size of the value is returned. Otherwise 0 is returned. Certain optimizations are performed on the buffer when valid pack strings are provided and both the key and value have a fixed size. This is calculated from the data_pack_string. You can't change this value.

delete(idx)

Deletes the key/value pair stored at idx. Same as Array.delete_at(5).

dup(buffer)

This returns a new A_Buffer object that is a dup of self. This is a shallow copy of buffer. A new internal buffer is created but the objects in buffer are not copied to the dup.

empty()

Empties the buffer.

exists?(key, value)

Return true if this key/value pair exists in this buffer. If value=nil, then return true if this key exists in this buffer. Otherwise return false.

fits?(key, value)

Return the number of bytes this key value pair will use. Raise a NoMemoryError error if this key value pair does not fit in the buffer.

fixed_size?()

Return true if the key and value are both fixed in size. Otherwise return false. A_Buffer optimizes by eliminating the 2 byte buffer pointers when this is true. This value is automatically set when the buffer is created based on the key_pack_string and data_pack_string. You can't change this value.

insert(idx, key, value)

Insert the key/value pair at idx. Similar to Array[5, 0] = [key, value] where idx = 5. BufferIsFull is raised if the buffer is too full to hold this key/value pair.

key_at(idx)

Returns the key stored at idx.

key_is_array?()

Return true if the keys stored in this array should be returned as an array. Returns false if the keys are returned as an individual object. It allows you to put 64 in and get 64 back out; [64] in and [64] back out.

key_is_array=(value)

Sets the key_is_array? value. See description of key_is_array?() for more details.

key_is_byte_comparable?()

Return true if all the elements in the key are byte comparable. Otherwise return false. This is automatically set when the buffer is created by using the key_pack_string. If each element in the key_pack_string is byte_comparible then this returns true. Some types are byte comparible and others are not. When true, asearch is optimized by calling memcmp() on the data stored in the buffer. When false, the aunpack or Marshal.load is called to get the key out of the buffer before the compare is done. Byte comparable keys are faster than non byte comparable keys. You can't change this value.

key_pack_string()

Returns the key_pack_string currently assigned to this buffer. You can't change this value.

key_size()

If you have provided a valid data pack string and all of the keys elements are fixed in size, then the fixed size of the value is returned. Otherwise 0 is returned. Certain optimizations are performed on the buffer when valid pack strings are provided and both the key and value have a fixed size. This is calculated from the key_pack_string. You can't change this value.

move_rows(buffer, rows_to_move, force_move)

Move rows from one buffer to another. Returns the actual number of rows moved. There are many possibilities. You can even the rows out between the two pages, move all rows from one page to the other, or move a couple of rows from one page to the other.

If there is a pending update or insert, it is applied while the rows are being moved. When an insert or update fails because the buffer is too full, the insert/update is remembered as a pending insert or update and an error is raised. There are three ways to clear a pending update/insert. 1) call move_rows to move rows to another buffer and make room for the insert/update. The pending update/insert is performed as rows are moved. 2) call clear_pending to clear or delete the pending insert/update. You will loose the inserted or updated data. 3) call complete_pending(). This will attempt to reapply the insert/update. This will fail unless you have deleted one or more rows from the buffer.

Examples:

buf1.move_rows(buf2, -1, 1) # move all rows in buf1 to buf2 (prefix buff2 with rows in buff1 )

buf1.move_rows(buf2, -1, -1) # move all rows in buf2 to buf1 (append buff1 with rows in buff2 )

buf1.move_rows(buf2, 0, 0) # moves rows from the more full buffer to the less empty buffer until they match in size

buf1.move_rows(buf2, 1, 1) # move one row from the end of buf1 to the beginning of buf2

buf1.move_rows(buf2, 1, -1) # move one row from the beginning of buf2 to the end of buf1

nitems()

Return a count of key/value pairs stored in this buffer.

size()

Returns the actual size of the bufferin bytes, same as buffer_size.

unused()

Returns the number of unused, free, or available bytes in this buffer.

update(idx, new_value)

Updates the value stored a idx. BufferIsFull is raised if the buffer is too full to hold this update of the value.

update_key(idx, new_key)

Updates the value stored a idx. BufferIsFull is raised if the buffer is too full to hold the update of this key.

used()

Returns the number of used bytes in this buffer.

 

Testing: 

tst_a_buffer.rb - tests the basic functionality of the A_Buffer class. To run these tests type:

ruby -I.. tst_a_buffer.rb

 

A_Debug Usage:

N/A