Seva Software

 

What is Aruna DB?

Last Updated: 9/25/2001

A_FileSort

Purpose

License

Description

Dependencies

Limitations

Performance Considerations

To Do

Usage

Class Methods

Instance Methods

Testing

A_Debug Usage

 

Purpose:

This class allows you to sort large amounts of data. You put records in and when you iterate over the rows they are sorted. The records are stored in an array until the buffer size is exceeded. If all data fits in the specified buffer size, then the Array is sorted by calling Array.sort and you can iterate over the rows. If the buffer fills up, data is written to disk and sorted in chunks. The temporary files used for sorting will have an extension of .srt.

 

Description:

A_Index, A_BTree, and A_View use this class to allow you to iterate over your data sorted in other ways. When data has to be written to disk, this uses a divide and conquer method for sorted. This works best if the first buffer full of data that is process is a good representation of the data that is being sorted. If the first buffer full of data is mostly sorted, I assume all of the data is mostly sorted. Here are the basic steps:

 This nice thing about this approach is that is can sort a very large amount of data using a fairly small amount of memory. The temporary files are not opened until they are needed. This means that this class performs as well as an Array if the data all fits in the buffer size. The other nice thing about this approach is that you start getting data back after sorting only 1/20 of the data. For interactive users, this means they see data much quicker.

 

Dependencies:

 

Limitations:

  • None

 

Performance Considerations:

 

To Do:

 

 Usage:

  • Ascending sort:

srt = A_FileSort.new()

srt.write(['hello world', 'Test this 1', 2])

srt.write(['hello world3', 'Test this 1', 2])

srt.write(['hello world', 'Test this 3', 2])

srt.each {|data| print "#{data.join(',')}\n"}

srt.close()

  • To reverse sort:

rev_sort = proc {|k1, k2| k2 <=> k1}

srt = A_FileSort.new(nil, &rev_sort)

0.upto(99){|i| srt.write(i)}

srt.each {|data| print #{data}\n"}

srt.close

 

Class Methods:

A_FileSort.version()

Returns the version of this class.

A_FileSort.buffer_size=(new_size)

Allows you set the memory size in bytes of the buffer. This value is used to determine if the buffer is full and should written to disk. This value applies to all sort files. The default value is 500K.

A_FileSort.buffer_size()

Returns the currently defined size of the buffer used for sort files.

A_FileSort.file_count=(new_size)

Sets the number of temporary files to open and use for sorting. This applies to all FileSort objects. The default value is 20 temporary files are used for sorting.

A_FileSort.file_count()

Returns the number of temporary files that is used for sorting.

A_FileSort.new(path=nil, &sort_block)

Creates a new object. Temporary files are not opened until they are needed.

 

Instance Methods:

close()

Close this sort file and all opened temporary files.

each()

Iterate over all rows in the sort file. Yields(data). This starts the sorting process. The data is sorted the first time you call this method, so it take a minute or two before the data starts returning. Approximately every 1/20th of the data you could see more delays in the iterator as more data is being sorted.

write(data)

Write this data to the sort file.

 

Testing: 

tst_a_filesort.rb - this script performs basic testing for the A_FileSort class. To run this tests type:

ruby -I.. tst_a_filesort.rb

 

A_Debug Usage:

7 - prints high level stuff like creating and opening an A_TempFile object

70 - prints information about rewinding and sorting data.

71 - prints information about reading and writing blocks of data to and from disk.