An index lets you impose order on a file without rearranging the fileData file: It is not a requirement for the record to be in orderIndex file: The record must be in orderA simple index is simply an array of (key, reference) pairs which is ordered by the field of the file.Reference: Stores the address of the recordIndexing gives us keyed access to variable-length record filesIndex is called an access path on the fieldThe index file occupies less disk blocks than the data file because its entries are much smallerdense or sparseA dense index: has an index entry for every search key value (every record) in the data file.A Sparse index: has an index entry for only some search key value (some records) in the data file.Primary index, Clustering index, Secondary indexThe fields within each record are also of variable-length but are separated by delimitersprimary key by concatenating the record company label code and the record’s ID number. This should form a unique identifierrapid keyed access, we build a simple index with a key field associated with a reference fieldReference field provides the address of the first byte of the corresponding data record.entry sequenced: the record occurs in the order they are entered in the file.The index is easier to use than the data file becausefixed-length recordssmaller than the data filecarry more information than the key and reference fields (e.g., we could keep the length of each data file record in the index as well)index is small enough to be held in memory.Create the original empty index and data file.Load the index into memory before using itRewrite the index file from memory after using itAdd records to the data file and indexDelete records from the data fileUpdate records in the data fileas an array of recordsThe loading into memory can be done sequentially, reading a large number of index records (which are short) at onceif the index changes, but its rewriting does not take place or takes place incompletely?Use a mechanism for indicating whether or not the index is out of date.reconstructs the index from the data file in case it is out of date.data file and the index should be updatedIn the data file, the record can be added anywhere However, the byte-offset of the new record should be savedshift all the records that belong after the one we are inserting to open up space for the new record. However, this operation is not too costly as it is performed in memoryThe index record corresponding to the data record being deleted must also be deleted.If the update changes the value of the key field, then both the index and data file may need to be reordered.If the update does not affect the key field, then the index does not need reordering, but the data file may.If Indexes are too large to hold in memory, thenBinary searching requires several seeks rather than being performed at memory speed.Index rearrangement requires shifting or sorting records on secondary storage ==> Extremely time consuming.Solutions: You should
hashed organizationtree-structured index (e.g., a B-Tree)Given the following data file Employee (NAME, SSN, Address, JOB, ………………). Suppose that: record size = 150 bytes, Block size = 512 bytes, Number of records in the file = 30000 Find number of file blocks?
For an index on the SSN field. Assume the SSN field size is 9 bytes, and the record pointer size is 7 bytes
| Primary Index | Clustering Index | Secondary Index |
|---|---|---|
| Ordered file | Ordered file | Ordered file a secondary means of accessing a file. |
| Data file is ordered on a key field (distinct value for each record) | Data file is ordered on a non-key field (no distinct value for each record) | Data file is ordered maybe on (1 - candidate key has a unique value), (2 - a non-key with duplicate values) |
| One index entry for each disk block. key field value is the first record in the block, which is called the block anchor. | One index entry for each distinct value of the field. The index entry points to the first data block that contains records with that field value. | The index is an ordered file with two fields: (1 - field value.), (2 - it is either a block pointer or a record pointer.) |
| Non dense (sparse) index | Non dense (sparse) index | If key -> dense — If non key -> dense or sparse index |