Unicode String Searching -- Russian Text

This data set is for Russian-language string searching in Unicode UTF-16BE encoding. Because the text is bilingual (English and Russian), the data set can also be used to search for English text.
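As a minimal sketch (Python 3.8+, not part of the original data set), a search term can be converted into the UTF-16BE byte pattern that a raw-image search would look for. Cyrillic letters lie in U+0400 to U+04FF, so each one becomes two bytes starting with 0x04; ASCII English letters become 0x00 followed by the ASCII byte. The helper name `utf16be_pattern` is hypothetical.

```python
def utf16be_pattern(term: str) -> bytes:
    """Return the UTF-16BE encoding of term, without a byte-order mark."""
    # The "utf-16-be" codec never emits a BOM, unlike plain "utf-16",
    # so the result can be used directly as a byte pattern for searching.
    return term.encode("utf-16-be")


if __name__ == "__main__":
    # Cyrillic: each letter encodes as 0x04 followed by one byte.
    print(utf16be_pattern("СУП").hex(" "))   # 04 21 04 23 04 1f
    # English (ASCII): each letter encodes as 0x00 followed by the ASCII byte.
    print(utf16be_pattern("Soup").hex(" "))  # 00 53 00 6f 00 75 00 70
```

A search tool that only looks for single-byte (ASCII or Windows-1251) patterns would therefore miss this text, even for the English words.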


The evil Boris and Natasha have escaped from jail and are up to their old tricks. They have stolen the new menu from The Little Russian Tea Room. Your task is to reconstruct the menu from an image (EnCase, iLook) of their hard drive. To do this, you must find all eight sections of the menu in the image.

  1. ЗАКУСКИ (Appetizers)
  2. СУП (Soup)
  3. БЛИНЫ (Pancakes)
  4. ПИРОЖКИ И ПЕЛМЕНИ (Meat pies and dumplings)
  5. МЯСО И РЫБА (Meat and fish)
  6. СЫР И МОЛОЧНЫЕ (Cheese and milk products)
  7. НАПИТКИ (Beverages)
  8. СЛАДКОЕ (Dessert)
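The eight headings above can be located with a flat byte search over a raw (dd) image. The sketch below is a hypothetical illustration, not the procedure used to build or evaluate this data set: it assumes 512-byte sectors and that each heading is stored contiguously in UTF-16BE, and it reports each hit's byte offset and sector number (offset // 512). The script name `search_menu.py` and the function `search_image` are placeholders.

```python
from __future__ import annotations

import sys

SECTOR_SIZE = 512  # assumption: 512-byte sectors, as on the original IDE drive

# The eight Russian section headings from the menu list above.
HEADINGS = [
    "ЗАКУСКИ",
    "СУП",
    "БЛИНЫ",
    "ПИРОЖКИ И ПЕЛМЕНИ",
    "МЯСО И РЫБА",
    "СЫР И МОЛОЧНЫЕ",
    "НАПИТКИ",
    "СЛАДКОЕ",
]


def search_image(data: bytes, terms: list[str]) -> dict[str, list[tuple[int, int]]]:
    """Map each term to a list of (byte offset, sector number) hits in data."""
    hits = {}
    for term in terms:
        pattern = term.encode("utf-16-be")  # no BOM
        found = []
        pos = data.find(pattern)
        while pos != -1:
            found.append((pos, pos // SECTOR_SIZE))
            pos = data.find(pattern, pos + 1)  # continue past this hit
        hits[term] = found
    return hits


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python search_menu.py image.dd
    # Reads the whole image into memory; acceptable for an image of this size.
    with open(sys.argv[1], "rb") as f:
        image = f.read()
    for term, found in search_image(image, HEADINGS).items():
        print(term, found if found else "not found")
```

Because the image is searched as one flat buffer, a heading that straddles two adjacent sectors is still found, but a heading split across non-contiguous clusters of a fragmented file would be missed. The bzip2-compressed dd image would need to be decompressed first (for example with `bunzip2`), and the EnCase and iLook images converted to a raw byte stream.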


The reconstructed menu.

Creating this test image

The test image was created so that the hard drive appears to be a Western Digital drive (model WDC WD200EB-00CSF0, serial number WD-WTAAV4044563) with 201600 sectors, rather than the much larger capacity usual for this model. This was done by creating a host protected area (HPA) at sector 201600 and then attaching the drive to the host computer through an IDE-to-USB bridge, which hides the HPA from the imaging software. The CFTT diskwipe program was then used to write the C/H/S and LBA address into each sector and fill the remainder of each sector with 0xB9.

This base configuration was imaged with EnCase, iLook, and dd (bzip2-compressed). The drive was then partitioned and formatted with Partition Magic Pro Version 7.0. After the drive was populated with the text of the menu, it was imaged again. The iLook log files for the base and final images are available.
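Because every sector starts out as an address stamp followed by 0xB9 filler, sectors that no longer match that pattern are the ones touched by partitioning, formatting, or copying in the menu text. The sketch below is a hypothetical helper for narrowing a search this way. The exact layout and length of the diskwipe address stamp are not given here, so `HEADER_LEN` is an assumed value that would need to be checked against the base image; 512-byte sectors are also assumed.

```python
from __future__ import annotations

SECTOR_SIZE = 512       # assumption: 512-byte sectors
TOTAL_SECTORS = 201600  # visible size of the drive, per the description
FILL_BYTE = 0xB9        # diskwipe filler value, per the description
HEADER_LEN = 64         # hypothetical: bytes reserved for the C/H/S + LBA stamp


def is_wiped(sector: bytes) -> bool:
    """True if everything after the assumed address stamp is still filler."""
    tail = sector[HEADER_LEN:]
    return len(tail) > 0 and tail.count(FILL_BYTE) == len(tail)


def written_sectors(image: bytes) -> list[int]:
    """Return LBAs of sectors whose tail no longer matches the diskwipe fill."""
    result = []
    for lba in range(len(image) // SECTOR_SIZE):
        sector = image[lba * SECTOR_SIZE:(lba + 1) * SECTOR_SIZE]
        if not is_wiped(sector):
            result.append(lba)
    return result


if __name__ == "__main__":
    # 201600 sectors x 512 bytes = 103,219,200 bytes of addressable image.
    print(TOTAL_SECTORS * SECTOR_SIZE)  # 103219200
```

Comparing the list of written sectors from the final image against the base image would show which sectors changed, though the partition table, boot sector, and file-system structures will appear alongside the menu text.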

