Monday, August 11, 2014

Hollywood doesn't know how to computer

I thought I would share a funny clip I found from "Law and Order: SVU".

In this clip, computer tech Reuben Morales shows detectives Benson and Stabler how he discovered a hidden message on a flash drive belonging to a pedophile. He finds "computer code hidden in a pixel" of a picture of a rainbow. He "cracks" the computer code to find a "hidden file" with thousands of pornographic images. In other pixels, he finds PDFs listing the names of the people in the pictures.

What's so disappointing about this scene is that the writers were a hair's breadth away from real methods for hiding information. There is a form of steganography where you use the low-order bit of each color byte in a pixel to hide a message. In an uncompressed image using 24-bit color, each pixel is encoded with three bytes, one byte each for red, green and blue. The human eye is not sensitive enough to distinguish colors that are adjacent to each other in this color space, and flipping the lowest bit of a byte changes that color channel by at most one step out of 255. That means an image with a message encoded into the lowest bit of each byte should not look strange when viewed in a normal image viewer.
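Here is a minimal sketch of that idea in Python. It assumes the image has already been decoded into raw 24-bit RGB bytes, and the function names `embed` and `extract` are my own for illustration. It also assumes the reader knows the message length in advance; a real tool would need to store that somewhere.

```python
def embed(pixels: bytes, message: bytes) -> bytes:
    """Hide message in the lowest bit of each byte of raw RGB data."""
    # Split the message into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message is too long for this image")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the color byte, then set it to the message bit.
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)


def extract(pixels: bytes, length: int) -> bytes:
    """Read back `length` bytes from the lowest bits of the color bytes."""
    message = bytearray()
    for start in range(0, length * 8, 8):
        value = 0
        for byte in pixels[start:start + 8]:
            value = (value << 1) | (byte & 1)
        message.append(value)
    return bytes(message)


# A stand-in "image": 256 pixels' worth of raw RGB bytes.
cover = bytes(range(256)) * 3
stego = embed(cover, b"key")
print(extract(stego, 3))
```

Each color byte in the output differs from the original by at most 1, which is why the picture looks unchanged.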

If the writers had consulted an actual engineer, they could have made some very minor tweaks to make that scene actually make sense. First, the hidden message should have been spread across multiple pixels. This kind of steganography works by breaking the secret message into tiny parts and sprinkling them throughout the cover data, which in this case is the picture.

The next mistake they made was to say that the thousands of images were actually stored within the one rainbow picture. One large image could probably hold a fair amount of data, but you are not going to hide a thousand images inside one image (let alone one pixel). It would have made more sense to hide something like a cryptographic key. Cryptographic keys range from a few dozen bytes to a few hundred bytes and could easily be hidden, even in very small images. The cryptographic key could then be used to decrypt a hidden volume containing all the images and PDFs.
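To put rough numbers on that, here is a back-of-the-envelope calculation. It assumes one hidden bit per color byte in an uncompressed 24-bit image; the image dimensions are just examples I picked, and `lsb_capacity_bytes` is a name I made up.

```python
def lsb_capacity_bytes(width: int, height: int, channels: int = 3) -> int:
    """Bytes of payload that fit when each color byte carries one hidden bit."""
    hidden_bits = width * height * channels
    return hidden_bits // 8


print(lsb_capacity_bytes(64, 64))      # a tiny thumbnail: 1536 bytes
print(lsb_capacity_bytes(1920, 1080))  # a full HD image: 777600 bytes
```

Even the 64x64 thumbnail has room for a key of a few hundred bytes, while a full HD image tops out well under a megabyte. That is nowhere near enough for thousands of pictures.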

There you have it. Two small changes that could have made a world of difference. The writers could have even thrown in the word steganography to make themselves look extra smart. Instead, we have this.

Friday, February 7, 2014

Fixing a bonehead mistake in Solr

I was poking around in one of our Solr cores at work when I got this output from a query.
{
  "responseHeader": {
    "status": 0,
    "QTime": 248,
    "params": {
      "indent": "true",
      "q": "*:*",
      "_": "1391802519673",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 36529751,
    "start": 0,
    "docs": [
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3304997"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645477"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645478"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645479"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645480"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3496486"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645481"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645482"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645484"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645485"
      }
    ]
  }
}
The reason for the error is that I had reworked the schema of the core. I changed this field from a string to a long, and I forgot to delete all the existing records before reindexing, so the old records were still indexed as strings.

Oops.

Okay, so how to fix it? I was able to determine that searching on the userId field with a numerical value would not return any of the corrupted records. A query like userId:[0 TO *] would select all valid records and exclude all corrupted records. I could invert that by doing *:* -userId:[0 TO *] to select all the corrupted records. A quick delete by query, and all the corrupted records disappeared.
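
For reference, here is a sketch of what that delete by query could look like when sent to Solr's JSON update handler from Python. The host and core name (`localhost:8983`, `mycore`) are placeholders and not what we run at work, and `build_delete_request` is a helper I made up for illustration. The script only builds and prints the request; the part that actually sends it is commented out because it is destructive.

```python
import json
import urllib.request


def build_delete_request(base_url: str, core: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a Solr delete-by-query request."""
    url = f"{base_url}/solr/{core}/update?commit=true"
    body = json.dumps({"delete": {"query": query}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
    )


# Everything NOT matched by the numeric range, i.e. the corrupted records.
CORRUPTED = "*:* -userId:[0 TO *]"

# Hypothetical host and core name; substitute your own.
request = build_delete_request("http://localhost:8983", "mycore", CORRUPTED)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually run the delete (destructive!), uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

It is worth running the same query as a normal search first and checking numFound, so you know how many records are about to disappear.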