Friday, February 7, 2014

Fixing a bonehead mistake in Solr

I was poking around in one of our Solr cores at work when a query returned this output:
{
  "responseHeader": {
    "status": 0,
    "QTime": 248,
    "params": {
      "indent": "true",
      "q": "*:*",
      "_": "1391802519673",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 36529751,
    "start": 0,
    "docs": [
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3304997"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645477"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645478"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645479"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645480"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3496486"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645481"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645482"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645484"
      },
      {
        "userId": "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=3645485"
      }
    ]
  }
}
The error shows up because I had reworked the core's schema. I changed the userId field from a string to a long, and I forgot to delete all the existing records before reindexing.

Oops.

Okay, so how to fix it? I found that searching the userId field with a numerical value never returned any of the corrupted records. A range query like userId:[0 TO *] selects all the valid records and excludes all the corrupted ones. Inverting it with *:* -userId:[0 TO *] selects just the corrupted records. A quick delete by query, and all the corrupted records disappeared.
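
For anyone who wants to script that delete by query, here is a minimal Python sketch. It assumes a Solr instance reachable at a hypothetical http://localhost:8983/solr/mycore and uses Solr's JSON update format, where a body of {"delete": {"query": ...}} is posted to the core's /update handler. The core URL and the function names are made up for illustration; this is not the exact command I ran.

```python
import json
import urllib.request

# Hypothetical core URL (an assumption); replace with your own host and core.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"

# Every document, minus those whose userId matches the numeric range query.
CORRUPTED_QUERY = "*:* -userId:[0 TO *]"


def build_delete_request(core_url, query):
    """Build (but do not send) a delete-by-query request for Solr's update handler."""
    body = json.dumps({"delete": {"query": query}}).encode("utf-8")
    return urllib.request.Request(
        core_url + "/update?commit=true",  # commit so the delete becomes visible
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_corrupted(core_url=SOLR_CORE_URL, query=CORRUPTED_QUERY):
    """Send the delete request and return the HTTP status code from Solr."""
    request = build_delete_request(core_url, query)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Before calling delete_corrupted(), it is worth running the same query as a normal search and checking numFound, so you know how many documents are about to go away.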