SQLite Keynote

D. Richard Hipp, author of SQLite presented this years keynote. A fascinating jaunt from the civilized realms into barbarian territory and back. Here are some of the notes I took—

(Incidentally I noted that Richard pronounces SQLite S-Q-L-ite, not Sequel-Lite.)

Non-Comment Source Lines in SQLite 3.3.7 tree

Some of the major users of SQLite that he's allowed to talk about

Possibly the most widely distributed Tcl extension in the world.

Uses for SQLite

  1. Replacement for client-server RDBMS
  2. Stand-in for enterprise RDBMS during testing and demos
  3. Local cache of enterprise RDBMS data
  4. Persistence of objects, configuration
  5. Complex data structures
  6. ...

Complex Data Structures

Automated undo/redo using triggers.

Application File Format

What if ...

Key concept: SQLite wants to replace fopen()/[open] not Oracle/Oratcl.

Useful example code/utilities

Crossing into barbarian lands now...

Q: What is Full Text Search?
A: In brief: what Google does.
Q: Why is full text search important?
A: Internet search enginges have users spoiled. Modern applications need to support full text search in order to be competitive.
Q: Is it difficult to implement?
A: It is tricky for large data sets to get it right.

Basic approach

  1. tokenize into words
  2. case folding
  3. stemming (convert each word into its root form; porter stemmer very popular stemmer for English; recommended -> recommend; books -> book)
  4. remove stop words
  5. for each word left create posting list per word

OR queries do an intersection. AND queries do a union.

Phrase Queries: naive method: do an AND query, then examine every document in the result set in a second pass and eliminate those that lack the phrase. Very bad performance in pathalogical cases. Instead update posting list to store document:position (multiple occurances included multiple times).

Basic Operations

Keeping the Working Set Small

Full Text Search (version 1) built into SQLite 3.3.8 released Monday

As of SQLite 3.3.8 (released Monday!) full text search support in SQLite. Ricahrd's authorized to announce help from engineers at Google. Later question elicited that roughly half of the FTS code was written by him & Dan, the other half by four engineers [didn't have time to write down their names] from Google. He isn't able/can't comment on their motivations/plans/internal usage. (Obviouslyy won't be replacing their search engine with SQLite.)

Some example usage

CREATE VIRTUAL TABLE email USING fts1(content);

CREATE VIRTUAL TABLE email USING fts1("from" TEXT, "to" TEXT, subject TEXT, body TEXT, tokenize porter);

SELECT rowid, subject FROM email WHERE email MATCH 'wyrick sqlite';

SELECT rowid, subject FROM email WHERE email MATCH $::querytext;

MATCH: when left operand is the name of the table match against any column can specify a particular column $::querytext can specify columns i.e., "from:wyrick"

Built in snippet generator: SELECR rowid, snippet(email) FROM email WHERE email MATCH ...

FTS1 comes standard with Tcl bindings for SQLite (>= 3.3.8).

Potential uses:

Pervasive Full-Text Search (users want it; made easy using Tcl and SQLite).


—Michael A. Cleverly

Comments:

  1. Cameron Laird wrote (at Wed, 11 Oct 2006, 10:49):

Many thanks, Michael. It's great that you wrote this up, and so promptly.

  1. Matt Good wrote (at Fri, 13 Oct 2006, 21:48):

How does Subversion use SQLite? The Subversion backends are Berkely DB, or their own FSFS format. Trac uses both Subversion and SQLite, but I don't think Subversion has any SQLite support.

  1. Manuzhai wrote (at Sat, 14 Oct 2006, 07:36):

I think the Subversion work is on the merge-tracking branch, where (I think!) an SQLite database is used to keep track of what revisions where merged from what branch to where.

Also, it says in this article that OR does intersection, AND does union. Isn't that the other way around?

Permanent URL for this post: http://blog.cleverly.com/permalinks/247.html