SQLite Keynote
D. Richard Hipp, author of SQLite presented this years keynote. A fascinating jaunt from the civilized realms into barbarian territory and back. Here are some of the notes I took—
(Incidentally I noted that Richard pronounces SQLite S-Q-L-ite, not Sequel-Lite.)
Non-Comment Source Lines in SQLite 3.3.7 tree
- Tcl 54.28%
- Tcl Bindings 8.55%
- Core C 35.71%
- Other 1.46%
Some of the major users of SQLite that he's allowed to talk about
- GE
- Apple
- AOL
- Sun (Solaris 10)
- Subversion
- Yum
- Philips
- Monotone
- General Dynamics
- Toshiba
- Symbian
- Firefox
- Federal Aviation Administration
- Microsoft (xbox)
Possibly the most widely distributed Tcl extension in the world.
Uses for SQLite
- Replacement for client-server RDBMS
- Stand-in for enterprise RDBMS during testing and demos
- Local cache of enterprise RDBMS data
- Persistence of objects, configuration
- Complex data structures
- ...
Complex Data Structures
- Class <-> Table
- Object <-> Row
- Instance Variable <-> Column
Automated undo/redo using triggers.
Application File Format
- File/Open reads and parses the entire application file
- Error prone
- Lots of code to maintain
- Application crash causes data lost
- File/Save requires the user to remember to save, possible to overwrite of independent changes, corruption if file generator and parser do not match
What if ...
- No need to read and parse file content
- updates are atomic, consistent, isolated and durable
- automatic concurrency control
- Changes are written to disk immediately
- No data loss after unexpected power failure
- The save and save as options are obsolete -- remove them
Key concept: SQLite wants to replace fopen()/[open] not Oracle/Oratcl.
Useful example code/utilities
- SQLite Console for Tcl/Tk
- Email Client that uses SQLite as a back-end store
- Database Editor (define a new SQL function in Tcl, allowing you to write update changelog set message = edit(message) where change=2782; and be able to edit the message in a text widget that'll pop up instead of copying and pasting or re-typing manually)
Crossing into barbarian lands now...
- Q: What is Full Text Search?
- A: In brief: what Google does.
- Q: Why is full text search important?
- A: Internet search enginges have users spoiled. Modern applications need to support full text search in order to be competitive.
- Q: Is it difficult to implement?
- A: It is tricky for large data sets to get it right.
Basic approach
- tokenize into words
- case folding
- stemming (convert each word into its root form; porter stemmer very popular stemmer for English; recommended -> recommend; books -> book)
- remove stop words
- for each word left create posting list per word
OR queries do an intersection. AND queries do a union.
Phrase Queries: naive method: do an AND query, then examine every document in the result set in a second pass and eliminate those that lack the phrase. Very bad performance in pathalogical cases. Instead update posting list to store document:position (multiple occurances included multiple times).
Basic Operations
- Insert a new document: Break the document into words; Append the document ID to the posting list for each word
- Perform a query: Lookup the posting list for each word in the query; Combine posting lists
Keeping the Working Set Small
- Limit size of the lexicon (stemming, stop words)
- Compress the posting list aggressively
- Spill stationary parts of posting list into a separate table that is not cached
Full Text Search (version 1) built into SQLite 3.3.8 released Monday
As of SQLite 3.3.8 (released Monday!) full text search support in SQLite. Ricahrd's authorized to announce help from engineers at Google. Later question elicited that roughly half of the FTS code was written by him & Dan, the other half by four engineers [didn't have time to write down their names] from Google. He isn't able/can't comment on their motivations/plans/internal usage. (Obviouslyy won't be replacing their search engine with SQLite.)
Some example usage
CREATE VIRTUAL TABLE email USING fts1(content);
CREATE VIRTUAL TABLE email USING fts1("from" TEXT, "to" TEXT, subject TEXT, body TEXT, tokenize porter);
SELECT rowid, subject FROM email WHERE email MATCH 'wyrick sqlite';
SELECT rowid, subject FROM email WHERE email MATCH $::querytext;
MATCH: when left operand is the name of the table match against any column can specify a particular column $::querytext can specify columns i.e., "from:wyrick"
Built in snippet generator: SELECR rowid, snippet(email) FROM email WHERE email MATCH ...
FTS1 comes standard with Tcl bindings for SQLite (>= 3.3.8).
- FTS1 permanent-beta
- FTS2 will ship before Christmas
Potential uses:
- Search for private websites
- Email clients
- Online documentation search
- Searchable history in web browsers (like a Google search but instead of searching the whole corpus of the web search only those sites I've browsed to in the last x months)
- Chatroom archive search
- Search version diffs in a configuration management system
- Text editors and IDEs
Pervasive Full-Text Search (users want it; made easy using Tcl and SQLite).
—Michael A. Cleverly
Wednesday, October 11, 2006 at 09:38
Many thanks, Michael. It's great that you wrote this up, and so promptly.
Wed, 11 Oct 2006, 10:49