Wednesday, March 23, 2011

MySQL design with dynamic number of fields.

My experience with MySQL is very basic. The simple stuff is easy enough, but I ran into something that is going to require a little more knowledge. I have a need for a table that stores a small list of words. The number of words stored could be anywhere between 1 to 15. Later, I plan on searching through the table by these words. I have thought about a few different methods:

A.) I could create the database with 15 fields, and just fill the fields with null values whenever the data is smaller than 15. I don't really like this. It seems really inefficient.

B.) Another option is to use just a single field, and store the data as a comma separated list. Whenever I come back to search, I would just run a regular expression on the field. Again, this seems really inefficient.

I would hope there is a good alternative to those two options. Any advice would be very appreciated.

-Thanks

From stackoverflow
  • C) use a normal form; use multiple rows with appropriate keys. an example:

    mysql> SELECT * FROM blah;
    +----+-----+-----------+
    | K  | grp | name      |
    +----+-----+-----------+
    |  1 |   1 | foo       |
    |  2 |   1 | bar       |
    |  3 |   2 | hydrogen  |
    |  4 |   4 | dasher    |
    |  5 |   2 | helium    |
    |  6 |   2 | lithium   |
    |  7 |   4 | dancer    |
    |  8 |   3 | winken    |
    |  9 |   4 | prancer   |
    | 10 |   2 | beryllium |
    | 11 |   1 | baz       |
    | 12 |   3 | blinken   |
    | 13 |   4 | vixen     |
    | 14 |   1 | quux      |
    | 15 |   4 | comet     |
    | 16 |   2 | boron     |
    | 17 |   4 | cupid     |
    | 18 |   4 | donner    |
    | 19 |   4 | blitzen   |
    | 20 |   3 | nod       |
    | 21 |   4 | rudolph   |
    +----+-----+-----------+
    21 rows in set (0.00 sec)
    

    This is the table I posted in this other question about group_concat. You'll note that there is a unique key K for every row. There is another key grp which represents each category. The remaining field represents a category member, and there can be variable numbers of these per category.

    Hurpe : Awesome man! That's really interesting actually.
  • You are correct that A is no good. B is also no good, as it fails to adhere to First Normal Form (each field must be atomic). There's nothing in your example that suggests you would gain by avoiding 1NF.

    You want a table for your list of words with each word in its own row.

    Hurpe : So a better method would be create a new table for each list of words, then organize all the tables in one main table with a field containing table names? I could do that, and that doesn't seem so bad.
  • I would create a table with and ID and one field, then store your results as multiple records. This offers many benefits. For example, you can then programatically enforce your 15 word limit instead of doing it in your design, so if you ever change your mind it should be rather easy. Your queries to search on the data will also be much faster to run, regular expressions take a lot of time to run (comparatively). Plus using a varchar for the field will allow you to compress your table much better. And indexing on the table should be much easier (more efficient) with this design.

  • Do the extra work and store the 15 words as 15 rows in the table, i.e. normalize the data. It may require you to re-think your strategy a bit, but trust me when the client comes along and says "Can you change that 15 limit to 20...", you'll be glad you did.

  • What other data is associated with these words?

    One typical way to handle this kind of problem is best described by example. Let's assume your table captures certain words found in certain documents. One typical way is to assign each document an identifier. Let's pretend, for the moment, that each document is a web URL, so you'd have a table something like this:

    CREATE TABLE WebPage (
        ID INTEGER NOT NULL,
        URL VARCHAR(...) NOT NULL
    )
    

    Your Words table might look something like this:

    CREATE TABLE Words (
        Word VARCHAR(...) NOT NULL,
        DocumentID INTEGER NOT NULL 
    )
    

    Then, for each word, you create a new row in the table. To find all words in a particular document, select by the document's ID:

    SELECT Words.Word FROM Words, WebPage 
    WHERE Words.DocumentID = WebPage.DocumentID
    AND WebPage.URL = 'http://whatever/web/page/'
    

    To find all documents with a particular word, select by word:

    SELECT WebPage.URL FROM WebPage, Words
    WHERE Words.Word = 'hello' AND Words.DocumentID = WebPage.DocumentID
    

    Or some such.

  • Depending on exactly what you want to accomplish:

    1. Use a full-text index on your string table

    2. Three tables: one for the original string, one for unique words (after word-rooting?), and a join table. This would also let you do more complicated searches, like "return all strings containing at least three of the following five words" or "return all strings where 'fox' occurs after 'dog'".

      CREATE TABLE string ( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, string TEXT NOT NULL )

      CREATE TABLE word ( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, word VARCHAR(14) NOT NULL UNIQUE, UNIQUE INDEX (word ASC) )

      CREATE TABLE word_string ( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, string_id INT NOT NULL, word_id INT NOT NULL, word_order INT NOT NULL, FOREIGN KEY (string_id) REFERENCES (string.id), FOREIGN KEY (word_id) REFERENCES (word.id), INDEX (word_id ASC) )

      // Sample data INSERT INTO string (string) VALUES ('This is a test string'), ('The quick red fox jumped over the lazy brown dog')

      INSERT INTO word (word) VALUES ('this'), ('test'), ('string'), ('quick'), ('red'), ('fox'), ('jump'), ('over'), ('lazy'), ('brown'), ('dog')

      INSERT INTO word_string ( string_id, word_id, word_order ) VALUES ( 0, 0, 0 ), ( 0, 1, 3 ), ( 0, 2, 4 ), ( 1, 3, 1 ), ( 1, 4, 2 ), ( 1, 5, 3 ), ( 1, 6, 4 ), ( 1, 7, 5 ), ( 1, 8, 7 ), ( 1, 9, 8 ), ( 1, 10, 9 )

      // Sample query - find all strings containing 'fox' and 'quick' SELECT UNIQUE string.id, string.string FROM string INNER JOIN word_string ON string.id=word_string.string_id INNER JOIN word AS fox ON fox.word='fox' AND word_string.word_id=fox.id INNER JOIN word AS quick ON quick.word='quick' AND word_string.word_id=word.id

  • Hurpe, is the scenario you are describing that you will have a database table with a column that can contain a up to 15 keywords. Later you will use these keywords to search the table which will presumably have other columns as well?

    Then isn't the answer to have a separate table for the keywords? You will also need to have a many-to-many relationship between the keywords and the main table.

    So using cars as an example, the WORD table that will store the 15 or so keywords would have the following structure:

    ID             int
    Word           varchar(100)
    

    The CAR table would have a structure something like:

    ID              int
    Name            varchar(100)
    

    Then finally you need a CAR_WORD table to hold the many-to-many relationships:

    ID              int
    CAR_ID          int
    WORD_ID         int
    

    And sample data to go with this for the WORD table:

    ID   Word
    
    001  Family
    002  Sportscar
    003  Sedan
    004  Hatchback
    005  Station-wagon
    006  Two-door
    007  Four-door
    008  Diesel
    009  Petrol
    

    together with sample data for the CAR table

    ID   Name
    
    001  Audi TT
    002  Audi A3
    003  Audi A4
    

    then the intersection CAR_WORD table sample data could be:

    ID    CAR_ID   WORD_ID
    001   001      002
    002   001      006
    003   001      009
    

    which give the Audi TT the correct characteristics.

    and finally the SQL to search would be something like:

    SELECT c.name
    FROM CAR c
    INNER JOIN CAR_WORD x
    ON c.id = x.id
    INNER JOIN WORD w
    ON x.id = w.id
    WHERE w.word IN('Petrol', 'Two-door')
    

    Phew! Didn't intend to set out to write quite so much, it looks complicated but it is where I always seem to end up however hard I try to simplify things.

0 comments:

Post a Comment