Wednesday, December 17, 2014

It starts with the data...

The Plan

I just posted the initial TODO list for pastenix into the README.md file.  It currently reads as follows:
  • General
    • Use Amnesia for Storage
    • Avoid JavaScript as much as possible
  • Pastebin Functionality
    • Ability to Post Paste
    • Ability to View Paste
    • Ability to Edit Paste
      • Edits produce new pastes with reference to old paste (version control)
    • Ability to Post Private Pastes
  • IRC Functionality
    • Ability to publish Public Pastes on specified IRC channels
    • Query Bot for Latest X paste subjects
    • Ability to have Bot take notes from Channel and publish in a paste
  • Commandline Functionality
    • Pipe to Paste

With a rough finger-in-the-air approximation in hand, we can start to put finger to keyboard, as it were.  First we have some decisions to make about data structures.

Why Mnesia?

This is a learning experience for me.  I've done databases ten thousand times and I want to try something different.  The biggest disadvantages to using mnesia for this project are:
  1. The 2 GB limit on disk for tables stored in DETS (unless you shard - may look at that later after my first million in ad revenue) ;-)
  2. Difficulty in doing things like efficient plaintext searches of the database.

Mnesia (or Amnesia in Elixir)

mnesia is Erlang's built-in database, an abstraction over ets and dets.  There is way too much to cover here, but the tl;dr is:
  1. ETS - in-memory, ridiculously fast key/value store. Data is lost when the process that owns the table dies.
  2. DETS - on-disk, much slower but persistent key/value store with a 2 GB limit per file.
  3. mnesia - Abstraction over the above giving you the ability to create tables with:
    1. Arbitrary distribution across multiple nodes.
    2. Arbitrary copies of tables in both memory and on disk.
    3. Custom sharding.
    4. A Pony.
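
To make the ETS/DETS difference a bit more concrete, here is a minimal sketch that calls both directly from Elixir. The table names and the file path are made up for the example; nothing here comes from the pastenix code.

```elixir
# ETS: an in-memory table owned by the process that creates it.
ets = :ets.new(:pastes_ets, [:set, :public])
true = :ets.insert(ets, {"abc123", "hello from ETS"})
ets_result = :ets.lookup(ets, "abc123")
IO.inspect(ets_result, label: "ets")

# DETS: the same style of API, but backed by a file on disk.
# The file name has to be a charlist; the path is a throwaway example.
path = Path.join(System.tmp_dir!(), "pastes_example.dets")
{:ok, dets} = :dets.open_file(:pastes_dets, file: String.to_charlist(path), type: :set)
:ok = :dets.insert(dets, {"abc123", "hello from DETS"})
dets_result = :dets.lookup(dets, "abc123")
IO.inspect(dets_result, label: "dets")

# Closing flushes the table to its file; the file is removed again to keep the example tidy.
:ok = :dets.close(dets)
File.rm(path)
```

Both lookups return a list of matching tuples, so a missing key simply gives back an empty list.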

Elixir allows you to call any Erlang function from any module directly, using the syntax :erlangmodulename.function(args).  As such, you can call mnesia directly using that format.
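
As a small sketch of that call syntax (assuming a plain Elixir script run on a stock Erlang/OTP install, where the mnesia application is available):

```elixir
# Erlang modules are just atoms in Elixir, so this calls mnesia:start/0.
:ok = :mnesia.start()
running = :mnesia.system_info(:is_running)
IO.inspect(running, label: "mnesia running?")

# Any other Erlang module works the same way, e.g. os:system_time/1.
now = :os.system_time(:second)
IO.inspect(now, label: "epoch seconds")

:mnesia.stop()
```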

On GitHub, however, there is a nice Elixir wrapper which I have chosen to use: https://github.com/meh/amnesia

What I hate about RDBMSs

I hate doing data serialization and de-serialization.  I loathe and detest the concept of having to take the data native to my application and format it the right way to go into the database and vice-versa.  This data "impedance mismatch" is a pain.

mnesia (and consequently Amnesia) stores native Erlang terms, which means you can store anything any way you want.  Less code, more awesome.
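
A minimal sketch of what "native terms" means in practice, using raw :mnesia calls rather than Amnesia. The table name :scratch and its attributes :key and :value are invented for this example, and the table lives in memory only.

```elixir
:ok = :mnesia.start()

# A throwaway in-memory table; :scratch, :key and :value are made-up names.
{:atomic, :ok} = :mnesia.create_table(:scratch, attributes: [:key, :value])

# A nested term: a map holding a list of atoms, a tuple and a raw binary.
term = %{channels: [:elixir_lang, :erlang], size: {80, 24}, body: <<0, 1, 2>>}

# Write the term as-is, then read it back. No encoding step in either direction.
{:atomic, :ok} =
  :mnesia.transaction(fn -> :mnesia.write({:scratch, "k1", term}) end)

{:atomic, [{:scratch, "k1", stored}]} =
  :mnesia.transaction(fn -> :mnesia.read({:scratch, "k1"}) end)

IO.inspect(stored == term, label: "round-trips unchanged")
:mnesia.stop()
```

The value that comes back is the same map that went in, with no schema mapping or serialization code in between.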

Creating our table

Here's the table definition that I've chosen:
use Amnesia

defdatabase PasteDB do
  # One row per paste; the first attribute (:id) is the table key.
  deftable Paste, [:id, :title, :public, :ircchannels, :date, :expires, :previous_version, :content],
    type: :ordered_set do
  end
end
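
For comparison, here is a rough sketch of a similar table created by calling :mnesia directly. This is not what Amnesia generates internally (the table name it registers may well differ); it only illustrates the same attribute list and the :ordered_set type against the underlying API.

```elixir
:ok = :mnesia.start()

attributes = [:id, :title, :public, :ircchannels, :date, :expires, :previous_version, :content]

# ram_copies is the default storage type, so nothing is written to disk here.
# The first attribute (:id) acts as the key.
{:atomic, :ok} = :mnesia.create_table(Paste, attributes: attributes, type: :ordered_set)

table_type = :mnesia.table_info(Paste, :type)
table_attrs = :mnesia.table_info(Paste, :attributes)
IO.inspect({table_type, table_attrs})

:mnesia.stop()
```

One thing to keep in mind with this choice: mnesia does not support :ordered_set for disc_only_copies tables, so an ordered table has to be held in memory (ram_copies or disc_copies).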

Proposed Datatypes:

  1. :id - String representation of a UUID.  Randomly generated (Hello UUID Module)
  2. :title - String representation of the Paste Title
  3. :public - Boolean value as to whether something is public or not.  In Elixir, true and false are just the atoms :true and :false.
  4. :ircchannels - List of atoms which represent pre-programmed IRC channel / server combinations.
  5. :date - Integer representation à la UNIX epoch time.  I considered using the record found in Timex.Date, but we will need a very fast way to do date sorting later, so I'm keeping it simple.
  6. :expires - Integer representation of UNIX epoch time when the Paste should be removed from the system.
  7. :previous_version - String representation of the UUID of the previous version of the Paste.  Should this be nil on first Paste or set to something else?
  8. :content - Arbitrary binary data.
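
To see how those datatypes fit together, here is a minimal sketch that builds one paste as a plain map. The PasteSketch module, its uuid4 helper and the :ttl option are hypothetical names for this example only; the UUID is generated with :crypto as a stand-in for whichever UUID module ends up being used, and nil is assumed for :previous_version on a first paste.

```elixir
defmodule PasteSketch do
  # Hypothetical helper: a random (version 4) UUID string built from 16 random bytes.
  def uuid4 do
    <<a::48, _::4, b::12, _::2, c::62>> = :crypto.strong_rand_bytes(16)
    # Re-assemble with the version (4) and variant (binary 10) bits set.
    hex = Base.encode16(<<a::48, 4::4, b::12, 2::2, c::62>>, case: :lower)

    <<g1::binary-size(8), g2::binary-size(4), g3::binary-size(4),
      g4::binary-size(4), g5::binary-size(12)>> = hex

    Enum.join([g1, g2, g3, g4, g5], "-")
  end

  # Builds a paste with the proposed fields; :ttl is an assumed option, in seconds.
  def new(title, content, opts \\ []) do
    now = :os.system_time(:second)

    %{
      id: uuid4(),
      title: title,
      public: Keyword.get(opts, :public, true),
      ircchannels: Keyword.get(opts, :ircchannels, []),
      date: now,
      expires: now + Keyword.get(opts, :ttl, 86_400),
      previous_version: Keyword.get(opts, :previous_version),
      content: content
    }
  end
end

first = PasteSketch.new("hello", "IO.puts 1", ircchannels: [:freenode_elixir])
edit = PasteSketch.new("hello v2", "IO.puts 2", previous_version: first.id)

# Integer timestamps compare directly, so sorting by date needs no date library.
newest_first = Enum.sort_by([first, edit], & &1.date, &>=/2)
IO.inspect(Enum.map(newest_first, & &1.title))
```

An edit is just a new paste whose :previous_version points at the id of the paste it replaces, which matches the "edits produce new pastes" item in the TODO list.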

This is where I'm going to start.  Time to build the table and write the test cases to exercise it.
