
I need to learn how databases work in order to use them more efficiently, and my way of learning is by doing.

I want to create my own database system. I am not referring to a pseudo-database that uses queries to parse files; that would simply be a filesystem interface with a query language. I am talking about the actual structure of a database engine. And since what I have in mind is neither relational nor document-oriented (it's "node-oriented", if that even exists), I would need any resource to be as abstract and high-level as possible.

So how would I go about creating that? What resources/tutorials/books can I read to understand?

The language does not matter in the slightest. Ideally, the code would be pseudo-code that illustrates the concept without being tied to a particular language, but anything would do. I was not able to find anything on the matter on Google (since I am so illiterate on the subject, maybe I am just not entering the right search terms).

If such resources are not available, then I guess something about how to create a client would at least be a step in the right direction.

Xananax
  • 17
    Why not write a compiler instead? Or even better, your own operating system? If you are really serious about writing your own database, there are a thousand and one open source databases out there: Study their source code, contribute a few patches. Then start thinking of building your own. – yannis Nov 25 '11 at 06:20
  • 13
    I knew I was going to get bashed :) As I said, the purpose is to learn. I studied open-source DBs, but their codebase is too huge to get a real grasp; that's why I want to build from the ground up, beginning small and expanding until I get a real feel of how things work. THEN it will be useful for me to deeply study what has been done. – Xananax Nov 25 '11 at 06:30
  • 4
    You can take some college level and graduate level database courses. There are many open source courses online. You can also buy a few textbooks and study them in spare time. This will give you some ideas and starting points. Reading the history and news about PostgreSQL will also help (in terms of imagination, although it will not give you any idea how those features are actually implemented) – rwong Nov 25 '11 at 06:35
  • By "node-oriented", are you referring to Graph database? – rwong Nov 25 '11 at 06:37
  • 9
    I studied open-source DBs, but their codebase is too huge: If something like redis or flockdb is too huge for you to read, I don't see how you'll cope with writing your own database. – yannis Nov 25 '11 at 06:37
  • 1
    @rwong: Yes. I did not use the term because I never used graph databases and I'm not sure they are what I think they are. I have experience in relational/document-oriented DBs only. Also, I have a hunch that understanding graph DBs might be even more difficult so I try to not be looking for that exclusively. – Xananax Nov 25 '11 at 06:39
  • 18
    @YannisRizos In fairness, reading code (imo) is much more difficult than writing it yourself. –  Nov 25 '11 at 06:44
  • @YannisRizos: Ok, I have to confess I had no idea code for a db could be so small. Thanks for the pointer, this is actually something I can study. – Xananax Nov 25 '11 at 06:45
  • @AlexWebr Yep, that's why I suggested it. OP wants to learn, and choosing the easy way is almost never the best way to learn. – yannis Nov 25 '11 at 06:47
  • 4
    @YannisRizos that's not an argument. One could argue that reading code is much easier than writing custom code yourself. It might depend on people. I know that I can rarely learn only by reading. I have to do. If, for example, I begin to understand the code sources you provided, the first thing I am going to do is close the source, open a blank project and try to build again in another language, only going back to the source if I really get stuck. That's the only way what I learn sticks. – Xananax Nov 25 '11 at 06:54
  • @Xananax I'm not interested in debating learning methodologies. Good luck building your database. – yannis Nov 25 '11 at 07:01
  • @YannisRizos Agreed. This was not my intention either. Sorry if I offended you and thanks for the severe but useful help & advice. – Xananax Nov 25 '11 at 07:05
  • 1
    @Xananax Didn't offend me, the "Good luck building your database" was sincere. I'm actually hoping you get there... – yannis Nov 25 '11 at 07:08
  • 18
    @Xananax: don't listen to the frogs (http://www.crystal-reflections.com/stories/story_73.htm). Do whatever you enjoy and it is not necessary to have an objective to take pleasure in the process. –  Nov 25 '11 at 09:15
  • 1
    I know where you're coming from @Xananax, because I wanted to write my own RDBMS too, purely for my own understanding, and the suggestion to study the source of SQLite, Postgres or MySQL, for example, is just far too overwhelming when you don't even know the basics. In my case, I wanted to build something that uses Tutorial D as the query language, instead of SQL. I started picking SQLite apart, but I need a more basic start first. FWIW, the documentation for how SQLite's internals work is excellent. – d11wtq Nov 25 '11 at 11:35
  • 1
    @Yannis Rizos: I wrote my own compiler as a Uni project and it was one of the best lessons in my IT career. I guess writing your own RDBMS is equally educational. But of course, if you need to ship the product in some finite time, then use a standard solution. – MaR Nov 25 '11 at 12:59
  • @MaR Me too :) Read the last paragraph of my answer – yannis Nov 25 '11 at 13:00
  • Very old post (and answer is set) - however if somebody else comes up with the same idea - here is the approach to 'understand databases by doing' without even coding: – Quicker Sep 10 '18 at 13:18
  • make sure you REALLY UNDERSTAND the concepts of ACID (see Wikipedia) - that implies you'd be able to explain every sentence on that wiki page
  • think through how to read and/or write tiny portions of a huge file from multiple users/sessions and how to make that ACID compliant
    • a) break down the processes of reading and of writing data portions to/from the file by defining the kinds of processors, components and flows between them
    • b) in your thinking process, run reads and writes at the same time -> if both are ACID compliant, congrats, you are there; otherwise refine what you defined under a)

    – Quicker Sep 10 '18 at 13:19
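To make the "tiny portions of a huge file" step concrete, here is a minimal Python sketch of fixed-size page access guarded by a single lock. It is only an illustration under assumptions of mine: the names `PagedFile`, `read_page`, `write_page` and the 4096-byte `PAGE_SIZE` are hypothetical, and one process-wide lock gives isolation between threads only. It does nothing for atomicity across several pages or for multiple processes.

```python
import os
import threading

PAGE_SIZE = 4096  # hypothetical page size chosen for this sketch


class PagedFile:
    """Reads and writes fixed-size pages of one file, one caller at a time."""

    def __init__(self, path):
        # O_BINARY only exists on Windows; elsewhere the flag is simply 0.
        flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
        self._fd = os.open(path, flags, 0o644)
        # A single lock serialises seek+read and seek+write pairs, so two
        # threads can never interleave their file-offset changes.
        self._lock = threading.Lock()

    def write_page(self, page_no, data):
        if len(data) > PAGE_SIZE:
            raise ValueError("data does not fit in one page")
        padded = data.ljust(PAGE_SIZE, b"\x00")
        with self._lock:
            os.lseek(self._fd, page_no * PAGE_SIZE, os.SEEK_SET)
            os.write(self._fd, padded)
            os.fsync(self._fd)  # ask the OS to push the page to stable storage

    def read_page(self, page_no):
        with self._lock:
            os.lseek(self._fd, page_no * PAGE_SIZE, os.SEEK_SET)
            data = os.read(self._fd, PAGE_SIZE)
        # Pages past the end of the file read back as all zeros.
        return data.ljust(PAGE_SIZE, b"\x00")

    def close(self):
        os.close(self._fd)
```

Running step b) against this sketch in your head shows where it falls short: a write that spans two pages can be observed half-done by a reader, which is exactly the gap that transactions and finer-grained locks are meant to close.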
  • If you got ACID compliance without understanding the terms below, you are a perfect candidate for any world-xyz-expert-prize:
    • transactions (in the context of databases)
    • log files (in the context of databases)
    • locks (in the context of databases)
    • memory caches
    • memory pages (in the context of databases)
    • OS services or daemons
    • queues and pipes
    • seek latency and I/O throughput
    – Quicker Sep 10 '18 at 13:19
  • Also to shortcut your way on what makes a database complex:
    • ACID demands immediate persistence on writes
    • immediate persistence implies disk access, multi redundant RAM access or flash access
    • disk is cheap; disk seek latency on writes is painful; once the seek is done, dumping bytes in bulk is relatively fast -> that is what log file concepts build on
    • understand the flow picture at https://dba.stackexchange.com/questions/142224/in-mysql-how-exactly-does-data-flows-from-query-to-disk
    • ACID demands that transactions do not interfere with each other
    – Quicker Sep 10 '18 at 13:19
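The log file idea in the list above can be sketched in a few lines of Python: every write is appended to the end of one file (no seeking) and flushed before it is acknowledged, and the in-memory state is rebuilt by replaying the log on startup. This is a rough illustration, not how any particular database does it; `TinyLogStore`, its methods and the JSON-lines record format are assumptions made for the example.

```python
import json
import os


class TinyLogStore:
    """Key-value store whose only on-disk structure is an append-only log."""

    def __init__(self, log_path):
        self._log_path = log_path
        self._data = {}
        if os.path.exists(log_path):
            self._replay()
        # Append mode: every write lands at the end of the file.
        self._log = open(log_path, "ab")

    def _replay(self):
        good_end = 0
        with open(self._log_path, "rb") as f:
            for raw in f:
                # A record without its trailing newline was cut off by a crash.
                if not raw.endswith(b"\n"):
                    break
                try:
                    rec = json.loads(raw)
                except ValueError:
                    break
                self._data[rec["key"]] = rec["value"]
                good_end += len(raw)
        # Drop any torn tail so later appends start on a clean record boundary.
        with open(self._log_path, "r+b") as f:
            f.truncate(good_end)

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value}).encode("utf-8") + b"\n"
        self._log.write(record)
        self._log.flush()
        os.fsync(self._log.fileno())  # durable on disk before we acknowledge
        self._data[key] = value       # only now update the in-memory state

    def get(self, key, default=None):
        return self._data.get(key, default)

    def close(self):
        self._log.close()
```

The sketch covers only the D in ACID, and only for single-key writes. The log also grows forever, which is why real engines add checkpoints or compaction on top of this idea.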
  • If you can ignore A and C and I and D in your 'database' use case, do not build a database; store your information in plain files instead! – Quicker Sep 10 '18 at 13:19
  • 1
    I've recently been trying to do the same thing, and have stumbled across this fantastic tutorial: https://cstack.github.io/db_tutorial/ -- hopefully this helps someone – toastrackengima Jul 06 '20 at 14:06