
I would like to know your opinion about the following system:

  • start with a large CSV file
  • convert its header to an RDF schema that contains only the CSV column information and an access point. This way there's no need to convert the whole file to RDF triples, which eliminates a huge conversion overhead.
  • launch a web app with a SPARQL console, which translates the requests into file searches.

Why, you ask? Because having a 2.5GB cache of triples generated from a 100MB CSV file is not useful at all.

In other words, I'm proposing the same approach used when converting RDB to RDF with, for example, D2RQ, where all data is kept in the database and only the relational schema is converted. This saves a lot of space and is a much faster process.
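To make the idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the `http://example.org/csv/` namespace, the function names, and the sample data are all made up, and a single "column equals value" pattern stands in for a real SPARQL parser. Column names containing spaces or other special characters would need escaping before they could be used in an IRI.

```python
import csv
import io

BASE = "http://example.org/csv/"  # hypothetical namespace, not from any real dataset


def header_to_schema(csv_file, base=BASE):
    """Read only the header row and describe each column as an RDF property."""
    header = next(csv.reader(csv_file))
    triples = []
    for name in header:
        prop = f"<{base}{name}>"
        triples.append((prop, "rdf:type", "rdf:Property"))
        triples.append((prop, "rdfs:label", f'"{name}"'))
    return header, triples


def match_rows(csv_file, column, value):
    """Answer a pattern like `?row <base/column> "value"` by scanning the file."""
    for row_number, row in enumerate(csv.DictReader(csv_file), start=1):
        if row[column] == value:
            yield row_number, row


# Tiny in-memory stand-in for the large CSV file.
SAMPLE = "id,name,city\n1,Ann,Porto\n2,Bob,Lisbon\n3,Cid,Porto\n"

header, schema = header_to_schema(io.StringIO(SAMPLE))
hits = list(match_rows(io.StringIO(SAMPLE), "city", "Porto"))
```

Only the header is turned into triples, so the schema stays tiny no matter how large the file is. The cost moves to query time, because every request is a full scan of the file.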

As mentioned in the comments and answers below, a similar system might be Tarql. The issue with Tarql is that it converts the whole CSV file to RDF, not just the header, so a 100MB file turns into a huge 2.5GB RDF file, which is neither practical nor useful.

Here's a diagram that describes what I want to (or aspire to :)) create: [image]

Alexey

1 Answer


Before building a completely new system from scratch, you should first check if an existing system satisfies your requirements. There are a few possible candidates out there:

  • As outlined in this answer, D2RQ and RDF HDT might be close to what you are looking for.

  • In addition, there is Tarql, which provides SPARQL access to CSV files; as far as I can tell, that is exactly what you want.

One more comment: if you want performant search over your CSV files, you won't get around a search index. Instead of building one yourself, I recommend taking a close look at D2RQ again.
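To illustrate what such an index involves, here is a rough sketch that maps each value of one column to the byte offsets of the lines containing it, so a lookup can seek straight to the matching rows instead of scanning the file. The function names and sample data are hypothetical, and the sketch assumes UTF-8 text with one record per line (no quoted newlines). A database gives you this, plus updates and multi-column indexes, without writing it yourself.

```python
import csv
import io
from collections import defaultdict


def build_offset_index(f, column):
    """Map each value of `column` to the byte offsets of the lines holding it.

    Assumes a binary file object, UTF-8 text, and one record per line.
    """
    header = next(csv.reader([f.readline().decode("utf-8")]))
    position = header.index(column)
    index = defaultdict(list)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        fields = next(csv.reader([line.decode("utf-8")]))
        if not fields:  # skip blank lines
            continue
        index[fields[position]].append(offset)
    return header, index


def fetch(f, header, offset):
    """Jump to a stored offset and parse just that one record."""
    f.seek(offset)
    fields = next(csv.reader([f.readline().decode("utf-8")]))
    return dict(zip(header, fields))


# Tiny in-memory stand-in for the large CSV file.
data = io.BytesIO(b"id,name,city\n1,Ann,Porto\n2,Bob,Lisbon\n3,Cid,Porto\n")

header, index = build_offset_index(data, "city")
porto_rows = [fetch(data, header, offset) for offset in index["Porto"]]
```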

Patrick Hoefler
  • Haven't tried RDF HDT yet; as soon as I do, I'll post my opinion about it. Thank you :) – Alexey Apr 02 '15 at 10:35
  • I might post a diagram of what I need; it might make my question easier to understand. – Alexey Apr 02 '15 at 10:43
  • Just added the diagram I talked about. – Alexey Apr 02 '15 at 12:31
  • Considering your answer, I decided to use a CSV to RDB conversion, and on top of that I convert the RDB to RDF using D2RQ. I think it's a much easier solution, and it avoids reinventing the wheel. Thank you all for the comments and answers!
 – Alexey Apr 02 '15 at 14:15
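For anyone following the same route, the CSV to RDB step could look roughly like the sketch below. It is not the code actually used in this thread: the `load_csv` helper, the `people` table, and the sample data are made up, every column is stored as `TEXT`, and `sqlite3` is used only to keep the example self-contained. In practice you would load into whichever database your D2RQ setup connects to, and the D2RQ mapping itself is not shown.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create a table from the CSV header and bulk-insert the remaining rows."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return header


# Tiny in-memory stand-in for the large CSV file.
SAMPLE = "id,name,city\n1,Ann,Porto\n2,Bob,Lisbon\n3,Cid,Porto\n"

conn = sqlite3.connect(":memory:")
header = load_csv(conn, "people", io.StringIO(SAMPLE))

# Index the column you expect to filter on; the database maintains it for you.
conn.execute('CREATE INDEX idx_people_city ON "people" ("city")')
rows = conn.execute(
    'SELECT name FROM "people" WHERE city = ? ORDER BY id', ("Porto",)
).fetchall()
```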