Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'm not sure I really follow, you can create new tables for any step if you want to do it entirely within the db, but you can also just run duckdb against your dataframes in memory.


In R, data sources, intermediate results, and final results are all dataframes (slight simplification). With DuckDB, to have the same consistency you need every layer and step to be a database table, not a data frame, which is awkward for the standard R user and use case.


You can also use duckplyr as a drop in replacing for dplyr. Automatically fails over to dplyr for unsupported behavior, and for most operations is notably faster.

Data.Table is competitive with DuckDb in many cases, though as a DuckDB enthusiast I hate to admit this. :)


You can, but then every step starts with a drop table if exists; insert into …


Or you nest your queries:

    select second from (select 42 as first, (select 69) as second);
Intermediate steps won't be stored but until queries take a while to execute it's a nice way to do step-wise extension of an analysis.

Edit: It's a rather neat and underestimated property of query results that you can query them in the next scope.


We all have different definitions on what is difficult. Maybe annoying or bothersome had been better words, but below beats nesting things:

    df |> select(..) |>
        filter(...) |>
        mutate(...) |>
        ...
And every time I've learned something about the intermediate result I can add another line, or save the result in a new variable and branch my exploration. And I can easily just highlight and run and number of of steps from step 1 onwards.

Even oldschool

    df2 <- df[...]
    df2 <- df2[...]
Gives me the same benefit.


Yeah, sure, I do a lot of such things in RAM in Elixir, some Lisp, PHP or, if I must, Python.

But sometimes I just happen to have just imported a data set in a SQL client or I'm hooked into a remote database where I don't have anything but the SQL client. When developing an involved analysis query nesting also comes in handy sometimes, e.g. to mock away a part of the full query.



Absolutely, if the engine has them and they're not wonky somehow.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: