I'm not sure I really follow; you can create new tables for any step if you want...

jgalt212 · 2025-05-22T11:39:14

In R, data sources, intermediate results, and final results are all data frames (slight simplification). With DuckDB, to get the same consistency, you need every layer and step to be a database table rather than a data frame, which is awkward for the typical R user and use case.

datadrivenangel · 2025-05-22T15:15:04

You can also use duckplyr as a drop-in replacement for dplyr. It automatically falls back to dplyr for unsupported operations, and for most operations it is notably faster.

data.table is competitive with DuckDB in many cases, though as a DuckDB enthusiast I hate to admit this. :)

wodenokoto · 2025-05-22T12:18:57

You can, but then every step starts with a drop table if exists; insert into …
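A minimal sketch of that "every step is a table" workflow, assuming Python's built-in `sqlite3` as a stand-in for DuckDB (the SQL is plain enough to run on either). The table names `raw`, `step1` and `step2` and the sample data are hypothetical. Each step drops its previous result and rebuilds it with `CREATE TABLE ... AS`, so the script can be re-run safely.

```python
import sqlite3

# In-memory database standing in for DuckDB (assumption: sqlite3 for portability).
con = sqlite3.connect(":memory:")

# Hypothetical source data.
con.execute("CREATE TABLE raw (id INTEGER, amount REAL)")
con.executemany("INSERT INTO raw VALUES (?, ?)", [(1, 10.0), (2, -5.0), (3, 7.5)])

# Every step begins by dropping its old result, then materializes the new one.
con.executescript("""
DROP TABLE IF EXISTS step1;
CREATE TABLE step1 AS SELECT id, amount FROM raw WHERE amount > 0;

DROP TABLE IF EXISTS step2;
CREATE TABLE step2 AS SELECT id, amount * 2 AS doubled FROM step1;
""")

rows = con.execute("SELECT id, doubled FROM step2 ORDER BY id").fetchall()
print(rows)  # [(1, 20.0), (3, 15.0)]
```

The upside is that every intermediate table can be inspected later; the cost is the boilerplate on each step that the comment above complains about.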

cess11 · 2025-05-22T13:57:10

Or you nest your queries:

    select second from (select 42 as first, (select 69) as second);

Intermediate steps won't be stored, but until queries start taking a while to execute, it's a nice way to extend an analysis step by step.

Edit: It's a rather neat and underrated property of query results that you can query them again in the enclosing query.
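A minimal sketch of step-wise nesting, assuming Python's built-in `sqlite3` in place of a DuckDB session; the `sales` table and its rows are hypothetical. Each step wraps the previous query as a subquery in `FROM`, so nothing is materialized. Composing SQL with f-strings is only acceptable here because the fragments are fixed strings, not user input.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical sample data.
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 3.0), ("south", -1.0)],
)

# Step 1: a plain filter.
step1 = "SELECT region, amount FROM sales WHERE amount > 0"

# Step 2: query the result of step 1 by nesting it in FROM.
step2 = f"SELECT region, SUM(amount) AS total FROM ({step1}) GROUP BY region"

# Step 3: nest again to filter on the aggregated column.
step3 = f"SELECT region, total FROM ({step2}) WHERE total > 4 ORDER BY region"

result = con.execute(step3).fetchall()
print(result)  # [('north', 15.0)]
```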

wodenokoto · 2025-05-23T07:08:37

We all have different definitions of what is difficult. Maybe "annoying" or "bothersome" would have been better words, but the following beats nesting things:

    df |> select(..) |>
        filter(...) |>
        mutate(...) |>
        ...

Every time I learn something about an intermediate result, I can add another line, or save the result in a new variable and branch my exploration. And I can easily highlight and run any number of steps from step 1 onwards.

Even old-school

    df2 <- df[...]
    df2 <- df2[...]

Gives me the same benefit.

cess11 · 2025-05-23T08:44:54

Yeah, sure, I do a lot of such things in RAM in Elixir, some Lisp, PHP or, if I must, Python.

But sometimes I happen to have just imported a data set into a SQL client, or I'm hooked into a remote database where I have nothing but the SQL client. When developing an involved analysis query, nesting also comes in handy, e.g. to mock away part of the full query.

jcheng · 2025-05-22T16:21:54

Or better yet, use CTEs: https://duckdb.org/docs/stable/sql/query_syntax/with.html
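A minimal sketch of the CTE approach, assuming Python's built-in `sqlite3` in place of DuckDB (both support `WITH`); the `sales` table, its rows, and the step names `positive` and `totals` are hypothetical. Each `WITH` clause names one step, much like a line in the R pipe, and later steps can refer to earlier ones by name.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical sample data.
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 3.0), ("south", -1.0)],
)

query = """
WITH positive AS (            -- step 1: filter
    SELECT region, amount FROM sales WHERE amount > 0
),
totals AS (                   -- step 2: aggregate the previous step
    SELECT region, SUM(amount) AS total FROM positive GROUP BY region
)
SELECT region, total          -- step 3: filter the aggregate
FROM totals
WHERE total > 4
ORDER BY region
"""

result = con.execute(query).fetchall()
print(result)  # [('north', 15.0)]
```

To inspect an intermediate step, replace the final `SELECT` with something like `SELECT * FROM positive`; adding a step means appending another named CTE rather than wrapping the whole query again.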

cess11 · 2025-05-23T06:09:36

Absolutely, if the engine has them and they're not wonky somehow.