reinhardt's comments | Hacker News

Same lack of desire to go out, and I don't even have a wife and kids, or even friends for that matter; just one friend I see once a week. Whatever the appeal/reward of socialization is for most people, I don't get it.

Also curious why every comment treats the number of rows as the only factor that matters. A 100M-row table of 3 integer columns is quite different from one with 50+ columns, 5 of which are text fields up to a few MB long.
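
A back-of-the-envelope sketch in Python of how much the width matters (the byte sizes are rough assumptions, ignoring page overhead, indexes and compression):

    rows = 100_000_000

    narrow = rows * 3 * 4                    # 3 x 4-byte integer columns
    wide = rows * (45 * 8 + 5 * 100_000)     # 45 numeric cols + 5 ~100 KB text cols

    print(f"narrow: {narrow / 1e9:.1f} GB")  # ~1.2 GB
    print(f"wide:   {wide / 1e12:.1f} TB")   # ~50 TB

Same row count, more than four orders of magnitude apart in size.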


Getting a cyclic import error is not a bug; it's a feature alerting you that your code structure is spaghetti and you should refactor to break the cycles.
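
A minimal (made-up) illustration of the refactor: if orders.py and customers.py each import the other for one shared type, extracting that type into a third module turns the cycle into a tree.

    # Before: orders <-> customers          (cycle, import-time failure)
    # After:  orders -> money <- customers  (no cycle)

    # money.py -- the extracted shared piece; module names are hypothetical
    from dataclasses import dataclass

    @dataclass
    class Money:
        amount_cents: int
        currency: str

    # orders.py and customers.py now both do "from money import Money"
    # and no longer import each other.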


That's not a problem, let alone the biggest one. You should just use relative imports explicitly.


It is a problem, because the stdlib does not use relative imports for other stdlib modules, and neither do most third-party packages; a top-level module of yours with a clashing name then shadows what they import, regardless of what you do in your own code.
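
A concrete (hypothetical) example of the failure mode:

    myproject/
        logging.py   # your module; the name clashes with the stdlib's
        main.py

    # main.py
    import some_thirdparty_pkg  # internally runs "import logging"
    # Absolute imports resolve against sys.path, where the script's
    # directory comes first, so the package gets *your* logging.py
    # instead of the stdlib's and typically dies with a confusing
    # AttributeError. Relative imports in your own code don't change
    # what other packages import.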


I haven't used Airflow in years, but it used to be quite clunky; not sure how much it's improved since. I'd look into Prefect and/or Dagster first: both are more modern alternatives built with Airflow's shortcomings in mind.


I'd guess career-progression points, or even keep-getting-a-paycheck points at worst.


This low-stakes exchange won't affect much; it's about putting Mr. Incomp in his place. Remember it's a PM we're talking about, not "the Boss."


> Its a massive amount of state aggregated from billions of events that needs to be served at extremely low latency, but couldn't it be partitioned somehow???

The bidder/pacer state is not necessarily massive, and it certainly does not consist of all the gazillions of past events. Depending on the strategy/bidding model, it can range from a few MB to several GB, which fits in memory on a beefy node.

> Google Fi/Spanner and BigTable have certainly been developed to support these issues.

I doubt any external store can be used under such low latency constraints (2-10 ms) and high throughput (millions of RPS). Perhaps Aerospike, but even that is a stretch to put in the hot path. At this scale you're pretty much limited to keeping the state in memory and updating it asynchronously every couple of minutes/hours.
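
A minimal sketch of that pattern (class and field names are illustrative, not from any real bidder):

    import threading
    import time

    class PacerState:
        # Serve pacing state from local memory; refresh it out of band.
        def __init__(self, refresh_every_s=120):
            self._state = {}          # e.g. campaign_id -> spend/budget
            self._lock = threading.Lock()
            self._refresh_every_s = refresh_every_s
            threading.Thread(target=self._refresh_loop, daemon=True).start()

        def _load_snapshot(self):
            # Placeholder: really this would pull an aggregated snapshot
            # from the offline/streaming pipeline (object store, DB dump...).
            return {"campaign-42": {"spend": 1234, "budget": 10_000}}

        def _refresh_loop(self):
            while True:
                snapshot = self._load_snapshot()
                with self._lock:
                    self._state = snapshot   # swap the whole map at once
                time.sleep(self._refresh_every_s)

        def get(self, campaign_id):
            # Hot path: a dict lookup, no network round-trip.
            with self._lock:
                return self._state.get(campaign_id)

The hot path never blocks on anything slower than a mutex; a few minutes of staleness is the price, which pacing can usually tolerate.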

Source: I also work in ad tech.


Why PostgreSQL only? The mara-db dependency [1] claims to support more.

[1] https://github.com/mara/mara-db


(author here)

Currently there is a hard dependency on Postgres for Mara's bookkeeping tables. I'm working on dockerizing the example project to make the setup easier.

For ETL, MySQL, Postgres & SQL Server are supported (and it's easy to add more).


I'm a bit confused about this. What if the target is HDFS? Why this dependency on SQL databases for ETL?


> Airflow requires task queues (e.g. celery), message broker (e.g. rabbitmq), a web service, a scheduler service, and a database. You also need worker clusters to read from your task queues and execute jobs.

All of these are supported, but the scheduler is pretty much the only hard requirement.

Source: been running Airflow for the last two years without a worker cluster, without celery/rabbitmq installed, and sometimes without even an external database (i.e. just a plain SQLite file).
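
For reference, the minimal single-machine setup boils down to roughly this in airflow.cfg (key names per Airflow 1.x; the path is illustrative):

    [core]
    executor = SequentialExecutor
    sql_alchemy_conn = sqlite:////home/user/airflow/airflow.db

SequentialExecutor runs tasks from the scheduler process one at a time, so no Celery, RabbitMQ or worker fleet; the SQLite file replaces the external metadata database.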


Yet another reason for trimming off old jobs after some point; the primary one being that nobody wants to go through 3+ page resumes.


I think resumes should be treated less as a report card and more as a brochure. Hiring managers have little time, so keeping it focused on relevant highlights and selling the candidate for that job is the entire point.

A 15-page menu isn't better than a 1-page menu... A spa advertising every stone in its parking lot doesn't make you think nice things about its mud baths...

