Same lack of desire to go out, and I don't even have a wife and kids, or even friends for that matter; just one friend I see once a week. Whatever the appeal/reward of socialization is for most people, I don't get it.
Also curious why every comment treats the number of rows as the only factor that matters. A 100M-row table with 3 integer columns is quite different from one with 50+ columns, 5 of which are text values up to a few MB long.
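Back-of-envelope only (assuming ~4 bytes per integer, ~8 bytes per small column and ~1 MB per text value on average, ignoring per-row overhead, pages, compression, TOAST, etc.):

    rows = 100_000_000

    # Narrow table: 3 integer columns at ~4 bytes each.
    narrow_bytes = rows * 3 * 4                   # ~1.2 GB

    # Wide table: ~50 small columns plus 5 text columns averaging ~1 MB each.
    wide_bytes = rows * (50 * 8 + 5 * 1_000_000)  # ~500 TB

    print(f"narrow: ~{narrow_bytes / 1e9:.1f} GB, wide: ~{wide_bytes / 1e12:.0f} TB")

Same row count, wildly different datasets.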
Getting a cyclic import error is not a bug, it's a feature alerting you that your code structure is like spaghetti and you should refactor it to break the cycles.
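To make that concrete with a hypothetical pair of modules:

    # orders.py -- needs Customer at import time...
    from customers import Customer

    class Order:
        def __init__(self, customer: Customer):
            self.customer = customer

    # customers.py -- ...and needs Order right back
    from orders import Order

    class Customer:
        def latest_order(self) -> Order: ...

    # "import orders" now fails with something like:
    #   ImportError: cannot import name 'Order' from partially initialized
    #   module 'orders' (most likely due to a circular import)
    # The usual fix is to move the shared definitions into a module both can
    # depend on (e.g. a models.py), turning the cycle into a tree.
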
It is a problem because the stdlib does not use relative imports for other stdlib modules, and neither do most third-party packages, which can then break you regardless of what you do in your own code.
I haven't used Airflow for years, but it used to be quite clunky; not sure how much it's improved since. I'd look into Prefect and/or Dagster first: both are more modern alternatives built with Airflow's shortcomings in mind.
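For a taste of the difference, a Prefect flow (sketched here with the newer decorator-style API; the task names are made up) is essentially plain Python functions:

    from prefect import flow, task

    @task
    def extract() -> list[int]:
        return [1, 2, 3]

    @task
    def load(data: list[int]) -> None:
        print(sum(data))

    @flow
    def etl():
        load(extract())

    if __name__ == "__main__":
        etl()   # runs locally; scheduling/deployment are layered on separately
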
> Its a massive amount of state aggregated from billions of events that needs to be served at extremely low latency, but couldn't it be partitioned somehow???
The bidder/pacer state is not necessarily massive, and it certainly does not consist of all the gazillions of past events. Depending on the strategy/bidding model, it can range from a few MB to several GB, something that fits in memory on a single beefy node.
> Google F1/Spanner and BigTable have certainly been developed to support these issues.
I doubt any external store can be used under such low latency constraints (2-10 ms) and such high throughput (millions of RPS). Perhaps Aerospike, but even that is a stretch to put in the hot path. At this scale you're pretty much limited to keeping the state in memory and updating it asynchronously every couple of minutes/hours.
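A minimal sketch of that pattern (names and intervals are made up): serve reads from an in-process snapshot and refresh it on a background thread, so the request hot path never touches the network.

    import threading, time

    class BidderState:
        def __init__(self, load_snapshot, refresh_seconds=300):
            self._load_snapshot = load_snapshot   # bulk read from the slow/external store
            self._snapshot = load_snapshot()      # entire state held in process memory
            self._refresh_seconds = refresh_seconds
            threading.Thread(target=self._refresh_loop, daemon=True).start()

        def get(self, key, default=None):
            # Hot path: a dict lookup, no network hop.
            return self._snapshot.get(key, default)

        def _refresh_loop(self):
            while True:
                time.sleep(self._refresh_seconds)
                self._snapshot = self._load_snapshot()   # atomic reference swap
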
Currently there is a hard dependency on Postgres for mara's bookkeeping tables. I'm working on dockerizing the example project to make the setup easier.
For ETL, MySQL, Postgres & SQL Server are supported (and it's easy to add more).
> Airflow requires task queues (e.g. celery), message broker (e.g. rabbitmq), a web service, a scheduler service, and a database. You also need worker clusters to read from your task queues and execute jobs.
All of these are supported, but the scheduler is pretty much the only hard requirement.
Source: I've been running Airflow for the last two years without a worker cluster, without celery/rabbitmq installed, and sometimes without even an external database (i.e. just a plain SQLite file).
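For reference, a single-machine setup along those lines boils down to a couple of lines in airflow.cfg (exact section/key names vary between Airflow versions; SequentialExecutor is the one that works with SQLite):

    [core]
    # One task at a time, in-process: no celery, no rabbitmq, no worker cluster.
    executor = SequentialExecutor
    # Metadata DB as a plain SQLite file.
    sql_alchemy_conn = sqlite:////home/airflow/airflow.db
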
I think resumes should be treated less as a report card and more as a brochure. Hiring managers have little time, so keeping it focused on relevant highlights and selling the candidate for that job is the entire point.
A 15-page menu isn't better than a 1-page menu... A spa advertising every stone in its parking lot doesn't make you think nice things about their mud baths...