Hacker News | proszkinasenne2's comments

Seems like... not really.

> Django’s new Tasks framework makes it easy to define and enqueue such work. It does not provide a worker mechanism to run Tasks. The actual execution must be handled by infrastructure outside Django, such as a separate process or service.

You'll likely still need an actual backend: a task runner, a cache implementation with a backend, etc.


Based on the current state of affairs [1], you get more structure around tasks in the app (or in Django packages that run tasks), which is probably a nice thing. Other than that, you will still need a backend implementation [2], and the easiest path there (that I can think of) is a Celery wrapper =)

[1] https://github.com/django/django/blob/main/django/tasks/base... [2] https://github.com/django/django/blob/main/django/tasks/back...
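The split the quoted docs describe - the app enqueues, something else executes - can be sketched in plain Python. To be clear, this is a hypothetical illustration of the pattern, not the actual Django Tasks API; a real setup would swap the in-process queue for a broker (Redis, SQS, a database table) and run the worker as a separate process or service.

```python
import queue
import threading

task_queue = queue.Queue()  # stand-in for a real broker (Redis, SQS, ...)

def enqueue(func, *args):
    """What the web app does: record the work and return immediately."""
    task_queue.put((func, args))

def worker():
    """What Django does NOT provide: a loop that actually runs tasks."""
    while True:
        func, args = task_queue.get()
        if func is None:          # sentinel to stop the worker
            break
        func(*args)

results = []
enqueue(results.append, "sent welcome email")
enqueue(results.append, "resized avatar")

t = threading.Thread(target=worker)
t.start()
task_queue.put((None, ()))  # shut down after draining the queue
t.join()
print(results)  # -> ['sent welcome email', 'resized avatar']
```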


"Was this helpful?" > "No" > "Why can't this bullshit be disabled globally?" > "Submit"


Imperva, whose core product is built around blocking bots, publishing statistics on how crowded platforms are with bots. ¯\_(ツ)_/¯


In Chrome/Chromium there is a WebRTC Network Limiter [1] extension that lets you set the "Use only my default public IP address" policy and renders the method I presented ineffective.

[1] https://chrome.google.com/webstore/detail/webrtc-network-lim...
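For what it's worth, the same behaviour the extension toggles can (as far as I know) also be enforced via the Chromium enterprise policy `WebRtcIPHandling`, set in a managed-policy JSON file (on Linux, typically under `/etc/chromium/policies/managed/`):

```json
{
  "WebRtcIPHandling": "default_public_interface_only"
}
```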


Google abandoned that extension in 2016, which is why the last option (for disable_non_proxied_udp) is greyed out.


It's both a security and a privacy issue. The Whonix wiki explains the latter in more detail: https://www.whonix.org/wiki/Data_Collection_Techniques#:~:te...


If you use Chrome-exclusive links, please at least also link to the closest standard section [0] and, preferably, quote the linked text directly.

That said, they don't say anything about security. I obviously forgot about fingerprinting, but I still don't see a security issue?

[0] https://www.whonix.org/wiki/Data_Collection_Techniques#Finge...


Sure, it can be! Also, as some people have already pointed out, this is often a gray area where people go beyond violating ToS. Some good examples are privacy violations (scraping personal data), credential stuffing, etc.

Recently there has been a boom of "anti-bot" services. These are essentially SaaS businesses that "protect" websites from being scraped by automated software. As you onboard your first customer who wants to extract data from a bot-protected website, you are going to run into an unlimited waterfall of stupid troubles: your bots will be blocked, will consume excessive amounts of data, and will kill your CPU/GPU performance.

I recently shared some highlights on HN on how to bypass these [1], but that is sadly only the tip of the iceberg. On the other hand, since the post was featured on HN, I have been contacted by more than 50 companies and individuals whose business operating model is based solely on data extraction/automated scraping. These are (in my opinion) successful companies, and two of them are part of YC.

[1] https://news.ycombinator.com/item?id=29060272


Wasn't there a ruling that web scraping was legal now?


The LinkedIn case; it's still up in the air, I think - https://news.bloomberglaw.com/us-law-week/supreme-court-scra...


Thanks for this!


- Trust Token API, to verify whether you are a bot or not
- Federated Learning of Cohorts (FLoC)

https://www.google.com/amp/s/blog.google/products/ads-commer...

There are also plenty of device fingerprinting techniques that Google might use.

https://github.com/niespodd/browser-fingerprinting
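For illustration, here is a minimal sketch (all signal names and values are made up) of the general idea behind such fingerprinting: many individually weak signals are combined and hashed into one stable, high-entropy identifier.

```python
import hashlib

# Hypothetical signals a page could collect; real fingerprinters use
# dozens more (canvas, WebGL, audio, fonts, shader cache timing, ...).
signals = {
    "user_agent": "Mozilla/5.0 ...",
    "screen": "1920x1080x24",
    "timezone": "Europe/Warsaw",
    "webgl_renderer": "ANGLE (NVIDIA ...)",
    "fonts": "Arial,Calibri,Consolas",
}

def fingerprint(sig):
    # Sort keys so the identifier is stable regardless of collection order.
    blob = "|".join(f"{k}={v}" for k, v in sorted(sig.items()))
    return hashlib.sha256(blob.encode()).hexdigest()[:16]

fp = fingerprint(signals)
```

The point being: no single signal identifies you, but changing any one of them changes the combined hash, which is why spoofing has to be consistent across every surface at once.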


Anything based on Chromium is vulnerable to all specialised fingerprinting techniques such as this one https://niespodd.github.io/persistent-tracking-shader-cache/ and many others that I listed here https://github.com/niespodd/browser-fingerprinting

Some parts of Chromium seem to intentionally expose fingerprinting surfaces, and because it changes quickly with new features and add-ons, keeping up with patches the way Bromite does is an incredibly challenging task.


I thought about it too, but when you consider the cost of running headless Puppeteer (let's say on AWS) and the cost of a good proxy that is charged per GB, it's often as expensive (if not more so) as some of these SaaS-es. This is especially the case for websites with heavyweight JS/CSS/img assets.
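A back-of-the-envelope sketch of that comparison (every number here is made up purely for illustration; real prices vary a lot by provider and region):

```python
# Illustrative assumptions, not real quotes:
pages = 10_000
mb_per_page = 2.5              # heavyweight JS/CSS/img assets add up
proxy_price_per_gb = 12.0      # residential proxies billed per GB
seconds_per_page = 5           # headless Chrome isn't fast
compute_price_per_hour = 0.10  # small cloud instance

proxy_cost = pages * mb_per_page / 1024 * proxy_price_per_gb
compute_cost = pages * seconds_per_page / 3600 * compute_price_per_hour
```

Under these assumptions the per-GB proxy bill dwarfs the compute bill, which is why asset-heavy sites are the worst case: you pay for every byte the browser pulls.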


That's true when it's a one-time job: pull the data and disappear. I also see how this is the case for most freelancers on Fiverr or Freelancer: it's the tool they know, so they use it. However, I imagine there are a number of companies that rely heavily on continuous data scraping - be it for price comparison - and I've seen one heavily using Puppeteer.

