I interpreted your question as "do I now no longer need to escape user-generated data in the HTML sent by the server in response to requests by HTMX?" The short answer is no, you still need to escape it:
- HTMX adds extra significance to HTML attributes which aren't accounted for by the built-in sanitizer
- HTMX can't add a custom sanitizer because it wouldn't be able to distinguish between intentional and malicious uses of those attributes
- Even if the HTMX client library sanitized all of the HTML from the server, you can't guarantee that all requests to the server will come from HTMX: browsers can navigate to your "back-end" URLs directly. While you can protect yourself from this using HTTP headers, that's not something I'd feel comfortable relying on since it would be easy to not notice when you've accidentally gotten it wrong.
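For what it's worth, here is a minimal sketch of escaping on the server before the fragment goes out. It assumes a Python back end that builds fragments by hand; `render_comment_fragment` and the `data-author` attribute are made up for illustration, and a real app would more likely lean on a template engine with auto-escaping turned on.

```python
from html import escape


def render_comment_fragment(author: str, body: str) -> str:
    """Build an HTML fragment for an HTMX swap (hypothetical helper)."""
    # escape() rewrites &, < and >, plus both quote characters by default,
    # so the values are inert in element content and in quoted attributes.
    return (
        f'<li class="comment" data-author="{escape(author)}">'
        f"{escape(body)}</li>"
    )


# A comment body that tries to smuggle in HTMX attributes.
malicious = '<div hx-get="/steal" hx-trigger="load"></div>'
fragment = render_comment_fragment('eve"', malicious)
print(fragment)
```

Once escaped, the `hx-get` and `hx-trigger` text is just characters on the page, so HTMX never sees an element carrying those attributes.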
The HTMX website has a longer explainer on how to protect yourself from XSS when using the library:
Do you honestly feel that we will ever be in a place for the server to not need to sanitize data from the client? Really? I don't. Any suggestion to me of "not needing to sanitize data from client" will immediately have me thinking the person doing the suggesting is not very good at their job, really new, or trying to scam me.
There's no reason to not sanitize data from the client, yet every reason to sanitize it.
If you sanitize on the server, you are making assumptions about what is safe/unsafe for your clients. It's possible to make these assumptions correctly, but that requires keeping them in sync with all clients which is hard to do correctly.
Something that's sanitized from an HTML standpoint is not necessarily sanitized for native desktop & mobile applications, client UI frameworks, etc. For example, in Cloudflare's Cloudbleed security incident, malformed img tags sent by origin servers (which weren't by themselves unsafe in browsers) caused their edge servers to append garbage (including miscellaneous secure data) from heap memory to some responses that got indexed by search engines.
Sanitization is always the sole responsibility of the consumer of the content to make sure it presents any inbound data safely. Sometimes the "consumer" is colocated on the server (e.g. for server rendered HTML + no native/API users) but many times it's not.
> If you sanitize on the server, you are making assumptions about what is safe/unsafe for your clients.
No. I'm making decisions on what is safe for my server. I'm a back end guy, I don't really care about your front end code. I will never deem your front end code's requests as trustworthy. If the front end code cannot properly handle encoding, the back end code will do what it needs to do to not allow stupid string injection attacks. I don't know where your request has been. Just because you think it came from your code in the browser does not mean that was the last place it was altered before hitting the back end.
Are you one of today's 10000 on using parameterized queries and prepared statements?
Unless you're doing something stupid like concatenating strings into SQL queries, there's no need to "sanitize" anything going into a database. SQL injection is a solved problem.
Coming from the database and sending to the client, sure. But unless you're doing something stupid like concatenating strings into SQL statements it hasn't been necessary to "sanitize" data going into a database in ages.
Edit: I didn't realize until I reread this comment that I repeated part of it twice, but I'm keeping it in because it bears repeating.
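A minimal sketch of what that looks like, using Python's built-in `sqlite3` module with an in-memory database. The `users` table and the hostile input are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, display_name TEXT)")

# Input that would break a query built by string concatenation.
hostile_name = "Robert'); DROP TABLE users;--"

# The ? placeholder sends the value separately from the SQL text,
# so the driver never parses it as part of the statement.
conn.execute("INSERT INTO users (display_name) VALUES (?)", (hostile_name,))
conn.commit()

stored = conn.execute("SELECT display_name FROM users").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(stored, row_count)
```

The value is stored exactly as typed, with no "sanitizing" step, and the table is still there afterwards.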
SQL injection is solved if you use dependencies that solve it of course.
Other than SQL injection there is command and log injection. File names need to be sanitized, as does any user-uploaded content that could carry XSS, and that includes images.
Any incoming JSON data should be sanitized, extra fields removed etc.
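One way to read "extra fields removed" is an allow-list: copy over only the fields you expect and check their types. A rough sketch, where `ALLOWED_FIELDS` and `parse_profile` are hypothetical names and the schema is invented:

```python
import json

# Hypothetical schema: the only fields the endpoint accepts, with their types.
ALLOWED_FIELDS = {"display_name": str, "age": int}


def parse_profile(raw: str) -> dict:
    """Parse a JSON body, keeping only allow-listed, correctly typed fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    clean = {}
    for field, expected_type in ALLOWED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {field}")
        clean[field] = value
    return clean


body = '{"display_name": "<script>", "age": 30, "is_admin": true}'
print(parse_profile(body))
```

Note that this validates shape only. The `<script>` display name passes through untouched, because escaping it is the job of whatever eventually renders it.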
Log injection is a pretty nasty sort of hack that, depending on how the logs are processed, can lead to XSS or command injection.
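The simplest form of log injection is forging extra log lines with embedded newlines. A small sketch of neutralizing that, assuming a line-oriented text log; `neutralize_for_log` is a made-up helper, and it does nothing about XSS in a web-based log viewer, which would still need escaping at the viewer.

```python
def neutralize_for_log(value: str) -> str:
    """Make CR and LF visible so a value cannot start a new log line."""
    return value.replace("\r", "\\r").replace("\n", "\\n")


# A username crafted to fake a second, legitimate-looking entry.
username = "bob\nINFO admin logged in from 10.0.0.1"
log_line = f"WARN failed login for user={neutralize_for_log(username)}"
print(log_line)
```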
People do it all the time, on any tech stack that lets you execute command strings. A lot of early databases didn't even support things like parameterized inserts.
As the stuff is rendered on the front-end how do you deal with tags where you do not even have the information to decide how they shall be parsed on the server?
This seems rather ignorant and, in my experience, leads to security issues, such as CVE-2023-38500 or CVE-2023-23627. This is not decidable on the server-side, so you will always mess stuff like this up. Sanitization can only work properly on the client for HTML.
Sanitizing as close as possible to where the data is used is usually best; then you don’t have to keep track of what’s sanitized and what’s not sanitized for very long.
(Especially important if sanitization is not idempotent!)
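HTML escaping is a handy example of a non-idempotent step. A quick sketch with Python's standard `html` module (the sample string is arbitrary):

```python
from html import escape, unescape

raw = "Tom & Jerry <3"

once = escape(raw)    # what a browser should receive
twice = escape(once)  # what you get if two layers both "sanitize"

print(once)
print(twice)

# One unescape only undoes one layer, so the double-escaped text
# would show up on screen with literal "&amp;" in it.
assert unescape(once) == raw
assert unescape(twice) == once
```

If you lose track of whether a string has already been escaped, you either escape twice and show garbage, or skip it and ship an XSS hole.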
It can be a complicated and error-prone process, mainly in scenarios where you have multiple mediums that require different sanitizers. Obviously you should do it. But in such scenarios, the best practice is to sanitize as close to the place it is used as possible. I've seen terrible codebases where they tried to apply multiple layers of sanitization on user input before storing to the DB, then reverse the unneeded layers before output. Obviously this didn't work.
Point being, if you can move sanitization even closer to where it is used, and that sanitization is actually provided by the standard library of the platform in question, that's a massive win.
By "sanitise" what's really meant is usually "escape". User typed their display name as <script>. You want the screen to say their display name, which is <script>. Therefore you send &lt;script&gt;. That's not their display name - that's just what you write in HTML to get their display name to appear on the screen. You shouldn't store it in the database in the display_name column.
Agreed. The codebase I'm thinking of was html encoding stuff before storing it, then when they needed to e.g. send an SMS, trying to remember to decode. Terrible.
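A small sketch of the alternative: keep the raw value in storage and encode per output medium at the last moment. The function names and the greeting text are invented for the example.

```python
from html import escape

# The raw value, exactly as the user typed it and as it sits in the database.
stored_display_name = "<script>"


def greeting_html(name: str) -> str:
    """HTML output: escape here, at the point of use."""
    return f"<p>Hello, {escape(name)}!</p>"


def greeting_sms(name: str) -> str:
    """Plain-text output: no HTML encoding to apply, and none to undo."""
    return f"Hello, {name}!"


print(greeting_html(stored_display_name))
print(greeting_sms(stored_display_name))
```

Because nothing is encoded before storage, the SMS path has no decoding step to forget.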
You're making a bad assumption that client side code was the last place the submitted string was altered in the path to the server. The man in the middle might have a different idea and should always be protected against on the server where it is the last place to sanitize it.
Well, you have to sanitize for the transport medium, otherwise you can't sanitize at all afterwards. But if I'm sending user content in JSON and I didn't sanitize it for insertion into HTML, what man in the middle is going to be compromised? Furthermore, how can I possibly protect an unknown intermediary without knowing what it is going to do with it?
Maybe it is going to try to copy a value into a 20 char buffer, I don't know!
Easier does not mean better, which seems to be true in this case given the many, many vulnerabilities that have been exploited over the years due to a lack of input sanitization.
In this case easier is actually better. Sanitize a string at the point where you are going to use it. The locality makes it easy to verify that sanitization has been done correctly for the context. The alternative means you have to maintain a chain of custody for the string and ensure it is safe.
if you are using it at the client, sure, but then why is the server involved? if you are sending it to the server, you need to treat it like it is always coming from a hacker with very bad intentions. i don't care where the data comes from, my server will sanitize it for its own protection. after all, just because it left "clean" from your browser does not mean it was not interfered with elsewhere upstream, TLS be damned. if we've double encoded something, that's fine, it won't blow up the server. at the end of the day, that's what is most important. if some double decoding doesn't happen correctly on the client, then <shrugEmoji>
Not OP, but to add to your sentiment: It was called installing when I was a child. I would download software—from CNET just casually browsing, or whatever from a warez forum—and open the package to reveal an installer (I was fond of InstallShield-based installers, I do not know why). I could customize the directory which the application would install to, stare endlessly at the verbose “advanced” or “custom” mode, and listen to my HDD spin a little faster.
perhaps because that's installing from a store, not sideloading? however poor (security-wise) the offering may be, you're still using the intended install flow
in this sense i do actually agree about the misuse of 'sideloading' - the planned change would not impact just sideloading, but also 'third party' stores
Yes they are. Unhelpful distractions that are workshopped and focus grouped. Stop adopting the bizarre terminology of the enemy, and their goofy neologisms, and just talk about the issue in straightforward English.
We didn't need a different word for not being able to install an application on your phone without the permission of the company that made it. We needed a different word for the thing that was new, which is the company that makes the thing that you own refusing you permission to use it as you see fit.
No, he's right. The general public has no idea what "sideloading" even means, but they sure as shit would want to be able to load their own apps if they were asked about it. The terminology is meant to obfuscate the issue.
He's not right at all. It is not "part of the problem" to use a term that a poster here doesn't think accurately captures the issue. The only part of the problem is the corporations who are trying to take our rights away.
Also, I think you'll be quite disappointed in what the general public does or does not care about. The iPhone has always been even more locked down than Android and it sells like hotcakes. Even on Android only a tiny minority of users make use of the option to install third-party apps. I think the general public should care about this topic, but all evidence is to the contrary.