To be fair, data retention is a hot topic right now in Europe: the pandemic, the increased screen time that resulted from it, and the number of accounts we had to create left and right all call for new regulations.
I live in Europe and the only hot topic I can think of, apart from the virus, is energy prices. The same energy prices the center-left wants to increase via CO2 taxes.
I totally support this. It amazes me that companies still do not delete/anonymize user accounts after periods of inactivity. Everything that is linked to your email address should be purged after 3-12 months of inactivity, including ecommerce like Amazon, game platforms like Steam, cloud storage like Dropbox, or even Hacker News. Good luck trying to find old accounts that you used years ago. What if they were breached and are now used by people with bad intentions? In my country (Romania), even barber shops that keep user accounts for longer than necessary get fined heavily for not closing them after inactivity. Some years ago, I woke up to an email about an inactive G2A account telling me that I had to pay a fee for inactivity. NO! I don't have to pay anything, purge it!
Mildly related: In America, e-mails stored on a server for over 180 days are considered 'abandoned' and can be viewed by law enforcement without warrants. [0]
The bill to fix this relic of a time where people stored emails in noticeably-finite inboxes, the Email Privacy Act, passed the House this session but got knocked out of the bill in the Senate. https://en.wikipedia.org/wiki/Email_Privacy_Act
I wonder the same thing. Civil Asset Forfeiture is at least as awful and should offend everyone regardless of their stance on current political hot topics. Yet it appears to go on unaddressed.
I think for most people in the US, this wouldn't make their top 50 list of things wrong with the US, or our legal system in particular. And many of those people would probably read about this, shrug, and think "eh, nothing in my old emails that I care if the government sees".
It's actually super weird, because US culture has a strong component of distrust of government. But the government is pretty good at making people fear crime, terrorism, etc., which allows them to get the people to "trust" them with mass surveillance and other privacy invasions.
I completely agree with you. There are plenty of reasons that someone might not use a website for a long time. I didn't use Amtrak from March 2020 through November 2021 and I'm glad I didn't lose my account status.
"Sorry, you can't log into this NCAA bracket website because you haven't used it since last year."
Do you have a paid account or a free account? If I store my documents on a free account for a one time send to the university application and then I forget about it, then Dropbox should purge it after a time to protect my data, as I don't have any "contract" with them like a subscription or something. The same goes for G2A: I bought some game keys from them at a cheap price some time ago and then totally forgot that I had an account. I couldn't even find the activation mail in my inbox, lol. One day in the summer I woke up to a mail saying that I have to pay an inactivity fee, even though I'm just a row in their database and have no contractual obligation to them.
I had a family member go through a major life event that left his OneDrive account unused for about a year. When we needed to access tax documents on it, Microsoft had deleted it. I’m strongly against non-user initiated account deletion.
Yeah: I would take the opposite stance to this whole "accounts should be deleted due to inactivity" BS and say that a company that you entrusted your data to now has a moral responsibility to do everything they can to hold on to that data until such time as you explicitly relinquish them of that duty, and if the cost of such a requirement is scary you shouldn't put yourself in a position to hold on to other people's data in the first place.
"... a company that you entrusted your data to now has a moral responsibility to do everything they can to hold on to that data until such time as you explicitly relinquish them of that duty ..."
I completely agree.
I will take this even further: that company should break a data retention law in order to hold ambiguously abandoned data that might be important to that user.
Further: that company should safeguard that data and protect it from unlawful intercept or surveillance just like the data of any other paying customer.
Finally: no additional costs should be accrued beyond the original terms for this safekeeping of data.
They do not have such moral responsibility. Their responsibilities are defined by laws and their T&Cs, which are known to the customers and customers explicitly opt in.
If I say in my T&Cs that I delete data after a certain period of account inactivity, then this is how it is going to work, and the user should not expect anything else.
> Their responsibilities are defined by laws and their T&Cs, which are known to the customers and customers explicitly opt in.
You seem to just not believe in morals I guess? ;P
Like, yes: the law says you can do something... but I am claiming it isn't moral to do that. You can assert your terms of service let you, but I am claiming that it wasn't moral of you to put that in your terms of service in the first place. (And to the extent to which the law requires you do the opposite, that is us arguing over what the law should say, given that the entire point of this thread is about a changing law.)
And like, the user of course should expect you to do the things you claim you will do, but I also think it is fair for users to expect you to claim you will do moral things in the first place. If you are going to pull stunts like deleting data users entrusted to you, hopefully your service is sufficiently optional and unimportant that they can just not use your service without losing out on anything at all in life.
I see you work in medicine. Your field collects data on people all the time and then hoards it from them. You take X-rays and then just put them in some filing cabinet. To get a copy of MY X-ray I have to argue with people about it and then I usually get some low-quality shit copy. Meanwhile, you purge your records and delete MY data because I somehow have the gall to not need your specific service for some number of years until I get old and suddenly wish I could get my X-ray and you destroyed it :/.
You should frankly be REQUIRED to give people their data to take with them rather than keep it yourself, since you can't be trusted not to put in your terms of service that you get to both hoard it and delete it on a whim. If you must insist on holding it yourself, you should be required to have a trust set up that you make regular deposits into to ensure that the data you are holding will survive at least as long as all of your patients.
That's what I will claim is "moral", and to the extent to which either laws or the terms of service of your organization fails to match then the lawmakers, lawyers, or entrepreneurs are being horrible people. If you believe in a religion that has a place similar to hell, maybe that's where all of the people who push for, allow, or take part in stuff like this will end up :/.
I do not believe in „morals“, yes. Whatever you think is right is just your opinion, unless it is important enough that society decides to codify it in law. Christians think that homosexuality is immoral - should I care about their opinion and lecture my gay friends about their wrong behavior? I would rather tell a billion people to go to hell with this belief. Same here. If you want to discuss my personality from a „moral“ perspective, you can join them. Especially given that you propose analyzing data retention, which is purely a UX and product topic, from a „moral“ perspective.
As many people pointed out in the comments here, there are different expectations in this field - there is no common unwritten law about how it should work. If some people make wrong assumptions about it despite having access to the necessary information, it is really their fault. They are not the victims to be saved. If I run a deposit box, I do not offer it for free and I will empty it the moment payment stops. If I run a service with a free plan, I will keep the data as long as possible and will delete it only after an economically justified period of inactivity. Contrary to the trials of paid subscriptions, free plans are not meant to be auto-deleted quickly, but since nobody pays for storage, a business also cannot take on an obligation to keep the data of inactive accounts forever.
That said, read the T&Cs and do not assume that your understanding of „morality“ is right.
The interconnectedness of the world today is economically justified; it does not have any morality in it. In the same vein, if we were to follow your anti-morality point of view, we should have kept the connections as before, even if that contributes to global warming, to the deaths of many vulnerable people from the rising number of viruses that spread at an accelerated rate, and to the number of cyberattacks, which have quadrupled. Similarly to economically accessible transit around the world and its complexity, we have the Internet, which is clearly becoming more and more prone to breaches exploiting vulnerabilities (log4j literally proved that everything was open for exploitation). Today, while I'm watching a random Romanian TV channel, many psychopaths at a round table are leading you to believe that Covid's risk is self-inflicted by people who don't work out and are overweight, that lockdowns are unjust, that it is all people's fault, and that there's nothing moral in lockdowns and wearing masks, which I strongly disagree with and which is also not supported by data.
If you are triggered by my „anti-morality views“, please re-read my comments again. There are too many attempts in this thread to stretch morality over basic policy and product issues and to shape it into a personal attack on me, I’m going to stop responding to all of them.
If something can be both right and wrong depending on context, it is not guided by morality, it is guided by reason and by data. A data retention policy can be right for some users and wrong for others. A lockdown can be an appropriate solution under certain conditions and an attack on personal freedom in other cases. Neither data retention nor hygiene rules are moral or immoral, because there’s no universal judgement for them. If something is highly contextual and disputed, it is better to keep morality out of the discussion, otherwise consensus will never be found. It is better to use something people agree on, like human laws or laws of nature.
Especially since the problem can be completely avoided by encrypting the user's data in the first place. Then the whole "we're deleting the data for your privacy" argument doesn't really hold up.
Also, I have had similar experiences, and would be livid if someone deleted my data after only a few months.
In fact you do have a contract with the services you sign up for. Even if you did not read the T&Cs, you have accepted them, and only then did your relationship with the service start, on their terms. You are not just a row in the database, you are a customer getting service in exchange for something. You have at least opted in to their data retention policy, and you have to opt out explicitly. If services are required to purge customer data after a period of inactivity by default, chances are high that free accounts will simply cease to exist. In any case, a quite significant share of customers would prefer to opt out of the purge, and they will be important enough from a commercial perspective to make this opt-out the default in the T&Cs acceptance process.
>"If I store my documents on a free account for a one time send to the university application and then I forget about it, then Dropbox should purge it after a time to protect my data, as I don't have any "contract" with them like a subscription or something."
I found this sentence interesting, as it contained positive and normative statements that I disagree with, with a non sequitur between them. You say that you have no contract with them, even though you agreed to some sort of 'user agreement'. Then you say that you forgot about it, and that makes your faulty memory their problem. They have to make sure your data is secure for you because you... just don't bother to pay any attention to where you're leaving it? Should they also be responsible for checking your password against known breaches, to make sure it's not compromised? Where does this end?
Yes, they should check for any possible breaches, as any other responsible company already does. AWS, for example, not only checks for breaches but also scans public repositories like GitHub and GitLab for leaked credentials. A company should also warn users from time to time that they need to update their password; some companies are so careless that they don't even pay attention to this small detail. Or at least warn account holders that they still have an account with them.
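For what "checking a password against known breaches" could look like in practice, here is a minimal sketch. It assumes the service holds (or can query) a corpus of breached password hashes, bucketed by hash prefix, similar in spirit to the hash-prefix range lookups some public breach-checking services offer. The function names and the tiny in-memory index are hypothetical stand-ins, not any provider's API.

```python
import hashlib
from typing import Dict, Iterable, Set

PREFIX_LEN = 5  # assumption: bucket breached hashes by their first 5 hex chars


def sha1_hex(password: str) -> str:
    """Uppercase hex SHA-1 of the password (used only for corpus lookup)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def build_index(breached_passwords: Iterable[str]) -> Dict[str, Set[str]]:
    """Hypothetical local stand-in for a breach corpus: prefix -> hash suffixes."""
    index: Dict[str, Set[str]] = {}
    for pw in breached_passwords:
        digest = sha1_hex(pw)
        index.setdefault(digest[:PREFIX_LEN], set()).add(digest[PREFIX_LEN:])
    return index


def is_known_breached(password: str, index: Dict[str, Set[str]]) -> bool:
    """True if the password's hash appears in the bucket for its prefix."""
    digest = sha1_hex(password)
    return digest[PREFIX_LEN:] in index.get(digest[:PREFIX_LEN], set())


# Illustrative corpus only; a real one would hold many millions of entries.
index = build_index(["password", "123456", "qwerty"])
```

A check like this only covers known breaches; it says nothing about the unknown ones.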
> and that makes your faulty memory their problem
It is not only memory that is flawed in humans. Hence the protective measures I'm proposing.
> against known breaches
What about the unknown ones? How do you protect your user's account when, under GDPR, Dropbox is the controller of the data? By occasionally sending mails asking them to update the password or adopt 2FA, by locking the account on suspicious activity, or by purging it in the end if no further action is taken. It ends with the deletion of the user.
Strongly disagree, for Steam in particular. I played a lot of computer games in high school and early college, then stopped for about 7 years. When I finally bought a new computer, I somehow remembered my old Steam password and was thrilled to find that all the old games were still on my account, ready to download. In comparison, I had long lost any physical copies of games I had purchased as a youth.
As a bonus, I get the “bragging rights” of having nearly the oldest possible steam account (it can now vote).
I have accounts over 20 years old I use every few years. I would not be very amused if your suggestion takes off.
I can see simple things happening, though, that work towards this; for my pet project I just coded a feature that hashes the email addresses of inactive users (3 months without any interaction) and uses another, differently salted hash of their email address (which we then no longer have after this) to encrypt their data. They can still log in, which restores their account and data without them noticing, but they will never receive email, and possible breaches hurt less.
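A rough sketch of how such a scheme might be wired up, under loud assumptions: the salts, function names, and record layout are all hypothetical, and the XOR keystream is a toy stand-in so the example runs on the standard library alone. A real system would use an authenticated cipher such as AES-GCM from a vetted library.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical per-purpose salts; they must differ so the stored lookup hash
# cannot be used directly as the encryption key.
LOOKUP_SALT = b"example-lookup-salt"
KEY_SALT = b"example-key-salt"


def derive(email: str, salt: bytes) -> bytes:
    """Slow salted hash of the normalized email address."""
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", normalized, salt, 100_000)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher for illustration only: XOR with HMAC-SHA256 counter blocks."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hmac.new(key, counter, hashlib.sha256).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def archive(account: dict) -> dict:
    """Replace the email with a lookup hash and encrypt the data under a key
    derived from the same email with a different salt."""
    email = account["email"]
    return {
        "email_hash": derive(email, LOOKUP_SALT).hex(),
        "blob": keystream_xor(derive(email, KEY_SALT), account["data"]),
    }


def restore(archived: dict, email: str) -> Optional[dict]:
    """On login, re-derive both values from the email the user typed."""
    candidate = derive(email, LOOKUP_SALT).hex()
    if not hmac.compare_digest(archived["email_hash"], candidate):
        return None
    data = keystream_xor(derive(email, KEY_SALT), archived["blob"])
    return {"email": email, "data": data}
```

Because the key comes only from the email address, anyone who can guess the address can re-derive it, so this reduces exposure in a breach rather than eliminating it.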
This is the sort of experience that you want. In case you don't want to click through, this is someone with over 1700 hours in an MMO who lost all their progress and items because they took a break and missed the GDPR-related opt-in to get their account transferred.
I don't want to lose all my Steam games just because I am inactive for a time. That is a terrible idea. I purchased those digital goods; that's like saying crypto markets should dump data from time to time.
So what would your ideal scenario look like? I buy the game, download it, back it up on S3, pay for that, and then lose access when I don't access it for a few months?
I'm super happy I don't have to worry about storage for my large Steam collection.
If so, please make it opt-in. Let users set the auto-delete date themselves, because I don't want to have to make sure that I log in every other week to keep my account alive.
Not Romanian, but you usually need to make an appointment at a barber (especially now that they can't/don't want to have too many people in their shop at once, due to COVID regulations). If you make the appointment online, then you can usually create an account to view/rebook/cancel it later, if necessary.
I book my hair appointment online. They ask for name, email and mobile phone number. They need the name to know who to expect for the appointment. They ask for email and/or phone to send you a reminder (which is nice, IMO).
Very reasonable and totally in line with the GDPR rules as well, as long as they purge the data after a certain time.
If a data aggregator can create a timeline of an individual's life, watching personality traits, social graphs, income, travel, routine, biometrics and health, stress and recreation, political affiliation, brand and taste preferences, savings, debt, credit, and social media influence traces, local, regional, and national cultural influences, and so on... that email archive is gold.
You can then create predictive models that let you target products, politics, music, media, and so on. It's not about spying on individuals, it's about manipulating populations. It's about rent extraction and wealth consolidation using tools of influence that negate consent. It augments abuses by law enforcement, corrupting the principles by which democratic governments are supposed to operate by hiding tyranny behind EULAs and TOS and private sector proxies.
Imagine a GPT-3 type model, except that instead of predicting text, it's designed to predict behaviors and psychological effects. That gives you a tool that's got a Derren Brown level of manipulation potential that you can scale. It's never going to be 100% accurate at the individual level, but you can target huge collections of individuals to modulate their lives through advertising and media sequencing.
Well, this is not what the OP is proposing. Data removal after 3 months or a year seems too fast. I game on Steam once every two years - do I have to buy all my games each time?
You can fake relevance if you want to sell the company without actually lying. Coincidentally there's a certain class of company that is in a permanent state of being sold and whose communication is under particular scrutiny wrt truthfulness. Seen from any other angle I fully agree, random user data value tends to be greatly overestimated.
In Romania there's an old question to find the character of somebody: are you Roman or Greek? Because if you are Roman, you put practice above theory, you are a pragmatist, skeptic, if you are Greek you will put theory above practice, you are an idealist, optimist, at the expense of hardships in the material world and other potential drawbacks. Rob Pike chose Roman, otherwise it would have cost Google money to train those people and we don't know if those language features could bring any benefits for the kind of software they develop. As he said, they are not researchers working with category theory, instead they program ever-changing software for an imperfect world driven by profits, rather than academic idealism.
I mean, I actually work(ed) at Google and this is nonsense for Google developers.
The most used and deployed code at Google is written in C++, not Java or Python, and while we have a sane subset of it in our style guide it is by no means dumbed down to the level that Pike seems to imply we need it to be.
Google has done fairly fine deploying C++ into production for a couple decades. I haven't found Go superior at all so far, more insulting than anything. And tedious and opinionated in all the wrong ways.
I just replied to your other comment further up thread, but yes this is my anecdotal experience as well. You couldn't walk ten feet without tripping over C++ and Java. Never myself saw Go checked into corp.
“Pragmatism vs Idealism” is pretty much like saying “we're doing the right things and others are wrong”. No software engineer would define themselves as an “idealist who doesn't take reality into account in their design”.
For instance, I'd consider Go being very “idealistic” since it was designed mostly from the vision of a small group of people with a clear mantra: “simplicity”. But of course, Rob Pike would never have advertised his language this way.
On the other hand, one could argue that Rust is pretty much in the “realist” camp, since it was initially designed in a very different shape from what it later became when early adopters started using it (the Servo team at Mozilla in the beginning, and then other groups: for instance the Fuchsia team at Google had an impact on the design of the async part of the language).
I think the JVM would also be a good choice once Loom lands. I've seen that right now the Java architects are trying to unify the ALGOL flavored Java with ML in the sense of algebraic data structures, pattern matching, local type inference. As you said, OCaml does not have a big community, library ecosystem. It also does not have the 100B$ garbage collectors of the JVM, which are a deal breaker, at least for me.
That's a good option too, though in that case the tradeoff is that compiling to a single binary is a bit harder than in the other languages. Someone mentioned Scala too, which is even "higher level" than OCaml, but I think Scala has the issue of future/async (monadic) concurrency, and is even harder to compile to native. There are lots of options in that space, which is great!
> It also does not have the 100B$ garbage collectors of the JVM, which are a deal breaker, at least for me.
Can you expand a bit on that? I was under the impression that even with all that investment, Java code tends to be slower and more memory hungry than equivalent Go code.
No amount of $ spent on JITs and garbage collection can solve the problem that Java was designed with no respect for memory use. It just doesn’t have the features (such as but not limited to value types) that let you save memory.
They’re adding value types of course, but I haven’t looked at how exactly they’ll work.
There's more than just the stack/heap - in a compiled language your constant data is file backed and doesn't need to contribute to memory footprint at all. Java doesn't have that because jar files are missing it (they are zipped, syntax that looks like it creates constant data emits heap-allocation bytecode, etc.) and it doesn't have multidimensional arrays or things that might help you use it even if you wrote it in C.
There are some other tricks, like tagged pointers and purgeable data, that it doesn't reliably have either.
Java NIO can mmap a file of bytes or ints or whatever without using the heap’s private pages, and there are some JVMs that persist and reuse jitted machine code.
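To make the file-backed idea concrete, here is a small sketch in Python rather than Java, using the same OS facility that `FileChannel.map` wraps. The table file, its little-endian 32-bit layout, and the helper names are assumptions made up for the example.

```python
import mmap
import os
import struct
import tempfile

INT_SIZE = 4  # assumption: the table stores little-endian 32-bit signed ints


def write_table(path: str, values) -> None:
    """Write the constant table once, e.g. at build time."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}i", *values))


def read_entry(path: str, index: int) -> int:
    """Map the file read-only and decode one entry in place.

    Pages are faulted in by the OS on access and can be dropped and re-read
    from disk, instead of being copied up front into a private buffer.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            (value,) = struct.unpack_from("<i", mm, index * INT_SIZE)
            return value


with tempfile.TemporaryDirectory() as tmp:
    table_path = os.path.join(tmp, "table.bin")
    write_table(table_path, [10, 20, 30, 40])
    third = read_entry(table_path, 2)
```

Only the decoded integer becomes an ordinary heap object here; the table itself stays backed by the file.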
I’ve read that games and simulators schedule realtime asset loading pretty carefully; what other problems are solved using large constant data?
Eh you can still have compression, what’s important is the OS pager knows how to read the page from disk as opposed to eagerly loading and swapping it.
Sure it's fine, but isn't that not writing in Java?
I don't remember how good the Java interfaces look though, eg if you worked around no constant data by defining your arrays in C and passing them back, would they be bounds checked arrays?
> If one is dumb enough to use C style arrays instead of C++ bounded checked ones, yes that is a problem, again just like C arrays defined in CGO.
I'm talking about defining a 'const int[]' in C and having it appear as an 'int[]' in Java. I think that's even less likely to happen if you define a const std::vector.
Actually I've never even seen someone define constant data in C++ using a vector, but this fits my experience of C++ developers telling me I'm stupid when I do the only thing I've ever seen anyone actually write.
> Can you expand a bit on that? I was under the impression that even with all that investment, Java code tends to be slower and more memory hungry than equivalent Go code.
Go doesn’t need a good GC as much because it can rely on value types sometimes. But for general enough workloads a superior GC can triumph.
Also, Java is memory hungry only in that doing GC when it is not absolutely needed is useless work. The JVM is actually quite power efficient due to trading off memory for better throughput, so especially in server environments with huge (up to multi-TB) memory which the JVM is free to use, it can be ridiculously fast. So no, Java is not slower than an equivalent Go program, though correctly comparing languages is nigh impossible.
This is why I prefer to write core components in C and then call them through the FFI available in the Java, Python or Node runtimes. Or you can go the "zero-cost abstraction" path which is the middle ground, like C++ or Rust, only that abstractions are not always zero cost and low-level details will creep into the high-level APIs. Different people, different preferences. Like seriously, the only reason I still use C++ is because of its HPC ecosystem coupled with its metaprogramming capabilities, which are light years ahead of anything else offered on the market in that department. This is also the reason why game development can't get out of the C++ trap.
The future of critical low-level programming is a better C or just plain simple C used in tandem with code generators like F*, deductive program verification like Why3 or just good old mathematics that many are irrationally afraid of, like TLA+.
The complexity that comes from the zero-cost abstractions promoted by both C++ and Rust here isn’t in performance (not runtime nor compiletime) but in programmer reasoning and refactoring.
C++ is indeed the most useful for use-cases where high-performance coincides with heavy OOP needs.
Games fall right into that category. But my hope is that one day people will write games in more secure and slightly less verbose languages, like Rust.
Most other software needs can be fulfilled by wrapping C function calls in Python or some other interpreted language.
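As a minimal illustration of that pattern, the sketch below wraps a single C function with Python's `ctypes`. It assumes a POSIX-like system where the C standard library can be loaded; `strlen` simply stands in for whatever core component would really be written in C, and `c_strlen` is a made-up wrapper name.

```python
import ctypes
import ctypes.util

# Assumption: POSIX-like system. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly:
# size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: bytes) -> int:
    """Thin Python wrapper: callers never see the ctypes plumbing."""
    return libc.strlen(text)
```

Declaring `argtypes` and `restype` up front matters, because without them `ctypes` falls back to default conversions that are wrong for many C signatures.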
I agree with you. I believe that simulators, games, and other software that is a "walled garden", in the sense that it doesn't have a lot of interactions with the external world aside from IO and netcode for multiplayer games (that's a pita though), will usually be better off in a purely functional model. We must either stay single-threaded or adopt purely functional programming or transactions in those kinds of systems. Now, this can be achieved in both C++ and Rust, as those kinds of systems are not "safety critical": C++ has the edge in metaprogramming, while Rust is one step ahead in moving lifetime correctness onto the programmer's turf. It's hard to argue which one is better suited right now, as both, in my opinion, are flawed and have their advantages; it gets evangelical pretty fast. One thing I know is that C++ has the upper hand just because of the sheer number of libraries that exist in the space and its supporters like Autocad, Valve, Epic and other companies that have billion $ infrastructure built on it.
Right now I'm working with fax machines and I have to provide a C library which is consumed by Scala services. Neither C++ nor Rust would have been an advantage as we are familiar with other methods of verification which are more battletested.
I think PHP gets all the clapping when in fact the statistic that the majority of the Internet uses it to power its services is misleading. Let's face it, Wordpress is a great project, it could have been written in Perl and it would still have had the perks of being Wordpress. When people use it, they don't know what happens behind, I think WP is the perfect example of what a good extensible product is and how it can build an insane ecosystem around it. The plugin architecture is there since 2004, it was and still is very ergonomic. Though Wordpress was not unique in that time, phpBB, vBulletin should be mentioned.
There's no point in creating a new VM, given that you can target almost everything by supporting JVM, CLR, BEAM, V8 (JS transpiler) or LLVM/WASM for C-like languages. All you need is an optimized, general-purpose IR that is able to emit bytecode for each one of those, and JS in the case of V8. To be fair, I don't think that a language like Kotlin will be able to accomplish this, it is just too complex. I believe in languages with small cores, like Clojure with its monumental extension power through macros, or Eff or Koka, which "let you define advanced control abstractions, like exceptions, async/await, iterators, parsers, ambient state, or probabilistic programs, as a user library in a typed and composable way", so that they can fit the runtimes. Like, provide the IR, the building blocks and let ecosystems develop. Some will argue that now you are moving the meaning of polyglot from the programming language to the library level, which is true, as seen in the Scala community with the divide between better-Java (Play), a different kind of OOP (Akka), a Haskell-like ecosystem (Typelevel) and an idiomatic Haskell-like ecosystem (ZIO). So you get Java, Erlang and different flavors of Haskell in one language which is Scala :). Many say that Scala is big and messy but in fact the ecosystems spawned by the core of the language made it like that. Scala's spec is smaller than Java's. I would argue that Scala is less complex than Kotlin, but this is highly opinionated.
And even on Android, many open-source projects decide to use Java as they can have access to a wider pool of programmers (Signal, etc). C# is only decreasing in usage, the runtime is just not there, C# has always been known in the enterprise as the language that skipped leg day. It's impressive how many features they've added in a short amount of time that make sense (though some will say, even a few of the Microsoft devs, that async/await was kind of rushed), but the runtime has only received some attention in the most recent times, lagging behind Java by approximately 10 years of research. JS is just JS, hate it or love it, it is here to stay, no point in arguing about this. Put some makeup on it to make things bearable with TypeScript and that's it.
Aside from nullable types, I don't see any other feature that Java could borrow from Kotlin today to improve. Java looks up to Scala and Clojure to get ideas, there are some strong functional programming ecosystems that were bred by these languages, like Typelevel, ZIO, actor-systems like Akka, Datalog queries, contract-based systems like spec2, Stateflow testing, matcher-combinators, generally things that Clojure does differently than the evangelical notion that type-systems are the end all be all. The rest of improvements are on the JVM, like primitive objects (value types) and generic specialization.
What are the innovations brought by Allegro and Lisp Works, relative to Clojure? Just asking as I haven't heard about those, I only have experience with Clj.