
Congrats! Being able to run a nice company bootstrapped seems amazing.

Turning 10, you might want to stop dissing WordPress for being 15 on your homepage though ;)

   Your customers demand blazing-fast digital products, web standards are evolving at the speed of light, yet you rely on 15-years-old solutions like WordPress that force you to deliver heavy, low-quality user experiences. 
After all, you'll be there in only 5 years!


Ahaha, true. On the other hand, WordPress is more 20 than 15 now :)


`Yet you rely on ${new Date().getFullYear() - 2005}-years-old solutions like Wordpress`


From the readme:

  VSCode Extension
    1. Open VSCode
    2. Go to Extensions (Ctrl+Shift+X)
    3. Search for "mbake Makefile Formatter"
    4. Click Install


Thanks! Apologies, I was on mobile and missed this. I'm excited to try it out.


I often reach for jq to understand which Unicode characters are in a string, e.g.:

  [wl-paste | xclip -o | pbpaste] | jq -R --ascii-output

It doesn't provide any per-character explanation, but it is local and I already have jq installed.
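
For example, pasting "café" through it looks something like this (the explicit "." filter is spelled out for jq builds that require a filter argument):

  $ printf 'café' | jq -R --ascii-output .
  "caf\u00e9"

The \u00e9 escape makes otherwise-invisible or lookalike characters easy to spot.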


AWS has good base building blocks (ALB, EC2, Fargate, RDS, IAM, etc.). But it takes knowledge to put the pieces together. Thus AWS tries to create services/tools that orchestrate the base blocks for you (Amplify, Beanstalk), which in my experience always becomes a mess where you don't actually understand what you are running in your cloud setup.

I'd recommend either learning the basic building blocks (these skills also transfer well to other clouds and to self-hosting) or using a higher-level service provider than AWS (Vercel etc.) - they do it better than AWS.


For AWS, the solution for container deployments (without dealing with VMs) is Fargate, which imo works reasonably well.


I believe I was actually trying to use that. It’s been a few years so my memory is hazy, but isn’t Fargate just a special case of ECS where they handle the host machines for you?

In any case, the problem wasn’t so much ECS or Fargate, beyond the complexity of their UI and config, but rather that CloudWatch was flaky. The problem that prevented the deployment was on my end, some issue preventing the health check from succeeding or something like that, so the container never came up healthy when deployed (it worked locally). The issue is that AWS didn’t help me figure out what the problem was: CloudWatch didn’t show any logs about 80% of the time. I literally clicked deploy, waited for the deploy to fail, refreshed CloudWatch, saw no logs, clicked deploy again, and repeated until logs showed up. It took about five attempts to see logs, every single time I made a change (it wasn’t clear the error was on my end, so it was quite a frustrating process).

On DigitalOcean, the logs were shown correctly every single time, and I was able to determine the problem was on my end after a few attempts, add the required extra logging to track it down, fix it, and get a working deployment in under ten minutes.


+1, but I'm not sure if the "simple is robust" saying is straightforward enough? It opens up a discussion about what "simple" means and how it applies to the system (which apparently is a complex enough question to warrant the attention of the brilliant Rich Hickey).

Maybe "dumb is robust" or "straightforward is robust" capture the sentiment better?


Copy/paste is robust?

As a biomedical engineer who primarily writes software, it’s fun to consider analogies with evolution.

Copy/pasting and tweaking boilerplate is like protein-coding DNA that was copied and mutated in our evolutionary history.

Dealing with messy edge cases at a higher level is like alternative splicing of mRNA.


The usual metric is complexity, but that can be hard to measure in every instance.

Used within a team setting, what is simple is entirely subjective, relative to that team's set of experiences.

Example: Redis is dead simple, but it's also an additional service. Depending on the team, the problem, and the scale, it might be best to use your existing RDBMS. A different set of circumstances may make Redis the best choice.

Note: I love "dumb is robust," as it ties simplicity and straightforwardness together, but I'm concerned it may carry an unnecessarily negative connotation to both the problems and the team.

Simple isn't necessarily dumb.


Dull?


Indeed, "simple" is not a good word to qualify something technical. I have a colleague, and when he comes up with something new and simple, it usually takes me down a rabbit hole of mind-bending and head-shaking. A matter of personal perspective?


Is my code simple if all it does is call one function (that's 50k lines long) hidden away in a dependency?

You can keep twisting this question until you realize that without the behemoths of complexity that are modern operating systems (let alone CPUs), we wouldn't be able to afford the privilege to write "simple" code. And that no code is ever "simple", and if it is it just means that you're sitting on an adequate abstraction layer.

So we're back at square one. Abstraction is how you simplify things. Programming languages themselves are abstractions. Everything in this discipline is an abstraction over binary logic. If you end up with a mess of spaghetti, you simply chose the wrong abstractions, which led to counter-productive usage patterns.

My goal as someone who writes library code is to produce a framework that's simple to use for the end user (another developer). That means I'm hiding TONS of complexity within the walls of the infrastructure. But the result is simple-looking code.

Think about DI in C#: it's all done via reflection. Is that simple? It depends on who you ask - the user, or the library maintainer who needs to parameterize an untyped generic with 5 different type arguments?
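
To make the two perspectives concrete, here is a toy sketch (in Java, for consistency with the rest of the thread) of a hypothetical mini-container - not any real framework's API:

  import java.lang.reflect.Constructor;
  import java.util.HashMap;
  import java.util.Map;

  class TinyContainer {
    private final Map<Class<?>, Object> cache = new HashMap<>();

    // The user-facing side is one "simple" call...
    public <T> T get(Class<T> type) throws ReflectiveOperationException {
      Object cached = cache.get(type);
      if (cached != null) return type.cast(cached);

      // ...while the maintainer's side is reflection and recursion.
      Constructor<?> ctor = type.getConstructors()[0]; // assumes one public ctor
      Class<?>[] deps = ctor.getParameterTypes();
      Object[] args = new Object[deps.length];
      for (int i = 0; i < deps.length; i++) {
        args[i] = get(deps[i]); // recursively resolve each constructor dependency
      }
      Object instance = ctor.newInstance(args);
      cache.put(type, instance);
      return type.cast(instance);
    }
  }

The call site stays a one-liner, container.get(UserService.class) (UserService being hypothetical); whether that counts as "simple" depends on which side of the wall you sit on.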

Obviously, when all one does is write business logic, these considerations fall short. There's no point in writing elegant, modular, simple code if there's no one downstream to use it. Might as well just focus on ease of readability and maintainability at that point, while you wait for the project to become legacy and die. But that's just one particular case where you're essentially an end user from the perspective of everyone who wrote the code you're depending on.


I like the house analogy, but I like to think of it as if the people building the house did not know how it was supposed to look (or function). This is mostly true, since very few developers know exactly how the end result (product/service) should look and function when they start coding.

e.g. "We did not know where to put the piping at the start, so we put it on the outside and now installing a new restroom is sort of tricky."


This is why nobody can decide if computer science is actually science, engineering, or art. It's such a vast industry that it's clearly all three, depending on what you're doing.


I think everybody agrees it is a craft. Like woodworking - it is part engineering, part art, and a lot of experience.


> I'm currently building a skyscraper on the foundations of a bikeshed.

Not sure if you are joking or not, but I often hear similar things and I believe it misses the point. What constitutes a good foundation in software is very subjective - and just saying "foundation bad" does not help a non-technical person understand _why_ it is bad.

It's better to point at that one small rock (some ancient Perl script that no one understands anymore) which holds up the entire thing. Which might be fine, until someone needs to move that rock. Or something surrounding it.


I like this thinking because it's a true reflection of how things work. I strongly doubt any housebuilder goes back to the architect and says "can't do that, foundations bad." They'd explain what the problem is: maybe the design is rated to a certain weight/height, or something in the ground composition prevents the requested changes.

We should do the same in software engineering. What exactly in our design (e.g. that Perl script that's running half the operation that we need to investigate) is stopping us?


xkcd: Dependency

https://xkcd.com/2347/


From the code samples it's hard to tell whether or not this has to do with de-serialization though. It would have been fun to see profiling results for tests such as these.


Author here, I'm away from my computer atm, but I can cook up a repo with each test in a few hours when I get home.

I designed the tests as a drag race because that mimics my real world usage.


That's nice - I'd encourage you to play around with attaching e.g. JMC [1] to the process to better understand why things are as they are.
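
If it helps, a recording can also be captured headlessly by starting the JVM with the built-in flag (JDK 11+) and opening the resulting file in JMC - the jar name here is just a placeholder:

  java -XX:StartFlightRecording=duration=60s,filename=profile.jfr -jar bench.jar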

I tried recreating your DataInputStream + BufferedInputStream version (wrote the 1brc data to separate output files, read using your code - I had to guess at the ResultsObserver implementation though). On my machine it ran in roughly the same time frame as yours - ~1 min.

According to Flight Recorder:

  - ~49% of the time is spent in reading the strings (city names). Almost all of it in the DataInputStream.readUTF/readFully methods.
  - ~5% of the time is spent reading temperature (readShort)
  - ~41% of the time is spent doing hashmap look-ups for computeIfAbsent()
  - About 50GB of memory is allocated - 99.9% of it for Strings (and the wrapped byte[] array in them). This likely causes quite a bit of GC pressure.
Hash-map lookups are not de-serialization, yet the lookup likely affected the benchmarks quite a bit. The rest of the time is mostly spent in reading and allocating strings. I would guess that that is true for some of the other implementations in the original post as well.

[1] https://github.com/openjdk/jmc

edit: better link to JMC


JMC is indeed a valuable tool, though what you see in any Java profiler is to be taken with a grain of salt. The string parsing and hash lookups are present in most of the implementations, yet some of them are up to 10 times faster than the DataInputStream + BufferedInputStream code.

It doesn't seem like it can be true that 90% of the time is spent in string parsing and hash lookups if the same operations take 10% of the time when reading from a FileChannel and ByteBuffer.


Aren't the versions that take 10% of the time only reading each city name once, and then doing an array lookup rather than a hashmap lookup?


Nope, see for example "Custom 1":

  var buffer = ByteBuffer.allocate(4096);
  try (var fc = (FileChannel) Files.newByteChannel(tempFile, 
                        StandardOpenOption.READ)) 
  {

    buffer.flip();
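    // the flip above leaves remaining() == 0, so the first loop iteration triggers a read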

    for (int i = 0; i < records; i++) {

        if (buffer.remaining() < 32) {
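            // fewer bytes left than one record could need: shift leftovers and refill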
            buffer.compact();
            fc.read(buffer);
            buffer.flip();
        }

        int len = buffer.get();
        byte[] cityBytes = new byte[len];
        buffer.get(cityBytes);
        String city = new String(cityBytes);
        int temperature = buffer.getShort();

        stats.computeIfAbsent(city, k -> new ResultsObserver())
             .observe(temperature / 100.);
    }
  }


My bad - I got confused, as the original DIS+BIS took ~60s on my machine. I reproduced the Custom 1 implementation locally (before seeing your repo) and it took ~48s on the same machine. JFR (which you honestly can trust most of the time) says that the HashMap lookup is now ~50% of the time, with the String constructor call at ~35%.


JFR only samples running Java methods.

I would guess at least some of the bottlenecks are in hardware, the operating system, or native code (including the JVM) in this case.


Hi,

Please add https://github.com/apache/fury to the benchmark. It claims to be a drop-in replacement for the built-in serialization mechanism, so it should be easy to try.
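
Judging from Fury's README, the Java usage is roughly as follows - the CityTemp payload class is made up for illustration, and the package names have changed as the project moved to Apache, so treat this as a sketch rather than gospel:

  import org.apache.fury.Fury;
  import org.apache.fury.config.Language;

  public class FuryDemo {
    // Hypothetical payload, standing in for whatever the benchmark serializes
    public static class CityTemp {
      public String city;
      public short temperature;
      public CityTemp() {}
      public CityTemp(String city, short temperature) {
        this.city = city;
        this.temperature = temperature;
      }
    }

    public static void main(String[] args) {
      Fury fury = Fury.builder()
          .withLanguage(Language.JAVA)
          .requireClassRegistration(true)
          .build();
      fury.register(CityTemp.class); // registration avoids writing class names

      byte[] bytes = fury.serialize(new CityTemp("Stockholm", (short) 1234));
      CityTemp back = (CityTemp) fury.deserialize(bytes);
      System.out.println(back.city + " " + back.temperature);
    }
  }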


Will do!


Hi! Crate author here, happy to see my old project getting mentioned. Let me know if you have any questions!


Do you actively use this function in any projects? What was your inspiration to write the crate?


I don't actively use it, unfortunately. The main inspiration was to sort inputs faster for the https://github.com/BurntSushi/fst crate, which I in turn used to try to build a search library.

