More

pedrokost · on Jan 19, 2024

We recently completed the migration of a self-hosted 3 TB PostgreSQL database from version 12 to 16, transitioning from Ubuntu 18 to Ubuntu 22. Concurrently, we had to upgrade various extensions, most notably Timescale, for which a compatible version did not exist across all scenarios. We performed the upgrade by updating a replica in the following sequence:

- Start: PG12, Ubuntu 18, TS2.9

- Step 1: Set up a read-only replica with PG12 on Ubuntu 22, maintaining TS2.9.

- Step 1.5: Enter maintenance mode and halt all services.

- Step 2: Detach the the read-only replica, upgrading from PG12 to PG15 on Ubuntu 22 with TS2.9.

- Step 3: Upgrade from PG15 with TS2.9 to TS2.13 on Ubuntu 22.

- Step 4: Upgrade from PG15 to PG16 on Ubuntu 22 with TS2.13.

- Step 4.5 : Reconnect services to the new database server, resume all services, and exit maintenance mode.

All the database upgrade steps were well-tested and automated using Ansible. Nonetheless, we did encounter an issue that had not arisen during testing. This extended our downtime to approximately half an hour, which, for our use case, was perfectly acceptable.

Employing logical replication could have mitigated the last-minute surprise. So we will consider this approach for our next upgrade cycle.

pbrooko · on Jan 19, 2024

We recently followed an almost identical upgrade path (however at the time PG16 wasn't yet supported by Timescale, so stopped at PG15 + TS 2.12).

We did look into using logical replication to reduce the downtime of the upgrade, but because database schema and DDL commands aren't replicated, it seems that it isn't recommended with Timescale in the loop.. (I suppose the underlying schema changes that Timescale needs to make under the hood are mostly a function of your hypertable chunk sizing and what your incoming writes look like, so this could be planned around / timed well, but we felt it added too much potential complexity & risk compared to simply opting for a small maintenance window while pg_upgrade completed).

pedrokost · on April 3, 2023

Sentinel Marine solutions | Backend Engineer (Python) | HYBRID (Ljubljana, Slovenia) | Full-Time

We are looking for a Backend developer with a 5+ years of relevant experience. Our stack is Python, Django, PostgreSQL, TimescaleDB, RabbitMQ. We use Docker, Ansible, and more.

Sentinel is the maker of the "App for boats". We help make the ownership experience of the boat easier with telematics, by keeping you connected to your boat, giving you at-a-glace overview of vital boat equipment, and helping you with boat maintenance with reminders. We also connect the boat owner with the dealer and the broader ecosystem of partners relevant for boat ownership. Our products are a keystone piece of equipment on boats from the largest boat builders worldwide. We are still a small team (20 people), and are making waves in the boating industry. Your contributions will have a huge and direct on the boating experience of current and new boats.

This is a position requiring at least partial onsite presence, so it's local to Ljubljana and nearby areas. This is to facilitate working with hardware/firmware engineers when necessary, and mentoring less experienced developers.

To apply, email: jobs {at} sentinelmarine.net

pedrokost · on Aug 29, 2021

We are experiencing very high CPU load caused by tinc [0], which we use to ensure all communication between cloud VMs is encrypted. This is primarily affecting the highest traffic VMs, including the one hosting the master DB.

I am starting to consider alternative tools such us wireguard to reduce load, but I am concerned of adding too much complexity. Tinc's mesh network makes setup and maintenance easy. The wireguard ecosystem seems to be growing very quickly, and it's possible to find tools that aim to simplify its deployment, but it's hard to see which of these tools are here to stay, and which will be replaced in a few months.

What is the best practice, in 2021, to ensure all communication between cloud VMs (even in a private network) is encrypted?

[0]https://www.tinc-vpn.org/

PhilippGille · on Aug 29, 2021

Apart from some smaller projects building on top of WireGuard, there's Tailscale [1]. One of the founders is Brad Fitzpatrick who worked on the Go team at Google before and built memcached and perkeep in the past.

Outside of the WireGuard ecosystem there's ZeroTier [2] which has been around for a while and they're working on a new version; and Nebula [3] from Slack, which is likely to be maintained as long as Slack uses it.

There might be others, but with tinc these four are the ones I've seen referred to most often.

[1] https://tailscale.com

[2] https://www.zerotier.com

[3] https://github.com/slackhq/nebula

iyn · on Aug 29, 2021

+1 for Tailscale, the product is great. I've used it in a very limited scale but can vouch for quality and performance. No CPU issues at all (even on rPi).

qchris · on Aug 29, 2021

Similar to Tailscale is the Innernet project, which has similar goals but is fully open source (also built on Wireguard). I've heard that set-up is a bit more painful, but for those who are interested in FOSS or self-hosting, it might be worth looking into.

[1] https://github.com/tonarino/innernet

ignoramous · on Aug 29, 2021

NoCode: fly.io with its 6pn (out-of-the-box private networking among clusters in the same org).

DIY: envoyproxy.io / HashiCorp Consul for app-space private networking over public interfaces.

LowCode: Mesh P2P VPN network among your clusters with FOSS/SaaS like WireTrustee / tailscale.io / Slack Nebula.

udev · on Aug 29, 2021

What kind of loads are we talking about here? How many requests per seconds? Or is each request response large?

Have you noticed whether it is worse for lots of small requests vs large data transfers?

I use a very similar setup, but haven't seen tinc CPU usage matter yet, though for very low traffic.

pedrokost · on Nov 12, 2020

The author has gained a significant amount of Postgres-related knowledge when performing this migration.

Taking this into consideration, is it still worth using the managed Aurora, instead of the EC2 self-managed instance?

sciguymcq · on Nov 12, 2020

I currently manage, or am "lead dev", for around a half dozen to 10 apps using Postgres as the database. For two of those I use a managed database service (AWS RDS / AWS Aurora Postgres) because they drive mission critical high value products / services and I wouldn't consider using a standalone install of Postgres. For other apps that are not so mission critical I am perfectly happy saving considerable money and running them on a VPS on AWS EC2 or Digital Ocean or Linode. So ultimately it depends on the app and the risk tolerance for Recovery Time Objective and Recovery Point Objective.

rohansahai · on Nov 12, 2020

That's super interesting! Have heard people do the opposite for performance reasons (run on bare EC2) and storage cost reason. Do you notice a big performance difference between the EC2 apps and RDS apps? Also have you had any EBS volume related issues on RDS or unexplained latency/downtime ?

fastball · on Nov 12, 2020

Worth noting that DO also has a hosted Postgres service.

takeda · on Nov 13, 2020

If you have a CM infrastructure set up, you're not gaining much. Tools like repmgr/barman (in public cloud you probably would use WAL-E/WAL-G) provide nearly all of the benefits.

PostgreSQL is generally low maintenance, the only cases when this might not be true is when you're hitting performance bottlenecks and need to tune your database, optimize your tables, or use bigger instance etc, but RDS doesn't save you from that.

jrochkind1 · on Nov 12, 2020

RDS Postgres (actual postgres, not the Aurora product which can be "mostly postgres compatible") is another option, not just "EC2 self-managed instance".

I guess we'd have to look at price comparisons for the size you actually need. I think depending on load, the price differences may be nominal.

mijoharas · on Nov 12, 2020

Thought I'd weigh in here, since we moved from heroku to ec2 to aurora to RDS postgres, so I can probably speak to this a little more.

* ec2 self managed is easily the cheapest, we had a solid setup, with continuous backups and a read replica, if cost is a factor, it's easily a winner. However, there is a _lot_ of knowledge that goes with it. When it comes down to it, you can pay someone else to handle that. This isn't just the setup cost, you need to factor in ongoing maintenance (the number of people at my company that could have done complicated things with the instance was probably me and another engineer, and we didn't want to be permanently on call for this) and general risk.

* EC2 did probably work out to between 1/2 to 3/4 (probably 2/3) the price of the equivalent RDS (tough to say exactly, as I'd need to factor in all the ancillary costs that are more "bundled together" in RDS)

* RDS was much cheaper than aurora for our workload. (almost 1/2 the price).

I think the main thing to remember, is that ec2 is much cheaper than RDS, but RDS is cheaper than engineers. With that, while we didn't need to do anything complicated other than the migration, the risk and possible engineering time and bus factor didn't feel worth it to stay on ec2.

craigkerstiens · on Nov 13, 2020

We've had a few people migrate from Heroku and RDS over to Crunchy Bridge and see 2.5x performance improvement on warm cache and up to 5-6x performance improvement on cold cache for the exact same $ spend. I can't say easily how that would compare to raw EC2 cause it really depends on how you tune and configure. But there is definitely some opportunity to optimize vs. any stock install and setup.

The real thing about EC2 is it'll be cheaper. But when you do have a bad day, say page corruption you're gonna have more than a bad day and a bad week/month trying to untangle that. I'm not sure how helpful RDS is in those cases. I can say that Heroku historically was solid (I was there way back in the day having helped build the service). And there are some other options that can still deliver good performance, good support, at a balanced price.

rohansahai · on Nov 12, 2020

Been on RDS for awhile now but have recently had some issues that are pushing us to heavily consider EC2 - especially since we've heard about considerable performance gains running on some of the bare metal EC2 options out there vs RDS.

mijoharas · on Nov 12, 2020

We had equivalent performance (and we'd benchmarked both) for equivalent RDS and EC2 instances. (EDIT: to note, we didn't dig in heavily, just looked and saw it seemed the same, maybe there's something smart you can do to make EC2 a lot faster)

First thing to think about for RDS (or any postgres instance really) is to figure out what the bottlenecks are, if your cache hit ratio is slightly low, or queries are generally feeling a little sluggish but cpu and ram aren't too high, I'd recommend looking into increasing the provisioned IOPS of the machine (requires no downtime, basically you can just make the storage faster).

If your issues are ram/cpu being high you might want to consider moving to a larger instance.

It can also be worth looking into the number of temp files being created by your queries and tweak your work_mem setting (you can use log_temp_files to see if this is likely to help a bit).

Other than that, remember that every connection to the db is a separate process, so you don't want too many. It could be worth looking at RDS proxy or pg_bouncer if you think you have a lot of connections to your db.

pedrokost · on Oct 25, 2020

This sounds great. Instead of trusting each company to implement their own billing system with some payment processor, just have the payment processors and banks provide "apps" that support these protocols, and users can chose which one to use to complete their payment.

One interesting follow up question in who covers the processing fee (e.g. the ~3% or so that Braintree or Paypal currently take)? Today, processing fees are mostly hidden to the user. If the user can chose their own payment processor, will they be required to cover the processing fee instead of the merchant?

teddyh · on Oct 25, 2020

> Instead of trusting each company to implement their own billing system with some payment processor, just have the payment processors and banks provide "apps" that support these protocols

To me, this sounds dangerously close to the argument for the W3C standardization of DRM in the browser. I remain wary.

pedrokost · on March 30, 2020

Chrome has this built-in as a flag: chrome://flags/#enable-force-dark

It will render all web pages with dark mode

galacticaactual · on March 30, 2020

Cool, didn't know that. The extension allows for whitelisting, dynamic toggling, etc. FYI.

pedrokost · on Feb 27, 2020

I have been observing something similar recently, but related to other technologies. People claiming they are good at Excel, but can't do a vlookup or a pivot table. People using 5% of the power of an IDE. People not realizing that if they are able to use a web-based tool at work, they could (probably) also access it remotely.

And these are just the things I have noticed in the past week.

We utilize just a tiny percentage of options that are available to us, barely enough to get the basics of out tasks done. But we are completely oblivious that there are probably better/hidden/power tools available in the software we are already using that could help us achieve our tasks much more efficiently.

Interestingly, most people probably don't care about it, or are so oblivious of their lack of knowledge that they don't even search for better ways to do things.

It's hard to blame thought. Improving this state requires a mentality of continuous active learning, where you don't just wait for someone to show you how to do your task better, but to constantly expand your knowledge into areas you don't even think you need to improve. However, most of us usually have "better things to do" than reading software manuals.

Smoosh · on Feb 27, 2020

I think part of this was that in the past, computers and software were new and novel. People had no comparative experience, so training had to be provided to fully explain what you could do & how you could do it.

Compare to today where most people "learn" by immersion - you have to use Google Docs or Excel for your school work, so you adsorb just enough to get your work done without really understanding the fundamentals or the more complex or non-obvious features, since you weren't taught systematically or comprehensively.

Then you "know" Excel, or Word well enough to get by - but don't know that you are barely scratching the surface of it's capabilities, or don't realize that being more competent with the tool will make you more capable and productive. And thus the motivation/opportunity isn't there to invest that time and effort.

_8ljf · on Feb 28, 2020

The obvious observation about Excel: spreadsheets are a highly specialized expert tool for highly specialized expert users (accountants).

Most Excel users aren’t; and most don’t even use it for its designed purpose, i.e. user-programmable parallel calculations, but rather for ad-hoc ersatz databases.

Of course, said users are rarely any better at designing and using databases than they are at bookkeeping.

pedrokost · on Oct 11, 2019

2 months ago I discovered Kepler[0] a tool by Uber for visualizing geo data. It has been of tremendous help when analyzing large geospatial datasets. It allows me to quickly visualize the results of my computations, and spot anomalies and patterns that I wouldn't have noted without it.

This seems to be a similar tool, but for charting non-geographic datasets. It should also be extremely useful.

I am glad companies are making these tools available for the masses.

[0] https://kepler.gl/

noen · on Oct 11, 2019

Both Kepler and Sanddance are built on top of another Uber creation called https://deck.gl

pedrokost · on May 11, 2019

The value Twilio's offering is when you need coverage also outside Europe, in remote countries. Yes, you can get cheaper data plans to cover the EU, but those often come with very high data rates in other areas.

ohashi · on May 11, 2019

I get $10/GB on Google Fi and can have 9 data only sim cards on my plan for $20/month.

pedrokost · on Jan 6, 2019

I have just completed a first project with OpenShot. It worked perfectly fine initially, but as the movie became more complex, everything took much longer. Saving took several seconds, sometimes freezing everything for several minutes. Dragging multiple clips in the timeline would freeze the editor for several seconds. It crashed multiple times. Just to be clear, this was from having about 30 original clips, but cut multiple times (each probably at least 10 times), so over 300 individual clips in the timeline. The bottleneck seems to be related to parsing of the JSON, which is the format in which the data is encoded internally. The final file was about 1.1MB of JSON. I've also encountered strange artifacts when adding a new layer at the bottom of the screen - I couldn't drag clips onto it, they were placed onto another layer instead.

This is on a laptop with an i7 and 12 GB of RAM.

Compared to my experience of using Premiere Pro several years ago on Windows, with just 4GBs of RAM, OpenShot has a long way to go even for basic editing.

But, having said that (and considering that Premier had years of paid development), it's a good tool and it's great that some developers are trying to built video editing tools for Linux. Things are slowly improving.

app4soft · on Jan 6, 2019

> This is on a laptop with an i7 and 12 GB of RAM.

> Compared to my experience of using Premiere Pro several years ago on Windows, with just 4GBs of RAM, OpenShot has a long way to go even for basic editing.

Can you try Olive and compare it with all above?

As for me, I can fully use Olive under Linux (Debian 9.x) on my 10-year-old notebook with just 2GBs of RAM ;)