> he system can't recover from an error in an individual flight plan, bringing t...

akira2501 · on Nov 18, 2024

> obscuring this flight's waypoint problem have resulted in a potentially conflicting flight not being tracked among other flights?

Flights are tracked by radar and by transponder. The appropriate thing to do is just flag the flight with a discontinuity error but otherwise operate normally. This happens with other statuses like "radio failure" or "emergency aircraft."

It's not something you'd see on a commercial flight, but on a private IFR flight (one with a flight plan), you can actually cancel your IFR plan mid-flight and revert to VFR (visual flight rules) instead.

Some flights take off without an IFR clearance as a VFR flight, but once airborne, they call up ATC and request an IFR clearance already en route.

The system is vouchsafing where it does not need to.

cryptonector · on Nov 19, 2024

The appropriate thing to do was to reject the flight plan (remember, the flight plan is processed before the flight starts, and anyways there were hours over the U.S. in which the flight could have been diverted if manual resolution was not possible), not to let the flight continue with the apparent discontinuity, nor to shut down the whole system.

CPLX · on Nov 19, 2024

> flight plan is processed before the flight starts

Not necessarily. Also they regularly change while the flight is in the air.

CPLX · on Nov 19, 2024

This isn’t quite correct. The oceanic tracks do not have radar coverage and aren’t actively controlled.

There are other fail-safe methods, of course, all the way up to TCAS, but it’s not great for an oceanic flight to be outside of the system.

martinald · on Nov 18, 2024

Yes, I agree. From what I understand, the system crashed not because of the duplicate code but because it had the plane time travelling, which suggests very serious corruption.

kevin_thibedeau · on Nov 18, 2024

Waves hand... This is not the SQL injection you're looking for. It's just a serious corruption.

outworlder · on Nov 18, 2024

> From the system's POV maybe this is the right way to resolve the problem. Could masking the failure by obscuring this flight's waypoint problem have resulted in a potentially conflicting flight not being tracked among other flights? If so, maybe it's truly urgent enough to bring down the system and force the humans to resolve the discrepancy.

Flagging the error is absolutely the right way to go. It should have rejected the flight plan, however. There could be issues if the flight was allowed to proceed and you now have an aircraft you didn't expect showing up.

Crashing is not the way to handle it.

aftbit · on Nov 18, 2024

It seems fundamentally unreasonable for the flight processing system to entirely shut itself down just because it detected that one flight plan had corrupt data. Some degree of robustness should be expected from this system IMO.

mannykannot · on Nov 18, 2024

It does not seem reasonable when you put it like that, but when could it be said with confidence that it affected just one flight plan? I get the impression that it is only in hindsight that this could be seen to be so. On the face of it, this was just an ordinary transatlantic flight like thousands of others, with no reason to think there was anything unusual about it to make it more vulnerable than the rest - and really, there was not, it just had an unlucky combination of parameters.

In general, the point where a problem first becomes apparent is not a guideline to its scope.

Air traffic control is inherently a coordination problem dependent on common data, rules and procedures, which would seem to limit the degree to which subsystems can be siloed. Multiple implementations would not have helped in this case, either.

cryptonector · on Nov 19, 2024

Shutting down a flight control system might have other knock-on effects on flight safety. Even if it only grounded flights not yet in the air, the resulting confusion might lead to manual mistakes and/or subsequent air lane congestion that might cause collisions.

mannykannot · on Nov 19, 2024

Every option had risks associated with it, and they are hard to assess until you know how deep the problem goes.

MBCook · on Nov 18, 2024

I think you’re on the right track; I assume it’s safety.

If one bad flight plan came in, what are the chances other unnoticed errors may be getting through?

Given the huge danger involved with being wrong, shutting down with a “stuff doesn’t add up, no confidence in safe operation” error may be the best approach.

HeyLaughingBoy · on Nov 18, 2024

It depends on what the potential outcomes are.

I've worked on a (medical, not aviation) system where we tried as much as possible to recover from subsystem failures or at least gracefully reduce functionality until it was safe to shut everything down.

However, there were certain classes of failure where the safest course of action was to shut the entire system down immediately. This was generally the case where continuing to run could have made matters worse, putting patient safety at risk. I suspect that the designers of this system ran into the same problem.
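The split described above, between failures you degrade around and failures you stop for, could be sketched roughly like this. It is only an illustration: `Severity`, `Controller`, and the subsystem names are all hypothetical, not taken from any real medical or aviation system.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical failure classes for this sketch."""
    RECOVERABLE = 1  # skip or retry the affected item
    DEGRADED = 2     # turn off one subsystem, keep the rest running
    UNSAFE = 3       # continuing could make matters worse


class Controller:
    """Toy supervisor that tracks which subsystems are still active."""

    def __init__(self, subsystems):
        self.active = set(subsystems)
        self.running = True

    def handle_failure(self, subsystem, severity):
        if severity is Severity.UNSAFE:
            # The one class where stopping everything is the safest choice.
            self.active.clear()
            self.running = False
            return "shutdown"
        if severity is Severity.DEGRADED:
            # Gracefully reduce functionality instead of stopping.
            self.active.discard(subsystem)
            return "degraded"
        # RECOVERABLE: nothing is disabled; the caller retries or skips.
        return "continue"
```

The hard part, which the sketch leaves out, is deciding which class a given failure belongs to; that is the judgment call being debated in this thread.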

cryptonector · on Nov 19, 2024

There is no need to shut down the whole system just because of one flight plan that the system was able to reject. Canceling (or forcing manual updates to) one flight plan is a lot better than canceling 1,500 flights.
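A minimal sketch of that kind of per-plan isolation, assuming a made-up plan format and a toy validation rule (the duplicate-waypoint check here is a stand-in for whatever the real system checks, and every name in the code is hypothetical):

```python
from dataclasses import dataclass, field


class FlightPlanError(Exception):
    """Raised when one flight plan cannot be processed (hypothetical)."""


@dataclass
class Result:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (plan_id, reason) pairs


def extract_route(plan):
    """Toy validator: treat a repeated waypoint name as ambiguous."""
    waypoints = plan["waypoints"]
    if len(set(waypoints)) != len(waypoints):
        raise FlightPlanError("ambiguous duplicate waypoint")
    return waypoints


def process_plans(plans):
    """Process each plan independently; a bad plan is rejected, not fatal."""
    result = Result()
    for plan in plans:
        try:
            extract_route(plan)
        except FlightPlanError as exc:
            # Queue this one plan for manual handling and keep going.
            result.rejected.append((plan["id"], str(exc)))
            continue
        result.accepted.append(plan["id"])
    return result
```

Only `FlightPlanError` is caught, so an unexpected exception would still propagate. Whether that case should halt the system is the open question raised elsewhere in the thread.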