
Closed

type propagation of overloads is broken

description

A filter for boolean or number works for literal applications:
var BN : { (b:boolean):boolean; (n:number):number; }
    = (x=>x);

[BN(true)
,BN(1)
,BN("hi") // type error
,BN([]) // type error
];
But things break down when type propagation interacts with overloads:
var f = (x)=>BN(x);
    // type information both imprecise and incorrect
    // inferred as: (x:any)=>boolean
f(true);
f(1); // incorrect return type
f("hi");  // missing type error
This is partially related to the lack of type unions (eg, TS has no precise type to
use for x in f's body), but it should infer an overloaded type for f. Also, there is
some buggy "go with the first overload" behavior involved. The result is seriously broken.
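For reference, the union types that TypeScript gained later (in 1.4, well after this thread) can express the filter directly, though at the cost of a less precise return type; a hypothetical sketch (BN2 and f2 are illustrative names, not from the original example):

```typescript
// Hypothetical union-typed variant of BN; imprecise on the return type
// (it does not record that boolean in gives boolean out), but sound.
function BN2(x: boolean | number): boolean | number {
    return x;
}

var f2 = (x: boolean | number) => BN2(x);

f2(true);
f2(1);
// f2("hi");  // static type error, as desired
```

Note that the union version makes the propagated call sound but still loses the per-overload return precision, which matches the point below that overloads carry more information than a plain union.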

The manifestation of the bug changes slightly when defining a higher-order filter instead:
function BNF<T>(f:(b:boolean)=>T):(b:boolean)=>T;
function BNF<T>(f:(n:number)=>T):(n:number)=>T;
function BNF<T>(f:(bn:any)=>T):(bn:any)=>T {
    return f;
}

var g = BNF<string>(x=>x.toString()); // incorrect type (b:boolean)=>string
g(true);
g(1);   // incorrect type error
related issue: https://typescript.codeplex.com/workitem/1667
Closed Oct 14, 2013 at 7:10 PM by ahejlsberg

comments

RyanCavanaugh wrote Oct 7, 2013 at 6:22 PM

This is all behaving according to spec. Contextual types only apply in the presence of exactly one non-generic overload, and function invocations choose the first applicable signature. Without union types, this is the best we can do.
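Both rules can be seen in a minimal sketch (h is a hypothetical function, not from the issue): an 'any' argument is applicable to every overload, so resolution silently picks the first one.

```typescript
function h(b: boolean): boolean;
function h(n: number): number;
function h(x: any): any { return x; }

var a: any = 1;
var r = h(a);   // 'any' matches every signature; the first applicable
                // overload wins, so r is statically typed boolean
                // even though it holds the number 1 at runtime
```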

** Closed by RyanCavanaugh 10/07/2013 10:22AM

clausreinke wrote Oct 7, 2013 at 10:37 PM

That only means that spec and implementations are consistently broken.

For the function-with-overloads case, the last signature is the one covering the more specific ones preceding it ((bn:any)=>string would be an adequate type for g).

For the var-with-overloaded-interface case, there is no explicitly given signature that can be used, but there are several options that could be inferred well within the TS type language:
  1. f:(x:any)=>boolean // current choice, terrible; wrong in every dimension
  2. f:(x:any)=>any // bad, but still better than 1 (in giving no wrong guarantees)
  3. f:<T>(x:T):T // still not right, in some senses better than 2, but accepts some wrong code without invoking any
  4. f:{ (b:boolean):boolean; (n:number):number; } // the obvious/expected choice
There will still be situations that call for union types or other type system extensions, such as the typing of x within f, but -for the examples given- TS is not doing the best it can.
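Option 4 can at least be written out by hand: restating BN's overloads on f makes the examples check as expected, though this is manual annotation, not the requested inference (a sketch; the overload list is copied from BN, and the parameter annotations are added to keep the sketch self-contained):

```typescript
var BN: { (b: boolean): boolean; (n: number): number; }
    = ((x: any) => x);

// f gets BN's overloads by explicit declaration, not by inference
function f(b: boolean): boolean;
function f(n: number): number;
function f(x: any): any { return BN(x); }

f(true);
f(1);
// f("hi");  // static type error again, unlike the inferred (x:any)=>boolean
```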

Another way of looking at it: overloads are union types, incompletely realized and not correctly propagated.

danquirk wrote Oct 9, 2013 at 10:56 PM

The situation looks like this to me: an ambiguous function call can either be an error or there can be some method for determining which call to resolve to. Given that there will be many instances where function arguments involve the 'any' type there will be many cases where an overload call is ambiguous (as in this case where contextual typing has pushed an 'any' type through an overload). So we must choose some way to resolve a function overload to something concrete if there is at least 1 applicable signature which could be invoked. As you've noted, there are no union types in the type system, and the TypeScript compiler is not in the business of trying to synthesize new signatures as a best guess (whether it be union, intersection, any, or something else) in cases of ambiguity.

Note that if the compiler did synthesize signatures like your suggestion 2 or 3 it is meaningfully altering the contract you explicitly encoded in this function signature. That seems quite bad to me. Incorrectly only allowing 'part' of what you specified seems significantly better to me than incorrectly allowing more than what you specified, as at least in the former case you're guaranteed to have that part handled correctly, unlike the latter case where you've now allowed callers to pass values that will surely result in runtime errors.

So that basically leaves choosing an overload signature that actually exists.

Note that for your function-with-overload case, the signature you're referencing is not actually visible. If you want the implementation signature to be visible you need to add it as a separate overload signature as well.

Regardless, you're asking that in cases of ambiguity the compiler pick the least specific available candidate rather than the most specific. The implications of a change like this lead to plenty of undesirable semantics in situations other than this one.

The type system is most certainly not perfect while trying to model a dynamic world with a static system, and you're seeing some of the seams that are a natural by-product of the trade-offs that requires.

danquirk wrote Oct 9, 2013 at 10:56 PM

** Closed by danquirk 10/09/2013 2:56PM

clausreinke wrote Oct 10, 2013 at 4:24 PM

Thanks for expanding on your reasoning.
> The situation looks like this to me: an ambiguous function call can either be an error or there can be some method for determining which call to resolve to.
I think I see where you are coming from: if you insist on resolving overloads early, you are indeed in all kinds of trouble. Experience with overloading in Haskell shows that in most of these troublesome cases, the solution is simply to delay overload resolution until there is enough information (and to raise an error only if sufficient information is never obtained). That gives a third, in many cases preferred, alternative to your summary: an ambiguous call can be linked to other sites where unambiguous resolution becomes possible.

Much of that experience translates to TypeScript, where there is a single runtime implementation for all static overloads (Haskell implementations can also delay selection of overloaded implementations to runtime, so the situation there is a similar combination of static typing and dynamic selection as it is in TypeScript/JavaScript).
> As you've noted, there are no union types in the type system, and the TypeScript compiler is not in the business of trying to synthesize new signatures as a best guess (whether it be union, intersection, any, or something else) in cases of ambiguity.
Actually, I noted that TypeScript does have union types, in the form of overloaded signatures - it just lacks some type language expressiveness and inference rules to propagate this information to where it is needed. A type system is synthesizing new signatures all the time; there should be no guessing involved.

In the example code, f directly calls BN, so f should have the same overloaded signature as BN has (they share the same implementation). Further typing of applications of f would proceed the same way as typing of BN applications would, with the overload resolution of f linked to the overload resolution of BN.
> Note that if the compiler did synthesize signatures like your suggestion 2 or 3 it is meaningfully altering the contract you explicitly encoded in this function signature. That seems quite bad to me.
For 3 (the generic type), you are correct. For 2 (any=>any), I don't see that.
> Incorrectly only allowing 'part' of what you specified seems significantly better to me than incorrectly allowing more than what you specified, as at least in the former case you're guaranteed to have that part handled correctly, unlike the latter case where you've now allowed callers to pass values that will surely result in runtime errors.
That is not what happens, though: TS infers 1 (any=>boolean), which is simply incorrect when applied to a non-boolean. Whereas my 2 (any=>any) would make no incorrect guarantees.

In fact, the TS overload resolution algorithm is not (or should not be) applicable at that position: x is bound outside the call but not yet instantiated, so you can neither instantiate it yourself to fit an overload nor can you match its instantiation against candidate signatures.

Also, the incompleteness you argue for is not necessary - there is a perfectly valid signature for f (the same as for BN).
> Note that for your function-with-overload case, the signature you're referencing is not actually visible. If you want the implementation signature to be visible you need to add it as a separate overload signature as well.
My idea of how this works: TS verifies the overloads and the implementation code against the implementation signature; TS never needs to show the implementation signature unless the overloads are not sufficient; unlike designed-for-types languages, JS may do things in the implementation that defy inference/unification, so the user-supplied and system-verified implementation signature may be the best available approximation to a most-general unifier; TS needs to keep the implementation signature around unless it can reconstruct a better one from the overloads.
> Regardless, you're asking that in cases of ambiguity the compiler pick the least specific available candidate rather than the most specific. The implications of a change like this lead to plenty of undesirable semantics in situations other than this one.
Using the most general unifier (modulo a suitable theory for advanced type systems) is standard procedure in type inference systems, as far as I know: the idea is to pick the most specific signature that is still general enough to cover all signatures under consideration. Normally, the type system (and language) are designed to ensure that most general unifiers exist. For JS, using the implementation signature (user-specified, system-verified) seems to be the safest approximation.

I would be interested to hear details about those undesirable semantics situations.
> The type system is most certainly not perfect while trying to model a dynamic world with a static system and you're seeing some of the seams that are a natural by-product of the trade-offs that requires.
I am aware that this is a very difficult problem. Even in the designed-for-types Haskell world, the boundaries of what the type system can handle have been refined year after year, now coming up on two decades of never being satisfied with the status quo, with no end in sight.

That is why it is important to report boundary failures, and to keep those issues open until they are resolved - sometimes it takes a while until a suitable improvement is discovered that removes some of the boundary failures. A type system designer's work is never done and ongoing discussion with users is key to type system evolution.


Btw, I have experienced both static and dynamic type systems and my impression has been that static types, combined with Dynamic/typecase for the interface between static and dynamic typing, can cover all use cases. That doesn't mean that type system design suddenly becomes trivial, just that it becomes concerned with the quality of static type information, no longer with edges that cannot be handled at all.

TypeScript's optional typing approach leads to a less tidy handling of the edges (eg, it has become accepted that one can get from static types to any, but never back; also, since code is generated anyway, types get relaxed until the squiggles disappear, instead of getting refined until they express code intention).

I seem to recall experiments with control-flow typing for TS, which most other JS type systems consider necessary anyway (to cope with legacy JS code). But I haven't seen any efforts to support Dynamic/typecase in TS. It is easy enough to encode this for the typeof types, and I can see how to expand it to cover classes other than Object/Function. It would be interesting to see how far one could drive this to cover more of the richness of TypeScript types.
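The typeof encoding mentioned here is essentially what later shipped in TypeScript as control-flow narrowing over union types; a sketch of the typecase idea in later TypeScript (describe is an illustrative name):

```typescript
// Typecase over typeof tags: in each branch the checker narrows
// the static type of x to the matching member of the union.
function describe(x: boolean | number | string): string {
    if (typeof x === "boolean") {
        return x ? "yes" : "no";    // x: boolean here
    } else if (typeof x === "number") {
        return x.toFixed(2);        // x: number here
    } else {
        return x.toLowerCase();     // x: string here
    }
}
```

Inside each branch, tooling then sees the narrowed type, which addresses the "type of x within the body" concern for typeof-distinguishable cases.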

RyanCavanaugh wrote Oct 11, 2013 at 9:07 PM

Please start a forum thread on this. We can't take a wholesale redesign of the type system as a work item and this isn't a good communication format for these kinds of discussions. I can see why this result can be unintuitive, but it's built on some very foundational rules in the type system that can't be easily changed.

As a starting point for that discussion, we'd want to consider the basic result of overload resolution in the presence of multiple matching overloads. I think starting there with a proposed design change would be the most productive. Thanks!

** Closed by RyanCavanaugh 10/11/2013 1:07PM

clausreinke wrote Oct 12, 2013 at 10:50 AM

> Please start a forum thread on this. We can't take a wholesale redesign of the type system as a work item
The discussion might be better elsewhere, but the codeplex forum is no better than the tracker (other than that it might keep you from closing things!-). The work item should remain open until this is resolved, and it is quite possible to split off smaller work items as they emerge (eg, the type system seems to behave differently, depending on which types are involved in the overloads), until the big issue (which might require a redesign) is fixed.

Any redesign would require detail work that won't fit here (or in the forum). However, any such documents and code can be linked from here, making this the entry point. An alternative would be a wiki page linking to work items and other documents, but I don't think I have write permissions on the wiki.

At the moment, I'm still trying to convince you that you have a problem. Until you see that, we cannot begin to discuss whether the problem is foundational or incidental. The TS type system is definitely imprecise, and it definitely has a bug (see below), but the bug may not be systematic.
> I can see why this result can be unintuitive,
It is not unintuitive, it is plain incorrect! Here is a simple concrete example of a runtime type error for TS-typechecked code (tested in playground):
var o : { (b:string):string;
          (n:number):number; }
    = (x=>x);

[o("hi")  // ok
,o(1)   // ok
];

var f = (x)=>o(x);
    // type information both imprecise and incorrect
    // inferred as: (x:any)=>string
f(1);   // ok
f("hi");    // ok
f(true).toLowerCase();  // no static type error!
    // Uncaught TypeError: Object #<error> has no method 'toLowerCase'

clausreinke wrote Oct 12, 2013 at 11:05 AM

Here is a slightly modified example, showing that the TS type system behaves differently if the types in the overloads get more complex. This time, the type of f is the expected any=>any, which is imprecise but not incorrect (it makes no incorrect promise about the return type).
var o : { (sff:(sf:(s:string)=>string)=>(s:string)=>string);
          (n:number):number; }
    = (x=>x);

[o(f=>f)  // ok
,o(1)   // ok
];

var f = (x)=>o(x);
    // type information imprecise 
    // inferred as: (x:any)=>any
console.log( f(1) );    // ok
console.log( f(f=>f)("ho") );   // ok
console.log( f("hi")("ho") );   // not ok
    // Uncaught TypeError: string is not a function

jonturner wrote Oct 14, 2013 at 5:00 PM

Eta conversion is not something handled by the type inference algorithm because in no case do we infer from inside the function body to its arguments. This is a fundamental design choice of the language and will likely not be changed.

The goal of the type inference system is to be fast and helpful, not to be precise in all cases.

ahejlsberg wrote Oct 14, 2013 at 7:10 PM

As Jonathan points out, we never infer types for parameters from their use in function calls. While it might be possible to do so in the examples you list, it quickly devolves into highly non-local inference which is both the power and the bane of some functional languages. The simple axiom in TypeScript is that parameter types are either written explicitly or contextually inferred from the assignment target (i.e. a local inference).

It is true that you can use overloading to represent some union type patterns, but it was never the intent for overloads to be a functionally complete substitute. We may at some point consider union types (they definitely have their uses), but at this point we feel their added complexity just trades one problem for another.

clausreinke wrote Oct 14, 2013 at 10:32 PM

Eta expansion was just the simplest way to demonstrate the issue, which is type propagation (or highly non-local inference) interacting with overloads. The former has been part of functional language type systems for longer than I've been a computer scientist, and the latter since the late 1980s, so it is hard for me to accept a type system as a type system when it doesn't use this ;-)
> the type inference algorithm because in no case do we infer from inside the function body to its arguments
Thanks, that explains our difference of opinion.

I'm really struggling with this design choice - for JS it could be justified because there are hardly any type crashes in JS, only implicit type conversions; but does this really carry over to a language like TypeScript?

Even leaving out the inter-convertible JS base types, type systems for extensible records and qualified types have got me used to the idea that something like x=>x.name tells me that x should have a property name. On the other hand, in JS, that just gives another undefined otherwise (or runtime type error if it happens to be x=>x.name()). On the gripping hand, TypeScript tries to do better than JS, and this looks like an obvious improvement.

The wish to stay away from full union types is easier to understand - I'm just afraid that half implementing them makes for an unbalanced and inconsistent type system (eg, you can't express the type of x within o's body, so intellisense won't work there, etc.).

Anyway, those are your decisions.

That leaves the bug: TS should not claim type string for f(true) when the type is boolean.
And the inconsistency: why select the first overload in the simple case, but any=>any in the complex case?
And the loss of subject reduction: function call unfolding can make type-correct code type-incorrect.
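The subject-reduction point in a minimal sketch, restating BN from the description (direct and eta2 are illustrative names; the parameter annotations are added to keep the sketch self-contained):

```typescript
var BN: { (b: boolean): boolean; (n: number): number; }
    = ((x: any) => x);

var direct = BN;              // keeps the overloaded type
var eta2 = (x: any) => BN(x); // eta-expanded: typed (x:any)=>boolean

// direct("hi");  // static type error, as it should be
eta2("hi");       // accepted, although it is the same computation
```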

I don't know what is more worrying: the number of dark corners and asymmetries, even at this early stage in TS' type system evolution, or the idea that this is "by design".

But I find these close/open wars too tiresome to continue.