Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporating GATlab more deeply #949

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from
Draft

Incorporating GATlab more deeply #949

wants to merge 9 commits into from

Conversation

kris-brown
Copy link
Contributor

@kris-brown kris-brown commented Oct 24, 2024

Summary

This is a big refactor - right now just a draft. The goal is to rely less on type dispatch and use explicit models from GATlab. This will ultimately address a number of persistent problems we've seen in Catlab over the years, including issues with empty lists, ACSets vs VarACSets, DenseACSets vs SparseACSets, C-Set vs C-Par. It will still be convenient at times to use a Julia abstract_type to handle informal interfaces, but for things like different implementations of FinSets, different implementations of FinCats, etc. we will use a data structure with a impl field and a model field which controls how the implementation satisfies the desired interface.

The goal is for only minor breaking changes, keeping the user-experience as unchanged as possible despite big changes under the hood.

It is a bit difficult to track the changes because this is also an appropriate time to restructure the files. The new style encourages a separation between a file that declares an interface (e.g. ThFinCat, a struct FinCat for models of ThFinSet, some common methods that apply to FinCats) vs a folder of individual files for various implementations of FinCats. Furthermore, CategoricalAlgebra is now split into further subfolders, BasicSets (sets, functions ... no category theory), Cats (e.g. (Fin)Category,(Fin)Functor, generic (co)limit infrastructure), SetCats (categories of + (co)limits of sets ... this depends on the previous two folders), and lastly CSetCats for C-Set-like categories, hom search, the chase, etc.

Common design decisions

  • How much information should be at the type level? For example, FinSet used to have a type parameter for the Julia type of the elements of that FinSet (thus both FinSetInt and TypeSet{Int} were both subtypes of FinSet{Int}. Likewise for categories having the Julia types of the ob/homs. However, we don't really ever want to dispatch on this. Implementations of theories need type parameters because instances require assigning Julia types to the type constructors of a theory. For example,
struct FinDomFunctorMap{DomOb,CodOb,DomHom,CodHom,DomGen, ObMap, HomMap} 
  <: FinDomFunctorImpl{DomOb,CodOb,DomHom,CodHom,DomGen}

However, a FinDomFunctor (which might wrap a FinDomFunctorMap) has no type parameters. One can call obtype(dom(F)) etc. to recover these at runtime.

  • Suppose we have a struct for FinCat (wraps models of ThFinCat). We can either have FinCat <: CatImpl (a FinCat is an implementation of ThCategory) or have struct FinCatAsCat <: CatImpl which wraps a FinCat and says it is to be interpreted as a Category. If a struct is really canonically thought of as a model for a particular theory, it makes sense to do the former. Of course, the same data could be models for many theories. But only one can be a 'default' theory for which it is a model.
    • There's a tension here with the above point, though. Above we made a distinction between "wrapper structs around implementations" (which tend to not have type parameters) and "implementation structs" (which do have type parameters). The present point is blurring the distinction between the two. This is something to worry about.

GATlab

Using GATlab in a very serious way has revealed a lot of improvements to make to it. Some of this is in the GATlabUpstream.jl file. Some needed fixes were deeper, so this PR will only work with the following branch checked out: AlgebraicJulia/GATlab.jl#166

There is a very common design pattern of needing a datatype for models of a particular theory, where one can call the theory operations on the wrapper struct which simply calls the theory method on the implementation. This boilerplate is repeated a lot through this PR - it can probably be straightforwardly abstracted in a future GATlab PR, but that is not a priority.

There is a need for integrating Julia types with theories in the same way ACSets integrate Julia types with schemas. For example,

@theory ThSet′ begin
  Bool′::TYPE
  Any′::TYPE
  in′(e::Any′)::Bool′
  eltype′()::Any′
end

Morally, we want all models of this theory to assign Julia's Bool to the type constructor Bool' and Any to Any'. We enforce this by declaring abstract type SetImpl <: Model{Tuple{Bool, Any}} end . Now we get an error if we try to implement ThSet′ for a subtype of SetImpl that does not respect this.

Changes

It's hard to keep this in sync with actual progress

  • A Set's element type should not be part of the FinSet (or SetOb) Julia type.
  • A TypeCat is a "Pair of Julia types regarded as a category. The Julia types should form an @instance of the theory of categories". This means TypeCat is our name for the wrapper struct of an arbitrary model of ThCategory.
  • There is an extension of ThCategory which includes nullary ob_set and hom_set operations which produce AbsSets. We want to be able to distinguish a category whose objects are FinSetInt and TypeSet{Int}, so using Julia type parameters is not very helpful. Thus the main Category struct is a wrapper around ThCategoryExplicitSets rather than merely ThCategory.
  • Remove is_hom_equal (not part of the ThCategory, doesn't get used anywhere in Catlab)
  • ob(::Cat{Ob,Hom}, x) and hom coercions cannot be handled by dispatch if we are now allowing for multiple different categories with the same Ob and Hom.
  • There are a lot of subtleties about how FinCats actually work - the relationship between homs and hom generators (which are often conflated by hom_map in Catlab presently). This is now more explicit though the API around FinCats is slightly different to avoid ambiguous situations (such as when homs and hom generators are the same Julia type).
  • Rather than having zillions of ACSetTransformation types, there will be one struct which contains a NamedTuple and dom + codom ACSet. For validation, all we expect is for there to be some SetFunction for each component. There is a lot more validation that can be for a given category, such as CSetCat (which makes sure dom/codom have no attributes), CParCat, VarACSetCat, etc.). We can still do some validation if the user passes in raw vectors of data (which can be interpreted as SetFunctions in some default ways), but in general the data structure won't inherently make any guarantees about what the SetFunctions should be without the context of a particular explicit Category struct.

Progress

Done with BasicSets, Cats (though I've temporarily skipped Natural Transformations), SetCats, and some partial progress through CSetCats.

@kris-brown kris-brown self-assigned this Oct 24, 2024
@jpfairbanks
Copy link
Member

@kris-brown what is the timeline around this? I could see how keeping this up to date with any Catlab changes is going to be a nightmare.

@kris-brown
Copy link
Contributor Author

kris-brown commented Oct 25, 2024

I'm working full speed on it now, I expect to be ready for review by end of next week (really depends on how many dependent packages on Catlab use implementation details that I wasn't expecting though)

@jpfairbanks
Copy link
Member

Ok, let's block the devs meeting next week to talk about this and you can have some kind of document that describes the changes that we can go through together. Rather than just using traditional code review. Because this is basically doing a systematic change across the whole codebase

@kris-brown
Copy link
Contributor Author

kris-brown commented Nov 20, 2024

Comments from devs meeting:

  • BasicSets could be a top-level folder (parallel to CategoricalAlgebra)
  • We might be able to remove type parameters even from implementation structs (to make @instance work we'll need to put function calls in the signatures)
    -hom_set isn't parameterized by dom/codom, could we change the interface? Or make a indexed impl for ThSet?

@kris-brown
Copy link
Contributor Author

kris-brown commented Dec 1, 2024

An update before I go back and refactor an earlier part of this PR.

  • I've gotten to C-Sets. Trying to make it work has made me realize I didn't go far enough in GATlabification earlier. For context, for various datatypes in Catlab, there've been three options for what to do: 1.) Make an explicit GAT for the type's interface. Allow there to be many difficult implementations of it and only interact with the type through its interface. For example, ThFinSet (FinSetInt , TabularSet, FinSetHash ) and ThCategory (OpCategory , TypeCat ). 2.) Make an abstract type which has an informal interface. Managed via dispatch. For example: DiagramImpl (DiscreteDiagram , Span , BipartiteFreeDiagram ) and ColimitImpl (ColimitCocone , CompositePushout , ACSetColimit ). 3.) Just have a struct (e.g. Span ).
  • The problem I've encountered relates to computing (co)limits of ACSets. The problem arises from the fact that dispatch is used to address the fact that limits and colimits are handled in a category-specific way.
  • If I have limit(d::DiscreteDiagram, c::Category) = ... , then the value of c determines the implementation of what I'm supposed to do. However, there is no way to do this with just the official interface of a category. So, we illegally use the fact that every category has some implementation and write this method as limit(d, c.impl) . Once we do that, it's up to us to define limit for this pair (of DiagramImpl and ThCategory impl), otherwise face a MethodNotFound error. For example, struct SetC end is a particular implementation of ThCategory . There are many limit methods where the second parameter is ::SetC .
  • The interface of an ACSetCategory is: 1.) the interface of a profunctors (i.e. two copies of ThCategory + heteromorphisms) and 2.) a means of extracting ob/hom/attr/attrtypes from an ACSet. This is meant to be very flexible because we might want to work flexibly with, e.g., C-Set, C-Par, ACSet (loose or tight), VarACSet, or VarACSet with a certain unusual notion of 'product'.
  • For example, the vanilla ACSet category has for the domain category FinSets for Ob and FinFunctions for Hom . For heteromorphisms, SetFunctions for Attr. In the discrete codomain category, Julia types for AttrTypes . In Implementing ThACSetCategory , we might say compose(f::Hom,g::hom) = compose[SetC()](f,g) to reuse the code we already wrote for how to compose morphisms in Set.
  • If we want to compute limits pointwise, we loop over Ob and AttrType and compute limits in either the domain or codomain category. How is one supposed to get the domain category from a ThACSetCategory implementation? Given that we've provided composition / id methods as part of providing that interface, we can make a struct DomACSetCategory which wraps a ACSetCategory and show how we can implement ThCategory (this is also the sort of thing we could do via a pullback along a theory monomorphism). But, how do we know that this resulting category IS Set, the category which we defined limits and colimits for above? Julia dispatch can't figure out how to use the limit methods defined for SetC. The fact that the domain category is Set is due to what the code of the implementation of ThACSetCategory is, which isn't transparent to Julia, especially at the type level.
  • So what is the solution? A bandaid fix would be to change ThACSetCategory to just have a dom_cat()::Cat and cod_cat()::Cat method, so that one can, e.g., plug in SetC into the definition of CSetCat .
  • But to address the root of the problem we should never dispatch on an implementation. Rather we need, e.g. ThCategoryLimits. This is the sort of thing I wanted to wait on doing until we had sugar for "structs" in GATlab officially supported. But it's hopefully not as bad as I'd thought if we use the trick that has been working out well where we have some GAT types be forced to be certain Julia types.

@kris-brown
Copy link
Contributor Author

kris-brown commented Dec 20, 2024

Consider this GAT which is meant to abstract the common ways DiscreteDiagram, ParallelHoms, ComposableHoms, (Multi)(Co)Spans, BipartiteFreeDiagram and FreeDiagram are all used.

"""
This is simpler than a `FinDomFunctor` because the domain is not a FinCat but
rather a finite Graph. We also don't worry about the category structure of the
codomain (e.g. homs need not specify dom/cod, no composition structure). 
To get a real diagram out of this, one must supply a Category with the
appropriate Ob/Hom types which are supertypes of the Ob/Hom types of the 
FreeDiagram.
"""
@theory ThFreeDiagram begin 
  V::TYPE; E::TYPE; Ob::TYPE; Hom::TYPE; FinSet::TYPE;
  src(h::E)::V;
  tgt(h::E)::V;
  obset()::FinSet
  homset()::FinSet
  obmap(v::V)::Ob
  hommap(e::E)::Hom
end

For example, a Span{Ob,Hom}(f,g) will always have V -> FinSet(3), E -> FinSet(2) with src(::Int)=1 and tgt(n)=n+1. And hommap(1) = f and hommap(2)=g.

By not presupposing the category structure of the codomain of this diagram, this means we don't have to carry around a bunch of methods for how to do identities/composition -- this is good; the point of a FreeDiagram is that we can really just treat it like a piece of concrete data. We can simply say "here is a pair f+g I want to treat as a span" and defer anything that depends on what category we take f and g to live in until later, such as when we compute the pushout.

However, there is one wrinkle in this, which is that the dom/codom structure is also something that depends on what category we interpret f and g as living in (in this new refactor, that is no longer determined by what type f and g are). The problem is that, although hommap in the example above was determined by f and g, it's not the case that obmap is determined by f and g. The way Catlab currently handles this is effectively implementing obmap(3) with codom(hommap(2)) and so on. But here it's clear that codom is relying on type-dispatch.

An obvious situation where this is a problem is if we want to have a span in a category of 'bare' morphisms (where the morphisms don't carry full info about their domain and codomain). Or, in the case that brought this issue to my attention, the category SkelFinSet has FinSetInt as its objects and FinFunctions as its morphisms.

Note

struct FinSetInt i::Int end is a particular struct which implements ThFinSet, whereas a FinSet is a wrapper for any stuct which implements ThFinSet.

So the domain of a FinFunction is a FinSet which is not a FinSetInt (neither a super nor subtype). There's no way for a span made up of two FinFunctions to know how to get the apex which needs to be a FinSetInt. It would be tedious to make data structure that behaves exactly like FinFunction except that expects FinSetInts for its dom and codom rather than FinSets. It's much easier to write SkelFinSet code where we expect that ThFinSet.dom(f) == FinSet(ThSkelFinSet.dom(f)) (i.e. we have to wrap Ob values with a FinSet(-) before we use them as the dom/codom of a Hom).

It seems the easiest way around this issue would be to just add the extra data to the various structs, e.g.:

@struct_hash_equal struct Multispan{Ob,Hom,Legs<:AbstractVector{Hom}}
  apex::Ob
  legs::AbstractVector{Hom}
  feet::AbstractVector{Ob} # NEW: we add this data, which will sometimes be redundant
end

With constructors that fill in this extra data using dom/codom type dispatch if its not explicitly provided, thus making this a non-breaking change.

Major overhaul of Limit/Colimit interface to not use dispatch. Overhaul of the ACSet category approach to not rely on dispatch.

Actual models of theories of cats with limits/colimits just implemented for SkelFinSet and PointwiseCats.

Modified some FreeDiagram models so that they no longer implicitly rely on type dispatch for some ThCategory structure (i.e. getting the feet of a cospan via dispatch on `dom`).

Removed `compose(f,g,h,fs...)` method (and the like) in Theories: unnecessary type piracy which complicated other things

Convenience `Tagged` method for working with the eltypes of SumSets. And `untag` method for projecting out of SumSets.

Using GATlab branch which has removed @import

Improved Matrices.jl to use new GATlab explicit models which addresses some old problems with it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants