Introduction
Data can be viewed as a collection of associated items.
These associations are interpreted as meaningful relationships.
For accessing and processing these relationships
they need to be represented on a computer.
The data relationships are machineindependent.
Their computer representations are not.
In 1965
ARPA initiated a research project
to explore the existence of a mathematical foundation
for defining and manipulating
data representations on computers.
ARPA considered current data structure development
too
application and platform dependent,
thus be unable to
support optimal function and performance
capabilities
of future systems.
It was ARPA's contention that
machine representations of data
should be transparent to
all applications no matter the platform,
i.e.
machineindependent.
Data relationships and mathematical sets
have something in common,
they are both devoid of any physical dependencies.
Set operations already exist for processing relationships
mathematically defined as relations.
Set operations are defined in terms of a set's
mathematical identity,
not in terms of a set's physical representation.
Set theory also supports a wealth of operations
for processing sets by their mathematical identities.
If data relationships had mathematical identities,
then set operations could be used to
access and process data.
This observation was used in 1965 to initiate
research on the feasibility of developing a
mathematically welldefined
settheoretic data structure.
STDS was later used as the storage management system to support
an Information Management System developed at the University of
Michigan and commercially available from 1971
through 1998.(see
STDSIMS)
Set processing in a network environment was implemented in 1975.
(see
Hardgrave)
Data as a Mathematical Object
Data is essentially a collection of
meaningful relationships of interest.
Exploiting this interest
with the aid of a computer
requires representing data relationships
in a machinefriendly format.
Thus, by definition,
data representations are machinedependent.
A mathematical foundation
for defining and manipulating data relationships
requires that data relationships have mathematical identities.
Since any actual manipulation is done by a computer,
the machine representations of data relationships
must also have mathematical identities.
Binary relationships already have an
identity as binary relations defined by set theory.
The wealth of set operations already available
for welldefined relations
made set theory an attractive candidate as a
mathematical foundation for developing
future data processing systems.
Unfortunately, providing data relationships
with a unique mathematical identity was not
feasible with any then available set theory.
Records as nTuples
No matter what data model is chosen for a specific
data processing application, the data has to be
physically structured in order to be accessed.
To support transparency
for all user data models, machineindependence
was required at the I/O level.
At the I/O level all data structures are defined in terms of
records, fields, and files.
The structural similarity between
computer records
and
settheoretic ntuples
is difficult to ignore.
To model data structures at the I/O level
STDS was developed in terms of
tuples, domains, and sets.
Since records are a basic representation of data
and ntuples are mathematical objects for representing
item relationships,
formally identifying the two
seemed to meet ARPA's research objective
for a mathematical foundation for
supporting data structures.
And in principle it did, but in practice it did not.
The fundamental problem was that manipulating records
relied on programming item access to the contents of records.
Even though ntuples have a settheoretic identity
it is declared instead of being defined by setmembership.
Specifically, the tuple < a, b, c > does not contain a, b, or c
as an item. While the record [ c, b, a ] contains all three.
Though the set { a, b, c } contains exactly
the three items, so does { c, b, a }.
Not surprising
since it is exactly the same set.
But < a, b, c > and < c, b, a >
are not the same set.
Though they belong to the same equivalence class,
they are inverses of each other.
This is not easily verified
using classical set notation.
Determining that two records are equal
is simple.
If for all i, the ith items
of record r and record s
are the same,
the records are equal.
Determining if two ntuples are equal
is not so simple.
The ith item of an ntuple
is not welldefined in set theory.
Without a formal mechanism to determine
if two ntuples are equal,
representing records by ntuples is not of much value .
Record Reordering
In set theory,
operations on sets are defined in
terms of set membership.
To define operations on ntuples that
mimic record operations,
ntuples need a welldefined membership condition.
Assume that ntuples have a
suitably defined membership condition,
then define
ρ_{i}
( < x_{1}, x_{2}, .. , x_{n} > ) = x _{i}.
^{}
Item Value:
ρ_{i}
( < x_{1}, x_{2}, .. , x_{n} > ) = x_{i}
^{}
Given the ith item value operation
defied by
ρ_{i}
( r ) = r_{i}.
determining if two records, possibly on separate platforms,
are equal can now be defined set theoretically.
^{}
Theorem: Record r equals record s iff
(∀i)
( ρ_{i}
( r )
=
ρ_{i}
( s ) ).
^{}
Being able to determine whether two records on separate platforms
are equal is necessary, but nowhere near sufficient.
Field ordering does not affect the represented relationship,
but it does impact system data accessing and performance
dictated by resulting data structures.
Since a record of cardinality n, n,
provides n! different representations for the
same relationship.
For relationships to be shared across platforms
some mechanism is needed to detect relationship equivalence.
A remapping operation can be defined that allows
any tuple in an equivalence class
to be remapped to any other tuple in the same
equivalence class.
^{}
Remap:
Given
σ with
σ = k
and record r,
let
ρ_{σ}(r)
=
<
ρ_{σ1}(r),
ρ_{σ2}(r),..,
ρ_{σk}(r)
>.
^{}
Example:
ρ_{< 3, 2, 4, 1 > }(< a, b, c, d >)
= < c, b, d, a >
^{}
Since an equivalence class of n degree records
has n! different representations,
and since mapping directive σ
has exactly n!, all possible mappings are covered.
This provides platform independence.
Actually a little too much independence since
platforms have no way to recognize relationship equivalence.
Some commonality needs to be shared between platforms.
By sharing appropriate mapping directive,
σs,
relevant relationships can be easily shared across platforms.
Relevant Reordering
Fortunately, records come equipped with item Fields.
Field designations are required to be unique,
even if they represent the same set of item values.
Field orderings
determine item order and
provide a basis
for equating relationship equivalence between files
of the same equivalence class.
By extension
physically defined data Fields equate to
settheoretically welldefined Domains.
Just as Field ordering determined record ordering,
Domain ordering and selection
dictate ntuple orderings.
Since not all applications require
access to all relationship Domains,
I/O and processing performance can be dramatically improved by
transferring only those Domains actually
required by an application.
Remap provides a realtime ability
to provide a record reduction and reordering
during data access.
This Domain selection capability
was included
in the 1968 version of STDS.(see
AFIP)
In 2005 Information Builders Inc,
compared the performance of an
STDS supported Information Access Accelerator
with two of the current leading
DBMSs.(see IAA)
The realtime application of settheoretically
defined Domain selection gave typical
performances improvements between 40 to 90 times
that of commercial DBMSs.
^{}
The XST architecture preserves membership across environments while making it possible for
the data representation to vary from one environment to the next. Thus, performance at the
process level can remain independent of how relationships are expressed at the conceptual
level and how data is represented at the storage level.
This is the case for the XST approach in a nutshell:
(1) Set membership determines functionality.
(2) Data representation at the storage level determines performance.
(3) Strong data independence separates functionality concerns from performance
concerns(IAA13)
Since extended set operations are
application independent,
program language independent,
platform independent, and
storage organization independent
the only performance dependencies are the program blueprint and
the programmer's
imagination.(
IAA12,
IAA14)
Though the industry has long believed that
index structures
are essential for
best performance, the exact opposite is
true.
Research has shown that over 90% of an application process can be indexedaccess
overhead.(SIGMOD)
By eliminating index structures
in favor of direct data access using extended set operations
performance can be improved by orders of
magnitude.(IAA30)
For more extensive performance results of two industry
benchmark studies
see IAA
report pages 2339.
RELATIONS (set theoretic)
In set theorey, relations are
welldefined as subsets
of the Cartesian product.
The Cartesian product is
welldefined
in terms of ordered pairs.
The ordered pair is not welldefined.
It is declaired into existence
by an individual.
Not just one individual, but
by several. Each with a different agenda.
None of which allow the
Cartesian product
to be associative.
Under the axioms of extended set theory,
a definition of ordered pair
was welldefined that allowed the
Cartesian product to be associative.
This in turn allowed nary relations to
be welldefined subsets of a
welldefined
extended Cartesian product.
XSP: Smoke & Mirrors?
To some, extended set processing, XSP,
seems to be more "smoke & mirrors"
than
"sets & mathematics".
There is no such thing as an "extended set".
This is not a totally
unjustified view.
Without a deep understanding of
set membership condition
and the difficulties imposed by the
Kuratowski definition of an ordered pair,
it would not be obvious what role an
extended set membership condition
could have on
relations defined as subsets of the
Cartesian Product.(see Berztiss)
XML, Categories, λCalculus
The 1968 implementation of STDS only supported access to
homogeneous data arrays.
This was far short of ARPA's vision of a formal
foundation for accessing any and all possible
computer representations of data.
By 1998 an implementation of STDS
was equipped to recognize and process
XML structures as mathematical objects.
It
demonstrated XSP ability to establish
content equivalence between
RDM tablestructures and
XML computer
representations.
(see
[T&P])
A mathematical foundation for developing
executable systems is of little value
without capable tools for exploiting its potential.
Category theory and the Lambda calculus
provide just such tools.
Both have known difficulty
associating with Classical set theory.
Extended set theory proved more
accommodating.
(see
Blass,
Kats)
Summary
Mathematically defined data structures
access data by its content.
Physically defined data structures
access data by its location.
Mathematically defined data structures
are intrinsically stable.
Physically defined data structures
are intrinsically fragile.
Mathematically defined data structures
are global.
Physically defined data structures
are local.
Mathematically defined data structures
are machineindependent.
Physically defined data structures
are machinedependent.
Without a unifying mathematical foundation
data accessing is a disjoint universe
of physically intractable data structures.
With a unifying mathematical foundation
data accessing is a unified universe
of mathematically synergistic data structures.
References

[Skolem]
Two Remarks on Set Theory,
Skolem, T.: 2. The ordered ntuples as sets, MATH. SCAND, 5 1957,
"It is still a problem how the ordered ntuple can be defined in the most suitable way."

[Suppes]
Axiomatic Set Theory,
Suppes, P.:Van Nostrand, 1960,
"All relations are binary."[p. 57]

[AFIP]
Description of a Settheoretic Data Structure:  1968

[ARPA]
CONCOMP: Research in Conversational use of Computers: final report, December 1970

[Berztiss]
DATA STRUCTURES: Theory and Practice, Berztiss, A.T., Academic Press, New York, 1971,
"We have to develop a completely new theory of ordered objects."[p. 27]

[STDSIMS]
UoM Information Management System,
STDS supported, 19711998

[Hardgrave75]
Set Processing in a Network Environment
Extended set theory is an effective method for describing
the complex data structures needed to support largescale data base
applications.
[Hardgrave, W.T. 1975]

[Hardgrave76]
A technique for implementing a set processor,
Hardgrave, W. T. 1976

[VLDB77]
1977 VLDB Extended Set Theory:
A General Model For Very Large, Distributed, Backend Information Systems.

[LLL]
Set Theoretic Data Structures (STDS), Lawrence Livermore Lab.
Technical Report, 1977.

[NATO]
A Mathematical Foundation for Systems Development  NATOASI Series, Vol. F24, 1985

[VLDB84]
1984 VLDB Panel: Inexpensive Large Capacity Storage Will Revolutionize
Future Systems.

[Champion]
XSP: An Integration Technology for Systems Development, Champion, M.  2001

[SIGMOD]
OLTP Through the Looking Glass, and What We Found There
Harizopoulos,~S.; Madden,~S.; Abadi,~D.; Stonebraker,~M.:
SIGMOD'08, 2008
♦
Substantial time is spent in logging, latching, locking, Btree, and buffer management operations.
Over 90% of an application process is indexedaccess overhead.

[IAA]
Information Access Accelerator
iXSP Performance evaluation, IBI, Stout, R.  2005

[MSLab]
Managing Data Mathematically:
Data As A Mathematical Object:
♦
"Using Extended Set Theory for High Performance Database Management"
Presentation given at Microsoft Research Labs. with an introduction by Phil Bernstein.
(video: duration 1:10:52)  2006

[North]
Sets, Data Models and Data Independence
Ken North a Dr. Dobbs''s Blogger, March 10, 2010

[SETS?]
Why Not Sets? All computer processing is settheoretic
in nature.  2013

[C&C]
ContentContainer Data Access Strategies Content for Functionality  Containers for Performance  2016
♦
Since data represented for processing is not always ideal
for preservation, and since data represented for
preservation is not always ideal for processing,
accessing applicationfriendly data
from storagefriendly data
poses a genuine challenge.

[DDRvsPDS]
Set Processing vs. Record Processing Performance:
Dynamic Data Restructuring vs. Prestructured Data Storage  2016
♦
Record processing systems and set processing systems are very different.
This paper attempts to clarify the major differences and demonstrate
the performance advantages of set processing.

[XSP]
XSP: Extended Set Processing:
Mathematically Managing Data Representations  2016
♦
This paper presents a high level overview of why a mathematical model of
data representations is necessary, and how an extended set processing
model accomplishes the task.
«
Copyright © 2023
INTEGRATED INFORMATION SYSTEMS
« Version: 01/15/2023 v03»
EXTENDED SET THEORY: Historical Lineage
Functions, Relations, Processes, & Categories
Extended set theory, XST,
complements classical set theory, CST,
in the following ways:
arbitrarily large sets,
fixed rank for ntuples of constant rank items,
associative Cartesian product,
selfapplication,
and
support of category theory.
References

[2T]
Two Theoremsf:
CST vs XST Functions  2019

[PFAC]
Processes, Functions, Applications & Composition
♦ Lambda application & category composition of processes.  2018

[CSTasXST]
XST Definitions, Operations, & Properties Tuples, Graphs, Functions 
♦ CST operations as XST operations.  2018

[Kats]
Kategories & Morfisims:  2017

[FUN]
Functions As Set Behavior Essential Concepts: Conceptual &
Formal Modeling Foundations.  2016

[Ellis]
Extended Set Theory: A Summary A clarification of extended set notation.  2015

[Blass]
Axioms and Models for an Extended Set Theory
Blass, A., Childs, D L  2011
♦
This paper presents the formal foundation for "extending" the axioms
of ZFC, which only addresses the property of membership, to include
a "scope" property for addressing the property of "structure".

[T&P]
XSP TECHNOLOGY: Theory & Practice Formal Modeling &
Practical Implementation of XML & RDM Systems  2007
♦
The RDM works in spite of set theory, not because of it.
XMLStructures and operations on these structures are settheoretically
sound under XST.
In this paper the formal foundations of RDM and XML systems are examined
in the light of XST to provide practical relevance of XSP Technology to
the field of software systems development.

[XAXIOMS]
Summary of Axioms for Extended Set Theory:  2005

[XSPXML]
XSP for XML Systems Design & Development:  2000

[XST]
Extended Axiomatic Set Theory:
A Focus on Functions (early draft)  1999

[IFIP]
Feasibility of a Settheoretic Data Structure:
A Reconstituted Definition of Relation,  1968
