The Special Composition Question is something computers have always struggled with, even highly advanced ones.
If you don't know what that is, it's not terribly complicated--at least, the question isn't. What makes a set of other objects into a different object altogether? For example, if you put a slab of wood on top of four wooden legs, that's a table, isn't it? What if it has three legs? Is it still a table? Not many tables have two legs, though it's not impossible, depending on the shapes of the legs. Of course, there are tables with one "leg," as such, where the table stands on a flared post or is simply mounted on a pole that is secured to the ground. What if a surface is extended out from a wall with nothing under it at all? Is that still a table?
Suddenly, the question of what is and isn't a table becomes a lot more complicated. Still, it is the surface that makes the table. That is its irreducible quality. A table is, above all else, a flat surface upon which other objects can be placed. But other things have that same quality: shelves, countertops, floors, roofs. Although floors and roofs are easily distinguished by their relationships with other objects (walls, foundations, etc.), the difference between a table, a countertop, and a shelf may not always be so obvious.
How does a computer solve this problem? Traditional machine learning methods involve showing a computer many, many instances of many, many objects, each one labeled with its name and possibly other characteristics. A well-trained machine learning model might always be able to distinguish that something is a table and not a turtle, but it may not be completely certain that a table is not a shelf.
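To see what that uncertainty looks like in practice, here is a minimal sketch, with invented class labels and score values, of how a classifier's raw scores are typically converted into the probabilities behind its prediction:

```python
import math

def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from a trained image classifier.
classes = ["table", "shelf", "countertop", "turtle"]
logits = [3.1, 2.8, 2.2, -4.0]

for label, p in zip(classes, softmax(logits)):
    print(f"{label:>10}: {p:.1%}")

# The model is effectively certain the object is not a turtle (~0%),
# but "table" (~47%) only barely edges out "shelf" (~34%).
```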
The ability to distinguish objects is crucial if computers are to make decisions about them. Let's look at another such problem.
What makes a pile of objects? To complicate it further, what constitutes a "pile of shoes"? For one thing, designating a "pile of shoes" tells you nothing about whether the pile contains no complete pairs, only complete pairs, or just some complete pairs. "A pile of pairs of shoes" is a clumsy turn of phrase. The quality of being a pile is clearly more important than any particular members of the pile.
But what makes it a pile in the first place? Shoes arranged neatly in a row are not a pile. Shoes carefully stacked into a tower? Not a pile. If you spend an hour constructing a pyramid out of shoes, a machine learning system might say it's a pile, but most people wouldn't. It would obviously be a pyramid of shoes. But then pyramids of shoes aren't exactly everyday encounters.
How many shoes make a pile? Two wouldn't do it, even if they are from different pairs. That's just a mismatched pair of shoes. Three shoes, if arranged haphazardly, might be a pile, but most people would probably argue that's not enough shoes for a proper pile. It's just three shoes. With four shoes, you have one or two pairs, or some quantity of shoes that don't go with each other. It's still four shoes. What we're seeing is that the ability to quickly enumerate the members disqualifies a collection from the status of "pile." A pile is therefore a quantity of objects that aren't easily, instantly countable.
This is a nonsensical proposition for a computer, which could, of course, count thousands of shoes in no more than a few seconds with a properly trained ML model. So now we have a new wrinkle: the way humans define objects, or at least composed objects, rests at least partially on human limitations. If we could all easily identify a pile of exactly 50 shoes as containing 50 shoes, we would no doubt prefer the more precise "50 shoes" over "a pile of shoes."
The intelligence of the computer, then, is rather paradoxical: it can identify objects based on what it has been taught to recognize, and it can count them far more efficiently than any human could, but present it with a slightly ambiguous representation of a composed object and it may well fail to identify it at all, or misidentify it as something else entirely.
What does any of this have to do with anything??
I'm getting to it! Let's take a very complex object: a car. It's made up of thousands of individual parts of many different sizes. How many pieces can you remove before it's no longer a car? This is perhaps an unfair question. You could remove everything but the empty shell of the body, and most people would still call it a car, or the remains of one. There would be no question that it was once a car. What if you were to start building a car from scratch instead? There is a point, perhaps before the body is even affixed to the chassis, at which it becomes recognizable as a car. Would a chassis with wheels be identified as a car? What if seats and an engine are added? When does it move from "parts that are known to belong to a car" to "this is actually a car"?
Computers struggle mightily with this sort of ambiguity. ML models designed to identify objects, especially composed objects, fall down when faced with only partially composed ones. A human, by contrast, can intuit from the objects at hand whether they are dealing with a car under construction, a car in fact, or the remnants of what was once a car. Likewise, we can easily point out a car that is damaged or missing particular pieces. We may argue over the exact details, but the essential carness is not in dispute.
It might occur to you that this is similar to the Ship of Theseus thought experiment, but the question of composability isn't concerned with which instance of an object we're dealing with. A ship is a ship; it doesn't matter if it's the original ship or if 90% of it has been replaced. The question is what makes a ship in the first place.
Computer software has ways of attempting to resolve these questions. This is the entire basis of object-oriented programming, though the objects in question need not be analogues of real objects. Nevertheless, an object-oriented programming language does offer facilities for defining objects composed of other objects. How do such languages handle ambiguity? Not very well, as it turns out: every relationship must be spelled out in advance. A programmatic object may inherit its characteristics from one or more kinds of objects, and those characteristics may be simple values (numbers, text) or other objects altogether. Inheritance is an "is-a" relationship, while composition is a "has-a" relationship. That is to say, a "ship" and a "car" would both inherit the traits of a generic "vehicle" type, which is to say they are vehicles. They would have in common all characteristics shared by vehicles, though in comparing a ship and a car you might find that they aren't terribly similar beyond a few essentials: they move, meaning they have a form of locomotion; they transport people and possibly objects, which implies capacities in terms of volume and/or weight; and they have a guidance mechanism.
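Here is what those two relationships look like in code, as a minimal sketch (the Vehicle, Car, Ship, and Engine classes, and all their numbers, are invented for illustration, not taken from any real codebase):

```python
class Engine:
    """A component: something a vehicle *has*, not something it *is*."""
    def __init__(self, horsepower: int):
        self.horsepower = horsepower


class Vehicle:
    """Generic type: the traits common to everything that is a vehicle."""
    def __init__(self, capacity: int):
        self.capacity = capacity          # how many people it can transport
        self.passengers: list[str] = []

    def move(self) -> str:
        raise NotImplementedError         # every vehicle has *some* locomotion


class Car(Vehicle):                       # is-a: a Car *is a* Vehicle
    def __init__(self):
        super().__init__(capacity=5)
        self.engine = Engine(150)         # has-a: a Car *has an* Engine

    def move(self) -> str:
        return "rolls on wheels"


class Ship(Vehicle):                      # is-a: a Ship *is a* Vehicle
    def __init__(self):
        super().__init__(capacity=200)
        self.engine = Engine(20_000)      # has-a

    def move(self) -> str:
        return "sails through water"
```

Note that the is-a relationships are fixed when the classes are defined: as far as the program is concerned, a Car is always a Vehicle, whether or not a human would currently recognize it as one.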
From here, things are more or less intuitive. A person would be another kind of object, and so a car or ship could "have" a person inside it, slotted into variables representing passenger capacity. Implementing this robustly is not trivial: if you must account for counts of passengers and objects and their weight, code must analyze, each time a person or object enters or "leaves" the vehicle, how the capacity has been affected, and prevent going over. Of course, it might be intended behavior to allow exceeding capacity, in which case the outcome of that circumstance must also be programmed.
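A hedged sketch of that bookkeeping, assuming a simple passenger count rather than weights and volumes (the board and disembark methods, and the allow_overload flag, are invented names for illustration):

```python
class Vehicle:
    def __init__(self, capacity: int, allow_overload: bool = False):
        self.capacity = capacity
        self.allow_overload = allow_overload
        self.passengers: list[str] = []

    def board(self, person: str) -> None:
        # Re-check capacity every time someone enters...
        if len(self.passengers) >= self.capacity and not self.allow_overload:
            raise ValueError(f"{person} cannot board: vehicle is full")
        self.passengers.append(person)

    def disembark(self, person: str) -> None:
        # ...and update the count every time someone leaves.
        self.passengers.remove(person)


car = Vehicle(capacity=2)
car.board("Ana")
car.board("Ben")
try:
    car.board("Caleb")               # a third passenger exceeds capacity
except ValueError as err:
    print(err)                       # Caleb cannot board: vehicle is full
car.disembark("Ana")
car.board("Caleb")                   # room again
```

Everything here is explicit: the program only "knows" that capacity matters because a human wrote the check.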
Here we see that a computer system possesses roughly toddler-level intelligence: it must be taught what things are, what other things they are like, what things they might contain and what things might contain them, and so on. Any ambiguous circumstances must be analyzed by custom code to determine what the proper response of the program should be. Variables and sub-objects in our composed object are either optional or mandatory, barring--once again--custom code to reduce fuzzy interpretations down to raw numbers.
What we end up with is a very basic problem in communication, particularly with species who may not have the same referents as humans. As human beings, at least, we all have an essentially common understanding of our world, even if we express it through different tongues. Most cultures have the notion of a table, so if you each point to a table and say your word for it, you now know each other's term for "table."
Now, it's possible one of you does not know what a table is, or misunderstands what is being asked and names its color or what it is made of instead. This sort of confusion is reduced through the process of sharing more and more words and correcting mistakes as you find them.
But what about alien species who do not know what a table is, have never seen a table, have never had any need for a surface upon which to place other objects? If you were to point to a table and say "table," assuming the alien could even hear what you said--maybe they aren't sensitive to sonic vibrations the way humans are, or at all--they might imagine you are commenting on its angular nature, or counting the legs, or offering a relational term to distinguish it from the floor and ceiling. Or perhaps they do not have tables as such, but they do have religious altars, and believe that "table" is a word in a human language for "sacrificial altar." And then they assume you are offering to have yourself sacrificed upon it, because why else would you have pointed so enthusiastically at it? The next thing you know, there's been an interstellar incident, your family is finding out how long it will take for your (consecrated, at least) ashes to be returned to them, and a few dozen diplomats are trying to smooth things over without anyone else getting chopped up and bleeding all over your brand-new Montez.