This is one of those questions that has no right answer, because it depends on how you define the concept. So one way to look at it is that your instructor is right, by definition.
Your answer closely approximates the most rigorous definition of the concept, the one employed by taxonomic structuralists (who, by the way, fell into disfavor in the 1960s). That is, they viewed a phoneme as the union of a set of phones (when we speak specifically of the phones "of a single phoneme", we call them "allophones"). They viewed phonemes, morphophonemes, morphemes and so on as concepts for organizing the facts. There are a number of principles at play in deciding how to organize phones into phonemes. The basic idea is that two phones cannot be subsumed under the same phoneme unless the contexts where they appear are complementary. Given the words [pʰɪt] and [bɪt], since these are distinct words, we know that [pʰ] and [b] cannot be members of a single phoneme (they are members of different phonemes). Given words like [pʰɪt] and [spɪt], it is possible that [pʰ] and [p] are members of the same phoneme (allophones of a single phoneme).
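The complementarity test described above can be sketched in a few lines of Python. This is only a toy illustration: the three-word data set, the function names, and the choice to define a "context" as the immediately neighboring phones (with `#` as a word boundary) are all my assumptions, and the structuralists relied on further principles beyond this one check.

```python
from collections import defaultdict

# Toy transcriptions, each word a tuple of phones. Hypothetical data
# chosen to mirror the examples in the text; not a real corpus.
WORDS = [
    ("pʰ", "ɪ", "t"),       # pit
    ("b", "ɪ", "t"),        # bit
    ("s", "p", "ɪ", "t"),   # spit
]

def environments(words):
    """Map each phone to the set of (left, right) contexts it occurs in."""
    envs = defaultdict(set)
    for word in words:
        padded = ("#",) + tuple(word) + ("#",)   # '#' marks a word boundary
        for i in range(1, len(padded) - 1):
            envs[padded[i]].add((padded[i - 1], padded[i + 1]))
    return envs

def in_complementary_distribution(a, b, words):
    """True if phones a and b never share a context in the data."""
    envs = environments(words)
    return envs[a].isdisjoint(envs[b])

print(in_complementary_distribution("pʰ", "b", WORDS))  # False
print(in_complementary_distribution("pʰ", "p", WORDS))  # True
```

A result of `True` only means the two phones may belong to one phoneme; as the text says, complementarity makes that grouping possible, it does not force it.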
Under that view, or honestly any view, "changes meaning" is not the right way of looking at the pit/bit difference. It's not that changing p to b changes meaning, since then you'd expect there to be some change in meaning if you replace p with b in play. Instead, you don't get a word at all. Different words are made up of strings of phonemes, so by selecting a different phoneme, it is possible that you will identify a different word. And different words tend to have different meanings.
There are other views of the concept "phoneme", some of which depend on having the "phoneme" and the "allophone" be made of the same "stuff", typically some kind of definition in terms of phonetic features. This view is widespread amongst practitioners of generative phonology (not exclusive to them, though). Thus the "phoneme" /p/ would be a voiceless unaspirated bilabial stop, and its two main allophones are a voiceless unaspirated bilabial stop [p] and a voiceless aspirated bilabial stop [pʰ] -- note that the phoneme and one of the allophones are the same thing. In that view, the aspirated allophone is the result of applying a rule.
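To make the "same stuff" idea concrete, here is a rough Python sketch in which both the phoneme and its allophones are bundles of features, and the aspirated allophone is derived by a rule. The feature names, the `aspirate` function, and especially the rule's context (the first segment of the word) are simplifying assumptions of mine, not a claim about how any particular generative analysis states the rule.

```python
# Feature bundle for the phoneme /p/, following the description in the text.
# The feature names are illustrative labels, not a standard feature system.
P = {"voiced": False, "aspirated": False, "place": "bilabial", "manner": "stop"}
S = {"voiced": False, "aspirated": False, "place": "alveolar", "manner": "fricative"}

def aspirate(underlying):
    """Apply a toy aspiration rule to a list of feature bundles.

    Assumed context (a simplification): a voiceless stop is aspirated
    when it is the first segment of the word.
    """
    surface = [dict(seg) for seg in underlying]   # copy; leave input intact
    first = surface[0] if surface else None
    if first and first["manner"] == "stop" and not first["voiced"]:
        first["aspirated"] = True
    return surface

pit_onset = aspirate([P])        # as in "pit"
spit_onset = aspirate([S, P])    # as in "spit"

print(pit_onset[0]["aspirated"])    # True: surfaces as [pʰ]
print(spit_onset[1] == P)           # True: surfaces as [p], identical to /p/
```

The second result is the point made above: where the rule does not apply, the surface allophone is the very same feature bundle as the phoneme.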
Now to focus on this "idea of a sound" matter. In the structuralist view, allophones are not physical sounds; indeed, the structuralists had no way of dealing with physical sound, other than to listen and transcribe, assuming some reference sounds. Structuralist phones are abstractions, although they are not "ideas", since the structuralists were behaviorists. Phones refer to classes of behavior – in fact, the behavior of the linguist who is exposed to language stimulus and responds with transcriptions. The structuralists, though, tended to think of themselves as outside the equation: ideally, they would be mindless automata that transcribe the language behavior of speakers. In the generative view, segments (phones / allophones) are not physical sounds either; they are mental representations of sounds. (Specifically, they are the conjunction of a set of features, which are the "intent" to articulate in a particular way.) This view is a consequence of the rationalist philosophy adopted by generative grammar: we are supposedly modeling mental states that cause speaker behavior, not modeling the behavior itself.
As I hope to have shown you (superficially and incompletely), the ontology of "phoneme" is very complex (historically) and generally is not at all well handled at the introductory level. Generally, we are happy if students can grind out marginally correct answers to questions like "are p and b allophones in English".