This is meant as a supplementary answer to my own question.
I know it has been many years since I asked the original question, but the idea of distinguishing /t/ and /d/ as voiced and unvoiced in this specific scenario never quite convinced me. (As some users on this page have already mentioned, whether the consonant is voiced may not be what matters for telling d and t apart in this situation.) I didn't quite believe that anyone, native speaker or not, could actually tell whether the consonant was voiced or unvoiced in this scenario anyway.
I just stumbled upon this video by Dr Geoff Lindsey (co-editor of the CUBE dictionary): https://www.youtube.com/watch?v=U37hX8NPgjQ Among other things, he shows:
- English does not contrast b and unaspirated p after s.
- "discussed" vs. "disgust", where neither native nor non-native speakers would be able to distinguish recordings of the two words.
- a comparison of aspirated and unaspirated sounds in English and French, with waveform visualization, where the French unaspirated unvoiced p is very similar to the English unaspirated voiced b.
- native speakers may choose whether or not to aspirate in many words.
In his conclusion, he argues that it could be as simple as dictionaries having chosen the wrong symbols: they transcribe "speech" as /spiːtʃ/, whereas /sbiːtʃ/ may be more appropriate.
I quite like his approach of cropping audio clips to show that no one can credibly claim to tell whether the consonant was voiced or unvoiced.