[E-trademarks] has USPTO documented whether and how it maps Unicode to ASCII? (was The ZWSP mistake at the Trademark Office)

Ken Boone boondogles at hotmail.com
Wed Oct 29 18:37:57 UTC 2025


The following are 8 pending applications in the publication queue relevant to this subject.  I have been providing the USPTO feedback on trademarks like these for the past few months. Are my comments below appropriate?  If not, what corrections if any do you recommend for these trademarks with Unicode characters outside the USPTO's standard character set?  (Mostly, I recommend replacing the unexpected Unicode characters with standard characters that resemble the Unicode characters.)

(For each entry: #, SN, wordmark, drawing, comment.)

1. SN 79404822. Wordmark:
KOOKAЇ
Drawing: [Image for 79404822]
Comment: The final letter Ї is the Unicode character with the decimal value 1031, the CYRILLIC CAPITAL LETTER YI. The search WD:*Ї* AND MD:4 using this CYRILLIC CAPITAL LETTER YI retrieves only this KOOKAЇ trademark, while the search WD:*Ï* AND MD:4 using the standard character Ï retrieves over 3.7 million trademarks. Revise the wordmark for this pending trademark to use the standard character Ï; otherwise, add the Cyrillic characters design code.

2. SN 79408087. Wordmark:
FREEZETECH Α
Drawing: [Image for 79408087]
Comment: This application is scheduled for publication. Considering the α in the drawing and the description of mark [The mark consists of the stylized wording FREEZETECH A with the lower case letter "A" taking the form of the Greek letter alpha.], isn't the design code 28.01.05 - Alpha (Greek letter) appropriate for this STYLIZED TEXT application?

3. SN 79416509. Wordmark:
GX・SONIC STREAM
Drawing: [Image for 79416509]
Comment: Substitute the standard character · for • (the Unicode character with decimal value 65381 [Halfwidth Katakana Middle Dot?], which is NOT in the Standard Character Set) in the wordmark for this Standard Character Mark?

4. SN 79418788. Wordmark:
ΑΜΟΙ
Drawing: [Image for 79418788]
Comment: There are Unicode Greek (non-Latin) characters in the wordmark. Without the Greek characters design code, the current wordmark is deceptive and could impact searching for this STYLIZED TEXT mark.
  Α = U+0391 : GREEK CAPITAL LETTER ALPHA
  Μ = U+039C : GREEK CAPITAL LETTER MU
  Ο = U+039F : GREEK CAPITAL LETTER OMICRON
  Ι = U+0399 : GREEK CAPITAL LETTER IOTA {iota adscript}

5. SN 98875872. Wordmark:
W.E.L.L. NEWS  ―  WELLNESS, EATING, LIVING, & LEARNING.
Drawing: [Image for 98875872]
Comment: The ― character is the Unicode character with the decimal value 8213, a Horizontal Bar character. Since it is not a standard character, shouldn't the USPTO substitute a similar valid standard character?

6. SN 99095415. Wordmark:
RАРА+
Drawing: [Image for 99095415]
Comment: There are non-Latin characters in the wordmark but no design codes for those non-Latin characters. More specifically, the 2nd, 3rd & 4th characters are:
  U+0410 : CYRILLIC CAPITAL LETTER A
  U+0420 : CYRILLIC CAPITAL LETTER ER
  U+0410 : CYRILLIC CAPITAL LETTER A

7. SN 99181596. Wordmark:
OMMISIMQIST​
Drawing: [Image for 99181596]
Comment: How many characters do you see in the wordmark? Just 11? Actually, there are 12. An unusual Unicode character with decimal value 8203 (the ZERO WIDTH SPACE, a non-printing character that can be used to indicate a potential line break opportunity within a word or phrase, without introducing a visible space or hyphen) is appended to the wordmark. Since the ZERO WIDTH SPACE is not included in the standard character set, delete this ZERO WIDTH SPACE from the wordmark.

8. SN 99194118. Wordmark:
OК
Drawing: [Image for 99194118]
Comment: The second letter К of the wordmark is the Unicode character with decimal value 1050 (Cyrillic Capital Letter Ka), not the standard character K. Substitute the standard character K for the Cyrillic letter К; otherwise, add the appropriate design code for Cyrillic letters.
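
The comments above all come down to spotting code points that fall outside the standard character set. A minimal Python sketch of that kind of check follows. The helper name flag_nonstandard is hypothetical, and treating every code point above 255 as non-standard is an assumption that only approximates the USPTO's standard character set, which is not reproduced in this thread.

```python
import unicodedata

def flag_nonstandard(wordmark, limit=255):
    """Return (position, code point, name) for each character above `limit`.

    Assumption: code points above 255 are treated as non-standard. This
    only approximates the USPTO standard character set.
    """
    flagged = []
    for position, ch in enumerate(wordmark, start=1):
        if ord(ch) > limit:
            name = unicodedata.name(ch, "UNNAMED")
            flagged.append((position, f"U+{ord(ch):04X}", name))
    return flagged

# Escapes are used so that lookalike and invisible characters stay visible.
ok_mark = "O\u041a"               # entry 8: Latin O + Cyrillic KA
zwsp_mark = "OMMISIMQIST\u200b"   # entry 7: trailing ZERO WIDTH SPACE

print(flag_nonstandard(ok_mark))
print(len(zwsp_mark), flag_nonstandard(zwsp_mark))
```

A check like this reports the 12th character of entry 7 even though nothing is visible on screen, and it leaves a wordmark ending in the standard character Ï (decimal 207) unflagged.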

For your convenience, the search SN:( 79404822 79408087 79416509 79418788 98875872 99095415 99181596 99194118 ) retrieves these 8 trademarks.

For the second trademark (79408087 - FREEZETECH Α), it looks like the conversion of wordmarks from lower-case to upper-case changed the Greek α to the upper-case Alpha character that strongly resembles the standard character A.
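
The effect described above is easy to reproduce with ordinary Unicode upper-casing. The short Python sketch below is only an illustration: the input string is a hypothetical rendering of the applicant's wording, and nothing here is known about the USPTO's actual conversion code.

```python
import unicodedata

# Hypothetical input: wording that ends with GREEK SMALL LETTER ALPHA.
original = "FreezeTech \u03b1"
uppercased = original.upper()

last = uppercased[-1]
print(f"U+{ord(last):04X}", unicodedata.name(last))
print(last == "A")   # the result only looks like the Latin capital A
```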

As it happens, only the 6th trademark (99095415 - RАРА+) was not included in the feedback that I provided to the USPTO on October 7th.

Thank you in advance for your comments,
Ken Boone

PS - Everyone noticed that registration 8,000,000 (8 Million) was issued on Tuesday, right?
________________________________
From: E-trademarks <e-trademarks-bounces at oppedahl-lists.com> on behalf of Ken Boone via E-trademarks <e-trademarks at oppedahl-lists.com>
Sent: Monday, October 13, 2025 5:19 PM
To: For trademark practitioners. This is not for laypersons to seek legal advice. <e-trademarks at oppedahl-lists.com>
Cc: Ken Boone <boondogles at hotmail.com>
Subject: Re: [E-trademarks] has USPTO documented whether and how it maps Unicode to ASCII? (was The ZWSP mistake at the Trademark Office)

I don't know where to begin, so I'll begin with what we (i.e., the USPTO developers) did with TESS.

TESS was on an 8-bit character system.  That is, Unicode characters with decimal values above 255 did not exist on TESS.

The TESS online help (yes, I saved a copy, a 170KB htm file) listed the searchable characters, namely

Searchable Characters: The following characters are searchable on TESS.

! # $ % & * + - . / 0 1 2 3 4 5 6 7 8 9
: < = > ? @ A B C D E F G H I J K L M N
O P Q R S T U V W X Y Z ^ ` a b c d e f
g h i j k l m n o p q r s t u v w x y z
~ € ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ • – — ™ š › œ
Ÿ ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯ ° ± ² ³ ´
µ ¶ · ¸ ¹ º » ¼ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò
Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ
ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ù ú û ü
ý þ ÿ
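
To illustrate what an 8-bit limit means in practice, here is a small Python sketch. It uses Windows-1252 (cp1252) as a stand-in, which is an assumption: the listed characters such as € and ƒ have single-byte codes in that code page, but this message does not name the encoding TESS actually used. The helper name fits_in_8_bits is hypothetical.

```python
def fits_in_8_bits(text, encoding="cp1252"):
    """True if every character of `text` has a single-byte code in `encoding`.

    Assumption: cp1252 stands in for whatever 8-bit code page TESS used.
    """
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(fits_in_8_bits("\u20ac \u0192"))    # euro sign and small f with hook
print(fits_in_8_bits("O\u041a"))          # Latin O + Cyrillic KA
print(fits_in_8_bits("OMMISIMQIST\u200b"))  # trailing ZERO WIDTH SPACE
```

Under that assumption, Cyrillic and Greek lookalikes and the ZERO WIDTH SPACE simply have no representation, which is why they could not exist on such a system.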

Obviously, Trademark Search allows for many more characters (including some ASCII characters that were reserved on TESS for search syntax/parameters).

TESS did have characters that were considered equivalent for searching.  Here's a partial list.
The following table lists character equivalents (aside from UPPER CASE to lower case equivalents).
Character  Decimal  Description         Search Equivalents
-          45       minus sign, hyphen  - -­
A          65       capital A           A a À Á Â Ã Ä Å Æ à á â ã ä å æ
C          67       capital C           C c Ç ç
D          68       capital D           D d Ð
E          69       capital E           E e È É Ê Ë è é ê ë
F          70       capital F           F f
I          73       capital I           I i Ì Í Î Ï ì í î ï
N          78       capital N           N n Ñ ñ
O          79       capital O           O o Œ œ Ò Ó Ô Õ Ö Ø ð ò ó ô õ ö ø
S          83       capital S           S s Š š
It's been a while. Since G & H were skipped, they must not have had any diacritical mark equivalents.  Note that F was NOT equated to the standard character ƒ (small italic f, function of - decimal 131).  Why?  I recall a meeting with the OCIO to discuss character equivalents.  No one else from Trademarks attended.  No attorneys. No managers.  As a mathematician (and since TESS did NOT have the Decimal Mark index at that time), I wanted TESS to distinguish ƒ from the ordinary f & F, as I knew ƒ occurred in a few wordmarks, so I insisted that ƒ be indexed separately from f & F.  The same goes for × (multiplication sign, decimal 215), which resembles X and x. But I digress.
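
The equivalence idea can be sketched in a few lines of Python. This is only an illustration built from the partial table in this message, not the TESS implementation; the names EQUIVALENTS, FOLD and fold are hypothetical. Characters not in the table, including ƒ and ×, pass through unchanged, which mirrors the decision to index them separately.

```python
# Rows copied from the partial table: base letter -> listed equivalents.
EQUIVALENTS = {
    "A": "aÀÁÂÃÄÅÆàáâãäåæ",
    "C": "cÇç",
    "D": "dÐ",
    "E": "eÈÉÊËèéêë",
    "F": "f",
    "I": "iÌÍÎÏìíîï",
    "N": "nÑñ",
    "O": "oŒœÒÓÔÕÖØðòóôõöø",
    "S": "sŠš",
}

# Invert the table: each equivalent character points at its base letter.
FOLD = {ch: base for base, chars in EQUIVALENTS.items() for ch in chars}

def fold(text):
    """Map listed characters to their base letter and upper-case plain ASCII.

    Unlisted non-ASCII characters (for example ƒ and ×) are left as they are.
    """
    return "".join(
        FOLD.get(ch, ch.upper() if ch.isascii() else ch) for ch in text
    )

print(fold("Caf\u00e9"))               # é folds to E
print(fold("\u0192") == fold("f"))     # ƒ stays distinct from f
print(fold("\u00d7") == fold("x"))     # × stays distinct from x
```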

I see no similar character equivalence listing in the Trademark Search online help.  I have noted that Trademark Search equates the standard character ß (small sharp s, sz ligature, decimal 223) to the SS character pair (and consequently have not been able to find wordmark entries having ß without also retrieving thousands of SS wordmarks).
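
For what it is worth, equating ß with SS is what default Unicode case mapping does, as the Python lines below show. Whether Trademark Search gets its behavior from such a default is not documented anywhere in this thread, so treat the connection as a guess.

```python
sharp_s = "\u00df"             # ß, LATIN SMALL LETTER SHARP S (decimal 223)

upper = sharp_s.upper()        # default Unicode upper-casing
folded = sharp_s.casefold()    # default Unicode case folding

print(upper, folded)
print("STRA\u00dfE".casefold() == "STRASSE".casefold())
```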

TESS also had a plurals table.  Here's a partial listing.

      AEROPLANE       AIRPLANE        AEROPLANES      AIRPLANES
      CATALOG         CATALOGUE       CATALOGS        CATALOGUES
      CENTER          CENTRE          CENTERS         CENTRES
      CHECK           CHEQUE          CHECKS          CHEQUES
      COLOUR          COLOR           COLOURS         COLORS
      COMPUTERIZED    COMPUTERISED
      DISC            DISK            DISCS           DISKS
      ENCYCLOPEDIA    ENCYCLOPAEDIA   ENCYCLOPEDIAS   ENCYCLOPAEDIAS
      FERTILISER      FERTILIZER      FERTILISERS     FERTILIZERS
      INQUIRY         ENQUIRY         INQUIRIES       ENQUIRIES

Basically, that plurals table allows for some equivalents between USA and British terms (aside from allowing unusual plural forms).  I see no similar plurals table in the Trademark Search online help.  By my quick check, AEROPLANE and AIRPLANE are NOT equivalent for searching on Trademark Search, so Trademark Search apparently did not incorporate the plurals table of TESS.
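
A table like that amounts to a set of equivalence classes. The Python sketch below uses a few rows from the partial listing and assumes that every term in a row is interchangeable for searching; that assumption, and the names ROWS, CANONICAL and same_search_term, are illustrative and not taken from TESS itself.

```python
# A few rows from the partial listing; each row is treated as one class.
ROWS = [
    ("AEROPLANE", "AIRPLANE", "AEROPLANES", "AIRPLANES"),
    ("CENTER", "CENTRE", "CENTERS", "CENTRES"),
    ("COLOUR", "COLOR", "COLOURS", "COLORS"),
    ("COMPUTERIZED", "COMPUTERISED"),
]

# Every term points at the first term of its row.
CANONICAL = {term: row[0] for row in ROWS for term in row}

def same_search_term(a, b):
    """True if two terms fall in the same row, or are identical ignoring case."""
    key_a = CANONICAL.get(a.upper(), a.upper())
    key_b = CANONICAL.get(b.upper(), b.upper())
    return key_a == key_b

print(same_search_term("aeroplane", "AIRPLANES"))
print(same_search_term("COLOR", "CENTER"))
```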

Historically, Pre-Exam was tasked with proofreading new applications: checking that all application text was loaded correctly on TRAM, standardizing wordmarks (primarily to update literal elements provided in the application to USPTO standards), adding pseudo marks, updating the mark drawing code, and adding design codes if appropriate.  (Pre-Exam sometimes deleted the standard characters claim when adding design codes for characters like <, >, ¢, ° and µ. Oops!)

I was not aware of any automated pseudo mark system at the time of my 2012 retirement and seriously doubt any such automated pseudo mark system currently exists.

I suspect the USPTO implemented Trademark Search with basically default search parameters provided with the search engine.  Trademark Search does use logical fields (e.g., LD:false) and date fields (where TESS simply used dates in the 8-digit YYYYMMDD format without checking if the numeric values corresponded to actual calendar dates).
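
The difference between an 8-digit number and a real date field is easy to show. The Python sketch below is a generic illustration with a hypothetical helper name; it says nothing about how either TESS or Trademark Search is actually implemented.

```python
from datetime import datetime

def is_calendar_date(value):
    """True only if `value` is 8 ASCII digits naming a real YYYYMMDD date."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
        return True
    except ValueError:
        return False

print(is_calendar_date("20251013"))   # a real date
print(is_calendar_date("20250230"))   # 8 digits, but February 30 does not exist
```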

All that said, I'm thinking Pam & Carl are speculating on pseudo marks for Unicode characters outside the standard character set, but that no automated pseudo marking currently occurs to associate non-Latin characters (e.g., Greek or Cyrillic or whatever Unicode allows) with characters in the standard character set.

Also, I'm pretty sure Pre-Exam (or their contractors) is not staffed with IT professionals who would have the skills to recognize Unicode characters outside the standard character set that happen to resemble standard characters.

Since Trademark Search was first introduced, I have been working on searches on Trademark Search that identify characters outside the standard character set. I think I have some effective searches, but I'm not inclined to share those searches (other than several searches that I've already provided).  The TESS online help does not include any USPTO guidelines for pseudo marks or wordmarks.  As I recall, those guidelines were included in the help file for TradeUps, the edit tool used by Pre-Exam and other trademark employees to add/edit records on TRAM.

Basically, I don't believe the USPTO currently maps Unicode to ASCII beyond equating some a-z characters bearing diacritical marks with their plain a-z equivalents. (I recall that some standard characters with decimal values above 255 were not equated to the plain a-z characters, but I did not keep firm records of those exceptions, and Trademark Search has been rebuilt several times since I performed those checks.)

Happy Columbus Day,
Ken Boone

________________________________
From: E-trademarks <e-trademarks-bounces at oppedahl-lists.com> on behalf of Carl Oppedahl via E-trademarks <e-trademarks at oppedahl-lists.com>
Sent: Monday, October 13, 2025 10:42 AM
To: For trademark practitioners. This is not for laypersons to seek legal advice. <e-trademarks at oppedahl-lists.com>
Cc: Carl Oppedahl <carl at oppedahl.com>
Subject: [E-trademarks] has USPTO documented whether and how it maps Unicode to ASCII? (was The ZWSP mistake at the Trademark Office)

On 10/13/2025 7:53 AM, Pamela Chestek wrote:
That's the assumption I'm not willing to make because I'm not sure what happens when they are added to the pseudomark field. Do the lookalike Cyrillic letters get coded as the regular character lookalike? If you're searching the CM field for the mark, you will be searching the pseudomark database too.

Thank you Pamela for posting this.  Thanks to you I now know to wonder about this.  For example I wrote this:

And after that is [Unicode] U+0415 : CYRILLIC CAPITAL LETTER IE.  It looks like an "E" but you won't get a match if you search for an "E".
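
The reason for the failed match can be shown directly. In the Python lines below, the two letters look alike on screen but are different code points, so a plain string comparison or substring search treats them as unrelated. The sample mark text is hypothetical.

```python
import unicodedata

cyrillic_ie = "\u0415"
latin_e = "E"

print(ord(cyrillic_ie), unicodedata.name(cyrillic_ie))
print(ord(latin_e), unicodedata.name(latin_e))
print(cyrillic_ie == latin_e)

# Hypothetical mark text that begins with the Cyrillic letter.
print("E" in "\u0415CHO")
```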

The two search examples that you provided suggest that maybe some human-initiated process at the Trademark Office sometimes leads to a mapping of a U+0415 into an ASCII 69 (a Latin "E") in a PM field.  Or maybe this is actually an automated process that reacts to any U+0415 character by (a) creating a PM field and (b) mapping U+0415 to ASCII 69 in that newly created PM field.  So now we get to start asking questions.

If a mark in an application or registration contains U+0415, does Trademark Office software ensure that (a) the application or registration automatically gets a pseudomark entry, and (b) the Unicode character gets coded as ASCII decimal 69?  (Call this "automated software mapping of Unicode to ASCII into the PM field" or "ASM".)

Or does the Trademark Office rely upon human-being processing of each application for some human being in the Trademark Office to actively decide to create a pseudomark entry for that application, meaning that there might be applications and registrations where no PM entry got created despite presence of (say) a U+0415 in the mark?  In other words human error might lead to a failure to create a PM entry in an application where it is actually needed?

If the latter (if the Trademark Office relies upon human-being processing), then when the human being decides to create a pseudomark entry for that application, is it automatic (in software) that the U+0415 gets mapped to ASCII 69?  Or does it rely upon alertness and accuracy for the human being to map the U+0415 to ASCII 69?   In other words human error might lead to a failure to map the U+0415 to ASCII 69?  Maybe the human being might inadvertently fail to map it at all?  Maybe the human being might inadvertently map it to some other ASCII value?

Let's suppose that it was only recently that the Trademark Office commenced ASM for newly filed applications.  If so, then it would be safe to assume that there are countless numbers of earlier-filed applications and registrations in which the ASM had not taken place and in which U+0415 characters represent guaranteed search failures.  Maybe the Trademark Office needs to carry out a "clean-up" ASM on the entire corpus of earlier-filed applications and registrations?  Has the Trademark Office done so?

One could imagine a number of ways that an ASM process directed to the entire corpus of earlier-filed applications and registrations could lead to unintended results.  Suppose that a particular registration contains an ASCII 36 (a dollar sign) that got mapped (years ago) into a PM entry with an ASCII 83 (a Latin S).  And suppose that particular registration also contains a U+0415 that until now had not gotten mapped into a PM entry.  Now let's say the Trademark Office decides to do a "clean-up" ASM on the entire corpus of earlier-filed applications and registrations.  The ASM will presumably encounter the U+0415 in that particular registration and will decide "we had better create a PM entry".  Does that lead to a discard of the previous PM entry that mapped the ASCII 36 to the ASCII 83?

But also recall the Invitation to the Public to Submit Suggestions Regarding Database Design Codes and Pseudo-Marks<https://www.uspto.gov/web/offices/com/sol/og/2009/week52/TOCCN/item-381.htm#cli381> (OG notice December 29, 2009).  Some PM entries exist not because of some automatic mapping but instead because a human being decided the PM entry needed to exist.  It might be that a Trademark Office person decided it.  Or the applicant or registrant decided it.  Or a third party decided it.  In any case suppose some PM entry in a particular application or registration exists because a human being decided it should be so.  And now suppose that particular application or registration also contains a U+0415 that until now had not gotten mapped into a PM entry.  Will the "clean-up" ASM lead to a discard of the previous PM that happened due to human intervention?
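
One way to picture a "clean-up" pass that does not discard earlier work is sketched below in Python. Everything in it is hypothetical: the lookalike table, the function names, and the sample mark are made up for illustration, and nothing is known about how the USPTO stores or updates PM entries. The point is only that an automated proposal can be added alongside existing entries instead of replacing them.

```python
# Hypothetical lookalike table; the USPTO has published no such mapping.
LOOKALIKES = {"\u0415": "E", "\u0410": "A", "\u041a": "K", "$": "S"}

def propose_pseudo_mark(mark):
    """Return the mark with lookalikes replaced, or None if nothing changed."""
    mapped = "".join(LOOKALIKES.get(ch, ch) for ch in mark)
    return mapped if mapped != mark else None

def clean_up(mark, existing_pm):
    """Add the automated proposal to the existing PM entries.

    Earlier entries, whether automated or entered by a person, are kept.
    """
    merged = list(existing_pm)
    proposal = propose_pseudo_mark(mark)
    if proposal is not None and proposal not in merged:
        merged.append(proposal)
    return merged

# Hypothetical record: "$" was mapped years ago, U+0415 was not.
mark = "CA$H B\u0415T"
existing = ["CASH B\u0415T", "MONEY WAGER"]
print(clean_up(mark, existing))
```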

I would be astonished if it were to turn out that ASM already exists in USPTO software.   There are at least two reasons why this would astonish me.  First, you can go to the search box at the USPTO web site and plug in "Unicode".  Yes you will get a few hits, but they are limited to two areas -- patent application DOCX, and ST26 genetic sequence listings.  Not one of the Unicode hits leads to anything about trademarks or pseudomarks.

Now we all know the search box at the USPTO web site is very limited in what it can find.  We all know that Google (for example) finds lots of stuff on the USPTO web site that the local search box won't find.  So you can also use Google to try to find places where the USPTO somehow discussed or discusses Unicode as it relates to trademarks or pseudomarks.  But that also comes up empty.

It seems to me that if the Trademark Office had gone to the trouble to create ASM, surely this would have somehow leaked out into some place that would reveal itself in a web search.

But let's suppose that the Trademark Office has, against all odds, actually created and implemented ASM despite no hint or suggestion of it in the public record.  If so, then one would hope that in the spirit of transparency, the Trademark Office would publish the mapping.  Some published document would say "we always map U+0415 to ASCII 69".  This way everyone would know what to expect.  And if the mapping were flawed (suppose the Trademark Office failed to realize that U+0415 needs to get mapped to ASCII 69?) then there would be an opportunity for the trademark community to suggest remedies for such failures.

It should come as no surprise that in various open-source software communities, this mapping of Unicode to ASCII is well-plowed ground.  Each open-source software community is filled with very smart people who think about such things.  Each open-source programming language (for example PHP and Python) has one or more functions for trying to do such mappings.
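
As one concrete example, Python's standard library offers unicodedata.normalize. The sketch below (with a hypothetical helper name, to_ascii) decomposes accented letters and then drops whatever is not ASCII. It handles a Latin letter with a diacritical mark, but a lookalike from another script is simply dropped, not mapped to its Latin twin. Mapping lookalikes needs a separate table of visually confusable characters, such as the confusables data that the Unicode Consortium publishes.

```python
import unicodedata

def to_ascii(text):
    """Decompose accented letters (NFKD), then drop anything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("KOOKA\u00cf"))   # Latin I with diaeresis keeps its base letter
print(to_ascii("KOOKA\u0407"))   # Cyrillic YI is dropped entirely
print(to_ascii("O\u041a"))       # Cyrillic KA is dropped entirely
```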

Maybe the Trademark Office has already reached out to one or more of the open-source software communities and has learned ways to be smart about Unicode.  Or maybe not.



