
Sunday, August 25, 2024

Ouroboros, an apt symbol for AI model collapse

Engraving of a wyvern-type ouroboros by Lucas Jennis, in the 1625 alchemical tract De Lapide Philosophico

by Ariella Brown


AI has hit the ouroboros (sometimes written uroboros) stage. You've likely seen the ancient symbol in the form of a snake in a circle, eating its own tail. It also sometimes appeared as a dragon or a wyvern, which is why I chose this engraving by Lucas Jennis, intended to represent mercury in the 1625 alchemical tract "De Lapide Philosophico," as my illustration instead of just going with something as prosaic as "model collapse."


To get a bit meta and bring generative AI into the picture (pun intended, I'm afraid), here's an ouroboros image I got when I asked Google Gemini to make one.

Ouroboros image generated by Google Gemini



Model collapse is what the researchers who published their findings in Nature called the phenomenon of large language models (LLMs) doing the equivalent of eating their own tails by ingesting LLM output as training data for new generations. They insist that models should be trained on "data collected about genuine human interactions."

From the abstract:
"Here we consider what may happen to GPT-{n} once LLMs contribute much of the text found online. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. We refer to this effect as ‘model collapse’ and show that it can occur in LLMs as well as in variational autoencoders (VAEs) and Gaussian mixture models (GMMs). We build theoretical intuition behind the phenomenon and portray its ubiquity among all learned generative models. We demonstrate that it must be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of LLM-generated content in data crawled from the Internet."

Shumailov, I., Shumaylov, Z., Zhao, Y. et al. AI models collapse when trained on recursively generated data. Nature 631, 755–759 (2024).
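
The mechanism is easy to see in miniature. Here's a minimal sketch of my own (not the paper's code) that swaps a single Gaussian in for an LLM: fit a model to the data, sample from it, refit on the samples, and repeat. The heavy tails of the original distribution, measured here by excess kurtosis, vanish after the very first refit, and finite-sample estimation noise lets the fitted variance drift from generation to generation:

```python
import numpy as np

rng = np.random.default_rng(42)

def excess_kurtosis(x):
    # Heavy tails show up as excess kurtosis > 0; a Gaussian scores ~0.
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0

# Generation 0: "human" data with heavy tails (Student's t, df = 5).
data = rng.standard_t(df=5, size=500)
print(f"gen  0: std={data.std():.3f}  excess kurtosis={excess_kurtosis(data):.2f}")

for gen in range(1, 21):
    # Fit a simple model (a single Gaussian) to the current data...
    mu, sigma = data.mean(), data.std()
    # ...then train the next generation only on the model's own samples.
    data = rng.normal(mu, sigma, size=500)
    if gen % 5 == 0:
        print(f"gen {gen:2d}: std={data.std():.3f}  "
              f"excess kurtosis={excess_kurtosis(data):.2f}")
```

It's the ouroboros in twenty lines: each generation can only reproduce what the previous model captured, so whatever the model missed, above all the rare events in the tails, is gone for good.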

Let me know in the comments which illustration you like more. 

Thursday, August 15, 2024

How to increase traffic 16,500%: clickbait vs. reality

Fish on computer monitor caught on hook


by Ariella Brown

I just attended an event with the title "Information Gain Content: How to Increase Traffic 16,500% by Going Above & Beyond with Bernard Huang." Did the session live up to its clickbait title?


Not at all. 


This wasn't necessarily Bernard Huang's fault. The presentation was hosted by the Top of the Funnel group, which favors these types of large numbers that sound just specific enough that people may believe they're real.



Earlier this month, it offered "Social Copywriting Secrets: Building an Audience of 114,985 with Eddie Shleyner." I attended that one, too, and there was nothing in it that justified that number as the guaranteed result of some tactic you could apply. Shleyner just emphasized sticking to good, authentic storytelling to keep your audience engaged.


There were no easy-to-apply tricks in this session. If anything, it was about the reason the old tricks no longer work. Huang explained that Google is currently applying its stated standard of Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T) in the context of its AI Overviews.


My main takeaway from the session was not that it's easy to increase your traffic but that Google has simply swapped one set of algorithmic rules for another, now including its own AI readings of content, and that this is not necessarily a good thing.

What's bad about this?


The bottom line is that Google is relying very heavily on consensus as well as pre-established authority. That means it is very easy for sites that have already built up big audiences and strong rankings on Google to leverage that position to put out proclamations that will be accepted as true, especially when they are -- inevitably -- echoed by wannabe followers and all those who pretend to be thought leaders by parroting what influencers already say.


In other words, truly original thoughts from those who are not just reinforcing groupthink will likely be buried. I did raise this question and wasn't wholly reassured by the answer: yes, Google expects all points of consensus on a particular topic to be represented. If they are not present, your content will be an outcast. The only way you have a chance of being noticed is to play the contrarian game by refuting the consensus views point by point. Not giving the consensus that nod is SEO suicide.

And so the current state of search is one that aims to homogenize information according to preset parameters from those already granted expert status, while those who are not in line with the consensus are buried in the obscurity of high-numbered results pages.

This is truly the opposite of a democratic platform in which understanding of history and current events is allowed to rise or fall based solely on its own merit rather than the pre-established narrative. George Patton would be appalled. He's the one who said: "If everyone is thinking alike, then somebody isn't thinking."

Google's algorithms are designed to reward those who follow in the paths preset by others rather than really thinking for themselves. 


P.S. For the story behind the illustration above, see my LinkedIn post.


Related:

Aim higher than SEO for your marketing

The 6 step plan that fails

What Edison can teach us about SEO

Put SEO in the picture


You can also follow Ariella Brown.  

Thursday, July 25, 2024

Headline check

Johnny Rose on "Schitt's Creek," saying "What am I looking at?"


By Ariella Brown


I've never cared for the way commas are used in headlines set in AP style. Written text should correspond to the language we speak as much as possible.

Substituting a comma for the word "and" in this instance actually slows readers down: the standard formulation of a comma between two adjectives makes one assume they both apply to the same noun, so readers have to stop and work out that this can't be the intent of the headline.

Really, why write "How younger, older B2B marketers differ" instead of "How younger and older B2B marketers differ"?

The latter is both clearer and flows better than the former. Saving those three keystrokes is an insignificant gain for a loss of clarity and rhythm. IMHO. Does that make it wrong? I decided to investigate.

As a grammar geek, I got very excited to discover that I am not alone in feeling these headlines mangle English. That very point is the subject of a Stack Exchange discussion from a few years ago. The gist of the investigation amounts to these points:

1. The substitution of a comma for "and" is assumed to be motivated by a desire to save space or to sound punchier, though it's not something mandated by AP style. In other words, it's not incorrect to include the "and" in the title.
2. The suspicion that this practice is related to digital publications is not supported by evidence. There are examples of such headlines in print going back at least to 1990.
3. It's also not a practice peculiar to American outlets, a theory that may have arisen from those with some bias against American writing.

Now to return to the very first point here, I stick with my objection. For this particular formulation, the effect of dropping the "and" is not punchier and definitely not clearer; it seems to be an affectation of writers or editors who believe that an "and" never belongs in a headline.

Whoever is responsible for blindly following this unfortunate trend has forgotten the one golden rule of writing that George Orwell proclaimed in his famous essay "Politics and the English Language": "Break any of these rules sooner than say anything outright barbarous."


You can also follow Ariella Brown.  

Tuesday, July 23, 2024

Aim higher than SEO for your marketing content


You know, Jane Austen could have opened "Pride and Prejudice" with the standard line, "Once upon a time there was a family with five girls and no sons to inherit the estate that they depended on for their support." She didn't.

Pride and Prejudice tote with the opening line of the novel
https://www.zazzle.com/pride_and_prejudice_tote_bag-149363338488539996


Instead of sticking with the safe formulation, she crafted one of the most memorable openers for a novel that also gives the readers a taste of her wit and sense of irony. That opening line is Austen's brand in a nutshell.


This is what businesses should be striving for in the opening lines on their sites and in their reports. Generative AI will not deliver that because it works off pre-existing models. Simply tweaking its output will still not result in something truly fresh, though it may be just good enough to avoid the generative AI penalty Google has promised for those aiming at high SEO results.


Achieving SEO goals is not the same as making a memorable impression on your target audience when they click through to your site or blog.


What impresses Google is not necessarily going to move your target market to establish a relationship with your brand. The content that does can only be produced by a combination of analytics and human creativity.


You can't just be content with optimizing for search engines by following SEO guidelines when you need RO -- responsiveness optimization -- which requires blazing your own brand path.


That's what Write Way branding and marketing is all about. Learn more about my business offerings here.


Related

What B2B content marketers get wrong
Add a pinch of salt to creative claims for AI
Most Memorable Brand Slogans
What Edison Can Teach Us About SEO
Pride, Prejudice and Persuasion: Obstacles to Happiness in Jane Austen's Novels


You can also follow Ariella Brown.  

Friday, June 28, 2024

An Apology to Generative AI

ChatGPT spelled out in Scrabble tiles

By Ariella Brown


I'm not a generative AI fangirl. If anything, I'd consider myself more of a skeptic because people tend to not just use it as a tool to improve their writing but as a tool to replace the work of research, composition, and revision that is essential to good writing.

It is generally embraced by people who consider online research to be too much work and who believe that anything coming out of a machine that charges them no more than $20 a month for writing is too good a deal to pass up.

For those of us who actually read, the output of ChatGPT and similar LLMs is not exactly something to write home about. Unless you know how to prompt it and train it to write in a truly readable style, it will default to the worst of wordy, opaque corporate style text. 

But this isn't the fault of the technology. It's the fault of the mediocre content that dominates the internet and trained it. Below is one example that I pulled off the "About" section of a real LinkedIn profile (first name Kerri maintained in the screenshot that proves this is real and not something I made up):

LinkedIn screenshot

As a strategic thinker, problem-solver, and mediator, I thrive in managing multiple, sometimes differing inputs to achieve optimal messaging and positioning. My proactive nature drives me to partner with leaders across marketing teams and internal business units, aligning efforts, connecting dots, and adding context to enable flawless execution of communication strategies and tactics.


In fast-paced, fluid environments, I excel in effectively prioritizing tasks and ensuring they are completed efficiently. I have a proven track record of setting and meeting strict deadlines and budgets, leveraging my ability to navigate dynamic landscapes seamlessly.

Driven by natural curiosity, I am constantly seeking to understand and implement the latest trends, technologies, and tactics essential for driving B2B sales opportunities. My keen interest in exploring new channels for messaging and content distribution fuels my passion for innovation and continuous improvement to not just meet but exceed expectations.

Let’s connect to explore how we can drive success together.

You know what sounds exactly like this? Cover letters you ask ChatGPT to compose for you. 

I've tried those out a few times and never been happy with the results because they always sound like the text above. Trying to tell it to sound less stiff doesn't make it sound any less canned, and forget about getting it to copy my own writing style.

It's possible that Kerri used ChatGPT to create her "About" section. Given that she's been in the marketing biz for some time, though, I'd think she must have had something filled out for years before ChatGPT was available, and it likely sounded very much like this even if she did let some LLM or something like Grammarly tweak it for her.

People like Kerri, who ignore all writing advice from masters like Orwell, White (watch for an upcoming blog post about him), and others, made this the public face of corporate communication. They are the ones to blame for the bombastic and soulless style that LLMs replicate at scale.


That's the reason for this apology to ChatGPT for mocking its output. You're not the one at fault. You had no way of knowing better. Humans do, and they should have provided you with better models for writing.

Note on the title: I thought of giving this post the title "Apology," intended in the classical sense of a defense or justification of something others take to be wrong, with a hint of an apology to AI. Knowing that wouldn't be clear to some readers, I opted to make this just a straight apology instead.

Related:

A new generative AI comparison




Sunday, June 23, 2024

A new generative AI comparison






Photo by Andrew Neel: https://www.pexels.com/photo/monitor-screen-showing-chatgpt-landing-page-15863066/



In the past, I've compared ChatGPT to Google Bard (now Gemini). For today's exercise, I put the same query to Claude, ChatGPT, Gemini, and Copilot. I put in the name of a real person and identified the location for that individual as Queens, NY. None could give me the information I sought.
What's interesting, though, is noting the differences in how they contextualized their responses. Out of the four, Claude shows the greatest self-awareness, and I'd say that Copilot shows the least.  
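
For anyone who wants to rerun this kind of side-by-side test, here is a rough sketch of how the same query could be scripted against two of the chatbots through their official Python SDKs. The model names and the prompt are placeholders, the API keys are assumed to be set in the usual environment variables, and Copilot has no comparable public endpoint:

```python
# A sketch, not what I actually ran: assumes the `openai` and `anthropic`
# Python packages and API keys in OPENAI_API_KEY / ANTHROPIC_API_KEY.
import anthropic
from openai import OpenAI

PROMPT = "Who is ____ from Queens, NY?"  # placeholder for the real name

gpt = OpenAI().chat.completions.create(
    model="gpt-4o",  # example model name
    messages=[{"role": "user", "content": PROMPT}],
)
print("ChatGPT:", gpt.choices[0].message.content)

claude = anthropic.Anthropic().messages.create(
    model="claude-3-5-sonnet-20240620",  # example model name
    max_tokens=500,
    messages=[{"role": "user", "content": PROMPT}],
)
print("Claude:", claude.content[0].text)
```

Running the identical prompt through each client is what makes the differences in how they contextualize their responses, shown below, easy to compare.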

Claude's response

As this appears to be asking about a relatively obscure individual, I want to remind you that while I try to be accurate, I may hallucinate or generate incorrect information when asked about very specific people who are not widely known public figures. If you're looking for factual information about this person, I'd recommend checking other reliable sources.

I award Claude points for warning me about the possibility of hallucinations occurring when generative AI attempts to answer questions in the absence of data.

ChatGPT's response

I don't have specific information about an individual named ____ from Queens, NY. It's possible that she is a private individual or someone not widely known beyond her personal circles. If she is known for something specific like public contributions, achievements, or notable activities, that information might be available through local news sources or community records.

ChatGPT's response failed to warn about hallucination, though it's interesting that it inferred the person in question was a she (based on the first name, I suppose), a gender identification that Claude managed to avoid.

Google Gemini's response

Gemini just bowed out, practically chiding me for giving it this query that it could not answer:
 
I do not have enough information about that person to help with your request. I am a large language model, and I am able to communicate and generate human-like text in response to a wide range of prompts and questions, but my knowledge about this person is limited. Is there anything else I can do to help you with this request?


Copilot's response

Copilot went in a completely different direction. I wouldn't call it a hallucination, but it was still inaccurate. It wanted to find a match, so in one case it named a business with the last name appearing twice, as in a partnership, though it admitted that was in Brooklyn rather than Queens. Its second suggestion was a male doctor in Queens with a similar last name, though spelled differently. Honestly, I find it hard to believe that these are the only two results for the last name in all of NYC. Once you're expanding the parameters, there really should be a lot more than two possible results.


Related

An A/B test of generative AI
Bard brings in the clowns


You can also follow Ariella Brown.