TED transcript text analysis

Before the TED Open Translation Project could begin, the talks on TED.com had to be transcribed. It turns out that over 100 hours of footage can render an incredible amount of text -- in fact, over 1,000,000 words.

One of the ancillary benefits of having TED's content as a single body of text (besides exposing the content to search engines) is that it can now be analyzed directly. We compiled the transcripts into a single text corpus and, using TextSTAT software to track frequencies and concordances, followed our curiosity. Here are some of our findings.
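For readers who want to try something similar, here is a minimal Python sketch of the two measurements mentioned above: word frequencies and concordances (each occurrence of a word shown with its neighbors). It is not how TextSTAT works internally. The tokenizer pattern, the function names, the sample sentence, and the choice to keep words case-sensitive are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed tokenizer: a run of letters, optionally with internal apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def tokenize(text):
    """Split text into word tokens, keeping the original case."""
    return WORD_RE.findall(text)

def word_frequencies(text):
    """Count how often each word form occurs (case-sensitive)."""
    return Counter(tokenize(text))

def concordance(text, keyword, width=2):
    """List each occurrence of keyword with `width` words of context per side."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            # Upper-case the keyword so it stands out in its context.
            lines.append(" ".join(left + [tok.upper()] + right))
    return lines

# Made-up sample text, not taken from any transcript.
sample = "The idea is simple. An idea worth spreading is an idea that lasts."
freq = word_frequencies(sample)
print(freq["idea"], freq["is"])
for line in concordance(sample, "idea"):
    print(line)
```

On a real corpus, the sample string would be replaced by the concatenated transcript text read from disk.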

Text data from the first 400 TEDTalks (approximate figures)

1,077,000 words spoken
37,000 unique word forms used
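The two figures above are what corpus linguists call tokens (every word spoken) and types (distinct word forms). With the approximate numbers given, that is roughly 37,000 / 1,077,000, or about 3.4 distinct forms per 100 words. A rough sketch of how both counts could be computed follows. Lower-casing the text and the letters-only token pattern are assumptions, and the original analysis may have counted differently.

```python
import re

def corpus_size(text):
    """Return (total word tokens, unique word forms), ignoring case."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return len(tokens), len(set(tokens))

# Made-up sample text, not taken from any transcript.
sample = "We choose to go to the moon. We choose to go."
total, unique = corpus_size(sample)
print(f"{total} words spoken, {unique} unique word forms")
```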

The top 10 most common ...

Words:

1. the (48918)
2. to (30049)
3. of (28977)
4. a (25859)
5. and (25177)
6. that (20350)
7. in (18023)
8. I (17935)
9. is (16043)
10. you (15613)

6-letter words:

1. people (3679)
2. really (2652)
3. things (2253)
4. little (1686)
5. around (1101)
6. called (876)
7. should (655)
8. before (654)
9. change (638)
10. design (630)

7-letter words:

1. because (2631)
2. through (952)
3. thought (797)
4. percent (729)
5. started (726)
6. another (712)
7. looking (607)
8. million (603)
9. problem (549)
10. working (502)

8-letter words:

1. actually (1828)
2. together (582)
3. question (529)
4. building (514)
5. computer (444)
6. thinking (439)
7. children (427)
8. anything (411)
9. probably (409)
10. happened (388)

9-letter words:

1. something (1600)
2. different (986)
3. important (654)
4. everybody (334)
5. basically (324)
6. beautiful (292)
7. countries (256)
8. wonderful (254)
9. community (251)
10. beginning (223)

10-letter words:

1. technology (620)
2. understand (485)
3. everything (473)
4. experience (369)
5. completely (259)
6. particular (219)
7. interested (218)
8. difference (206)
9. themselves (196)
10. government (177)
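Per-length rankings like the ones above can be derived from a single frequency table by filtering on word length before taking the top entries. The sketch below assumes a table held in a Python Counter. The function name is hypothetical, and only a handful of counts from the lists above are included.

```python
from collections import Counter

def top_by_length(freq, length, n=10):
    """Return the n most frequent words with exactly `length` letters."""
    same_length = Counter({w: c for w, c in freq.items() if len(w) == length})
    return same_length.most_common(n)

# A few counts copied from the lists above; a real run would use the full table.
freq = Counter({
    "people": 3679, "really": 2652, "because": 2631,
    "actually": 1828, "something": 1600, "technology": 620,
})
print(top_by_length(freq, 6))
print(top_by_length(freq, 10))
```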

The nine longest words:

methoxydimethyltryptamine (25) - Wade Davis
electroencephalographic (23) - Sherwin Nuland
neuroendocrinologists (21) - Stuart Brown
Buckminsterfullerene (20) - Peter Ward
electroencephalogram (20) - Robert Fischell
institutionalization (20) - Ben Dunlap
paleoanthropologists (20) - Zeresenay Alemseged
radiopharmaceuticals (20) - Ernest Madu
uncharacteristically (20) - Richard Dawkins
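A list like this one (word, letter count in parentheses) can be produced by sorting the distinct word forms by length. In the sketch below, ties are broken alphabetically without regard to case, which matches the order of the 20-letter words above but is still an assumption about how the list was built. Attributing each word to a speaker would additionally require per-talk metadata, which is not shown.

```python
def longest_words(words, n=9):
    """Return the n longest distinct words as (word, length), longest first."""
    distinct = set(words)
    # Sort by descending length, then alphabetically ignoring case.
    ranked = sorted(distinct, key=lambda w: (-len(w), w.lower()))
    return [(w, len(w)) for w in ranked[:n]]

# Words taken from the lists on this page, with one deliberate duplicate.
sample = ["technology", "methoxydimethyltryptamine", "idea",
          "electroencephalographic", "technology"]
print(longest_words(sample, n=2))
```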

200 interesting words that appear 100+ times (rank / word / occurrences):

123. life (1251)
143. different (985)
165. thought (794)
168. idea (780)
178. percent (728)
183. human (699)
196. important (650)
201. change (635)
203. design (628)
211. technology (602)
218. million (590)
228. Africa (548)
230. problem (542)
237. question (527)
241. space (520)
244. system (517)
248. building (505)
253. dollars (487)
255. understand (485)
264. money (460)
270. interesting (453)
273. number (446)
274. brain (445)
279. example (437)
282. thinking (435)
289. computer (419)
290. water (419)
296. book (405)
298. children (404)
305. planet (381)
307. data (375)
311. car (370)
315. create (364)
318. future (360)
327. energy (353)
328. God (350)
329. information (350)
332. school (346)
347. universe (331)
354. science (326)
355. simple (325)
358. reason (323)
363. species (315)
365. process (314)
372. business (306)
376. billion (302)
379. food (300)
381. Earth (299)
384. hope (295)
386. possible (295)
391. beautiful (292)
392. history (292)
399. war (289)
401. body (284)
405. TED (281)
406. answer (280)
409. nature (279)
421. amazing (265)
424. wrong (264)
430. company (258)
431. imagine (258)
439. wonderful (253)
440. word (253)
453. community (244)
464. learn (240)
465. family (239)
467. feet (237)
478. child (231)
488. America (226)
491. happy (224)
500. market (221)
512. social (216)
516. game (213)
520. art (211)
527. heart (207)
529. per (207)
530. surface (207)
535. disease (206)
539. natural (205)
545. DNA (201)
547. machine (200)
552. global (196)
555. air (194)
558. program (193)
559. theory (193)
560. Internet (192)
565. mother (191)
567. phone (190)
568. research (190)
569. language (189)
570. learned (189)
571. force (188)
575. shape (187)
580. structure (186)
584. environment (184)
595. ourselves (180)
598. health (179)
601. evolution (178)
602. education (177)
603. oil (176)
605. cell (175)
606. incredible (175)
610. government (174)
626. study (168)
627. American (167)
628. film (167)
631. society (166)
642. culture (162)
643. death (162)
660. die (157)
667. pattern (156)
680. image (153)
685. movie (152)
693. material (149)
706. industry (146)
707. learning (146)
709. powerful (146)
713. animal (145)
719. population (144)
724. eat (142)
727. bacteria (141)
728. behavior (141)
735. software (139)
738. complex (138)
739. focus (138)
745. ocean (136)
746. song (136)
749. knowledge (135)
750. poor (135)
758. cancer (133)
760. sea (133)
761. cities (132)
769. Google (130)
772. relationship (130)
776. exciting (129)
777. experiment (129)
781. average (128)
785. happiness (127)
789. African (126)
792. code (126)
793. color (126)
794. effect (126)
795. fish (126)
797. parents (126)
798. sun (126)
799. died (125)
802. tree (125)
804. generation (124)
808. digital (123)
809. economic (123)
810. media (123)
812. truth (123)
816. father (122)
819. reality (122)
824. developing (121)
825. economy (121)
826. explain (121)
833. fly (120)
836. result (120)
838. version (120)
839. China (119)
848. solar (118)
849. biology (117)
852. dollar (117)
854. nuclear (117)
855. particles (117)
861. map (116)
866. situation (116)
867. English (115)
870. web (115)
872. India (114)
874. genes (114)
878. teach (114)
882. green (113)
884. product (113)
887. aid (112)
888. alive (112)
890. dead (112)
891. foot (112)
892. growth (112)
906. challenge (110)
908. poverty (110)
909. solve (110)
913. blue (109)
917. physical (109)
918. physics (108)
938. critical (105)
942. political (105)
944. screen (105)
946. slide (105)
952. beauty (104)
953. climate (104)
954. develop (104)
956. forest (104)
974. TV (102)
979. ants (101)
981. creative (101)
982. intelligence (101)
986. Mars (100)
992. moral (100)
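The rank / word / occurrences layout above keeps each word's position in the full frequency table, which is why the rank numbers skip. Which words count as "interesting" was an editorial choice. The sketch below stands in for that choice with a hypothetical exclusion set, and uses a tiny table that mixes counts from this page with one invented low-frequency entry to show the 100+ cutoff.

```python
from collections import Counter

def ranked_selection(freq, min_count=100, exclude=frozenset()):
    """Rank all words by frequency, then keep those that pass the filters."""
    rows = []
    for rank, (word, count) in enumerate(freq.most_common(), start=1):
        if count >= min_count and word not in exclude:
            rows.append((rank, word, count))
    return rows

# "the", "to", "life" and "idea" use counts from this page;
# "exampleword" is an invented entry that falls below the cutoff.
freq = Counter({"the": 48918, "to": 30049, "life": 1251,
                "idea": 780, "exampleword": 3})
excluded = {"the", "to"}  # hypothetical stand-in for the editors' judgment
for rank, word, count in ranked_selection(freq, exclude=excluded):
    print(f"{rank}. {word} ({count})")
```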