Difference between revisions of "Webpage Guide"

From UCB Math Wiki
Latest revision as of 07:58, 16 August 2024

This page provides a tutorial on how to manually create your own web page on the Math department server. To do that, you need to learn the basics of HyperText Markup Language (HTML). If you care about the presentation of your web page, it also helps to know some Cascading Style Sheets (CSS).

Setting up a web page (Deprecated)

Hello World page

Below is a minimal example of a well-formed (W3C-compliant) HTML document.

<!DOCTYPE html>
<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <h2>Hello, World!</h2>
  </body>
</html>

Unix command line method

This section explains how to create a web page from a Unix terminal session, assuming you already know a few Unix commands and some HTML. For an introduction to HTML, see below.

  • First, log in to login.math.berkeley.edu (the department Unix shell server) with an SSH terminal emulator, or start the Terminal application (menu item: Applications/Accessories/Terminal) if you are logged in to the Math thin client server keira.math.berkeley.edu.
  • In your home directory, you need to create a directory called public_html with the appropriate permissions. You can do this from a math department computer or over SSH by typing:
mkdir public_html
chmod 755 public_html
  • All files and directories for your webpage should be placed in the public_html directory, and the directory and all its contents must have their permissions set so that others can read the files and read and execute the directories.
  • Additionally, inside public_html, you should have a file called index.html, which will be your homepage, i.e. the first page that pops up when someone visits your website.
  • You can edit this file with any Unix editor. To use emacs, for example, type
emacs public_html/index.html
  • An alternative to emacs is pico, which is used for writing emails in alpine. Other popular command line editors are vi, vim, and nano. When you're done, type chmod 644 public_html/index.html to give the file the right permissions. Your page should now be visible at /~USERNAME/. You can create other pages by creating other files in the public_html directory. For example, the file public_html/math1b.html will have the URL /~USERNAME/math1b.html.

For more information on managing files on the math department's file server and what the permissions mean, see the page on file management.
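The command-line steps above can also be scripted. The following is a minimal Python sketch, not an official department tool: the function name setup_public_html and its home parameter are illustrative assumptions. It creates public_html with mode 755 and, if none exists yet, a placeholder index.html with mode 644.

```python
import os
from pathlib import Path

# Placeholder page; same markup as the Hello World example above.
HELLO_PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <h2>Hello, World!</h2>
  </body>
</html>
"""


def setup_public_html(home):
    """Create home/public_html (755) with an index.html (644) if missing.

    Hypothetical helper that mirrors the mkdir and chmod steps above.
    """
    site = Path(home) / "public_html"
    site.mkdir(exist_ok=True)
    # chmod explicitly: the mode given to mkdir is filtered by the umask.
    os.chmod(site, 0o755)
    index = site / "index.html"
    if not index.exists():
        index.write_text(HELLO_PAGE)
    os.chmod(index, 0o644)
    return index
```

Calling setup_public_html(Path.home()) on the login server would then be equivalent to the mkdir and chmod commands shown above. An existing index.html is left untouched apart from its permissions.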

GUI method

This method works from any Math thin client computer connected to keira.math.berkeley.edu. The steps are essentially the same but no knowledge of Unix commands is necessary.

  • Start the GUI file manager (a.k.a. Nautilus): either double-click the home folder icon on your desktop or select Home Folder from the Places menu.
  • Create the public_html directory in your home directory: select Create Folder from the right-click menu in the Nautilus window.
  • Verify the directory access permissions: right-click the newly created "public_html" directory icon and select Properties, then select the Permissions tab in the Properties dialog window. Make sure the owner can "create and delete files" while group and others can only "access files".
  • Create an empty HTML document in the public_html directory: double-click the "public_html" icon and select Create Document (Empty File) from the right-click menu in the Nautilus window. Name the empty document index.html.
  • Similarly, verify the index file's permissions: right-click the newly created index.html icon and select Properties, then select the Permissions tab in the Properties dialog window. Make sure the owner can "read and write" while group and others are "read-only".
  • Edit the index document: right-click the newly created index.html icon and select Open with "Bluefish Editor" (or choose any other GUI editor you are comfortable with).
  • You may copy the sample page above and paste it into the editor window, save the document, and view it in the web browser (URL: /~USERNAME/index.html).
Note: All your changes become visible to the world the moment you save the file. A trial-and-error approach is therefore not recommended for the index.html file itself. Use a different file instead, e.g. test.html, and rename it to index.html when you are satisfied with your changes.
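The draft-then-publish workflow from the note above could be scripted as follows. This is only a sketch under stated assumptions: the function name publish and the temporary file suffix .new are made up for illustration, and the draft is copied rather than renamed so that you can keep editing it.

```python
import os
import shutil


def publish(site_dir, draft_name="test.html", live_name="index.html"):
    """Copy a finished draft over the live page in one step.

    Hypothetical helper: stages a copy next to the live file, sets its
    permissions, then swaps it in with a single rename.
    """
    draft = os.path.join(site_dir, draft_name)
    live = os.path.join(site_dir, live_name)
    staged = live + ".new"
    shutil.copyfile(draft, staged)  # the draft stays in place for more edits
    os.chmod(staged, 0o644)         # owner read/write, others read-only
    os.replace(staged, live)        # replaces the old page in one rename
    return live
```

Because the live file is replaced by a rename, visitors see either the old page or the new one, never a half-written file.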

Upload method

This method works from any (non-department) computer. Again, the essential steps are the same, but you edit your HTML code on a personal laptop or home computer and upload ready-made web pages to the public_html subdirectory of your Math department home directory.

  1. Log in to login.math.berkeley.edu via SFTP and go to your home directory. Use any GUI or text-based SFTP client available for your platform.
    Here are some popular free SFTP clients:
    • Cyberduck - GUI client for Mac or Windows.
    • WinSCP - GUI client for Windows.
    • The Linux GNOME file browser Nautilus comes with a built-in "Connect to Server" feature.
    • A great Linux text-based SFTP client (aside from the relatively limited sftp) is lftp.
  2. Create the public_html directory in your home directory if it does not already exist.
  3. Upload HTML documents, CSS style sheets (if any) and other files and folders (as needed) to the remote public_html directory.
  4. Make sure that the remote directories and files are not writable by group and others.
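The last step can be checked with a short script. The sketch below is an illustration only; the function name writable_by_group_or_others is an assumption, and the script inspects whatever directory tree it is given, so it would have to run where the files live (for example in a shell session on the login server) or on your local copy before uploading.

```python
import os
import stat


def writable_by_group_or_others(root):
    """Return paths under root (root included) that group or others may write."""
    mask = stat.S_IWGRP | stat.S_IWOTH
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    flagged = []
    for path in candidates:
        mode = os.lstat(path).st_mode
        if stat.S_ISLNK(mode):
            continue  # a symlink's own mode bits are not meaningful here
        if mode & mask:
            flagged.append(path)
    return sorted(flagged)
```

An empty result means nothing in the tree is writable by group or others. Any path that is listed can be fixed with chmod 755 (directories) or chmod 644 (files), as in the command line method.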

Potential problems

Some things which might go wrong when you try to view the page in a browser:

  • Error 403: This means that you don't have the permissions set correctly or the index.html file is missing. To make sure the permissions are correct for your homepage, type on a terminal command line:
chmod 755 ~/public_html
chmod 644 ~/public_html/index.html
  • Error 404: This means that the web server couldn't find your file at all. Make sure you have the files in the public_html directory.
  • Error 403 while accessing a subdirectory of public_html: Again, this means that the subdirectory has incorrect permissions or that its index.html file is missing.
Note: The automatic directory listing is turned off by default for security reasons but can be turned on if needed (you have to learn how).
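The two causes of Error 403 named above can be checked mechanically. The following Python sketch is a hypothetical helper (the name diagnose_403 and the wording of its messages are assumptions, not part of any department tool). It inspects only one directory and its index.html; it does not look at parent directories or at the web server configuration.

```python
import os
import stat


def diagnose_403(site_dir):
    """List likely causes of a 403 error for the directory site_dir."""
    problems = []
    dir_mode = os.stat(site_dir).st_mode
    needed = stat.S_IROTH | stat.S_IXOTH  # others: read + execute
    if (dir_mode & needed) != needed:
        problems.append("directory is not readable and executable by others")
    index = os.path.join(site_dir, "index.html")
    if not os.path.isfile(index):
        problems.append("index.html is missing")
    elif not (os.stat(index).st_mode & stat.S_IROTH):
        problems.append("index.html is not readable by others")
    return problems
```

Run it on ~/public_html or on any subdirectory that returns 403. An empty list means neither of the two causes described above applies to that directory.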

Basic HTML

Each individual web page on your site corresponds to a file of the form name.html. An HTML document file consists of plain text with formatting (markup) commands that tell the browser how to display the text. Just like in LaTeX, there are some special meta-characters that tell the browser "I am giving you a command"; they are <, > and /. Every command starts with the code <tag>, and most commands end with the code </tag>. For example, if you want to turn on bold font, type <b>. When you are done typing the text that you'd like to be bold, turn off bold font with </b>.

A nice thing about HTML is that the browsers are very forgiving. If you make a mistake in the HTML code the browser will try its best to render your page as it is. This makes it easy to see the mistake and fix it. HTML tags are case-insensitive.

<html>
In principle, every .html file should begin with <html> and end with </html> to let the browser know that you are going to use HTML markup. But if you plan on just writing plain text, then this is not necessary.
<head>
This starts the head section of the HTML document. It is used for setting up the title, style and other parameters applied to the visible elements of the web page. Don't forget to close the head section with the </head> tag.
<title>
The title element is placed inside of the head section. The text between <title> and </title> tags is the title of the web page that appears in the top bar of the browser window.
<body>
This marks the limits of the body of the web page: all the text, images, etc that you want to appear on the page should go in between the body start and end tags.
<br> <p>
Note that if you type text in an HTML file, the browser will ignore all newline characters (Enter keystrokes). In general, browsers treat multiple "whitespace" characters as just one space (as if the space bar is pressed once). To tell a browser that you would like a line break, use the <br> tag. If you would furthermore like a blank line to start a new paragraph of text, use <p> instead.
<b> <i> <u> <s>
Bold, italic, underlined and strike-through text, respectively.
<a href="URL">link text</a>
Use the a tag for links to other pages on your site or on other sites. This can also be used to link to a PDF file or some other file for download. The URL is either the full URL for some website resource (including the initial http://), or else just a pathname for a file in your public_html directory. When the link is clicked, the browser directs you to the resource specified by the href attribute. The "link text" is the visible text of the hyperlink. For example, you may want the text to say "Solutions to Homework 9", while the URL is something like "solutions9.pdf".
<img src="URL" alt="alternate text">
Tag for inserting an image into your web page. The value of the src attribute is either the file path of an image in your public_html directory, or a complete URL to an image elsewhere. The required alt attribute specifies alternate text that is shown if the image cannot be displayed.
<a href="mailto:jdoe@example.com">jdoe@example.com</a>
Creates a link that when clicked will open up the user's email program to send an email to "jdoe@example.com". It is not advisable to expose your email address in clear text. Instead, use JavaScript to conceal your email from spammers (see below).
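Because browsers are forgiving, a forgotten end tag often goes unnoticed until the page looks wrong. As an illustration, the sketch below uses Python's standard html.parser module to list tags that were opened but never closed. The names NO_END_TAG, TagBalanceChecker and unclosed_tags are made up for this example, the list of tags without end tags is deliberately short, and the check is far less thorough than a real validator such as the W3C one.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, plus <p>, whose end tag is optional.
NO_END_TAG = {"br", "img", "hr", "meta", "link", "input", "p"}


class TagBalanceChecker(HTMLParser):
    """Track open tags and remember those that are never closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.unclosed = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag names in lower case.
        if tag not in NO_END_TAG:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag (or one from NO_END_TAG); ignore it
        # Everything opened after the matching start tag was left unclosed.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.unclosed.append(top)


def unclosed_tags(html_text):
    """Return the names of tags that were opened but not closed."""
    checker = TagBalanceChecker()
    checker.feed(html_text)
    checker.close()
    return checker.unclosed + checker.open_tags
```

For example, unclosed_tags("<html><body><b>bold<br></body></html>") reports the b tag, while the Hello World page from the beginning of this guide yields an empty list.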

Learning more HTML

There are many HTML tutorials and reference guides. For example, have a look at w3schools.com.

If you come across a web page and want to know how they did that, you can try looking at the webpage's source. Most browsers have a View Source option in their menu (e.g. Firefox has a convenient keyboard shortcut Ctrl-U for source viewing); this will bring up the HTML page which you can examine.

There are also free templates (typically built with cascading style sheets, or CSS) available online. A particularly good site can be found at openwebdesign.org.

Concealing your email from spammers

Spammers use automated programs called webcrawlers to browse through webpages searching for email addresses. If your email address is just written in plain text in the HTML file (especially if you use the mailto: command) a webcrawler will pick it up and you will get more spam. There are a few ways to conceal your email address from webcrawlers. We have listed them below, in increasing difficulty of implementation (which also happens to be increasing order of strength).

  • Use words instead of punctuation. Many primitive webcrawlers are designed to look for the @ symbol and then copy the text around it. If you write words, like mgsa AT math DOT berkeley DOT edu, many webcrawlers will not realize it is an email address. This is a pretty commonly implemented trick to fool webcrawlers, but it's easy for a spammer to adapt and include searches for "AT <word> DOT" and still find your email. However, if your email address has some natural form, then you can use that in the description and webcrawlers will not be able to get it. For example: my last name AT math DOT berkeley DOT edu.
  • Another way to fool webcrawlers that search for @ is to replace @ with an image of the @ symbol. Such webcrawlers cannot "see" what an image says, so this method works well against them. The only downside is that someone trying to copy your email address has to remember to add the @ symbol. A large selection of images can be found by doing a Google advanced image search for "at.gif" with the requirement that all images be small.
  • JavaScript is used to embed programs in HTML, and we can use it in a very simple way to conceal the mailto: command from spammers. This method also allows viewers to simply click on the email address link to invoke their email program. Below is an example of JavaScript code for the email link jdoe@example.com:
<script type="text/javascript">
//<![CDATA[
GoFish=new Array();
GoFish[0]="%3c"+"%61%20%68%72%65%66%3d%22%6d%61%69%6c";
GoFish[1]="%74%6f%3a%6"+"a%64%6f%65%40%65%78%61%6d%70%6c%65%2e%63%";
GoFish[2]="6f"+"%6d%22%3e%6a%64%6f%65%";
GoFish[3]="40%6"+"5%78%61%6d%70%6c%65%2e%6";
GoFish[4]="3%6f%6d%3c%"+"2f%61%3e";
OutString="";
for (j=0;j<GoFish.length;j++){
OutString+=GoFish[j];
}document.write(unescape(OutString));
//]]>
</script>
<noscript>Sorry, you need to enable JavaScript to email me.</noscript>

You may generate the above code from the Unix command line using the following Python script:

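import random
import sys

# Ask for the address (sys.stdin works under both Python 2 and Python 3).
sys.stdout.write('your email address: ')
sys.stdout.flush()
address = sys.stdin.readline().strip()

# Build the mailto link and percent-encode every character as %xx.
link = '<a href="mailto:%s">%s</a>' % (address, address)
encoded = ''.join('%%%02x' % ord(ch) for ch in link)

# Pick 9 random cut points to split the encoded text into 10 pieces.
n = len(encoded)
cuts = [0] + sorted(random.sample(range(1, n - 1), 9)) + [n]
pieces = tuple(encoded[cuts[j]:cuts[j + 1]] for j in range(10))

# JavaScript template that reassembles the pieces in the browser.
template = '''
<script type="text/javascript">
//<![CDATA[
GoFish=new Array();
GoFish[0]="%s"+"%s";
GoFish[1]="%s"+"%s";
GoFish[2]="%s"+"%s";
GoFish[3]="%s"+"%s";
GoFish[4]="%s"+"%s";
OutString="";
for (j=0;j<GoFish.length;j++){
OutString+=GoFish[j];
}document.write(unescape(OutString));
//]]>
</script>
<noscript>Sorry, you need to enable JavaScript to email me.</noscript>
'''
print(template % pieces)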

Just copy and paste the script into a text file named obfuscate.py and run it in a terminal session as follows:

python obfuscate.py
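To see what the obfuscated JavaScript example expands to in the browser, you can decode its fragments yourself. The sketch below is an illustration: it copies the five GoFish string pairs from the JavaScript example for jdoe@example.com and decodes them with urllib.parse.unquote, which gives the same result as JavaScript's unescape for these plain ASCII %xx sequences. The names GO_FISH and decode are chosen for this example only.

```python
from urllib.parse import unquote

# The five string pairs from the JavaScript example, joined as in the page.
GO_FISH = [
    "%3c" + "%61%20%68%72%65%66%3d%22%6d%61%69%6c",
    "%74%6f%3a%6" + "a%64%6f%65%40%65%78%61%6d%70%6c%65%2e%63%",
    "6f" + "%6d%22%3e%6a%64%6f%65%",
    "40%6" + "5%78%61%6d%70%6c%65%2e%6",
    "3%6f%6d%3c%" + "2f%61%3e",
]


def decode(fragments):
    """Concatenate the fragments, then turn each %xx back into a character."""
    return unquote("".join(fragments))


print(decode(GO_FISH))
# prints: <a href="mailto:jdoe@example.com">jdoe@example.com</a>
```

Note that the fragments are cut in the middle of %xx sequences on purpose, so no single string in the page source contains a recognizable piece of the address.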