304-487 Computer Architecture Laboratory
Project
Abstract Examples
Here are some examples of preliminary project abstracts submitted
for approval after the initial research and project-formulation stages. They
give an indication of the required format for the preliminary abstract, as
well as the scope and diversity of the topics undertaken as term projects.
Fall 1996
Title: An Integer Multiplier For Large N Using Convolution
This project implements the multiplication of two integers using the convolution
algorithm (Leighton 1992). The multiplication works in a serial fashion and
takes 4N - 1 cycles to complete for an N x N bit multiplication using a
convolutional approach to calculating and storing individual bits of the
product. This project was chosen in particular to relieve future
classes of worrying about multiplication of large numbers in other algorithms
by using minimum chip area for the multiplier (e.g., the RSA algorithm requires
a 512 x 512-bit multiplier).
The multiplier is optimized for area, and the emphasis will be on beating the
LPM multipliers provided in the Altera library in terms of size and
scalability. During the past few days, we have had the opportunity to go
through many different multiplication algorithms. We chose this one because of
its simple scaling and because many students currently face space problems
when using multipliers from the Altera library. Another area of emphasis will
be the scalability of the multiplier, and that is where our research will be
most concentrated and will make this a unique project. Although the actual
implementation should be a comparatively easy task, the design strategies will
have to explore many options in depth in an effort to achieve desirable
results.
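The convolution idea can be sketched in software (a Python sketch of the arithmetic only, not the project's serial VHDL datapath; in hardware one convolution coefficient would be produced per cycle, and the function name is ours):

```python
def conv_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit integers by convolving their bit vectors
    and then resolving carries into a binary result."""
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> i) & 1 for i in range(n)]
    # Convolution: coefficient k is the sum of abits[i] * bbits[j] with i+j = k.
    coeffs = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            coeffs[i + j] += abits[i] * bbits[j]
    # Carry propagation turns the coefficient vector into the product.
    result, carry = 0, 0
    for k, c in enumerate(coeffs):
        s = c + carry
        result |= (s & 1) << k
        carry = s >> 1
    return result | (carry << (2 * n - 1))
```

Since the sum of coeffs[k] * 2^k equals the product exactly, the carry sweep recovers the ordinary binary representation.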
References:
1. F. Thomson Leighton, Introduction To Parallel Algorithms and Architectures:
Arrays, Trees, Hypercubes, Morgan Kaufmann.
2. Hamacher, Vranesic, Zaky, Computer Organization, Computer Science Series.
3. Hayes, Computer Architecture And Organization, McGraw Hill
4. Chaum, Rivest, Sherman, Advances In Cryptology, Plenum
5. Cormen, Leiserson, Rivest, Introduction to Algorithms (McGraw Hill)
6. Patterson, Hennessy, Computer Organization And Design, The Hardware/Software
Interface, Morgan Kaufmann.
Title: Pulse-Mode Digital Multilayer Neural Network
In this project, a pulse-mode digital multilayer neural network (DMNN) based on
stochastic computing techniques is implemented with simple logic gates as basic
computing elements. The pulse-mode signal representation and the use of simple
logic gates for neural operations lead to a massively parallel yet compact and
flexible network architecture, well suited for VLSI implementation.
Moreover, algebraic neural operations are replaced by stochastic processes using
pseudorandom pulse sequences. The synaptic weights and neuron states are
represented as probabilities and estimated as average pulse occurrence rates in
corresponding pulse sequences.
Finally, a DMNN feedforward architecture for recognizing patterns (digits) is
presented and its performance is evaluated. The architecture and all digital
sub-components in the DMNN are modeled and simulated in VHDL.
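The central trick of stochastic computing, multiplication with a single AND gate, can be sketched as follows (a Python simulation of the idea; function names and stream length are illustrative, not from the project):

```python
import random

def pulse_stream(p, length, rng):
    """Pseudorandom pulse sequence whose average occurrence rate is p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def stochastic_multiply(p, q, length=100000, seed=1):
    """Estimate p * q by ANDing two independent pulse streams:
    the output pulse rate approximates the product of the input rates."""
    rng = random.Random(seed)
    a = pulse_stream(p, length, rng)
    b = pulse_stream(q, length, rng)
    products = [x & y for x, y in zip(a, b)]  # one AND gate per pulse
    return sum(products) / length
```

The estimate converges only statistically, which is why longer pulse sequences trade speed for precision in such networks.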
References:
1. Laurene Fausett, 'Fundamentals of Neural Networks', Prentice-Hall, 1994.
2. David E. Van den Bout, T. K. Miller, 'A Stochastic Architecture for Neural
Nets', Proceedings of IJCNN-88, 1988.
3. Young-Chul Kim, Michael A. Shanblatt, 'Architecture and Statistical Model of
a Pulse-Mode Digital Multilayer Neural Network', IEEE Transactions on Neural
Networks, 1995.
4. Dennis J. Walker, M. A. Sivilotti, 'A Digital Neural Network Architecture for
VLSI', Proceedings of IJCNN-92, 1992.
5. Young-Chul Kim, Michael A. Shanblatt, 'An Implementable Digital Multilayer
Neural Network', Proceedings of IJCNN-91, 1991.
Title: Design of the RSA (Rivest, Shamir, and Adleman) public-key
cryptosystem using Altera's VHDL
Cryptography is a security measure used to render data unintelligible to
unauthorized parties. In this day and age, vast amounts of digital data are
transmitted within complex communications networks, and thus cryptography
becomes a necessity.
For our project, we will undertake the design of the RSA public-key cryptosystem
using VHDL. We want to be able to encrypt/decrypt a message sent between two
communicating parties, using both a public key and a secret key. Due to time
constraints and the level of complexity of the RSA algorithm, we will not be
incorporating the "digital signature" aspect of the RSA cryptosystem into our
design.
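The public/secret key mechanics can be sketched with toy parameters (a Python sketch for illustration only; real RSA uses primes hundreds of bits long, and the modular-inverse form of `pow` requires Python 3.8+):

```python
def rsa_demo():
    # Toy primes for illustration only -- far too small to be secure.
    p, q = 61, 53
    n = p * q                # modulus, part of both keys
    phi = (p - 1) * (q - 1)  # Euler's totient of n
    e = 17                   # public exponent, coprime with phi
    d = pow(e, -1, phi)      # secret exponent: d * e = 1 (mod phi)
    m = 65                   # plaintext, must be less than n
    c = pow(m, e, n)         # encrypt with the public key (e, n)
    return pow(c, d, n)      # decrypt with the secret key (d, n)
```

Encryption and decryption are both modular exponentiations, which is why a large-number multiplier (as in the project above) dominates the hardware cost.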
References:
1. T. H. Cormen, C. E. Leiserson and R. L. Rivest, "Introduction to Algorithms"
McGraw-Hill Book Company, 1990.
2. C. H. Meyer and S. M. Matyas, "Cryptography: A new dimension in computer data
security" John Wiley & Sons Inc., 1982.
3. A. G. Konheim, "Cryptography: A Primer" John Wiley & Sons Inc., 1981.
Title: 8x8 Multicast Knockout Switch
In this project we take the general design of a Knockout Switch with the
following parameters:
number of inputs = 8; number of outputs = 8; cell length = 41 bits (32 bits of
data + 9 bits of header)
These parameters may be changed if we deem it necessary.
Our implementation will have multicast capabilities (a cell at one input may go
to many outputs) and possibly a priority scheme. The probabilities of cell loss
due to internal switching as well as output buffering will be discussed.
Our synthesized FPGA circuit will then be simulated in order to determine
various performance parameters (maximum clock frequency, maximum packet rate,
etc.).
References:
1. Y. Yeh, M. Hluchyj, and A. Acampora, "The Knockout Switch: A Simple Modular
Architecture for High-Performance Packet Switching", IEEE J. Selected Areas
Commun., vol. SAC-5, no.8, pp.1274-1283, October 1987.
2. M.Prycker, "Asynchronous Transfer Mode", Prentice Hall, 1995
3. T. Robertazzi, "Performance Evaluation of High Speed Switching Fabrics and
Networks", IEEE Press, 1993.
4. G. Biran, "Introduction to ATM Switching",
ftp.technion.ac.il/teach/topnet/cnpp94/gbiran/atm_switch.html, October
1996.
Fall 1997
Title: Implementation of Database Operations in Computer Hardware
One of the most popular applications of computers is the storage and retrieval
of large information databases. Currently, most database operations are
implemented in software. This project implements the most common database
operations directly in hardware and attempts to improve efficiency, in terms
of speed, compared with software-based approaches.
The four primary operations implemented by most database programs are storage,
retrieval, sorting, and searching of data.
The 'run length encoding' compression algorithm is used as the primary data
compression tool in order to conserve storage space requirements. A separate
decompression unit restores the original data during the retrieval process.
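The run-length encoding step can be sketched as follows (a Python sketch of the compression scheme; the hardware version would of course stream bytes rather than build lists, and the function names are ours):

```python
def rle_encode(data: bytes) -> list:
    """Run-length encoding: store each run of equal bytes
    as a [count, value] pair."""
    runs = []
    for b in data:
        if runs and runs[-1][1] == b:
            runs[-1][0] += 1      # extend the current run
        else:
            runs.append([1, b])   # start a new run
    return runs

def rle_decode(runs: list) -> bytes:
    """Inverse operation: expand each [count, value] pair."""
    return b"".join(bytes([v]) * n for n, v in runs)
```

RLE only saves space when the data contains long runs; on random data the [count, value] pairs can exceed the original size.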
In order to sort data the quick sort algorithm is implemented in hardware. This
is followed by implementing a binary search operation on the sorted data.
In conclusion this paper evaluates design objectives and discusses the relative
advantages and disadvantages of a hardware based approach for database
operations.
References:
1. Gilbert Held, 'Data Compression: Techniques and Applications', John Wiley &
Sons Ltd., 1991.
2. Mulford, J. B., Ridall, R. K., 'Data compression techniques for economic
processing of large commercial files', ACM Symposium of Information Retrieval
and Storage.
3. Gonnet, G. H., Baeza-Yates, 'Handbook of Algorithms and Data Structures',
Addison-Wesley.
4. Silvester, P. P., 'Data Structures for Engineering Software', Computational
Mechanics Publications, 1993.
Title: A Simple ATM Switch using the Batcher-Banyan Network Topology
Asynchronous Transfer Mode (ATM) is a network standard that allows a wide range
of applications (voice, multimedia, computer data) to be carried on a single
network. This capability has generated much interest in ATM and made it the
standard for the Broadband ISDN network. The success of this new standard
depends on the availability of switches that can handle the high speeds required
by new applications.
This project will attempt to design a simple ATM switch using the Batcher-Banyan
network topology. The switch operates only at the ATM layer (multiplexing cells
onto a single stream, translation of VPIs) assuming that there is a physical
layer interface underneath that supports the UTOPIA interface. The switch will
only handle simple point-to-point connections, ignoring broadcast and multicast
issues; thus the design shall use the Batcher-Banyan network topology as a
switching fabric. A simple 2x2 switching element with limited support for
output contention will be designed.
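The self-routing property that makes the banyan fabric attractive can be sketched as follows (a Python sketch; the function name is ours, and the real switch inspects header bits in hardware):

```python
def banyan_route(dest: int, stages: int) -> list:
    """In a banyan network each stage of 2x2 elements inspects one bit of
    the destination address, most significant first: 0 routes the cell to
    the upper output, 1 to the lower output. The path is therefore fully
    determined by the header -- no central routing control is needed."""
    return [(dest >> (stages - 1 - s)) & 1 for s in range(stages)]
```

For an 8x8 fabric (three stages), destination 5 (binary 101) yields the port choices lower, upper, lower. The Batcher sorter in front exists precisely because a banyan network alone blocks internally unless the cells arrive sorted by destination.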
References:
1. M. Boisseau, M. Demange, J-M. Munier, 'ATM Technology an Introduction',
Thomson Computer Press, 1996.
2. M. Prycker, 'Asynchronous Transfer Mode Solution for Broadband ISDN',
Prentice Hall, 1995.
3. W.J. Goralski, 'Introduction to ATM Networking', McGraw Hill, 1995.
Title: Polygon transformer and clipper
The purpose of this project is to implement graphic clipping routines and
polygon transformation routines. Since a hardware implementation is typically
faster than its software counterpart, this design will attempt to increase the
speed of drawing polygons so that graphics-intensive applications will benefit
from the improvement in speed.
The proposed design takes as input a polygon and performs the following
operations on it: Rotation, Translation, Scaling.
The design will also support window clipping and viewport mapping. For these,
two pairs of coordinates will specify the diagonal of the viewing window, while
another two coordinates will specify the diagonal of the viewport.
Once the window and the viewport are specified, the transformer will perform the
operations mentioned above. Once the polygon has been transformed, the clipper
will clip the part of the polygon outside the viewing window and will then map
the clipped polygon into the viewport window.
Hence, the final output coordinates of the polygon will be the result of the
three transformation operations, a clipping operation to fit the polygon into
the clipping window and finally a mapping operation to map the polygon into the
viewport window. The final coordinates sent as output will be relative to the
viewport's local coordinate system.
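The transformation stage can be sketched as follows (a Python sketch of the three operations on vertex lists; the scale-rotate-translate order and the function name are our illustrative choices, not necessarily the project's):

```python
import math

def transform(points, angle=0.0, tx=0.0, ty=0.0, sx=1.0, sy=1.0):
    """Apply scaling, then rotation about the origin, then translation
    to each (x, y) vertex of a polygon."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y in points:
        x, y = x * sx, y * sy                 # scaling
        x, y = x * c - y * s, x * s + y * c   # rotation
        out.append((x + tx, y + ty))          # translation
    return out
```

In hardware these three steps are commonly fused into a single 3x3 matrix in homogeneous coordinates, so one matrix-vector multiply per vertex suffices.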
References:
1. Hearn, Donald - Computer graphics, C version (2nd Ed.), Upper Saddle River,
N.J. : Prentice Hall, 1997.
2. Mortenson, Michael E. - Geometric modeling, 2nd ed., New York : Wiley, c1997.
3. Computer graphics : principles and practice, 2nd ed. in C, Reading, Mass. :
Addison-Wesley, 1996.
Title: A hardware implementation for the DES algorithm.
Networking and secure distributed systems are major research areas at the
Digital Equipment Corporation's System Research Center (SRC). The Data
Encryption Standard (DES) was issued by the National Bureau of Standards (NBS)
in 1977 and was adopted by the U.S. government. The DES is essentially an
algorithm that uses permutations, rotations, and substitutions to encipher
64-bit data blocks using a 56-bit secret key. Plaintext can easily be
recovered from the cipher once the secret key is available. Cryptographic
modules may implement this standard in software, firmware, hardware, or any
combination thereof. The work discussed here is a hardware implementation of
the DES on FPGA chips using Altera's products. The VHDL language will be used
to describe the design, while the MAX+PLUS II CAD software will be used to
synthesize it. However, due to time constraints, the complexity of the
algorithm has been scaled down by reducing the data block size to 16 bits.
This means that the key length and the number of iterations will be modified
accordingly.
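DES is built on the Feistel structure, whose key property is that decryption is the same circuit run with the round keys reversed. A minimal sketch on a 16-bit toy block (the round function here is a stand-in; real DES uses expansion, S-boxes, and permutations):

```python
def toy_round(r: int, k: int) -> int:
    """Stand-in round function, NOT the DES f-function."""
    return (r * 31 + k) & 0xFF

def feistel_encrypt(block: int, round_keys, f):
    """Generic Feistel network on a 16-bit block: split into halves,
    then repeatedly set (L, R) <- (R, L xor f(R, k))."""
    left, right = block >> 8, block & 0xFF
    for k in round_keys:
        left, right = right, left ^ f(right, k)
    return (right << 8) | left  # halves swapped, as after the last round

def feistel_decrypt(block: int, round_keys, f):
    # The same structure with reversed round keys inverts the cipher,
    # even though f itself need not be invertible.
    return feistel_encrypt(block, list(reversed(round_keys)), f)
```

This symmetry is why a scaled-down DES in hardware can share one datapath for encryption and decryption, differing only in key-schedule order.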
References:
1. Federal Information Processing Standards Publication, "Publication 46-2",
December 1993.
2. Jennifer Seberry and Josef Pieprzyk, "Cryptography: An Introduction to
Computer Security.", Prentice-Hall, 1989.
3. Man Young Rhee, "Cryptography and Secure Data Communications.", McGraw-Hill,
1994.
4. R. Rivest, A. Shamir, and L. M. Adleman, "Cryptographic Communications System
and Method", US Patent 4,405,829, 1983.
5. "The RSA Frequently Asked Questions document", RSA Data Security Inc., 1995.
6. Mark Gibbs, "Guide To Networking", second edition, SAMS 1995.
Winter 2000
Title: Electrical Storm Locator
The intended project is to build an electrical storm locator controller, which
will use electromagnetic detectors and a microphone to measure the distance,
intensity and direction of an electrical storm system. Using loop antennae to
detect the presence of a bolt of lightning, the source's direction can be
determined. Once a bolt of lightning has been detected, the delay between its
detection and the perception of its associated thunderclap can be used to
calculate the distance between the lightning and the detector. The intensity
of each bolt of lightning can be estimated by comparing the delay time with
the intensity of the magnetic flux, since the magnetic flux weakens as it
propagates away from its source. Also, the intensity of the storm system can
be gauged by counting the number of lightning bolts occurring within a given
interval of time.
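The distance calculation rests on the electromagnetic pulse arriving effectively instantly while thunder travels at the speed of sound. A sketch (the function name and the 343 m/s figure, for air at about 20 °C, are our additions):

```python
def storm_distance_m(delay_s: float, speed_of_sound: float = 343.0) -> float:
    """Distance to a lightning strike: the flash-to-thunder delay
    multiplied by the speed of sound in air."""
    return delay_s * speed_of_sound
```

A three-second delay therefore places the strike roughly one kilometre away, which matches the familiar "three seconds per kilometre" rule of thumb.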
Attached is one file, abstract.dpf, a diagram that could not be included in
this ASCII file.
Title: VHDL Analysis of the Effects of Associativity on Cache Performance
The goal of this project is to illustrate the effects of associativity on cache
performance, via miss rate and access time measurements. This goal will be
accomplished by implementing two independent data caches, one direct-mapped and
the other four-way associative. Aside from their degree of associativity, both
caches will have exactly the same policies. The write policy will be
write-through with no-write allocation, and the replacement strategy will be
LRU (least recently used). The caches will be fed "virtual" signals in order
to simulate the CPU's behavior. The measurements will be taken from simulation
results and registered performance analysis.
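The effect of associativity on miss rate can be previewed with a behavioral model (a Python sketch with a one-word block size and a made-up address trace; the VHDL caches will of course differ in detail):

```python
from collections import OrderedDict

def miss_rate(addresses, num_sets, ways):
    """Miss rate of a cache with num_sets sets of `ways` lines each,
    LRU replacement, block size of one word."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        s = sets[addr % num_sets]        # set index from the low bits
        if addr in s:
            s.move_to_end(addr)          # hit: refresh LRU position
        else:
            misses += 1
            if len(s) == ways:
                s.popitem(last=False)    # evict the least recently used
            s[addr] = True
    return misses / len(addresses)
```

An alternating trace such as 0, 8, 0, 8, ... thrashes an 8-set direct-mapped cache (every access conflicts) yet fits easily in a 4-way cache with the same capacity, illustrating conflict misses directly.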
References:
1. Jim Handy, The Cache Memory Book, Academic Press.
2. Patterson, Hennessy, Computer Organization and Design: The
Hardware/Software Interface, Morgan Kaufmann.
3. Patterson, Hennessy, Computer Architecture: A Quantitative Approach, Morgan
Kaufmann.
4. David Loshin, Efficient Memory Programming, McGraw-Hill.
Title: A Simple Low Cost VHDL implementation of Data Encryption Standard (DES)
Abstract: The Data Encryption Standard (DES) was selected by the National
Bureau of Standards (NBS) in 1977 as the standard for modern data encryption.
It is a mathematical algorithm commonly used for the cryptographic protection
of computing data. A 64-bit key is used to encrypt 64-bit binary-coded data.
The whole idea of the encryption rests on this 64-bit key, since a unique key
generates a unique 64-bit ciphertext from 64 bits of plaintext. Because only
56 bits of the 64-bit key are used in the encryption process, there are over
70,000,000,000,000,000 possible keys to search through to find the specific
key that decrypts the data.
This paper describes the VHDL implementation of the Data Encryption Standard on
the Altera FLEX10K series FPGA. The overview of the design process includes
initial permutation module, product transformation module, and inverse initial
permutation module. When the 64 input data bits (plaintext) are passed through
the modules described, the 64 output data bits (ciphertext) are formed.
References:
1. Denning, Dorothy E., "Cryptography and Data Security", Addison-Wesley
Publishing Company, Inc., 1982.
2. Harry Katzan Jr., "The Standard Data Encryption Algorithm", Petrocelli Books,
Inc. 1977.
3. Jan C. A. Van Der Lubbe, "Basic Methods of Cryptography", Cambridge
University Press, 1994.
4. Ingrid Verbauwhede, Frank Hoornaert, Joos Vandewalle, "Security and
Performance Optimization of a New DES Data Encryption Chip", IEEE Journal of
Solid-State Circuits, Vol.23, No. 3, June 1988.
Title: H100 Timeslot Interchange
This project implements the interchanging of constant bit rate (CBR) data bytes
on an H100 TDM bus. The switching of timeslots/streams (TSSTs) is programmable
through a Motorola CPU interface module within the timeslot interchange (TSI),
used to access an internal memory which keeps the configuration/routing of
each TSST.
Each TSST can either be routed to another TSST or simply left untouched.
For testing purposes, any TSST exiting the TSI can be programmed to force a data
byte onto the H100 bus. The outgoing TSST forced to this value does not have to
be programmed to route the data of an incoming TSST.
The TSI will act as an H100 slave on the bus, not driving any clocks. The TSI
will be capable of switching from the master clock to the slave clock if an
error is detected on the master clock. This functionality is desired by the
H100 specification.
The module will use the following H100 bus signals for operation: CT_C8_A,
CT_C8_B, CT_FRAME_A, CT_FRAME_B, and the data bus CT_D[31::0]. The remaining
clock signals described in the H100 specification are used for
backwards-compatibility and are not necessary for the operation of the TSI.
The architecture and all sub-modules in the TSI are modeled and simulated in
VHDL. An attempt will be made to compile the TSI into an Altera device of any
size. However, given the scope of this project, this last goal could be
difficult to meet.
References:
1. Enterprise Computer Telephony Forum, H.100 Revision 1.0, Hardware
Compatibility Specification: CT Bus
Title: Carrier Sense Multiple Access with Collision Detection
Abstract: The purpose of our project is to design and test a media access
control device based on the Carrier Sense Multiple Access with Collision
Detection or CSMA/CD. The latter is an improvement of the CSMA protocol which
permits halting transmission as soon as a collision is detected during the
communication of data between stations. This protocol with collision detection
is the most popular MAC protocol to date.
Ethernet is an example of a CSMA/CD network. The Ethernet system includes a
mechanism that detects when a collision occurs (collision detect). The system
also includes "listen before talk," in which stations listen for activity
(carrier sense) before transmitting, and supports access to a shared channel by
multiple stations (multiple access). Putting all these components together, one
can see why the Ethernet channel access protocol is called Carrier Sense
Multiple Access with Collision Detect (CSMA/CD).
Moreover, Ethernet is a baseband system that uses a Manchester encoding of high
and low voltages to place bits on a wire pair. In our design, the data to be
transmitted will be encoded using Manchester encoding: each bit time contains
a transition in the middle. A transition from low to high represents a 0 bit,
and a transition from high to low represents a 1 bit.
In conclusion, we will design and test the CSMA/CD protocol and discuss its
advantages and disadvantages in the interconnection networks field. The
architecture and all sub-components of the CSMA/CD protocol will be modeled
and simulated in VHDL using the Altera MAX+PLUS II software.
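The Manchester scheme described above can be sketched as follows (a Python sketch using two half-bit samples per bit and the abstract's polarity convention, which is the reverse of the IEEE 802.3 convention; function names are ours):

```python
def manchester_encode(bits):
    """Each bit becomes a mid-cell transition: low-to-high for a 0,
    high-to-low for a 1 (the polarity used in this abstract)."""
    out = []
    for b in bits:
        out += [1, 0] if b else [0, 1]
    return out

def manchester_decode(signal):
    """Recover each bit from the direction of its mid-cell transition:
    a high first half means the transition falls, i.e. a 1."""
    return [1 if signal[i] == 1 else 0 for i in range(0, len(signal), 2)]
```

Because every bit cell contains a transition, the receiver can recover the transmitter's clock from the signal itself, which is the synchronization property the abstract relies on.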
References:
1. Ethernet and CSMA/CD: This article describes how and why CSMA/CD works on an
Ethernet network. Updated on Aug 5, 1998 - www.pcwebopedia.com/TERM/C/
Carrier_Sense_Multiple_Access_Collision_Detection.html.
2. Charles Spurgeon's Ethernet Web Site - www.ots.utexas.edu/ethernet/.
3. David A. Patterson, John L. Hennessy, Computer Architecture: A Quantitative
Approach, 2nd Edition - Chapter 7, Interconnection Networks.
4. "Carrier Sense Multiple Access with Collision Detection (CSMA/CD)," IEEE Std
802.3-1990 Edition (ISO/DIS 8802-3), IEEE, New York (1990).
5. Ethernet Applet -
www.seas.upenn.edu/~ross/applets/daniel_brushteyn/enet.html.
Title: Design of a PCI bus interface for FPGAs
Abstract:
The Peripheral Component Interconnect (PCI) bus was developed in 1993 by Intel
and is now overseen by a Special Interest Group. Its main purpose is to
provide high bandwidth to devices such as monitors, video cards, sound cards,
and USB interfaces. The PCI bus allows large data transactions directly with
system memory and peripheral devices.
The purpose of this project is to implement a PCI bus interface in VHDL for
generic add-in cards, with the inclusion of the JTAG pins. The PCI interface
has been designed in previous years, but we aim to offer an on-board testing
mechanism for FPGAs. This design would include a master, a target, and
controllers for each. The basic cycle simulations would include master
read/write, target read/write, system error, and parity check.
References:
1) PCI Local Bus Specification, revision 2.1, PCI Special Interest Group
Title: Error Detection Design using the Residue Arithmetic Method for Multiplication
High-speed processing architectures are widely used in many computing
applications. As VLSI circuitry gets smaller, the number of multiply
operations per second increases rapidly, and error checking and detection
become more and more important. The error-detection circuitry assures that
each bit is multiplied correctly.
In this project, the speed and hardware will be considered as two main factors
to optimise. To achieve maximum speed with minimum hardware, the residue
arithmetic codes will be implemented to verify the result of multiplication
through the error detection unit. Two types of error checking will be
implemented in VHDL: top-level and bit-level. In top-level checking, error
detection is performed at the end, where the residues are compared with the
final result; in bit-level checking, error detection is performed for every
serial bit shifted into the multiplier, so that errors are detected
immediately.
The operating frequency and the resource usage will also be investigated in
this report. Low cost and high speed are essential for this error detection
unit to be practical and economical.
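The top-level residue check can be sketched as follows (a Python sketch; the modulus choice and function name are illustrative, and a residue code only catches errors that change the product's residue):

```python
def residue_check(a: int, b: int, product: int, modulus: int = 3) -> bool:
    """Residue-code check for a multiplier: a correct product must satisfy
    (a * b) mod m == ((a mod m) * (b mod m)) mod m.
    The right-hand side is cheap to compute alongside the multiplier."""
    return product % modulus == ((a % modulus) * (b % modulus)) % modulus
```

With m = 3 this is the classical "casting out threes": a single-bit flip changes the result by a power of two, which is never a multiple of 3, so all single-bit errors in the product are detected.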
References:
1. Thomas J. Brosnan and Noel R. Strader II, "Modular Error Detection for
Bit-Serial Multiplication," IEEE Trans. Comput., Vol. 37, pp. 1043-1052, Sep.
1988.
2. T. R. N. Rao, "Error Coding for Arithmetic Processors," New York: Academic,
1974.
Title: A Hardware implementation of Carrier Sense Multiple Access with Collision
Detection (CSMA/CD)
The purpose of this project is to design a network card based on the CSMA/CD
protocol. A network card/adapter is a media access control device for a
computer or terminal; it is an essential component of a computer network.
Sharing a common transmission medium among multiple stations is one of the
advantages of CSMA/CD. Each terminal listens to the medium, then transmits its
data packet in serial form when the medium is quiet. Although this method is
simple and straightforward, its one problem is transmission collision, which
occurs when two stations send out data packets at the same time. The stations
involved 'back off' for a period of time before retransmission. Therefore,
collision detection is very important during transmission.
Since we are implementing a media access control device, we assume the header
information, data, and some control signals are provided by other devices. At
the other end, the receiver will pass the data to upper-level components. Due
to the time constraint, the IEEE Std 802.3 format will be reduced in
complexity. The basic format of the data packet includes a synchronization
preamble, destination address, source address, and data body. The 'type' and
'CRC' fields will be removed; a cyclic redundancy check (CRC) is very useful
for error detection, but it would significantly increase the complexity of
this project. Moreover, the data is encoded using Manchester encoding, which
allows synchronization between sender and receiver.
References:
1. Larry L. Peterson and Bruce S. Davie, 'Computer Networks: A system
approach', 2nd ed, Morgan Kaufmann, 2000
2. Gilbert Held,'Ethernet Networks: Designs, Implementation, Operation and
Management', John Wiley & Sons Canada,1996
Title: FAST FPGA IMPLEMENTATION OF THE ATM LAYER CELL TRANSPORT FUNCTIONS
Abstract: Asynchronous transfer mode (ATM) is a popular and extremely important
switching technique. Its small cell size and scalability allow for the
transmission of isochronous and time-insensitive data from a wide range of
services (such as text, speech, or video) over the same network.
FPGAs are often used for the development of rapid prototypes of hardware
implementing ATM. However, the FPGA rapid prototypes are often much slower
than the ASIC designs that they are used to validate. The aim of this project
is to implement the ATM protocol's cell transport functions using FPGAs while
employing a deep-pipeline design methodology that would permit the FPGA
circuit to function at high operating frequencies while minimizing the
resulting latency.
References:
[1] D.E. McDysan, D.L. Spohn, ATM: Theory and Application,
McGraw-Hill Inc. 1994
[2] M.P. Clark, ATM Networks: Principles and Use, John
Wiley and Sons, Ltd. 1996
(Additional references may be added during the course of this project.)
Title: A VHDL Implementation of Differential Pulse Code Modulation (DPCM)
Algorithm.
In our present society, the telecommunications field has changed our lives
considerably. Where would we be without the Internet, networking, or cellular
phones? This project consists of implementing the Differential Pulse Code
Modulation (DPCM) algorithm from telecommunication theory. As background, PCM
(Pulse Code Modulation) is a sampling technique for digitizing analog signals,
especially audio signals (speech coding).
The advantage of the DPCM over the PCM is that it predicts the next sample based
on the last few decoded samples. If the predictions are effective, then the
error signal will be minimized, allowing a more accurate representation of the
analog signal (speech) in digital format. The error signal between the predicted
samples and the actual samples will be smaller than the error signal between the
actual samples and the original signal. Therefore, we are able to quantize the
error signal using fewer bits than the original signal. This is the basis of
the Differential Pulse Code Modulation (DPCM) algorithm; to quantize the
difference between the original and predicted signals.
DPCM is an algorithm that uses quantizers, predictors, DPCM encoders, and DPCM
decoders to convert an analog signal into a digital one. The analog signal is
sampled, and the difference between the actual sample value and the predicted
value is encoded to form the new digital value.
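The encoder/decoder pair can be sketched as follows (a Python sketch using the trivial previous-sample predictor and omitting the quantizer, so this version is lossless; real DPCM quantizes the residuals, which is where the bit savings and the loss come from):

```python
def dpcm_encode(samples, predictor=lambda prev: prev):
    """DPCM encoder: transmit the difference between each sample and a
    prediction formed from already-decoded samples."""
    decoded_prev, residuals = 0, []
    for s in samples:
        pred = predictor(decoded_prev)
        e = s - pred                 # residual (would be quantized here)
        residuals.append(e)
        decoded_prev = pred + e      # track the decoder's reconstruction
    return residuals

def dpcm_decode(residuals, predictor=lambda prev: prev):
    """DPCM decoder: rebuild each sample as prediction + residual."""
    prev, out = 0, []
    for e in residuals:
        prev = predictor(prev) + e
        out.append(prev)
    return out
```

Note that the encoder predicts from its own reconstruction, not from the raw input; this keeps encoder and decoder in lockstep once quantization is added.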
References:
1. Simon Haykin, "Communication Systems", John Wiley & Sons Inc., 1994.
2. J. G. Proakis and D. G. Manolakis, "Digital Signal Processing",
Prentice-Hall, 1996.
3. Majid Rabbani and Paul W. Jones, "Digital Image Compression Techniques",
www.rasip.fer.hr/research/compress/algorithms/index.html
Title: Substitution Ciphers - new and old
Abstract:
Data security and cryptography are critical fields of research in today's
information technology world. As the world becomes increasingly
interconnected, the need for privacy and access control becomes even more
pronounced. To illustrate the concepts of cryptography: a cipher is a method
of transforming a message in order to conceal its meaning from unauthorized
parties.
As a major objective of this project, we will concentrate on the
implementation and comparison of the most popular substitution ciphers. Some
of the ciphers we will consider are Caesar's cipher, the K-shift cipher, and
monoalphabetic substitution. These ciphers are used frequently and, although
easy to implement, are quite useful for understanding the underlying concepts
of cryptography.
As a final objective of this project, we will devise and implement a new
substitution cipher. The cipher will be implemented and extensively tested using
VHDL, and we will endeavour to improve on the standards in use today.
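Caesar's cipher is simply the K-shift cipher with k = 3. A minimal sketch over the uppercase alphabet (the function name is ours):

```python
def caesar(text: str, k: int) -> str:
    """K-shift substitution cipher over A-Z: each letter is replaced by
    the letter k positions later, wrapping around the alphabet.
    Non-uppercase characters pass through unchanged."""
    return "".join(
        chr((ord(c) - ord("A") + k) % 26 + ord("A")) if c.isupper() else c
        for c in text
    )
```

Decryption is the same function with a shift of -k, and the tiny key space (26 shifts) is exactly why such ciphers are useful pedagogically but insecure in practice.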
References:
1. Meyer, Carl H., "Cryptography: A New Dimension in Computer Data Security: A
Guide for the Design and Implementation of Secure Systems", Wiley, 1982.
2. Konheim, A. G., "Cryptography: A Primer", John Wiley & Sons Inc., 1981.
3. Denning, Dorothy Elizabeth Robling, "Cryptography and Data Security",
Addison-Wesley, 1982.
4. Cryptography: www.cs.wpi.edu/~cs513/f99cew/week12-crypt/week12-crypt.html
Title: Develop a high speed FPGA based fuzzy logic inference engine
Abstract:
Fuzzy logic systems are becoming much more widespread because of their ability
to model, simply, complex systems where the relationship between the inputs
and the outputs is not well defined. These systems can be implemented in
software on a general-purpose computer or on a fuzzy processor. However, these
implementations are general and do not always offer the best solution for
real-time, embedded, and other applications.
This project will attempt to implement a fuzzy logic inference engine. The
design will utilize a pipelined approach and will allow several identical
engines to be operated in parallel to increase performance. Since practical
systems use external memory modules to hold the rule parameters and membership
values, this design will take into account the need for sharing these resources.
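The inference step being pipelined can be sketched as follows (a Python sketch of a zero-order Sugeno engine with triangular membership functions; the real design evaluates rules against membership values held in external memory, and all names here are illustrative):

```python
def triangle(x, a, b, c):
    """Triangular membership function rising from a, peaking at b,
    falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_infer(x, rules):
    """Zero-order Sugeno inference: each rule is (membership_fn, output);
    the crisp result is the firing-strength-weighted average of the
    rule outputs."""
    fired = [(mf(x), out) for mf, out in rules]
    total = sum(w for w, _ in fired)
    return sum(w * out for w, out in fired) / total if total else 0.0
```

Each rule evaluation is independent, which is what makes the per-rule work easy to pipeline and to replicate across parallel engines.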
References:
1. Aranguren, G.; Barron, M.; Arroyabe, J.L.; Garcia-Carreira, G. "A pipe-line
fuzzy controller in FPGA." Fuzzy Systems, 1997., Proceedings of the Sixth
IEEE International Conference on, Volume: 2 , 1997 Page(s): 635 -640 vol.2.
2. Bin Qiu; Pak L Woon. "The hardware implementation of a generic fuzzy rule
processor." Signal Processing Proceedings, 1998. ICSP '98. 1998 Fourth
International Conference on , Volume: 2 , 1998. Page(s): 1343 -1346 vol.2.
3. Hung, D.; Zajac, W. "Implementing a fuzzy inference engine using FPGA." ASIC
Conference and Exhibit, 1993. Proceedings., Sixth Annual IEEE International ,
Sept. 1993 Page(s): 349 -352.
4. Masmoudi, N.; Hachicha, M.; Kamoun, L. "Hardware design of programmable fuzzy
controller on FPGA." Fuzzy Systems Conference Proceedings, 1999. FUZZ-IEEE '99.
1999 IEEE International , Volume: 3 , 1999 Page(s): 1675 -1679 vol.3.
5. Parris, C.P.; Haggard, R.L. "An architecture for a high speed fuzzy logic
inference engine in FPGAs." System Theory, 1997., Proceedings of the
Twenty-Ninth Southeastern Symposium on , 1997. Page(s): 179 -182.
6. Ungering, A.P.; Goser, K. "Architecture of a 64-bit fuzzy inference
processor." Fuzzy Systems, 1994. IEEE World Congress on Computational
Intelligence., Proceedings of the Third IEEE Conference on , 1994 Page(s): 1776
-1780 vol.3.
Title: A VHDL Implementation of a User-Monitored Diabetes Insulin Pump
Abstract:
As diabetes becomes a more pronounced and widespread ailment in today's society,
the use of more accurate, sophisticated medical technology related to insulin
injection and dosage becomes a necessity. This advanced technology, among
others, takes the form of automated insulin injectors: battery-operated pumps
whose functioning is regulated not only by the user, but also by a programmable
computer chip.
The chip regulating an insulin pump is responsible for managing two insulin
dosage rates, namely the basal rate, which injects minute quantities of
insulin at a regular frequency, and the bolus rate, the user-controlled
insulin dosage.
In our case this chip, the intelligence behind the pump apparatus, sends
instructional signals to the reservoir and three pumps, while accurately keeping
track of time through a real-time clock. The pumps are placed at different
locations on the patient's body, in order to ensure adequate distribution.
The chip's inputs are as follows:
- Initial input of the user's physical parameters (such as weight and age)
- Setting of the basal rate of insulin dosage (done once and updated
infrequently)
- Setting of the bolus rate, a dosage request made after meals or when needed,
using one of the following two dosage options:
  1. one-time: the user specifies the exact insulin dosage to be injected
  2. preset: the user chooses the dosage from a list of predefined quantities
- Inputs allowing the user to define presets
The outputs are the quantities of insulin to be injected by each pump at a given
instant.
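The dosage-selection behaviour described above can be sketched in software; all
names, rates and units below are illustrative assumptions, not values from the
project:

```python
# Hypothetical sketch of the dosage-selection logic; the rate constant
# and preset list are illustrative placeholders only.

BASAL_RATE_UNITS = 0.05      # units injected on each basal tick
PRESETS = [1.0, 2.0, 4.0]    # user-defined preset bolus doses

def dose_for_cycle(basal_tick, bolus_request=None):
    """Return the total units to inject this cycle.

    basal_tick    -- True when the real-time clock signals a basal interval
    bolus_request -- None, ("one-time", units) or ("preset", index)
    """
    dose = BASAL_RATE_UNITS if basal_tick else 0.0
    if bolus_request is not None:
        kind, value = bolus_request
        if kind == "one-time":
            dose += value           # user-specified exact dose
        elif kind == "preset":
            dose += PRESETS[value]  # dose chosen from the predefined list
    return dose
```

In the actual chip this selection would be a small state machine driven by the
real-time clock; the sketch only captures the input-to-output mapping.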
The VHDL implementation of this chip comprises a structural architecture of the
real-time clock, as well as behavioural code describing the chip's control
module, with structural code linking the two.
References:
1. Davidson, Mayer B., "Diabetes mellitus: diagnosis and treatment", Saunders,
1998.
2. Watkins, Peter J., "Diabetes and its management", Blackwell Science, 1996.
3.Canadian Diabetes Association, www.diabetes.ca/
4.MiniMed, Inc., www.minimed.com/
Title: An FPGA Implementation of an MP3 Decoder
Abstract:
MPEG Layer-3, otherwise known as MP3, has generated a phenomenal interest among
Internet users, or at least among those who want to download highly compressed
digital audio files at near-CD quality.
MP3 is a standard for transmitting and recording compressed audio. The MP3
algorithm achieves compression by exploiting the perceptual limitations of the
human ear. The standard defines the decoding process and also the syntax of the
coded bitstream.
The purpose of this project is to design a high performance FPGA implementation
of an MP3 audio decoder using VHDL. Such a device would have a widespread use
in upcoming technologies such as portable MP3 audio players and handheld
computers.
References:
1. S. Hacker, "MP3: The Definitive Guide", 1st Edition, O'Reilly, 2000.
2. P. J. Ashenden, "The Designer's Guide to VHDL", Morgan Kaufmann, 1996.
3. M. Robertson, "Top Ten Things Everyone Should Know About MP3", MP3.com Inc.,
www.mp3.com/news/070.html?lang=eng, 1998.
4. "MPEG Audio Layer-3", Fraunhofer Institut,
www.iis.fhg.de/amm/techinf/layer3/index.html, 1998.
Title: Dynamic Instruction Scheduling of a DLX Machine Using Tomasulo's
Algorithm
Abstract:
In recent years, microprocessor speed has doubled every 18 months. For
designers, one way to achieve this performance gain is to improve the
processor's architecture. Tomasulo's algorithm can be found in most modern
processors; it allows rescheduling of instructions in order to minimize hazards
and stalls during execution.
This project has two main goals: the first is to achieve correct out-of-order
execution, and the second is to optimize the performance of the CPU by reducing
hazards and stalls. The unit will contain a hazard detection module to handle
structural hazards, control hazards, and WAW, WAR and RAW data hazards. A
specialized register file and register control module will be used to implement
register renaming, which eliminates name dependences. Reservation stations will
store dispatched instructions with their tag and data or data pointer. They
will be issued to an execution unit once they are ready. Since the
implementation of execution units (adders, multipliers, memory, etc.) is
outside the scope of this project, test vectors for the common data bus will be
used to simulate execution of the instructions.
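The interplay of tags, reservation stations and the common data bus can be
sketched as a minimal software model; all names below are illustrative, not
taken from the project's design:

```python
# Minimal software model of a reservation station with register renaming,
# in the spirit of Tomasulo's algorithm.

class ReservationStation:
    def __init__(self, tag):
        self.tag, self.busy, self.op = tag, False, None
        self.vj = self.vk = None   # operand values, once available
        self.qj = self.qk = None   # producer tags while values are pending

def issue(rs, op, src1, src2, regs, status):
    """Issue op to rs: read ready operands, else record the producer tag."""
    rs.busy, rs.op = True, op
    rs.qj, rs.vj = (status[src1], None) if src1 in status else (None, regs[src1])
    rs.qk, rs.vk = (status[src2], None) if src2 in status else (None, regs[src2])

def broadcast(stations, tag, value):
    """Common data bus: forward a finished result to every waiting station."""
    for rs in stations:
        if rs.qj == tag: rs.vj, rs.qj = value, None
        if rs.qk == tag: rs.vk, rs.qk = value, None

def ready(rs):
    """A station may dispatch once both operands are present."""
    return rs.busy and rs.qj is None and rs.qk is None
```

An instruction whose source register is still being produced waits on the
producer's tag instead of stalling the pipeline, which is exactly how name
dependences are eliminated.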
References:
1 Hennessy & Patterson, "Computer Architecture: A Quantitative Approach," 2nd
edition, Morgan Kaufmann, 1996.
2 Daniel Kroening, "Design and Evaluation of a RISC Processor with a Tomasulo
Scheduler", www-wjp.cs.uni-sb.de/~kroening/tomasulo/diplom/
Title: The Blowfish Encryption Algorithm
"Blowfish" is a symmetric block cipher designed in 1993 by Bruce Scheiner as a
fast and free alternative to existing encryption algorithms such as DES or IDEA.
Unlike DES, whose 56-bit key length is vulnerable to brute-force attacks, the
Blowfish key length is variable from 32 bits to 448 bits (and as such can be
exported per U.S. export laws). Blowfish is unpatended and license-free for
all, and the source code is available in numerous programming languages.
As with DES, Blowfish is a symmetric cipher: the same secret key is used both to
encrypt and to decrypt the data stream, so the key must be kept private by both
parties. Blowfish has many improvements over DES that make it faster and easier
to code.
In this project we will implement the Blowfish algorithm in VHDL. Due to time
constraints and limited space in the FPGA, small keys will have to be used.
References:
1. B. Schneier, "Fast Software Encryption", Cambridge Security Workshop
Proceedings (December 1993), Springer-Verlag, 1994, pp. 191-204
2. B. Schneier, "The Blowfish Encryption Algorithm -- One Year Later", Dr.
Dobb's Journal, September 1995
3. B. Schneier, "Applied Cryptography, Second Edition", John Wiley & Sons, 1996
Title: Fully Pipelined RISC Microcontroller CPU Core in VHDL
Abstract: The PIC12C508 is a popular microcontroller that is fully programmable
so that it can adapt to a specific application. It is small enough to fit
into an FPGA and is hence very cost effective. It can effectively be used as
part of a system on chip.
Our goal is to implement the functionality of the PIC12C508 microcontroller and
to meet the suggested operating speed of 4 MHz. This microcontroller was
originally targeted at an ASIC, but we will take up the challenge of obtaining
the same performance using a cost-effective FPGA.
The PIC12C508 is an 8-bit, fully static, EEPROM/EPROM/ROM-based CMOS
microcontroller. It has a RISC architecture using 33 single word, single cycle
instructions. This family of microcontrollers delivers a higher performance than
the rest of its competitors in the same price range. Additionally, it reduces
the client development time due to the simplicity of its instruction set.
This microcontroller allows the customization of applications ranging from
personal care appliances to low-power remote transmitters-receivers. Its main
characteristics are low cost, low power, high performance, ease of use and I/O
flexibility.
References: "Microchip Technology Inc.: PIC12C508 Device"
www.microchip.com/10/lit/pline/picmicro/families/12c5xx/devices/
12c508/index.htm "PIC12C5XX, 8-Pin, 8-Bit CMOS Microcontrollers" PIC12C508
Datasheet 40139e.PDF Microchip Technology 3/4/1999
Title: A circuit that interprets the results of ECG data to determine patient
risk of arrhythmia.
Abstract:
The electrocardiogram (ECG) is a non-invasive, routine test used by
cardiologists to record the electrical activity of the heart. Abnormalities in
this electrical signature can indicate a variety of heart problems. In a normal
ECG, 12 leads are attached to a patient's chest and back and the electric signal
of the patient's heart rate is recorded as a continuous wave (see Fig1.pdf(3)).
A physician visually interprets the features of the wave. This qualitative
interpretation can miss important features.
Our project will be to develop a digital circuit that monitors and records the
heart rate but also determines, via QT dispersion, whether or not there is a
risk of arrhythmia. The determination works as follows: a heart waveform is
divided into named regions (see Fig2.pdf(3)). The distance between the Q point
and the T point is an indication of the speed of ventricular recovery after
each heartbeat. This distance is measured independently by each of the 12 leads
attached to the patient. The QT dispersion is the difference between the maximum
QT interval and the minimum QT interval. It indicates the level of homogeneity
of ventricular recovery time. QT dispersion is a proven indicator of those at
risk of sudden death due to arrhythmia. (Fig3.pdf(3) shows the ECG of a patient
with a long QT interval.)
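Once the 12 per-lead QT intervals have been measured, the dispersion
computation itself is a one-liner; the interval values below (in milliseconds)
are illustrative:

```python
# QT dispersion as described above: the spread of QT intervals
# across the 12 leads.

def qt_dispersion(qt_intervals_ms):
    """Difference between the longest and shortest QT interval."""
    return max(qt_intervals_ms) - min(qt_intervals_ms)

# One illustrative QT interval per lead, in milliseconds:
leads = [380, 402, 395, 410, 388, 400, 392, 405, 398, 385, 407, 390]
qt_dispersion(leads)   # 410 - 380 = 30 ms
```

The hard part of the circuit is not this subtraction but reliably locating the
Q and T points in each lead's waveform.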
References:
1. Zaret, Barry L., Marvin Moser, Laurence S. Cohen, eds. "Yale Heart Book".
New York: Hearst Books, 1992.
2. Pye M., A.C. Quinn, S.M. Cobbe. "QT interval dispersion: a non-invasive
marker of susceptibility to arrhythmia in patients with sustained ventricular
arrhythmias." British Heart Journal 1994; 71:511-514.
3. 12-lead ECG library (online):
homepages.enterprise.net/djenkins/ecghome.html
4. ECG Learning Center (University of Utah School of Medicine, Salt Lake City):
www-medlib.med.utah.edu/kw/ecg/
Title: Blowfish encryption algorithm
Abstract: This article discusses the design and implementation of the Blowfish
encryption algorithm. Blowfish is a symmetric block cipher with the capability
of taking a variable-length key from 32 bits to 448 bits. Unpatented and
license-free, Blowfish was designed by Bruce Schneier in 1993 as a fast, free
alternative to existing encryption algorithms, making it an ideal encryption
system for both domestic and exportable use.
Index Terms: AES, NIST, DES, block cipher, cryptography, RSA
References:
www.counterpane.com/bfsverlag.html
www.rsasecurity.com/
csrc.nist.gov/encryption/aes/rijndael/
*NEURO/FUZZY COMPUTING*
Title: Motion Detection Using a VLSI Neural Architecture
Abstract:
The project implements a motion detector using a neural network strategy. The
goal is to build a multilayer perceptron (one input layer, one hidden layer,
and one output layer) and predict the motion of objects. The neural net
algorithm will use the technique of updating the weights. Neurons will be built
and provided some information about the environment. Based on the information
received, the neural net will display a result. We will respond back to the
network, telling it whether or not the answer was correct. Depending on the
result produced, an error will be calculated, which is simply the difference
between the target output and the output produced by the neural net. The error
will then be used to update the weights between the input, hidden and output
layers. To keep things simple, the neural net will be a fully connected
feed-forward network. The output will be examined on a pixel-by-pixel basis.
The input will contain about 32 neurons to interpret a 256 x 256 pixel image.
At every clock cycle we interpret 16 bits, arranged as a 4 x 4 matrix. The
final solution will be a 64 x 64 pixel image. Each neuron will contain the
following:
- a ROM memory, where net weights are stored;
- a sum device and a storage register for the pre-synaptic activation storage;
- a circuit for the activation function coding
Testing will be done using simple objects. There will be a training set to
train the neural net, and a testing set to test the network.
All simulations will be done in VHDL.
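The weight-update rule described above can be sketched for a single neuron (a
delta-rule step with an illustrative learning rate and step activation; the
project's multilayer network propagates this error back through the hidden
layer as well):

```python
# One training step for a single neuron, as described above:
# error = target - output, and each weight is nudged by lr * error * input.

def step(x):
    """Illustrative hard-threshold activation."""
    return 1.0 if x >= 0 else 0.0

def train_step(weights, inputs, target, lr=0.1):
    output = step(sum(w * i for w, i in zip(weights, inputs)))
    error = target - output
    new_weights = [w + lr * error * i for w, i in zip(weights, inputs)]
    return new_weights, error
```

In the hardware, the equivalent of this loop is one multiply-accumulate per
synapse, with the updated weights written back to each neuron's weight storage.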
References:
[1] www.csai.unipa.it/people/alumni/condemi/vhdlweb/neuralchip2/AACHEN_95_new.html
and 18 other refs!
Title: Elliptic Curve Scalar Multiplier for Use in Cryptosystems
Abstract: Elliptic curve cryptosystems are emerging as an alternative to RSA
encryption and other public-key cryptosystems because of their small key sizes.
This is critical to some applications, including smartcards and handheld
devices. In such a cryptosystem, the basic building block for secret key
exchange, authentication and certification is the elliptic curve scalar
multiplier.
The set of points which satisfy an elliptic curve equation has some useful
properties. In particular, the addition of two points on the curve gives
another point on the curve. This is not addition in the usual sense; it is a
set of calculations involving the coordinates of the points, carried out in
the finite field over which the curve is defined. An addition of two elliptic
curve points consists of a series of underlying field additions, squarings,
multiplications and inversions. Elliptic curve scalar multiplication involves a
series of additions and subtractions of that kind.
Our goal is to write a VHDL description of a scalar multiplier based on the
paper in reference [1]. The elliptic curve used in this paper is defined as y^2
+ xy = x^3 + ax^2 + b.
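Given a point-addition routine, scalar multiplication follows the standard
double-and-add scheme. The sketch below shows only that control structure, with
plain integer addition standing in for elliptic-curve point addition:

```python
# Double-and-add scalar multiplication: computes k*P from O(log k) group
# additions. Integer addition is a stand-in for the real point addition,
# purely to show the structure; a real design substitutes the field
# arithmetic described above.

def scalar_multiply(k, P, add):
    result, addend = None, P           # None plays the role of the identity
    while k:
        if k & 1:
            result = addend if result is None else add(result, addend)
        addend = add(addend, addend)   # "point doubling"
        k >>= 1
    return result

scalar_multiply(13, 5, lambda a, b: a + b)   # 13 * 5 = 65
```

The hardware cost is therefore dominated by the point add/double datapath; the
scalar loop itself is a simple shift-and-test controller.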
References
[1] Elliptic Curve Scalar Multiplier Design Using FPGAs, Lijun Gao, Sarvesh
Shrivastava and Gerald E. Sobelman, Proceedings, First International Workshop on
Cryptographic Hardware and Embedded Systems, Lecture Notes in Computer Science,
Vol. 1717, Springer, 1999.
[2] Elliptic Curve Public Key Cryptosystems - An Introduction, Erik De Win and
Bart Preneel,
ftp://ftp.esat.kuleuven.ac.be/pub/cosic/dewin/coursetext.ps.gz
[3] Efficient Algorithms for Elliptic Curve Cryptosystems, Jorge Guajardo,
www.ece.wpi.edu/Research/crypt/theses/documents/ms_guajardo.pdf
[4] A Practical Implementation of Elliptic Curve Cryptosystems over GF(p) on a
16-bit Microcomputer, Toshio Hasegawa, Junko Nakajima and Mitsuru Matsui,
www.security.ece.orst.edu/documents/Elliptic/main.pdf
Title: Obstacle-Avoiding Navigational Control Unit
Abstract:
This project implements a wheelchair navigational control unit in VHDL to
provide motion in the direction opposed to any obstacles.
Obstacle Avoidance algorithms, namely the vector field histogram (VFH) and the
vector force field (VFF) methods, will be employed to allow the chair to move
around any barricades. A virtual square grid of a specific area set by the
range of the ultrasonic sensors is established around the wheelchair. The grid
is subdivided into cells. As the chair moves in the direction of an obstacle,
the cells are incremented with certainty values indicating the probability of an
obstacle in that cell. The unit processes the cell values and computes a
direction vector that not only is close to the user's intended direction but
also avoids all nearby obstacles.
The navigational control unit is an interesting design project because it has
direct practical applications to aid persons with mental and physical
disabilities. The challenge of this project will be to implement the complex
mathematical computations of the VFH and VFF algorithms and coordinate the
different elements in the navigational unit to provide real-time directional
changes.
The testing of the unit is performed using a matrix representing an environment
scenario with obstacles. The success of the project is proven by the unit's
ability to provide a suitable direction vector that avoids any obstacles in the
path of motion.
Index Terms: Obstacle Avoidance, Vector Force Field, Vector Field Histogram,
Certainty Value.
References:
[1] D.Bell, S. Levine, Y. Koren, L. Jaros, R. Simpson, and J. Borenstein,
"The NavChair Assistive Wheelchair Navigation System" in IEEE Trans. Rehab.
Eng. Vol. 4, pp. 443-451, Dec. 1999.
[2] Y. Koren and J. Borenstein, "The Vector Field Histogram - Fast Obstacle
Avoidance for Mobile Robots" in IEEE Journal of Robotics and Automation, Vol. 7,
No. 3, pp. 278-288, June 1991.
Title: Token Ring Network Card
The purpose of this project is to design a media access control device based on
the Token Ring network protocol. The Token Ring media access method allows
multiple stations (nodes) to share a common bus transmission medium (i.e., a
network). The nodes are connected one to another to form a ring, and a
specific pattern of data - a token - is used to decide which node can use the
ring at any one time. The token is a unique series of bits or frames that
continuously circulates on the ring, passing from one station to another. In
order to transmit data from one node to another, the sender must first seize
the token. Therefore only one node may transmit at any one time. The medium can
be an actual ring or a hub-based system. Tokens and data will be transmitted in
packets on the medium to be read by other stations. Both the token and the data
will require Manchester encoding, which allows for synchronized
communication. This project will take an existing design and optimize its
speed, minimize the hardware requirements, and modify the design so that it
supports variable-length data packets.
References:
1) Page created by Prof. Bernd-Peter Paris,
thalia.spec.gmu.edu/~pparis/classes/notes_101/node78.html
2) Other LAN Technologies, home.ubalt.edu/abento/650/otherlan/sld001.htm
3) Cisco Systems Inc.,
www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/tokenrng.htm
Title: ATM - PHY Interface UTOPIA Level 2, Version 1.0 compliant 8-bit
Interface, Multi-PHY, Single ATM
Abstract:
This project will implement one variation of the UTOPIA Level 2 protocol. The
UTOPIA protocol defines the interface between an ATM network and the physical
layer. The interface cores that we will design will support an ATM (or link)
interface speed of 25 MHz. Since the speed of the PHY layer is PHY-dependent
and can vary, the data received on the PHY layer will have to go through a rate
adaptation FIFO before being passed across the interface. The ATM layer will be
represented by a "master" core that has transmit and receive capabilities. The
PHY layer will be represented by a "slave" core that will also have transmit and
receive capabilities. Up to 31 slaves may be connected to one master. The master
controls the data bus in both directions. The cores will be optimized and tested
for Altera Flex10K devices. Cores of this type are used in industry inside
phone switching networks. The protocol is also becoming a popular standard for
data transfer between chips, boards, and networks.
References:
The ATM Forum, UTOPIA Level 2, Version 1.0 Specification
ftp://ftp.atmforum.com/pub/approved-specs/af-phy-0039.000.pdf
The ATM Forum, UTOPIA Level 1, Version 2.01 Specification
ftp://ftp.atmforum.com/pub/approved-specs/af-phy-0017.000.pdf
Title: A VHDL Implementation of Triple DES Encryption
Abstract:
In the early 1970s, due to a significant rise in computer activity, it was
decided that a strong cryptographic algorithm was needed to protect
non-classified information. The algorithm needed to be cheap, widely available
and very secure. These requirements were satisfied in 1977 with the release of
DES, the Data Encryption Standard. DES operates on 64-bit data blocks using a
64-bit key, of which 56 bits are effective and 8 are parity. This algorithm has
been in use now for over 25 years, but with an effective key size of only 56
bits it is believed to be vulnerable to attack. A minor variation of this
standard, and a step above it, is Triple DES, which enjoys much wider use since
it is not so easy to break. Like DES, data is encrypted and decrypted in 64-bit
chunks. However, it takes a 192-bit key (24 characters) as input and breaks it
into three keys. In fact, Triple DES is simply three rounds of DES, which
increases security at the cost of increased run time. Each round uses a
different one of the three keys: the data is encrypted with the first key,
decrypted with the second key and finally encrypted again with the third key.
Presently, Triple DES is an excellent and reliable choice for the security needs
of highly sensitive information.
The purpose of this project is to design and implement a hardware version of the
Triple DES algorithm for the Altera Flex10K series FPGA. This will include a
detailed description of the key features mentioned above which make use of both
permutation and substitution methods. The various modules will be developed
using the VHDL language.
References:
1) Aho, Hopcroft & Ullman. "The Design and Analysis of Computer Algorithms",
Addison-Wesley, 1974.
2) Harry Katzan Jr., "The Standard Data Encryption Algorithm", Petrocelli Books,
Inc. 1977.
3) Jan C. A. Van Der Lubbe, "Basic Methods of Cryptography", Cambridge
University Press, 1994.
4) Schneier, B. "Applied Cryptography" JohnWiley & Sons, 1996.
Title: A VHDL implementation of the Tiny Encryption Algorithm (TEA).
Abstract: Security nowadays has become a very important issue. When we evoke the
term security in the computer world, we think immediately about viruses and ways
to protect ourselves from them, the passwords we need for so many things and so
forth. But we forget that in our modern world we also need to protect our data
from unscrupulous eyes. For that reason encryption exists so that we can "hide"
some of our data. In our project, we will implement TEA, the Tiny Encryption
Algorithm, developed by David Wheeler and Roger Needham at the Computer
Laboratory of Cambridge University.
The Tiny Encryption Algorithm is one of the simplest, fastest and most efficient
cryptographic algorithms in existence because its round function uses only
simple operations: addition, exclusive-or and shifts. It encrypts 64 data bits
at a time using a 128-bit key. The algorithm is also optimised for machines
with 32-bit words. For our project, we will implement the TEA in VHDL so that
we will be able to encrypt and decrypt a message.
Included is a pdf file describing the C code of the TEA and a simple diagram
describing our VHDL design.
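The round structure is compact enough to sketch directly from the published C
reference: two 32-bit halves, four 32-bit key words, and 32 rounds of
additions, XORs and shifts with the constant delta = 0x9E3779B9:

```python
# TEA as described above: 64-bit blocks as two 32-bit halves, a 128-bit
# key as four 32-bit words. Explicit masking mimics C's uint32 wraparound.

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9

def tea_encrypt(v0, v1, key, rounds=32):
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(rounds):
        s = (s + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK
    return v0, v1

def tea_decrypt(v0, v1, key, rounds=32):
    k0, k1, k2, k3 = key
    s = (DELTA * rounds) & MASK
    for _ in range(rounds):        # undo the rounds in reverse order
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK
        s = (s - DELTA) & MASK
    return v0, v1
```

Because every operation is an add, XOR or fixed shift on 32-bit words, each
round maps directly onto a small combinational datapath in VHDL.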
References: 1. David A. G. Gillies, "The Tiny Encryption Algorithm (TEA)",
vader.brad.ac.uk/tea/tea.shtml
2. Computer Security Group, Computer Laboratory, University of Cambridge, "Cryptographic Algorithms", www.cl.cam.ac.uk/Research/Security/studies/st-alg.html
3. David A. G. Gillies, "TEA Source Code", vader.brad.ac.uk/tea/source.shtml
Title: Distributed Station Network Router - Using a Simplified IEEE Std. 802.5
Packet Protocol and Topology
Abstract:
IBM originally developed the Token Ring network in the 1970s. It remains a
primary local area network (LAN) technology, second only to Ethernet/IEEE 802.3
in LAN popularity.
In this project, we will implement a network router following the Token Ring
Protocol, in the form of a multistation access unit (MSAU). This unit will
perform the scheduling and routing of packets through the use of carrier
detection techniques, packet error detection, as well as address polling and
address recognition. The simplified data packet will consist of both a frame and
control field in addition to the usual data and address fields. A starting and
ending delimiter will be appended to the data packet in order to facilitate
error detection.
A priority scheme will be implemented using a priority field and reservation
bit, supplemental to the actual circulation of an available or reserved token.
Other differences between the original 802.5 protocol and our simplified design
will only extend to the form of the token and data packets. Field sizes will be
determined in the preliminary phases of the project.
References:
Alberto Leon-Garcia, Indra Widjaja "Communication Networks Fundamental Concepts
and Key Architectures", McGraw-Hill 1999
Dimitri Bertsekas, Robert Gallager, "Data Networks", (2nd edition), Prentice
Hall, 1992, ISBN 0-13-200916-1.
Andrew S. Tanenbaum, "Computer Networks" (3rd edition), Prentice Hall, 1996.
ISBN 0-13-349945-6
Title: Hardware Implementation of Convolutional Encoder and Viterbi Decoder
Abstract:
A primary objective of any digital communication system is to transmit
information at the highest possible rate. Just as importantly, the information
should be received with minimal distortion and errors. At the transmitter, the
data is encoded to protect it against channel noise or other forms of
distortion. The encoder adds redundant bits to the original stream of data.
The added redundancy allows the encoded data to be decoded at the receiver
by providing clues to decipher the code word. Essentially, the encoder maps the
data stream to a code word and the decoder must map the code word back into the
original message.
The goal of this project is to implement a convolutional encoder (k = 2, n = 5)
and the Viterbi decoder, which are found in second- as well as third-generation
wireless communication systems.
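The shift-register structure of a convolutional encoder can be sketched for the
textbook rate-1/2, constraint-length-3 code with generators 7 and 5 (octal);
the project's own (k = 2, n = 5) parameters would use different generators, but
the structure is the same:

```python
# Rate-1/2 convolutional encoder sketch: each input bit shifts into a
# small register, and one parity bit is produced per generator polynomial.
# Generators (7, 5 octal) are the classic textbook choice, assumed here
# for illustration.

def convolve_encode(bits, generators=(0b111, 0b101), K=3):
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | b) & ((1 << K) - 1)    # shift the bit in
        for g in generators:
            out.append(bin(state & g).count("1") & 1)  # parity of the taps
    return out

convolve_encode([1, 0, 1, 1])   # -> 11 10 00 01
```

The Viterbi decoder then searches the trellis defined by this shift register
for the most likely transmitted sequence.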
References: 1. Simon Haykin , "Communication Systems", John Wiley & Sons Inc.,
1994 2. G. David Forney, "The Viterbi Algorithm", Proceedings of the IEEE, March
1973
Title: Digital Security controller with video display
Our project implements a digital security controller which can be used in many
environments, such as a door, a safety deposit box or a locker. To improve the
security of the controller, a simple encryption method is used. All data, such
as user IDs and passwords, are encrypted and can be decrypted internally. A
storage mechanism is built in which allows storing up to 10 users' data. A
simple display unit is also implemented which, upon successful administrator
login, allows connection to a monitor or simple display device. The purpose of
the display unit is to display necessary information such as the current user
ID and password. Further features are listed below:
Criteria:
- Administrator controllability
- Password encryption
- Time stamp
- Storage of up to 10 users' info
- Alarm
- Timeout
- Limited trials
- Display capability
Reference:
1.) VHDL code for Digital Lock example:
venus.ece.ndsu.nodak.edu/~joeljorg/ee423/notes/
2.) Programmable Devices: www.circellar.com/pastissues/TOCMar2k.htm
3.) Synchronous FSM Design Using VHDL:
www.ee.calpoly.edu/courseware/ee359/cp359x03.html
4.) Digital Combination Lock:
www.redbrick.dcu.ie/academic/CLD/chapter8/chapter08.doc5.html
5.) The real work:
lesbos.esat.kuleuven.ac.be/courses/HJ03/exercises/lock/node10.html
6.) Digital Lock FSM example:
lesbos.esat.kuleuven.ac.be/courses/HJ03/exercises/lock/node10.html
7.) "Rapid Prototyping of Digital Systems (A Tutorial Approach)" by James O.
Hamblen and Michael D. Furman
8.) "VHDL for Logic Synthesis" by Andrew Rushton
Winter 2002
Title: VHDL Implementation of RSA Cryptosystem Using Montgomery Algorithm
Abstract: Due to the large amount of sensitive data that is transmitted across
the Internet, cryptography has undoubtedly become one of the most important
sectors of research and development. The RSA (Rivest, Shamir, and Adleman)
public key encryption algorithm remains one of the most popular such schemes,
due to its strength and applicability to both messages and digital signatures.
However, the nature of the algorithm requires some extensive computation,
meaning it can benefit greatly from an efficient implementation geared for
minimized latency. In addition, a faster implementation would make it suitable
for new applications, such as real-time systems. In this project, we will
investigate the RSA cryptosystem by implementing it in VHDL, targeting the
Altera Flex 10K series FPGA. The Montgomery algorithm for modular multiplication
is well suited to the RSA algorithm, and using this as part of our
implementation, we can increase the system's speed significantly. Since this
algorithm is asymmetric (decryption is not simply the reverse of the
encryption), we will implement a circuit capable of both encryption and
decryption.
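The outer loop of RSA is modular exponentiation by square-and-multiply; in the
hardware design, each modular product in this loop would be performed as a
Montgomery multiplication. A sketch with the classic textbook parameters
(p = 61, q = 53):

```python
# Square-and-multiply modular exponentiation, the outer loop of RSA.
# In hardware, each modular product below would be a Montgomery
# multiplication, which avoids the costly trial division by n.

def modexp(base, exp, n):
    result = 1
    for bit in bin(exp)[2:]:            # scan exponent bits left to right
        result = (result * result) % n  # always square
        if bit == "1":
            result = (result * base) % n
    return result

# Tiny textbook RSA example: p = 61, q = 53 => n = 3233, e = 17, d = 2753.
n, e, d = 3233, 17, 2753
c = modexp(65, e, n)      # encrypt the message 65
m = modexp(c, d, n)       # decrypt; recovers 65
```

Since the same loop serves encryption and decryption (only the exponent
changes), one datapath suffices for the asymmetric pair of operations.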
References:
[1] M. Shand, J. Vuillemin, "Fast Implementations of RSA Cryptography", Proc.
IEEE 11th Symposium on Computer Arithmetic, 1993.
[2] J.H. Gwo, C.L. Wang, H.C. Hu, "Design and Implementation of an RSA
Public-Key Cryptosystem", Proceedings of the 1999 IEEE International Symposium
on Circuits and Systems, Volume 1, 1999.
[3] P.A. Wang, W.C. Tsai, C.B. Shung, "New VLSI architectures of RSA public-key
cryptosystem", Proceedings of the 1997 IEEE International Symposium on Circuits
and Systems, Volume 3, 1997.
[4] A. G. Konheim, "Cryptography: A Primer", John Wiley & Sons Inc., 1981.
Title: Hardware implementation for the Svoboda Algorithm (BOOLEAN ANALYZER).
Abstract: The Svoboda method is one of the important methods used for logic
minimization in electrical engineering. Software exists that finds min-terms
using the Svoboda algorithm for minimization, but the time taken by the
software increases exponentially with the number of input variables. For
example, an 8-variable input takes between 15 and 20 minutes to be processed
using the software; a 16-variable input will take at least an hour. Hardware,
on the other hand, takes far less time to process the min-terms. In this
project we will be implementing the Boolean Analyzer in hardware using VHDL.
The circuit will take an n-bit input (in our case n = 8), store it in memory to
be processed through the formal divisor and the formal multiplier, and then
store the final result back in the memory. This implementation has been done
before; in the old design, the circuits were not synchronized together and the
design required a lot of hardware to implement. Our task is to improve the old
implementation by introducing a better design that controls and synchronizes
the circuit, with the further goal of reducing hardware consumption.
References:
1. Bhasker, J., Upper Saddle River, N.J.: PTR Prentice Hall, 1999.
2. Stephen D. Brown, Zvonko G. Vranesic, "Fundamentals of Digital Logic with
VHDL Design", McGraw-Hill, c2000.
Title: ATM Switch Design
Abstract: Asynchronous Transfer Mode (ATM) has emerged as a promising
technology for broadband communication, including data, audio and video
transfer.
In this project, an ATM switch is implemented using an Altera FPGA. The chip
will decipher addressing information, check the priorities of incoming packets,
and route packets to their corresponding destinations. Each packet is also
checked for its communication type, which determines the order in which each
packet will be re-transmitted. The goal was to implement a working ATM switch
model that can handle packet routing and is scalable for use over many ports.
This will be tested by using a host module which will send both addressing
information, as well as data packets, to the switch. The working switch should
determine the destination address, and route the packet to the appropriate port.
The host will then be used to ensure that the packet was routed to the correct
output port.
References:
[1] A. Leon-Garcia, Communication Networks. New York, NY: McGraw-Hill Companies,
Inc., 2000.
[2] L. Peterson, Computer Networks, a Systems Approach. San Francisco, CA:
Morgan Kaufmann, 2000.
[3] A. Tanenbaum, Computer Networks. Upper Saddle River, NJ: Prentice Hall,
1996.
Title: A Real Time Implementation of an ATM
Abstract: Automatic teller machines have become an important part of the
consumer's everyday life. The concept of an automatic teller machine raises
many different design issues, including database management, security,
reliability, encryption, and user friendliness. Our project will attempt to
implement a functional automatic teller machine. To simplify the design, our
machine will implement the following functions: withdraw, deposit, and account
balance. We will concentrate our design on processing the information entered
by the user: verifying that the PIN is valid and matches the card inserted,
determining the tasks to be completed, verifying that the task can be completed
(enough funds in the client's account), re-calculating the new balance, and
returning the money (if required). The user interface will be simplified so as
not to take away from the focus of our design (as mentioned above). We will use
a simplified SSL (Secure Socket Layer) type encryption system to protect the
information, and the overall network will consist of three parts: the ATM
machine, the host computer, and the bank computer. The host computer will be
responsible for routing the client file to the right bank computer.
References:
1. Bowen, J. (1998). How ATMs Work. How Stuff Works.
http://www.howstuffworks.com/atm1.htm (2002, February 14).
2. (2000). Secure Socket Layer. Netscape.
home.netscape.com/security/techbriefs/ssl.html (2002, February 16).
3. Dion, B., Dissoubray, S. (2000). Modeling and Implementing Critical Real Time
Systems with Esterel Studio. Esterel Technologies.
www.esterel-technologies.com/esterel/article1.pdf (2002, February 15).
4. Tyson, J. (1998). How Encryption Works. How Stuff Works.
http://www.esterel-technologies.com/esterel/article1.pdf (2002, February 17).
Title: HARDWARE IMPLEMENTATION OF THE BRESENHAM LINE AND CIRCLE ALGORITHMS
Performance of most graphics systems can be traced to the drawing of primitives
(lines, circles, etc.) and polygon-filling routines. Today's high-quality video
systems place heavy performance demands on their graphics packages. In order to
increase rendering speeds, more and more processing power is being added to the
graphics processing unit through the implementation of software routines in
hardware. The Bresenham line and circle drawing algorithms are amongst the most
widely used routines in graphics packages due to their simplicity and
efficiency. One of their main features is that they rely solely on integer
arithmetic, avoiding time- and resource-consuming floating-point operations. In
this project, we will explore
the speed difference between hardware and software implementations by comparing
the performance of the corresponding C and VHDL routines.
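The integer-only property is what makes these routines hardware-friendly. A software reference for the line case (the circle case is analogous) needs nothing but add, subtract and compare:

```python
# Bresenham line rasterization: only integer add/subtract/compare,
# no floating-point operations.
def bresenham_line(x0, y0, x1, y1):
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                      # running decision variable
    points = []
    while True:
        points.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:                   # step in x
            err += dy
            x0 += sx
        if e2 <= dx:                   # step in y
            err += dx
            y0 += sy
    return points
```

A C version of the same loop serves as the software baseline; the VHDL version pipelines the two conditional updates so one pixel is produced per clock.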
REFERENCES
1. Macvicar & Singh, "Accelerating DTP with Reconfigurable Computing Engines",
http://www.xilinx.com/labs/satnam/macvicar-singh-fpl98-final.pdf
2. Hearn & Baker "Computer Graphics: C Version", 2nd Edition, Prentice-Hall
1994.
3. Wright, W.E., "Parallelization of Bresenham's line and circle algorithms",
IEEE Computer Graphics and Applications, Volume:10 Issue:5, Sept. 1990, Page(s):
60 -67
Title: The Solitaire Encryption Algorithm.
Abstract: The Solitaire Encryption Algorithm was developed by Bruce Schneier. It
provides a high level of cryptographic strength since it relies upon the secrecy
of a key, rather than the secrecy of the algorithm itself. The key for a message
is the initial arrangement of a deck of cards. A keystream, a stream of numbers
between 1 and 26, is generated using one of the 54! possible permutations of the
deck. This algorithm was initially described in Neal Stephenson's Cryptonomicon
so that it could be used by secret field agents who only had access to a deck
of cards. The algorithm assigns values to cards of a deck as the numbers 1-52,
with two Jokers as 53-54. Letters of the alphabet are numbered 1-26. Using the
arrangement of the cards, a specific multi-step process of reassembling cards
and performing cuts to the deck, generates the keystream. Adding a keystream
number to a letter number, modulo 26, enciphers the letter. Solitaire is
reversible only if the key, which was used to encrypt the message, is known. The
goal of this project is to implement Solitaire using VHDL in order to be able to
successfully encrypt short messages.
References:
[1] B. Schneier, "Applied Cryptography, Second Edition", John Wiley & Sons,
1996.
[2] D. Kahn, "The Codebreakers; The Comprehensive History of Secret
Communication from Ancient Times to the Internet", Scribner, 1996.
[3] D. R. Stinson, "Cryptography : Theory and Practice", CRC Press, 1995.
[4] N. Stephenson, "Cryptonomicon", Avon Books, 1999.
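The multi-step keystream process described above can be modelled in software before committing it to VHDL (a sketch following Schneier's published description; the deck is a list of card values 1-52 with jokers A = 53 and B = 54):

```python
A, B = 53, 54  # the two jokers

def move_down(deck, card, steps):
    for _ in range(steps):
        i = deck.index(card)
        if i == len(deck) - 1:             # wrap: bottom card goes below the top card
            deck.insert(1, deck.pop())
        else:
            deck[i], deck[i + 1] = deck[i + 1], deck[i]

def next_key(deck):
    while True:
        move_down(deck, A, 1)              # step 1: joker A down one
        move_down(deck, B, 2)              # step 2: joker B down two
        i, j = sorted((deck.index(A), deck.index(B)))
        deck[:] = deck[j + 1:] + deck[i:j + 1] + deck[:i]   # step 3: triple cut
        v = min(deck[-1], 53)              # step 4: count cut on the bottom card
        deck[:] = deck[v:-1] + deck[:v] + deck[-1:]
        t = min(deck[0], 53)               # step 5: count down from the top card
        if deck[t] < A:                    # a joker output is skipped; repeat
            return deck[t]

def encrypt(plaintext, deck):
    out = ""
    for ch in plaintext:
        p = ord(ch) - ord("A") + 1         # letters numbered 1-26
        k = (next_key(deck) - 1) % 26 + 1  # keystream value reduced mod 26
        out += chr((p + k - 1) % 26 + ord("A"))
    return out
```

The initial contents of `deck` are the key; an unkeyed deck (cards in order) reproduces Schneier's published test values.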
Title: Implementation of the Geometry Stage of the Graphics Pipeline
Abstract: The basic architecture of the graphics rendering pipeline is composed
of three main stages. The application stage is executed entirely through
software development, consisting mainly of animations, collision detections and
other higher-level operations. The geometry stage, which can be implemented
through hardware, is responsible for geometric transformations, lighting,
projections and clipping. "This stage computes what is to be drawn, how it
should be drawn, and where it should be drawn." [2] The final stage is the
rasterizer, responsible for the correct rendering of images into pixels onto the
screen. This project will focus on the implementation of the second stage of
this graphics pipeline. The geometry stage will be broken down into three
sub-stages, namely transformations, clipping, and viewport projections. In
real-time graphics, the simulation of real lighting conditions is not allotted
very much time and is thus implemented in the rasterizer section using light
maps. Further, the models used are composed of triangles, since these are the
geometric primitives employed by most graphics hardware. The primary goal will
be to establish a hardware architecture that will optimize for maximum
throughput, by heavily pipelining the design. The coordinate output from this
stage will be composed of the proper coordinates, ready to be passed on to the
final stage.
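The three sub-stages chain together as a transform, a perspective divide, and a viewport mapping. A minimal software sketch (the 640x480 viewport and the unit projection distance are illustrative assumptions, not part of the design spec):

```python
def transform(m, v):
    # 4x4 matrix times a homogeneous column vector (x, y, z, w)
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

def project_to_screen(vertex, width=640, height=480):
    x, y, z, w = vertex
    ndc_x, ndc_y = x / z, y / z            # perspective divide onto the z = 1 plane
    # viewport transform: map [-1, 1] to pixel coordinates (y flipped for screen)
    return ((ndc_x + 1) / 2 * width, (1 - ndc_y) / 2 * height)

# example world transform: translate a model-space point by (0, 0, 5) into view space
translate = [[1, 0, 0, 0],
             [0, 1, 0, 0],
             [0, 0, 1, 5],
             [0, 0, 0, 1]]
```

In the pipelined hardware each of these steps becomes its own stage, so a new vertex can enter the transform while earlier vertices are being divided and mapped.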
References:
[1] Blinn, Jim, A Trip Down The Graphics Pipeline, Morgan Kaufmann, California,
1996.
[2] Haines, Eric, Moller, Tomas, Real-Time Rendering, A K Peters, Massachusetts,
1999.
[3] Hearn, Donald, Baker, M. Pauline, Computer Graphics, C Version (2nd
edition), Prentice-Hall, 1994.
Title: The Wallace Tree Multiplier
Abstract: Integrated circuit design continuously evolves to improve speed of
execution, area on chip, power dissipation and other factors. One operation that
is common to a large number of applications and that can be implemented in a
number of different ways is multiplication. This project examines one of the
best-known algorithms for multiplication, the Wallace tree multiplier. The
natural way of performing multiplication is by generating partial products,
shifting them, and adding them up. A Wallace tree is a tree of full-adder
layers that takes in bits of the same weight (aligned bits) and adds them up.
Each layer reduces 3 input bits to 2 output bits. After as many stages as
necessary, each column is left with at most a carry and a sum bit; these two
remaining rows are combined with a final adder to yield the result. In this
project, we design a Wallace tree multiplier in VHDL. The inputs will be two
unsigned 32-bit integers. We will report the latency of several multiplications
and will analyse them to confirm the efficiency of the algorithm.
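The 3:2 reduction can be modelled bit-for-bit in software, which is useful as a reference against the VHDL netlist (a sketch; the hardware performs all full adders of a layer in parallel, the loop here merely serializes them):

```python
# Partial-product bits grouped by weight; layers of full adders (3:2
# compressors) reduce each column until at most two rows remain.
def wallace_multiply(a, b, n=32):
    cols = [[] for _ in range(2 * n + 1)]          # cols[w]: bits of weight 2^w
    for i in range(n):
        for j in range(n):                         # AND-gate array of partial products
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    while any(len(c) > 2 for c in cols):
        nxt = [[] for _ in range(2 * n + 1)]
        for w, c in enumerate(cols):
            for k in range(0, len(c) - len(c) % 3, 3):
                x, y, z = c[k:k + 3]               # one full adder per group of 3
                nxt[w].append(x ^ y ^ z)           # sum stays at this weight
                nxt[w + 1].append((x & y) | (x & z) | (y & z))  # carry moves up
            nxt[w].extend(c[len(c) - len(c) % 3:]) # leftover bits pass through
        cols = nxt
    # final carry-propagate addition of the two remaining rows
    row0 = sum(c[0] << w for w, c in enumerate(cols) if len(c) > 0)
    row1 = sum(c[1] << w for w, c in enumerate(cols) if len(c) > 1)
    return row0 + row1
```

The number of `while` iterations is the tree depth, which grows only logarithmically with n; that depth, not n itself, sets the multiplier's latency.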
References:
1. H. Schmit, "The Wallace Tree",
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15828-s98/lectures/012
6/sld002.htm, 1998.
2. S. Warde, "Unsigned Multiplication",
http://www.unf.edu/~swarde/Execution_Units/Unsigned_Multiplication/unsigned_
multiplication.html, 1999.
3. D. Wilde, "Tree and Array Multipliers",
http://www.ee.byu.edu/ee/class/ee621/Lectures/L11.PDF, 2002.
Title: RNA Sequence to Amino Acid Chain Translation Unit
Abstract: Bioinformatics is an exciting field in which multiple disciplines
combine to form a bridge between the traditional sciences and emerging computer
technology. Proteins, large organic molecules, are directly involved in the
chemical processes essential for life [1]. Proteins are composed of amino acid
subunits, which RNA encodes. This project will implement in hardware an
efficient algorithm for mapping an RNA sequence of length n to all possible
amino acid chains, based on known codons [2]. As the RNA-to-amino-acid mapping
is only one portion of the genomic process [3], this project can be
integrated into a larger design. The serial input of an RNA sequence will be
parsed to determine all possible amino acid sequences. The output will include
some termination marker (as yet undetermined) to denote distinct amino acid
sequences. The sequence will be stored in memory and will be processed by an
FSM. Pointers to the identified sub-sequences will be stored in the output stage
of the design; when the sequence has been fully processed, all identified
sequences will be output serially, in order of discovery. Because Altera MaxPlus
II will not be able to meet the requirements for simulation of our design, the
project will be implemented using ModelSim or Altera Quartus. The results will
be verified using existing web-based genetic software tools.
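The core mapping can be prototyped in a few lines using the standard genetic code, which is how we will generate expected outputs for simulation (a software sketch; '*' marks a stop codon, and amino acids use their one-letter abbreviations):

```python
# Standard RNA codon table, built from the canonical UCAG ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate_frames(rna):
    # Translate all three reading frames of a serial RNA input.
    rna = rna.upper().replace("T", "U")
    chains = []
    for frame in range(3):
        chain = "".join(CODON_TABLE[rna[i:i + 3]]
                        for i in range(frame, len(rna) - 2, 3))
        chains.append(chain)
    return chains
```

The hardware FSM walks the stored sequence the same way, one codon lookup per step, with the frame offset selecting the start pointer.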
References:
1. Dalton, Mark. Codon Table.
http://www.cbc.umn.edu/~mwd/cell_www/chapter2/codon_table.html
2. Protein. Encyclopedia Britannica.
http://www.britannica.com/eb/article?eu=119721&tocid=0&query=protein
3. Translate Tool. ExPASy Molecular Biology Server.
http://ca.expasy.org/tools/dna.html
4. Canadian Bioinformatics Resource.
http://www.cbr.nrc.ca/newdocs/home/home.html?cbrlang=eng+navbar=/newdocs/home/
Footnotes:
[2] Amino Acid Codons. http://www.bmrb.wisc.edu/referenc/codons.html
[3] DNA is transcribed to RNA. RNA is translated to amino acid sequences, which
are assembled into proteins.
Title: A Simple FPGA implementation of an Elliptic Curve Cryptosystem.
Abstract: Cryptography is a necessary feature of modern computing, as more and
more transactions are carried out every day that require strong security over a
possibly transparent medium such as the Internet. Contemporary cryptography is
based on the use of public key cryptosystems, which greatly simplify the
problem of establishing secure communication between two users without the need
to exchange secret private keys. One recently emerging public key cryptosystem
is based on Elliptic Curve Cryptography (ECC). ECC is particularly well suited
for hardware implementations on FPGAs due to its smaller key size as compared
to other cryptosystems such as RSA (Rivest, Shamir, Adleman). In fact, it has
been said to offer more security per bit than any other known public key
cryptosystem. Our project will tackle the design of an ECC-based cryptosystem
in VHDL that can be implemented on standard Altera (Flex10K) FPGAs.
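The arithmetic at the heart of such a cryptosystem is point addition and doubling on the curve, iterated by double-and-add scalar multiplication. A toy sketch over a small prime field (the tiny curve y^2 = x^3 + 2x + 3 mod 97 is illustrative only; a real system uses a field of 160+ bits):

```python
P_MOD, A_COEF, B_COEF = 97, 2, 3

def ec_add(P, Q):
    if P is None: return Q                 # None is the point at infinity
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                        # P + (-P) = infinity
    if P == Q:                             # tangent slope for doubling
        lam = (3 * x1 * x1 + A_COEF) * pow(2 * y1, P_MOD - 2, P_MOD)
    else:                                  # chord slope for addition
        lam = (y2 - y1) * pow(x2 - x1, P_MOD - 2, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def ec_mult(k, P):
    # double-and-add scalar multiplication, the core ECC operation
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R
```

In the FPGA design the expensive pieces are the modular multiplications and the inversion, which is why key size (and hence field width) dominates the area budget.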
References:
1. A.J. Menezes, "Elliptic Curve Public Key Cryptosystems.", Kluwer Academic
Publishers, 1993.
2. I. Blake, G. Seroussi, and N. Smart. "Elliptic Curves in Cryptography.",
Cambridge University Press, 1999.
3. M. Rosner, Elliptic Curve Cryptosystems on Reconfigurable Hardware, Master's
Thesis, Worcester Polytechnic Institute, May 1998.
4. K. H. Leung, K. W. Ma, W. K. Wong, and P. H. W. Leong., "FPGA Implementation
of a Microcoded Elliptic Curve Cryptographic Processor.", Proceedings of
Field-Programmable Custom Computing Machines (FCCM'00). (pg. 68-76), 2000.
Title: New AES (Rijndael) FPGA Implementation
Abstract: This project will design and implement the AES (Rijndael) encryption
algorithm. Recently chosen as the National Institute of Standards and
Technology's (NIST) official Advanced Encryption Standard (AES), the Rijndael
algorithm is now widely used by the US Government and other organizations for
data security. Rijndael is a symmetric block cipher with variable block length
and key length of up to 256 bits each. AES is license-free, with
implementations in several languages widely available. The goal of the project
will be to produce a functional implementation of Rijndael in VHDL. The focus
of the development will be correctness; thus, our results will be compared to
results generated by NIST's official C implementation. The implementation will
be designed for use with relatively small blocks and keys as a proof of
concept, given the time constraints of the project.
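One small, self-checking slice of Rijndael illustrates the kind of per-round logic the VHDL must get right: the MixColumns step on a single column, using the xtime trick for multiplication by 2 in GF(2^8) (the test column is the standard FIPS-197 example):

```python
def xtime(b):
    # multiply by x (i.e. by 2) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1
    b <<= 1
    return (b ^ 0x1B) & 0xFF if b & 0x100 else b

def mix_column(col):
    a = col
    b = [xtime(x) for x in col]            # 2*a[i] in GF(2^8)
    return [                               # 3*a[i] is b[i] ^ a[i]
        b[0] ^ (b[1] ^ a[1]) ^ a[2] ^ a[3],
        a[0] ^ b[1] ^ (b[2] ^ a[2]) ^ a[3],
        a[0] ^ a[1] ^ b[2] ^ (b[3] ^ a[3]),
        (b[0] ^ a[0]) ^ a[1] ^ a[2] ^ b[3],
    ]
```

In hardware this whole function is a shallow XOR network per column, which is one reason Rijndael maps well to FPGAs.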
References:
J. Daemen and V. Rijmen, "AES Proposal: Rijndael,"
http://csrc.nist.gov/encryption/aes/rijndael/Rijndael.pdf NIST AES Homepage
http://csrc.nist.gov/encryption/aes/rijndael/ Rijndael Homepage
http://www.esat.kuleuven.ac.be/~rijmen/rijndael/ http://rijndael.com/
Title: Tomasulo's Algorithm for Dynamic Instruction Scheduling
Abstract: Modern processors use pipelining to increase throughput. Nevertheless,
this can introduce hazards and stalls during the processor's execution. In order
to achieve better performance, Tomasulo's algorithm can be used. This algorithm
allows the system to reschedule instructions dynamically. This project will
simulate the implementation of Tomasulo's algorithm in a DLX machine. The main
goals are to obtain correct outputs and to achieve better CPU performance. The
latter requires that hazards and stalls be reduced. Also the instruction
execution must continue in the presence of hazards and stalls. In order to
handle structural, control, WAW, WAR and RAW hazards, a hazard detection unit
will be included in the system architecture. Dispatched instructions will be
stored, along with their tag, data or data pointer, to reservation stations.
Once they are ready, they will be issued to an execution unit. To eliminate name
dependencies, a specialized register file and register control unit will be used
to implement register renaming. Finally, LPM modules will be used to implement
multiplication, addition, shifting, division and other operations.
References:
1. Hennessy & Patterson, "Computer Architecture: A Quantitative Approach," 2nd
edition, Morgan Kaufmann, 1996.
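The issue/dispatch/broadcast mechanism described above can be modelled compactly in software (a sketch of the data-flow core only; instruction formats, the two-operand ALU and the tag scheme are simplifications, and real hardware does all of this per clock cycle):

```python
def run_tomasulo(program, regs):
    # program: list of (op, dst, src1, src2); op in {"+", "*"}
    ALU = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    stations = []            # reservation station entries
    rename = {}              # architectural reg -> tag of youngest producer
    for tag, (op, dst, s1, s2) in enumerate(program):
        entry = {"op": op, "tag": tag, "dst": dst}
        for name, src in (("j", s1), ("k", s2)):
            if src in rename:                  # value pending: store its tag
                entry["q" + name] = rename[src]
            else:                              # value ready: capture it now
                entry["v" + name] = regs[src]
                entry["q" + name] = None
        rename[dst] = tag                      # register renaming kills WAW/WAR
        stations.append(entry)
    while stations:
        # pick a station whose operands are both ready (data-flow order)
        rs = next(e for e in stations if e["qj"] is None and e["qk"] is None)
        result = ALU[rs["op"]](rs["vj"], rs["vk"])
        stations.remove(rs)
        for e in stations:                     # common data bus broadcast
            for name in ("j", "k"):
                if e["q" + name] == rs["tag"]:
                    e["v" + name] = result
                    e["q" + name] = None
        # write back only if this tag is still the youngest producer of dst
        if rename.get(rs["dst"]) == rs["tag"]:
            regs[rs["dst"]] = result
            del rename[rs["dst"]]
    return regs
```

The test program below contains RAW, WAR and WAW hazards on R1; renaming and tag matching resolve all of them without stalls.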
Title: Cryptography and the RC2 algorithm
Abstract: In recent years, cryptography has played an increasingly important
role both in research and in actual implementations as a way to provide a
secure method of exchanging information over communication channels. In fact,
the Advanced Encryption Standard (AES), which will become effective May 26th,
2002, has sparked great interest in the cryptographic community. Cryptography,
in its simplest form, can be viewed as a method of manipulating data in order
to conceal its meaning. The process of ciphering and deciphering may take on
many forms; for the purpose of this project, the RC2 algorithm, developed in
1987 by cryptography pioneer Ron Rivest, will be implemented in hardware. RC2
is in widespread use in a number of commercial software packages, including
Lotus Notes, Microsoft Windows, Internet Explorer, and Netscape's Navigator and
Communicator. The RC2 algorithm is a secret-key block cipher with input and
output block sizes of 64 bits each and a variable key size, from one byte up to
128 bytes. RC2 encryption outperforms DES and is much easier to implement. In
this project, the RC2 encryption/decryption algorithm will be implemented in
VHDL and simulated on a FLEX 10K device. Given the flexibility of the algorithm
to use small keys and the restricted memory of the FPGA, an eight-byte key will
be used.
References:
1. RSA Security Inc., "FAQ" http://www.rsasecurity.com/rsalabs/faq/3-6-2.html
2. S. Vaudenay, "Fast software encryption: 5th international workshop, FSE 98,
Paris, France, March 23-25, 1998: proceedings",Springer, 1998.
3. R. Housley, "Triple-DES and RC2 Key Wrapping",
http://www.networksorcery.com/enp/rfc/rfc3217.txt
4. R. Rivest, "A Description of the RC2 Encryption Algorithm",
http://www.sevillaonline.com/ActiveX/RFC2268.txt
Title: Optical Character Recognition Using Neural Networks
Optical character recognition (OCR) refers to a process whereby printed
documents are transformed into ASCII files for the purpose of compact storage,
editing, fast retrieval, and other file manipulations through the use of a
computer. The objective of this project is to perform a VHDL implementation that
will enable us to recognize pre-determined letters in dot matrix form. The
system shall accept as input a single character in dot-matrix form. The presence
of basic line patterns in the image will be determined and passed on to a feed
forward fully connected neural network, which will attempt to recognize the
letter. The main tool used to accomplish this task will be a backpropagation
neural network. A training set, generated with the help of MATLAB, will be used
to train the network. An attempt will also be made to recognize imperfectly
drawn, noisy or deformed letters with a fair degree of accuracy.
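A minimal fully connected network trained with backpropagation illustrates the learning loop (for brevity this sketch learns XOR rather than dot-matrix letters; the layer sizes, learning rate and epoch count are illustrative assumptions):

```python
import math, random

random.seed(0)
N_IN, N_HID = 2, 3
w_hid = [[random.uniform(-1, 1) for _ in range(N_IN + 1)] for _ in range(N_HID)]
w_out = [random.uniform(-1, 1) for _ in range(N_HID + 1)]   # last entry: bias
sig = lambda x: 1.0 / (1.0 + math.exp(-x))

def forward(x):
    h = [sig(sum(w[i] * xi for i, xi in enumerate(x)) + w[-1]) for w in w_hid]
    y = sig(sum(w_out[i] * hi for i, hi in enumerate(h)) + w_out[-1])
    return h, y

def train(data, epochs=4000, lr=0.5):
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            d_out = (y - t) * y * (1 - y)              # output-layer delta
            for i, hi in enumerate(h):                 # backpropagate to hidden
                d_hid = d_out * w_out[i] * hi * (1 - hi)
                for j, xj in enumerate(x):
                    w_hid[i][j] -= lr * d_hid * xj
                w_hid[i][-1] -= lr * d_hid
            for i, hi in enumerate(h):                 # then update output layer
                w_out[i] -= lr * d_out * hi
            w_out[-1] -= lr * d_out

def loss(data):
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
```

For the letter-recognition task the same structure applies, with the line-pattern features as inputs and one output per letter; MATLAB's toolbox performs the equivalent weight updates when generating the trained network.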
References:
1. Donald R. Tveter, The Pattern Recognition Basis of Artificial Intelligence,
IEEE Computer Society, 1998.
2. Donald R. Tveter, The Backprop Algorithm,
http://www.cim.mcgill.ca/~jer/courses/ai/readings/backprop.pdf
3. Carl G. Looney, Pattern Recognition Using Neural Networks, Oxford University
Press, 1997.
4. Christopher Bishop, Neural Networks for Pattern Recognition, Oxford
University Press, 1995.
5. Jason Tiscione, Optical Character Recognition Using Neural Networks in Java,
http://www.geocities.com/SiliconValley/2548/ochre.html
6. The Mathworks, Inc., Neural Network Toolbox,
http://www.mathworks.com/access/helpdesk/help/pdf_doc/nnet/nnet.pdf
7. Warren S. Sarle, Neural Network FAQ, ftp://ftp.sas.com/pub/neural/FAQ.html
Title: FPGA Implementation of a Character Recognition Device
Character input recognition is the natural way to interface between humans and
the ever expanding computer universe. The current character input technologies
are slow, inconvenient and error-prone. Several designs of character recognition
hardware have been implemented and they are characterized by their differing
algorithms, reliability, hardware efficiency and processing speed. In this
project we will implement a character recognition unit. Some of the techniques
of recognizing characters include fuzzy logic and hidden Markov models. We,
however, chose not to use these methodologies because they require input
buffering and are thus not real-time input methodologies. Since real-time
recognition is more of a requirement in today's computing world, we decided to
use the tangent distance method. In this project, we will undertake the design
of a character recognition unit based on an FPGA system. A character is to be
written on a 16 by 16 bit matrix input. This character will be transformed into
a number of slopes, fed sequentially into a recognizer (ROM). Though character
recognition has numerous applications, our main purpose behind this design is
to facilitate character input into modern PDAs. Thus, users can
simply write out word documents and short memos without the need of a keyboard.
To test our design we will generate test vectors from hand written replicas and
feed these vectors to our unit. Above all, we will focus mainly on the accuracy
and reliability of our system.
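As a baseline for judging accuracy, a deliberately simplified recognizer can match the input bitmap against stored templates by Hamming distance (this is a plain nearest-template baseline, not the tangent distance metric itself, which additionally compares against small transformations of each template; the 3x3 toy bitmaps below stand in for the real 16x16 inputs):

```python
def hamming(a, b):
    # number of differing pixels between two flattened bitmaps
    return sum(x != y for x, y in zip(a, b))

def classify(bitmap, templates):
    # templates: dict mapping a character to its reference bitmap
    return min(templates, key=lambda ch: hamming(bitmap, templates[ch]))
```

The tangent distance unit should beat this baseline precisely on the deformed and shifted inputs where raw pixel distance fails.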
References:
1)Patrice Y. Simard, Yann A. Le Cun, "Transformation Invariance in Pattern
recognition - Tangent Distance and tangent propagation".
http://research.microsoft.com/~patrice/PDF/tricks.pdf
2) A. Kosmala, D. Willett, G. Rigoll "A new hybrid approach to large vocabulary
cursive handwriting recognition". IEEE online Library, catalog number: 98EX170,
volume number 2, August 1998
3) IEE Colloquium on 'Handwriting and Pen-Based Input' (Digest No.1994/065).
Total Pages: 28 Accession Number: 4682435
4) Hurst, W.; Jie Yang; Waibel, A. 'Error repair in human handwriting: an
intelligent user interface for automatic online handwriting recognition'. IEEE
Catalog Number: 98EX174
Title: Scan Conversion: A hardware implementation of a 2-d Line-drawing unit.
Abstract: Line drawing is at the core of all computer graphics applications. It
is the building block of computer graphics from which polygons and more complex
shapes are made. Scan conversion is the conversion of a geometric figure (line,
circle, ellipse, font, etc) into the set of pixels that best represent it on a
screen. It is typically hardware accelerated on all modern graphics subsystems,
and constitutes a very important step in the graphics pipeline. For this
project, we will synthesize a low-latency line-drawing unit on an FPGA using
VHDL. The unit will accept 4 16-bit inputs, indicating the unsigned coordinates
of the line's beginning and end (x1, y1, x2, y2). The unit will generate one
pair of coordinates per clock cycle to describe the line, until the line has
been completely "drawn". Having such a large range of coordinates allows this
unit to be used on screens of very high resolutions. Many high-end graphics
subsystems also work internally at a resolution 4 times larger than the screen
resolution, then average out the color of 4 internal pixels to yield one screen
pixel; this is used to anti-alias the image. Our unit will therefore be able to
handle this situation.
References:
1. Hearn, Baker "Computer Graphics C version" Second Edition, Prentice-Hall 1994
2. McAllister, David K. "Line Drawing Algorithms",
http://www.cs.unc.edu/~davemc/Class/136/Lecture9/Lines.html , September 1998.
3. Foley, James D. and Van Dam, Andries, "Computer Graphics: Principles and
Practice", Second Edition (C version), Addison-Wesley, 1996.
Title: The Blowfish Algorithm
"The cryptographic community needs to provide the world with a new encryption
standard. DES, the workhorse encryption algorithm for the past fifteen years, is
nearing the end of its useful life. Its 56-bit key size is vulnerable to a
brute-force attack, and recent advances in differential cryptanalysis and linear
cryptanalysis indicate that DES is vulnerable to other attacks as well." (quote
from Bruce Schneier's book "Fast Software Encryption") Security is of pivotal
importance in a world where more and more services are being offered online.
Since the early days of computers and the internet, there have been hackers who
use whatever methods are required to decipher sensitive data, for benign or
more serious purposes. It is for this reason that we have selected the Blowfish
algorithm as our project design: to better understand ciphering. This algorithm
is a variable-key-length block cipher based on a Feistel network. It is fast,
compact and simple according to its creator, Bruce Schneier. The reason for
choosing this algorithm is that, according to many sources, by using a
non-invertible function as a part of its round structure, Blowfish "is perhaps
one of the most secure algorithms available. There are no known attacks against
Blowfish." (according to www.kremlinencrypt.com) Our project will implement
this algorithm structurally in VHDL, with design decisions (speed vs. hardware,
optimization for latency or throughput) to be decided as we proceed. The
algorithm belongs to the public domain. A diagram
depicting this algorithm has also been attached and a copy of the C code for
this algorithm as written by Schneier himself is available at:
http://www.counterpane.com/blowfish-download.html
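The Feistel structure underlying Blowfish is worth sketching on its own, because it shows why a non-invertible round function still yields a reversible cipher (the round function below is a stand-in; Blowfish's real F mixes key-dependent S-boxes):

```python
MASK = 0xFFFFFFFF

def f(half, key):
    # placeholder round function: need not be invertible
    return ((half * 0x9E3779B1) ^ key) & MASK

def feistel_encrypt(left, right, round_keys):
    for k in round_keys:
        left, right = right, left ^ f(right, k)
    return left, right

def feistel_decrypt(left, right, round_keys):
    # decryption is the same network with the round keys reversed
    for k in reversed(round_keys):
        left, right = right ^ f(left, k), left
    return left, right
```

Each round XORs F's output into one half and swaps; since the XOR can always be undone by recomputing F on the untouched half, the whole network inverts cleanly.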
References
Counterpane Labs: The Blowfish Algorithm
http://www.counterpane.com/blowfish.html
Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish)
http://www.counterpane.com/bfsverlag.html
Concepts of Cryptography : An Introduction
http://www.kremlinencrypt.com/crypto/concepts.html
Concepts of Cryptography: Algorithms
http://www.kremlinencrypt.com/crypto/algorithms.html
B. Schneier, "Applied Cryptography, Second Edition", John Wiley & Sons,1996
B. Schneier, "The Twofish Encryption Algorithm", John Wiley & Sons, 1999
Title: Pipelined FPGA implementation of the 2-D Discrete Cosine Transform
Abstract: The 2-D Discrete Cosine Transform (DCT) is at the core of JPEG, MPEG
and HDTV multimedia compression. It is similar to the DFT in that it provides a
mapping from the spatial domain to the frequency domain. This project aims to
efficiently implement the 8x8 DCT with maximal throughput. The design is based
on the row-column decomposition method, and it consists of three optimally
pipelined stages. In the first stage, the 1-D DCT of each input row is computed.
The second stage provides a matrix transposition buffer. And the final stage
generates the 1-D DCT of the resulting columns. The system's inputs and outputs
are defined as follows:
Inputs: clk - clock; row_in[] - input row: 8 8-bit unsigned integers
Outputs: col_out[] - output column: 8 11-bit signed DCT coefficients
When the pipeline is full, the circuit will accept a new row and output a new
column at every clock cycle.
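A floating-point reference model of the row-column decomposition is useful for checking the fixed-point hardware outputs (a sketch: 1-D DCT on each row, a transpose, then a 1-D DCT on the resulting rows, i.e. the original columns):

```python
import math

N = 8

def dct_1d(x):
    # orthonormal 1-D DCT-II of an 8-sample vector
    out = []
    for k in range(N):
        a = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(a * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N)))
    return out

def dct_2d(block):
    rows = [dct_1d(r) for r in block]                   # stage 1: row DCTs
    cols = list(zip(*rows))                             # stage 2: transpose
    return list(zip(*[dct_1d(list(c)) for c in cols]))  # stage 3: column DCTs
```

A constant block is a handy smoke test: all energy lands in the DC coefficient, so only output [0][0] should be non-zero.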
References:
[1] Pipelined fast 2-D-DCT architecture for JPEG image compression, Agostini,
L.V.; Silva, I.S.; Bampi, S., Integrated Circuits and Systems Design, 2001, 14th
Symposium on., 2001 Page(s): 226 -231
[2] Efficient realization of FPGA-based two dimensional discrete cosine
transform Helal, H.I.; Salama, A.E.; Mashali, S. ASIC/SOC Conference, 1999.
Proceedings. Twelfth Annual IEEE International , 1999 Page(s): 186 -190
[3] Dick, C., Minimum Multiplicative Complexity Implementation of the 2-D DCT
using Xilinx FPGAs, Proc. of SPIE's Photonics East '98, Configurable Computing:
Technology and Applications, Boston, MA, USA, pp. 190-201, November 1-6, 1998
Title: Associative Memory Array Processor for Character Recognition
Abstract: Parallel processing can play a key role in speeding up image
processing computations. An "associative processing system" comprises parallel
'content-addressable' memory cells. In this project, we design and implement an
Associative Memory Array Processor (AMAP) that can be used for character
recognition. We use the "bit-parallel organization" form of associative memory.
Our system will comprise an input unit, an output unit, an AMAP and a central
control unit. As inputs, we will consider 8x8 bit representations of
characters. The AMAP will comprise n slices (one for each character), each
having 8x8 identical processing elements. The operation of each processing
element can be divided into two processes: the "learning process" and the
"recognition process". During the learning process, different typefaces of the
same character are fed into the system. In the recognition process, a decision
is taken as to whether the given input pattern matches any of the AMAP
patterns, and if so, a decoder is used to output the equivalent ASCII code. The
AMAP uses special comparison logic to access stored data in parallel according
to its contents, and it employs an algorithm and architecture that are simple
compared to neural networks. Please note that though we would like to allow for
recognition of the characters A-Z and 0-9, considering the time required for
the learning process and the amount of memory required, we will implement our
design such that it allows only for recognition of the characters 0-9.
References:
1. Nabeel El-Nadi, Gamal M. Aly, Zaki T. Fayed and H. M. Faheem, "AMAP for
typecasting," IEEE Potentials, Volume: 20 Issue: 2, April-May 2001, Page(s): 28
-30
2. L. Chisvin and R.J. Duckworth, "Content-addressable and associative memory:
alternatives to the ubiquitous RAM," IEEE Computer, Volume: 22, Issue: 7, July
1989
Title: Implementing 2-D Median Filter in VHDL
The two dimensional (2-D) median filter is a non-linear image enhancement
technique that effectively removes salt & pepper noise (shot/impulse noise) from
grayscale intensity images. With this technique, each pixel is replaced by the
median of the gray levels in a neighbourhood of that pixel. The pixel block
size used to determine the medians is 8x8. The filter is designed to operate on
images with 8 bits per pixel. The objective of the project is to write a VHDL
description of the median filter implemented on Altera's 10K10 chip. The 2-D
filter is built using a 1-D filter (constructed by a compare/swap algorithm),
memory units and transposing to create a 2-D application. Testing and
verification of the design will be done on several 400 x 400 images using
Matlab.
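A direct software model of the filtering step gives the expected images for verification (a sketch using a 3x3 neighbourhood with clamped, edge-replicated borders; the hardware builds the same 2-D result out of 1-D compare/swap filters plus transposition):

```python
def median_filter(img, w=3):
    h, wid = len(img), len(img[0])
    r = w // 2
    clamp = lambda v, hi: max(0, min(v, hi - 1))   # replicate border pixels
    out = []
    for y in range(h):
        row = []
        for x in range(wid):
            window = [img[clamp(y + dy, h)][clamp(x + dx, wid)]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            window.sort()                          # hardware: compare/swap network
            row.append(window[len(window) // 2])   # the median value
        out.append(row)
    return out
```

Because a single shot-noise pixel can never be the median of its neighbourhood, it vanishes entirely, which is exactly the salt-and-pepper behaviour described above.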
References:
Palaniswamy, Krishna J., Rizkalla, Maher E., Sinha, Akhouri C., El-Sharkawy,
Mohamed, and Slama, Paul, VHDL Implementation of 2-D median filter, IEEE 1999
Fisher, Bob, Perkins, Simon, Walker, Ashley, and Wolfart, Erik, Median Filter,
University of Edinburgh, UK, 1994. Available at
http://www.cee.hw.ac.uk/hipr/html/median.html
MATLAB Image Processing Toolbox, The Math Works Inc., 1997
Title: A Hardware Implementation of the Private-Key Cryptosystem RC6
Abstract: Created for RSA Data Security (now RSA Security), RC6 is a
private-key cryptosystem designed as an evolutionary step beyond the RC5
algorithm, which is still regarded as a very effective encryption system. It
implements a block cipher algorithm similar to that of RC5 and was considered
as a candidate basis for the Advanced Encryption Standard (AES) and a
replacement for the aging DES system. It is a fairly simple cipher consisting
of generating a key schedule, then using the key to perform encryption and
decryption. RC6 excels in areas such as security and performance. However, it
was considered somewhat cumbersome to implement in hardware, due to the
algorithms involved, though it is possible. The goal of this project is to
implement this system in hardware by simulating it in VHDL, and to analyze its
performance compared to software in order to determine the performance
difference and the applicability of RC6 as a hardware-implemented cryptosystem.
References:
1. RSA Laboratories. "RC6 Block Cipher." http://www.rsasecurity.com/rsalabs/rc6/
2. Abdul Ragab, Nabil Ismael, and Osama Allah. "Enhancements and Implementation
of RC6 Block Cipher for Data
Security." http://www.ieee.org
3. Scott Contini, Ronald L. Rivest, M.J.B. Robshaw, and Yiqun Lisa Yin. "The
Security of the RC6 Block Cipher."
http://www.rsasecurity.com/rsalabs/rc6/security.pdf
Title: An FPGA Implementation of the Cyclic Redundancy Check (CRC) Algorithm
Abstract: Transmission errors are rare on the digital segments of wired
communication links but they are common on the analog segments - the so-called
"last mile". In addition, wireless communication, with its typically high error
rates, is becoming commonplace. It is important to detect transmission errors so
that remedial action may be taken. High-speed detection is necessary to
accommodate increasing data transmission speeds, hence the need for a hardware
implementation. In this paper, we present an FPGA implementation of the Cyclic
Redundancy Check (CRC) algorithm. Also known as the polynomial code, this error
detection method is in widespread use due to its effectiveness and efficient use
of error detection overhead bits. Bit strings are treated as representations of
polynomials: the bit string is the coefficient list and a bit's position within
the string indicates the power. The CRC algorithm is based on binary long
division and uses a fixed generator polynomial mutually agreed upon by the
transmitter and receiver. Our design, implemented in VHDL using Altera Max+Plus
II software, has two modes of operation: Calculate (for the transmitter) and
Check (for the receiver). In Calculate mode, the circuit takes two inputs: a
message (arbitrary data) and a generator polynomial, and uses them to calculate
the CRC. In Check mode, the inputs are the message, the CRC calculated
previously and the generator polynomial; the circuit checks the CRC and signals
the result via an Errors/No Errors output.
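The binary long division at the heart of both modes can be modelled directly in software (a bit-level sketch; the message and generator polynomial in the test are the classic textbook example, not values from our design):

```python
def crc_remainder(message, poly):
    # append r = len(poly) - 1 zero bits, then divide by the generator
    bits = list(message + "0" * (len(poly) - 1))
    for i in range(len(message)):
        if bits[i] == "1":                         # XOR the divisor in (no borrows)
            for j, p in enumerate(poly):
                bits[i + j] = str(int(bits[i + j]) ^ int(p))
    return "".join(bits[-(len(poly) - 1):])        # the r remainder bits

def crc_check(message, crc, poly):
    # Check mode: the received CRC must equal the recomputed remainder
    return crc_remainder(message, poly) == crc
```

The hardware replaces the inner loop with a linear-feedback shift register, processing one message bit per clock, which is what makes the high-speed requirement attainable.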
References:
1. Tanenbaum, Andrew S., "Computer Networks", Third Edition, Prentice Hall PTR,
1996.
2. Leon-Garcia, Alberto and Widjaja, Indra, "Communication Networks -
Fundamental Concepts and Key Architectures", McGraw-Hill Higher Education, 2000.
3. Brown, Steven and Vranesic, Zvonko, "Fundamentals of Digital Logic with VHDL
Design", McGraw-Hill Higher Education, 2000.
Title: Artificial Neural Network for Pattern Recognition
Abstract: An artificial neural network is a system modeled on the human brain.
A neural network attempts to simulate layers of simple processing elements
called neurons. Each neuron is linked to other neurons with varying coefficients
of connectivity (weights) that represent the strengths of these connections.
Learning is accomplished by adjusting these strengths to cause the overall
network to produce the appropriate output. The most successful applications of
neural network systems are in pattern categorizing and pattern recognition. Such
systems can be used to identify an entity such as an illness or a chemical
compound. Another application of neural networks is character recognition and
handwriting recognition. This application is very useful in banking where it is
crucial to correctly read and recognize handwriting. In this project we will
attempt to implement a neural network that recognizes characters and numbers,
using simple logic gates. The computation performed in a neural network can
require a considerable amount of hardware. For this reason we will use
stochastic pulse sequences, and probabilities to represent the inputs to the
system and the weights of different connections.
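The stochastic representation is what lets the hardware shrink: a value in [0, 1] is carried as the probability that a pulse is high, so a single AND gate multiplies two independent streams (a software sketch; stream length and the test values are illustrative):

```python
import random

random.seed(1)

def pulse_stream(p, n):
    # n-bit stochastic stream whose fraction of 1s approximates p
    return [1 if random.random() < p else 0 for _ in range(n)]

def stochastic_multiply(p, q, n=20000):
    a, b = pulse_stream(p, n), pulse_stream(q, n)
    anded = [x & y for x, y in zip(a, b)]   # one AND gate per pulse pair
    return sum(anded) / n                   # estimate of p * q
```

A multiplier that would cost a full array of gates in binary arithmetic thus becomes one gate plus a counter, at the price of a longer evaluation time and statistical noise.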
References:
1. Pao Yoh-Han, "Adaptive Pattern Recognition and Neural Networks", Reading
mass: Addison-Wesley, 1989.
2. Haykin Simon, "Neural Networks: A comprehensive foundation", Upper Saddle
River, N.J., 1999.
3. Sriram, Ram D., "Intelligent systems for Engineering: A knowledge-based
approach", London, New York, 1997.
4. Jeffrey A. Dickson, Robert D. McLeod, Howard C. Card, "Stochastic Arithmetic
Implementations of Neural Networks", CCVLSI, 1992.
Name: Nguyen Minh Tu, Vichetr Lim.
Title: Token Ring
The purpose of this project is to implement a Token Ring network, a scheme
originally developed by IBM. In this project we will implement a simplified
version of the token ring. The idea is that multiple stations share a common
medium, and whichever station possesses the token has the right to transmit.
If a station receiving the token has nothing to send, it passes the token to
the next station; otherwise, the station seizes the token and sends its
information. While the packet (information) is being sent, no token is on the
network, so no collisions can occur. A new token is released once the
information has been successfully delivered to its destination. Each station
may hold the token for only a limited amount of time. This project will be
implemented from scratch; therefore, it will focus on correctness rather than
speed or hardware optimization.
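The token-passing discipline described above can be sketched as a small software model in Python (the station queues and function names are hypothetical, for illustration only):

```python
def simulate_token_ring(stations, max_rounds=10):
    """Pass the token around the ring; a station holding the token may
    transmit one queued frame, then releases the token downstream.
    `stations` maps station id -> list of pending frames.
    Returns the order in which frames reach the shared medium."""
    ids = sorted(stations)
    log = []
    token = 0                           # index of the station holding the token
    for _ in range(max_rounds * len(ids)):
        sid = ids[token]
        if stations[sid]:               # seize the token and send one frame
            log.append((sid, stations[sid].pop(0)))
        token = (token + 1) % len(ids)  # pass the token to the next station
        if not any(stations.values()):
            break
    return log
```

Because only the token holder transmits, frames from different stations interleave deterministically and never collide.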
Reference:
Alberto Leon-Garcia and Indra Widjaja, "Communication Networks: Fundamental
Concepts and Key Architectures".
http://www.rad.com/networks/1997/nettut/token_ring.html
http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/tokenrng.htm
http://www.webopedia.com/TERM/T/token_ring_network.html
Title: VHDL Implementation of the Go-Back-N ARQ Protocol
Automatic Repeat Request (ARQ) protocols provide reliable transfer of
information over a connection-oriented network or a data link. These protocols
are essential when transmitting over channels that are prone to errors. ARQ
techniques provide synchronization and timing recovery, coordinating the
actions of two or more separate machines with different state information.
These capabilities are very
important for applications that involve voice, audio and video data. As a
result, ARQ forms the basis for peer-to-peer protocols that provide for the
reliable transfer of information.
In this project, we will implement Go-Back-N ARQ using VHDL. The Go-Back-N ARQ
is an example of a sliding-window protocol, where the transmitter maintains a
window of available packet sequence numbers that slides forward as packets are
sent. The transmitter sends a number of frames into the channel, optimistic that
the frames will be received correctly and not require retransmission. If things
turn out as expected, an ACK for each frame will arrive in due time while the
transmitter is still busy sending frames into the channel. According to this
scheme, when the system is done with the first frame, the handling of the
subsequent frames is already well underway. In the case when a frame undergoes
transmission errors, the receiver ignores this frame and all subsequent frames.
Eventually, when the transmitter reaches the maximum allowable number of
outstanding frames, it is forced to "go back N" frames and begin retransmitting
all packets from the lost frame onwards.
Go-Back-N ARQ ensures more efficient channel utilization relative to the
Stop-and-Wait protocol wherein the transmitter always waits for an ACK before
sending a subsequent frame. Therefore, Go-Back-N ARQ provides flow control and
error-free transmission, achieving a better performance at the expense of
complexity.
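A minimal software model of the go-back-N behaviour described above can be sketched in Python (ACK timing is idealized and the interface is our own invention, not the project's VHDL design):

```python
def go_back_n(num_frames, window, error_frames):
    """Simulate Go-Back-N over a channel that corrupts the frames whose
    sequence numbers are in `error_frames` (on first transmission only).
    Returns the sequence of frame numbers placed on the channel."""
    sent = []
    base = 0                             # oldest unacknowledged frame
    corrupted = set(error_frames)
    while base < num_frames:
        # send up to `window` frames starting at `base`
        for seq in range(base, min(base + window, num_frames)):
            sent.append(seq)
            if seq in corrupted:
                corrupted.discard(seq)   # receiver drops it; go back to seq
                base = seq
                break
        else:
            base = min(base + window, num_frames)
    return sent
```

In the error-free case the transmitter never stalls, which is exactly the efficiency gain over Stop-and-Wait claimed above; an error costs a retransmission of the lost frame and everything after it.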
References:
1. Leon-Garcia, Alberto and Indra Widjaja, "Communication networks: Fundamental
Concepts and Key Architectures", McGraw-Hill Companies, Inc., 2000.
2. Larry L. Peterson and Bruce S. Davie, "Computer Networks: A System Approach",
2nd ed, Morgan Kaufmann, 2000.
Title: Implementation of a Neural Network Based Path Planning Algorithm using
VHDL
Abstract: Path planning is one of the important problems in motion control of a
mobile robot. Given a starting position and a destination, the problem consists
of finding a continuous path for the robot from the starting position to the
goal position while avoiding obstacles along the way. Software implementations
are common in the artificial intelligence field; however, the problem grows
exponentially as the distance between the source and destination increases.
Therefore, software becomes ineffective for a large number of nodes, hence the
proposal of hardware to perform the computations faster. The
implementation in VHDL will consist of searching the graph for a sequence of
nodes that connect the start position to the goal position, avoiding any
obstacles. The graph will be represented by an 8x8 matrix of '1's and '0's,
with a '1' indicating an obstacle and a '0' indicating free space. The input
will be a given starting coordinate and a destination coordinate. The output
will be a sequence of coordinates of the optimum path from the source to the
destination avoiding all obstacles.
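One way to find such an optimum path on the 0/1 occupancy grid is breadth-first search, sketched below in Python as a functional reference model (the neural-network formulation of the cited paper is not shown; names are ours):

```python
from collections import deque

def shortest_path(grid, start, goal):
    """Breadth-first search over an occupancy grid (1 = obstacle).
    Returns the coordinates of a shortest obstacle-free path from
    start to goal, or None if the goal is unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            path = []                    # walk parents back to the start
            node = goal
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) \
                    and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return None
```

A software model like this is useful for verifying the hardware's output coordinates against a known-correct shortest path.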
References:
1. R.H.T. Chan, P.K.S. Tam, D.P. Kwok, P.W.M. Cheung, "Hardware implementation
of a neural network based path planning algorithm by using the VHDL",
Proceedings of the IECON '93 International Conference on Industrial
Electronics, Control, and Instrumentation, 1993.
2. Stuart Russell and Peter Norvig, "Artificial Intelligence: A Modern
Approach", Prentice Hall, 1995.
Title: A VHDL Implementation of the International Data Encryption Algorithm
(IDEA)
Abstract: The International Data Encryption Algorithm (IDEA) is a
state-of-the-art data encryption algorithm that permits effective protection of
transmitted and stored data against unauthorized access by third parties. The
algorithm was developed by Xuejia Lai and James Massey in 1992 and is currently
used in many commercial software programs, including the popular e-mail
encryption and decryption program Pretty Good Privacy (PGP). IDEA is a
symmetrical block cipher algorithm that operates on a 64-bit block and uses a
128-bit key. This is more than twice the length of the 56-bit key used in the
Data Encryption Standard (DES), the first official U.S. government cipher
intended for commercial use and still the most popular cryptosystem in use
today. Therefore, it is much more secure. In fact, while researchers have
succeeded in breaking DES-encoded messages by brute force, no such attempts at
breaking the IDEA
algorithm have succeeded. In this project, we will implement IDEA in VHDL. Our
goal is to implement both encryption and decryption, since the algorithm itself
is structured so that the two processes are almost identical. We anticipate that
this project will allow us to demonstrate a wide range of skills in VHDL
programming, as the algorithm requires the use of operations from three
different algebraic groups and uses 8 encryption rounds followed by an output
transformation to produce the cipher text.
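Of the three algebraic groups mentioned, the least familiar is multiplication modulo 2^16+1, in which the all-zero word encodes the value 2^16. A Python sketch of that operation and of IDEA's addition modulo 2^16 (helper names are ours):

```python
MOD = 0x10001            # 2**16 + 1, a prime

def idea_mul(a, b):
    """IDEA's multiplication: 16-bit operands multiplied modulo 2^16+1,
    with the all-zero word standing in for the value 2^16."""
    a = a or 0x10000     # 0 represents 2**16
    b = b or 0x10000
    return (a * b) % MOD % 0x10000   # fold 2**16 back to the 0 encoding

def idea_add(a, b):
    """IDEA's addition: modulo 2^16 (plain 16-bit adder in hardware)."""
    return (a + b) & 0xFFFF
```

Because 2^16+1 is prime, every 16-bit word (with the 0 = 2^16 convention) has a multiplicative inverse, which is what makes decryption possible with the same round structure.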
References:
1. Mitchell Shnier, Dictionary of PC Hardware and Data Communications Terms,
O'Reilly & Associates, November 1995.
2. A. G. Konheim, "Cryptography: A Primer", John Wiley & Sons Inc., 1981.
3. C. H. Meyer and S. M. Matyas, "Cryptography: A new dimension in computer data
security" John Wiley & Sons Inc., 1982.
4. T. H. Cormen, C. E. Leiserson and R. L. Rivest, "Introduction to Algorithms"
McGraw-Hill Book Company, 1990.
5. Media Crypt, Highest Security With Top Performance IDEA International Data
Encryption Algorithm, Product Sheet/IDEA, October 2001.
http://www.mediacrypt.com/engl/Downloads/IDEAOctFinal.pdf
6. Media Crypt, IDEA Technical Description,
http://www.mediacrypt.com/engl/Content/technical_description.htm
7. John J. G. Savard, IDEA International Data Encryption Algorithm, 1998.
http://home.ecn.ab.ca/~jsavard/crypto/co0404.htm
Title: A hardware implementation of Fast Phong Shader
Today's computer graphics applications require fast, efficient and realistic
image generation of graphic objects. These graphic objects are usually polygon
mesh approximations of curved surface objects. Currently there are two popular
methods for rendering (drawing) these polygons: Gouraud shading and Phong
shading. Phong shading renders more realistic images than Gouraud shading.
However, Phong shading requires many more calculations than Gouraud shading and
is consequently slower. We will be implementing Fast Phong Shading, a sped-up
variant of Phong shading that approximates the calculations using Taylor series
expansions. Hence, our implementation will produce quick and realistic images.
A polygon surface is rendered using Phong shading by carrying out the following
steps:
1. Determine the average unit normal vector at each polygon vertex.
2. Linearly interpolate the vertex normals over the surface of the polygon.
3. Calculate the projected pixel intensity for all surface points.
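Steps 2 and 3 above can be sketched in Python by direct evaluation, without the Taylor-series speedup (an ambient-plus-diffuse model with hypothetical coefficients; specular is omitted for brevity):

```python
def normalize(v):
    n = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
    return (v[0] / n, v[1] / n, v[2] / n)

def lerp_normal(n0, n1, t):
    """Step 2: linearly interpolate vertex normals across the surface,
    renormalizing so the result is a unit vector."""
    return normalize(tuple(a + t * (b - a) for a, b in zip(n0, n1)))

def phong_intensity(normal, light, ka=0.1, kd=0.9):
    """Step 3: per-pixel intensity from the interpolated unit normal and
    a unit light direction (ka, kd are illustrative coefficients)."""
    ndotl = max(0.0, sum(a * b for a, b in zip(normal, light)))
    return ka + kd * ndotl
```

The per-pixel renormalization and dot product here are exactly the operations the Taylor-series approximation exists to avoid.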
References:
Donald Hearn & M. Pauline Baker, "Computer Graphics", 2nd edition.
Foley & van Dam, "Computer Graphics", 2nd edition.
Title: LZW Compression
Abstract: Lempel-Ziv-Welch (LZW) compression is a lossless, dictionary-based,
all-purpose data compression technique that is most often used in PDF, TIFF and
GIF file compression. Dictionary-based algorithms scan their input for sequences
of data that occur more than once. These sequences are then stored in a table
and references to the table are used to represent repetitive data within the
compressed file. Compression occurs when a single code is output instead of a
string of characters. LZW compression is fast and works best for files
containing lots of repetitive data. In this project we will design an FPGA
implementation of LZW compression and decompression. The circuit will be
described using VHDL and synthesized with the Altera MaxPlus II software.
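A functional Python model of the table-building scheme described above can serve as a reference for the FPGA design (byte-oriented, with an unbounded table; a hardware version would cap the table size and code width):

```python
def lzw_compress(data):
    """Emit one code per longest dictionary match; new sequences are
    added to the table as seen (initial table: all 256 single bytes)."""
    table = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            table[wc] = len(table)      # register the new sequence
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes):
    """Rebuild the same table on the fly; the code-not-yet-defined case
    is handled by w + w[:1]."""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        entry = table[code] if code in table else w + w[:1]
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)
```

Note that the decompressor never needs the table transmitted to it; it reconstructs the dictionary from the code stream alone.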
References:
- Welch, Terry A. "A Technique for High Performance Data Compression". IEEE
Computer, Vol. 17, No. 6, 1984, pp. 8-19
http://www.cs.cf.ac.uk/Dave/Multimedia/node214.html
- Nelson, Mark. "LZW Data Compression". Dr. Dobb's Journal. October, 1989
http://dogma.net/markn/articles/lzw/lzw.htm
- Leurs, L. "LZW Compression".
http://users.belgacom.net/prepresspanic/techno/compressionlzw.htm
- Blackstock, Steve. "LZW and GIF explained".
http://www.la.unm.edu/~mbrettin/algorithms/lzexplained.html
Title: Digital Signal Processing - IIR/FIR Filter Bank
The tentative goal of the project is to implement a complete filter bank
consisting of an assortment of finite impulse response and infinite impulse
response systems. For each type of impulse response system, a low-pass,
band-pass and high-pass filter will be implemented using various windowing
techniques (e.g. Bartlett, Blackman, Hamming, Kaiser). This will provide
different precision, complexity and speed implementations that will allow a user
to find a suitable filter for the application requirements. The filter bank may
require a user input for the windowing parameters, or may have these dimensions
preset for best performance and precision. Some applications may not require the
sharp cutoff spectrum that is offered by more complex filter designs, which may
involve longer processing latencies. In this case, a simpler implementation with
higher throughput may be desired. The filter bank will also have adjustable
parameters to allow for maximum customization (e.g. cutoff frequency,
bandwidth, band location). Since most DSP applications are continuous,
real-time processes, the filter bank will be optimized as much as possible for
maximum throughput. If time permits, several extra features can be implemented.
Extra Features: Narrowband scaling to allow for signal equalization
capabilities, antialiasing filtering, simple A/D & D/A converters, sampling rate
adjustment capabilities...etc.
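As a reference model for one filter in the bank, a windowed-sinc low-pass FIR design with a Hamming window can be sketched in Python (function names and the specific parameter choices are ours, not from the project):

```python
import math

def hamming(M):
    """Hamming window of length M+1 (filter order M)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / M) for n in range(M + 1)]

def lowpass_fir(M, fc):
    """Windowed-sinc low-pass design: the ideal sinc impulse response is
    truncated to M+1 taps and shaped by a Hamming window.  fc is the
    cutoff as a fraction of the sampling rate (0 < fc < 0.5)."""
    w = hamming(M)
    h = []
    for n in range(M + 1):
        k = n - M / 2
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        h.append(ideal * w[n])
    return h

def fir_filter(h, x):
    """Direct-form convolution of input x with taps h."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
            for n in range(len(x) + len(h) - 1)]
```

Swapping the window function (Bartlett, Blackman, Kaiser) trades transition-band sharpness against stopband attenuation, which is the precision/complexity trade-off the abstract describes.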
References:
1. Alan V. Oppenheim, Alan S. Willsky, "Signals And Systems", 2nd ed, Prentice
Hall.
2. Alan V. Oppenheim, Ronald W. Schafer, "Discrete-Time Signal Processing", 2nd
Ed. Prentice Hall.
3. "TMS320C6000 Code Composer Studio Tutorial", Literature Number: SPRU301C,
copyright 2000, Texas Instruments Inc.
Title: Floating Point Manipulations
Abstract: Floating point computations have long been of interest in today's
information technology world. Many great engineering and scientific advances of
recent decades would not have been possible without the floating-point
capabilities of digital computers. The complexity of handling the various cases
of floating point has challenged even the most experienced practitioners: a
floating point representation of a number may be inexact, and the question
arises as to how much error is acceptable. Our interest in floating point grew
considerably this semester because of its heavy use in both the Microprocessor
lab and the Computer Architecture lab. Since we have already worked with it
extensively and understand the issues and cases that arise in floating point
manipulations, we concluded that this would be an interesting as well as a
challenging project.
As part of our project, we will implement the addition, subtraction,
multiplication and division of two IEEE standard 32-bit floating point numbers.
One of the major objectives of this project will be to handle all the special
cases involved in floating point manipulations, including representing zero,
infinity, not-a-number (NaN), overflow, and normalized as well as denormalized
numbers. We will deal with the cases where one of the two numbers is
denormalized and the other normalized, and where both numbers are either
normalized or denormalized. The final deliverable will be a well documented and
organized working model of 32-bit floating point manipulations. The model will
be implemented and tested in VHDL, and we hope to gain a better understanding
of floating point numbers and the hidden mystery behind their operation.
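The special cases listed above follow directly from the IEEE 754 field encoding, sketched here in Python as a software reference model (not the VHDL design itself):

```python
def decode_float32(bits):
    """Classify a 32-bit IEEE 754 word into the special cases the project
    must handle, returning (category, value)."""
    sign = (bits >> 31) & 1
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    s = -1.0 if sign else 1.0
    if exp == 0xFF:
        return ("nan", None) if frac else ("infinity", s * float("inf"))
    if exp == 0:
        if frac == 0:
            return ("zero", s * 0.0)
        # denormalized: no hidden 1, exponent fixed at -126
        return ("denormalized", s * (frac / 2 ** 23) * 2.0 ** -126)
    # normalized: hidden leading 1, biased exponent
    return ("normalized", s * (1 + frac / 2 ** 23) * 2.0 ** (exp - 127))
```

An arithmetic unit must run this classification on both operands first, since the normalized/denormalized distinction changes whether the hidden bit is 1 or 0 in the significand datapath.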
References:
- Patterson, David A., and Hennessy, John L., Computer Organization & Design -
The Hardware/Software Interface
- Zeljko Zilic, MicroProcessor Systems 304-426 course notes,
http://www.ece.mcgill.ca/~ee426/
- Kevin J. Brewer, IEEE-754 Floating Point Conversion,
http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html
- Contemporary Logic Design, Benjamin/Cummings, Redwood City, 1994
- Mani Sudha Yalamanachi (sudhadimpy@yahoo.com), Single Precision Floating
Point Unit,
http://www.ee.pdx.edu/~mperkows/CLASS_VHDL/VHDL_CLASS_2001/Floating/projreport.html
Winter 2003
Title: Simple and Symmetric: the RC4 Encryption Algorithm
Abstract: Since the advent of the Internet and of network transactions, [...]
protect sensitive information. The objective is to find a simple, fast,
reliable and easy to implement solution which is strong enough for security
needs. The RC4 encryption algorithm proves to be an ideal choice for those
wishing to combine strength, simplicity and efficiency. RC4 is a symmetric
algorithm, which means the same method, or key, is used for encryption and
decryption; thus, the key does not need to be transmitted with the message.
Furthermore, RC4 is a strong algorithm: each key can only be used once,
increasing the level of protection without making the algorithm more complex,
while only a small set of keys are considered weak. Finally, RC4 is ten times
faster than the DES algorithm. These advantages make the RC4 algorithm
interesting to study and implement, and this is what we propose to do here.
In this project, we will develop an RC4 encryptor and decryptor on an Altera
FPGA chip, using VHDL. We intend to illustrate the inner workings of the RC4
mechanism, by streaming in unencrypted bits, encrypting them using a key and
showing the encrypted result at the output. This output will then be decrypted
using the same key using a distinct module. We will build scalable n bit models
which minimize latency; these modules can be combined to form larger modules,
which would then maximize throughput.
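The RC4 mechanism itself is short enough to sketch in full. A Python reference model of the key-scheduling (KSA) and pseudo-random generation (PRGA) phases, against which the VHDL modules could be checked:

```python
def rc4_keystream(key, n):
    """KSA then PRGA: the key (a byte string) scrambles a 256-entry state
    array, which then yields one pseudo-random byte per output symbol."""
    S = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = []
    for _ in range(n):                        # pseudo-random generation
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def rc4_crypt(key, data):
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, rc4_keystream(key, len(data))))
```

Since encryption is a plain XOR with the keystream, the encryptor and decryptor modules are identical, which is the symmetry the abstract exploits.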
[1] M. Shaffer, "RC4 Encryption Using ASP & VBScript," January 2000,
[2] C. Grogans, J. Bethea , I. Hamdan , North Carolina Agricultural and
Technical State University, "RC4 Encryption Algorithm," March 2000,
[3] A. Roos "Weak Keys in RC4," Posted on newsgroup sci.crypt, 22 Sep 1995.
[4] R.J. Jenkins Jr., "ISAAC and RC4," 1996
Title: Hardware implementation of the Data Encryption Standard algorithm
Cryptography is the practice of maintaining and communicating secret writings,
or ciphers, which is essential nowadays for organizations to maintain privacy
and data security. Several cryptography algorithms exist, with the more advanced
ones requiring the use of a "key" to control the encryption and decryption
processes. These algorithms can be grouped into Public (asymmetric), or Secret
(symmetric) key algorithms. Asymmetric algorithms make use of different keys for
encryption and decryption as opposed to symmetric algorithms which use the same
key for both processes.
The Data Encryption Standard (DES) symmetric algorithm is suitable for larger
messages because it is a fast block-encryption cipher; it encrypts entire data
blocks at a single time. The proposed project implementation will allow for
64-bit data block encryption using a 56-bit randomly generated key. This means
that each 64-bit data block, operated on at the bit level, can take on 2^64
possible values. A given key generates a unique encryption of the data; hence
unauthorized recipients cannot recover the plaintext from the ciphertext
without the key. This project design involves the
implementation of the DES algorithm using VHDL. The design will be tested and
simulated on the Altera MaxPLUS II software.
References:
[1] D. Stinson, "Cryptography: Theory and Practice", CRC Press, 1996.
[2] R. A. Mollin, "An Introduction to Cryptography", Chapman & Hall/CRC, 2001.
[3] D. E. Denning, "Cryptography and Data Security", Addison-Wesley, 1982.
[4] J. O. Grabbe, "The DES Algorithm Illustrated".
Title: Design of the RSA public-key cryptosystem
Cryptography, simply defined, is the process of combining some input data,
called plaintext, with a user-specified password, called a key, to generate an
encrypted output, called ciphertext, in such a way that, given the ciphertext,
no one can recover the original plaintext without the encryption password in a
reasonable amount of time.
This project shall concentrate on the design of an encryption/decryption
system based on the RSA algorithm (named after Ron Rivest, Adi Shamir and Len
Adleman, who invented it in 1977) using VHDL. The goal is to be able to encrypt and
decrypt a message being sent between two communicating parties, using both a
public and private key. The basic algorithm is illustrated in the attached
file.
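For reference, the basic algorithm can be sketched with toy parameters in Python (the primes shown are far too small for real security; real RSA uses primes hundreds of digits long):

```python
def egcd(a, b):
    """Extended Euclid: returns (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

def rsa_keypair(p, q, e=17):
    """Toy key generation from two small illustrative primes p and q;
    e must be coprime with (p-1)*(q-1)."""
    n, phi = p * q, (p - 1) * (q - 1)
    d = egcd(e, phi)[1] % phi        # d = e^-1 mod phi(n)
    return (e, n), (d, n)

def rsa_crypt(key, m):
    """Both encryption and decryption are modular exponentiation."""
    k, n = key
    return pow(m, k, n)
```

The same hardware datapath (a modular exponentiator) therefore serves both directions; only the exponent loaded into it differs.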
Delfs, Hans, "Introduction to cryptography: Principles and applications"
Springer, 2002.
Stinson, Douglas R, "Cryptography: Theory and Practice", Chapman & Hall, 2002.
Graham, Ian G, "A Brief Introduction to Cryptography",
Title: Data Encryption Standard (DES) algorithm
The algorithm will be designed to encrypt and decrypt blocks of data consisting
of 64 bits under control of a 64-bit key. Decrypting will be accomplished by
using the same key as for encrypting, with the exception that it will be
accessed in the reverse order. A block to be encrypted will first be subjected
to an initial permutation (IP). Subsequently, it will iterate 16 times through
a complex key-dependent computation and finally it will be subjected to an
inverse permutation (IP-1). The IP and IP-1 units will alter the order of their
inputs. The key-dependent computation will be simply defined in terms of a
function f, called the cipher function, and a function KS, called the key
schedule. The input block will be split into two blocks L and R where L
corresponds to the left half of the input block and R to the right half. At
each of the 16 iterations, a new K, L and R will be computed according to the
following formulas: Kn = KS(n, KEY), Ln = Rn-1 and Rn = Ln-1 (+) f(Rn-1, Kn),
where n represents the iteration and (+) represents bit-by-bit addition modulo
2.
References:
[1] B. Schneier, "Applied Cryptography", 2nd Edition, Wiley, 1996.
[2] J. Gilmore, "Cracking DES: Secrets of Encryption Research, Wiretap Politics
& Chip Design", May 1998.
[3] R. K. Nichols, "ICSA Guide to Cryptography", McGraw-Hill, 1999.
[4] Federal Information Processing Standards Publications, "Announcing the
Standard for Data Encryption Standard",
http://www.itl.nist.gov/fipspubs/fip46-2.htm.
[5] Tropical Software, "DES Encryption",
http://www.tropsoft.com/strongenc/des.htm.
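The round formulas Ln = Rn-1 and Rn = Ln-1 (+) f(Rn-1, Kn) can be exercised with a Python sketch of the Feistel network; `toy_f` below is only a stand-in for the real DES cipher function (which uses expansion, S-boxes and permutations), and the key schedule is invented for illustration:

```python
def toy_f(r, k):
    """Stand-in for the DES cipher function f; the Feistel structure is
    invertible regardless of what f is."""
    return ((r * 31) ^ k) & 0xFFFFFFFF

def feistel(block, keys):
    """16 rounds of Ln = Rn-1, Rn = Ln-1 xor f(Rn-1, Kn); the initial and
    inverse permutations of full DES are omitted."""
    left, right = block >> 32, block & 0xFFFFFFFF
    for k in keys:
        left, right = right, left ^ toy_f(right, k)
    # final swap so decryption is the same network with reversed keys
    return (right << 32) | left

keys = [(0x0F1E2D3C * (i + 1)) & 0xFFFFFFFF for i in range(16)]
ct = feistel(0x0123456789ABCDEF, keys)
pt = feistel(ct, keys[::-1])     # decrypt: same hardware, reversed schedule
```

This demonstrates the property the abstract relies on: decryption reuses the encryption hardware with the key schedule accessed in reverse order.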
Title: Implementation of RC6 Block Cipher Cryptography System
In the ever-evolving world of data security, technological methods are
constantly being improved upon to better protect information. Encryption and
decryption ciphers are powerful tools in the field of data protection. While
many different algorithms exist, we have chosen to investigate and realize the
RC6 algorithm.
Designed by Ron Rivest in collaboration with associates from RSA laboratories,
the RC6 block cipher is an evolutionary improvement of RC5. It was designed to
meet the requirements of the Advanced Encryption Standard (AES). In order to be
AES compliant, a block cipher must utilize 128-bit input/output blocks. The RC6
cipher employs four 32-bit registers for operations that have the advantages of
doing two rotations per round and using more bits of data to determine rotation
amounts in each round. Also, it includes integer multiplication as an additional
primitive operation, a feature not found in RC5. Overall, RC6 maximizes the
diffusion achieved per round, allowing for greater security, fewer rounds and
increased throughput.
Therefore, we will implement the RC6 cryptosystem algorithm in VHDL intended for
use on the Altera Flex 10k series FPGA.
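One RC6 encryption round, as described above, can be sketched in Python (the two round subkeys s0 and s1 would normally come from the RC6 key schedule, which is not shown):

```python
W = 0xFFFFFFFF                          # 32-bit word mask

def rotl(x, r):
    """32-bit rotate left; only the low lg(w) = 5 bits of r are used."""
    r &= 31
    return ((x << r) | (x >> (32 - r))) & W

def rc6_round(A, B, C, D, s0, s1):
    """One RC6 encryption round: the quadratic function f(x) = x(2x+1),
    rotated by 5 bits, drives data-dependent rotations of A and C."""
    t = rotl((B * (2 * B + 1)) & W, 5)
    u = rotl((D * (2 * D + 1)) & W, 5)
    A = (rotl(A ^ t, u) + s0) & W       # rotation amount: low 5 bits of u
    C = (rotl(C ^ u, t) + s1) & W
    return B, C, D, A                   # rotate the four registers
```

The two multiplications per round are the "integer multiplication as an additional primitive operation" mentioned above, and the data-dependent rotation amounts are what boost diffusion relative to RC5.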
References:
1. RSA Laboratories | RC6 Block Cipher http://www.rsasecurity.com/rsalabs/rc6/
2. Ronald L. Rivest, M.J.B. Robshaw, R. Sidney, Y.L. Yin, "The RC6 --- Block
Cipher" ftp://ftp.rsasecurity.com/pub/rsalabs/rc6/rc6v11.pdf
3. "The RC6 Block Cipher: A simple fast secure AES proposal"
http://csrc.nist.gov/encryption/aes/round1/conf1/rc6-slides.pdf
Title: A Hardware Implementation of the International Data Encryption
Algorithm (IDEA)
Data encryption has been an increasingly important concept in the development
of secure communications. There is a vast array of different algorithms that
exist to encrypt and decrypt data. IDEA (International Data Encryption
Algorithm) was developed at ETH (the Swiss Federal Institute of Technology) and
patented by the Swiss firm Ascom. Compared to the DES (Data Encryption
Standard) algorithm, IDEA has been shown to be faster and more secure.
For IDEA, the same algorithm is used for encryption and decryption. It is a
block cipher algorithm that operates on 64-bit plaintext blocks, using a 128-bit
key. The 64-bit input block is divided into four 16-bit blocks, which become
input blocks to the first round of the algorithm. There are eight rounds in
total, and in each round the four blocks are XORed, added, and multiplied with
subkeys based on the original key. Between each round, the second and third
subblocks are swapped. The algorithm design will be coded in VHDL and
implemented on an Altera Flex10K FPGA. In order to implement the algorithm
within time and hardware constraints, the block size will be 16 bits rather than
64 bits.
References:
[1] Crypto Systems Incorporated (2002). [Online]. The IDEA Encryption Algorithm.
Available: http://www.finecrypt.net/idea.html. February 17, 2003 (accessed).
[2] Frame Technology (1994). [Online]. International Data Encryption Algorithm
(IDEA). Available: http://www.cs.nps.navy.mil/curricula/tracks/security.
February 17, 2003 (accessed).
[3] A. J. Menezes. Handbook of Applied Cryptography. Boca Raton: CRC Press,
1997.
[4] J. G. Savard (2000). [Online]. IDEA (International Data Encryption
Algorithm). Available: http://home.ecn.ab.ca/~jsavard/crypto/co040302.htm.
February 17, 2003 (accessed).
[5] B. Schneier. Applied cryptography :protocols, algorithms, and source code in
C. New York: Wiley,1996.
[6] TechTarget (2003). [Online]. International Data Encryption Algorithm.
Available: http://searchsecurity.techtarget.com. February 17, 2003 (accessed).
Title: RSA (Rivest, Shamir, and Adleman) Public-Key Cryptosystem
RSA is a popular algorithm used for encrypting and decrypting data. The
algorithm was developed by three mathematicians: Rivest, Shamir, and Adleman.
The difficulty of cracking RSA encryption is the reason why RSA is used in many
communication and digital signature systems today. There is an increased need for
secure transmission of data (online banking, intranet communication, digital
cellular phones) and a reliable means to encrypt data is required.
Developed in 1977, the system uses a private and a public key. Every user has a
pair of keys: a public key, which is provided to all partners, and a private
key. A message encrypted with the public key can only be decrypted with the
private key, and vice versa. The security of RSA is based on the difficulty of
factoring large numbers, as opposed to the ease of multiplying two large
numbers. The same hardware can be used for encryption as well as decryption.
For our project, we will implement an RSA encryption/decryption system using VHDL.
1. T. Cormen, C. Leiserson, R. Rivest, Introduction to Algorithms, McGraw Hill
Publishers, 1991.
2. Hennessy,J.L., Patterson,D.A., Computer architecture: a quantitative
approach, Morgan Kaufman Publishers, Inc., 1990.
3. T. ElGamal, A public key cryptosystem and a signature scheme based on
discrete logarithms, IEEE Transactions on Information Theory, 31(4):469-472,
July 1985.
4. R. L. Rivest, A. Shamir, and L. Adleman, A method for obtaining digital
signatures and public-key cryptosystems, Commun. ACM, vol. 21, pp.120-126, 1978.
5. Cetin Kaya Koc, RSA Hardware Implementation, RSA Laboratories, Version 1.0,
1995.
Title: FPGA Implementation of the RSA Cryptography Algorithm
Cryptography is a strong security measure to protect information and maintain
privacy and confidentiality. As we move into an information society,
cryptography has become one of the main tools for access control, electronic
payments, corporate security on the internet and various types of networks.
Many cryptography algorithms are currently available, but the RSA algorithm has
become a standard for industrial-strength encryption, especially for data sent
over the internet. The RSA algorithm was named after Ron Rivest, Adi Shamir and
Len Adleman, who invented it in 1977. The security of the RSA cryptosystem
rests on the computational difficulty of factoring large integers.
Our project presents the mathematical analysis of the RSA algorithm, where the
public key and private key are used to encrypt and decrypt messages. Also, the
analyzed RSA algorithm will be designed in VHDL and synthesized for
implementation on an Altera FPGA chip using MAX+PLUS II.
[1] Denning, Dorothy E., "Cryptography and data security", Addison-Wesley
Publishing Company, Inc., 1982.
[2] Jennifer Seberry and Josef Pieprzyk, "Cryptography: An Introduction to
Computer Security.", Prentice-Hall, 1989.
[3] C. H. Meyer and S. M. Matyas, "Cryptography: A new dimension in computer
data security", John Wiley & Sons Inc., 1982.
[4] http://www.ssh.com/support/cryptography
[5] http://oregonstate.edu/dept/honors/makmur/
[6] http://www.di-mgt.com.au/rsa_alg.html
[7] http://www.rsasecurity.com/
Title: FPGA Implementation of the RSA Encryption Algorithm
With today's rapidly evolving communications market, the need for secure
transmission of data is of critical importance. The RSA encryption algorithm
(Ron Rivest, Adi Shamir, Len Adleman), invented in 1977, is used in most
public-key
cryptography systems. Its security is based on the difficulty of factoring large
numbers. In this project, a VHDL implementation of the RSA cryptographic system
is presented based on the modular multiplication proposed by P. L. Montgomery.
RSA is an asymmetric cryptosystem that uses one key (public key) to encrypt a
message and a different key (private key) to decrypt it; therefore, both
encryption and decryption circuits will be implemented. In order to optimize the
speed of the circuits, the RSA encryption algorithm employed should be able to
carry out modular exponentiation efficiently. For this purpose, the Montgomery
multiplication will be used to perform the modular multiplication and squaring
during the exponentiation process.
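Montgomery's method replaces the trial division of ordinary modular multiplication with shifts, masks and one conditional subtract, which maps well onto hardware. A Python sketch for small word sizes (the parameter k and the function names are ours; requires Python 3.8+ for modular `pow`):

```python
def montgomery_setup(n, k):
    """Precompute R = 2^k and n' = -n^-1 mod R for an odd modulus n < R."""
    R = 1 << k
    n_inv = pow(n, -1, R)              # n odd, so the inverse exists
    return R, (-n_inv) % R

def monpro(a, b, n, R, n_prime):
    """Montgomery product: (a*b*R^-1) mod n with no division by n."""
    t = a * b
    m = (t * n_prime) & (R - 1)        # mod R is a mask since R = 2^k
    u = (t + m * n) >> (R.bit_length() - 1)   # exact division by R
    return u - n if u >= n else u

def modmul(a, b, n, k=16):
    """a*b mod n via the Montgomery domain: convert in, multiply, convert out."""
    R, n_prime = montgomery_setup(n, k)
    a_bar = (a * R) % n                # into the Montgomery domain
    b_bar = (b * R) % n
    u = monpro(a_bar, b_bar, n, R, n_prime)   # = a*b*R mod n
    return monpro(u, 1, n, R, n_prime)        # back out of the domain
```

In the exponentiation loop, all squarings and multiplications stay in the Montgomery domain, so the conversion cost is paid only once per exponentiation.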
[1] A. J. Menezes, P. van Oorschot, and S. Vanstone, Handbook of Applied
Cryptology.,CRC Press, 1996
[2] Bruce Schneier, Applied Cryptography, Second Edition, John Wiley & Sons,
1996.
[3] M. Shand and J. Vuillemin, Fast Implementations of RSA Cryptography., Proc.
11th IEEE Symposium on Computer Arithmetic, Canada, July 1993.
[4] Richard Holowczak, RSA Demo Applet.,
http://cisnet.baruch.cuny.edu/holowczak/classes/9444/rsademo/#overview
[5] RSA Laboratories, "PKCS #1 v2.1: RSA Encryption Standard",
http://www.rsasecurity.com/rsalabs/pkcs/pkcs-1/index.html
Title: Anti-Aliasing for Computer Generated Images
Anti-Aliasing is a very important part of generating high quality 3D images.
There are various approaches to perform this task. The most common is a
supersampling approach, where the image is generated at 2 or 4 times its
resolution and the intensity of each pixel that is actually drawn is averaged.
The main problem with this approach is that it is computationally very
expensive. The goal of our project is to implement an anti-aliasing scheme that
is much less computationally expensive. Our approach will be one that does not
use supersampling, at least not on the whole scene. By detecting where the
edges (jaggies) occur, we will calculate a weighting ratio to smooth out the
jaggies. This approach is simple, yet much less expensive than the
supersampling approach mentioned above, thus making it an ideal candidate for
real-time graphics processing.
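For comparison, the brute-force supersampling approach being avoided can be sketched in a few lines of Python (the `render(x, y)` high-resolution sample interface is hypothetical):

```python
def supersample(render, width, height, factor=2):
    """Brute-force anti-aliasing: render at factor times the target
    resolution and average each factor-by-factor block into one pixel.
    `render(x, y)` returns the intensity of a high-resolution sample."""
    image = []
    for py in range(height):
        row = []
        for px in range(width):
            samples = [render(px * factor + i, py * factor + j)
                       for j in range(factor) for i in range(factor)]
            row.append(sum(samples) / len(samples))
        image.append(row)
    return image
```

Every output pixel costs factor-squared scene evaluations, which is precisely the expense an edge-detection scheme avoids by filtering only where jaggies occur.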
References:
1. Tsuyoshi Isshiki & Hiroaki Kunieda, "Efficient Anti-Aliasing Algorithm for
Computer Generated Images"
2. Yuan-Hau Yeh & Chen-Yi Lee, "A New Anti-Aliasing Algorithm for Computer
Graphics Images"
3. Foley, van Dam, Feiner, Hughes, "Computer Graphics: Principles and Practice"
2nd edition in C, Addison-Wesley, July 1997
Title: Hardware Implementation of the 2D Graphics Pipeline for Polygon
Rendering
The performance of most graphics systems is largely dependent on the speed at
which they can render primitives, such as circles and polygons. To be completely
rendered, primitives have to go through the 2D graphics pipeline, which consists
of three conceptual stages: the application program, which feeds commands to the
CPU; the geometry subsystem; and the rasterization subsystem. Although all
three parts may be implemented in software, the amount of calculation involved
in rendering a primitive is very significant, and since the number of primitives
in one image is typically very large, it is usually beneficial to accelerate
parts of the pipeline in hardware. Thus, this project will focus on the hardware
implementation of the geometry and rasterization subsystems of the graphics
pipeline for polygon rendering. More specifically, we will consider triangle
primitives because they are the most often used polygons in graphics. The
geometry stage will perform per-vertex operations such as geometric
transformations (rotations, translations) and clipping. The latter will be done
using the Sutherland-Hodgman clipping algorithm, which is easily implementable
in hardware. The rasterization component, on the other hand, will work on
individual pixels and its main function will be scan conversion. Since the
polygons considered here are formed of three lines, they will be rasterized
with Bresenham's line algorithm. Finally, the main goal of this project will
be to optimize throughput, i.e. to render as many polygons per second as
possible. This objective will be achieved by further pipelining each component
of the graphics system, which is itself already pipelined.
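Bresenham's line algorithm uses only integer additions, shifts and comparisons, which is what makes it attractive for a hardware rasterizer. A Python reference model (the all-octant error-term form, with names of our choosing):

```python
def bresenham(x0, y0, x1, y1):
    """Integer-only line rasterization: at each step, the sign of the
    error term chooses between stepping straight or diagonally."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:                 # step in x
            err += dy
            x0 += sx
        if e2 <= dx:                 # step in y
            err += dx
            y0 += sy
    return pixels
```

Because the inner loop is two compares and two adds, one pixel can be produced per clock cycle in a pipelined hardware implementation.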
[1] D. Hearn and M.P. Baker, "Computer Graphics, C Version", New Jersey:
Prentice Hall, 1997.
[2] J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, "Computer Graphics:
Principles and Practice in C", second edition, Addison-Wesley, 1996.
[3] E. Angel and D. Morrison, "Speeding Up Bresenham's Algorithm", IEEE Computer
Graphics and Applications, vol. 11, no. 6, pp. 16-17,
[4] M. Olano and T. Greer, "Triangle Scan Conversion using 2D Homogeneous
Coordinates", presented at the 1997 SIGGRAPH / Eurographics Workshop on Graphics
Hardware,
[5] F. Pfenning, "Clipping and Scan Conversion", class notes for 15-462 Computer
Graphics I, School of Computer Science, Carnegie Mellon University in Pittsburgh
PA, March 2002
Title: Single-Color Texture-Mapping Unit with MIP-mapping and Bilinear
Interpolation.
Texture mapping is a fundamental technique for rendering high quality images
inexpensively into a frame buffer by mapping an image, or a texture, to a
polygon. Because different dimensions of textures and polygons may cause
"sparkling", "moiré" or "blocky" artifacts to appear, a texture mapping unit uses
a "Texture Magnification Filter" when the texture is considerably smaller than
the polygon and a "Texture Minification Filter" when a texture is considerably
larger than the polygon.
This paper describes a texture mapping unit that uses bilinear interpolation for
texture magnification and the "Nearest MipMap Nearest" MIP-mapping algorithm for
texture minification. This 2-D unit supports a frame buffer resolution of
256x256 pixels, a texture resolution up to 256x256 texels, and a single 8-bit
color channel. It can be cascaded with other chips to generate a 3-color system,
and can be arranged in parallel with other chips to support a larger frame
buffer resolution.
This design attempts to maximize throughput at the expense of resources; a given
texture is immediately filtered and its corresponding MIP-maps are generated in
order to optimize its mapping time into polygons.
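The bilinear magnification filter described above can be sketched as follows. The texture is modeled as a list of rows of intensity values; the edge-clamping behavior is an assumption of this sketch rather than a detail of the design:

```python
def bilinear_sample(tex, u, v):
    """Sample a 2-D texture at real coordinates (u, v) by blending the
    four nearest texels, weighted by the fractional distance to each;
    coordinates are clamped so the 2x2 neighborhood stays in bounds."""
    h, w = len(tex), len(tex[0])
    x0 = max(0, min(w - 2, int(u)))
    y0 = max(0, min(h - 2, int(v)))
    fx, fy = u - x0, v - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x0 + 1] * fx
    bot = tex[y0 + 1][x0] * (1 - fx) + tex[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bot * fy
```

Sampling exactly between four texels returns their average, which is the smoothing effect that removes "blocky" magnification artifacts.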
References:
[1] D. Hearn and M. P. Baker, "Computer Graphics, C Version", 2nd edition,
Prentice-Hall, 1994.
[2] P. Heckbert, "Fundamentals of Texture Mapping and Image Warping", 1989,
http://www-users.itlabs.umn.edu/classes/Fall-2001/csci5980/notes/
Texture_Mapping1.ppt#1.
[3] M. Woo, J. Neider, and T. Davis, "OpenGL Programming Guide", 2nd edition,
Addison-Wesley, 1998.
[4] P. Haeberli and M. Segal, Silicon Graphics Inc. (SGI), "Texture Mapping as a
Fundamental Drawing Primitive", 1993, http://www.sgi.com/grafica/texmap/.
Title: 2D Antialiased Line-Drawing Unit
A fundamental operation on any raster graphics system is line drawing. Due to
the inherent limitations in the resolution of the display device, lines tend to
have a jagged or stair-step appearance. This distortion of information due to
low-frequency sampling is known as aliasing. This attracts unwanted attention in
both static and dynamic graphics. In the case of a moving image, aliased lines
flicker and have crawling edges. Anti-aliasing seeks to compensate for this
undersampling by varying the intensity of the border pixels to reduce the jagged
effect.
We are implementing a line-drawing unit that uses the Bresenham algorithm to
draw the line, and Wu's algorithm for anti-aliasing. The inputs to the unit will
be the endpoints of the line, and the output will be the coordinates of the
pixels as well as their intensity. The unit will be described in VHDL and
synthesized on an FPGA.
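The core of Wu's method can be sketched for the gentle-slope case only (x0 < x1, |slope| <= 1; the other octants follow by symmetry, and endpoint handling is omitted). Each column's unit intensity is split between the two pixels straddling the ideal line, in proportion to coverage:

```python
def wu_line(x0, y0, x1, y1):
    """Wu-style antialiased line sketch for the gentle-slope case.
    Returns (x, y, intensity) triples; per column, the two candidate
    pixels receive complementary intensities summing to 1."""
    gradient = (y1 - y0) / (x1 - x0)
    y = float(y0)
    out = []
    for x in range(x0, x1 + 1):
        yi = int(y)
        frac = y - yi                 # distance below the ideal line
        out.append((x, yi, 1.0 - frac))      # nearer pixel, brighter
        out.append((x, yi + 1, frac))        # farther pixel, dimmer
        y += gradient
    return out
```

This complementary intensity split is exactly the output format the abstract describes: pixel coordinates plus an intensity for each.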
References:
1. Hearn, Donald - Computer graphics, C version (2nd Ed.), Upper Saddle River,
N.J. : Prentice Hall, 1997.
2. Thaddeus, Jude - "Wu's Anti-aliasing Line Drawing"
http://www.sanctuary.org/~azimuth/coding/aaline.html
3. Elias, Hugo - "Wu-Anti-aliased Lines"
http://freespace.virgin.net/hugo.elias/graphics/x_wuline.htm
Title: A VHDL implementation of the MultiScale Retinex with Color Restoration Algorithm
Subtle differences exist between the direct human observation of scenes and that
of recorded color images. One of the big differences is the fact that the human
visual system is able to distinguish details and vivid colors in shadows and in
scenes that contain illuminant shifts. A number of algorithms have been proposed
to bridge this gap. The Multiscale Retinex with Color Restoration (MSRCR) has
shown itself to be one of the best algorithms in this field. This project will
attempt to implement the MSRCR in hardware. This will allow photographers to get
automatic image enhancement directly from their digital cameras. The algorithm
can be summarized by

  Ri(x,y) = Ci(x,y) SUM(n=1..N) Gn { log Ii(x,y) - log[ Fn(x,y) * Ii(x,y) ] }

where Ii(x,y) denotes the image distribution in the i'th color spectral band, *
the convolution operation, Fn(x,y) the surround function, N the number of
scales, Gn the weight associated with each scale, and Ci(x,y) is given by

  Ci(x,y) = log[ alpha Ii(x,y) / SUM(j=1..S) Ij(x,y) ]

where alpha is the color restoration constant and S=3 for images in the RGB
color scheme. There are mainly two goals in this project: the first
one is to achieve good approximation of this algorithm in hardware, whereas the
second is to examine the performance of our hardware approach and compare it to
the software based approaches. Simulations of an FPGA programmed using VHDL will
be presented and compared to its MATLAB counterpart.
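A drastically simplified software sketch of the retinex computation follows, in one dimension and with a box average standing in for the Gaussian surround Fn (an assumption made only to keep the sketch short):

```python
import math

def single_scale_retinex(signal, radius):
    """1-D single-scale retinex sketch: log of each sample minus the
    log of its local surround average (box average instead of the
    Gaussian surround used by the real algorithm)."""
    out = []
    n = len(signal)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        surround = sum(signal[lo:hi]) / (hi - lo)
        out.append(math.log(signal[i]) - math.log(surround))
    return out

def multi_scale_retinex(signal, radii, weights):
    """Weighted sum of single-scale outputs over several scales, the
    SUM(n=1..N) Gn {...} part of the MSR equation."""
    parts = [single_scale_retinex(signal, r) for r in radii]
    return [sum(w * p[i] for w, p in zip(weights, parts))
            for i in range(len(signal))]
```

A uniform signal maps to zero everywhere (no local contrast), while a sample brighter than its surround maps to a positive value, which is the dynamic-range compression the algorithm relies on.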
1. Jobson, D.J., Rahman, Z., Woodell, G.A., "Properties and Performance of a
Center/Surround Retinex", IEEE Transactions on Image Processing.
2. Jobson, D.J., Rahman, Z., Woodell, G.A., "A Multiscale Retinex for Bridging
the Gap Between Color Images and the Human Observation of Scenes", IEEE
Transactions on Image Processing.
3. Rahman, Z., Jobson, D.J., Woodell, G.A., "Multi-scale Retinex for Color Image
Enhancement", Image Processing, 1996.
4. Marina Balabanov and Yaron Zalika, "Image Enhancement Method Based on
Multiscale Retinex".
5. A. K. Jain, Fundamentals of Digital Image Processing.
Title: Maximal Throughput Implementation of the Sobel Edge Detection
Algorithm
Image processing and recognition techniques have garnered a remarkable amount of
interest over the past few years. Due to the complexity in such analysis,
feature detection and extraction is the first step in most image segmentation
techniques. One such method of accomplishing this extraction is through edge
detection. The primary goal of this process is to develop sets of closed pixel
contours surrounding regions of interest in the image. The Sobel edge algorithm
uses two convolution kernels to detect intensity differences along the x and y
axes. The result of the operation can be represented as a 2D spatial gradient.
This allows for the emphasis of high spatial frequency areas corresponding to
the edges of an image. As the processing of a video stream has substantial bandwidth
requirements, the hardware implementation of the Sobel algorithm is quite useful
in embedded and intelligent camera systems. This project will strive for a
maximal throughput implementation, to allow for maximum frame rates and
resolutions. Thus, the processing can occur in real-time without dropping any
frames.
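The two convolution kernels and the gradient-magnitude computation can be sketched as follows; the |Gx| + |Gy| approximation of the true Euclidean magnitude is a common hardware simplification, assumed here:

```python
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal gradient kernel
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # vertical gradient kernel

def sobel_magnitude(img):
    """Convolve both 3x3 Sobel kernels over an image (list of rows)
    and return |Gx| + |Gy| for each interior pixel; border pixels are
    left at zero."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(GX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out
```

Because each output pixel depends only on a fixed 3x3 window, one magnitude can be produced per clock once the window is buffered, which is what makes a maximal-throughput hardware version feasible.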
1. R.C. Gonzalez and R.E. Woods, "Digital Image Processing". 2nd Edition,
Addison-Wesley, 1992.
2. Russ, John C. The Image Processing Handbook. CRC Press, Boca Raton, Florida,
1995.
3. James Matthews, "An Introduction to Edge Detection: The Sobel Edge Detector",
2002.
4. R. Fisher, S. Perkins, A. Walker and E. Wolfart, "Sobel Edge Detector", 2000.
Title: 2D Discrete Cosine Transform (DCT) for JPEG Image Compression
The Discrete Cosine Transform (DCT) is currently the most widely used transform
in image compression. Although there are other steps required in image
compression, the majority of the processing required to encode or decode the
images is done by the calculations performed in taking the DCT. The Discrete
Cosine Transform is similar to the Discrete Fourier Transform; however, it is
much simpler because it avoids the use of complex numbers. In order to implement
the DCT, the mathematical equation must be coded in VHDL. The 2D equation is
too convoluted for direct VHDL coding, so a simplification is preferred.
First it can be observed that the DCT is separable into 2 one-dimensional DCTs;
one along the rows and one along the columns. Thus, in order to switch between
rows and columns a transposing buffer will be required. This buffer consists of
an 8x8 memory array for transposing matrices of partial products (see figure 1
in the PDF attachment).
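The row/column separability that motivates the transposing buffer can be sketched directly; the orthonormal scaling used below is one common convention, assumed here:

```python
import math

def dct_1d(v):
    """Type-II DCT of a length-N vector with orthonormal scaling."""
    N = len(v)
    out = []
    for k in range(N):
        s = sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
        c = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(c * s)
    return out

def dct_2d(block):
    """2-D DCT as two passes of 1-D DCTs with a transpose in between,
    exactly the row/column split the abstract describes; the transpose
    plays the role of the 8x8 transposing buffer."""
    rows = [dct_1d(r) for r in block]
    transposed = list(map(list, zip(*rows)))
    cols = [dct_1d(c) for c in transposed]
    return list(map(list, zip(*cols)))
```

For a constant 8x8 block all of the energy lands in the DC coefficient, with every AC coefficient zero, which is the behavior JPEG's subsequent quantization step exploits.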
D.S. Taubman and M.W. Marcellin, JPEG2000: Image Compression Fundamentals,
Standards, and Practice. Boston: Kluwer Academic Publishers, 2002.
P. Symes, Video Compression Demystified. New York: McGraw-Hill, 2001.
"Pipelined Fast 2-D DCT Architecture for JPEG Image Compression" (September
2001) [Online], 2003.
"The Cameron Project: High-Level Programming of Image Processing Applications
on Reconfigurable Computing Machines" (October 1998).
R.C. Gonzalez, R.E. Woods, Digital Image Processing. Upper Saddle River, N.J.:
Prentice Hall, 2002.
Title: Implementation and design of an EREC algorithm
Most compression algorithms used today comprise some form of variable length
coding (VLC). The basic idea behind VLC is to represent frequent data symbols by
short bursts of binary digits, while data symbols that are less often
encountered are represented by longer bursts of bits, resulting in a smaller
average number of bits needed to represent all the symbols from the source.
One of the main drawbacks of VLCs is that they are highly sensitive to error
propagation. In order to be able to reconstruct a sequence of symbols from a
sequence of bits, the receiver requires all bits to be transmitted correctly.
The aim of this project is to implement and design an algorithm for adapting VLC
schemes to give increased resilience to random and burst errors while
maintaining suitable compression performance.
Error resiliency will be achieved through the application of an EREC scheme that
consists in multiplexing variable length codes into fixed length data blocks.
The latter blocks are composed of slots, wherein each slot corresponds to the
beginning of a VLC, thereby allowing the decoder to automatically synchronize at
the beginning of each slot. The combination of error resiliency and data
compression makes the scheme appropriate for data streams requiring high
transfer rates.
The scheme will be tested under different noise conditions, to assess the
trade-off between its compression performance and error-resilience
characteristics, in the context of an image compression application.
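The slot-filling idea behind EREC can be sketched with bit counts alone. The published algorithm uses a pseudo-random offset sequence; a simple linear sequence is assumed here for brevity:

```python
def erec_pack(lengths, slot_size):
    """EREC placement sketch: code i first fills slot i with as many
    of its own bits as fit; in later stages, codes with leftover bits
    probe slots (i + offset) mod N for spare space. Tracking counts
    rather than actual bits is enough to show the slot algebra."""
    n = len(lengths)
    used = [min(l, slot_size) for l in lengths]      # stage 0
    left = [l - u for l, u in zip(lengths, used)]
    for offset in range(1, n):
        for i in range(n):
            if left[i] == 0:
                continue
            j = (i + offset) % n
            spare = slot_size - used[j]
            moved = min(spare, left[i])
            used[j] += moved
            left[i] -= moved
    return used, left
```

Because every slot boundary is a known decoding restart point, a bit error corrupts at most the codes whose bits pass through the damaged slot instead of propagating to the end of the stream.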
Redmill, D.W., Kingsbury, N.G., "The EREC: An Error-Resilient Technique for
Coding Variable-Length Blocks of Data", IEEE Transactions on Image Processing,
Volume 5, Issue 4, April 1996, pp. 565-574.
Takishima, Y., Wada, M., Murakami, H., "Variable Length Codes", IEEE
Transactions on Communications, Volume 43, Issue 2, Part 3, Feb.-April 1995,
pp. 158-162.
Title: Character Recognition through Neural Networks
This project seeks to apply neural networks in recognizing handwritten
characters in the English language. A multi-layer Neural Network equipped with a
back-propagation learning algorithm is used to enable fast learning. This allows
for specific types of handwriting to be recognized faster over time (learning
through the training of the neurons), thus allowing the program to be
user-specific.
The input layer of the MLP (Multi-Layer Perceptron) is a set of image pixels. An
X by Y pixel image is mapped onto X by Y neurons. During the training period,
the neurons are recursively fed characters to build their fault-tolerance
(variance between two different As). The training is done with no
pre-programming or bias.
Using windowing (breaking-up the characters into four/six/eight equal pieces)
expedites the recognition process. For example, if a set of pixels comprising
the top left-hand corner of the character matches the data in the trained
neurons, the code need not do similar computations for the rest of the
character.
The aim of using Neural Networks coupled with VHDL is to automate and accelerate
the process of character recognition.
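The forward pass of such a multi-layer perceptron can be sketched as follows; the weight shapes, bias values, and sigmoid activation are assumptions of the sketch, not details taken from the project:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(pixels, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer perceptron forward pass: the flattened pixel
    vector feeds every hidden neuron, whose sigmoid outputs feed every
    output neuron (one per character class)."""
    hidden = [sigmoid(sum(w * p for w, p in zip(ws, pixels)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return [sigmoid(sum(w * h for w, h in zip(ws, hidden)) + b)
            for ws, b in zip(w_out, b_out)]
```

Back-propagation training adjusts the same weights by gradient descent on the output error; only the inference pass is shown here, since that is the part a hardware recognizer evaluates per character.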
References:
1. E. W. Brown, "Character Recognition by Feature Point Extraction",
unpublished paper written at Northeastern University, 1992
2. D. S. Levine; Introduction to Neural & Cognitive Modeling, 1991, Lawrence
Erlbaum Associates, Publishers; Hillsdale, NJ
3. Patrick K. Simpson; Artificial Neural Systems, 1990, Pergamon Press, New
York, NY
4. John Wester, Introduction to Neural Networks,
http://www.nd.com/neurosolutions/products/ns/whatisNN.html
Title: Parallel String Matching with Broadcasting Scheme
String matching is at the core of many modern computationally intensive
activities such as DNA identification, internet search and database operation.
Generally, string matching is performed at the software level but critical or
large volume applications require a dedicated special purpose solution. On a
single-CPU architecture the processor must load and then compare the pattern
string piece by piece with the reference, and the task is even more complicated
when some mismatches are tolerated. While the software approach is limited by
its serial nature, the main benefit of a hardware implementation resides in its
parallelism. In fact many comparisons can be performed simultaneously on one or
many reference strings, speeding up the process by as much as the parallelism
degree (number of streams used).
In this paper, we present the VHDL design of a parallel string matching unit
synthesized for the Altera family of FPGAs and addressing the exact matching and
k-mismatches problems. Our approach is based on the method developed by Park and
George [1], in which a broadcasting scheme with multiple streams is presented.
This parallel scheme has two major advantages: (1) the number of multiple
streams can be chosen to gain the maximum benefit of the parallelism and (2) the
scheme can work on unknown sized reference strings. In addition, the
broadcasting scheme solves the exact matching and the k-mismatches problems with
a time complexity of O(n/d), where n is the length of the reference string and d
represents the number of different input streams used. This algorithm achieves
high performance by exploiting explicit parallelism. We then develop a
parameterized VHDL entity in which the degree of parallelism is controlled
explicitly by a variable number of input (and output) streams.
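The exact-matching and k-mismatches problems themselves are easy to state in software; the sketch below checks every alignment sequentially, whereas the hardware version evaluates many alignments at once across d parallel streams:

```python
def k_mismatch_positions(reference, pattern, k):
    """Report every alignment of `pattern` against `reference` with at
    most k character mismatches (k = 0 is the exact-matching problem)."""
    m = len(pattern)
    hits = []
    for i in range(len(reference) - m + 1):
        mismatches = sum(1 for a, b in zip(reference[i:i + m], pattern)
                         if a != b)
        if mismatches <= k:
            hits.append(i)
    return hits
```

Each alignment's mismatch count is independent of the others, which is precisely the property the broadcasting scheme exploits to reach O(n/d) time with d streams.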
1. Jin H. Park and K. M. George, "Parallel String Matching Algorithms Based on
Dataflow," Proceedings of the 32nd Hawaii International Conference on System
Sciences, pp. 1-10, 1999.
2. Hsi-C. Lee and Fikret Ercal, "RMESH Algorithms for String Matching,"
Proceedings of the third International Symposium on Parallel Architectures,
Algorithms, and Networks, pp. 223-226, 1997.
3. Dan Gusfield, "Algorithms on strings, trees, and sequences : computer science
and computational biology", Cambridge, Cambridge University Press, 534 p., 1997.
Title: Implementation of a modified version of MIPS
A RISC (Reduced Instruction Set Computer) is a processor whose design is based
on the rapid execution of a sequence of simple instructions rather than a large
variety of complex ones. The central idea is that by speeding up the most
common, simple instructions, one could afford to pay a penalty in the unusual
case of a complex instruction and make a large net gain in performance.
There are several computer architectures based on this concept. Our project is
going to be based on the MIPS (Microprocessor without Interlocked Pipeline
Stages) architecture. Our goal is to implement a subset of MIPS using VHDL that
architecture. Our goal is to implement a subset of MIPS using VHDL that deals
with integer operations only (as opposed to floating point ones). This compact
version will contain the most basic instructions of MIPS such as: load, store,
add, subtract, branch, and set less than. Figure 1 shows the implementation of
such an instruction set. In this pipelined architecture, each instruction will
take about 4 or 5 clock cycles to compute.
References:
[1] J. L. Hennessy & D. A. Patterson, "Appendix A - Pipelining: Basic and
Intermediate Concepts," in Computer Architecture: A Quantitative Approach, 3rd
Edition. San Francisco: Morgan Kaufmann, 2003.
[2] G. Kane & J. Heinrich, MIPS RISC Architecture, 2nd Edition. Upper Saddle
River, NJ: Prentice Hall, 1992.
[3] D. Sweetman, See MIPS Run. San Francisco: Morgan Kaufmann, 1999.
[4] A. Hui & J. Zappulla, School of Computer Science, Monash University,
Clayton, Australia. "MIPS Architecture Animation," February 2003,
http://www.csse.monash.edu.au/packages/mips/.
Title: VHDL Implementation of Floating Point Division
Floating point number representations allow us to use real numbers on a
computer. Floating point numbers consist of a sign, exponent, mantissa, and
base. Many applications use floating point division, hence there is a need to
develop a floating point divider. The circuit is modeled using the Very High
Speed Integrated Circuits Hardware Description Language (VHDL).
The IEEE single precision format uses an 8-bit exponent, a 23-bit mantissa, and
a 1-bit sign, for a total of 32 bits. Most computers support this single
(32-bit) precision format, which can express numbers from about -3.4E38 to
3.4E38.
The basic algorithm we are following to divide two floating point numbers is:
- Divide the dividend mantissa by the divisor mantissa
- Subtract the exponent of the divisor from that of the dividend
- Compute the sign of the result from the signs of the two operands
- Normalize the result
The divide-by-zero case will be handled separately, following the IEEE standard
representation, and other exceptions such as NaN will be handled as specified
in the IEEE standard.
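The four algorithm steps can be sketched in software using Python's frexp/ldexp to expose the mantissa-exponent split, a stand-in for the explicit IEEE 754 bit fields the VHDL design would manipulate; the divide-by-zero behavior here (raising) is only an illustration of the special-case handling:

```python
import math

def fp_divide(a, b):
    """Floating-point division by fields: split each operand into
    mantissa * 2**exponent, divide the mantissas, subtract the
    exponents, compute the sign from the operand signs, and
    renormalize the result."""
    if b == 0.0:
        raise ZeroDivisionError("divisor is zero; IEEE 754 would "
                                "produce a signed infinity here")
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    ma, ea = math.frexp(abs(a))   # abs(a) == ma * 2**ea, ma in [0.5, 1)
    mb, eb = math.frexp(abs(b))
    mq = ma / mb                  # divide mantissas
    eq = ea - eb                  # subtract exponents
    if mq >= 1.0:                 # normalize back into [0.5, 1)
        mq /= 2.0
        eq += 1
    return sign * math.ldexp(mq, eq)
```

Since both mantissas lie in [0.5, 1), their quotient lies in (0.5, 2), so at most one normalization shift is ever needed, a fact the hardware normalizer can rely on.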
References:
1. David Goldberg, "Computer Arithmetic" (Appendix H), Xerox Palo Alto Research
Center, 2003.
Title: Verification, Synthesis and Implementation of the CORDIC VHDL Code
for a FPGA Device
Microprocessors have traditionally used a software approach for implementing
algorithms used in digital signal processing (DSP). This approach was primarily
used because of the difficulty in mapping these algorithms into hardware. In
addition, the software solution provided a cost effective and flexible method
but suffered in terms of speed. With the arrival of reconfigurable logic
computers, a high-speed hardware solution is possible for implementing these
algorithms. The Coordinate Rotation Digital Computer (CORDIC) is a hardware
efficient algorithm that can be mapped into FPGAs. It hence provides a more
attractive alternative to the traditional software approach. The CORDIC
Algorithm has found many applications such as high-end calculators, radars,
robotics and DSP transforms.
In this project, our main objective is to realize a hardware implementation of
the most popular trigonometric and transcendental functions for which the CORDIC
algorithm is known, viz. sine, cosine, arctangent and square root. The CORDIC
algorithm uses a simple and efficient approach based on shift-addition techniques
together with vector rotations. These techniques are implemented in VHDL and
mapped into FPGAs. The performance of the design is then compared to a software
alternative.
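Rotation-mode CORDIC for sine and cosine can be sketched as follows: a floating-point model of the fixed-point shift-and-add datapath, with the gain constant K pre-applied to the starting vector (iteration count and precision are assumptions of the sketch):

```python
import math

def cordic_sin_cos(theta, iterations=32):
    """Rotation-mode CORDIC: drive the residual angle z to zero with
    micro-rotations of atan(2**-i); x converges to cos(theta) and y to
    sin(theta) for |theta| within the CORDIC convergence range
    (about +/-1.74 rad)."""
    k = 1.0
    for i in range(iterations):
        k *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, theta       # gain folded into the start vector
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # each rotation is a pair of shifts and adds in hardware
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x, y
```

The atan(2**-i) constants are precomputed into a small lookup table in hardware, so each iteration costs only two shifts and three additions.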
References:
1) J. Abruzzo, Applicability of CORDIC Algorithm to Arithmetic Processing, IEEE,
1985.
2) R. Andraka, A survey of CORDIC algorithms for FPGA based computers, ACM,
Monterey, 1998.
3) Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford
University Press, 2000.
4) P. Jarvis, Implementing CORDIC Algorithms, Dr. Dobb's Journal, Oct 1990.
5) J. Volder, The CORDIC computing technique, IRE Trans. Computers, 1959.
6) http://www.andraka.com/files/crdcsrvy.pdf
7) http://www.ee.byu.edu/ee/class/ee621/Lectures/L22.PDF
Title: Emulation of quantum circuit elements and Grover's Search Algorithm
using FPGA technology
This project involves construction of some key elements of quantum circuits,
namely the Hadamard gates, controlled-NOT gates (CNOT gates) and the Conditional
Phase-Shift gates. Since the behavior of these quantum gates does not conform
fully to conventional circuit behavior, we would emulate the quantum behavior
using traditional subsystems synthesized on an FPGA. In order to measure the
usefulness of our emulated quantum gates we would then emulate Grover's
Search algorithm using our gates.
The field of quantum computation and cryptography is at its elemental stage,
with scientists and engineers working to understand complicated quantum
procedures and to find useful implementations of exciting quantum phenomena. The
difficulty currently faced in simulating such experiments in software is that
software simulations cannot successfully reproduce the level of parallelism
involved in quantum computations. One solution to this problem is to emulate
these quantum systems using non-Von Neumann based hardware technologies (such as
FPGAs) which allow us to perform these tasks in hardware where parallelism of
quantum systems can be more effectively emulated.
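As an example of what the emulation must reproduce, one Grover iteration over four basis states can be modeled with a plain state vector; the N = 4, single-iteration case is chosen because it concentrates all amplitude on the marked item:

```python
def grover_search(n_items, marked):
    """State-vector model of one Grover iteration: uniform
    (Hadamard-style) start, oracle phase flip on the marked item,
    then the diffusion operator (inversion about the mean)."""
    amp = [1.0 / n_items ** 0.5] * n_items      # uniform superposition
    amp[marked] *= -1.0                          # oracle: phase flip
    mean = sum(amp) / n_items
    amp = [2.0 * mean - a for a in amp]          # reflect about mean
    return amp
```

The per-amplitude independence of the oracle and diffusion updates is exactly the parallelism that maps poorly onto sequential software but naturally onto FPGA fabric.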
References:
1. V. V. Shende, A. K. Prasad, I. L. Markov and J. P. Hayes, `Reversible Logic
Circuit Synthesis', in Proc. ACM/IEEE Intl. Conf. Comp.-Aided Design, pp.
353-360, November 2002.
2. G. F. Viamontes, M. Rajagopalan, I. L. Markov and J. P. Hayes, `Gate-level
Simulation of Quantum Circuits', Proc. Asia and South-Pacific Design Automation
Conf., pp. 295-301, January 2003.
3. Michael Nielsen and Isaac Chuang, Quantum Computation and Quantum
Information, Cambridge University Press, 2000.
Title: Built-In System-Test for an ALU
The increased functionality of today's microelectronic circuits, along with
faster, denser, and smaller packaging is making it increasingly difficult to
access, test, and verify chip and system functionality using traditional
external equipment and instruments. Often, the circuits being tested are more
complex than the testing units themselves, and so a complementary testing
paradigm is needed. Built-In Self-Test (BIST) is one such paradigm. BIST
attempts to move as much of the tester functionality as possible onto the
silicon. Embedding the tester equipment functionality into the semiconductor
product reduces the burden on and complexity of external test equipment, and
increases speed since on-chip access is faster.
Our BIST project involves the design and implementation of a Built-In Self-Test
for an Arithmetic and Logic Unit. The overall design consists of three major
parts: the ALU, the BIST logic (key components: linear feedback shift registers
/ multiple input signature registers) and a register-based micro-controller. The
BIST system will test the correct working of the ALU in an efficient and
time-conserving manner. We will be creating our own ALU in order to ease the
simulation of faults. The output signatures of a standard non-erroneous ALU will
be compared to those of the ALU under test in order to gauge the correctness of
its outputs.
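The pattern-generation and signature-compression pieces can be sketched as follows; the tap polynomial is an arbitrary assumption of this sketch, not a recommended maximal-length choice:

```python
def lfsr_patterns(seed, n_patterns, width=8, taps=0b10111000):
    """Fibonacci LFSR test-pattern generator: shift left, feeding back
    the XOR (parity) of the tapped bits."""
    state, out = seed, []
    for _ in range(n_patterns):
        out.append(state)
        fb = bin(state & taps).count("1") & 1
        state = ((state << 1) | fb) & ((1 << width) - 1)
    return out

def misr_signature(responses, width=8, taps=0b10111000):
    """Multiple-input signature register sketch: each response word is
    XORed into a shifting LFSR, compressing the whole response stream
    into one signature word."""
    sig = 0
    for r in responses:
        fb = bin(sig & taps).count("1") & 1
        sig = (((sig << 1) | fb) ^ r) & ((1 << width) - 1)
    return sig

def bist_alu(alu, golden, seed=0b1, n_patterns=64):
    """Run both ALUs on the same LFSR patterns; matching signatures
    indicate the circuit under test agreed on every pattern."""
    pats = lfsr_patterns(seed, n_patterns)
    return (misr_signature([alu(p) for p in pats])
            == misr_signature([golden(p) for p in pats]))
```

Note that signature compression admits a small aliasing probability (a faulty response stream can compress to the golden signature), which is the usual trade-off accepted in exchange for storing one word instead of every response.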
REFERENCES:
Avra, L.J., McCluskey, E.J., "Synthesizing for Scan Dependence in Built-In
Self-Testable Designs". Visited Feb 20, 2003.
Kiefer, G., Wunderlich, H.-J., "Using BIST Control for Pattern Generation".
Visited Feb 20, 2003.
Mohamed, Abdil Rashid, "Built-In Self-Test (BIST)",
http://www.ida.liu.se/~zebpe/teaching/test/lec12.pdf. Visited Feb 20, 2003.
Title: General Purpose Processor
There is a broad range of engineering concepts that could be implemented on
almost all electronic gadgets, home appliances and cars, which in most cases are
avoided due to the cost and complexity of the hardware. Thus, we are planning to
develop a simple processor that is fully synthesizable with the simplest
instruction set possible, and the least cost to develop it.
This processor will have a control unit, an arithmetic logic unit (ALU),
register unit, and a memory. The instruction set will be composed of load,
store, add, subtract, shift, AND, OR, XOR, jump and halt.
Because the objective of this processor is not to be implemented for real time
systems, we wouldn't worry about its frequency; the challenge is instead to make
it reliable and as small as possible. An added advantage is that, because it
consumes so little silicon, there will be plenty of room to add other
peripherals.
[1] Patterson, David A., and Hennessy, John L., Computer Organization &
Design - The Hardware/Software Interface, Second Edition, Morgan Kaufmann, San
Francisco, 1998, ISBN 1-55860-428-6.
[2] Structured Computer organization, Andrew S. Tanenbaum Third edition 1990,
Prentice-Hall, Inc. ISBN 0-13-852875-1
[3] http://ciips.ee.uwa.edu.au/~morris/CA406/CA_ToC.html
[4] http://www.vautomation.com/Press/cd9707.html
Title: A VHDL implementation of the single cycle MIPS data path
RISC processors are simple machines that can execute simple instructions at a
high frequency. One of these processors is called MIPS. This project will
implement a VHDL model of the single cycle MIPS datapath. This project can serve
as a learning tool for beginner computer and electrical engineering students.
The processor supports all basic operations including arithmetic, logical and
branching instructions, which form the basis for all modern assembly code. The
aim of our project is to create a simple VHDL implementation of the single cycle
MIPS processor datapath that can be independently used to simulate all MIPS core
(non pseudo) instructions. We intend to implement the datapath by writing our
own components in VHDL and using Altera LPMs to simulate data and instruction
memory. Testing will be done by writing our own machine code which will
subsequently be loaded into the instruction memory before the processor is
started. We expect to have the results in the data memory. Our design will focus
on simplicity and reliability to best suit our target audience.
1. John L. Hennessy, David A. Patterson, "Computer Organization and Design The
Hardware/Software Interface" Morgan Kaufmann Publishers, 1998.
2. John. L. Hennessy, David A. Patterson, "Computer Architecture, A
Quantitative Approach (3rd Edition)" Morgan Kaufmann, 2002.
3. Charles Price, MIPS IV Instruction Set, 1995,
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15740-f97/public/doc/mips-isa.pdf
Title: A 4-way Set-Associative Cache Management Unit
Cache is the name generally given to the first level of the memory hierarchy
encountered once the address of a block of data leaves a CPU. It plays a very
important role in computer performance, and deserves more attention and
appreciation from computer users. In order to fully understand the functionality
of a cache, one must examine the following four questions:
1. Where can a block be placed in a cache?
2. How is a block found if it is in the cache?
3. Which block should be replaced on a cache miss?
4. What happens on a write?
In our project, we will be implementing a 4-way set-associative cache management
unit using VHDL. The four questions that were previously mentioned will be our
design method ingredients. Our design would take, as inputs, block addresses
(from the CPU) and give out, as outputs, control signals to the cache and to the
lower level of memory hierarchy. This lower level of memory hierarchy is the
random access memory, where the cache receives its blocks of data. In general
these control signals will demonstrate how the CPU, cache, and RAM interface
with each other.
Design methods for performance optimization will also be one of our main
objectives. For instance, we will be using the write-back scheme with
no-write-allocate as our design method of writing since we believe this will
lead to better performance.
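The placement, lookup, and replacement answers to questions 1-3 can be sketched as follows; write handling is omitted, and the parameter values are illustrative rather than taken from the design:

```python
class SetAssociativeCache:
    """4-way set-associative lookup sketch with LRU replacement.
    Addresses split into tag | set-index | block-offset; each set
    holds up to `ways` tags, ordered most- to least-recently used."""

    def __init__(self, n_sets=64, ways=4, block_bits=5):
        self.n_sets, self.ways, self.block_bits = n_sets, ways, block_bits
        self.sets = [[] for _ in range(n_sets)]

    def access(self, address):
        block = address >> self.block_bits
        index = block % self.n_sets       # which set to search
        tag = block // self.n_sets        # identity within the set
        tags = self.sets[index]
        if tag in tags:                   # hit: promote to MRU
            tags.remove(tag)
            tags.insert(0, tag)
            return "hit"
        if len(tags) == self.ways:        # miss on a full set:
            tags.pop()                    # evict the LRU tag
        tags.insert(0, tag)
        return "miss"
```

In hardware the four tag comparisons within a set happen in parallel, and the MRU ordering kept here as a list becomes a few LRU state bits per set.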
1. Patterson, Hennessy, Computer Organization And Design, The Hardware/Software
Interface, Morgan Kaufmann.
2. Patterson, Hennessy, Computer Architecture: A Quantitative Approach, Morgan
Kaufmann.
3. Hayes, Computer Architecture And Organization, McGraw Hill
4. Denis Howe, "The Free Online Dictionary of Computing",
http://foldoc.doc.ic.ac.uk/.
Title: Electric Snowmobile DC Brushless Motor PWM Controller
The McGill Electric Snowmobile Team requires an electrically efficient
controller for its DC brushless motor. The current controller has many features
that are not necessary and lacks features that would be beneficial. Under the
recommendation of Professor Ooi, we have chosen to design a motor controller
that can adjust voltage levels depending on the throttle with multiple
supporting features which will be tuned for direct use with the electric
snowmobile. The design has to be energy efficient since the electric snowmobile
needs to run in harsh conditions. Our current battery packs are regular
lead-acid ones which do not perform well in the cold. We therefore need to keep
the design small for minimal power consumption. Furthermore, since the team is
fairly new, money is limited, so smaller FPGAs would reduce the cost.
A detailed description of PWM is given in the attached PDF file (pages 4-6).
Since the snowmobile has a DC brushless motor, it requires a 3-phase AC signal
to run. This AC signal needs to vary in amplitude to increase or decrease the
torque output of the motor. However, the batteries used supply a "constant" DC
voltage. This is where the controller comes in. The controller adjusts the
constant DC voltage to a varying DC voltage depending on the throttle by
switching two transistors (complements of each other) on and off for specific
periods of time (PWM). We will also implement the following convenient features
for the snowmobile specifically:
- Synchronous switching
- Regenerative mode
- Reverse mode
- Reverse and brake light control
- Cruise control with speed resume
- Extreme over-under voltage protection
- Dead-time tuning to support different transistor technology, as well as 7 and
8-bit duty-cycle resolution
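The duty-cycle mechanism itself reduces to a counter/compare loop, sketched here under an 8-bit resolution assumption:

```python
def pwm_wave(duty, resolution_bits=8, n_cycles=1):
    """Counter/compare PWM sketch: a free-running counter of
    2**resolution_bits ticks drives the output high while the count is
    below the duty threshold, so the average output level scales
    linearly with `duty`."""
    period = 1 << resolution_bits
    wave = []
    for _ in range(n_cycles):
        for count in range(period):
            wave.append(1 if count < duty else 0)
    return wave

def average_level(wave):
    """Effective (average) output level the motor windings see."""
    return sum(wave) / len(wave)
```

In the real controller the two complementary transistors are driven from this compare output and its inverse, with a short dead time inserted between them so both are never on at once.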
1. Adel S. Sedra, Kenneth C. Smith, "Microelectronic Circuits", Fourth Edition,
Oxford University Press, 1998.
2. Bus Compatible Digital PWM Controller, IXDP 610 http://www.ixys.com/91600.pdf
3. New Generation Motors, "NGM-EVC-200 Series Controller Operating Manual",
Version 1.10D.
Title: A Missile Interception Model
Missile interception is a defensive tactic which consists of launching a
defensive blocking missile (the pursuer) in response to an enemy ballistic
missile attack (the target). The pursuer's goal is to disable the target missile
through a frontal collision in orbit. If the target manages to penetrate the
atmosphere, nothing more can be done to stop it. The tricky part in
intercepting a missile is that the target is pre-programmed to change its
trajectory during its flight. So the real pursuer is equipped with sensors
that help detect any movement.
Our project will focus on a comparison between the hardware based approach
(VHDL) and the software based approach (using C and/or Matlab). Each simulation
will consist of a target launch followed by a pursuer launch. The progression of
both entities will be traced and recorded. The final outcome will indicate
whether the pursuer was able to make contact and intercept the target, or
whether it missed.
REFERENCES:
1- Zarchan, P. (1997) "Tactical and Strategic Missile Guidance" American
Institute of Aeronautics and Astronautics
2- Ben-Asher, J. and Yaesh, I. (1998) "Advances in Missile Guidance Theory"
American Institute of Aeronautics and Astronautics
3- Welch, G. and Bishop, G. (2002) "An introduction to the Kalman Filter"
Department of Computer Science, University of North Carolina
Title: Testing and Implementation of Convolutional Encoder and Viterbi
Decoder in VHDL.
With the advent of wireless communication, an increasing amount of data is being
channeled through transmission lines. The occurrence of unwanted noise and
interference is therefore a serious issue that must be dealt with since errors
are being introduced into the transmission. One way to solve this problem is
using an error correcting code scheme such as the convolutional encoder and
Viterbi decoder.
The Viterbi decoder logically explores in parallel every possible data sequence.
It compares each data component with different possible paths and picks up the
closest match. For the implementation of the encoder/decoder in VHDL, the
specification will be a 2 bit input, 1 bit output and memory order of 3.
The secondary goal is to create a test bench to measure the ability of the
convolutional encoder and Viterbi decoder to fix errors in the signal. It is
necessary to test this coding scheme extensively so that its performance can be
compared to simulations done in C++ and MATLAB. Finally, the noise and
interference will be simulated on the FPGA chip.
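A software model of an encoder/decoder pair can be sketched as follows; a standard rate-1/2, constraint-length-3 textbook configuration is assumed here for illustration, and the project's own bit widths may differ:

```python
G = (0b111, 0b101)      # generator polynomials, constraint length 3

def parity(x):
    return bin(x).count("1") & 1

def conv_encode(bits):
    """Rate-1/2 convolutional encoder: each input bit joins the two
    previous bits in a shift register; both generators tap that
    register to emit two output bits per input bit."""
    state, out = 0, []
    for b in bits:
        reg = (b << 2) | state
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1
    return out

def viterbi_decode(received):
    """Trellis search over the four encoder states, keeping per state
    the survivor path of lowest Hamming cost against the received
    bits; the best final survivor is the decoded message."""
    INF = float("inf")
    metric = [0, INF, INF, INF]
    path = [[], [], [], []]
    for i in range(0, len(received), 2):
        r0, r1 = received[i], received[i + 1]
        new_metric, new_path = [INF] * 4, [None] * 4
        for s in range(4):
            if metric[s] == INF:
                continue
            for b in (0, 1):
                reg = (b << 2) | s
                cost = (metric[s]
                        + (parity(reg & G[0]) != r0)
                        + (parity(reg & G[1]) != r1))
                ns = reg >> 1
                if cost < new_metric[ns]:
                    new_metric[ns] = cost
                    new_path[ns] = path[s] + [b]
        metric, path = new_metric, new_path
    best = min(range(4), key=lambda s: metric[s])
    return path[best]
```

The decoder's "explore every possible data sequence in parallel" behavior shows up as the per-state survivor updates, which a hardware implementation evaluates simultaneously, one add-compare-select unit per state.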
[1] Wang, Y et al. DSP Applications for Real Time Equalization, 2002. Note:
Paper written for ECSE-494C.
[2] Garr, David A. Iterative Decoding for the GSM System, 1998.
[3] Huang, Fu-Hua. Chapter 3: Turbo Code Encoder,
http://scholar.lib.vt.edu/theses/available/etd-71897-15815/unrestricted/chap3.
pdf
[4] Sklar, Bernard. Digital Communications Fundamentals and Applications Second
Edition. New Jersey: Prentice Hall, 2001.
Title: VHDL Implementation of Selective Repeat Error Recovery
In today's rapid advancement and intensive usage of communications, the accuracy
and efficiency of data transmission across networking channels is very
significant. In order to achieve reliable data transfer between two end systems,
an Automatic Repeat Request (ARQ) protocol must be used. Stop-And-Wait,
Go-Back-N, and Selective Repeat are the three ARQ protocols in use today, the
latter being the most efficient and complex one. These schemes make use of flow
control, sequence numbers, acknowledgements, and timers to ensure that the data
is delivered from a sending process to a receiving process correctly and in
order.
Even though the Selective Repeat procedure is one of the most complex ARQ
protocols, its renowned high channel efficiency motivates us to implement it in
VHDL. In this project, a simplified version of Selective Repeat error recovery
will be developed. Nevertheless, the simplifications will not alter the error
recovery method used by this ARQ protocol, a method that consists of
retransmitting only the lost packets. The SR error recovery operation will be
observed through the simulation of a sender-receiver data transaction, and its
performance will be evaluated by its protocol throughput and its rate of data
delivery.
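The retransmit-only-what-was-lost behavior can be sketched as a toy simulation; the lose-each-packet-once channel, instantaneous ACKs, and absence of timers are simplifying assumptions of the sketch:

```python
def selective_repeat(packets, lost_first_try, window=4):
    """Selective Repeat sketch: the sender resends only unacknowledged
    packets in its window; the receiver buffers out-of-order arrivals
    and delivers them to the application strictly in order."""
    acked = set()
    dropped_once = set(lost_first_try)   # seq numbers lost exactly once
    buffer, delivered = {}, []
    expected = 0                         # next in-order seq to deliver
    base = 0                             # left edge of sender window
    while base < len(packets):
        for seq in range(base, min(base + window, len(packets))):
            if seq in acked:
                continue                 # already delivered: no resend
            if seq in dropped_once:
                dropped_once.remove(seq) # channel loses this copy
                continue
            buffer[seq] = packets[seq]   # receiver buffers the packet
            acked.add(seq)               # per-packet ACK reaches sender
            while expected in buffer:    # in-order delivery
                delivered.append(buffer.pop(expected))
                expected += 1
        while base < len(packets) and base in acked:
            base += 1                    # slide the window
    return delivered
```

Unlike Go-Back-N, a single loss here triggers exactly one retransmission rather than a replay of the whole window, which is the source of Selective Repeat's channel efficiency.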
References:
[1] J. F. Kurose, K. W. Ross, Computer Networking: A Top-Down Approach
Featuring the Internet, Addison-Wesley, 2nd edition, 2003.
[2] Srinidhi Varadarajan, Reliable Transmission: A State Machine Reliable
Transmission http://courses.cs.vt.edu/~cs5516/spring02/reliable_tx_1.pdf
[3] Don Towsley, Chapter 3: Transport Layer
http://www.cs.umd.edu/~shankar/417-F01/Slides/chapter3a/sld033.htm
[4] http://www.erg.abdn.ac.uk/users/gorry/course/arq-pages/sr.html